From: Arnd Bergmann
To: Nicolas Pitre
Cc: linux-arm-kernel@lists.infradead.org, Thomas Gleixner, John Stultz, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] optimize ktime_divns for constant divisors
Date: Thu, 04 Dec 2014 15:56:47 +0100
Message-ID: <2362381.LDAGLC19vb@wuerfel>
References: <2165831.DQoLFmGhIf@wuerfel>

On Thursday 04 December 2014 08:46:27 Nicolas Pitre wrote:
> On Thu, 4 Dec 2014, Arnd Bergmann wrote:
>
> Note the above code is for 32-bit architectures that support a 32x32=64
> bit multiply instruction. And even then, what kills performance is the
> inability to efficiently deal with carry bits from C code. Hence the
> far better output from do_div() on ARM.
>
> If x86-64 has a 64x64=128 bit multiply instruction then the above may
> greatly be simplified to a single multiply and a shift. That would
> possibly outperform do_div().

I was trying this in 32-bit mode to see how it would work in x86-32
kernels. Since that architecture has a 64-by-32 divide instruction,
that gets used here.
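For illustration only (this is not code from the patch), the "single
multiply and a shift" idea from the quoted text can be sketched for one
fixed divisor. The helper name div_by_1000 and the choice of 1000 as the
divisor are assumptions made for this example, and the sketch relies on
the unsigned __int128 extension that gcc and clang provide on 64-bit
targets. Since 1000 = 8 * 125, the value is first shifted right by 3 and
then multiplied by ceil(2^68 / 125); the quotient is in the bits of the
128-bit product above bit 67.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch under discussion:
 * divide a 64-bit value by the constant 1000 using one 64x64=128
 * multiply and shifts instead of a divide instruction.
 */
static inline uint64_t div_by_1000(uint64_t x)
{
	/* ceil(2^68 / 125); 125 * magic exceeds 2^68 by only 19 */
	const uint64_t magic = 2361183241434822607ULL;

	/* x >> 3 is below 2^61, so the rounding error never reaches the quotient */
	unsigned __int128 prod = (unsigned __int128)(x >> 3) * magic;

	/* keep bits 68 and up of the 128-bit product */
	return (uint64_t)(prod >> 68);
}
```

On a 64-bit target a compiler will typically generate something of this
shape by itself for a plain "x / 1000", so the explicit form is mainly
useful for seeing what the generated code does.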

x86-64 has a 64x64=128 multiply instruction and gcc uses that for any
64-bit division by constant, so that's what already happens in do_div.
I assume for any 64-bit architecture, the result will be similar.

I guess the only architectures that would benefit from your implementation
above are the ones that do not have any optimization for constant
64-by-32-bit division and just call do_div.

> > On a related note, I wonder if we can come up with a more efficient
> > implementation for do_div on ARMv7ve, and I think we should add the
> > Makefile logic to build with -march=armv7ve when we know that we do
> > not need to support processors without idiv.
>
> Multiplications will always be faster than divisions. However the idiv
> instruction would come in very handy in the slow path when the divisor
> is not constant.

Makes sense. I also just checked the gcc sources and it seems that the
sdiv/udiv instructions on ARM are not even used for implementing
__aeabi_uldivmod there. Not sure if that's intentional, but we probably
don't need to bother optimizing this in the kernel before user space does.

Building with -march=armv7ve still sounds helpful to avoid the
__aeabi_uidiv calls though.

	Arnd