From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752297AbcAaH1s (ORCPT ); Sun, 31 Jan 2016 02:27:48 -0500 Received: from mail-pa0-f50.google.com ([209.85.220.50]:33619 "EHLO mail-pa0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751052AbcAaH1r (ORCPT ); Sun, 31 Jan 2016 02:27:47 -0500 From: Thomas Rohwer X-Google-Original-From: Thomas Rohwer Subject: Re: [PATCH] Optimize int_sqrt for small values for faster idle To: Andi Kleen , akpm@linux-foundation.org References: <1454017365-8509-1-git-send-email-andi@firstfloor.org> Cc: linux-kernel@vger.kernel.org, davidlohr.bueso@hp.com, rafael.j.wysocki@intel.com, lenb@kernel.org, Andi Kleen Newsgroups: gmane.linux.kernel Message-ID: <56ADB774.8080708@gmail.com> Date: Sun, 31 Jan 2016 15:27:48 +0800 User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0 MIME-Version: 1.0 In-Reply-To: <1454017365-8509-1-git-send-email-andi@firstfloor.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, > - m = 1UL << (BITS_PER_LONG - 2); > + if (x <= 0xffff) { > + if (m <= 0xff) > + m = 1UL << (8 - 2); > + else > + m = 1UL << (16 - 2); > + } else if (x <= 0xffffffff) > + m = 1UL << (32 - 2); > + else > + m = 1UL << (BITS_PER_LONG - 2); > while (m != 0) { > b = y + m; > y >>= 1; > I think, m can be initialized with 1 << (greatest multiple of 2 less than or equal to (position of most significant bit of x)) i.e. 1 << ((position of most significant bit of x) & 62) without changing the outcome of the original algorithm (as long as x>= 2). I believe, that for (position of most significant bit of x) there is an efficient macro, and some processors directly have an instruction for it. So this would probably be faster than your suggestion for an initial starting value and give an even better starting value (cutting in some cases further on the number of while loop interations). If one just wants to achieve a result with a certain relative error in terms of the fraction of the input, one can probably only look at the most significant bit and a few following bits of x. Sincerely, Thomas