From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752068AbcFUWAy (ORCPT ); Tue, 21 Jun 2016 18:00:54 -0400
Received: from mail.linuxfoundation.org ([140.211.169.12]:41659 "EHLO
	mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752024AbcFUWAw (ORCPT ); Tue, 21 Jun 2016 18:00:52 -0400
Date: Tue, 21 Jun 2016 14:52:37 -0700
From: Andrew Morton
To: Yury Norov
Cc: masmart@yandex.ru, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cl@linux.com, enberg@kernel.org, rientjes@google.com,
	iamjoonsoo.kim@lge.com, linux@rasmusvillemoes.dk, Alexey Klimov
Subject: Re: [PATCH] mm: slab.h: use ilog2() in kmalloc_index()
Message-Id: <20160621145237.dae264ea5fe6b3b7f2d2d4e6@linux-foundation.org>
In-Reply-To: <1466465586-22096-1-git-send-email-yury.norov@gmail.com>
References: <1466465586-22096-1-git-send-email-yury.norov@gmail.com>
X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 21 Jun 2016 02:33:06 +0300 Yury Norov wrote:

> kmalloc_index() uses simple straightforward way to calculate
> bit position of nearest or equal upper power of 2.
> This effectively results in generation of 24 episodes of
> compare-branch instructions in assembler.
>
> There is shorter way to calculate this: fls(size - 1).
>
> The patch removes hard-coded calculation of kmalloc slab and
> uses ilog2() instead that works on top of fls(). ilog2 is used
> with intention that compiler also might optimize constant case
> during compile time if it detects that.
>
> BUG() is moved to the beginning of function. We left it here to
> provide identical behaviour to previous version. It may be removed
> if there's no requirement in it anymore.
>
> While we're at this, fix comment that describes return value.

kmalloc_index() is always called with a constant-valued `size' (see
__builtin_constant_p() tests) so the compiler will evaluate the switch
statement at compile-time.  This will be more efficient than calling
fls() at runtime.
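
To make the two approaches concrete, here is a minimal userspace sketch,
not the kernel source.  All three names (size_index_chain(),
size_index_fls(), size_index()) are hypothetical, the table stops at
1024 bytes, and the kernel's special cases (the 96- and 192-byte caches,
KMALLOC_MIN_SIZE, BUG()) are left out.  It assumes GCC or Clang for
__builtin_constant_p().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the hard-coded lookup.
 * When `size' is a compile-time constant and the function is inlined,
 * an optimizing compiler can fold the whole chain to a single constant.
 */
static inline int size_index_chain(size_t size)
{
	if (!size)
		return 0;
	if (size <= 8)
		return 3;
	if (size <= 16)
		return 4;
	if (size <= 32)
		return 5;
	if (size <= 64)
		return 6;
	if (size <= 128)
		return 7;
	if (size <= 256)
		return 8;
	if (size <= 512)
		return 9;
	if (size <= 1024)
		return 10;
	return -1;	/* out of range for this sketch */
}

/*
 * Runtime alternative in the spirit of fls(size - 1): count the bits
 * needed to represent size - 1.  The loop is a portable stand-in for
 * the kernel's fls(); the 8-byte floor is an assumption of the sketch.
 */
static inline int size_index_fls(size_t size)
{
	int bits = 0;
	size_t v;

	if (!size)
		return 0;
	if (size <= 8)
		return 3;
	for (v = size - 1; v; v >>= 1)
		bits++;
	return bits;
}

/*
 * Pick the foldable chain for constant sizes and the runtime
 * computation otherwise (assumes GCC/Clang __builtin_constant_p()).
 */
#define size_index(size)					\
	(__builtin_constant_p(size) ? size_index_chain(size)	\
				    : size_index_fls(size))
```

With a literal argument such as size_index(64), the macro selects the
chain, which the compiler can reduce to the constant 6 when optimizing.
Within the range covered here the two helpers return the same index, so
the choice between them only affects the generated code.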