Re: [PATCH v2 1/3] lib: find_*_bit reimplementation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
To: yury.norov@gmail.com
Cc: klimov.linux@gmail.com, davem@davemloft.net,
	akpm@linux-foundation.org, hannes@stressinduktion.org,
	dborkman@redhat.com, laijs@cn.fujitsu.com,
	takahiro.akashi@linaro.org, valentinrothberg@gmail.com,
	linux@horizon.com, msalter@redhat.com, chris@chris-wilson.co.uk,
	tgraf@suug.ch, linux-kernel@vger.kernel.org,
	Yury Norov <y.norov@samsung.com>
Subject: Re: [PATCH v2 1/3] lib: find_*_bit reimplementation
Date: Mon, 02 Feb 2015 11:43:49 +0100	[thread overview]
Message-ID: <878uggr3tm.fsf@rasmusvillemoes.dk> (raw)
In-Reply-To: <1422737907-26114-1-git-send-email-yury.norov@gmail.com> (yury norov's message of "Sat, 31 Jan 2015 23:58:25 +0300")

On Sat, Jan 31 2015, yury.norov@gmail.com wrote:

> From: Yury Norov <y.norov@samsung.com>
>
> New implementations takes less space in source file (see diffstat)
> and in object. For me it's 710 vs 453 bytes of text.
>

New version generally looks good. Please include a summary of the
changes between the versions either below the --- line or in a 0/n cover
letter, especially since you've now expanded the scope of the series.

Comments below.

>
> Patch was boot-tested on x86_64 and MIPS (big-endian) machines.
> Performance tests were ran on userspace with code like this:
>
> 	/* addr[] is filled from /dev/urandom */
> 	start = clock();
> 	while (ret < nbits)
> 		ret = find_next_bit(addr, nbits, ret + 1);
>
> 	end = clock();
> 	printf("%ld\t", (unsigned long) end - start);
>
> On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz rezults are next:
> (for find_next_bit, nbits is 8M, for find_first_bit - 80K)
>
> 	find_next_bit:		find_first_bit:
> 	new	current		new	current
> 	26932	43151		14777	14925
> 	26947	43182		14521	15423
> 	26507	43824		15053	14705
> 	27329	43759		14473	14777
> 	26895	43367		14847	15023
> 	26990	43693		15103	15163
> 	26775	43299		15067	15232
> 	27282	42752		14544	15121
> 	27504	43088		14644	14858
> 	26761	43856		14699	15193
> 	26692	43075		14781	14681
> 	27137	42969		14451	15061
> 	...			...
>
> find_next_bit performance gain is 35-40%;
> find_first_bit - no measurable difference.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  lib/find_last_bit.c |  31 ++-----
>  lib/find_next_bit.c | 254 +++++++++++++++-------------------------------------
>  2 files changed, 79 insertions(+), 206 deletions(-)
>
> diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
> index 91ca09f..e67e970 100644
> --- a/lib/find_last_bit.c
> +++ b/lib/find_last_bit.c
> @@ -4,44 +4,29 @@
>   * Written by Rusty Russell <rusty@rustcorp.com.au>
>   * (Inspired by David Howell's find_next_bit implementation)
>   *
> + * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
> + * size and improve performance, 2015.
> + *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
>   * as published by the Free Software Foundation; either version
>   * 2 of the License, or (at your option) any later version.
>   */
>  
> -#include <linux/bitops.h>

Why do you remove that #include? It is rather important that the header
and implementation don't get out of sync. I know that kernel.h includes
bitops.h, but please don't rely on such things. Quoting SubmitChecklist:

1: If you use a facility then #include the file that defines/declares
   that facility.  Don't depend on other header files pulling in ones
   that you use.


>  #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>

However, getting rid of includes that are no longer needed is certainly
a good thing.

> +#include <linux/kernel.h>
>  
>  #ifndef find_last_bit
>  
>  unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>  {
> -	unsigned long words;
> -	unsigned long tmp;
> -
> -	/* Start at final word. */
> -	words = size / BITS_PER_LONG;
> -
> -	/* Partial final word? */
> -	if (size & (BITS_PER_LONG-1)) {
> -		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
> -					 - (size & (BITS_PER_LONG-1)))));
> -		if (tmp)
> -			goto found;
> -	}
> +	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
>  
> -	while (words) {
> -		tmp = addr[--words];
> -		if (tmp) {
> -found:
> -			return words * BITS_PER_LONG + __fls(tmp);
> -		}
> +	while (idx--) {
> +		if (addr[idx])
> +			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
>  	}
>  
> -	/* Not found */
>  	return size;
>  }
>  EXPORT_SYMBOL(find_last_bit);
> diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
> index 0cbfc0b..ebfb3dc 100644
> --- a/lib/find_next_bit.c
> +++ b/lib/find_next_bit.c
> @@ -3,18 +3,45 @@
>   * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
>   * Written by David Howells (dhowells@redhat.com)
>   *
> + * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
> + * size and improve performance, 2015.
> + *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
>   * as published by the Free Software Foundation; either version
>   * 2 of the License, or (at your option) any later version.
>   */
>  
> -#include <linux/bitops.h>
>  #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>
> +#include <linux/kernel.h>

Same as above.

> +#define HIGH_BITS_MASK(nr)		(ULONG_MAX << (nr))
> +
> +#if !defined(find_next_bit) || !defined(find_next_zero_bit)
> +static unsigned long _find_next_bit(const unsigned long *addr,
> +		unsigned long nbits, unsigned long start, bool set)
> +{
> +	unsigned long tmp = set ? addr[start / BITS_PER_LONG]
> +			: ~addr[start / BITS_PER_LONG];
> +
> +	/* Handle 1st word. */
> +	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
> +		tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
> +		start = round_down(start, BITS_PER_LONG);
> +	}
>  
> -#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
> +	while (!tmp) {
> +		start += BITS_PER_LONG;
> +		if (start >= nbits)
> +			return nbits;
> +
> +		tmp = set ? addr[start / BITS_PER_LONG]
> +			: ~addr[start / BITS_PER_LONG];
> +	}
> +
> +	return start + __ffs(tmp);
> +}
> +#endif
>  
>  #ifndef find_next_bit
>  /*
> @@ -23,86 +50,22 @@
>  unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>  			    unsigned long offset)
>  {
> -	const unsigned long *p = addr + BITOP_WORD(offset);
> -	unsigned long result = offset & ~(BITS_PER_LONG-1);
> -	unsigned long tmp;
> -
>  	if (offset >= size)
>  		return size;

Why can't this ...


> -	size -= result;
> -	offset %= BITS_PER_LONG;
> -	if (offset) {
> -		tmp = *(p++);
> -		tmp &= (~0UL << offset);
> -		if (size < BITS_PER_LONG)
> -			goto found_first;
> -		if (tmp)
> -			goto found_middle;
> -		size -= BITS_PER_LONG;
> -		result += BITS_PER_LONG;
> -	}
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if ((tmp = *(p++)))
> -			goto found_middle;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> -	}
> -	if (!size)
> -		return result;
> -	tmp = *p;
>  
> -found_first:
> -	tmp &= (~0UL >> (BITS_PER_LONG - size));
> -	if (tmp == 0UL)		/* Are any bits set? */
> -		return result + size;	/* Nope. */
> -found_middle:
> -	return result + __ffs(tmp);
> +	return min(_find_next_bit(addr, size, offset, 1), size);

... and this be part of _find_next_bit? Can find_next_bit not be simply
'return _find_next_bit(addr, size, offset, 1);', and similarly for
find_next_zero_bit? Btw., passing true and false for the boolean
parameter may be a little clearer.

>  }
>  EXPORT_SYMBOL(find_next_bit);
>  #endif
>  
>  #ifndef find_next_zero_bit
> -/*
> - * This implementation of find_{first,next}_zero_bit was stolen from
> - * Linus' asm-alpha/bitops.h.
> - */
>  unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
>  				 unsigned long offset)
>  {
> -	const unsigned long *p = addr + BITOP_WORD(offset);
> -	unsigned long result = offset & ~(BITS_PER_LONG-1);
> -	unsigned long tmp;
> -
>  	if (offset >= size)
>  		return size;

See above.

> -	size -= result;
> -	offset %= BITS_PER_LONG;
> -	if (offset) {
> -		tmp = *(p++);
> -		tmp |= ~0UL >> (BITS_PER_LONG - offset);
> -		if (size < BITS_PER_LONG)
> -			goto found_first;
> -		if (~tmp)
> -			goto found_middle;
> -		size -= BITS_PER_LONG;
> -		result += BITS_PER_LONG;
> -	}
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if (~(tmp = *(p++)))
> -			goto found_middle;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> -	}
> -	if (!size)
> -		return result;
> -	tmp = *p;
>  
> -found_first:
> -	tmp |= ~0UL << size;
> -	if (tmp == ~0UL)	/* Are any bits zero? */
> -		return result + size;	/* Nope. */
> -found_middle:
> -	return result + ffz(tmp);
> +	return min(_find_next_bit(addr, size, offset, 0), size);

See above.

>  }
>  EXPORT_SYMBOL(find_next_zero_bit);
>  #endif
> @@ -113,24 +76,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
>   */
>  unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = 0;
> -	unsigned long tmp;
> +	unsigned long idx;
>  
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if ((tmp = *(p++)))
> -			goto found;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> +		if (addr[idx])
> +			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
>  	}
> -	if (!size)
> -		return result;
>  
> -	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
> -	if (tmp == 0UL)		/* Are any bits set? */
> -		return result + size;	/* Nope. */
> -found:
> -	return result + __ffs(tmp);
> +	return size;
>  }
>  EXPORT_SYMBOL(find_first_bit);
>  #endif
> @@ -141,24 +94,14 @@ EXPORT_SYMBOL(find_first_bit);
>   */
>  unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = 0;
> -	unsigned long tmp;
> +	unsigned long idx;
>  
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if (~(tmp = *(p++)))
> -			goto found;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> +		if (addr[idx] != ULONG_MAX)
> +			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
>  	}
> -	if (!size)
> -		return result;
>  
> -	tmp = (*p) | (~0UL << size);
> -	if (tmp == ~0UL)	/* Are any bits zero? */
> -		return result + size;	/* Nope. */
> -found:
> -	return result + ffz(tmp);
> +	return size;
>  }
>  EXPORT_SYMBOL(find_first_zero_bit);
>  #endif
> @@ -166,18 +109,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
>  #ifdef __BIG_ENDIAN
>  
>  /* include/linux/byteorder does not support "unsigned long" type */
> -static inline unsigned long ext2_swabp(const unsigned long * x)
> -{
> -#if BITS_PER_LONG == 64
> -	return (unsigned long) __swab64p((u64 *) x);
> -#elif BITS_PER_LONG == 32
> -	return (unsigned long) __swab32p((u32 *) x);
> -#else
> -#error BITS_PER_LONG not defined
> -#endif
> -}
> -
> -/* include/linux/byteorder doesn't support "unsigned long" type */
>  static inline unsigned long ext2_swab(const unsigned long y)
>  {
>  #if BITS_PER_LONG == 64
> @@ -189,48 +120,40 @@ static inline unsigned long ext2_swab(const unsigned long y)
>  #endif
>  }
>  
> +#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
> +static unsigned long _find_next_bit_le(const unsigned long *addr,
> +		unsigned long nbits, unsigned long start, bool set)
> +{
> +	unsigned long tmp = set ? addr[start / BITS_PER_LONG]
> +			: ~addr[start / BITS_PER_LONG];
> +
> +	/* Handle 1st word. */
> +	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
> +		tmp &= ext2_swab(HIGH_BITS_MASK(start % BITS_PER_LONG));
> +		start = round_down(start, BITS_PER_LONG);
> +	}
> +
> +	while (!tmp) {
> +		start += BITS_PER_LONG;
> +		if (start >= nbits)
> +			return nbits;
> +
> +		tmp = set ? addr[start / BITS_PER_LONG]
> +			: ~addr[start / BITS_PER_LONG];
> +	}
> +
> +	return start + __ffs(ext2_swab(tmp));
> +}
> +#endif
> +
>  #ifndef find_next_zero_bit_le
>  unsigned long find_next_zero_bit_le(const void *addr, unsigned
>  		long size, unsigned long offset)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = offset & ~(BITS_PER_LONG - 1);
> -	unsigned long tmp;
> -
>  	if (offset >= size)
>  		return size;

Again, I think this should be moved to the common implementation in
_find_next_bit_le, and similarly for find_next_bit_le below.

> -	p += BITOP_WORD(offset);
> -	size -= result;
> -	offset &= (BITS_PER_LONG - 1UL);
> -	if (offset) {
> -		tmp = ext2_swabp(p++);
> -		tmp |= (~0UL >> (BITS_PER_LONG - offset));
> -		if (size < BITS_PER_LONG)
> -			goto found_first;
> -		if (~tmp)
> -			goto found_middle;
> -		size -= BITS_PER_LONG;
> -		result += BITS_PER_LONG;
> -	}
>  
> -	while (size & ~(BITS_PER_LONG - 1)) {
> -		if (~(tmp = *(p++)))
> -			goto found_middle_swap;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> -	}
> -	if (!size)
> -		return result;
> -	tmp = ext2_swabp(p);
> -found_first:
> -	tmp |= ~0UL << size;
> -	if (tmp == ~0UL)	/* Are any bits zero? */
> -		return result + size; /* Nope. Skip ffz */
> -found_middle:
> -	return result + ffz(tmp);
> -
> -found_middle_swap:
> -	return result + ffz(ext2_swab(tmp));
> +	return min(_find_next_bit_le(addr, size, offset, 0), size);
>  }
>  EXPORT_SYMBOL(find_next_zero_bit_le);
>  #endif
> @@ -239,45 +162,10 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
>  unsigned long find_next_bit_le(const void *addr, unsigned
>  		long size, unsigned long offset)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = offset & ~(BITS_PER_LONG - 1);
> -	unsigned long tmp;
> -
>  	if (offset >= size)
>  		return size;
> -	p += BITOP_WORD(offset);
> -	size -= result;
> -	offset &= (BITS_PER_LONG - 1UL);
> -	if (offset) {
> -		tmp = ext2_swabp(p++);
> -		tmp &= (~0UL << offset);
> -		if (size < BITS_PER_LONG)
> -			goto found_first;
> -		if (tmp)
> -			goto found_middle;
> -		size -= BITS_PER_LONG;
> -		result += BITS_PER_LONG;
> -	}
> -
> -	while (size & ~(BITS_PER_LONG - 1)) {
> -		tmp = *(p++);
> -		if (tmp)
> -			goto found_middle_swap;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> -	}
> -	if (!size)
> -		return result;
> -	tmp = ext2_swabp(p);
> -found_first:
> -	tmp &= (~0UL >> (BITS_PER_LONG - size));
> -	if (tmp == 0UL)		/* Are any bits set? */
> -		return result + size; /* Nope. */
> -found_middle:
> -	return result + __ffs(tmp);
>  
> -found_middle_swap:
> -	return result + __ffs(ext2_swab(tmp));
> +	return min(_find_next_bit_le(addr, size, offset, 1), size);
>  }
>  EXPORT_SYMBOL(find_next_bit_le);
>  #endif

next prev parent reply	other threads:[~2015-02-02 10:44 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-31 20:58 [PATCH v2 1/3] lib: find_*_bit reimplementation yury.norov
2015-01-31 20:58 ` [PATCH v2 2/3] lib: move find_last_bit to lib/find_next_bit.c yury.norov
2015-01-31 20:58 ` [PATCH v2 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c yury.norov
2015-02-02 11:09   ` Rasmus Villemoes
2015-02-02  3:17 ` [PATCH v2 1/3] lib: find_*_bit reimplementation George Spelvin
2015-02-04 23:07   ` Yury
2015-02-02 10:43 ` Rasmus Villemoes [this message]
2015-02-02 11:47   ` George Spelvin
2015-02-02 12:56     ` Rasmus Villemoes
2015-02-04 23:45       ` Yury
2015-02-05 14:51         ` Rasmus Villemoes
2015-02-04 22:52   ` Yury
2015-02-05 15:01     ` Rasmus Villemoes
  -- strict thread matches above, loose matches on Subject: below --
2015-02-05 23:07 Alexey Klimov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878uggr3tm.fsf@rasmusvillemoes.dk \
    --to=linux@rasmusvillemoes.dk \
    --cc=akpm@linux-foundation.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=davem@davemloft.net \
    --cc=dborkman@redhat.com \
    --cc=hannes@stressinduktion.org \
    --cc=klimov.linux@gmail.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@horizon.com \
    --cc=msalter@redhat.com \
    --cc=takahiro.akashi@linaro.org \
    --cc=tgraf@suug.ch \
    --cc=valentinrothberg@gmail.com \
    --cc=y.norov@samsung.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.