public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@horizon.com>
To: linux@rasmusvillemoes.dk, yury.norov@gmail.com
Cc: akpm@linux-foundation.org, chris@chris-wilson.co.uk,
	davem@davemloft.net, dborkman@redhat.com,
	hannes@stressinduktion.org, klimov.linux@gmail.com,
	laijs@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	linux@horizon.com, msalter@redhat.com,
	takahiro.akashi@linaro.org, tgraf@suug.ch,
	valentinrothberg@gmail.com
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
Date: 13 Feb 2015 16:09:03 -0500	[thread overview]
Message-ID: <20150213210903.4716.qmail@ns.horizon.com> (raw)
In-Reply-To: <CAAH8bW-mk0kk-GKDNny6hsjrbcjwdcAacsF_DEaXmNG==hXhRw@mail.gmail.com>

> Ohh... I used to think I know something about optimization. I tried build
> Rasmus' 'find_last_bit' against mine on MIPS, and found that sizes are as
> 268 vs 320 bytes. I think the best version is suggested by George, both
> readable, and effective. What about using it? If no objections, I'll
> gather all fixes and upload new patchset this weekend.

I'll happily ack whichever you prefer.  Tightening the code to the
maximum possible fun exercise, but not essential.  However, I finally
got GCC to generate reasonable code with the following:

size_t find_last_bit3(const unsigned long *addr, size_t size)
{
	if (size) {
		unsigned long val = LAST_WORD_MASK(size);
		size_t idx = (size-1) / BITS_PER_LONG;

		do {
			val &= addr[idx];
			if (val)
				return idx * BITS_PER_LONG + __fls(val);
			val = ~0ul;
		} while (idx--);
	}
	return size;
}

size_t find_last_bit4(const unsigned long *addr, size_t size)
{
	if (size) {
		unsigned long val = LAST_WORD_MASK(size);
		size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);

		do {
			val &= addr[idx];
			if (val)
				return idx * BITS_PER_LONG + __fls(val);
			val = ~0ul;
		} while (--idx);
	}
	return size;
}

Which produced, respectively, 76 and 68-byte functions:

0000000000000000 <find_last_bit3>:
   0:	31 c0                	xor    %eax,%eax
   2:	48 85 f6             	test   %rsi,%rsi
   5:	74 44                	je     4b <find_last_bit3+0x4b>
   7:	48 8d 46 ff          	lea    -0x1(%rsi),%rax
   b:	89 f1                	mov    %esi,%ecx
   d:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  14:	f7 d9                	neg    %ecx
  16:	48 c1 e8 06          	shr    $0x6,%rax
  1a:	48 d3 ea             	shr    %cl,%rdx
  1d:	eb 11                	jmp    30 <find_last_bit3+0x30>
  1f:	90                   	nop
  20:	48 83 e8 01          	sub    $0x1,%rax
  24:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  2b:	48 39 d0             	cmp    %rdx,%rax
  2e:	74 18                	je     48 <find_last_bit3+0x48>
  30:	48 23 14 c7          	and    (%rdi,%rax,8),%rdx
  34:	74 ea                	je     20 <find_last_bit3+0x20>
  36:	48 c1 e0 06          	shl    $0x6,%rax
  3a:	48 0f bd d2          	bsr    %rdx,%rdx
  3e:	48 01 d0             	add    %rdx,%rax
  41:	c3                   	retq   
  42:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  48:	48 89 f0             	mov    %rsi,%rax
  4b:	c3                   	retq   
  4c:	0f 1f 40 00          	nopl   0x0(%rax)

0000000000000050 <find_last_bit4>:
  50:	31 c0                	xor    %eax,%eax
  52:	48 85 f6             	test   %rsi,%rsi
  55:	74 3c                	je     93 <find_last_bit4+0x43>
  57:	48 8d 46 3f          	lea    0x3f(%rsi),%rax
  5b:	89 f1                	mov    %esi,%ecx
  5d:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  64:	f7 d9                	neg    %ecx
  66:	48 c1 e8 06          	shr    $0x6,%rax
  6a:	48 d3 ea             	shr    %cl,%rdx
  6d:	eb 0d                	jmp    7c <find_last_bit4+0x2c>
  6f:	90                   	nop
  70:	48 c7 c2 ff ff ff ff 	mov    $0xffffffffffffffff,%rdx
  77:	48 01 d0             	add    %rdx,%rax
  7a:	74 14                	je     90 <find_last_bit4+0x40>
  7c:	48 23 14 c7          	and    (%rdi,%rax,8),%rdx
  80:	74 ee                	je     70 <find_last_bit4+0x20>
  82:	48 c1 e0 06          	shl    $0x6,%rax
  86:	48 0f bd d2          	bsr    %rdx,%rdx
  8a:	48 01 d0             	add    %rdx,%rax
  8d:	c3                   	retq   
  8e:	66 90                	xchg   %ax,%ax
  90:	48 89 f0             	mov    %rsi,%rax
  93:	c3                   	retq   

The second one omits all the unwanted tests & compares, and even does
something clever with the -1 mask that I didn't think of.

       reply	other threads:[~2015-02-13 21:09 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAAH8bW-mk0kk-GKDNny6hsjrbcjwdcAacsF_DEaXmNG==hXhRw@mail.gmail.com>
2015-02-13 21:09 ` George Spelvin [this message]
2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
2015-02-08 18:48   ` George Spelvin
2015-02-09  8:32   ` George Spelvin
2015-02-09 11:53     ` Rasmus Villemoes
2015-02-09 16:45       ` George Spelvin
2015-02-11 22:14         ` Rasmus Villemoes
2015-02-11 23:05       ` Yury
2015-02-12  8:15         ` George Spelvin
2015-02-12  9:58           ` Rasmus Villemoes
2015-02-12 23:46             ` George Spelvin
2015-02-13 10:13               ` Rasmus Villemoes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150213210903.4716.qmail@ns.horizon.com \
    --to=linux@horizon.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=davem@davemloft.net \
    --cc=dborkman@redhat.com \
    --cc=hannes@stressinduktion.org \
    --cc=klimov.linux@gmail.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=msalter@redhat.com \
    --cc=takahiro.akashi@linaro.org \
    --cc=tgraf@suug.ch \
    --cc=valentinrothberg@gmail.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox