Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, linux@rasmusvillemoes.dk
Cc: akpm@linux-foundation.org, chris@chris-wilson.co.uk,
	davem@davemloft.net, dborkman@redhat.com,
	hannes@stressinduktion.org, klimov.linux@gmail.com,
	laijs@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	msalter@redhat.com, takahiro.akashi@linaro.org, tgraf@suug.ch,
	valentinrothberg@gmail.com, yury.norov@gmail.com
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
Date: 12 Feb 2015 18:46:03 -0500	[thread overview]
Message-ID: <20150212234603.30908.qmail@ns.horizon.com> (raw)
In-Reply-To: <87zj8jzc11.fsf@rasmusvillemoes.dk>

> That, and if the compiler was smart enough, the AND should actually be
> free (at least on x86), since it can replace a test instruction. [1]

Ah, right, x86 loads don't set the flags, so you need a TEST
instruction anyway.  So the AND is free!

Of course, not all the world's an x86.  But load/store architectures
generally need a test instruction as well.  It's things like MIPS and
Aloha, that don't have condition codes, but only conditional branches
on register values, that will need an extra cycle.

> In any case, my code compiles to fewer bytes (though not an entire cache
> line), and I think it is somewhat clearer - I'm obviously biased on the
> latter point.

Myself, I think the code that shows that only the first word gets masked
by control flow is clearer, but since the complexity is completely
localized to this function, I'll take smaller and faster.

> [1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
> compile my code with gcc 5.0, I get
> 
> 0000000000000000 <find_last_bit_rv>:
>    0:   53                      push   %rbx
>    1:   89 f1                   mov    %esi,%ecx
>    3:   48 8d 5e 3f             lea    0x3f(%rsi),%rbx
>    7:   f7 d9                   neg    %ecx
>    9:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
>   10:   48 83 e3 c0             and    $0xffffffffffffffc0,%rbx
>   14:   48 d3 ea                shr    %cl,%rdx
>   17:   eb 1a                   jmp    33 <find_last_bit_rv+0x33>
>   19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
> 
>   20:   48 89 d1                mov    %rdx,%rcx
>   23:   48 23 0c df             and    (%rdi,%rbx,8),%rcx
>   27:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
>   2e:   48 85 c9                test   %rcx,%rcx
>   31:   75 15                   jne    48 <find_last_bit_rv+0x48>
>   33:   48 83 eb 01             sub    $0x1,%rbx
>   37:   48 83 fb ff             cmp    $0xffffffffffffffff,%rbx
>   3b:   75 e3                   jne    20 <find_last_bit_rv+0x20>
> 
>   3d:   48 89 f0                mov    %rsi,%rax
>   40:   5b                      pop    %rbx
>   41:   c3                      retq   
>   42:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   48:   48 89 cf                mov    %rcx,%rdi
>   4b:   48 c1 e3 06             shl    $0x6,%rbx
>   4f:   e8 00 00 00 00          callq  54 <find_last_bit_rv+0x54>
>   54:   48 98                   cltq   
>   56:   48 01 d8                add    %rbx,%rax
>   59:   5b                      pop    %rbx
>   5a:   c3                      retq   
>   5b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

> the main loop is 20--3b. The test instruction at 2e seems to be
> redundant. The same at 37: the sub instruction already sets plenty of
> flags that could be used, so explicitly comparing %rbx to -1 seems
> redundant.

Er... I think you hand-edited that code; it's wrong.  The loop assumes that
%rbx is in units of words, but the prologue sets it up in units of bits.

The mov to %rcx is also redundant, since it could be eliminated with some
minor rescheduling.

The code generation I *want* for that function is:

# addr in %rdi, size in %rsi
	movl	%esi, %ecx
	leaq	0x3f(%rsi), %rax
	negl	%ecx
	movq	$-1, %rdx
        shrq	$6, %rax
	shrq	%cl, %rdx
	jmp	2f
1:
	movq	$-1, %rdx
2:
	subq	$1, %rax
	jc	3f
	andq	(%rdi,%rax,8), %rdx
	jeq	1b

	bsrq	%rdx, %rdx
        salq    $6, %rax
	addq	%rdx, %rax
        ret
3:
	movq	%rsi, %rax
	retq

I wonder if the compiler can be convinced by a bit of source tweaking?

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
	unsigned long val = LAST_WORD_MASK(size);

	while (idx--) {
		val &= addr[idx];
		if (val)
			return idx * BITS_PER_LONG + __fls(val);
		val = ~0ul;
	}
	return size;
}

Darn, no difference...

next prev parent reply	other threads:[~2015-02-12 23:46 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
2015-02-08 18:48   ` George Spelvin
2015-02-09  8:32   ` George Spelvin
2015-02-09 11:53     ` Rasmus Villemoes
2015-02-09 16:45       ` George Spelvin
2015-02-11 22:14         ` Rasmus Villemoes
2015-02-11 23:05       ` Yury
2015-02-12  8:15         ` George Spelvin
2015-02-12  9:58           ` Rasmus Villemoes
2015-02-12 23:46             ` George Spelvin [this message]
2015-02-13 10:13               ` Rasmus Villemoes
2015-02-08 14:10 ` [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-08 14:10 ` [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
2015-02-18 17:57     ` Rasmus Villemoes
2015-02-17  2:35   ` [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-17  2:35   ` [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
2015-02-23 21:50     ` Rasmus Villemoes
2015-02-24  0:29       ` George Spelvin
2015-02-22 17:24   ` [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-22 17:24   ` [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
2015-03-08 18:17     ` Yury Norov
     [not found] <CAAH8bW-mk0kk-GKDNny6hsjrbcjwdcAacsF_DEaXmNG==hXhRw@mail.gmail.com>
2015-02-13 21:09 ` [PATCH v3 1/3] " George Spelvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150212234603.30908.qmail@ns.horizon.com \
    --to=linux@horizon.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=davem@davemloft.net \
    --cc=dborkman@redhat.com \
    --cc=hannes@stressinduktion.org \
    --cc=klimov.linux@gmail.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=msalter@redhat.com \
    --cc=takahiro.akashi@linaro.org \
    --cc=tgraf@suug.ch \
    --cc=valentinrothberg@gmail.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox