All of lore.kernel.org
 help / color / mirror / Atom feed
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Gabriel Paubert <paubert@iram.es>
Cc: linuxppc-dev@ozlabs.org, paulus@samba.org,
	LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH] add strncmp to PowerPC
Date: Mon, 3 Mar 2008 20:08:59 +0100	[thread overview]
Message-ID: <ecb965a1e5feba94cd97fb4b9535908e@kernel.crashing.org> (raw)
In-Reply-To: <20080303095443.GB27105@iram.es>

>> Even if it was logically faster (which I still doubt) it's a hell of 
>> a lot
>> of cache lines to waste.

Yeah, 1 on 64-bit and 3 on 32-bit, that's a terrible lot.</sarcasm>

> Indeed, but there are some corner cases that the C code handles. Like
> a length of 0 which may lead to infinite loop in the asm code.
>
> OTOH, I'm a bit surprised by the extsb instructions in the compiler 
> generated
> code. We don't compile with -fsigned-char, do we? The clrldi
> instructions are also extremely stupid.

Those are both necessary to be equivalent to the C code, which uses
signed char explicitly.  It is generally considered a Good Thing(tm)
for the compiler to generate assembler code equivalent to the C code,
even if the C code is wrong.

> Now that I think a bit more about it, I believe that the C version is
> incorrect

It is.  It's a great entry for the IOCCC as well.

I just tested the following (can't guarantee it's correct, just a PoC):

int strncmp(const char *s1, const char *s2, unsigned long /*size_t*/ 
len)
{
         while (len--) {
                 unsigned char c1, c2;
                 c1 = *s1++;
                 c2 = *s2++;
                 int cmp = c1 - c2;
                 if (cmp)
                         return cmp;
                 if (c1 == 0 || c2 == 0)
                         break;
         }
         return 0;
}

which generates (with GCC-4.2.3)

strncmp:
         addi 5,5,1
         mtctr 5
.L2:
         bdz .L11
         lbz 0,0(3)
         addi 3,3,1
         lbz 9,0(4)
         addi 4,4,1
         cmpwi 7,0,0
         subf. 0,9,0
         cmpwi 6,9,0
         bne- 0,.L4
         beq- 7,.L4
         bne+ 6,.L2
.L4:
         mr 3,0
         blr
.L11:
         li 0,0
         mr 3,0
         blr

which isn't horrid, although it does some weirdish things obviously.

Current GCC-4.4.0 generates

strncmp:
         addi 5,5,1
         mr 10,3
         mtctr 5
         li 11,0
         bdz .L7
         .p2align 4,,15
.L4:
         lbzx 0,10,11
         lbzx 9,4,11
         addi 11,11,1
         subf. 3,9,0
         cmpwi 6,9,0
         cmpwi 7,0,0
         bnelr 0
         beqlr 7
         beqlr 6
         bdnz .L4
.L7:
         li 3,0
         blr

which is about as good as it can get (well, it didn't realise you
only need to test one of c1, c2 for zero.  Did I say this was just
proof-of-concept code?)


Segher

WARNING: multiple messages have this Message-ID (diff)
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Gabriel Paubert <paubert@iram.es>
Cc: paulus@samba.org, LKML <linux-kernel@vger.kernel.org>,
	linuxppc-dev@ozlabs.org, Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH] add strncmp to PowerPC
Date: Mon, 3 Mar 2008 20:08:59 +0100	[thread overview]
Message-ID: <ecb965a1e5feba94cd97fb4b9535908e@kernel.crashing.org> (raw)
In-Reply-To: <20080303095443.GB27105@iram.es>

>> Even if it was logically faster (which I still doubt) it's a hell of 
>> a lot
>> of cache lines to waste.

Yeah, 1 on 64-bit and 3 on 32-bit, that's a terrible lot.</sarcasm>

> Indeed, but there are some corner cases that the C code handles. Like
> a length of 0 which may lead to infinite loop in the asm code.
>
> OTOH, I'm a bit surprised by the extsb instructions in the compiler 
> generated
> code. We don't compile with -fsigned-char, do we? The clrldi
> instructions are also extremely stupid.

Those are both necessary to be equivalent to the C code, which uses
signed char explicitly.  It is generally considered a Good Thing(tm)
for the compiler to generate assembler code equivalent to the C code,
even if the C code is wrong.

> Now that I think a bit more about it, I believe that the C version is
> incorrect

It is.  It's a great entry for the IOCCC as well.

I just tested the following (can't guarantee it's correct, just a PoC):

int strncmp(const char *s1, const char *s2, unsigned long /*size_t*/ 
len)
{
         while (len--) {
                 unsigned char c1, c2;
                 c1 = *s1++;
                 c2 = *s2++;
                 int cmp = c1 - c2;
                 if (cmp)
                         return cmp;
                 if (c1 == 0 || c2 == 0)
                         break;
         }
         return 0;
}

which generates (with GCC-4.2.3)

strncmp:
         addi 5,5,1
         mtctr 5
.L2:
         bdz .L11
         lbz 0,0(3)
         addi 3,3,1
         lbz 9,0(4)
         addi 4,4,1
         cmpwi 7,0,0
         subf. 0,9,0
         cmpwi 6,9,0
         bne- 0,.L4
         beq- 7,.L4
         bne+ 6,.L2
.L4:
         mr 3,0
         blr
.L11:
         li 0,0
         mr 3,0
         blr

which isn't horrid, although it does some weirdish things obviously.

Current GCC-4.4.0 generates

strncmp:
         addi 5,5,1
         mr 10,3
         mtctr 5
         li 11,0
         bdz .L7
         .p2align 4,,15
.L4:
         lbzx 0,10,11
         lbzx 9,4,11
         addi 11,11,1
         subf. 3,9,0
         cmpwi 6,9,0
         cmpwi 7,0,0
         bnelr 0
         beqlr 7
         beqlr 6
         bdnz .L4
.L7:
         li 3,0
         blr

which is about as good as it can get (well, it didn't realise you
only need to test one of c1, c2 for zero.  Did I say this was just
proof-of-concept code?)


Segher


  parent reply	other threads:[~2008-03-03 19:09 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-29 16:04 [PATCH] add strncmp to PowerPC Steven Rostedt
2008-03-01  3:04 ` Benjamin Herrenschmidt
2008-03-01  3:04   ` Benjamin Herrenschmidt
2008-03-01  3:56   ` Steven Rostedt
2008-03-01  3:56     ` Steven Rostedt
2008-03-03  9:54     ` Gabriel Paubert
2008-03-03  9:54       ` Gabriel Paubert
2008-03-03 10:10       ` Andreas Schwab
2008-03-03 10:10         ` Andreas Schwab
2008-03-03 19:08       ` Segher Boessenkool [this message]
2008-03-03 19:08         ` Segher Boessenkool
2008-03-05  4:03   ` Paul Mackerras
2008-03-05  4:03     ` Paul Mackerras
2008-03-05  5:26     ` Segher Boessenkool
2008-03-05  5:26       ` Segher Boessenkool
2008-03-05  5:39       ` Paul Mackerras
2008-03-05  5:39         ` Paul Mackerras
2008-03-05  7:01         ` Segher Boessenkool
2008-03-05  7:01           ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ecb965a1e5feba94cd97fb4b9535908e@kernel.crashing.org \
    --to=segher@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=paubert@iram.es \
    --cc=paulus@samba.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.