public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] memchr (trivial) optimization
@ 2007-08-22  9:34 lode leroy
  2007-08-22 10:56 ` Andi Kleen
  2007-08-23  0:13 ` Ingo Oeser
  0 siblings, 2 replies; 8+ messages in thread
From: lode leroy @ 2007-08-22  9:34 UTC (permalink / raw)
  To: linux-kernel

While profiling something completely unrelated, I noticed
that on the workloads I used memchr for, I saw a 30%-40% improvement
in performance, with the following trivial changes...
(basically, it saves 3 operations for each call)

$ diff -Nurp linux/lib/string.c{-old,}
--- linux/lib/string.c-old      2007-08-22 11:18:54.000000000 +0200
+++ linux/lib/string.c  2007-08-22 11:19:20.000000000 +0200
@@ -623,10 +623,12 @@ EXPORT_SYMBOL(strstr);
void *memchr(const void *s, int c, size_t n)
{
        const unsigned char *p = s;
-       while (n-- != 0) {
-               if ((unsigned char)c == *p++) {
-                       return (void *)(p - 1);
+       while (n != 0) {
+               if ((unsigned char)c == *p) {
+                       return (void *)p;
                }
+               n--;
+               p++;
        }
        return NULL;
}

_________________________________________________________________
A lot of passions?  Collect all your personal info on one central location , 
for free! http://get.live.com/live/features


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] memchr (trivial) optimization
  2007-08-22  9:34 [PATCH] memchr (trivial) optimization lode leroy
@ 2007-08-22 10:56 ` Andi Kleen
  2007-08-23  0:13 ` Ingo Oeser
  1 sibling, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2007-08-22 10:56 UTC (permalink / raw)
  To: lode leroy; +Cc: linux-kernel

"lode leroy" <lode_leroy@hotmail.com> writes:

> While profiling something completely unrelated, I noticed
> that on the workloads I used memchr for, I saw a 30%-40% improvement
> in performance, with the following trivial changes...
> (basically, it saves 3 operations for each call)

What kind of workload? I didn't think anything in tree
was memchr intensive.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] memchr (trivial) optimization
  2007-08-22  9:34 [PATCH] memchr (trivial) optimization lode leroy
  2007-08-22 10:56 ` Andi Kleen
@ 2007-08-23  0:13 ` Ingo Oeser
  2007-08-24  0:13   ` Matt Mackall
  1 sibling, 1 reply; 8+ messages in thread
From: Ingo Oeser @ 2007-08-23  0:13 UTC (permalink / raw)
  To: lode leroy; +Cc: linux-kernel

On Wednesday 22 August 2007, lode leroy wrote:
> While profiling something completely unrelated, I noticed
> that on the workloads I used memchr for, I saw a 30%-40% improvement
> in performance, with the following trivial changes...
> (basically, it saves 3 operations for each call)

Yes, but then you could be a bit more explicit to the compiler
on what you are doing here:
 
void *memchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;

	for (; n != 0; n--, p++) {
               if ((unsigned char)c == *p) {
                       return (void *)p;
	}
	return NULL;
}

Now the compiler should see the loop more clearly.


Best Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] memchr (trivial) optimization
  2007-08-23  0:13 ` Ingo Oeser
@ 2007-08-24  0:13   ` Matt Mackall
  2007-08-24  1:03     ` Jeremy Fitzhardinge
  2007-08-24 12:54     ` Jan Engelhardt
  0 siblings, 2 replies; 8+ messages in thread
From: Matt Mackall @ 2007-08-24  0:13 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: lode leroy, linux-kernel

On Thu, Aug 23, 2007 at 02:13:20AM +0200, Ingo Oeser wrote:
> On Wednesday 22 August 2007, lode leroy wrote:
> > While profiling something completely unrelated, I noticed
> > that on the workloads I used memchr for, I saw a 30%-40% improvement
> > in performance, with the following trivial changes...
> > (basically, it saves 3 operations for each call)
> 
> Yes, but then you could be a bit more explicit to the compiler
> on what you are doing here:
>  
> void *memchr(const void *s, int c, size_t n)
> {
> 	const unsigned char *p = s;
> 
> 	for (; n != 0; n--, p++) {
>                if ((unsigned char)c == *p) {
>                        return (void *)p;
> 	}
> 	return NULL;
> }
> 
> Now the compiler should see the loop more clearly.

And you can do even better with this:

void *memchr(const void *s, int c, size_t n)
{
       const unsigned char *p = s, *e = s + n;
       const unsigned char *e = p + n;

       for (; p < e ; p++)
                if ((unsigned char)c == *p)
                        return (void *)p;

       return NULL;
}

which changes the inner loop from:

  50:   38 08                   cmp    %cl,(%eax)
  52:   74 08                   je     5c <memchr2+0x1a>
  54:   4a                      dec    %edx
  55:   40                      inc    %eax
  56:   85 d2                   test   %edx,%edx
  58:   75 f6                   jne    50 <memchr2+0xe>

to:

  6e:   38 08                   cmp    %cl,(%eax)
  70:   74 07                   je     79 <memchr3+0x1b>
  72:   40                      inc    %eax
  73:   39 d0                   cmp    %edx,%eax
  75:   72 f7                   jb     6e <memchr3+0x10>


-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] memchr (trivial) optimization
  2007-08-24  0:13   ` Matt Mackall
@ 2007-08-24  1:03     ` Jeremy Fitzhardinge
  2007-08-24  2:19       ` Matt Mackall
  2007-08-24 12:54     ` Jan Engelhardt
  1 sibling, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2007-08-24  1:03 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Oeser, lode leroy, linux-kernel

Matt Mackall wrote:
>   6e:   38 08                   cmp    %cl,(%eax)
>   70:   74 07                   je     79 <memchr3+0x1b>
>   72:   40                      inc    %eax
>   
It's a bit gross that the compiler is using inc here rather than lea or
add, but still...

Er, something's spending 30% of its time in memchr?  This is not the
code to fix.

    J

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] memchr (trivial) optimization
  2007-08-24  1:03     ` Jeremy Fitzhardinge
@ 2007-08-24  2:19       ` Matt Mackall
  0 siblings, 0 replies; 8+ messages in thread
From: Matt Mackall @ 2007-08-24  2:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Ingo Oeser, lode leroy, linux-kernel

On Thu, Aug 23, 2007 at 06:03:29PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> >   6e:   38 08                   cmp    %cl,(%eax)
> >   70:   74 07                   je     79 <memchr3+0x1b>
> >   72:   40                      inc    %eax
> >   
> It's a bit gross that the compiler is using inc here rather than lea or
> add, but still...
> 
> Er, something's spending 30% of its time in memchr?  This is not the
> code to fix.

Indeed. I'm just pointing out the general optimization of walking
one counter instead of two.

Hmm, perhaps the culprit is validate_nla?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] memchr (trivial) optimization
  2007-08-24  0:13   ` Matt Mackall
  2007-08-24  1:03     ` Jeremy Fitzhardinge
@ 2007-08-24 12:54     ` Jan Engelhardt
  2007-08-24 15:57       ` Matt Mackall
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Engelhardt @ 2007-08-24 12:54 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Ingo Oeser, lode leroy, linux-kernel


On Aug 23 2007 19:13, Matt Mackall wrote:
>
>And you can do even better with this:
>
>void *memchr(const void *s, int c, size_t n)
>{
>       const unsigned char *p = s, *e = s + n;
>       const unsigned char *e = p + n;

Uhm, you have two "e"s in there.

>       for (; p < e ; p++)
>                if ((unsigned char)c == *p)
>                        return (void *)p;
>
>       return NULL;
>}

Or do it glibc-style

void *memchr(const void *s, unsigned char c, size_t n)
{
	...
	for (; p + 3 < e; p += 4) {
		if (c == p[0])
			return (void *)&p[0];
		if (c == p[1])
			return (void *)&p[1];
		if (c == p[2])
			return (void *)&p[2];
		if (c == p[3])
			return (void *)&p[3];
	}
	... /* check the rest */
}


	Jan
-- 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] memchr (trivial) optimization
  2007-08-24 12:54     ` Jan Engelhardt
@ 2007-08-24 15:57       ` Matt Mackall
  0 siblings, 0 replies; 8+ messages in thread
From: Matt Mackall @ 2007-08-24 15:57 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Ingo Oeser, lode leroy, linux-kernel

On Fri, Aug 24, 2007 at 02:54:48PM +0200, Jan Engelhardt wrote:
> 
> On Aug 23 2007 19:13, Matt Mackall wrote:
> >
> >And you can do even better with this:
> >
> >void *memchr(const void *s, int c, size_t n)
> >{
> >       const unsigned char *p = s, *e = s + n;
> >       const unsigned char *e = p + n;
> 
> Uhm, you have two "e"s in there.

Yep, that's what I get for editing in email.

> Or do it glibc-style
> 
> void *memchr(const void *s, unsigned char c, size_t n)
> {
> 	...
> 	for (; p + 3 < e; p += 4) {
> 		if (c == p[0])
> 			return (void *)&p[0];
> 		if (c == p[1])
> 			return (void *)&p[1];
> 		if (c == p[2])
> 			return (void *)&p[2];
> 		if (c == p[3])
> 			return (void *)&p[3];
> 	}
> 	... /* check the rest */
> }

Yes, very funny.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-08-24 15:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-22  9:34 [PATCH] memchr (trivial) optimization lode leroy
2007-08-22 10:56 ` Andi Kleen
2007-08-23  0:13 ` Ingo Oeser
2007-08-24  0:13   ` Matt Mackall
2007-08-24  1:03     ` Jeremy Fitzhardinge
2007-08-24  2:19       ` Matt Mackall
2007-08-24 12:54     ` Jan Engelhardt
2007-08-24 15:57       ` Matt Mackall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox