public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* optimalisation for strlcpy (lib/string.c)
@ 2006-12-10 21:23 Folkert van Heusden
  2006-12-10 21:49 ` Eyal Lebedinsky
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Folkert van Heusden @ 2006-12-10 21:23 UTC (permalink / raw)
  To: linux-kernel

Hi,

Like the other patch (by that other person), I think it is faster to not
do a strlen first.
E.g. replace this:
ize_t strlcpy(char *dest, const char *src, size_t size)
{
        size_t ret = strlen(src);

        if (size) {
                size_t len = (ret >= size) ? size - 1 : ret;
                memcpy(dest, src, len);
                dest[len] = '\0';
        }
        return ret;
}
by this:
size_t strlcpy(char *dest, const char *src, size_t size)
{
        char *tmp = dest;

        for(;;)
        {
                *dest = *src;
                if (!*src)
                        break;

                if (--size == 0)
                        break;

                dest++;
                src++;
        }

        *dest = 0x00;

        return dest - tmp;
}
patch:
diff -uNrBbd lib/string.c string-new.c
--- lib/string.c        2006-11-04 02:33:58.000000000 +0100
+++ string-new.c        2006-12-10 22:22:08.000000000 +0100
@@ -121,14 +121,24 @@
  */
 size_t strlcpy(char *dest, const char *src, size_t size)
 {
-       size_t ret = strlen(src);
+        char *tmp = dest;

-       if (size) {
-               size_t len = (ret >= size) ? size - 1 : ret;
-               memcpy(dest, src, len);
-               dest[len] = '\0';
+        for(;;)
+        {
+                *dest = *src;
+                if (!*src)
+                        break;
+
+                if (--size == 0)
+                        break;
+
+                dest++;
+                src++;
        }
-       return ret;
+
+        *dest = 0x00;
+
+        return dest - tmp;
 }
 EXPORT_SYMBOL(strlcpy);
 #endif


I've tested the speed difference with this:
http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
and the speed difference is quite a bit on a P4: 28% faster.


Signed-off by: Folkert van Heusden <folkert@vanheusden.com>


Folkert van Heusden

-- 
www.vanheusden.com/multitail - multitail is tail on steroids. multiple
               windows, filtering, coloring, anything you can think of
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: optimalisation for strlcpy (lib/string.c)
  2006-12-10 21:55 ` Mitchell Blank Jr
@ 2006-12-10 21:41   ` Folkert van Heusden
  0 siblings, 0 replies; 7+ messages in thread
From: Folkert van Heusden @ 2006-12-10 21:41 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: linux-kernel

> > Like the other patch (by that other person), I think it is faster to not
> > do a strlen first.
> Debatable.  You've replaced a call to the highly-optimized memcpy function
> with a byte-by-byte copy.  It's probably a wash performance wise (have you
> benchmarked?) and is certainly less clear.

I tested it outside the kernel.
The test source can be found here:
http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c


Folkert van Heusden

-- 
Feeling generous? -> http://www.vanheusden.com/wishlist.php
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: optimalisation for strlcpy (lib/string.c)
  2006-12-10 21:23 optimalisation for strlcpy (lib/string.c) Folkert van Heusden
@ 2006-12-10 21:49 ` Eyal Lebedinsky
  2006-12-10 22:03   ` Folkert van Heusden
  2006-12-10 21:55 ` Mitchell Blank Jr
  2006-12-10 23:13 ` Adrian Bunk
  2 siblings, 1 reply; 7+ messages in thread
From: Eyal Lebedinsky @ 2006-12-10 21:49 UTC (permalink / raw)
  To: Folkert van Heusden; +Cc: linux-kernel

Folkert van Heusden wrote:
> Hi,
> 
> Like the other patch (by that other person), I think it is faster to not
> do a strlen first.
> E.g. replace this:
> ize_t strlcpy(char *dest, const char *src, size_t size)
> {
>         size_t ret = strlen(src);
> 
>         if (size) {
>                 size_t len = (ret >= size) ? size - 1 : ret;
>                 memcpy(dest, src, len);
>                 dest[len] = '\0';
>         }
>         return ret;
> }
> by this:
> size_t strlcpy(char *dest, const char *src, size_t size)
> {
>         char *tmp = dest;
> 
>         for(;;)
>         {
>                 *dest = *src;
>                 if (!*src)
>                         break;
> 
>                 if (--size == 0)
>                         break;
> 
>                 dest++;
>                 src++;
>         }
> 
>         *dest = 0x00;
> 
>         return dest - tmp;
> }
> patch:
> diff -uNrBbd lib/string.c string-new.c
> --- lib/string.c        2006-11-04 02:33:58.000000000 +0100
> +++ string-new.c        2006-12-10 22:22:08.000000000 +0100
> @@ -121,14 +121,24 @@
>   */
>  size_t strlcpy(char *dest, const char *src, size_t size)
>  {
> -       size_t ret = strlen(src);
> +        char *tmp = dest;
> 
> -       if (size) {
> -               size_t len = (ret >= size) ? size - 1 : ret;
> -               memcpy(dest, src, len);
> -               dest[len] = '\0';
> +        for(;;)
> +        {
> +                *dest = *src;
> +                if (!*src)
> +                        break;
> +
> +                if (--size == 0)
> +                        break;
> +
> +                dest++;
> +                src++;
>         }
> -       return ret;
> +
> +        *dest = 0x00;
> +
> +        return dest - tmp;
>  }
>  EXPORT_SYMBOL(strlcpy);
>  #endif
> 
> 
> I've tested the speed difference with this:
> http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
> and the speed difference is quite a bit on a P4: 28% faster.
> 
> 
> Signed-off by: Folkert van Heusden <folkert@vanheusden.com>
> 
> 
> Folkert van Heusden

The two do not do exactly the same. The first one handles 'size == 0'
(should not happen?) safely while the other does not.

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
	attach .zip as .dat

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: optimalisation for strlcpy (lib/string.c)
  2006-12-10 21:23 optimalisation for strlcpy (lib/string.c) Folkert van Heusden
  2006-12-10 21:49 ` Eyal Lebedinsky
@ 2006-12-10 21:55 ` Mitchell Blank Jr
  2006-12-10 21:41   ` Folkert van Heusden
  2006-12-10 23:13 ` Adrian Bunk
  2 siblings, 1 reply; 7+ messages in thread
From: Mitchell Blank Jr @ 2006-12-10 21:55 UTC (permalink / raw)
  To: Folkert van Heusden; +Cc: linux-kernel

Folkert van Heusden wrote:
> Like the other patch (by that other person), I think it is faster to not
> do a strlen first.

Debatable.  You've replaced a call to the highly-optimized memcpy function
with a byte-by-byte copy.  It's probably a wash performance wise (have you
benchmarked?) and is certainly less clear.

-Mitch

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: optimalisation for strlcpy (lib/string.c)
  2006-12-10 21:49 ` Eyal Lebedinsky
@ 2006-12-10 22:03   ` Folkert van Heusden
  0 siblings, 0 replies; 7+ messages in thread
From: Folkert van Heusden @ 2006-12-10 22:03 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: linux-kernel

> > by this:
> > size_t strlcpy(char *dest, const char *src, size_t size)
> > {
> >         char *tmp = dest;
> > 
> >         for(;;)
> >         {
> >                 *dest = *src;
> >                 if (!*src)
> >                         break;
> > 
> >                 if (--size == 0)
> >                         break;
> > 
> >                 dest++;
> >                 src++;
> >         }
> > 
> >         *dest = 0x00;
> > 
> >         return dest - tmp;
> > }
> > 
> > I've tested the speed difference with this:
> > http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
> > and the speed difference is quite a bit on a P4: 28% faster.
> 
> The two do not do exactly the same. The first one handles 'size == 0'
> (should not happen?) safely while the other does not.

Ok, small change:
diff -uNrBbd lib/string.c string-new.c
--- lib/string.c        2006-11-04 02:33:58.000000000 +0100
+++ string-new.c        2006-12-10 23:02:26.000000000 +0100
@@ -121,14 +122,27 @@
  */
 size_t strlcpy(char *dest, const char *src, size_t size)
 {
-       size_t ret = strlen(src);
+        char *tmp = dest;

-       if (size) {
-               size_t len = (ret >= size) ? size - 1 : ret;
-               memcpy(dest, src, len);
-               dest[len] = '\0';
+       if (likeley(size > 0))
+       {
+               for(;;)
+               {
+                       *dest = *src;
+                       if (unlikely(!*src))
+                               break;
+
+                       if (unlikely(--size == 0))
+                               break;
+
+                       dest++;
+                       src++;
        }
-       return ret;
+       }
+
+       *dest = 0x00;
+
+       return dest - tmp;
 }
 EXPORT_SYMBOL(strlcpy);
 #endif


Folkert van Heusden

-- 
Ever wonder what is out there? Any alien races? Then please support
the seti@home project: setiathome.ssl.berkeley.edu
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: optimalisation for strlcpy (lib/string.c)
  2006-12-10 21:23 optimalisation for strlcpy (lib/string.c) Folkert van Heusden
  2006-12-10 21:49 ` Eyal Lebedinsky
  2006-12-10 21:55 ` Mitchell Blank Jr
@ 2006-12-10 23:13 ` Adrian Bunk
  2006-12-11  5:26   ` Clemens Koller
  2 siblings, 1 reply; 7+ messages in thread
From: Adrian Bunk @ 2006-12-10 23:13 UTC (permalink / raw)
  To: Folkert van Heusden; +Cc: linux-kernel

On Sun, Dec 10, 2006 at 10:23:51PM +0100, Folkert van Heusden wrote:
> Hi,
> 
> Like the other patch (by that other person), I think it is faster to not
> do a strlen first.
>...
> --- lib/string.c        2006-11-04 02:33:58.000000000 +0100
> +++ string-new.c        2006-12-10 22:22:08.000000000 +0100
> @@ -121,14 +121,24 @@
>   */
>  size_t strlcpy(char *dest, const char *src, size_t size)
>  {
> -       size_t ret = strlen(src);
> +        char *tmp = dest;
> 
> -       if (size) {
> -               size_t len = (ret >= size) ? size - 1 : ret;
> -               memcpy(dest, src, len);
> -               dest[len] = '\0';
> +        for(;;)
> +        {
> +                *dest = *src;
> +                if (!*src)
> +                        break;
> +
> +                if (--size == 0)
> +                        break;
> +
> +                dest++;
> +                src++;
>         }
> -       return ret;
> +
> +        *dest = 0x00;
> +
> +        return dest - tmp;
>...

Two bugs in your code:
- you copy a maximum of size bytes _plus_ \0
- size == 0 is no longer handled correctly

> I've tested the speed difference with this:
> http://www.vanheusden.com/misc/kernel-strlcpy-opt-test.c
> and the speed difference is quite a bit on a P4: 28% faster.
>...

My Athlon says:
org: 2.400000
new: 6.710000

IOW, your version is much slower.

But the main question is actually:
Does the performance of this function matter anywhere inside the kernel?
Is strlcpy() used in any fast path?
If not, there's no point in trying to optimize it.

> Folkert van Heusden

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: optimalisation for strlcpy (lib/string.c)
  2006-12-10 23:13 ` Adrian Bunk
@ 2006-12-11  5:26   ` Clemens Koller
  0 siblings, 0 replies; 7+ messages in thread
From: Clemens Koller @ 2006-12-11  5:26 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Folkert van Heusden, linux-kernel

Hi, Adrian & Friends!

>> Like the other patch (by that other person), I think it is faster to not
>> do a strlen first.

There are PowerPC architectures, where a strlen is a matter of a single
instruction of the CPU.

Quote: "PowerPC 405 and 440 instruction sets offer the powerful
Determine Left-Most Zero Byte (DLMZB) instruction."

That's propably hard to beat.

So, if we really want to speed up these things, we should write
code which helps the compiler get that optimized instructions for
us - by improving the code of strlen() i.e.

Reference:
http://www-03.ibm.com/chips/power/powerpc/newsletter/sep2003/ppc_process_at_work2.html

Greets,

Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm-technology.com
Phone: +49-89-741518-50
Fax: +49-89-741518-19


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-12-11  5:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-10 21:23 optimalisation for strlcpy (lib/string.c) Folkert van Heusden
2006-12-10 21:49 ` Eyal Lebedinsky
2006-12-10 22:03   ` Folkert van Heusden
2006-12-10 21:55 ` Mitchell Blank Jr
2006-12-10 21:41   ` Folkert van Heusden
2006-12-10 23:13 ` Adrian Bunk
2006-12-11  5:26   ` Clemens Koller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox