public inbox for u-boot@lists.denx.de
* [U-Boot] [PATCH RFC] zlib: Optimize decompression
@ 2009-11-19 12:22 Joakim Tjernlund
  2009-11-19 14:07 ` Peter Korsgaard
  2009-12-05  0:32 ` Wolfgang Denk
  0 siblings, 2 replies; 5+ messages in thread
From: Joakim Tjernlund @ 2009-11-19 12:22 UTC
  To: u-boot

This patch optimizes the direct copy procedure.
Uses get_unaligned() but only in one place.
The copy loop just above this one can also use this
optimization, but I haven't done so as I have not tested if it
is a win there too.
On my MPC8321 this is about 17% faster on my JFFS2 root FS
than the original. No speed test has been performed in u-boot.

Size increase on ppc: 484 bytes

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---

 I have the same optimization queued for linux. Figured it
 would be useful for u-boot too.

 Testing and feedback welcome.
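
 For anyone unfamiliar with get_unaligned(): its job is to read a
 multi-byte value from an address that may not be naturally aligned.
 Below is a minimal user-space sketch of that idea, assuming a
 memcpy()-based load. load16_any() and copy_pairs() are hypothetical
 names used only for illustration; they are not the U-Boot or Linux
 helpers, and the patch itself uses zlib's pre-increment PUP() style
 rather than the post-increment pointers shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for get_unaligned() on a 16-bit value:
 * memcpy() lets the compiler emit a load that is safe at any address. */
static uint16_t load16_any(const unsigned char *p)
{
    uint16_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Copy npairs 16-bit units from src (any alignment) to dst (aligned).
 * Like the do/while loop in the patch, this needs at least one pair. */
static void copy_pairs(uint16_t *dst, const unsigned char *src, size_t npairs)
{
    assert(npairs > 0);
    do {
        *dst++ = load16_any(src);
        src += 2;
    } while (--npairs);
}
```

 Only the source side needs the unaligned load here, because the
 destination has already been aligned before the loop starts.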

 lib_generic/zlib.c |   56 ++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/lib_generic/zlib.c b/lib_generic/zlib.c
index 8fe3bd0..5721968 100644
--- a/lib_generic/zlib.c
+++ b/lib_generic/zlib.c
@@ -26,8 +26,10 @@
 #define ZUTIL_H
 #define ZLIB_INTERNAL
 
-#include "u-boot/zlib.h"
 #include <common.h>
+#include <compiler.h>
+#include <asm/unaligned.h>
+#include "u-boot/zlib.h"
 /* To avoid a build time warning */
 #ifdef STDC
 #include <malloc.h>
@@ -400,6 +402,7 @@ void inflate_fast OF((z_streamp strm, unsigned start));
  */
 #define OFF 1
 #define PUP(a) *++(a)
+#define UP_UNALIGNED(a) get_unaligned(++(a))
 
 /*
    Decode literal, length, and distance codes and write out the resulting
@@ -616,18 +619,47 @@ unsigned start;         /* inflate()'s starting value for strm->avail_out */
                     }
                 }
                 else {
+		    unsigned short *sout;
+		    unsigned long loops;
+
                     from = out - dist;          /* copy direct from output */
-                    do {                        /* minimum length is three */
-                        PUP(out) = PUP(from);
-                        PUP(out) = PUP(from);
-                        PUP(out) = PUP(from);
-                        len -= 3;
-                    } while (len > 2);
-                    if (len) {
-                        PUP(out) = PUP(from);
-                        if (len > 1)
-                            PUP(out) = PUP(from);
-                    }
+                    /* minimum length is three */
+		    /* Align out addr */
+		    if (!((long)(out - 1 + OFF) & 1)) {
+			PUP(out) = PUP(from);
+			len--;
+		    }
+		    sout = (unsigned short *)(out - OFF);
+		    if (dist > 2 ) {
+			unsigned short *sfrom;
+
+			sfrom = (unsigned short *)(from - OFF);
+			loops = len >> 1;
+			do
+			    PUP(sout) = UP_UNALIGNED(sfrom);
+			while (--loops);
+			out = (unsigned char *)sout + OFF;
+			from = (unsigned char *)sfrom + OFF;
+		    } else { /* dist == 1 or dist == 2 */
+			unsigned short pat16;
+
+			pat16 = *(sout-2+2*OFF);
+			if (dist == 1)
+#if defined(__BIG_ENDIAN)
+			    pat16 = (pat16 & 0xff) | ((pat16 & 0xff ) << 8);
+#elif defined(__LITTLE_ENDIAN)
+			    pat16 = (pat16 & 0xff00) | ((pat16 & 0xff00 ) >> 8);
+#else
+#error Neither __BIG_ENDIAN nor __LITTLE_ENDIAN is defined
+#endif
+			loops = len >> 1;
+			do
+			    PUP(sout) = pat16;
+			while (--loops);
+			out = (unsigned char *)sout + OFF;
+		    }
+		    if (len & 1)
+			PUP(out) = PUP(from);
                 }
             }
             else if ((op & 64) == 0) {          /* 2nd level distance code */
-- 
1.6.4.4
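
The copy strategy in the hunk above can be tried outside of zlib. The
following is a rough sketch under stated assumptions, not the patch
itself: it uses post-increment pointers instead of PUP(), memcpy() in
place of get_unaligned(), and a two-byte array for the dist == 1 and
dist == 2 pattern so that no endianness-specific masking is needed.
match_copy_bytes() and match_copy_pairs() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: byte-by-byte copy of an LZ77 match. When dist < len the
 * regions overlap on purpose, which repeats the last dist bytes. */
static void match_copy_bytes(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from = out - dist;

    while (len--)
        *out++ = *from++;
}

/* Sketch of the 16-bit variant; out points at the next byte to write. */
static void match_copy_pairs(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from = out - dist;
    size_t loops;

    assert(dist >= 1 && len >= 3);      /* minimum match length is three */
    if ((uintptr_t)out & 1) {           /* align the store address */
        *out++ = *from++;
        len--;
    }
    loops = len >> 1;                   /* at least 1, since len >= 2 here */
    if (dist > 2) {
        do {                            /* source may be unaligned */
            uint16_t v;

            memcpy(&v, from, sizeof(v));
            memcpy(out, &v, sizeof(v));
            from += 2;
            out += 2;
        } while (--loops);
    } else {                            /* dist == 1 or dist == 2 */
        unsigned char pat[2];

        pat[0] = *(out - dist);         /* equals out[-1] when dist == 1 */
        pat[1] = out[-1];
        do {
            memcpy(out, pat, sizeof(pat));
            out += 2;
        } while (--loops);
    }
    if (len & 1)                        /* trailing odd byte */
        *out = *(out - dist);
}
```

Comparing the two functions over small dist and len values, at both
even and odd output offsets, is a cheap way to check the overlap
handling before running a real decompression test.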


* [U-Boot] [PATCH RFC] zlib: Optimize decompression
  2009-11-19 12:22 [U-Boot] [PATCH RFC] zlib: Optimize decompression Joakim Tjernlund
@ 2009-11-19 14:07 ` Peter Korsgaard
  2009-11-19 14:32   ` Joakim Tjernlund
  2009-12-05  0:32 ` Wolfgang Denk
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Korsgaard @ 2009-11-19 14:07 UTC
  To: u-boot

>>>>> "Joakim" == Joakim Tjernlund <Joakim.Tjernlund@transmode.se> writes:

 Joakim> This patch optimizes the direct copy procedure.
 Joakim> Uses get_unaligned() but only in one place.
 Joakim> The copy loop just above this one can also use this
 Joakim> optimization, but I haven't done so as I have not tested if it
 Joakim> is a win there too.
 Joakim> On my MPC8321 this is about 17% faster on my JFFS2 root FS
 Joakim> than the original. No speed test has been performed in u-boot.

On an MPC8347 board it's ~12% faster at decompressing the uImage (165ms).

 Joakim> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>

Acked-by: Peter Korsgaard <jacmet@sunsite.dk>

-- 
Bye, Peter Korsgaard


* [U-Boot] [PATCH RFC] zlib: Optimize decompression
  2009-11-19 14:07 ` Peter Korsgaard
@ 2009-11-19 14:32   ` Joakim Tjernlund
  2009-11-19 14:51     ` Peter Korsgaard
  0 siblings, 1 reply; 5+ messages in thread
From: Joakim Tjernlund @ 2009-11-19 14:32 UTC
  To: u-boot

Peter Korsgaard <jacmet@gmail.com> wrote on 19/11/2009 15:07:12:
>
> >>>>> "Joakim" == Joakim Tjernlund <Joakim.Tjernlund@transmode.se> writes:
>
>  Joakim> This patch optimizes the direct copy procedure.
>  Joakim> Uses get_unaligned() but only in one place.
>  Joakim> The copy loop just above this one can also use this
>  Joakim> optimization, but I haven't done so as I have not tested if it
>  Joakim> is a win there too.
>  Joakim> On my MPC8321 this is about 17% faster on my JFFS2 root FS
>  Joakim> than the original. No speed test has been performed in u-boot.
>
> On an MPC8347 board it's ~12% faster at decompressing the uImage (165ms).
>
>  Joakim> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
>
> Acked-by: Peter Korsgaard <jacmet@sunsite.dk>

Thanks, question: How does this compare with your lzo uncompress?

 Jocke


* [U-Boot] [PATCH RFC] zlib: Optimize decompression
  2009-11-19 14:32   ` Joakim Tjernlund
@ 2009-11-19 14:51     ` Peter Korsgaard
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Korsgaard @ 2009-11-19 14:51 UTC
  To: u-boot

>>>>> "Joakim" == Joakim Tjernlund <joakim.tjernlund@transmode.se> writes:

Hi,

 >> On an MPC8347 board it's ~12% faster at decompressing the uImage (165ms).
 >> 
 Joakim> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
 >> 
 >> Acked-by: Peter Korsgaard <jacmet@sunsite.dk>

 Joakim> Thanks, question: How does this compare with your lzo uncompress?

LZO is still slightly faster, but only by about 5% (60ms). This is at
400MHz with very slow flash. With a slower CPU/faster flash the difference
would probably be bigger (it certainly was when I compared with zlib
before your optimization).

I can rerun that test, but first I need to figure out why 2009.11-rc1 is
more than 1 second slower than 2009.08.

-- 
Bye, Peter Korsgaard


* [U-Boot] [PATCH RFC] zlib: Optimize decompression
  2009-11-19 12:22 [U-Boot] [PATCH RFC] zlib: Optimize decompression Joakim Tjernlund
  2009-11-19 14:07 ` Peter Korsgaard
@ 2009-12-05  0:32 ` Wolfgang Denk
  1 sibling, 0 replies; 5+ messages in thread
From: Wolfgang Denk @ 2009-12-05  0:32 UTC
  To: u-boot

Dear Joakim Tjernlund,

In message <1258633364-20805-1-git-send-email-Joakim.Tjernlund@transmode.se> you wrote:
> This patch optimizes the direct copy procedure.
> Uses get_unaligned() but only in one place.
> The copy loop just above this one can also use this
> optimization, but I haven't done so as I have not tested if it
> is a win there too.
> On my MPC8321 this is about 17% faster on my JFFS2 root FS
> than the original. No speed test has been performed in u-boot.
> 
> Size increase on ppc: 484 bytes
> 
> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> ---
> 
>  I have the same optimization queued for linux. Figured it
>  would be useful for u-boot too.
> 
>  Testing and feedback welcome.
> 
>  lib_generic/zlib.c |   56 ++++++++++++++++++++++++++++++++++++++++-----------
>  1 files changed, 44 insertions(+), 12 deletions(-)

Applied to "next", thanks.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
I don't mind criticism. You know me. I've  never  been  one  to  take
offence  at  criticism. No one could say I'm the sort to take offence
at criticism -- Not twice, anyway. Not without blowing bubbles.
                                  - Terry Pratchett, _Witches Abroad_
