* [U-Boot] [PATCH RFC] zlib: Optimize decompression
@ 2009-11-19 12:22 Joakim Tjernlund
2009-11-19 14:07 ` Peter Korsgaard
2009-12-05 0:32 ` Wolfgang Denk
0 siblings, 2 replies; 5+ messages in thread
From: Joakim Tjernlund @ 2009-11-19 12:22 UTC
To: u-boot
This patch optimizes the direct copy procedure by copying
16 bits at a time. It uses get_unaligned(), but only in one place.
The copy loop just above this one could use the same
optimization, but I haven't applied it there as I have not tested
whether it is a win there too.
On my MPC8321 this is about 17% faster on my JFFS2 root FS
than the original. No speed test has been performed in U-Boot.
Size increase on ppc: 484 bytes
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
I have the same optimization queued for Linux. Figured it
would be useful for U-Boot too.
Testing and feedback welcome.
lib_generic/zlib.c | 56 ++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/lib_generic/zlib.c b/lib_generic/zlib.c
index 8fe3bd0..5721968 100644
--- a/lib_generic/zlib.c
+++ b/lib_generic/zlib.c
@@ -26,8 +26,10 @@
#define ZUTIL_H
#define ZLIB_INTERNAL
-#include "u-boot/zlib.h"
#include <common.h>
+#include <compiler.h>
+#include <asm/unaligned.h>
+#include "u-boot/zlib.h"
/* To avoid a build time warning */
#ifdef STDC
#include <malloc.h>
@@ -400,6 +402,7 @@ void inflate_fast OF((z_streamp strm, unsigned start));
*/
#define OFF 1
#define PUP(a) *++(a)
+#define UP_UNALIGNED(a) get_unaligned(++(a))
/*
Decode literal, length, and distance codes and write out the resulting
@@ -616,18 +619,47 @@ unsigned start; /* inflate()'s starting value for strm->avail_out */
}
}
else {
+ unsigned short *sout;
+ unsigned long loops;
+
from = out - dist; /* copy direct from output */
- do { /* minimum length is three */
- PUP(out) = PUP(from);
- PUP(out) = PUP(from);
- PUP(out) = PUP(from);
- len -= 3;
- } while (len > 2);
- if (len) {
- PUP(out) = PUP(from);
- if (len > 1)
- PUP(out) = PUP(from);
- }
+ /* minimum length is three */
+ /* Align out addr */
+ if (!((long)(out - 1 + OFF) & 1)) {
+ PUP(out) = PUP(from);
+ len--;
+ }
+ sout = (unsigned short *)(out - OFF);
+ if (dist > 2 ) {
+ unsigned short *sfrom;
+
+ sfrom = (unsigned short *)(from - OFF);
+ loops = len >> 1;
+ do
+ PUP(sout) = UP_UNALIGNED(sfrom);
+ while (--loops);
+ out = (unsigned char *)sout + OFF;
+ from = (unsigned char *)sfrom + OFF;
+ } else { /* dist == 1 or dist == 2 */
+ unsigned short pat16;
+
+ pat16 = *(sout-2+2*OFF);
+ if (dist == 1)
+#if defined(__BIG_ENDIAN)
+ pat16 = (pat16 & 0xff) | ((pat16 & 0xff ) << 8);
+#elif defined(__LITTLE_ENDIAN)
+ pat16 = (pat16 & 0xff00) | ((pat16 & 0xff00 ) >> 8);
+#else
+#error Neither __BIG_ENDIAN nor __LITTLE_ENDIAN is defined
+#endif
+ loops = len >> 1;
+ do
+ PUP(sout) = pat16;
+ while (--loops);
+ out = (unsigned char *)sout + OFF;
+ }
+ if (len & 1)
+ PUP(out) = PUP(from);
}
}
else if ((op & 64) == 0) { /* 2nd level distance code */
--
1.6.4.4
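For readers who want to try the idea outside of zlib, the copy strategy in the hunk above can be sketched as a standalone routine. This is only an illustration under stated assumptions, not the code from the patch: the names copy_match_bytes(), copy_match_16(), load16_unaligned() and store16() are hypothetical, "out" points at the next byte to write (post-increment) rather than following the PUP() pre-increment convention, and memcpy() stands in for both get_unaligned() and the aligned 16-bit store so that the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for get_unaligned() on a 16-bit value. */
static uint16_t load16_unaligned(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical 16-bit store; callers only pass 2-byte aligned addresses. */
static void store16(unsigned char *p, uint16_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Reference: byte-at-a-time match copy, like the loop the patch replaces. */
static unsigned char *copy_match_bytes(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from = out - dist;

    while (len--)
        *out++ = *from++;
    return out;
}

/* Sketch of the 16-bit copy. Assumes len >= 3 and dist >= 1. */
static unsigned char *copy_match_16(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from = out - dist;
    size_t loops;

    assert(len >= 3 && dist >= 1);

    /* Align the destination to a 2-byte boundary. */
    if ((uintptr_t)out & 1) {
        *out++ = *from++;
        len--;
    }
    loops = len >> 1;   /* at least 1, because len >= 2 here */

    if (dist > 2) {
        /* Source and destination are more than 2 bytes apart, so a
         * 16-bit load never reads a byte that is not yet written. */
        do {
            store16(out, load16_unaligned(from));
            out += 2;
            from += 2;
        } while (--loops);
    } else {
        /* dist is 1 or 2: the output repeats with period <= 2, so
         * build the 2-byte pattern once and store it repeatedly. */
        unsigned char pat[2];

        pat[1] = out[-1];
        pat[0] = (dist == 1) ? out[-1] : out[-2];
        do {
            memcpy(out, pat, sizeof(pat));
            out += 2;
        } while (--loops);
        /* 'from' is not advanced: an even number of bytes was written,
         * so it still has the right phase for the trailing byte. */
    }

    if (len & 1)
        *out++ = *from;
    return out;
}
```

Building the pattern from individual bytes avoids the __BIG_ENDIAN/__LITTLE_ENDIAN masks used in the patch, at the cost of not showing the exact pat16 arithmetic. Comparing copy_match_16() against copy_match_bytes() over a range of dist, len and start alignments is a cheap way to check a variant of this loop before timing it.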
* [U-Boot] [PATCH RFC] zlib: Optimize decompression
From: Peter Korsgaard @ 2009-11-19 14:07 UTC
To: u-boot
>>>>> "Joakim" == Joakim Tjernlund <Joakim.Tjernlund@transmode.se> writes:
Joakim> This patch optimizes the direct copy procedure.
Joakim> Uses get_unaligned() but only in one place.
Joakim> The copy loop just above this one can also use this
Joakim> optimization, but I havn't done so as I have not tested if it
Joakim> is a win there too.
Joakim> On my MPC8321 this is about 17% faster on my JFFS2 root FS
Joakim> than the original. No speed test has been performed in u-boot.
On an mpc8347 board it's ~12% faster at decompressing the uImage (165ms).
Joakim> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
--
Bye, Peter Korsgaard
* [U-Boot] [PATCH RFC] zlib: Optimize decompression
From: Joakim Tjernlund @ 2009-11-19 14:32 UTC
To: u-boot
Peter Korsgaard <jacmet@gmail.com> wrote on 19/11/2009 15:07:12:
>
> >>>>> "Joakim" == Joakim Tjernlund <Joakim.Tjernlund@transmode.se> writes:
>
> Joakim> This patch optimizes the direct copy procedure.
> Joakim> Uses get_unaligned() but only in one place.
> Joakim> The copy loop just above this one can also use this
> Joakim> optimization, but I havn't done so as I have not tested if it
> Joakim> is a win there too.
> Joakim> On my MPC8321 this is about 17% faster on my JFFS2 root FS
> Joakim> than the original. No speed test has been performed in u-boot.
>
> On a mpc8347 board it's ~12% faster at decompressing the uImage (165ms).
>
> Joakim> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
>
> Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Thanks, question: How does this compare with your lzo uncompress?
Jocke
* [U-Boot] [PATCH RFC] zlib: Optimize decompression
From: Peter Korsgaard @ 2009-11-19 14:51 UTC
To: u-boot
>>>>> "Joakim" == Joakim Tjernlund <joakim.tjernlund@transmode.se> writes:
Hi,
>> On a mpc8347 board it's ~12% faster at decompressing the uImage (165ms).
>>
Joakim> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
>>
>> Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Joakim> Thanks, question: How does this compare with your lzo uncompress?
LZO is still slightly faster, but only by about 5% (60ms). This is at
400MHz with very slow flash. With a slower CPU or faster flash the
difference would probably be bigger (it certainly was when I compared
with zlib before your optimization).
I can rerun that test, but first I need to figure out why 2009.11-rc1 is
more than 1 second slower than 2009.08.
--
Bye, Peter Korsgaard
* [U-Boot] [PATCH RFC] zlib: Optimize decompression
From: Wolfgang Denk @ 2009-12-05 0:32 UTC
To: u-boot
Dear Joakim Tjernlund,
In message <1258633364-20805-1-git-send-email-Joakim.Tjernlund@transmode.se> you wrote:
> This patch optimizes the direct copy procedure.
> Uses get_unaligned() but only in one place.
> The copy loop just above this one can also use this
> optimization, but I havn't done so as I have not tested if it
> is a win there too.
> On my MPC8321 this is about 17% faster on my JFFS2 root FS
> than the original. No speed test has been performed in u-boot.
>
> Size increase on ppc: 484 bytes
>
> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> ---
>
> I have the same optimization queued for linux. Figured it
> would be useful for u-boot too.
>
> Testing and feedback welcome.
>
> lib_generic/zlib.c | 56 ++++++++++++++++++++++++++++++++++++++++-----------
> 1 files changed, 44 insertions(+), 12 deletions(-)
Applied to "next", thanks.
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
I don't mind criticism. You know me. I've never been one to take
offence at criticism. No one could say I'm the sort to take offence
at criticism -- Not twice, anyway. Not without blowing bubbles.
- Terry Pratchett, _Witches Abroad_