From: "Markus F.X.J. Oberhumer" <markus@oberhumer.com>
To: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Michal Marek <mmarek@suse.cz>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
	x86@kernel.org, celinux-dev@lists.celinuxforum.org,
	Nicolas Pitre <nico@fluxnic.net>,
	Nitin Gupta <nitingupta910@gmail.com>,
	Richard Purdie <rpurdie@openedhand.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Joe Millenbach <jmillenbach@gmail.com>,
	David Sterba <dsterba@suse.cz>,
	Richard Cochran <richardcochran@gmail.com>,
	Albin Tonnerre <albin.tonnerre@free-electrons.com>,
	Egon Alter <egon.alter@gmx.net>,
	hyojun.im@lge.com, chan.jeong@lge.com,
	raphael.andy.lee@gmail.com
Subject: Re: [RFC PATCH v2 0/4] Add support for LZ4-compressed kernel
Date: Tue, 26 Feb 2013 21:33:22 +0100
Message-ID: <512D1C12.4080109@oberhumer.com>
In-Reply-To: <1361859870-15751-1-git-send-email-kyungsik.lee@lge.com>

On 2013-02-26 07:24, Kyungsik Lee wrote:
> Hi,
> 
> [...]
> 
> Benchmarking showed that building decompress.o with the -Os compiler
> flag gave better decompression performance in most cases (e.g. with
> different compilers and hardware specs) on the ARM architecture.
> 
> Lastly, CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not always the best
> option even though it is supported. The decompression speed can be
> slightly slower in some cases.
> 
> This patchset is based on 3.8.
> 
> Any comments are appreciated.

Did you actually *try* the new LZO version and the patch (which is attached
once again) as explained in https://lkml.org/lkml/2013/2/3/367 ?

Because the new LZO version is faster than LZ4 in my testing, at least
when comparing apples with apples and enabling unaligned access in
BOTH versions:

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed   decompression speed

  LZO-2012    :          44 MB/sec          117 MB/sec     no unaligned access
  LZO-2013-UA :          47 MB/sec          167 MB/sec     Unaligned Access
  LZ4 r88  UA :          46 MB/sec          154 MB/sec     Unaligned Access
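
For reference, the relative decompression gains implied by the table can be
worked out as below. This is only an illustrative sketch: the helper names
relative_gain_percent() and print_decompression_gains() are hypothetical, and
the inputs are simply the MB/sec figures from the table.

```c
#include <stdio.h>

/* Hypothetical helper: percentage by which `fast` exceeds `base`. */
static double relative_gain_percent(double fast, double base)
{
	return (fast - base) / base * 100.0;
}

/* Prints the gains for the decompression column of the table above. */
static void print_decompression_gains(void)
{
	/* Decompression speeds in MB/sec, taken from the table. */
	const double lzo_2012 = 117.0, lzo_2013_ua = 167.0, lz4_r88_ua = 154.0;

	printf("LZO-2013-UA vs LZO-2012:   +%.1f%%\n",
	       relative_gain_percent(lzo_2013_ua, lzo_2012));
	printf("LZO-2013-UA vs LZ4 r88 UA: +%.1f%%\n",
	       relative_gain_percent(lzo_2013_ua, lz4_r88_ua));
}
```

Calling print_decompression_gains() from a main() gives roughly +42.7% over
LZO-2012 and +8.4% over LZ4 r88 for decompression on this particular setup.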

~Markus


> 
> Thanks,
> Kyungsik
> 
> 
> Benchmark Results(PATCH v2)
> Compiler: Linaro ARM gcc 4.6.2
> 1. ARMv7, 1.5GHz based board
>    Kernel: linux 3.4
>    Uncompressed Kernel Size: 14MB
>         Compressed Size  Decompression Speed
>    LZO  6.7MB            21.1MB/s
>    LZ4  7.3MB            29.1MB/s, 45.6MB/s(UA)
> 2. ARMv7, 1.7GHz based board
>    Kernel: linux 3.7
>    Uncompressed Kernel Size: 14MB
>         Compressed Size  Decompression Speed
>    LZO  6.0MB            34.1MB/s
>    LZ4  6.5MB            86.7MB/s
> UA: Unaligned memory Access support
> 
> 
> Change log: v2
> - Clean up code
> - Enable unaligned access for ARM v6 and above with
>   CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> - Add lz4_decompress() for faster decompression with
>   uncompressed output size
> - Use lz4_decompress() for LZ4-compressed kernel during
>   boot-process
> - Apply -Os to decompress.o to improve decompression
>   performance during the boot-up process
> 
> 
> Kyungsik Lee (4):
>   decompressor: Add LZ4 decompressor module
>   lib: Add support for LZ4-compressed kernel
>   arm: Add support for LZ4-compressed kernel
>   x86: Add support for LZ4-compressed kernel
> 
>  arch/arm/Kconfig                      |   1 +
>  arch/arm/boot/compressed/.gitignore   |   1 +
>  arch/arm/boot/compressed/Makefile     |   6 +-
>  arch/arm/boot/compressed/decompress.c |   4 +
>  arch/arm/boot/compressed/piggy.lz4.S  |   6 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/boot/compressed/Makefile     |   5 +-
>  arch/x86/boot/compressed/misc.c       |   4 +
>  include/linux/decompress/unlz4.h      |  10 +
>  include/linux/lz4.h                   |  48 +++++
>  init/Kconfig                          |  13 +-
>  lib/Kconfig                           |   7 +
>  lib/Makefile                          |   2 +
>  lib/decompress.c                      |   5 +
>  lib/decompress_unlz4.c                | 190 +++++++++++++++++++
>  lib/lz4/Makefile                      |   1 +
>  lib/lz4/lz4_decompress.c              | 331 ++++++++++++++++++++++++++++++++++
>  lib/lz4/lz4defs.h                     |  93 ++++++++++
>  scripts/Makefile.lib                  |   5 +
>  usr/Kconfig                           |   9 +
>  20 files changed, 739 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm/boot/compressed/piggy.lz4.S
>  create mode 100644 include/linux/decompress/unlz4.h
>  create mode 100644 include/linux/lz4.h
>  create mode 100644 lib/decompress_unlz4.c
>  create mode 100644 lib/lz4/Makefile
>  create mode 100644 lib/lz4/lz4_decompress.c
>  create mode 100644 lib/lz4/lz4defs.h
> 


-- 
Markus Oberhumer, <markus@oberhumer.com>, http://www.oberhumer.com/

[-- Attachment #2: lib-lzo-huge-LZO-decompression-speedup-on-ARM.patch --]
[-- Type: text/x-patch, Size: 1584 bytes --]

commit 8745b927fcfcd6953ada9bd1220a73083db5948a
Author: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Date:   Mon Feb 4 02:26:14 2013 +0100

    lib/lzo: huge LZO decompression speedup on ARM by using unaligned access
    
    Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>

diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index 569985d..e3edc5f 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -72,9 +72,11 @@ copy_literal_run:
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+#  if !defined(__arm__)
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+#  endif
 					} while (ip < ie);
 					ip = ie;
 					op = oe;
@@ -159,9 +161,11 @@ copy_literal_run:
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+#  if !defined(__arm__)
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+#  endif
 				} while (op < oe);
 				op = oe;
 				if (HAVE_IP(6)) {
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index 5a4beb2..b230601 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -12,8 +12,14 @@
  */
 
 
+#if 1 && defined(__arm__) && ((__LINUX_ARM_ARCH__ >= 6) || defined(__ARM_FEATURE_UNALIGNED))
+#define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS 1
+#define COPY4(dst, src)	\
+		* (u32 *) (void *) (dst) = * (const u32 *) (const void *) (src)
+#else
 #define COPY4(dst, src)	\
 		put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#endif
 #if defined(__x86_64__)
 #define COPY8(dst, src)	\
 		put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
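
The COPY4() change above replaces get_unaligned()/put_unaligned() with a direct
32-bit load and store on ARMv6 and later. As a rough userspace illustration of
what a 4-byte copy between unaligned addresses has to do, the sketch below uses
memcpy() with a constant size. It is not the kernel code: copy4_portable() is a
hypothetical name, and whether the compiler turns it into a single load/store
pair depends on the target and compiler options.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for COPY4(): copy 4 bytes between
 * addresses that may not be 4-byte aligned. Going through memcpy() with
 * a constant size avoids the alignment and aliasing problems of casting
 * to a 32-bit pointer, and compilers commonly lower it to plain word
 * accesses where the target permits unaligned access.
 */
static inline void copy4_portable(void *dst, const void *src)
{
	uint32_t tmp;

	memcpy(&tmp, src, sizeof(tmp));
	memcpy(dst, &tmp, sizeof(tmp));
}
```

For example, copy4_portable(dst + 1, src + 3) copies bytes 3..6 of src to bytes
1..4 of dst regardless of how either buffer is aligned.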





