public inbox for linux-kernel@vger.kernel.org
From: "Markus F.X.J. Oberhumer" <markus@oberhumer.com>
To: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Michal Marek <mmarek@suse.cz>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
	x86@kernel.org, celinux-dev@lists.celinuxforum.org,
	Nicolas Pitre <nico@fluxnic.net>,
	Nitin Gupta <nitingupta910@gmail.com>,
	Richard Purdie <rpurdie@openedhand.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Joe Millenbach <jmillenbach@gmail.com>,
	David Sterba <dsterba@suse.cz>,
	Richard Cochran <richardcochran@gmail.com>,
	Albin Tonnerre <albin.tonnerre@free-electrons.com>,
	Egon Alter <egon.alter@gmx.net>,
	hyojun.im@lge.com, chan.jeong@lge.com,
	raphael.andy.lee@gmail.com
Subject: Re: [RFC PATCH v2 0/4] Add support for LZ4-compressed kernel
Date: Tue, 26 Feb 2013 21:33:22 +0100
Message-ID: <512D1C12.4080109@oberhumer.com>
In-Reply-To: <1361859870-15751-1-git-send-email-kyungsik.lee@lge.com>

[-- Attachment #1: Type: text/plain, Size: 3866 bytes --]

On 2013-02-26 07:24, Kyungsik Lee wrote:
> Hi,
> 
> [...]
> 
> Benchmarking showed that building decompress.o with the -Os compiler
> flag gave better decompression performance in most cases (e.g. across
> different compilers and hardware specs) on the ARM architecture.
> 
> Lastly, CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not always the best
> option even though it is supported. The decompression speed can be
> slightly slower in some cases.
> 
> This patchset is based on 3.8.
> 
> Any comments are appreciated.

Did you actually *try* the new LZO version and the patch (which is attached
once again) as explained in https://lkml.org/lkml/2013/2/3/367 ?

Because the new LZO version is faster than LZ4 in my testing, at least
when comparing apples with apples and enabling unaligned access in
BOTH versions:

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed   decompression speed

  LZO-2012    :          44 MB/sec          117 MB/sec     no unaligned access
  LZO-2013-UA :          47 MB/sec          167 MB/sec     Unaligned Access
  LZ4 r88  UA :          46 MB/sec          154 MB/sec     Unaligned Access

~Markus


> 
> Thanks,
> Kyungsik
> 
> 
> Benchmark Results(PATCH v2)
> Compiler: Linaro ARM gcc 4.6.2
> 1. ARMv7, 1.5GHz based board
>    Kernel: linux 3.4
>    Uncompressed Kernel Size: 14MB
>         Compressed Size  Decompression Speed
>    LZO  6.7MB            21.1MB/s
>    LZ4  7.3MB            29.1MB/s, 45.6MB/s(UA)
> 2. ARMv7, 1.7GHz based board
>    Kernel: linux 3.7
>    Uncompressed Kernel Size: 14MB
>         Compressed Size  Decompression Speed
>    LZO  6.0MB            34.1MB/s
>    LZ4  6.5MB            86.7MB/s
> UA: Unaligned memory Access support
> 
> 
> Change log: v2
> - Clean up code
> - Enable unaligned access for ARM v6 and above with
>   CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> - Add lz4_decompress() for faster decompression with
>   uncompressed output size
> - Use lz4_decompress() for LZ4-compressed kernel during
>   boot-process
> - Apply -Os to decompress.o to improve decompression
>   performance during the boot-up process
> 
> 
> Kyungsik Lee (4):
>   decompressor: Add LZ4 decompressor module
>   lib: Add support for LZ4-compressed kernel
>   arm: Add support for LZ4-compressed kernel
>   x86: Add support for LZ4-compressed kernel
> 
>  arch/arm/Kconfig                      |   1 +
>  arch/arm/boot/compressed/.gitignore   |   1 +
>  arch/arm/boot/compressed/Makefile     |   6 +-
>  arch/arm/boot/compressed/decompress.c |   4 +
>  arch/arm/boot/compressed/piggy.lz4.S  |   6 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/boot/compressed/Makefile     |   5 +-
>  arch/x86/boot/compressed/misc.c       |   4 +
>  include/linux/decompress/unlz4.h      |  10 +
>  include/linux/lz4.h                   |  48 +++++
>  init/Kconfig                          |  13 +-
>  lib/Kconfig                           |   7 +
>  lib/Makefile                          |   2 +
>  lib/decompress.c                      |   5 +
>  lib/decompress_unlz4.c                | 190 +++++++++++++++++++
>  lib/lz4/Makefile                      |   1 +
>  lib/lz4/lz4_decompress.c              | 331 ++++++++++++++++++++++++++++++++++
>  lib/lz4/lz4defs.h                     |  93 ++++++++++
>  scripts/Makefile.lib                  |   5 +
>  usr/Kconfig                           |   9 +
>  20 files changed, 739 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm/boot/compressed/piggy.lz4.S
>  create mode 100644 include/linux/decompress/unlz4.h
>  create mode 100644 include/linux/lz4.h
>  create mode 100644 lib/decompress_unlz4.c
>  create mode 100644 lib/lz4/Makefile
>  create mode 100644 lib/lz4/lz4_decompress.c
>  create mode 100644 lib/lz4/lz4defs.h
> 


-- 
Markus Oberhumer, <markus@oberhumer.com>, http://www.oberhumer.com/

[-- Attachment #2: lib-lzo-huge-LZO-decompression-speedup-on-ARM.patch --]
[-- Type: text/x-patch, Size: 1584 bytes --]

commit 8745b927fcfcd6953ada9bd1220a73083db5948a
Author: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Date:   Mon Feb 4 02:26:14 2013 +0100

    lib/lzo: huge LZO decompression speedup on ARM by using unaligned access
    
    Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>

diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index 569985d..e3edc5f 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -72,9 +72,11 @@ copy_literal_run:
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+#  if !defined(__arm__)
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+#  endif
 					} while (ip < ie);
 					ip = ie;
 					op = oe;
@@ -159,9 +161,11 @@ copy_literal_run:
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+#  if !defined(__arm__)
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+#  endif
 				} while (op < oe);
 				op = oe;
 				if (HAVE_IP(6)) {
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index 5a4beb2..b230601 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -12,8 +12,14 @@
  */
 
 
+#if 1 && defined(__arm__) && ((__LINUX_ARM_ARCH__ >= 6) || defined(__ARM_FEATURE_UNALIGNED))
+#define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS 1
+#define COPY4(dst, src)	\
+		* (u32 *) (void *) (dst) = * (const u32 *) (const void *) (src)
+#else
 #define COPY4(dst, src)	\
 		put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#endif
 #if defined(__x86_64__)
 #define COPY8(dst, src)	\
 		put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
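
For readers comparing the two COPY4 variants in the hunk above: the
cast-based version relies on the CPU tolerating unaligned 32-bit loads
and stores, while the put_unaligned()/get_unaligned() version does not.
Below is a minimal user-space sketch of the alignment-agnostic approach,
using memcpy() in place of the kernel helpers. The names copy4_portable,
copy8_portable and check_unaligned_copy are hypothetical and are not part
of the patch or the kernel; building the 8-byte copy from two 4-byte moves
is an assumption about the non-x86_64 branch, which the hunk does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not kernel or patch code. */

/* Copy 4 bytes between possibly unaligned addresses without a u32 cast. */
static inline void copy4_portable(void *dst, const void *src)
{
	uint32_t tmp;

	memcpy(&tmp, src, sizeof(tmp));	/* alignment-agnostic load */
	memcpy(dst, &tmp, sizeof(tmp));	/* alignment-agnostic store */
}

/* Copy 8 bytes as two 4-byte moves (assumed shape of a COPY4-based COPY8). */
static inline void copy8_portable(void *dst, const void *src)
{
	copy4_portable(dst, src);
	copy4_portable((unsigned char *)dst + 4,
		       (const unsigned char *)src + 4);
}

/* Returns 1 if 8 bytes copied between odd offsets arrive intact. */
int check_unaligned_copy(void)
{
	unsigned char src[16], dst[16];
	int i;

	for (i = 0; i < 16; i++)
		src[i] = (unsigned char)(i + 1);
	memset(dst, 0, sizeof(dst));

	copy8_portable(dst + 3, src + 1);	/* both pointers misaligned */

	/* Guard bytes on either side of the copy must be untouched. */
	assert(dst[2] == 0 && dst[11] == 0);
	return memcmp(dst + 3, src + 1, 8) == 0;
}
```

Compilers targeting CPUs with cheap unaligned access can typically lower
such fixed-size memcpy() calls to plain loads and stores, which is the
effect the cast in the patch requests explicitly.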



