* [PATCH 0/3] ARM ZSTD boot compression
From: Jonathan Neuschäfer @ 2023-04-12 21:21 UTC
To: linux-arm-kernel
Cc: Russell King, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Jonathan Neuschäfer,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel
This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
- LZO: 7.2 MiB, 6 seconds
- ZSTD: 5.6 MiB, 60 seconds
Jonathan Neuschäfer (3):
ARM: compressed: Pass the actual output length to the decompressor
ARM: compressed: Bump MALLOC_SIZE to 128 KiB
ARM: compressed: Enable ZSTD compression
arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/Makefile | 5 +++--
arch/arm/boot/compressed/decompress.c | 8 ++++++--
arch/arm/boot/compressed/head.S | 4 ++--
arch/arm/boot/compressed/misc.c | 12 ++++++++++--
5 files changed, 22 insertions(+), 8 deletions(-)
--
2.39.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* [PATCH 1/3] ARM: compressed: Pass the actual output length to the decompressor
From: Jonathan Neuschäfer @ 2023-04-12 21:21 UTC
To: linux-arm-kernel
Cc: Russell King, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Jonathan Neuschäfer,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel, Russell King (Oracle)
ZSTD writes outside of the space that is necessary for the uncompressed
data, when it is told it has unlimited output length. To fix this, pass
the actual output length (the length of the uncompressed kernel) to the
decompressor.
The uncompressed length is already stored as a little endian 32-bit
constant before the input_data_end symbol.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
---
arch/arm/boot/compressed/decompress.c | 4 ++--
arch/arm/boot/compressed/misc.c | 12 ++++++++++--
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
index 74255e8198314..3d098b84ee391 100644
--- a/arch/arm/boot/compressed/decompress.c
+++ b/arch/arm/boot/compressed/decompress.c
@@ -59,7 +59,7 @@ extern char * strchrnul(const char *, int);
#include "../../../../lib/decompress_unlz4.c"
#endif
-int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x))
+int do_decompress(u8 *input, int len, u8 *output, int outlen, void (*error)(char *x))
{
- return __decompress(input, len, NULL, NULL, output, 0, NULL, error);
+ return __decompress(input, len, NULL, NULL, output, outlen, NULL, error);
}
diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index abfed1aa2baa8..8402b29bccc82 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -22,6 +22,7 @@ unsigned int __machine_arch_type;
#include <linux/compiler.h> /* for inline */
#include <linux/types.h>
#include <linux/linkage.h>
+#include <asm/unaligned.h>
#include "misc.h"
#ifdef CONFIG_ARCH_EP93XX
#include "misc-ep93xx.h"
@@ -131,17 +132,24 @@ asmlinkage void __div0(void)
error("Attempting division by 0!");
}
-extern int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x));
+extern int do_decompress(u8 *input, int len, u8 *output, int outlen,
+ void (*error)(char *x));
+static u32 get_inflated_image_size(void)
+{
+ return get_unaligned_le32(input_data_end - 4);
+}
void
decompress_kernel(unsigned long output_start, unsigned long free_mem_ptr_p,
unsigned long free_mem_ptr_end_p,
int arch_id)
{
+ unsigned long output_data_len;
int ret;
output_data = (unsigned char *)output_start;
+ output_data_len = get_inflated_image_size();
free_mem_ptr = free_mem_ptr_p;
free_mem_end_ptr = free_mem_ptr_end_p;
__machine_arch_type = arch_id;
@@ -153,7 +161,7 @@ decompress_kernel(unsigned long output_start, unsigned long free_mem_ptr_p,
putstr("Uncompressing Linux...");
ret = do_decompress(input_data, input_data_end - input_data,
- output_data, error);
+ output_data, output_data_len, error);
if (ret)
error("decompressor returned an error");
else
--
2.39.2
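For illustration, the trailer read by get_inflated_image_size() can be modelled in user space. This is a minimal sketch, assuming the compressed payload sits in a byte buffer whose last 4 bytes hold the little-endian uncompressed size. The name read_le32_trailer() is hypothetical, and the explicit byte-wise loads stand in for the kernel's get_unaligned_le32(), because the trailer address need not be 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the little-endian 32-bit value stored in
 * the last 4 bytes of data[0..len), i.e. the equivalent of reading
 * input_data_end - 4. Byte loads make the read alignment-agnostic.
 */
static uint32_t read_le32_trailer(const uint8_t *data, size_t len)
{
	const uint8_t *p;

	assert(len >= 4);	/* the size trailer must be present */
	p = data + len - 4;
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

The decoded value is what patch 1 passes to do_decompress() as outlen, so the decompressor is told exactly how much room the uncompressed kernel needs instead of an unbounded length.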
* [PATCH 2/3] ARM: compressed: Bump MALLOC_SIZE to 128 KiB
From: Jonathan Neuschäfer @ 2023-04-12 21:21 UTC
To: linux-arm-kernel
Cc: Russell King, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Jonathan Neuschäfer,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel, Russell King (Oracle),
Kees Cook
The ZSTD decompressor needs about 100 KiB.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
---
arch/arm/boot/compressed/Makefile | 2 +-
arch/arm/boot/compressed/head.S | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 2ef651a78fa2a..dec565a5b1f21 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -69,7 +69,7 @@ ZTEXTADDR := 0
ZBSSADDR := ALIGN(8)
endif
-MALLOC_SIZE := 65536
+MALLOC_SIZE := 131072
AFLAGS_head.o += -DTEXT_OFFSET=$(TEXT_OFFSET) -DMALLOC_SIZE=$(MALLOC_SIZE)
CPPFLAGS_vmlinux.lds := -DTEXT_START="$(ZTEXTADDR)" -DBSS_START="$(ZBSSADDR)"
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 9f406e9c0ea6f..23fbbe94da6e8 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -337,7 +337,7 @@ restart: adr r0, LC1
get_inflated_image_size r9, r10, lr
#ifndef CONFIG_ZBOOT_ROM
- /* malloc space is above the relocated stack (64k max) */
+ /* malloc space is above the relocated stack (128k max) */
add r10, sp, #MALLOC_SIZE
#else
/*
@@ -629,7 +629,7 @@ not_relocated: mov r0, #0
*/
mov r0, r4
mov r1, sp @ malloc space above stack
- add r2, sp, #MALLOC_SIZE @ 64k max
+ add r2, sp, #MALLOC_SIZE @ 128k max
mov r3, r7
bl decompress_kernel
--
2.39.2
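The effect of the MALLOC_SIZE bump can be pictured as a bump allocator over the region that head.S passes to decompress_kernel() (r1 = sp, r2 = sp + MALLOC_SIZE, i.e. free_mem_ptr_p and free_mem_ptr_end_p). This is a hypothetical user-space model only: struct arena, arena_init() and arena_alloc() are illustrative names, the 4-byte alignment is an assumption, and treating ZSTD's "about 100 KiB" as a single request is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the heap between free_mem_ptr and free_mem_end_ptr. */
struct arena {
	uintptr_t ptr;	/* next free byte */
	uintptr_t end;	/* one past the last usable byte */
};

static struct arena arena_init(unsigned char *base, size_t size)
{
	struct arena a;

	assert(base != NULL);
	a.ptr = (uintptr_t)base;
	a.end = (uintptr_t)base + size;
	return a;
}

/* Bump allocation: align to 4 bytes and fail rather than overrun end. */
static void *arena_alloc(struct arena *a, size_t size)
{
	uintptr_t p = (a->ptr + 3) & ~(uintptr_t)3;

	if (p > a->end || size > a->end - p)
		return NULL;	/* request does not fit in the arena */
	a->ptr = p + size;
	return (void *)p;
}
```

With a 65536-byte arena (the old MALLOC_SIZE) a 100 KiB request fails, while a 131072-byte arena (the new value) satisfies it with room to spare.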
* [PATCH 3/3] ARM: compressed: Enable ZSTD compression
From: Jonathan Neuschäfer @ 2023-04-12 21:21 UTC
To: linux-arm-kernel
Cc: Russell King, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Jonathan Neuschäfer,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel, Russell King (Oracle),
Kees Cook
With the previous two commits, it is possible to enable ZSTD
in the decompressor stub for 32-bit ARM.
Unfortunately, ZSTD decompression has been quite slow in my tests
(on ARM926EJ-S, ARMv5T):
- LZO: 7.2 MiB, 6 seconds
- ZSTD: 5.6 MiB, 60 seconds
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
---
arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/Makefile | 3 ++-
arch/arm/boot/compressed/decompress.c | 4 ++++
3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12fa..065a1746a257a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -108,6 +108,7 @@ config ARM
select HAVE_KERNEL_LZMA
select HAVE_KERNEL_LZO
select HAVE_KERNEL_XZ
+ select HAVE_KERNEL_ZSTD
select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M
select HAVE_KRETPROBES if HAVE_KPROBES
select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index dec565a5b1f21..55bfca154b12a 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -81,6 +81,7 @@ compress-$(CONFIG_KERNEL_LZO) = lzo_with_size
compress-$(CONFIG_KERNEL_LZMA) = lzma_with_size
compress-$(CONFIG_KERNEL_XZ) = xzkern_with_size
compress-$(CONFIG_KERNEL_LZ4) = lz4_with_size
+compress-$(CONFIG_KERNEL_ZSTD) = zstd22_with_size
libfdt_objs := fdt_rw.o fdt_ro.o fdt_wip.o fdt.o
@@ -98,7 +99,7 @@ OBJS += lib1funcs.o ashldi3.o bswapsdi2.o
targets := vmlinux vmlinux.lds piggy_data piggy.o \
head.o $(OBJS)
-KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
+KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING -D__DISABLE_EXPORTS
ccflags-y := -fpic $(call cc-option,-mno-single-pic-base,) -fno-builtin \
-I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \
diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
index 3d098b84ee391..2c4fd33444829 100644
--- a/arch/arm/boot/compressed/decompress.c
+++ b/arch/arm/boot/compressed/decompress.c
@@ -59,6 +59,10 @@ extern char * strchrnul(const char *, int);
#include "../../../../lib/decompress_unlz4.c"
#endif
+#ifdef CONFIG_KERNEL_ZSTD
+#include "../../../../lib/decompress_unzstd.c"
+#endif
+
int do_decompress(u8 *input, int len, u8 *output, int outlen, void (*error)(char *x))
{
return __decompress(input, len, NULL, NULL, output, outlen, NULL, error);
--
2.39.2
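On the build side, the zstd22_with_size rule, like the other *_with_size rules, appends the uncompressed length after the compressed stream, and that is the value patch 1 reads back from input_data_end - 4. The real rule is a shell helper in the kernel's build system, not C. The function append_le32_size() below is a hypothetical sketch of the resulting byte layout only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write size as 4 little-endian bytes at buf[len]
 * (buffer capacity cap) and return the new total length.
 */
static size_t append_le32_size(uint8_t *buf, size_t len, size_t cap,
			       uint32_t size)
{
	assert(len <= cap && cap - len >= 4);	/* room for the trailer */
	buf[len + 0] = (uint8_t)(size & 0xff);
	buf[len + 1] = (uint8_t)((size >> 8) & 0xff);
	buf[len + 2] = (uint8_t)((size >> 16) & 0xff);
	buf[len + 3] = (uint8_t)((size >> 24) & 0xff);
	return len + 4;
}
```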
* Re: [PATCH 0/3] ARM ZSTD boot compression
From: Arnd Bergmann @ 2023-04-12 21:33 UTC
To: Jonathan Neuschäfer, linux-arm-kernel
Cc: Russell King, Nick Terrell, Tony Lindgren, Geert Uytterhoeven,
Linus Walleij, Sebastian Reichel, Hawkins, Nick, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel
On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
> This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
> Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
>
> - LZO: 7.2 MiB, 6 seconds
> - ZSTD: 5.6 MiB, 60 seconds
That seems unexpected, as the usual numbers say it's about 25%
slower than LZO. Do you have an idea why it is so much slower
here? How long does it take to decompress the
generated arch/arm/boot/Image file in user space on the same
hardware using lzop and zstd?
Arnd
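Putting the reported figures next to the quoted rule of thumb, and reading "about 25% slower" as a factor of 1.25 on the 6 s LZO time (an interpretation, not a measurement), gives:

```latex
\begin{aligned}
\text{observed: } & t_{\mathrm{ZSTD}} / t_{\mathrm{LZO}} = 60\,\mathrm{s} / 6\,\mathrm{s} = 10 \\
\text{expected (+25\%): } & 1.25 \times 6\,\mathrm{s} = 7.5\,\mathrm{s} \\
\text{gap: } & 60\,\mathrm{s} / 7.5\,\mathrm{s} = 8 \\
\text{size: } & 5.6\,\mathrm{MiB} / 7.2\,\mathrm{MiB} \approx 0.78
\end{aligned}
```

So the measured boot-time ZSTD decompression is roughly 8 times slower than that rule of thumb predicts, in exchange for an image about 22% smaller.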
* Re: [PATCH 1/3] ARM: compressed: Pass the actual output length to the decompressor
From: Linus Walleij @ 2023-04-12 21:42 UTC
To: Jonathan Neuschäfer
Cc: linux-arm-kernel, Russell King, Nick Terrell, Arnd Bergmann,
Tony Lindgren, Geert Uytterhoeven, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Florian Fainelli,
Nick Desaulniers, Xin Li, Seung-Woo Kim, Paul Bolle,
Bart Van Assche, linux-kernel, Russell King (Oracle)
On Wed, Apr 12, 2023 at 11:21 PM Jonathan Neuschäfer
<j.neuschaefer@gmx.net> wrote:
> ZSTD writes outside of the space that is necessary for the uncompressed
> data, when it is told it has unlimited output length. To fix this, pass
> the actual output length (the length of the uncompressed kernel) to the
> decompressor.
>
> The uncompressed length is already stored as a little endian 32-bit
> constant before the input_data_end symbol.
>
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Really neat fix!
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
* Re: [PATCH 2/3] ARM: compressed: Bump MALLOC_SIZE to 128 KiB
From: Linus Walleij @ 2023-04-12 21:43 UTC
To: Jonathan Neuschäfer
Cc: linux-arm-kernel, Russell King, Nick Terrell, Arnd Bergmann,
Tony Lindgren, Geert Uytterhoeven, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Florian Fainelli,
Nick Desaulniers, Xin Li, Seung-Woo Kim, Paul Bolle,
Bart Van Assche, linux-kernel, Russell King (Oracle), Kees Cook
On Wed, Apr 12, 2023 at 11:21 PM Jonathan Neuschäfer
<j.neuschaefer@gmx.net> wrote:
> The ZSTD decompressor needs about 100 KiB.
>
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Fair enough
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
* Re: [PATCH 3/3] ARM: compressed: Enable ZSTD compression
From: Linus Walleij @ 2023-04-12 21:45 UTC
To: Jonathan Neuschäfer
Cc: linux-arm-kernel, Russell King, Nick Terrell, Arnd Bergmann,
Tony Lindgren, Geert Uytterhoeven, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Florian Fainelli,
Nick Desaulniers, Xin Li, Seung-Woo Kim, Paul Bolle,
Bart Van Assche, linux-kernel, Russell King (Oracle), Kees Cook
On Wed, Apr 12, 2023 at 11:22 PM Jonathan Neuschäfer
<j.neuschaefer@gmx.net> wrote:
> With the previous two commits, it is possible to enable ZSTD
> in the decompressor stub for 32-bit ARM.
>
> Unfortunately, ZSTD decompression has been quite slow in my tests
> (on ARM926EJ-S, ARMv5T):
>
> - LZO: 7.2 MiB, 6 seconds
> - ZSTD: 5.6 MiB, 60 seconds
>
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Disappointingly slow, but not all systems are 926, and
not all memory hierarchies are as slow.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
* Re: [PATCH 1/3] ARM: compressed: Pass the actual output length to the decompressor
From: Florian Fainelli @ 2023-04-12 21:48 UTC
To: Jonathan Neuschäfer, linux-arm-kernel
Cc: Russell King, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Nick Desaulniers, Xin Li,
Seung-Woo Kim, Paul Bolle, Bart Van Assche, linux-kernel,
Russell King (Oracle)
On 4/12/23 14:21, Jonathan Neuschäfer wrote:
> ZSTD writes outside of the space that is necessary for the uncompressed
> data, when it is told it has unlimited output length. To fix this, pass
> the actual output length (the length of the uncompressed kernel) to the
> decompressor.
>
> The uncompressed length is already stored as a little endian 32-bit
> constant before the input_data_end symbol.
>
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
* Re: [PATCH 2/3] ARM: compressed: Bump MALLOC_SIZE to 128 KiB
From: Florian Fainelli @ 2023-04-12 21:48 UTC
To: Jonathan Neuschäfer, linux-arm-kernel
Cc: Russell King, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Nick Desaulniers, Xin Li,
Seung-Woo Kim, Paul Bolle, Bart Van Assche, linux-kernel,
Russell King (Oracle), Kees Cook
On 4/12/23 14:21, Jonathan Neuschäfer wrote:
> The ZSTD decompressor needs about 100 KiB.
>
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
* Re: [PATCH 3/3] ARM: compressed: Enable ZSTD compression
From: Florian Fainelli @ 2023-04-12 21:49 UTC
To: Jonathan Neuschäfer, linux-arm-kernel
Cc: Russell King, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Nick Desaulniers, Xin Li,
Seung-Woo Kim, Paul Bolle, Bart Van Assche, linux-kernel,
Russell King (Oracle), Kees Cook
On 4/12/23 14:21, Jonathan Neuschäfer wrote:
> With the previous two commits, it is possible to enable ZSTD
> in the decompressor stub for 32-bit ARM.
>
> Unfortunately, ZSTD decompression has been quite slow in my tests
> (on ARM926EJ-S, ARMv5T):
>
> - LZO: 7.2 MiB, 6 seconds
> - ZSTD: 5.6 MiB, 60 seconds
>
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
* Re: [PATCH 1/3] ARM: compressed: Pass the actual output length to the decompressor
From: Tony Lindgren @ 2023-04-13 5:20 UTC
To: Jonathan Neuschäfer
Cc: linux-arm-kernel, Russell King, Nick Terrell, Arnd Bergmann,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Florian Fainelli,
Nick Desaulniers, Xin Li, Seung-Woo Kim, Paul Bolle,
Bart Van Assche, linux-kernel, Russell King (Oracle)
* Jonathan Neuschäfer <j.neuschaefer@gmx.net> [230412 21:22]:
> --- a/arch/arm/boot/compressed/misc.c
> +++ b/arch/arm/boot/compressed/misc.c
> +static u32 get_inflated_image_size(void)
> +{
> + return get_unaligned_le32(input_data_end - 4);
> +}
Just something to check... This patch should not be picked for the old stable
kernels that did not have the uncompressed image size at the end. Maybe
the patch should have a Depends-on tag to prevent possible issues?
Other than that looks good to me:
Reviewed-by: Tony Lindgren <tony@atomide.com>
* Re: [PATCH 0/3] ARM ZSTD boot compression
From: Arnd Bergmann @ 2023-04-13 11:13 UTC
To: Jonathan Neuschäfer, linux-arm-kernel
Cc: Russell King, Nick Terrell, Tony Lindgren, Geert Uytterhoeven,
Linus Walleij, Sebastian Reichel, Hawkins, Nick, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel
On Wed, Apr 12, 2023, at 23:33, Arnd Bergmann wrote:
> On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
>> This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
>> Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
>>
>> - LZO: 7.2 MiB, 6 seconds
>> - ZSTD: 5.6 MiB, 60 seconds
>
> That seems unexpected, as the usual numbers say it's about 25%
> slower than LZO. Do you have an idea why it is so much slower
> here? How long does it take to decompress the
> generated arch/arm/boot/Image file in user space on the same
> hardware using lzop and zstd?
I looked through this a bit more and found two interesting points:
- zstd uses a lot more unaligned loads and stores while
decompressing. On armv5 those turn into individual byte
accesses, while the others can likely use word-aligned
accesses. This could make a huge difference if caches are
disabled during the decompression.
- The sliding window on zstd is much larger, with the kernel
using an 8MB window (zstd=23), compared to the normal 32KB
for deflate (couldn't find the default for lzo), so on
machines with no L2 cache, it is much more likely to thrash
the small L1 dcache used on most ARM9 cores.
Arnd
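To illustrate the first point: a portable unaligned 32-bit load is usually written with memcpy(). On cores without hardware unaligned word loads, such as ARMv5, compilers typically lower it to four byte loads combined with shifts and ORs, roughly the explicit form shown alongside it, so each such load costs four memory accesses instead of one. Both functions are hypothetical illustrations, not code from lib/zstd, and the equality check in the test assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Portable unaligned 32-bit load in host byte order. Where the target
 * lacks unaligned word loads, compilers typically expand this memcpy
 * into byte loads plus shifts and ORs.
 */
static uint32_t load32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* The same little-endian load spelled out: four single-byte accesses. */
static uint32_t load32_bytewise_le(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```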
* Re: [PATCH 0/3] ARM ZSTD boot compression
From: Jonathan Neuschäfer @ 2023-04-14 22:50 UTC
To: Arnd Bergmann
Cc: Jonathan Neuschäfer, linux-arm-kernel, Russell King,
Nick Terrell, Tony Lindgren, Geert Uytterhoeven, Linus Walleij,
Sebastian Reichel, Hawkins, Nick, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel
On Wed, Apr 12, 2023 at 11:33:15PM +0200, Arnd Bergmann wrote:
> On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
> > This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
> > Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
> >
> > - LZO: 7.2 MiB, 6 seconds
> > - ZSTD: 5.6 MiB, 60 seconds
>
> That seems unexpected, as the usual numbers say it's about 25%
> slower than LZO. Do you have an idea why it is so much slower
> here?
No clear idea.
I guess it might be related to caching or unaligned memory accesses
somehow.
I suspected CONFIG_CPU_DCACHE_WRITETHROUGH, which was enabled, but
disabling it didn't improve performance.
> How long does it take to decompress the generated arch/arm/boot/Image
> file in user space on the same hardware using lzop and zstd?
Unfortunately, the unzstd userspace tool requires a buffer of 128 MiB
(the window size), which is too big for my usual devboard (which has about
100 MiB available). I'd have to test on a different board.
Jonathan
---
# uname -a
Linux buildroot 6.3.0-rc6-00020-g023058d50f2f #1212 PREEMPT Fri Apr 14 20:58:21 CEST 2023 armv5tejl GNU/Linux
# ls -lh
total 13M
-rw-r--r-- 1 root root 7.5M Jan 1 00:07 piggy.lzo
-rw-r--r-- 1 root root 5.8M Jan 1 00:07 piggy.zstd
# time lzop -d piggy.lzo -c > /dev/null
lzop: piggy.lzo: warning: ignoring trailing garbage in lzop file
Command exited with non-zero status 2
real 0m 3.38s
user 0m 3.20s
sys 0m 0.18s
# time unzstd piggy.zstd -c > /dev/null
[ 858.270000] __vm_enough_memory: pid: 114, comm: unzstd, not enough memory for the allocation
piggy.zstd : Decoding error (36) : Allocation error : not enough memory
Command exited with non-zero status 1
real 0m 0.03s
user 0m 0.01s
sys 0m 0.03s
* Re: [PATCH 1/3] ARM: compressed: Pass the actual output length to the decompressor
From: Jonathan Neuschäfer @ 2023-04-15 1:52 UTC
To: Tony Lindgren
Cc: Jonathan Neuschäfer, linux-arm-kernel, Russell King,
Nick Terrell, Arnd Bergmann, Geert Uytterhoeven, Linus Walleij,
Sebastian Reichel, Nick Hawkins, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel, Russell King (Oracle)
On Thu, Apr 13, 2023 at 08:20:50AM +0300, Tony Lindgren wrote:
> * Jonathan Neuschäfer <j.neuschaefer@gmx.net> [230412 21:22]:
> > --- a/arch/arm/boot/compressed/misc.c
> > +++ b/arch/arm/boot/compressed/misc.c
> > +static u32 get_inflated_image_size(void)
> > +{
> > + return get_unaligned_le32(input_data_end - 4);
> > +}
>
> Just something to check... This patch should not be picked for the old stable
> kernels that did not have the uncompressed image size at the end. Maybe
> the patch should have a Depends-on tag to prevent possible issues?
As far as I can see, the appended size has been around for a really long
time (v2.6.28, commit bc22c17e12c130dc929218a95aa347e0f3fd05dc), far
longer than the oldest LTS kernel that's still around.
>
> Other than that looks good to me:
>
> Reviewed-by: Tony Lindgren <tony@atomide.com>
Thanks
Jonathan
* Re: [PATCH 0/3] ARM ZSTD boot compression
From: Jonathan Neuschäfer @ 2023-04-15 2:00 UTC
To: Arnd Bergmann
Cc: Jonathan Neuschäfer, linux-arm-kernel, Russell King,
Nick Terrell, Tony Lindgren, Geert Uytterhoeven, Linus Walleij,
Sebastian Reichel, Hawkins, Nick, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, linux-kernel
On Thu, Apr 13, 2023 at 01:13:21PM +0200, Arnd Bergmann wrote:
> On Wed, Apr 12, 2023, at 23:33, Arnd Bergmann wrote:
> > On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
> >> This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
> >> Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
> >>
> >> - LZO: 7.2 MiB, 6 seconds
> >> - ZSTD: 5.6 MiB, 60 seconds
> >
> > That seems unexpected, as the usual numbers say it's about 25%
> > slower than LZO. Do you have an idea why it is so much slower
> > here? How long does it take to decompress the
> > generated arch/arm/boot/Image file in user space on the same
> > hardware using lzop and zstd?
>
> I looked through this a bit more and found two interesting points:
>
> - zstd uses a lot more unaligned loads and stores while
> decompressing. On armv5 those turn into individual byte
> accesses, while the others can likely use word-aligned
> accesses. This could make a huge difference if caches are
> disabled during the decompression.
>
> - The sliding window on zstd is much larger, with the kernel
> using an 8MB window (zstd=23), compared to the normal 32KB
> for deflate (couldn't find the default for lzo), so on
> machines with no L2 cache, it is much more likely to thrash
> the small L1 dcache used on most ARM9 cores.
>
> Arnd
Makes sense.
For ZSTD as used in kernel decompression (the zstd22 configuration), the
window is even bigger: 128 MiB (AFAIU).
Thanks
Jonathan
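The window size a decoder must be prepared to buffer is encoded in the zstd frame header. A minimal sketch of decoding the Window_Descriptor byte as specified in RFC 8878 shows how the two figures in this subthread map to window logs: 8 MiB is 2^23 and 128 MiB is 2^27. The function name zstd_window_size() is hypothetical. The descriptor byte is absent from frames that set the single-segment flag, and which descriptor the kernel's piggy.zstd actually carries was not checked here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a zstd Window_Descriptor byte (RFC 8878): the top 5 bits are
 * the exponent, the low 3 bits the mantissa.
 *   windowLog   = 10 + exponent
 *   windowBase  = 1 << windowLog
 *   windowAdd   = (windowBase / 8) * mantissa
 *   Window_Size = windowBase + windowAdd
 */
static uint64_t zstd_window_size(uint8_t window_descriptor)
{
	unsigned int exponent = window_descriptor >> 3;
	unsigned int mantissa = window_descriptor & 7;
	unsigned int window_log = 10 + exponent;
	uint64_t window_base = (uint64_t)1 << window_log;
	uint64_t window_add = (window_base / 8) * mantissa;

	return window_base + window_add;
}
```

For example, a descriptor of 0x68 (exponent 13) decodes to an 8 MiB window and 0x88 (exponent 17) to 128 MiB, which matches the allocation the userspace unzstd run above could not satisfy.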
* Re: [PATCH 2/3] ARM: compressed: Bump MALLOC_SIZE to 128 KiB
From: Russell King (Oracle) @ 2023-05-02 8:39 UTC
To: Jonathan Neuschäfer
Cc: linux-arm-kernel, Nick Terrell, Arnd Bergmann, Tony Lindgren,
Geert Uytterhoeven, Linus Walleij, Sebastian Reichel,
Nick Hawkins, Christophe Leroy, Florian Fainelli,
Nick Desaulniers, Xin Li, Seung-Woo Kim, Paul Bolle,
Bart Van Assche, linux-kernel, Kees Cook
On Wed, Apr 12, 2023 at 11:21:25PM +0200, Jonathan Neuschäfer wrote:
> The ZSTD compressor needs about 100 KiB.
>
> Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Please check the kexec sources to see whether that also needs to be
updated. IIRC, kexec also needs to know how big this malloc space
is.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH 0/3] ARM ZSTD boot compression
2023-04-15 2:00 ` Jonathan Neuschäfer
@ 2023-10-12 22:33 ` Nick Terrell
2023-10-13 1:27 ` J. Neuschäfer
0 siblings, 1 reply; 20+ messages in thread
From: Nick Terrell @ 2023-10-12 22:33 UTC (permalink / raw)
To: Jonathan Neuschäfer
Cc: Nick Terrell, Arnd Bergmann, Linux ARM, Russell King,
Nick Terrell, Tony Lindgren, Geert Uytterhoeven, Linus Walleij,
Sebastian Reichel, Hawkins, Nick, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, Linux Kernel Mailing List
> On Apr 14, 2023, at 10:00 PM, Jonathan Neuschäfer <j.neuschaefer@gmx.net> wrote:
>
> On Thu, Apr 13, 2023 at 01:13:21PM +0200, Arnd Bergmann wrote:
>> On Wed, Apr 12, 2023, at 23:33, Arnd Bergmann wrote:
>>> On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
>>>> This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
>>>> Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
>>>>
>>>> - LZO: 7.2 MiB, 6 seconds
>>>> - ZSTD: 5.6 MiB, 60 seconds
>>>
>>> That seems unexpected, as the usual numbers say it's about 25%
>>> slower than LZO. Do you have an idea why it is so much slower
>>> here? How long does it take to decompress the
>>> generated arch/arm/boot/Image file in user space on the same
>>> hardware using lzop and zstd?
>>
>> I looked through this a bit more and found two interesting points:
>>
>> - zstd uses a lot more unaligned loads and stores while
>> decompressing. On armv5 those turn into individual byte
>> accesses, while the others can likely use word-aligned
>> accesses. This could make a huge difference if caches are
>> disabled during the decompression.
>>
>> - The sliding window on zstd is much larger, with the kernel
>> using an 8MB window (zstd=23), compared to the normal 32kb
>> for deflate (couldn't find the default for lzo), so on
>> machines with no L2 cache, it is much more likely to thrash the
>> small L1 dcaches used on most ARM9 cores.
>>
>> Arnd
>
> Makes sense.
>
> For ZSTD as used in kernel decompression (the zstd22 configuration), the
> window is even bigger, 128 MiB. (AFAIU)
Sorry, I’m a bit late to the party, I wasn’t getting LKML email for some time...
But this is totally configurable. You can switch compression configurations
at any time. If you believe that the window size is the issue causing speed
regressions, you could configure zstd to use a smaller window, e.g. 256 KiB,
like this:
zstd -19 --zstd=wlog=18
This will keep the same algorithm search strength, but limit the decoder memory
usage.
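As a worked example of the numbers in this subthread: when set via windowLog, zstd's window size is 2^windowLog bytes, so --zstd=wlog=18 gives 2^18 = 256 KiB. The 8 MiB window mentioned earlier corresponds to windowLog 23, the 128 MiB figure to windowLog 27, and deflate's 32 KiB window to 2^15. The helpers below are hypothetical and only do this arithmetic; 10 is zstd's minimum windowLog.

```c
#include <stdint.h>

/* Hypothetical helper: bytes covered by a zstd window of log size wlog. */
static uint64_t zstd_window_bytes(unsigned wlog)
{
	return (uint64_t)1 << wlog;
}

/* Hypothetical helper: smallest windowLog whose window covers `bytes`,
 * starting from zstd's minimum windowLog of 10 (1 KiB). */
static unsigned zstd_wlog_for(uint64_t bytes)
{
	unsigned wlog = 10;

	while (wlog < 63 && zstd_window_bytes(wlog) < bytes)
		wlog++;
	return wlog;
}
```

For example, zstd_wlog_for(256 * 1024) returns 18, matching the wlog=18 in the command above.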
I will also try to get this patchset working on my machine, and try to debug.
The 10x slowdown is not expected; we see much better speed in userspace
on ARM. I suspect it has something to do with the preboot environment.
E.g. when implementing x86-64 zstd kernel decompression, I noticed that
memcpy(dst, src, 16) wasn’t getting inlined properly, causing a massive performance
penalty.
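A hedged sketch of the inlining concern. Decoders such as zstd copy literals and matches in fixed-size chunks (16 bytes here), relying on the compiler to expand each constant-size copy into a few loads and stores. If a freestanding build passes flags like -ffreestanding or -fno-builtin (an assumption; check the actual preboot CFLAGS), a plain memcpy() call is no longer treated as a builtin and may become an out-of-line call per chunk, while __builtin_memcpy() can still be expanded inline. copy16() and wildcopy16() are hypothetical names, not zstd's own copy routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy exactly 16 bytes. With a constant size, GCC and Clang normally
 * expand __builtin_memcpy() inline even under -fno-builtin. */
static inline void copy16(void *dst, const void *src)
{
	__builtin_memcpy(dst, src, 16);
}

/* Hypothetical "wild copy": copies len bytes in 16-byte chunks. It always
 * copies at least 16 bytes and may read and write up to 16 bytes past
 * src + len and dst + len, so both buffers need that much slack. The
 * regions must not overlap. */
static void wildcopy16(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint8_t *const end = dst + len;

	do {
		copy16(dst, src);
		dst += 16;
		src += 16;
	} while (dst < end);
}
```

If copy16() compiled to a call into an out-of-line memcpy(), every short match would pay call overhead, which fits the kind of penalty described above.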
Best,
Nick Terrell
> Thanks
>
> Jonathan
* Re: [PATCH 0/3] ARM ZSTD boot compression
2023-10-12 22:33 ` Nick Terrell
@ 2023-10-13 1:27 ` J. Neuschäfer
2023-10-20 18:53 ` Nick Terrell
0 siblings, 1 reply; 20+ messages in thread
From: J. Neuschäfer @ 2023-10-13 1:27 UTC (permalink / raw)
To: Nick Terrell
Cc: Jonathan Neuschäfer, Arnd Bergmann, Linux ARM, Russell King,
Tony Lindgren, Geert Uytterhoeven, Linus Walleij,
Sebastian Reichel, Hawkins, Nick, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, Linux Kernel Mailing List
On Thu, Oct 12, 2023 at 10:33:23PM +0000, Nick Terrell wrote:
> > On Apr 14, 2023, at 10:00 PM, Jonathan Neuschäfer <j.neuschaefer@gmx.net> wrote:
> > On Thu, Apr 13, 2023 at 01:13:21PM +0200, Arnd Bergmann wrote:
> >> On Wed, Apr 12, 2023, at 23:33, Arnd Bergmann wrote:
> >>> On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
> >>>> This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
> >>>> Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
> >>>>
> >>>> - LZO: 7.2 MiB, 6 seconds
> >>>> - ZSTD: 5.6 MiB, 60 seconds
[...]
> > For ZSTD as used in kernel decompression (the zstd22 configuration), the
> > window is even bigger, 128 MiB. (AFAIU)
>
> Sorry, I’m a bit late to the party, I wasn’t getting LKML email for some time...
>
> But this is totally configurable. You can switch compression configurations
> at any time. If you believe that the window size is the issue causing speed
> regressions, you could configure zstd to use a smaller window, e.g. 256 KiB,
> like this:
>
> zstd -19 --zstd=wlog=18
>
> This will keep the same algorithm search strength, but limit the decoder memory
> usage.
Noted.
> I will also try to get this patchset working on my machine, and try to debug.
> The 10x slowdown is not expected; we see much better speed in userspace
> on ARM. I suspect it has something to do with the preboot environment.
> E.g. when implementing x86-64 zstd kernel decompression, I noticed that
> memcpy(dst, src, 16) wasn’t getting inlined properly, causing a massive performance
> penalty.
In the meantime I've seen 8s for ZSTD vs. 2s for other algorithms, on
only mildly less ancient hardware (Hi3518A, another ARM9 SoC), so I
think the main culprit here was particularly bad luck in my choice of
test hardware.
The inlining issues are a good point, noted for the next time I work on this.
Thanks,
Jonathan
* Re: [PATCH 0/3] ARM ZSTD boot compression
2023-10-13 1:27 ` J. Neuschäfer
@ 2023-10-20 18:53 ` Nick Terrell
0 siblings, 0 replies; 20+ messages in thread
From: Nick Terrell @ 2023-10-20 18:53 UTC (permalink / raw)
To: J. Neuschäfer
Cc: Nick Terrell, Arnd Bergmann, Linux ARM, Russell King,
Tony Lindgren, Geert Uytterhoeven, Linus Walleij,
Sebastian Reichel, Hawkins, Nick, Christophe Leroy,
Florian Fainelli, Nick Desaulniers, Xin Li, Seung-Woo Kim,
Paul Bolle, Bart Van Assche, Linux Kernel Mailing List
> On Oct 12, 2023, at 6:27 PM, J. Neuschäfer <j.neuschaefer@gmx.net> wrote:
>
> On Thu, Oct 12, 2023 at 10:33:23PM +0000, Nick Terrell wrote:
>>> On Apr 14, 2023, at 10:00 PM, Jonathan Neuschäfer <j.neuschaefer@gmx.net> wrote:
>>> On Thu, Apr 13, 2023 at 01:13:21PM +0200, Arnd Bergmann wrote:
>>>> On Wed, Apr 12, 2023, at 23:33, Arnd Bergmann wrote:
>>>>> On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
>>>>>> This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
>>>>>> Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
>>>>>>
>>>>>> - LZO: 7.2 MiB, 6 seconds
>>>>>> - ZSTD: 5.6 MiB, 60 seconds
> [...]
>>> For ZSTD as used in kernel decompression (the zstd22 configuration), the
>>> window is even bigger, 128 MiB. (AFAIU)
>>
>> Sorry, I’m a bit late to the party, I wasn’t getting LKML email for some time...
>>
>> But this is totally configurable. You can switch compression configurations
>> at any time. If you believe that the window size is the issue causing speed
>> regressions, you could configure zstd to use a smaller window, e.g. 256 KiB,
>> like this:
>>
>> zstd -19 --zstd=wlog=18
>>
>> This will keep the same algorithm search strength, but limit the decoder memory
>> usage.
>
> Noted.
>
>> I will also try to get this patchset working on my machine, and try to debug.
>> The 10x slowdown is not expected; we see much better speed in userspace
>> on ARM. I suspect it has something to do with the preboot environment.
>> E.g. when implementing x86-64 zstd kernel decompression, I noticed that
>> memcpy(dst, src, 16) wasn’t getting inlined properly, causing a massive performance
>> penalty.
>
> In the meantime I've seen 8s for ZSTD vs. 2s for other algorithms, on
> only mildly less ancient hardware (Hi3518A, another ARM9 SoC), so I
> think the main culprit here was particularly bad luck in my choice of
> test hardware.
>
> The inlining issues are a good point, noted for the next time I work on this.
I went out and bought a Raspberry Pi 4 to test on. From some crude measurements,
zstd kernel decompression is just slightly slower than gzip kernel decompression,
and about 2x slower than lzo. When decompressing the same file (a manually
compressed kernel image) in userspace, zstd is significantly faster than gzip.
So the issue is definitely something about the preboot environment, or about
how the code is compiled for the preboot environment.
My next step is to set up qemu on my Pi to try to get some perf measurements of the
decompression. One thing I’ve really been struggling with, and what thwarted my last
attempts at adding ARM zstd kernel decompression, was getting preboot logs printed.
I’ve figured out I need CONFIG_DEBUG_LL=y, but I’ve yet to actually get any logs.
And I can’t figure out how to get it working in qemu. I haven’t tried qemu on an ARM
host with kvm, but that’s the next thing I will try.
Do you happen to have any advice about how to get preboot logs in qemu? Is it
possible only on an ARM host, or would it also be possible on an x86-64 host?
Thanks,
Nick Terrell
> Thanks,
> Jonathan
Thread overview: 20+ messages
2023-04-12 21:21 [PATCH 0/3] ARM ZSTD boot compression Jonathan Neuschäfer
2023-04-12 21:21 ` [PATCH 1/3] ARM: compressed: Pass the actual output length to the decompressor Jonathan Neuschäfer
2023-04-12 21:42 ` Linus Walleij
2023-04-12 21:48 ` Florian Fainelli
2023-04-13 5:20 ` Tony Lindgren
2023-04-15 1:52 ` Jonathan Neuschäfer
2023-04-12 21:21 ` [PATCH 2/3] ARM: compressed: Bump MALLOC_SIZE to 128 KiB Jonathan Neuschäfer
2023-04-12 21:43 ` Linus Walleij
2023-04-12 21:48 ` Florian Fainelli
2023-05-02 8:39 ` Russell King (Oracle)
2023-04-12 21:21 ` [PATCH 3/3] ARM: compressed: Enable ZSTD compression Jonathan Neuschäfer
2023-04-12 21:45 ` Linus Walleij
2023-04-12 21:49 ` Florian Fainelli
2023-04-12 21:33 ` [PATCH 0/3] ARM ZSTD boot compression Arnd Bergmann
2023-04-13 11:13 ` Arnd Bergmann
2023-04-15 2:00 ` Jonathan Neuschäfer
2023-10-12 22:33 ` Nick Terrell
2023-10-13 1:27 ` J. Neuschäfer
2023-10-20 18:53 ` Nick Terrell
2023-04-14 22:50 ` Jonathan Neuschäfer