linux-btrfs.vger.kernel.org archive mirror
* [PATCH v2 0/4] Add xxhash and zstd modules
@ 2017-06-29 19:41 Nick Terrell
  2017-06-29 19:41 ` [PATCH v2 1/4] lib: Add xxhash module Nick Terrell
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Nick Terrell @ 2017-06-29 19:41 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel, linux-btrfs, linux-kernel

Hi all,

This patch set adds xxhash, zstd compression, and zstd decompression
modules. It also adds zstd support to BtrFS and SquashFS.

Each patch has relevant summaries, benchmarks, and tests.

Best,
Nick Terrell

Changelog:

v1 -> v2:
- Make pointer in lib/xxhash.c:394 non-const (1/4)
- Use div_u64() for division of u64s (2/4)
- Reduce stack usage of ZSTD_compressSequences(), ZSTD_buildSeqTable(),
  ZSTD_decompressSequencesLong(), FSE_buildDTable(), FSE_decompress_wksp(),
  HUF_writeCTable(), HUF_readStats(), HUF_readCTable(),
  HUF_compressWeights(), HUF_readDTableX2(), and HUF_readDTableX4() (2/4)
- No zstd function uses more than 400 B of stack space (2/4)

Nick Terrell (4):
  lib: Add xxhash module
  lib: Add zstd modules
  btrfs: Add zstd support
  squashfs: Add zstd support

 fs/btrfs/Kconfig           |    2 +
 fs/btrfs/Makefile          |    2 +-
 fs/btrfs/compression.c     |    1 +
 fs/btrfs/compression.h     |    6 +-
 fs/btrfs/ctree.h           |    1 +
 fs/btrfs/disk-io.c         |    2 +
 fs/btrfs/ioctl.c           |    6 +-
 fs/btrfs/props.c           |    6 +
 fs/btrfs/super.c           |   12 +-
 fs/btrfs/sysfs.c           |    2 +
 fs/btrfs/zstd.c            |  433 ++++++
 fs/squashfs/Kconfig        |   14 +
 fs/squashfs/Makefile       |    1 +
 fs/squashfs/decompressor.c |    7 +
 fs/squashfs/decompressor.h |    4 +
 fs/squashfs/squashfs_fs.h  |    1 +
 fs/squashfs/zstd_wrapper.c |  150 ++
 include/linux/xxhash.h     |  236 +++
 include/linux/zstd.h       | 1157 +++++++++++++++
 include/uapi/linux/btrfs.h |    8 +-
 lib/Kconfig                |   11 +
 lib/Makefile               |    3 +
 lib/xxhash.c               |  500 +++++++
 lib/zstd/Makefile          |   18 +
 lib/zstd/bitstream.h       |  374 +++++
 lib/zstd/compress.c        | 3479 ++++++++++++++++++++++++++++++++++++++++++++
 lib/zstd/decompress.c      | 2526 ++++++++++++++++++++++++++++++++
 lib/zstd/entropy_common.c  |  243 ++++
 lib/zstd/error_private.h   |   53 +
 lib/zstd/fse.h             |  575 ++++++++
 lib/zstd/fse_compress.c    |  795 ++++++++++
 lib/zstd/fse_decompress.c  |  332 +++++
 lib/zstd/huf.h             |  212 +++
 lib/zstd/huf_compress.c    |  771 ++++++++++
 lib/zstd/huf_decompress.c  |  960 ++++++++++++
 lib/zstd/mem.h             |  151 ++
 lib/zstd/zstd_common.c     |   75 +
 lib/zstd/zstd_internal.h   |  269 ++++
 lib/zstd/zstd_opt.h        | 1014 +++++++++++++
 39 files changed, 14400 insertions(+), 12 deletions(-)
 create mode 100644 fs/btrfs/zstd.c
 create mode 100644 fs/squashfs/zstd_wrapper.c
 create mode 100644 include/linux/xxhash.h
 create mode 100644 include/linux/zstd.h
 create mode 100644 lib/xxhash.c
 create mode 100644 lib/zstd/Makefile
 create mode 100644 lib/zstd/bitstream.h
 create mode 100644 lib/zstd/compress.c
 create mode 100644 lib/zstd/decompress.c
 create mode 100644 lib/zstd/entropy_common.c
 create mode 100644 lib/zstd/error_private.h
 create mode 100644 lib/zstd/fse.h
 create mode 100644 lib/zstd/fse_compress.c
 create mode 100644 lib/zstd/fse_decompress.c
 create mode 100644 lib/zstd/huf.h
 create mode 100644 lib/zstd/huf_compress.c
 create mode 100644 lib/zstd/huf_decompress.c
 create mode 100644 lib/zstd/mem.h
 create mode 100644 lib/zstd/zstd_common.c
 create mode 100644 lib/zstd/zstd_internal.h
 create mode 100644 lib/zstd/zstd_opt.h

--
2.9.3


* [PATCH v2 1/4] lib: Add xxhash module
  2017-06-29 19:41 [PATCH v2 0/4] Add xxhash and zstd modules Nick Terrell
@ 2017-06-29 19:41 ` Nick Terrell
  2017-06-29 19:41 ` [PATCH v2 3/4] btrfs: Add zstd support Nick Terrell
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Nick Terrell @ 2017-06-29 19:41 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel, linux-btrfs, linux-kernel

Adds an xxhash kernel module with the xxh32 and xxh64 hashes. xxhash
is an extremely fast non-cryptographic hash algorithm for checksumming.
The zstd compression and decompression modules added in the next patch
require xxhash. I extracted it from zstd since it is useful on its own.
I copied the code from the upstream xxHash source repository and
translated it into kernel style. I ran benchmarks and tests in the
kernel and tests in userland.

I benchmarked xxhash using a special character device. I ran it in
four modes: no-op, xxh32, xxh64, and crc32. The no-op mode simply
copies the data to kernel space and ignores it. The xxh32, xxh64, and
crc32 modes compute hashes on the copied data. I also ran each mode
with four different buffer sizes.
The benchmark file is located in the upstream zstd source repository under
`contrib/linux-kernel/xxhash_test.c` [1].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of
RAM. The VM runs on a MacBook Pro with a 3.1 GHz Intel Core i7
processor, 16 GB of RAM, and an SSD. I benchmarked using the file
`filesystem.squashfs` from `ubuntu-16.10-desktop-amd64.iso`, which is
1,536,217,088 B in size. I ran the following commands for the
benchmark:

    modprobe xxhash_test
    mknod xxhash_test c 245 0
    time cp filesystem.squashfs xxhash_test

The time is the wall-clock time of the userland `cp`.
The GB/s is computed with

    1,536,217,088 B / time(buffer size, hash)

which includes the time to copy from userland.
The Adjusted GB/s is computed with

    1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).


| Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
|-----------------|-------|----------|------|---------------|
|            1024 | none  |    0.408 | 3.77 |             - |
|            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
|            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
|            1024 | crc32 |    1.290 | 1.19 |          1.74 |
|            4096 | none  |    0.380 | 4.04 |             - |
|            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
|            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
|            4096 | crc32 |    1.168 | 1.32 |          1.95 |
|            8192 | none  |    0.351 | 4.38 |             - |
|            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
|            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
|            8192 | crc32 |    1.163 | 1.32 |          1.89 |
|           16384 | none  |    0.346 | 4.43 |             - |
|           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
|           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
|           16384 | crc32 |    1.183 | 1.30 |          1.84 |

Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
kernel functions. To verify the test-suite's coverage, a line in each
branch of every function in `xxhash.c` was commented out in turn to
ensure that the test-suite fails. Additionally tested while testing
zstd and with SMHasher [3].

[1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/xxhash_test.c
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
[3] https://github.com/aappleby/smhasher

zstd source repository: https://github.com/facebook/zstd
xxHash source repository: https://github.com/Cyan4973/xxHash

Signed-off-by: Nick Terrell <terrelln@fb.com>
---
v1 -> v2:
- Make pointer in lib/xxhash.c:394 non-const

 include/linux/xxhash.h | 236 +++++++++++++++++++++++
 lib/Kconfig            |   3 +
 lib/Makefile           |   1 +
 lib/xxhash.c           | 500 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 740 insertions(+)
 create mode 100644 include/linux/xxhash.h
 create mode 100644 lib/xxhash.c

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
new file mode 100644
index 0000000..9e1f42c
--- /dev/null
+++ b/include/linux/xxhash.h
@@ -0,0 +1,236 @@
+/*
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above
+ *     copyright notice, this list of conditions and the following disclaimer
+ *     in the documentation and/or other materials provided with the
+ *     distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License version 2 as published by the
+ * Free Software Foundation. This program is dual-licensed; you may select
+ * either version 2 of the GNU General Public License ("GPL") or BSD license
+ * ("BSD").
+ *
+ * You can contact the author at:
+ * - xxHash homepage: http://cyan4973.github.io/xxHash/
+ * - xxHash source repository: https://github.com/Cyan4973/xxHash
+ */
+
+/*
+ * Notice extracted from xxHash homepage:
+ *
+ * xxHash is an extremely fast Hash algorithm, running at RAM speed limits.
+ * It also successfully passes all tests from the SMHasher suite.
+ *
+ * Comparison (single thread, Windows Seven 32 bits, using SMHasher on a Core 2
+ * Duo @3GHz)
+ *
+ * Name            Speed       Q.Score   Author
+ * xxHash          5.4 GB/s     10
+ * CrapWow         3.2 GB/s      2       Andrew
+ * MurmurHash 3a   2.7 GB/s     10       Austin Appleby
+ * SpookyHash      2.0 GB/s     10       Bob Jenkins
+ * SBox            1.4 GB/s      9       Bret Mulvey
+ * Lookup3         1.2 GB/s      9       Bob Jenkins
+ * SuperFastHash   1.2 GB/s      1       Paul Hsieh
+ * CityHash64      1.05 GB/s    10       Pike & Alakuijala
+ * FNV             0.55 GB/s     5       Fowler, Noll, Vo
+ * CRC32           0.43 GB/s     9
+ * MD5-32          0.33 GB/s    10       Ronald L. Rivest
+ * SHA1-32         0.28 GB/s    10
+ *
+ * Q.Score is a measure of quality of the hash function.
+ * It depends on successfully passing SMHasher test set.
+ * 10 is a perfect score.
+ *
+ * A 64-bit version, named xxh64, offers much better speed,
+ * but only for 64-bit applications.
+ * Name     Speed on 64 bits    Speed on 32 bits
+ * xxh64       13.8 GB/s            1.9 GB/s
+ * xxh32        6.8 GB/s            6.0 GB/s
+ */
+
+#ifndef XXHASH_H
+#define XXHASH_H
+
+#include <linux/types.h>
+
+/*-****************************
+ * Simple Hash Functions
+ *****************************/
+
+/**
+ * xxh32() - calculate the 32-bit hash of the input with a given seed.
+ *
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * Speed on Core 2 Duo @ 3 GHz (single thread, SMHasher benchmark) : 5.4 GB/s
+ *
+ * Return:  The 32-bit hash of the data.
+ */
+uint32_t xxh32(const void *input, size_t length, uint32_t seed);
+
+/**
+ * xxh64() - calculate the 64-bit hash of the input with a given seed.
+ *
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * This function runs 2x faster than xxh32() on 64-bit systems, but slower on 32-bit systems.
+ *
+ * Return:  The 64-bit hash of the data.
+ */
+uint64_t xxh64(const void *input, size_t length, uint64_t seed);
+
+/*-****************************
+ * Streaming Hash Functions
+ *****************************/
+
+/*
+ * These definitions are only meant to allow allocation of XXH state
+ * statically, on stack, or in a struct for example.
+ * Do not use members directly.
+ */
+
+/**
+ * struct xxh32_state - private xxh32 state, do not use members directly
+ */
+struct xxh32_state {
+	uint32_t total_len_32;
+	uint32_t large_len;
+	uint32_t v1;
+	uint32_t v2;
+	uint32_t v3;
+	uint32_t v4;
+	uint32_t mem32[4];
+	uint32_t memsize;
+};
+
+/**
+ * struct xxh64_state - private xxh64 state, do not use members directly
+ */
+struct xxh64_state {
+	uint64_t total_len;
+	uint64_t v1;
+	uint64_t v2;
+	uint64_t v3;
+	uint64_t v4;
+	uint64_t mem64[4];
+	uint32_t memsize;
+};
+
+/**
+ * xxh32_reset() - reset the xxh32 state to start a new hashing operation
+ *
+ * @state: The xxh32 state to reset.
+ * @seed:  Initialize the hash state with this seed.
+ *
+ * Call this function on any xxh32_state to prepare for a new hashing operation.
+ */
+void xxh32_reset(struct xxh32_state *state, uint32_t seed);
+
+/**
+ * xxh32_update() - hash the data given and update the xxh32 state
+ *
+ * @state:  The xxh32 state to update.
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ *
+ * After calling xxh32_reset() call xxh32_update() as many times as necessary.
+ *
+ * Return:  Zero on success, otherwise an error code.
+ */
+int xxh32_update(struct xxh32_state *state, const void *input, size_t length);
+
+/**
+ * xxh32_digest() - produce the current xxh32 hash
+ *
+ * @state: Produce the current xxh32 hash of this state.
+ *
+ * A hash value can be produced at any time. It is still possible to continue
+ * inserting input into the hash state after a call to xxh32_digest(), and
+ * generate new hashes later on, by calling xxh32_digest() again.
+ *
+ * Return: The xxh32 hash stored in the state.
+ */
+uint32_t xxh32_digest(const struct xxh32_state *state);
+
+/**
+ * xxh64_reset() - reset the xxh64 state to start a new hashing operation
+ *
+ * @state: The xxh64 state to reset.
+ * @seed:  Initialize the hash state with this seed.
+ */
+void xxh64_reset(struct xxh64_state *state, uint64_t seed);
+
+/**
+ * xxh64_update() - hash the data given and update the xxh64 state
+ * @state:  The xxh64 state to update.
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ *
+ * After calling xxh64_reset() call xxh64_update() as many times as necessary.
+ *
+ * Return:  Zero on success, otherwise an error code.
+ */
+int xxh64_update(struct xxh64_state *state, const void *input, size_t length);
+
+/**
+ * xxh64_digest() - produce the current xxh64 hash
+ *
+ * @state: Produce the current xxh64 hash of this state.
+ *
+ * A hash value can be produced at any time. It is still possible to continue
+ * inserting input into the hash state after a call to xxh64_digest(), and
+ * generate new hashes later on, by calling xxh64_digest() again.
+ *
+ * Return: The xxh64 hash stored in the state.
+ */
+uint64_t xxh64_digest(const struct xxh64_state *state);
+
+/*-**************************
+ * Utils
+ ***************************/
+
+/**
+ * xxh32_copy_state() - copy the source state into the destination state
+ *
+ * @src: The source xxh32 state.
+ * @dst: The destination xxh32 state.
+ */
+void xxh32_copy_state(struct xxh32_state *dst, const struct xxh32_state *src);
+
+/**
+ * xxh64_copy_state() - copy the source state into the destination state
+ *
+ * @src: The source xxh64 state.
+ * @dst: The destination xxh64 state.
+ */
+void xxh64_copy_state(struct xxh64_state *dst, const struct xxh64_state *src);
+
+#endif /* XXHASH_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 0c8b78a..b6009d7 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -184,6 +184,9 @@ config CRC8
 	  when they need to do cyclic redundancy check according CRC8
 	  algorithm. Module will be called crc8.

+config XXHASH
+	tristate
+
 config AUDIT_GENERIC
 	bool
 	depends on AUDIT && !AUDIT_ARCH
diff --git a/lib/Makefile b/lib/Makefile
index 0166fbc..1338226 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -102,6 +102,7 @@ obj-$(CONFIG_CRC32_SELFTEST)	+= crc32test.o
 obj-$(CONFIG_CRC7)	+= crc7.o
 obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
+obj-$(CONFIG_XXHASH)	+= xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o

 obj-$(CONFIG_842_COMPRESS) += 842/
diff --git a/lib/xxhash.c b/lib/xxhash.c
new file mode 100644
index 0000000..aa61e2a
--- /dev/null
+++ b/lib/xxhash.c
@@ -0,0 +1,500 @@
+/*
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above
+ *     copyright notice, this list of conditions and the following disclaimer
+ *     in the documentation and/or other materials provided with the
+ *     distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License version 2 as published by the
+ * Free Software Foundation. This program is dual-licensed; you may select
+ * either version 2 of the GNU General Public License ("GPL") or BSD license
+ * ("BSD").
+ *
+ * You can contact the author at:
+ * - xxHash homepage: http://cyan4973.github.io/xxHash/
+ * - xxHash source repository: https://github.com/Cyan4973/xxHash
+ */
+
+#include <asm/unaligned.h>
+#include <linux/errno.h>
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/xxhash.h>
+
+/*-*************************************
+ * Macros
+ **************************************/
+#define xxh_rotl32(x, r) ((x << r) | (x >> (32 - r)))
+#define xxh_rotl64(x, r) ((x << r) | (x >> (64 - r)))
+
+#ifdef __LITTLE_ENDIAN
+# define XXH_CPU_LITTLE_ENDIAN 1
+#else
+# define XXH_CPU_LITTLE_ENDIAN 0
+#endif
+
+/*-*************************************
+ * Constants
+ **************************************/
+static const uint32_t PRIME32_1 = 2654435761U;
+static const uint32_t PRIME32_2 = 2246822519U;
+static const uint32_t PRIME32_3 = 3266489917U;
+static const uint32_t PRIME32_4 =  668265263U;
+static const uint32_t PRIME32_5 =  374761393U;
+
+static const uint64_t PRIME64_1 = 11400714785074694791ULL;
+static const uint64_t PRIME64_2 = 14029467366897019727ULL;
+static const uint64_t PRIME64_3 =  1609587929392839161ULL;
+static const uint64_t PRIME64_4 =  9650029242287828579ULL;
+static const uint64_t PRIME64_5 =  2870177450012600261ULL;
+
+/*-**************************
+ *  Utils
+ ***************************/
+void xxh32_copy_state(struct xxh32_state *dst, const struct xxh32_state *src)
+{
+	memcpy(dst, src, sizeof(*dst));
+}
+EXPORT_SYMBOL(xxh32_copy_state);
+
+void xxh64_copy_state(struct xxh64_state *dst, const struct xxh64_state *src)
+{
+	memcpy(dst, src, sizeof(*dst));
+}
+EXPORT_SYMBOL(xxh64_copy_state);
+
+/*-***************************
+ * Simple Hash Functions
+ ****************************/
+static uint32_t xxh32_round(uint32_t seed, const uint32_t input)
+{
+	seed += input * PRIME32_2;
+	seed = xxh_rotl32(seed, 13);
+	seed *= PRIME32_1;
+	return seed;
+}
+
+uint32_t xxh32(const void *input, const size_t len, const uint32_t seed)
+{
+	const uint8_t *p = (const uint8_t *)input;
+	const uint8_t *b_end = p + len;
+	uint32_t h32;
+
+	if (len >= 16) {
+		const uint8_t *const limit = b_end - 16;
+		uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
+		uint32_t v2 = seed + PRIME32_2;
+		uint32_t v3 = seed + 0;
+		uint32_t v4 = seed - PRIME32_1;
+
+		do {
+			v1 = xxh32_round(v1, get_unaligned_le32(p));
+			p += 4;
+			v2 = xxh32_round(v2, get_unaligned_le32(p));
+			p += 4;
+			v3 = xxh32_round(v3, get_unaligned_le32(p));
+			p += 4;
+			v4 = xxh32_round(v4, get_unaligned_le32(p));
+			p += 4;
+		} while (p <= limit);
+
+		h32 = xxh_rotl32(v1, 1) + xxh_rotl32(v2, 7) +
+			xxh_rotl32(v3, 12) + xxh_rotl32(v4, 18);
+	} else {
+		h32 = seed + PRIME32_5;
+	}
+
+	h32 += (uint32_t)len;
+
+	while (p + 4 <= b_end) {
+		h32 += get_unaligned_le32(p) * PRIME32_3;
+		h32 = xxh_rotl32(h32, 17) * PRIME32_4;
+		p += 4;
+	}
+
+	while (p < b_end) {
+		h32 += (*p) * PRIME32_5;
+		h32 = xxh_rotl32(h32, 11) * PRIME32_1;
+		p++;
+	}
+
+	h32 ^= h32 >> 15;
+	h32 *= PRIME32_2;
+	h32 ^= h32 >> 13;
+	h32 *= PRIME32_3;
+	h32 ^= h32 >> 16;
+
+	return h32;
+}
+EXPORT_SYMBOL(xxh32);
+
+static uint64_t xxh64_round(uint64_t acc, const uint64_t input)
+{
+	acc += input * PRIME64_2;
+	acc = xxh_rotl64(acc, 31);
+	acc *= PRIME64_1;
+	return acc;
+}
+
+static uint64_t xxh64_merge_round(uint64_t acc, uint64_t val)
+{
+	val = xxh64_round(0, val);
+	acc ^= val;
+	acc = acc * PRIME64_1 + PRIME64_4;
+	return acc;
+}
+
+uint64_t xxh64(const void *input, const size_t len, const uint64_t seed)
+{
+	const uint8_t *p = (const uint8_t *)input;
+	const uint8_t *const b_end = p + len;
+	uint64_t h64;
+
+	if (len >= 32) {
+		const uint8_t *const limit = b_end - 32;
+		uint64_t v1 = seed + PRIME64_1 + PRIME64_2;
+		uint64_t v2 = seed + PRIME64_2;
+		uint64_t v3 = seed + 0;
+		uint64_t v4 = seed - PRIME64_1;
+
+		do {
+			v1 = xxh64_round(v1, get_unaligned_le64(p));
+			p += 8;
+			v2 = xxh64_round(v2, get_unaligned_le64(p));
+			p += 8;
+			v3 = xxh64_round(v3, get_unaligned_le64(p));
+			p += 8;
+			v4 = xxh64_round(v4, get_unaligned_le64(p));
+			p += 8;
+		} while (p <= limit);
+
+		h64 = xxh_rotl64(v1, 1) + xxh_rotl64(v2, 7) +
+			xxh_rotl64(v3, 12) + xxh_rotl64(v4, 18);
+		h64 = xxh64_merge_round(h64, v1);
+		h64 = xxh64_merge_round(h64, v2);
+		h64 = xxh64_merge_round(h64, v3);
+		h64 = xxh64_merge_round(h64, v4);
+
+	} else {
+		h64  = seed + PRIME64_5;
+	}
+
+	h64 += (uint64_t)len;
+
+	while (p + 8 <= b_end) {
+		const uint64_t k1 = xxh64_round(0, get_unaligned_le64(p));
+
+		h64 ^= k1;
+		h64 = xxh_rotl64(h64, 27) * PRIME64_1 + PRIME64_4;
+		p += 8;
+	}
+
+	if (p + 4 <= b_end) {
+		h64 ^= (uint64_t)(get_unaligned_le32(p)) * PRIME64_1;
+		h64 = xxh_rotl64(h64, 23) * PRIME64_2 + PRIME64_3;
+		p += 4;
+	}
+
+	while (p < b_end) {
+		h64 ^= (*p) * PRIME64_5;
+		h64 = xxh_rotl64(h64, 11) * PRIME64_1;
+		p++;
+	}
+
+	h64 ^= h64 >> 33;
+	h64 *= PRIME64_2;
+	h64 ^= h64 >> 29;
+	h64 *= PRIME64_3;
+	h64 ^= h64 >> 32;
+
+	return h64;
+}
+EXPORT_SYMBOL(xxh64);
+
+/*-**************************************************
+ * Advanced Hash Functions
+ ***************************************************/
+void xxh32_reset(struct xxh32_state *statePtr, const uint32_t seed)
+{
+	/* use a local state for memcpy() to avoid strict-aliasing warnings */
+	struct xxh32_state state;
+
+	memset(&state, 0, sizeof(state));
+	state.v1 = seed + PRIME32_1 + PRIME32_2;
+	state.v2 = seed + PRIME32_2;
+	state.v3 = seed + 0;
+	state.v4 = seed - PRIME32_1;
+	memcpy(statePtr, &state, sizeof(state));
+}
+EXPORT_SYMBOL(xxh32_reset);
+
+void xxh64_reset(struct xxh64_state *statePtr, const uint64_t seed)
+{
+	/* use a local state for memcpy() to avoid strict-aliasing warnings */
+	struct xxh64_state state;
+
+	memset(&state, 0, sizeof(state));
+	state.v1 = seed + PRIME64_1 + PRIME64_2;
+	state.v2 = seed + PRIME64_2;
+	state.v3 = seed + 0;
+	state.v4 = seed - PRIME64_1;
+	memcpy(statePtr, &state, sizeof(state));
+}
+EXPORT_SYMBOL(xxh64_reset);
+
+int xxh32_update(struct xxh32_state *state, const void *input, const size_t len)
+{
+	const uint8_t *p = (const uint8_t *)input;
+	const uint8_t *const b_end = p + len;
+
+	if (input == NULL)
+		return -EINVAL;
+
+	state->total_len_32 += (uint32_t)len;
+	state->large_len |= (len >= 16) | (state->total_len_32 >= 16);
+
+	if (state->memsize + len < 16) { /* fill in tmp buffer */
+		memcpy((uint8_t *)(state->mem32) + state->memsize, input, len);
+		state->memsize += (uint32_t)len;
+		return 0;
+	}
+
+	if (state->memsize) { /* some data left from previous update */
+		const uint32_t *p32 = state->mem32;
+
+		memcpy((uint8_t *)(state->mem32) + state->memsize, input,
+			16 - state->memsize);
+
+		state->v1 = xxh32_round(state->v1, get_unaligned_le32(p32));
+		p32++;
+		state->v2 = xxh32_round(state->v2, get_unaligned_le32(p32));
+		p32++;
+		state->v3 = xxh32_round(state->v3, get_unaligned_le32(p32));
+		p32++;
+		state->v4 = xxh32_round(state->v4, get_unaligned_le32(p32));
+		p32++;
+
+		p += 16-state->memsize;
+		state->memsize = 0;
+	}
+
+	if (p <= b_end - 16) {
+		const uint8_t *const limit = b_end - 16;
+		uint32_t v1 = state->v1;
+		uint32_t v2 = state->v2;
+		uint32_t v3 = state->v3;
+		uint32_t v4 = state->v4;
+
+		do {
+			v1 = xxh32_round(v1, get_unaligned_le32(p));
+			p += 4;
+			v2 = xxh32_round(v2, get_unaligned_le32(p));
+			p += 4;
+			v3 = xxh32_round(v3, get_unaligned_le32(p));
+			p += 4;
+			v4 = xxh32_round(v4, get_unaligned_le32(p));
+			p += 4;
+		} while (p <= limit);
+
+		state->v1 = v1;
+		state->v2 = v2;
+		state->v3 = v3;
+		state->v4 = v4;
+	}
+
+	if (p < b_end) {
+		memcpy(state->mem32, p, (size_t)(b_end-p));
+		state->memsize = (uint32_t)(b_end-p);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(xxh32_update);
+
+uint32_t xxh32_digest(const struct xxh32_state *state)
+{
+	const uint8_t *p = (const uint8_t *)state->mem32;
+	const uint8_t *const b_end = (const uint8_t *)(state->mem32) +
+		state->memsize;
+	uint32_t h32;
+
+	if (state->large_len) {
+		h32 = xxh_rotl32(state->v1, 1) + xxh_rotl32(state->v2, 7) +
+			xxh_rotl32(state->v3, 12) + xxh_rotl32(state->v4, 18);
+	} else {
+		h32 = state->v3 /* == seed */ + PRIME32_5;
+	}
+
+	h32 += state->total_len_32;
+
+	while (p + 4 <= b_end) {
+		h32 += get_unaligned_le32(p) * PRIME32_3;
+		h32 = xxh_rotl32(h32, 17) * PRIME32_4;
+		p += 4;
+	}
+
+	while (p < b_end) {
+		h32 += (*p) * PRIME32_5;
+		h32 = xxh_rotl32(h32, 11) * PRIME32_1;
+		p++;
+	}
+
+	h32 ^= h32 >> 15;
+	h32 *= PRIME32_2;
+	h32 ^= h32 >> 13;
+	h32 *= PRIME32_3;
+	h32 ^= h32 >> 16;
+
+	return h32;
+}
+EXPORT_SYMBOL(xxh32_digest);
+
+int xxh64_update(struct xxh64_state *state, const void *input, const size_t len)
+{
+	const uint8_t *p = (const uint8_t *)input;
+	const uint8_t *const b_end = p + len;
+
+	if (input == NULL)
+		return -EINVAL;
+
+	state->total_len += len;
+
+	if (state->memsize + len < 32) { /* fill in tmp buffer */
+		memcpy(((uint8_t *)state->mem64) + state->memsize, input, len);
+		state->memsize += (uint32_t)len;
+		return 0;
+	}
+
+	if (state->memsize) { /* tmp buffer is full */
+		uint64_t *p64 = state->mem64;
+
+		memcpy(((uint8_t *)p64) + state->memsize, input,
+			32 - state->memsize);
+
+		state->v1 = xxh64_round(state->v1, get_unaligned_le64(p64));
+		p64++;
+		state->v2 = xxh64_round(state->v2, get_unaligned_le64(p64));
+		p64++;
+		state->v3 = xxh64_round(state->v3, get_unaligned_le64(p64));
+		p64++;
+		state->v4 = xxh64_round(state->v4, get_unaligned_le64(p64));
+
+		p += 32 - state->memsize;
+		state->memsize = 0;
+	}
+
+	if (p + 32 <= b_end) {
+		const uint8_t *const limit = b_end - 32;
+		uint64_t v1 = state->v1;
+		uint64_t v2 = state->v2;
+		uint64_t v3 = state->v3;
+		uint64_t v4 = state->v4;
+
+		do {
+			v1 = xxh64_round(v1, get_unaligned_le64(p));
+			p += 8;
+			v2 = xxh64_round(v2, get_unaligned_le64(p));
+			p += 8;
+			v3 = xxh64_round(v3, get_unaligned_le64(p));
+			p += 8;
+			v4 = xxh64_round(v4, get_unaligned_le64(p));
+			p += 8;
+		} while (p <= limit);
+
+		state->v1 = v1;
+		state->v2 = v2;
+		state->v3 = v3;
+		state->v4 = v4;
+	}
+
+	if (p < b_end) {
+		memcpy(state->mem64, p, (size_t)(b_end-p));
+		state->memsize = (uint32_t)(b_end - p);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(xxh64_update);
+
+uint64_t xxh64_digest(const struct xxh64_state *state)
+{
+	const uint8_t *p = (const uint8_t *)state->mem64;
+	const uint8_t *const b_end = (const uint8_t *)state->mem64 +
+		state->memsize;
+	uint64_t h64;
+
+	if (state->total_len >= 32) {
+		const uint64_t v1 = state->v1;
+		const uint64_t v2 = state->v2;
+		const uint64_t v3 = state->v3;
+		const uint64_t v4 = state->v4;
+
+		h64 = xxh_rotl64(v1, 1) + xxh_rotl64(v2, 7) +
+			xxh_rotl64(v3, 12) + xxh_rotl64(v4, 18);
+		h64 = xxh64_merge_round(h64, v1);
+		h64 = xxh64_merge_round(h64, v2);
+		h64 = xxh64_merge_round(h64, v3);
+		h64 = xxh64_merge_round(h64, v4);
+	} else {
+		h64  = state->v3 + PRIME64_5;
+	}
+
+	h64 += (uint64_t)state->total_len;
+
+	while (p + 8 <= b_end) {
+		const uint64_t k1 = xxh64_round(0, get_unaligned_le64(p));
+
+		h64 ^= k1;
+		h64 = xxh_rotl64(h64, 27) * PRIME64_1 + PRIME64_4;
+		p += 8;
+	}
+
+	if (p + 4 <= b_end) {
+		h64 ^= (uint64_t)(get_unaligned_le32(p)) * PRIME64_1;
+		h64 = xxh_rotl64(h64, 23) * PRIME64_2 + PRIME64_3;
+		p += 4;
+	}
+
+	while (p < b_end) {
+		h64 ^= (*p) * PRIME64_5;
+		h64 = xxh_rotl64(h64, 11) * PRIME64_1;
+		p++;
+	}
+
+	h64 ^= h64 >> 33;
+	h64 *= PRIME64_2;
+	h64 ^= h64 >> 29;
+	h64 *= PRIME64_3;
+	h64 ^= h64 >> 32;
+
+	return h64;
+}
+EXPORT_SYMBOL(xxh64_digest);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_DESCRIPTION("xxHash");
--
2.9.3


* [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-29 19:41 [PATCH v2 0/4] Add xxhash and zstd modules Nick Terrell
  2017-06-29 19:41 ` [PATCH v2 1/4] lib: Add xxhash module Nick Terrell
@ 2017-06-29 19:41 ` Nick Terrell
  2017-06-30  3:24   ` Adam Borowski
                     ` (3 more replies)
  2017-06-29 19:41 ` [PATCH v2 4/4] squashfs: " Nick Terrell
                   ` (2 subsequent siblings)
  4 siblings, 4 replies; 32+ messages in thread
From: Nick Terrell @ 2017-06-29 19:41 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel, linux-btrfs, linux-kernel

Add zstd compression and decompression support to BtrFS. zstd at its
fastest level compresses almost as well as zlib, while offering much
faster compression and decompression, approaching lzo speeds.

I benchmarked btrfs with zstd compression against no compression, lzo
compression, and zlib compression. I benchmarked two scenarios: copying
a set of files to btrfs and then reading them back, and copying a
tarball to btrfs, extracting it to btrfs, and then reading the
extracted files. After every operation, I call `sync` and include the
sync time.
Between every pair of operations I unmount and remount the filesystem
to avoid caching. The benchmark files can be found in the upstream
zstd source repository under
`contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
[1] [2].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and an SSD.

The first compression benchmark is copying 10 copies of the unzipped
Silesia corpus [3] into a BtrFS filesystem mounted with
`-o compress-force=Method`. The decompression benchmark times how long
it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
measured by comparing the output of `df` and `du`. See the benchmark file
[1] for details. I benchmarked multiple zstd compression levels, although
the patch uses zstd level 1.
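
The method above can be sketched roughly as follows. This is a
hypothetical sketch, not the original `btrfs-benchmark.sh`: the device
(`/dev/sdb1`), mount point (`/mnt/btrfs`), and data path
(`/data/silesia-x10`) are placeholders.

```shell
#!/bin/sh
# Rough sketch of the copy/read benchmark described above. Device,
# mount point, and data paths are hypothetical placeholders.
run_copy_benchmark() {
	method="$1"   # lzo, zlib, or zstd ("none" mounts without compression)
	mount -o "compress-force=$method" /dev/sdb1 /mnt/btrfs
	time sh -c 'cp -r /data/silesia-x10 /mnt/btrfs/ && sync'
	# Remount between operations so reads are not served from cache.
	umount /mnt/btrfs
	mount -o "compress-force=$method" /dev/sdb1 /mnt/btrfs
	time tar cf /dev/null /mnt/btrfs/silesia-x10
	umount /mnt/btrfs
}

# Compression ratio: apparent size (du) divided by on-disk usage (df delta).
compression_ratio() {
	awk -v du="$1" -v df="$2" 'BEGIN { printf "%.2f\n", du / df }'
}
```

For example, `compression_ratio 2570 1000` prints `2.57`, which is how
the Ratio column is derived from `du` and `df` output.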

| Method  | Ratio | Compression MB/s | Decompression MB/s  |
|---------|-------|------------------|---------------------|
| None    |  0.99 |              504 |                 686 |
| lzo     |  1.66 |              398 |                 442 |
| zlib    |  2.58 |               65 |                 241 |
| zstd 1  |  2.57 |              260 |                 383 |
| zstd 3  |  2.71 |              174 |                 408 |
| zstd 6  |  2.87 |               70 |                 398 |
| zstd 9  |  2.92 |               43 |                 406 |
| zstd 12 |  2.93 |               21 |                 408 |
| zstd 15 |  3.01 |               11 |                 354 |

The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
measures the compression ratio, extracts the tar, and deletes the tar.
Then it measures the compression ratio again, and `tar`s the extracted
files into `/dev/null`. See the benchmark file [2] for details.
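
That sequence can be sketched as below. This is a hypothetical outline
of `btrfs-extract-benchmark.sh`, not the script itself; `/mnt/btrfs` is
a placeholder, and the remount between steps is omitted for brevity.

```shell
# Hypothetical sketch of the tarball benchmark sequence; /mnt/btrfs is a
# placeholder mount point. Each timed step ends with sync, and the
# filesystem is remounted between steps (omitted here).
tarball_benchmark() {
	cp linux-4.11.6.tar /mnt/btrfs/ && sync             # Copy (s); then Tar Ratio
	tar xf /mnt/btrfs/linux-4.11.6.tar -C /mnt/btrfs \
		&& sync                                     # Extract (s)
	rm /mnt/btrfs/linux-4.11.6.tar && sync              # then Extract Ratio
	tar cf /dev/null /mnt/btrfs/linux-4.11.6 && sync    # Read (s)
}
```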

| Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
|--------|-----------|---------------|----------|------------|----------|
| None   |      0.97 |          0.78 |    0.981 |      5.501 |    8.807 |
| lzo    |      2.06 |          1.38 |    1.631 |      8.458 |    8.585 |
| zlib   |      3.40 |          1.86 |    7.750 |     21.544 |   11.744 |
| zstd 1 |      3.57 |          1.85 |    2.579 |     11.479 |    9.389 |

[1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz

zstd source repository: https://github.com/facebook/zstd

Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 fs/btrfs/Kconfig           |   2 +
 fs/btrfs/Makefile          |   2 +-
 fs/btrfs/compression.c     |   1 +
 fs/btrfs/compression.h     |   6 +-
 fs/btrfs/ctree.h           |   1 +
 fs/btrfs/disk-io.c         |   2 +
 fs/btrfs/ioctl.c           |   6 +-
 fs/btrfs/props.c           |   6 +
 fs/btrfs/super.c           |  12 +-
 fs/btrfs/sysfs.c           |   2 +
 fs/btrfs/zstd.c            | 433 +++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/btrfs.h |   8 +-
 12 files changed, 469 insertions(+), 12 deletions(-)
 create mode 100644 fs/btrfs/zstd.c

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 80e9c18..a26c63b 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -6,6 +6,8 @@ config BTRFS_FS
 	select ZLIB_DEFLATE
 	select LZO_COMPRESS
 	select LZO_DECOMPRESS
+	select ZSTD_COMPRESS
+	select ZSTD_DECOMPRESS
 	select RAID6_PQ
 	select XOR_BLOCKS
 	select SRCU
diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17..962a95a 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -6,7 +6,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   transaction.o inode.o file.o tree-defrag.o \
 	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
-	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
+	   export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o hash.o free-space-tree.o
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 10e6b28..3beb0d0 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -761,6 +761,7 @@ static struct {
 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 	&btrfs_zlib_compress,
 	&btrfs_lzo_compress,
+	&btrfs_zstd_compress,
 };

 void __init btrfs_init_compress(void)
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 39ec43a..d99fc21 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -60,8 +60,9 @@ enum btrfs_compression_type {
 	BTRFS_COMPRESS_NONE  = 0,
 	BTRFS_COMPRESS_ZLIB  = 1,
 	BTRFS_COMPRESS_LZO   = 2,
-	BTRFS_COMPRESS_TYPES = 2,
-	BTRFS_COMPRESS_LAST  = 3,
+	BTRFS_COMPRESS_ZSTD  = 3,
+	BTRFS_COMPRESS_TYPES = 3,
+	BTRFS_COMPRESS_LAST  = 4,
 };

 struct btrfs_compress_op {
@@ -92,5 +93,6 @@ struct btrfs_compress_op {

 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
+extern const struct btrfs_compress_op btrfs_zstd_compress;

 #endif
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4f8f75d..61dd3dd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -271,6 +271,7 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
 	 BTRFS_FEATURE_INCOMPAT_BIG_METADATA |		\
 	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
+	 BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD |		\
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5f678dc..49c0e91 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2831,6 +2831,8 @@ int open_ctree(struct super_block *sb,
 	features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
 	if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
 		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+	else if (fs_info->compress_type == BTRFS_COMPRESS_ZSTD)
+		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD;

 	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 		btrfs_info(fs_info, "has skinny extents");
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e176375..f732cfd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -327,8 +327,10 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)

 		if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
 			comp = "lzo";
-		else
+		else if (fs_info->compress_type == BTRFS_COMPRESS_ZLIB)
 			comp = "zlib";
+		else
+			comp = "zstd";
 		ret = btrfs_set_prop(inode, "btrfs.compression",
 				     comp, strlen(comp), 0);
 		if (ret)
@@ -1463,6 +1465,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,

 	if (range->compress_type == BTRFS_COMPRESS_LZO) {
 		btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
+	} else if (range->compress_type == BTRFS_COMPRESS_ZSTD) {
+		btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
 	}

 	ret = defrag_count;
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index d6cb155..162105f 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -383,6 +383,8 @@ static int prop_compression_validate(const char *value, size_t len)
 		return 0;
 	else if (!strncmp("zlib", value, len))
 		return 0;
+	else if (!strncmp("zstd", value, len))
+		return 0;

 	return -EINVAL;
 }
@@ -405,6 +407,8 @@ static int prop_compression_apply(struct inode *inode,
 		type = BTRFS_COMPRESS_LZO;
 	else if (!strncmp("zlib", value, len))
 		type = BTRFS_COMPRESS_ZLIB;
+	else if (!strncmp("zstd", value, len))
+		type = BTRFS_COMPRESS_ZSTD;
 	else
 		return -EINVAL;

@@ -422,6 +426,8 @@ static const char *prop_compression_extract(struct inode *inode)
 		return "zlib";
 	case BTRFS_COMPRESS_LZO:
 		return "lzo";
+	case BTRFS_COMPRESS_ZSTD:
+		return "zstd";
 	}

 	return NULL;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4f1cdd5..4f792d5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -513,6 +513,14 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 				btrfs_clear_opt(info->mount_opt, NODATASUM);
 				btrfs_set_fs_incompat(info, COMPRESS_LZO);
 				no_compress = 0;
+			} else if (strcmp(args[0].from, "zstd") == 0) {
+				compress_type = "zstd";
+				info->compress_type = BTRFS_COMPRESS_ZSTD;
+				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_clear_opt(info->mount_opt, NODATACOW);
+				btrfs_clear_opt(info->mount_opt, NODATASUM);
+				btrfs_set_fs_incompat(info, COMPRESS_ZSTD);
+				no_compress = 0;
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				btrfs_clear_opt(info->mount_opt, COMPRESS);
@@ -1240,8 +1248,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 	if (btrfs_test_opt(info, COMPRESS)) {
 		if (info->compress_type == BTRFS_COMPRESS_ZLIB)
 			compress_type = "zlib";
-		else
+		else if (info->compress_type == BTRFS_COMPRESS_LZO)
 			compress_type = "lzo";
+		else
+			compress_type = "zstd";
 		if (btrfs_test_opt(info, FORCE_COMPRESS))
 			seq_printf(seq, ",compress-force=%s", compress_type);
 		else
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 1f157fb..b0dec90 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -200,6 +200,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(mixed_backref, MIXED_BACKREF);
 BTRFS_FEAT_ATTR_INCOMPAT(default_subvol, DEFAULT_SUBVOL);
 BTRFS_FEAT_ATTR_INCOMPAT(mixed_groups, MIXED_GROUPS);
 BTRFS_FEAT_ATTR_INCOMPAT(compress_lzo, COMPRESS_LZO);
+BTRFS_FEAT_ATTR_INCOMPAT(compress_zstd, COMPRESS_ZSTD);
 BTRFS_FEAT_ATTR_INCOMPAT(big_metadata, BIG_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(extended_iref, EXTENDED_IREF);
 BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56);
@@ -212,6 +213,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(default_subvol),
 	BTRFS_FEAT_ATTR_PTR(mixed_groups),
 	BTRFS_FEAT_ATTR_PTR(compress_lzo),
+	BTRFS_FEAT_ATTR_PTR(compress_zstd),
 	BTRFS_FEAT_ATTR_PTR(big_metadata),
 	BTRFS_FEAT_ATTR_PTR(extended_iref),
 	BTRFS_FEAT_ATTR_PTR(raid56),
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
new file mode 100644
index 0000000..838741b
--- /dev/null
+++ b/fs/btrfs/zstd.c
@@ -0,0 +1,433 @@
+/*
+ * Copyright (c) 2016-present, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/init.h>
+#include <linux/err.h>
+#include <linux/sched.h>
+#include <linux/pagemap.h>
+#include <linux/bio.h>
+#include <linux/zstd.h>
+#include "compression.h"
+
+#define ZSTD_BTRFS_MAX_WINDOWLOG 17
+#define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG)
+
+static ZSTD_parameters zstd_get_btrfs_parameters(size_t src_len)
+{
+	ZSTD_parameters params = ZSTD_getParams(1, src_len, 0);
+
+	if (params.cParams.windowLog > ZSTD_BTRFS_MAX_WINDOWLOG)
+		params.cParams.windowLog = ZSTD_BTRFS_MAX_WINDOWLOG;
+	WARN_ON(src_len > ZSTD_BTRFS_MAX_INPUT);
+	return params;
+}
+
+struct workspace {
+	void *mem;
+	size_t size;
+	char *buf;
+	struct list_head list;
+};
+
+static void zstd_free_workspace(struct list_head *ws)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+
+	vfree(workspace->mem);
+	kfree(workspace->buf);
+	kfree(workspace);
+}
+
+static struct list_head *zstd_alloc_workspace(void)
+{
+	ZSTD_parameters params =
+			zstd_get_btrfs_parameters(ZSTD_BTRFS_MAX_INPUT);
+	struct workspace *workspace;
+
+	workspace = kzalloc(sizeof(*workspace), GFP_NOFS);
+	if (!workspace)
+		return ERR_PTR(-ENOMEM);
+
+	workspace->size = max_t(size_t,
+			ZSTD_CStreamWorkspaceBound(params.cParams),
+			ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
+	workspace->mem = vmalloc(workspace->size);
+	workspace->buf = kmalloc(PAGE_SIZE, GFP_NOFS);
+	if (!workspace->mem || !workspace->buf)
+		goto fail;
+
+	INIT_LIST_HEAD(&workspace->list);
+
+	return &workspace->list;
+fail:
+	zstd_free_workspace(&workspace->list);
+	return ERR_PTR(-ENOMEM);
+}
+
+static int zstd_compress_pages(struct list_head *ws,
+		struct address_space *mapping,
+		u64 start,
+		struct page **pages,
+		unsigned long *out_pages,
+		unsigned long *total_in,
+		unsigned long *total_out)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	ZSTD_CStream *stream;
+	int ret = 0;
+	int nr_pages = 0;
+	struct page *in_page = NULL;  /* The current page to read */
+	struct page *out_page = NULL; /* The current page to write to */
+	ZSTD_inBuffer in_buf = { NULL, 0, 0 };
+	ZSTD_outBuffer out_buf = { NULL, 0, 0 };
+	unsigned long tot_in = 0;
+	unsigned long tot_out = 0;
+	unsigned long len = *total_out;
+	const unsigned long nr_dest_pages = *out_pages;
+	unsigned long max_out = nr_dest_pages * PAGE_SIZE;
+	ZSTD_parameters params = zstd_get_btrfs_parameters(len);
+
+	*out_pages = 0;
+	*total_out = 0;
+	*total_in = 0;
+
+	/* Initialize the stream */
+	stream = ZSTD_initCStream(params, len, workspace->mem,
+			workspace->size);
+	if (!stream) {
+		pr_warn("BTRFS: ZSTD_initCStream failed\n");
+		ret = -EIO;
+		goto out;
+	}
+
+	/* map in the first page of input data */
+	in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+	in_buf.src = kmap(in_page);
+	in_buf.pos = 0;
+	in_buf.size = min_t(size_t, len, PAGE_SIZE);
+
+
+	/* Allocate and map in the output buffer */
+	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+	if (out_page == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	pages[nr_pages++] = out_page;
+	out_buf.dst = kmap(out_page);
+	out_buf.pos = 0;
+	out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+
+	while (1) {
+		size_t ret2;
+
+		ret2 = ZSTD_compressStream(stream, &out_buf, &in_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_compressStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto out;
+		}
+
+		/* Check to see if we are making it bigger */
+		if (tot_in + in_buf.pos > 8192 &&
+				tot_in + in_buf.pos <
+				tot_out + out_buf.pos) {
+			ret = -E2BIG;
+			goto out;
+		}
+
+		/* We've reached the end of our output range */
+		if (out_buf.pos >= max_out) {
+			tot_out += out_buf.pos;
+			ret = -E2BIG;
+			goto out;
+		}
+
+		/* Check if we need more output space */
+		if (out_buf.pos == out_buf.size) {
+			tot_out += PAGE_SIZE;
+			max_out -= PAGE_SIZE;
+			kunmap(out_page);
+			if (nr_pages == nr_dest_pages) {
+				out_page = NULL;
+				ret = -E2BIG;
+				goto out;
+			}
+			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+			if (out_page == NULL) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			pages[nr_pages++] = out_page;
+			out_buf.dst = kmap(out_page);
+			out_buf.pos = 0;
+			out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+		}
+
+		/* We've reached the end of the input */
+		if (in_buf.pos >= len) {
+			tot_in += in_buf.pos;
+			break;
+		}
+
+		/* Check if we need more input */
+		if (in_buf.pos == in_buf.size) {
+			tot_in += PAGE_SIZE;
+			kunmap(in_page);
+			put_page(in_page);
+
+			start += PAGE_SIZE;
+			len -= PAGE_SIZE;
+			in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+			in_buf.src = kmap(in_page);
+			in_buf.pos = 0;
+			in_buf.size = min_t(size_t, len, PAGE_SIZE);
+		}
+	}
+	while (1) {
+		size_t ret2;
+
+		ret2 = ZSTD_endStream(stream, &out_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_endStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto out;
+		}
+		if (ret2 == 0) {
+			tot_out += out_buf.pos;
+			break;
+		}
+		if (out_buf.pos >= max_out) {
+			tot_out += out_buf.pos;
+			ret = -E2BIG;
+			goto out;
+		}
+
+		tot_out += PAGE_SIZE;
+		max_out -= PAGE_SIZE;
+		kunmap(out_page);
+		if (nr_pages == nr_dest_pages) {
+			out_page = NULL;
+			ret = -E2BIG;
+			goto out;
+		}
+		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+		if (out_page == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		pages[nr_pages++] = out_page;
+		out_buf.dst = kmap(out_page);
+		out_buf.pos = 0;
+		out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+	}
+
+	if (tot_out >= tot_in) {
+		ret = -E2BIG;
+		goto out;
+	}
+
+	ret = 0;
+	*total_in = tot_in;
+	*total_out = tot_out;
+out:
+	*out_pages = nr_pages;
+	/* Cleanup */
+	if (in_page) {
+		kunmap(in_page);
+		put_page(in_page);
+	}
+	if (out_page)
+		kunmap(out_page);
+	return ret;
+}
+
+static int zstd_decompress_bio(struct list_head *ws, struct page **pages_in,
+		u64 disk_start,
+		struct bio *orig_bio,
+		size_t srclen)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	ZSTD_DStream *stream;
+	int ret = 0;
+	unsigned long page_in_index = 0;
+	unsigned long total_pages_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
+	unsigned long buf_start;
+	unsigned long total_out = 0;
+	ZSTD_inBuffer in_buf = { NULL, 0, 0 };
+	ZSTD_outBuffer out_buf = { NULL, 0, 0 };
+
+	stream = ZSTD_initDStream(
+			ZSTD_BTRFS_MAX_INPUT, workspace->mem, workspace->size);
+	if (!stream) {
+		pr_debug("BTRFS: ZSTD_initDStream failed\n");
+		ret = -EIO;
+		goto done;
+	}
+
+	in_buf.src = kmap(pages_in[page_in_index]);
+	in_buf.pos = 0;
+	in_buf.size = min_t(size_t, srclen, PAGE_SIZE);
+
+	out_buf.dst = workspace->buf;
+	out_buf.pos = 0;
+	out_buf.size = PAGE_SIZE;
+
+	while (1) {
+		size_t ret2;
+
+		ret2 = ZSTD_decompressStream(stream, &out_buf, &in_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_decompressStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto done;
+		}
+		buf_start = total_out;
+		total_out += out_buf.pos;
+		out_buf.pos = 0;
+
+		ret = btrfs_decompress_buf2page(out_buf.dst, buf_start,
+				total_out, disk_start, orig_bio);
+		if (ret == 0)
+			break;
+
+		if (in_buf.pos >= srclen)
+			break;
+
+		/* Check if we've hit the end of a frame */
+		if (ret2 == 0)
+			break;
+
+		if (in_buf.pos == in_buf.size) {
+			kunmap(pages_in[page_in_index++]);
+			if (page_in_index >= total_pages_in) {
+				in_buf.src = NULL;
+				ret = -EIO;
+				goto done;
+			}
+			srclen -= PAGE_SIZE;
+			in_buf.src = kmap(pages_in[page_in_index]);
+			in_buf.pos = 0;
+			in_buf.size = min_t(size_t, srclen, PAGE_SIZE);
+		}
+	}
+	ret = 0;
+	zero_fill_bio(orig_bio);
+done:
+	if (in_buf.src)
+		kunmap(pages_in[page_in_index]);
+	return ret;
+}
+
+static int zstd_decompress(struct list_head *ws, unsigned char *data_in,
+		struct page *dest_page,
+		unsigned long start_byte,
+		size_t srclen, size_t destlen)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	ZSTD_DStream *stream;
+	int ret = 0;
+	size_t ret2;
+	ZSTD_inBuffer in_buf = { NULL, 0, 0 };
+	ZSTD_outBuffer out_buf = { NULL, 0, 0 };
+	unsigned long total_out = 0;
+	unsigned long pg_offset = 0;
+	char *kaddr;
+
+	stream = ZSTD_initDStream(
+			ZSTD_BTRFS_MAX_INPUT, workspace->mem, workspace->size);
+	if (!stream) {
+		pr_warn("BTRFS: ZSTD_initDStream failed\n");
+		ret = -EIO;
+		goto finish;
+	}
+
+	destlen = min_t(size_t, destlen, PAGE_SIZE);
+
+	in_buf.src = data_in;
+	in_buf.pos = 0;
+	in_buf.size = srclen;
+
+	out_buf.dst = workspace->buf;
+	out_buf.pos = 0;
+	out_buf.size = PAGE_SIZE;
+
+	ret2 = 1;
+	while (pg_offset < destlen && in_buf.pos < in_buf.size) {
+		unsigned long buf_start;
+		unsigned long buf_offset;
+		unsigned long bytes;
+
+		/* Check if the frame is over and we still need more input */
+		if (ret2 == 0) {
+			pr_debug("BTRFS: ZSTD_decompressStream ended early\n");
+			ret = -EIO;
+			goto finish;
+		}
+		ret2 = ZSTD_decompressStream(stream, &out_buf, &in_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_decompressStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto finish;
+		}
+
+		buf_start = total_out;
+		total_out += out_buf.pos;
+		out_buf.pos = 0;
+
+		if (total_out <= start_byte)
+			continue;
+
+		if (total_out > start_byte && buf_start < start_byte)
+			buf_offset = start_byte - buf_start;
+		else
+			buf_offset = 0;
+
+		bytes = min_t(unsigned long, destlen - pg_offset,
+				out_buf.size - buf_offset);
+
+		kaddr = kmap_atomic(dest_page);
+		memcpy(kaddr + pg_offset, out_buf.dst + buf_offset, bytes);
+		kunmap_atomic(kaddr);
+
+		pg_offset += bytes;
+	}
+	ret = 0;
+finish:
+	if (pg_offset < destlen) {
+		kaddr = kmap_atomic(dest_page);
+		memset(kaddr + pg_offset, 0, destlen - pg_offset);
+		kunmap_atomic(kaddr);
+	}
+	return ret;
+}
+
+const struct btrfs_compress_op btrfs_zstd_compress = {
+	.alloc_workspace = zstd_alloc_workspace,
+	.free_workspace = zstd_free_workspace,
+	.compress_pages = zstd_compress_pages,
+	.decompress_bio = zstd_decompress_bio,
+	.decompress = zstd_decompress,
+};
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index a456e53..992c150 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -255,13 +255,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
 #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO	(1ULL << 3)
-/*
- * some patches floated around with a second compression method
- * lets save that incompat here for when they do get in
- * Note we don't actually support it, we're just reserving the
- * number
- */
-#define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZOv2	(1ULL << 4)
+#define BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD	(1ULL << 4)

 /*
  * older kernels tried to do bigger metadata blocks, but the
--
2.9.3


* [PATCH v2 4/4] squashfs: Add zstd support
  2017-06-29 19:41 [PATCH v2 0/4] Add xxhash and zstd modules Nick Terrell
  2017-06-29 19:41 ` [PATCH v2 1/4] lib: Add xxhash module Nick Terrell
  2017-06-29 19:41 ` [PATCH v2 3/4] btrfs: Add zstd support Nick Terrell
@ 2017-06-29 19:41 ` Nick Terrell
  2017-06-30  7:36 ` [PATCH v2 0/4] Add xxhash and zstd modules David Sterba
  2017-06-30 16:46 ` Timofey Titovets
  4 siblings, 0 replies; 32+ messages in thread
From: Nick Terrell @ 2017-06-29 19:41 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel, linux-btrfs, linux-kernel,
	Sean Purcell

Add zstd compression and decompression support to SquashFS. zstd is a
great fit for SquashFS because it can compress at ratios approaching xz,
while decompressing twice as fast as zlib. For SquashFS in particular,
it can decompress as fast as lzo and lz4. It also has the flexibility
to turn down the compression ratio for faster compression times.

The compression benchmark is run on the file tree from the SquashFS archive
found in ubuntu-16.10-desktop-amd64.iso [1]. It uses `mksquashfs` with the
default block size (128 KB) and various compression algorithms/levels.
xz and zstd are also benchmarked with 256 KB blocks. The decompression
benchmark times how long it takes to `tar` the file tree into `/dev/null`.
See the benchmark file in the upstream zstd source repository located under
`contrib/linux-kernel/squashfs-benchmark.sh` [2] for details.
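
The `mksquashfs` side of the benchmark can be sketched as below. This
is a hypothetical sketch: `/data/ubuntu-rootfs` is a placeholder for
the unpacked ISO file tree, and `-comp zstd` assumes a squashfs-tools
build with zstd support, which is not in the stock release.

```shell
# Hypothetical sketch; /data/ubuntu-rootfs is a placeholder for the
# file tree from the Ubuntu ISO. -comp zstd requires squashfs-tools
# built with zstd support.
build_image() {
	comp="$1"; blksz="$2"
	mksquashfs /data/ubuntu-rootfs "img-$comp-$blksz.squashfs" \
		-comp "$comp" -b "$blksz" -noappend
}

# 128 KiB is the mksquashfs default block size; the 256 KB table rows
# use -b 262144.
kib() { echo $(( $1 * 1024 )); }
# build_image zstd "$(kib 128)"
# build_image xz   "$(kib 256)"
```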

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and an SSD.

| Method         | Ratio | Compression MB/s | Decompression MB/s |
|----------------|-------|------------------|--------------------|
| gzip           |  2.92 |               15 |                128 |
| lzo            |  2.64 |              9.5 |                217 |
| lz4            |  2.12 |               94 |                218 |
| xz             |  3.43 |              5.5 |                 35 |
| xz 256 KB      |  3.53 |              5.4 |                 40 |
| zstd 1         |  2.71 |               96 |                210 |
| zstd 5         |  2.93 |               69 |                198 |
| zstd 10        |  3.01 |               41 |                225 |
| zstd 15        |  3.13 |             11.4 |                224 |
| zstd 16 256 KB |  3.24 |              8.1 |                210 |

This patch was written by Sean Purcell <me@seanp.xyz>, but I will be
taking over the submission process.

[1] http://releases.ubuntu.com/16.10/
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/squashfs-benchmark.sh

zstd source repository: https://github.com/facebook/zstd

Cc: Sean Purcell <me@seanp.xyz>
Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 fs/squashfs/Kconfig        |  14 +++++
 fs/squashfs/Makefile       |   1 +
 fs/squashfs/decompressor.c |   7 +++
 fs/squashfs/decompressor.h |   4 ++
 fs/squashfs/squashfs_fs.h  |   1 +
 fs/squashfs/zstd_wrapper.c | 150 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 177 insertions(+)
 create mode 100644 fs/squashfs/zstd_wrapper.c

diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index ffb093e..1adb334 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -165,6 +165,20 @@ config SQUASHFS_XZ

 	  If unsure, say N.

+config SQUASHFS_ZSTD
+	bool "Include support for ZSTD compressed file systems"
+	depends on SQUASHFS
+	select ZSTD_DECOMPRESS
+	help
+	  Saying Y here includes support for reading Squashfs file systems
+	  compressed with ZSTD compression.  ZSTD gives better compression than
+	  the default ZLIB compression, while using less CPU.
+
+	  ZSTD is not the standard compression used in Squashfs and so most
+	  file systems will be readable without selecting this option.
+
+	  If unsure, say N.
+
 config SQUASHFS_4K_DEVBLK_SIZE
 	bool "Use 4K device block size?"
 	depends on SQUASHFS
diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index 246a6f3..6655631 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -15,3 +15,4 @@ squashfs-$(CONFIG_SQUASHFS_LZ4) += lz4_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_LZO) += lzo_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_XZ) += xz_wrapper.o
 squashfs-$(CONFIG_SQUASHFS_ZLIB) += zlib_wrapper.o
+squashfs-$(CONFIG_SQUASHFS_ZSTD) += zstd_wrapper.o
diff --git a/fs/squashfs/decompressor.c b/fs/squashfs/decompressor.c
index d2bc136..8366398 100644
--- a/fs/squashfs/decompressor.c
+++ b/fs/squashfs/decompressor.c
@@ -65,6 +65,12 @@ static const struct squashfs_decompressor squashfs_zlib_comp_ops = {
 };
 #endif

+#ifndef CONFIG_SQUASHFS_ZSTD
+static const struct squashfs_decompressor squashfs_zstd_comp_ops = {
+	NULL, NULL, NULL, NULL, ZSTD_COMPRESSION, "zstd", 0
+};
+#endif
+
 static const struct squashfs_decompressor squashfs_unknown_comp_ops = {
 	NULL, NULL, NULL, NULL, 0, "unknown", 0
 };
@@ -75,6 +81,7 @@ static const struct squashfs_decompressor *decompressor[] = {
 	&squashfs_lzo_comp_ops,
 	&squashfs_xz_comp_ops,
 	&squashfs_lzma_unsupported_comp_ops,
+	&squashfs_zstd_comp_ops,
 	&squashfs_unknown_comp_ops
 };

diff --git a/fs/squashfs/decompressor.h b/fs/squashfs/decompressor.h
index a25713c..0f5a8e4 100644
--- a/fs/squashfs/decompressor.h
+++ b/fs/squashfs/decompressor.h
@@ -58,4 +58,8 @@ extern const struct squashfs_decompressor squashfs_lzo_comp_ops;
 extern const struct squashfs_decompressor squashfs_zlib_comp_ops;
 #endif

+#ifdef CONFIG_SQUASHFS_ZSTD
+extern const struct squashfs_decompressor squashfs_zstd_comp_ops;
+#endif
+
 #endif
diff --git a/fs/squashfs/squashfs_fs.h b/fs/squashfs/squashfs_fs.h
index 506f4ba..24d12fd 100644
--- a/fs/squashfs/squashfs_fs.h
+++ b/fs/squashfs/squashfs_fs.h
@@ -241,6 +241,7 @@ struct meta_index {
 #define LZO_COMPRESSION		3
 #define XZ_COMPRESSION		4
 #define LZ4_COMPRESSION		5
+#define ZSTD_COMPRESSION	6

 struct squashfs_super_block {
 	__le32			s_magic;
diff --git a/fs/squashfs/zstd_wrapper.c b/fs/squashfs/zstd_wrapper.c
new file mode 100644
index 0000000..8cb7c76
--- /dev/null
+++ b/fs/squashfs/zstd_wrapper.c
@@ -0,0 +1,150 @@
+/*
+ * Squashfs - a compressed read only filesystem for Linux
+ *
+ * Copyright (c) 2016-present, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * zstd_wrapper.c
+ */
+
+#include <linux/mutex.h>
+#include <linux/buffer_head.h>
+#include <linux/slab.h>
+#include <linux/zstd.h>
+#include <linux/vmalloc.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "squashfs.h"
+#include "decompressor.h"
+#include "page_actor.h"
+
+struct workspace {
+	void *mem;
+	size_t mem_size;
+};
+
+static void *zstd_init(struct squashfs_sb_info *msblk, void *buff)
+{
+	struct workspace *wksp = kmalloc(sizeof(*wksp), GFP_KERNEL);
+	if (wksp == NULL)
+		goto failed;
+	wksp->mem_size = ZSTD_DStreamWorkspaceBound(max_t(size_t,
+				msblk->block_size, SQUASHFS_METADATA_SIZE));
+	wksp->mem = vmalloc(wksp->mem_size);
+	if (wksp->mem == NULL)
+		goto failed;
+
+	return wksp;
+
+failed:
+	ERROR("Failed to allocate zstd workspace\n");
+	kfree(wksp);
+	return ERR_PTR(-ENOMEM);
+}
+
+
+static void zstd_free(void *strm)
+{
+	struct workspace *wksp = strm;
+
+	if (wksp)
+		vfree(wksp->mem);
+	kfree(wksp);
+}
+
+
+static int zstd_uncompress(struct squashfs_sb_info *msblk, void *strm,
+	struct buffer_head **bh, int b, int offset, int length,
+	struct squashfs_page_actor *output)
+{
+	struct workspace *wksp = strm;
+	ZSTD_DStream *stream;
+	size_t total_out = 0;
+	size_t zstd_err;
+	int k = 0;
+	ZSTD_inBuffer in_buf = { NULL, 0, 0 };
+	ZSTD_outBuffer out_buf = { NULL, 0, 0 };
+
+	stream = ZSTD_initDStream(wksp->mem_size, wksp->mem, wksp->mem_size);
+
+	if (!stream) {
+		ERROR("Failed to initialize zstd decompressor\n");
+		goto out;
+	}
+
+	out_buf.size = PAGE_SIZE;
+	out_buf.dst = squashfs_first_page(output);
+
+	do {
+		if (in_buf.pos == in_buf.size && k < b) {
+			int avail = min(length, msblk->devblksize - offset);
+			length -= avail;
+			in_buf.src = bh[k]->b_data + offset;
+			in_buf.size = avail;
+			in_buf.pos = 0;
+			offset = 0;
+		}
+
+		if (out_buf.pos == out_buf.size) {
+			out_buf.dst = squashfs_next_page(output);
+			if (out_buf.dst == NULL) {
+				/* shouldn't run out of pages before stream is
+				 * done */
+				squashfs_finish_page(output);
+				goto out;
+			}
+			out_buf.pos = 0;
+			out_buf.size = PAGE_SIZE;
+		}
+
+		total_out -= out_buf.pos;
+		zstd_err = ZSTD_decompressStream(stream, &out_buf, &in_buf);
+		total_out += out_buf.pos; /* add the additional data produced */
+
+		if (in_buf.pos == in_buf.size && k < b)
+			put_bh(bh[k++]);
+	} while (zstd_err != 0 && !ZSTD_isError(zstd_err));
+
+	squashfs_finish_page(output);
+
+	if (ZSTD_isError(zstd_err)) {
+		ERROR("zstd decompression error: %d\n",
+				(int)ZSTD_getErrorCode(zstd_err));
+		goto out;
+	}
+
+	if (k < b)
+		goto out;
+
+	return (int)total_out;
+
+out:
+	for (; k < b; k++)
+		put_bh(bh[k]);
+
+	return -EIO;
+}
+
+const struct squashfs_decompressor squashfs_zstd_comp_ops = {
+	.init = zstd_init,
+	.free = zstd_free,
+	.decompress = zstd_uncompress,
+	.id = ZSTD_COMPRESSION,
+	.name = "zstd",
+	.supported = 1
+};
--
2.9.3
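The refill loop in zstd_uncompress() above follows the standard zstd streaming pattern: whenever in_buf or out_buf runs dry, point it at the next input chunk or output page and reset pos. Below is a minimal userspace sketch of the same bookkeeping; a byte-copying stub stands in for ZSTD_decompressStream() so it runs without libzstd, and all names are illustrative, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors ZSTD_inBuffer / ZSTD_outBuffer from the zstd streaming API. */
struct ibuf { const char *src; size_t size; size_t pos; };
struct obuf { char *dst; size_t size; size_t pos; };

/* Stand-in for ZSTD_decompressStream(): just copies as many bytes as fit. */
static void copy_stream(struct obuf *out, struct ibuf *in)
{
	size_t n = in->size - in->pos;
	size_t room = out->size - out->pos;

	if (n > room)
		n = room;
	memcpy(out->dst + out->pos, in->src + in->pos, n);
	in->pos += n;
	out->pos += n;
}

/*
 * Pump nchunks input chunks through fixed-size output pages, refilling
 * whichever buffer ran dry, like the squashfs loop above.  Returns the
 * total number of bytes produced.
 */
static size_t pump(const char **chunks, const size_t *lens, int nchunks,
		   char *pages, size_t page_sz, int npages)
{
	struct ibuf in = { NULL, 0, 0 };
	struct obuf out = { pages, page_sz, 0 };
	size_t total = 0;
	int k = 0, p = 0;

	for (;;) {
		if (in.pos == in.size) {	/* input chunk consumed */
			if (k == nchunks)
				break;
			in.src = chunks[k];
			in.size = lens[k++];
			in.pos = 0;
		}
		if (out.pos == out.size) {	/* output page full */
			if (++p == npages)
				break;
			out.dst = pages + p * page_sz;
			out.pos = 0;
		}
		total -= out.pos;		/* same accounting as above: */
		copy_stream(&out, &in);
		total += out.pos;		/* count only new bytes */
	}
	return total;
}
```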

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-29 19:41 ` [PATCH v2 3/4] btrfs: Add zstd support Nick Terrell
@ 2017-06-30  3:24   ` Adam Borowski
  2017-06-30 12:16   ` E V
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Adam Borowski @ 2017-06-30  3:24 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, David Sterba,
	squashfs-devel, linux-btrfs, linux-kernel

On Thu, Jun 29, 2017 at 12:41:07PM -0700, Nick Terrell wrote:
> Add zstd compression and decompression support to BtrFS. zstd at its
> fastest level compresses almost as well as zlib, while offering much
> faster compression and decompression, approaching lzo speeds.

> +static int zstd_decompress_bio(struct list_head *ws, struct page **pages_in,
> +		u64 disk_start,
> +		struct bio *orig_bio,
> +		size_t srclen)

Note that this fails to build if e1ddce71 is applied, and it'll be in
mainline in just a few days from now.  Fixing that is obvious, but
to avoid duplicating the effort:


diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
index 838741b3732c..e9403e569d8e 100644
--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -25,6 +25,7 @@
 #include <linux/pagemap.h>
 #include <linux/bio.h>
 #include <linux/zstd.h>
+#include <linux/refcount.h>
 #include "compression.h"
 
 #define ZSTD_BTRFS_MAX_WINDOWLOG 17
@@ -262,17 +263,18 @@ static int zstd_compress_pages(struct list_head *ws,
        return ret;
 }
 
-static int zstd_decompress_bio(struct list_head *ws, struct page **pages_in,
-               u64 disk_start,
-               struct bio *orig_bio,
-               size_t srclen)
+static int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 {
        struct workspace *workspace = list_entry(ws, struct workspace, list);
        ZSTD_DStream *stream;
        int ret = 0;
        unsigned long page_in_index = 0;
+       size_t srclen = cb->compressed_len;
        unsigned long total_pages_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
        unsigned long buf_start;
+       struct page **pages_in = cb->compressed_pages;
+       u64 disk_start = cb->start;
+       struct bio *orig_bio = cb->orig_bio;
        unsigned long total_out = 0;
        ZSTD_inBuffer in_buf = { NULL, 0, 0 };
        ZSTD_outBuffer out_buf = { NULL, 0, 0 };


-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/4] Add xxhash and zstd modules
  2017-06-29 19:41 [PATCH v2 0/4] Add xxhash and zstd modules Nick Terrell
                   ` (2 preceding siblings ...)
  2017-06-29 19:41 ` [PATCH v2 4/4] squashfs: " Nick Terrell
@ 2017-06-30  7:36 ` David Sterba
  2017-06-30 16:46 ` Timofey Titovets
  4 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2017-06-30  7:36 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel, linux-btrfs, linux-kernel

On Thu, Jun 29, 2017 at 12:41:04PM -0700, Nick Terrell wrote:
>  lib/zstd/Makefile          |   18 +
>  lib/zstd/bitstream.h       |  374 +++++
>  lib/zstd/compress.c        | 3479 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/zstd/decompress.c      | 2526 ++++++++++++++++++++++++++++++++
>  lib/zstd/entropy_common.c  |  243 ++++
>  lib/zstd/error_private.h   |   53 +
>  lib/zstd/fse.h             |  575 ++++++++
>  lib/zstd/fse_compress.c    |  795 ++++++++++
>  lib/zstd/fse_decompress.c  |  332 +++++
>  lib/zstd/huf.h             |  212 +++
>  lib/zstd/huf_compress.c    |  771 ++++++++++
>  lib/zstd/huf_decompress.c  |  960 ++++++++++++
>  lib/zstd/mem.h             |  151 ++
>  lib/zstd/zstd_common.c     |   75 +
>  lib/zstd/zstd_internal.h   |  269 ++++
>  lib/zstd/zstd_opt.h        | 1014 +++++++++++++

I don't see patch 2/4 in the mailing list, probably over the 100kb limit.

The lib/* stuff needs to be merged first; I shouldn't add non-btrfs code
to our regular for-next branch. A testing branch with all the zstd code
can be created, but given the upstream merging process, I don't think we
can avoid two-step merging.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-29 19:41 ` [PATCH v2 3/4] btrfs: Add zstd support Nick Terrell
  2017-06-30  3:24   ` Adam Borowski
@ 2017-06-30 12:16   ` E V
  2017-06-30 14:21     ` David Sterba
  2017-07-10 21:11     ` Clemens Eisserer
  2017-07-06 16:32   ` Adam Borowski
  2017-07-18 18:21   ` David Sterba
  3 siblings, 2 replies; 32+ messages in thread
From: E V @ 2017-06-30 12:16 UTC (permalink / raw)
  To: Nick Terrell, linux-btrfs

On Thu, Jun 29, 2017 at 3:41 PM, Nick Terrell <terrelln@fb.com> wrote:
> Add zstd compression and decompression support to BtrFS. zstd at its
> fastest level compresses almost as well as zlib, while offering much
> faster compression and decompression, approaching lzo speeds.
>
> I benchmarked btrfs with zstd compression against no compression, lzo
> compression, and zlib compression. I benchmarked two scenarios. Copying
> a set of files to btrfs, and then reading the files. Copying a tarball
> to btrfs, extracting it to btrfs, and then reading the extracted files.
> After every operation, I call `sync` and include the sync time.
> Between every pair of operations I unmount and remount the filesystem
> to avoid caching. The benchmark files can be found in the upstream
> zstd source repository under
> `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
> [1] [2].
>
> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
> 16 GB of RAM, and a SSD.
>
> The first compression benchmark is copying 10 copies of the unzipped
> Silesia corpus [3] into a BtrFS filesystem mounted with
> `-o compress-force=Method`. The decompression benchmark times how long
> it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
> measured by comparing the output of `df` and `du`. See the benchmark file
> [1] for details. I benchmarked multiple zstd compression levels, although
> the patch uses zstd level 1.
>
> | Method  | Ratio | Compression MB/s | Decompression speed |
> |---------|-------|------------------|---------------------|
> | None    |  0.99 |              504 |                 686 |
> | lzo     |  1.66 |              398 |                 442 |
> | zlib    |  2.58 |               65 |                 241 |
> | zstd 1  |  2.57 |              260 |                 383 |
> | zstd 3  |  2.71 |              174 |                 408 |
> | zstd 6  |  2.87 |               70 |                 398 |
> | zstd 9  |  2.92 |               43 |                 406 |
> | zstd 12 |  2.93 |               21 |                 408 |
> | zstd 15 |  3.01 |               11 |                 354 |
>

As a user looking at this table, zstd 3 seems like the sweet spot to me:
more than twice as fast as zlib with a bit better compression. Is this
going to be configurable?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-30 12:16   ` E V
@ 2017-06-30 14:21     ` David Sterba
  2017-06-30 18:25       ` Austin S. Hemmelgarn
  2017-07-10 21:11     ` Clemens Eisserer
  1 sibling, 1 reply; 32+ messages in thread
From: David Sterba @ 2017-06-30 14:21 UTC (permalink / raw)
  To: E V; +Cc: Nick Terrell, linux-btrfs

On Fri, Jun 30, 2017 at 08:16:20AM -0400, E V wrote:
> On Thu, Jun 29, 2017 at 3:41 PM, Nick Terrell <terrelln@fb.com> wrote:
> > Add zstd compression and decompression support to BtrFS. zstd at its
> > fastest level compresses almost as well as zlib, while offering much
> > faster compression and decompression, approaching lzo speeds.
> >
> > I benchmarked btrfs with zstd compression against no compression, lzo
> > compression, and zlib compression. I benchmarked two scenarios. Copying
> > a set of files to btrfs, and then reading the files. Copying a tarball
> > to btrfs, extracting it to btrfs, and then reading the extracted files.
> > After every operation, I call `sync` and include the sync time.
> > Between every pair of operations I unmount and remount the filesystem
> > to avoid caching. The benchmark files can be found in the upstream
> > zstd source repository under
> > `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
> > [1] [2].
> >
> > I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
> > The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
> > 16 GB of RAM, and a SSD.
> >
> > The first compression benchmark is copying 10 copies of the unzipped
> > Silesia corpus [3] into a BtrFS filesystem mounted with
> > `-o compress-force=Method`. The decompression benchmark times how long
> > it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
> > measured by comparing the output of `df` and `du`. See the benchmark file
> > [1] for details. I benchmarked multiple zstd compression levels, although
> > the patch uses zstd level 1.
> >
> > | Method  | Ratio | Compression MB/s | Decompression speed |
> > |---------|-------|------------------|---------------------|
> > | None    |  0.99 |              504 |                 686 |
> > | lzo     |  1.66 |              398 |                 442 |
> > | zlib    |  2.58 |               65 |                 241 |
> > | zstd 1  |  2.57 |              260 |                 383 |
> > | zstd 3  |  2.71 |              174 |                 408 |
> > | zstd 6  |  2.87 |               70 |                 398 |
> > | zstd 9  |  2.92 |               43 |                 406 |
> > | zstd 12 |  2.93 |               21 |                 408 |
> > | zstd 15 |  3.01 |               11 |                 354 |
> >
> 
> As a user looking at this table, zstd 3 seems like the sweet spot to me:
> more than twice as fast as zlib with a bit better compression. Is this
> going to be configurable?

If we're going to make that configurable, there are some things to
consider:

* the underlying compressed format -- does not change for different
  levels

* the configuration interface -- mount options, defrag ioctl

* backward compatibility

For the mount option specification, sorted from the worst to best per my
preference:

* new option, eg. clevel=%d or compress-level=%d
* use existing options, extend the compression name
  * compress=zlib3
  * compress=zlib/3
  * compress=zlib:3

The defrag ioctl args have some reserved space for extension, or we can
abuse btrfs_ioctl_defrag_range_args::compress_type, which is unnecessarily
u32. Either way we don't need to introduce a new ioctl number and struct
(which is good of course).

Regarding backward compatibility, an older kernel would probably not
recognize the extended spec format. We use strcmp, so the full name must
match. Had we used strncmp, we could have compared just the prefix of
known length and the level part would be ignored. A patch for that would
not be intrusive and could be ported to older stable kernels, if there's
enough user demand.

So, I don't see any problem making the level configurable.
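As a sketch of the strncmp idea above: match only the algorithm-name prefix, then parse an optional level after a separator, falling back to a per-algorithm default. This is a userspace illustration only; the helper name, the ':' separator, and the defaults are assumptions for the example, not the interface btrfs settled on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "zstd", "zstd:3", "zlib:9", ... into algorithm + level.
 * Returns 0 on success, -1 if the algorithm is unknown.
 * (Illustrative only; names, separator, and defaults are assumptions.)
 */
static int parse_compress_opt(const char *opt, const char **algo, int *level)
{
	static const struct { const char *name; int def_level; } algos[] = {
		{ "zstd", 1 },	/* the patch's default; the thread argues for 3 */
		{ "zlib", 3 },
		{ "lzo",  1 },	/* kernel lzo has a single level */
	};
	size_t i;

	for (i = 0; i < sizeof(algos) / sizeof(algos[0]); i++) {
		size_t n = strlen(algos[i].name);

		/* Compare the prefix only, so "zstd:3" still matches "zstd". */
		if (strncmp(opt, algos[i].name, n) != 0)
			continue;
		if (opt[n] == '\0') {		/* bare name: use default */
			*algo = algos[i].name;
			*level = algos[i].def_level;
			return 0;
		}
		if (opt[n] == ':') {		/* explicit level suffix */
			*algo = algos[i].name;
			*level = atoi(opt + n + 1);
			return 0;
		}
	}
	return -1;
}
```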

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/4] Add xxhash and zstd modules
  2017-06-29 19:41 [PATCH v2 0/4] Add xxhash and zstd modules Nick Terrell
                   ` (3 preceding siblings ...)
  2017-06-30  7:36 ` [PATCH v2 0/4] Add xxhash and zstd modules David Sterba
@ 2017-06-30 16:46 ` Timofey Titovets
  2017-06-30 19:52   ` Nick Terrell
  4 siblings, 1 reply; 32+ messages in thread
From: Timofey Titovets @ 2017-06-30 16:46 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel, linux-btrfs, Linux Kernel

Hi Nick Terrell,
If I understood correctly, zstd can compress (and decompress) data in a
way that is compatible with gzip (zlib). Is that also true for the
in-kernel library? If so, does it make sense to directly replace zlib
with zstd (configured to work like zlib) in place (for example, in the
btrfs zlib decompression/compression code)?

Thanks!

-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-30 14:21     ` David Sterba
@ 2017-06-30 18:25       ` Austin S. Hemmelgarn
  2017-06-30 23:01         ` Nick Terrell
  0 siblings, 1 reply; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2017-06-30 18:25 UTC (permalink / raw)
  To: dsterba, E V, Nick Terrell, linux-btrfs

On 2017-06-30 10:21, David Sterba wrote:
> On Fri, Jun 30, 2017 at 08:16:20AM -0400, E V wrote:
>> On Thu, Jun 29, 2017 at 3:41 PM, Nick Terrell <terrelln@fb.com> wrote:
>>> Add zstd compression and decompression support to BtrFS. zstd at its
>>> fastest level compresses almost as well as zlib, while offering much
>>> faster compression and decompression, approaching lzo speeds.
>>>
>>> I benchmarked btrfs with zstd compression against no compression, lzo
>>> compression, and zlib compression. I benchmarked two scenarios. Copying
>>> a set of files to btrfs, and then reading the files. Copying a tarball
>>> to btrfs, extracting it to btrfs, and then reading the extracted files.
>>> After every operation, I call `sync` and include the sync time.
>>> Between every pair of operations I unmount and remount the filesystem
>>> to avoid caching. The benchmark files can be found in the upstream
>>> zstd source repository under
>>> `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
>>> [1] [2].
>>>
>>> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
>>> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
>>> 16 GB of RAM, and a SSD.
>>>
>>> The first compression benchmark is copying 10 copies of the unzipped
>>> Silesia corpus [3] into a BtrFS filesystem mounted with
>>> `-o compress-force=Method`. The decompression benchmark times how long
>>> it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
>>> measured by comparing the output of `df` and `du`. See the benchmark file
>>> [1] for details. I benchmarked multiple zstd compression levels, although
>>> the patch uses zstd level 1.
>>>
>>> | Method  | Ratio | Compression MB/s | Decompression speed |
>>> |---------|-------|------------------|---------------------|
>>> | None    |  0.99 |              504 |                 686 |
>>> | lzo     |  1.66 |              398 |                 442 |
>>> | zlib    |  2.58 |               65 |                 241 |
>>> | zstd 1  |  2.57 |              260 |                 383 |
>>> | zstd 3  |  2.71 |              174 |                 408 |
>>> | zstd 6  |  2.87 |               70 |                 398 |
>>> | zstd 9  |  2.92 |               43 |                 406 |
>>> | zstd 12 |  2.93 |               21 |                 408 |
>>> | zstd 15 |  3.01 |               11 |                 354 |
>>>
>>
>> As a user looking at this table, zstd 3 seems like the sweet spot to me:
>> more than twice as fast as zlib with a bit better compression. Is this
>> going to be configurable?
> 
> If we're going to make that configurable, there are some things to
> consider:
> 
> * the underlying compressed format -- does not change for different
>    levels
> 
> * the configuration interface -- mount options, defrag ioctl
> 
> * backward compatibility
There is also the matter of deciding what to use for the default when 
specified without a level.  This is easy for lzo and zlib, where we can 
just use the existing level, but for zstd we would need to decide how to 
handle a user just specifying 'zstd' without a level.  I agree with E V 
that level 3 appears to be the turnover point, and would suggest using 
that for the default.
> 
> For the mount option specification, sorted from the worst to best per my
> preference:
> 
> * new option, eg. clevel=%d or compress-level=%d
> * use existing options, extend the compression name
>    * compress=zlib3
>    * compress=zlib/3
>    * compress=zlib:3
I think it makes more sense to make the level part of the existing 
specification.  ZFS does things that way (although they use a - to 
separate the name from the level), and any arbitrary level does not mean 
the same thing across different algorithms (for example, level 15 is 
out of range for zlib, but a valid high level for zstd).
> 
> The defrag ioctl args have some reserved space for extension or we can
> abuse btrfs_ioctl_defrag_range_args::compress_type that's unnecessarily
> u32. Either way we don't need to introduce a new ioctl number and struct
> (which is good of course).
> 
> Regarding backward compatibility, older kernel would probably not
> recognize the extended spec format. We use strcmp, so the full name must
> match. Had we used strncmp, we could have compared just the prefix of
> known length and the level part would be ignored. A patch for that would
> not be intrusive and could be ported to older stable kernels, if there's
> enough user demand.
TBH, I would think that that's required if this is going to be 
implemented, but it may be tricky because 'lzo' and 'zlib' are not the 
same length.
> 
> So, I don't see any problem making the level configurable.
I would actually love to see this, I regularly make use of varying 
compression both on BTRFS (with separate filesystems) and on the 
ZFS-based NAS systems we have at work (where it can be set per-dataset) 
to allow better compression on less frequently accessed data.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/4] Add xxhash and zstd modules
  2017-06-30 16:46 ` Timofey Titovets
@ 2017-06-30 19:52   ` Nick Terrell
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Terrell @ 2017-06-30 19:52 UTC (permalink / raw)
  To: Timofey Titovets
  Cc: Kernel Team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel@lists.sourceforge.net, linux-btrfs,
	Linux Kernel


> If I understood correctly, zstd can compress (and decompress) data in
> a way that is compatible with gzip (zlib). Is that also true for the
> in-kernel library?

The zstd command line tool can decompress gzip/zlib compressed data, and
can compress in the gzip format. However, the zstd format is not compatible
with zlib, so gzip/zlib cannot decompress zstd compressed data. The zstd
kernel module can only compress and decompress the zstd format, and can't
decompress zlib/gzip compressed data.
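The formats are distinguishable from their first bytes: a gzip stream starts with 0x1F 0x8B (RFC 1952), while a zstd frame starts with the little-endian magic 0xFD2FB528 (RFC 8478). A small userspace sniffer makes the incompatibility concrete (illustrative only, not kernel code):

```c
#include <assert.h>
#include <stddef.h>

enum fmt { FMT_UNKNOWN, FMT_GZIP, FMT_ZSTD };

/*
 * Identify a stream by its leading magic bytes:
 *   gzip: 1F 8B        (RFC 1952)
 *   zstd: 28 B5 2F FD  (0xFD2FB528 little-endian, RFC 8478)
 */
static enum fmt sniff(const unsigned char *p, size_t n)
{
	if (n >= 2 && p[0] == 0x1F && p[1] == 0x8B)
		return FMT_GZIP;
	if (n >= 4 && p[0] == 0x28 && p[1] == 0xB5 &&
	    p[2] == 0x2F && p[3] == 0xFD)
		return FMT_ZSTD;
	return FMT_UNKNOWN;
}
```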



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-30 18:25       ` Austin S. Hemmelgarn
@ 2017-06-30 23:01         ` Nick Terrell
  2017-07-05 11:43           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Terrell @ 2017-06-30 23:01 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, dsterba@suse.cz, E V, linux-btrfs


>> If we're going to make that configurable, there are some things to
>> consider:
>> 
>> * the underlying compressed format -- does not change for different
>>    levels

This is true for zlib and zstd. lzo in the kernel only supports one
compression level.

>> * the configuration interface -- mount options, defrag ioctl
>> 
>> * backward compatibility
> There is also the fact of deciding what to use for the default when 
> specified without a level.  This is easy for lzo and zlib, where we can 
> just use the existing level, but for zstd we would need to decide how to 
> handle a user just specifying 'zstd' without a level.  I agree with E V 
> that level 3 appears to be the turnover point, and would suggest using 
> that for the default.

I chose level 1 because I thought if we had to choose one speed/compression
trade-off, faster would be better. However, with a configurable compression
level, level 3 is a great default, and is the default of the zstd CLI.

>> So, I don't see any problem making the level configurable.
> I would actually love to see this, I regularly make use of varying 
> compression both on BTRFS (with separate filesystems) and on the 
> ZFS-based NAS systems we have at work (where it can be set per-dataset) 
> to allow better compression on less frequently accessed data.

I would love to see configurable compression level as well. Would you want
me to add it to my patch set, or should I adapt my patch set to work on top
of it when it is ready?
    


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-30 23:01         ` Nick Terrell
@ 2017-07-05 11:43           ` Austin S. Hemmelgarn
  2017-07-05 18:18             ` Adam Borowski
  0 siblings, 1 reply; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2017-07-05 11:43 UTC (permalink / raw)
  To: Nick Terrell, dsterba@suse.cz, E V, linux-btrfs

On 2017-06-30 19:01, Nick Terrell wrote:
>>> If we're going to make that configurable, there are some things to
>>> consider:
>>>
>>> * the underlying compressed format -- does not change for different
>>>     levels
> 
> This is true for zlib and zstd. lzo in the kernel only supports one
> compression level.
I had actually forgot that the kernel only implements one compression 
level for LZO.
> 
>>> * the configuration interface -- mount options, defrag ioctl
>>>
>>> * backward compatibility
>> There is also the fact of deciding what to use for the default when
>> specified without a level.  This is easy for lzo and zlib, where we can
>> just use the existing level, but for zstd we would need to decide how to
>> handle a user just specifying 'zstd' without a level.  I agree with E V
>> that level 3 appears to be the turnover point, and would suggest using
>> that for the default.
> 
> I chose level 1 because I thought if we had to choose one speed/compression
> trade-off, faster would be better. However, with a configurable compression
> level, level 3 is a great default, and is the default of the zstd CLI.
Actually, even if it's not configurable, I would prefer 3, as that still 
performs better in both respects (speed and compression ratio) than zlib 
while being sufficiently different from lzo performance to make it easy 
to decide on one or the other.  As far as configurable levels for 
regular usage on a filesystem, there are only three levels you 
benchmarked that I would be interested in, namely level 1 (highly active 
data on slow storage with a fast CPU), 3 (stuff I would use zlib for 
today), and 15 (stuff I would use out-of-band compression for today (for 
example, archival storage)).
> 
>>> So, I don't see any problem making the level configurable.
>> I would actually love to see this, I regularly make use of varying
>> compression both on BTRFS (with separate filesystems) and on the
>> ZFS-based NAS systems we have at work (where it can be set per-dataset)
>> to allow better compression on less frequently accessed data.
> 
> I would love to see configurable compression level as well. Would you want
> me to add it to my patch set, or should I adapt my patch set to work on top
> of it when it is ready?
David or one of the other developers would be a better person to ask; I 
mostly review kernel patches from an admin perspective, not a development 
perspective, so I don't really know which option would be better in this 
case.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-05 11:43           ` Austin S. Hemmelgarn
@ 2017-07-05 18:18             ` Adam Borowski
  2017-07-05 18:45               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 32+ messages in thread
From: Adam Borowski @ 2017-07-05 18:18 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Nick Terrell, dsterba@suse.cz, E V, linux-btrfs

On Wed, Jul 05, 2017 at 07:43:27AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-06-30 19:01, Nick Terrell wrote:
> > > There is also the fact of deciding what to use for the default when
> > > specified without a level.  This is easy for lzo and zlib, where we can
> > > just use the existing level, but for zstd we would need to decide how to
> > > handle a user just specifying 'zstd' without a level.  I agree with E V
> > > that level 3 appears to be the turnover point, and would suggest using
> > > that for the default.
> > 
> > I chose level 1 because I thought if we had to choose one speed/compression
> > trade-off, faster would be better. However, with a configurable compression
> > level, level 3 is a great default, and is the default of the zstd CLI.
> Actually, even if it's not configurable, I would prefer 3, as that still
> performs better in both respects (speed and compression ratio) than zlib
> while being sufficiently different from lzo performance to make it easy to
> decide on one or the other.  As far as configurable levels for regular usage
> on a filesystem, there are only three levels you benchmarked that I would be
> interested in, namely level 1 (highly active data on slow storage with a
> fast CPU), 3 (stuff I would use zlib for today), and 15 (stuff I would use
> out-of-band compression for today (for example, archival storage)).

If you guys are going to argue between 1 and 3, just strike a Cracovian
deal and settle at 2. :þ

But more seriously: zstd looks so much better than lzo and zlib that I'd
suggest making it the default compression in cases where there's no way to
choose, such as chattr +c.  But then, changing the default before the
previous LTS kernel can mount it would be irresponsible -- thus, if you can
get it into 4.14, we're looking at 4.19 at soonest (or later if we consider
distro kernels).

Which means the timing is quite tight: if, per DSterba's request, /lib/
parts are going via a non-btrfs tree, there won't be enough adequate
testing in -next.  Thus, would it be possible to have /lib/ patches in
btrfs-next but not in for-linus?  That would allow testing so you can catch
the LTS train.

> > > > So, I don't see any problem making the level configurable.
> > > I would actually love to see this, I regularly make use of varying
> > > compression both on BTRFS (with separate filesystems) and on the
> > > ZFS-based NAS systems we have at work (where it can be set per-dataset)
> > > to allow better compression on less frequently accessed data.
> > 
> > I would love to see configurable compression level as well. Would you want
> > me to add it to my patch set, or should I adapt my patch set to work on top
> > of it when it is ready?

Note that as zstd is new, there's no backwards compat to care of, thus you
are free to use whatever -o/prop syntax you'd like.  If zstd ends up being
configurable while zlib is not -- oh well, there's no reason to use zlib
anymore other than for mounting with old kernels, in which case we can't use
configurable props anyway.  Unlike zlib, lzo is not strictly worse than
configurable zstd, but it has only one level so there's nothing to configure
either.

Thus, I'd suggest skipping the issue and implement levels for zstd only.


As for testing: I assume you guys are doing stress testing on amd64 already,
right?  I got two crappy arm64 machines but won't be able to test there
before 4.13 (Icenowy has been lazy and didn't push any near-mainline patch
set recently; every single Pine64/Pinebook kernel pushed by anyone but her
didn't work for me so I don't even bother trying anymore).  On the other
hand, the armhf box I'm running Debian rebuilds on is a gem among cheap
SoCs.  After Nick fixed the flickering workspaces bug, there were no further
hiccups; on Monday I switched to 4.12.0 + btrfs-for-4.13 + zstd (thus
luckily avoiding that nowait aio bug), also no explosions yet.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-05 18:18             ` Adam Borowski
@ 2017-07-05 18:45               ` Austin S. Hemmelgarn
  2017-07-05 19:35                 ` Nick Terrell
  0 siblings, 1 reply; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2017-07-05 18:45 UTC (permalink / raw)
  To: Adam Borowski; +Cc: Nick Terrell, dsterba@suse.cz, E V, linux-btrfs

On 2017-07-05 14:18, Adam Borowski wrote:
> On Wed, Jul 05, 2017 at 07:43:27AM -0400, Austin S. Hemmelgarn
> wrote:
>> On 2017-06-30 19:01, Nick Terrell wrote:
>>>> There is also the fact of deciding what to use for the default
>>>> when specified without a level.  This is easy for lzo and zlib,
>>>> where we can just use the existing level, but for zstd we would
>>>> need to decide how to handle a user just specifying 'zstd'
>>>> without a level.  I agree with E V that level 3 appears to be
>>>> the turnover point, and would suggest using that for the
>>>> default.
>>> 
>>> I chose level 1 because I thought if we had to choose one
>>> speed/compression trade-off, faster would be better. However,
>>> with a configurable compression level, level 3 is a great
>>> default, and is the default of the zstd CLI.
>> Actually, even if it's not configurable, I would prefer 3, as that
>> still performs better in both respects (speed and compression
>> ratio) than zlib while being sufficiently different from lzo
>> performance to make it easy to decide on one or the other.  As far
>> as configurable levels for regular usage on a filesystem, there are
>> only three levels you benchmarked that I would be interested in,
>> namely level 1 (highly active data on slow storage with a fast
>> CPU), 3 (stuff I would use zlib for today), and 15 (stuff I would
>> use out-of-band compression for today (for example, archival
>> storage)).
> 
> If you guys are going to argue between 1 and 3, just go the cracovian
> deal and settle at 2. :þ
> 
> But more seriously: zstd looks so much better than lzo and zlib that
> I'd suggest making it the default compression in cases where there's
> no way to choose, such as chattr +c.  But then, changing the default
> before the previous LTS kernel can mount it would be irresponsible --
> thus, if you can get it into 4.14, we're looking at 4.19 at soonest
> (or later if we consider distro kernels).
To be entirely honest, we probably should have switched to LZO as the
default a while back to put things more in line with ZFS (which
traditionally favors performance for in-line compression) and Windows
(which uses a custom LZ77 derivative that's wicked fast on most modern
systems).  I would say that as soon as we get to the point that the last 
two LTS releases support it, zstd should probably become the default.

Also, slightly OT, but I would love to have the ability to set per 
volume (not subvolume but volume itself) what to use for compression 
when no algorithm is specified.
> 
> Which means the timing is quite tight: if, per DSterba's request,
> /lib/ parts are going via a non-btrfs tree, there'll be not enough
> adequate testing in -next.  Thus, would it be possible to have /lib/
> patches in btrfs-next but not in for-linus?  That would allow testing
> so you can catch the LTS train.
> 
>>>>> So, I don't see any problem making the level configurable.
>>>> I would actually love to see this, I regularly make use of
>>>> varying compression both on BTRFS (with separate filesystems)
>>>> and on the ZFS-based NAS systems we have at work (where it can
>>>> be set per-dataset) to allow better compression on less
>>>> frequently accessed data.
>>> 
>>> I would love to see configurable compression level as well. Would
>>> you want me to add it to my patch set, or should I adapt my patch
>>> set to work on top of it when it is ready?
> 
> Note that as zstd is new, there's no backwards compat to care of,
> thus you are free to use whatever -o/prop syntax you'd like.  If zstd
> ends up being configurable while zlib is not -- oh well, there's no
> reason to use zlib anymore other than for mounting with old kernels,
> in which case we can't use configurable props anyway.  Unlike zlib,
> lzo is not strictly worse than configurable zstd, but it has only one
> level so there's nothing to configure as well.
> 
> Thus, I'd suggest skipping the issue and implement levels for zstd
> only.
I would mostly agree, with one possible exception.  _If_ zlib at the max
level gets similar compression ratios to zstd at its higher levels
_and_ it also gets better performance on at least one aspect (I think
zlib at level 9 will probably compress faster than zstd at level 15,
but I'm not sure about the compression ratio), then I would argue that
it might be worth implementing levels for zlib.
Actually determining that will of course require proper testing, but
that's probably better left as a separate discussion.
> 
> 
> As for testing: I assume you guys are doing stress testing on amd64
> already, right?  I got two crappy arm64 machines but won't be able to
> test there before 4.13 (Icenowy has been lazy and didn't push any
> near-mainline patch set recently; every single Pine64/Pinebook kernel
> pushed by anyone but her didn't work for me so I don't even bother
> trying anymore).  On the other hand, the armhf box I'm running Debian
> rebuilds on is a gem among cheap SoCs.  After Nick fixed the
> flickering workspaces bug, there were no further hiccups; on monday I
> switched to 4.12.0 + btrfs-for-4.13 + zstd (thus luckily avoiding
> that nowait aio bug), also no explosions yet.
Which reminds me, I forgot to mention in my last e-mail, I ran stress
tests over the holiday weekend with pretty much the same kernel
combination on QEMU instances for amd64, i686, armv7a, armv6 (EABI only
on both), arm64, ppc64 (both big and little endian), and 32-bit MIPS
(new ABI only, also both big and little endian), and saw no issues
relating to zstd or BTRFS (I did run into some other issues, but they 
were because I still don't have things quite configured properly for 
this new testing setup).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-05 18:45               ` Austin S. Hemmelgarn
@ 2017-07-05 19:35                 ` Nick Terrell
  2017-07-05 19:57                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Terrell @ 2017-07-05 19:35 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Adam Borowski
  Cc: dsterba@suse.cz, E V, linux-btrfs, Yann Collet


On 7/5/17, 11:45 AM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>On 2017-07-05 14:18, Adam Borowski wrote:
>> On Wed, Jul 05, 2017 at 07:43:27AM -0400, Austin S. Hemmelgarn
>> wrote:
>>> On 2017-06-30 19:01, Nick Terrell wrote:
>>>>> There is also the fact of deciding what to use for the default
>>>>> when specified without a level.  This is easy for lzo and zlib,
>>>>> where we can just use the existing level, but for zstd we would
>>>>> need to decide how to handle a user just specifying 'zstd'
>>>>> without a level.  I agree with E V that level 3 appears to be
>>>>> the turnover point, and would suggest using that for the
>>>>> default.
>>>> 
>>>> I chose level 1 because I thought if we had to choose one
>>>> speed/compression trade off, faster would be better. However,
>>>> with a configurable compression level, level 3 is a great
>>>> default, and is the default of the zstd CLI.
>>> Actually, even if it's not configurable, I would prefer 3, as that
>>> still performs better in both respects (speed and compression
>>> ratio) than zlib while being sufficiently different from lzo
>>> performance to make it easy to decide on one or the other.  As far
>>> as configurable levels for regular usage on a filesystem, there are
>>> only three levels you benchmarked that I would be interested in,
>>> namely level 1 (highly active data on slow storage with a fast
>>> CPU), 3 (stuff I would use zlib for today), and 15 (stuff I would
>>> use out-of-band compression for today (for example, archival
>>> storage)).
>> 
>> If you guys are going to argue between 1 and 3, just go the cracovian
>> deal and settle at 2. :þ

I'll change the default to level 3 in the next update.

>> 
>> But more seriously: zstd looks so much better than lzo and zlib that
>> I'd suggest making it the default compression in cases where there's
>> no way to choose, such as chattr +c.  But then, changing the default
>> before the previous LTS kernel can mount it would be irresponsible --
>> thus, if you can get it into 4.14, we're looking at 4.19 at soonest
>> (or later if we consider distro kernels).
> To be entirely honest, we probably should have switched to LZO as the
> default a while back to put things more in-line with ZFS (which
> traditionally favors performance for in-line compression) and Windows
> (which uses a custom LZ77 derivative that's wicked fast on most modern
> systems).  I would say that as soon as we get to the point that the last 
> two LTS releases support it, zstd should probably become the default.
>
> Also, slightly OT, but I would love to have the ability to set per 
> volume (not subvolume but volume itself) what to use for compression 
> when no algorithm is specified.
>> 
>> Which means the timing is quite tight: if, per DSterba's request,
>> /lib/ parts are going via a non-btrfs tree, there'll be not enough
>> adequate testing in -next.  Thus, would it be possible to have /lib/
>> patches in btrfs-next but not in for-linus?  That would allow testing
>> so you can catch the LTS train.
>> 
>>>>>> So, I don't see any problem making the level configurable.
>>>>> I would actually love to see this, I regularly make use of
>>>>> varying compression both on BTRFS (with separate filesystems)
>>>>> and on the ZFS-based NAS systems we have at work (where it can
>>>>> be set per-dataset) to allow better compression on less
>>>>> frequently accessed data.
>>>> 
>>>> I would love to see configurable compression level as well. Would
>>>> you want me to add it to my patch set, or should I adapt my patch
>>>> set to work on top of it when it is ready?
>> 
>> Note that as zstd is new, there's no backwards compat to care of,
>> thus you are free to use whatever -o/prop syntax you'd like.  If zstd
>> ends up being configurable while zlib is not -- oh well, there's no
>> reason to use zlib anymore other than for mounting with old kernels,
>> in which case we can't use configurable props anyway.  Unlike zlib,
>> lzo is not strictly worse than configurable zstd, but it has only one
>> level so there's nothing to configure as well.
>> 
>> Thus, I'd suggest skipping the issue and implement levels for zstd
>> only.
> I would mostly agree, with one possible exception.  _If_ zlib at the max
> level gets similar compression ratios to zstd at its higher levels
> _and_ it also gets better performance on at least one aspect (I think
> zlib at level 9 will probably have better compression performance than
> zstd at level 15, but I'm not sure about the compression ratio), then I
> would argue that it might be worth implementing levels for zlib.
> Actually determining that will of course require proper testing, but
> that's probably better left as a separate discussion.

For every zlib level, there is at least one zstd level that performs
better in all of compression speed, decompression speed, and compression
ratio. zstd also has levels that compress better than any zlib level, at
the cost of slower compression, while still decompressing faster.

These benchmarks are run in userland, but I expect the results to be the
same in BtrFS. See the lzbench compression benchmark
https://github.com/inikep/lzbench. There are corner cases where zlib can
compress better than zstd, but I wouldn't expect to see them often, and
the gains on other files should offset the small loss on the edge cases.
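A minimal userland sketch of this kind of comparison (illustrative only: it measures the zlib from Python's standard library, since zstd would need the third-party `zstandard` package; the actual numbers in this thread come from lzbench on real corpora):

```python
import time
import zlib

def bench_zlib(level: int, data: bytes) -> tuple:
    """Return (compression ratio, compression speed in MB/s) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(compressed) == data  # round-trip sanity check
    ratio = len(compressed) / len(data)
    speed = len(data) / (1024 * 1024) / max(elapsed, 1e-9)
    return ratio, speed

if __name__ == "__main__":
    # Deterministic, compressible sample; serious benchmarks use a corpus
    # such as Silesia (which is what lzbench reports against).
    sample = (b"the quick brown fox jumps over the lazy dog " * 4000
              + bytes(range(256)) * 50)
    for level in (1, 3, 6, 9):
        ratio, speed = bench_zlib(level, sample)
        print("zlib -%d: ratio=%.4f  %.1f MB/s" % (level, ratio, speed))
```

The same loop run over both codecs, level by level, is enough to check the "dominated at every level" claim for a given workload.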

>> As for testing: I assume you guys are doing stress testing on amd64
>> already, right?  I got two crappy arm64 machines but won't be able to
>> test there before 4.13 (Icenowy has been lazy and didn't push any
>> near-mainline patch set recently; every single Pine64/Pinebook kernel
>> pushed by anyone but her didn't work for me so I don't even bother
>> trying anymore).  On the other hand, the armhf box I'm running Debian
>> rebuilds on is a gem among cheap SoCs.  After Nick fixed the
>> flickering workspaces bug, there were no further hiccups; on monday I
>> switched to 4.12.0 + btrfs-for-4.13 + zstd (thus luckily avoiding
>> that nowait aio bug), also no explosions yet.
> Which reminds me, I forgot to mention in my last e-mail, I ran stress
> tests over the holiday weekend with pretty much the same kernel
> combination on QEMU instances for amd64, i686, armv7a, armv6 (EABI only
> on both), arm64, ppc64 (both big and little endian), and 32-bit MIPS
> (new ABI only, also both big and little endian), and saw no issues
> relating to zstd or BTRFS (I did run into some other issues, but they 
> were because I still don't have things quite configured properly for 
> this new testing setup).

Thanks for running the stress tests! 



* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-05 19:35                 ` Nick Terrell
@ 2017-07-05 19:57                   ` Austin S. Hemmelgarn
  2017-07-06  0:25                     ` Nick Terrell
  0 siblings, 1 reply; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2017-07-05 19:57 UTC (permalink / raw)
  To: Nick Terrell, Adam Borowski
  Cc: dsterba@suse.cz, E V, linux-btrfs, Yann Collet

On 2017-07-05 15:35, Nick Terrell wrote:
> On 7/5/17, 11:45 AM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>> On 2017-07-05 14:18, Adam Borowski wrote:
>>> On Wed, Jul 05, 2017 at 07:43:27AM -0400, Austin S. Hemmelgarn
>>> wrote:
>>>> On 2017-06-30 19:01, Nick Terrell wrote:
>>>>>> There is also the fact of deciding what to use for the default
>>>>>> when specified without a level.  This is easy for lzo and zlib,
>>>>>> where we can just use the existing level, but for zstd we would
>>>>>> need to decide how to handle a user just specifying 'zstd'
>>>>>> without a level.  I agree with E V that level 3 appears to be
>>>>>> the turnover point, and would suggest using that for the
>>>>>> default.
>>>>>
>>>>> I chose level 1 because I thought if we had to choose one
>>>>> speed/compression trade off, faster would be better. However,
>>>>> with a configurable compression level, level 3 is a great
>>>>> default, and is the default of the zstd CLI.
>>>> Actually, even if it's not configurable, I would prefer 3, as that
>>>> still performs better in both respects (speed and compression
>>>> ratio) than zlib while being sufficiently different from lzo
>>>> performance to make it easy to decide on one or the other.  As far
>>>> as configurable levels for regular usage on a filesystem, there are
>>>> only three levels you benchmarked that I would be interested in,
>>>> namely level 1 (highly active data on slow storage with a fast
>>>> CPU), 3 (stuff I would use zlib for today), and 15 (stuff I would
>>>> use out-of-band compression for today (for example, archival
>>>> storage)).
>>>
>>> If you guys are going to argue between 1 and 3, just go the cracovian
>>> deal and settle at 2. :þ
> 
> I'll change the default to level 3 in the next update.
> 
>>>
>>> But more seriously: zstd looks so much better than lzo and zlib that
>>> I'd suggest making it the default compression in cases where there's
>>> no way to choose, such as chattr +c.  But then, changing the default
>>> before the previous LTS kernel can mount it would be irresponsible --
>>> thus, if you can get it into 4.14, we're looking at 4.19 at soonest
>>> (or later if we consider distro kernels).
>> To be entirely honest, we probably should have switched to LZO as the
>> default a while back to put things more in-line with ZFS (which
>> traditionally favors performance for in-line compression) and Windows
>> (which uses a custom LZ77 derivative that's wicked fast on most modern
>> systems).  I would say that as soon as we get to the point that the last
>> two LTS releases support it, zstd should probably become the default.
>>
>> Also, slightly OT, but I would love to have the ability to set per
>> volume (not subvolume but volume itself) what to use for compression
>> when no algorithm is specified.
>>>
>>> Which means the timing is quite tight: if, per DSterba's request,
>>> /lib/ parts are going via a non-btrfs tree, there'll be not enough
>>> adequate testing in -next.  Thus, would it be possible to have /lib/
>>> patches in btrfs-next but not in for-linus?  That would allow testing
>>> so you can catch the LTS train.
>>>
>>>>>>> So, I don't see any problem making the level configurable.
>>>>>> I would actually love to see this, I regularly make use of
>>>>>> varying compression both on BTRFS (with separate filesystems)
>>>>>> and on the ZFS-based NAS systems we have at work (where it can
>>>>>> be set per-dataset) to allow better compression on less
>>>>>> frequently accessed data.
>>>>>
>>>>> I would love to see configurable compression level as well. Would
>>>>> you want me to add it to my patch set, or should I adapt my patch
>>>>> set to work on top of it when it is ready?
>>>
>>> Note that as zstd is new, there's no backwards compat to care of,
>>> thus you are free to use whatever -o/prop syntax you'd like.  If zstd
>>> ends up being configurable while zlib is not -- oh well, there's no
>>> reason to use zlib anymore other than for mounting with old kernels,
>>> in which case we can't use configurable props anyway.  Unlike zlib,
>>> lzo is not strictly worse than configurable zstd, but it has only one
>>> level so there's nothing to configure as well.
>>>
>>> Thus, I'd suggest skipping the issue and implement levels for zstd
>>> only.
>> I would mostly agree, with one possible exception.  _If_ zlib at the max
>> level gets similar compression ratios to zstd at its higher levels
>> _and_ it also gets better performance on at least one aspect (I think
>> zlib at level 9 will probably have better compression performance than
>> zstd at level 15, but I'm not sure about the compression ratio), then I
>> would argue that it might be worth implementing levels for zlib.
>> Actually determining that will of course require proper testing, but
>> that's probably better left as a separate discussion.
> 
> For every zlib level, there is at least one zstd level that performs
> better in all of compression speed, decompression speed, and compression
> ratio. zstd also has levels that compress better than any zlib level, at
> the cost of slower compression, while still decompressing faster.
It's the slower compression speed that has me arguing for the 
possibility of configurable levels on zlib.  11MB/s is painfully slow 
considering that most decent HDDs these days can get almost 5-10x that 
speed with no compression.  There are cases (WORM pattern archival 
storage for example) where slow writes to that degree may be acceptable, 
but for most users they won't be, and zlib at level 9 would probably be 
a better choice.  I don't think it can beat zstd at level 15 for 
compression ratio, but if they're even close, then zlib would still be a 
better option at that high of a compression level most of the time.
> 
> These benchmarks are run in userland, but I expect the results to be the
> same in BtrFS. See the lzbench compression benchmark
> https://github.com/inikep/lzbench. There are corner cases where zlib can
> compress better than zstd, but I wouldn't expect to see them often, and
> the gains on other files should offset the small loss on the edge cases.
> 
>>> As for testing: I assume you guys are doing stress testing on amd64
>>> already, right?  I got two crappy arm64 machines but won't be able to
>>> test there before 4.13 (Icenowy has been lazy and didn't push any
>>> near-mainline patch set recently; every single Pine64/Pinebook kernel
>>> pushed by anyone but her didn't work for me so I don't even bother
>>> trying anymore).  On the other hand, the armhf box I'm running Debian
>>> rebuilds on is a gem among cheap SoCs.  After Nick fixed the
>>> flickering workspaces bug, there were no further hiccups; on monday I
>>> switched to 4.12.0 + btrfs-for-4.13 + zstd (thus luckily avoiding
>>> that nowait aio bug), also no explosions yet.
>> Which reminds me, I forgot to mention in my last e-mail, I ran stress
>> tests over the holiday weekend with pretty much the same kernel
>> combination on QEMU instances for amd64, i686, armv7a, armv6 (EABI only
>> on both), arm64, ppc64 (both big and little endian), and 32-bit MIPS
>> (new ABI only, also both big and little endian), and saw no issues
>> relating to zstd or BTRFS (I did run into some other issues, but they
>> were because I still don't have things quite configured properly for
>> this new testing setup).
> 
> Thanks for running the stress tests!
> 
Always glad to help.


* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-05 19:57                   ` Austin S. Hemmelgarn
@ 2017-07-06  0:25                     ` Nick Terrell
  2017-07-06 11:59                       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Terrell @ 2017-07-06  0:25 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Adam Borowski
  Cc: dsterba@suse.cz, E V, linux-btrfs, Yann Collet


On 7/5/17, 12:57 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> It's the slower compression speed that has me arguing for the 
> possibility of configurable levels on zlib.  11MB/s is painfully slow 
> considering that most decent HDD's these days can get almost 5-10x that 
> speed with no compression.  There are cases (WORM pattern archival 
> storage for example) where slow writes to that degree may be acceptable, 
> but for most users they won't be, and zlib at level 9 would probably be 
> a better choice.  I don't think it can beat zstd at level 15 for 
> compression ratio, but if they're even close, then zlib would still be a 
> better option at that high of a compression level most of the time.

I don't imagine the very high zstd levels would be useful to many btrfs
users, except in rare cases. However, lower levels of zstd should
outperform zlib level 9 in all aspects except memory usage. I would expect
zstd level 7 to compress as well as or better than zlib 9, with faster
compression and decompression speed. It's worth benchmarking to ensure that
this holds for many different workloads, but I wouldn't expect zlib 9 to
compress better than zstd 7 often. zstd up to level 12 should compress as
fast as or faster than zlib level 9. zstd levels 12 and beyond allow
stronger compression than zlib, at the cost of slower compression and more
memory usage.

Supporting multiple zlib compression levels could be interesting for older
kernels, lower memory usage, or backwards compatibility with older btrfs
versions. But for every zlib level, zstd has a level that provides better
compression ratio, compression speed, and decompression speed.




* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-06  0:25                     ` Nick Terrell
@ 2017-07-06 11:59                       ` Austin S. Hemmelgarn
  2017-07-06 12:09                         ` Lionel Bouton
  0 siblings, 1 reply; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2017-07-06 11:59 UTC (permalink / raw)
  To: Nick Terrell, Adam Borowski
  Cc: dsterba@suse.cz, E V, linux-btrfs, Yann Collet

On 2017-07-05 20:25, Nick Terrell wrote:
> On 7/5/17, 12:57 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>> It's the slower compression speed that has me arguing for the
>> possibility of configurable levels on zlib.  11MB/s is painfully slow
>> considering that most decent HDD's these days can get almost 5-10x that
>> speed with no compression.  There are cases (WORM pattern archival
>> storage for example) where slow writes to that degree may be acceptable,
>> but for most users they won't be, and zlib at level 9 would probably be
>> a better choice.  I don't think it can beat zstd at level 15 for
>> compression ratio, but if they're even close, then zlib would still be a
>> better option at that high of a compression level most of the time.
> 
> I don't imagine the very high zstd levels would be useful to too many
> btrfs users, except in rare cases. However, lower levels of zstd should
> outperform zlib level 9 in all aspects except memory usage. I would expect
> zstd level 7 would compress as well as or better than zlib 9 with faster
> compression and decompression speed. It's worth benchmarking to ensure that
> it holds for many different workloads, but I wouldn't expect zlib 9 to
> compress better than zstd 7 often. zstd up to level 12 should compress as
> fast as or faster than zlib level 9. zstd levels 12 and beyond allow
> stronger compression than zlib, at the cost of slow compression and more
> memory usage.
While I generally agree that most people probably won't use zstd levels 
above 3, it shouldn't be hard to support them if we're going to have 
configurable compression levels, so I would argue that it's still worth 
supporting anyway.

Looking at it another way, ZFS (at least, ZFS on FreeBSD) supports zlib 
compression (they call it gzip) with selectable compression levels, but 
95% of the time nobody uses anything but levels 1, 3, and 9.  Despite 
this, they still support other levels, and I have seen cases where other 
levels have been better than one of those three 'normal' ones.
> 
> Supporting multiple zlib compression levels could be interesting for older
> kernels, lower memory usage, or backwards compatibility with older btrfs
> versions. But for every zlib level, zstd has a level that provides better
> compression ratio, compression speed, and decompression speed.
Just the memory footprint is a remarkably strong argument in many cases.
It appears to be one of the few things that zlib does better than zstd
(although I'm not 100% certain about that), and can make a very big
difference when dealing with small systems.
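For reference, zlib's memory footprint is itself tunable: `wbits` and `memLevel` are real parameters of the deflate API (exposed in Python as `zlib.compressobj`), and the byte estimate below is the approximation documented in zlib's zconf.h. A userland sketch:

```python
import zlib

def deflate_mem_bytes(wbits: int, mem_level: int) -> int:
    """Approximate deflate state memory, per zlib's zconf.h:
    (1 << (windowBits + 2)) + (1 << (memLevel + 9))."""
    return (1 << (wbits + 2)) + (1 << (mem_level + 9))

def compress_small_footprint(data: bytes, wbits: int = 9,
                             mem_level: int = 1) -> bytes:
    """Compress with a reduced window and hash memory; this costs some
    ratio, but the working set drops from ~256 KiB to a few KiB."""
    c = zlib.compressobj(level=9, method=zlib.DEFLATED,
                         wbits=wbits, memLevel=mem_level)
    return c.compress(data) + c.flush()

if __name__ == "__main__":
    print("default (wbits=15, memLevel=8):", deflate_mem_bytes(15, 8), "B")
    print("small   (wbits=9,  memLevel=1):", deflate_mem_bytes(9, 1), "B")
    blob = compress_small_footprint(b"abc" * 10000)
    # The decompressor only needs the (small) window: 1 << wbits bytes.
    print(len(zlib.decompressobj(wbits=9).decompress(blob)))
```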


* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-06 11:59                       ` Austin S. Hemmelgarn
@ 2017-07-06 12:09                         ` Lionel Bouton
  2017-07-06 12:27                           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 32+ messages in thread
From: Lionel Bouton @ 2017-07-06 12:09 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Nick Terrell, Adam Borowski
  Cc: dsterba@suse.cz, E V, linux-btrfs, Yann Collet

Le 06/07/2017 à 13:59, Austin S. Hemmelgarn a écrit :
> On 2017-07-05 20:25, Nick Terrell wrote:
>> On 7/5/17, 12:57 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
>> wrote:
>>> It's the slower compression speed that has me arguing for the
>>> possibility of configurable levels on zlib.  11MB/s is painfully slow
>>> considering that most decent HDD's these days can get almost 5-10x that
>>> speed with no compression.  There are cases (WORM pattern archival
>>> storage for example) where slow writes to that degree may be
>>> acceptable,
>>> but for most users they won't be, and zlib at level 9 would probably be
>>> a better choice.  I don't think it can beat zstd at level 15 for
>>> compression ratio, but if they're even close, then zlib would still
>>> be a
>>> better option at that high of a compression level most of the time.
>>
>> I don't imagine the very high zstd levels would be useful to too many
>> btrfs users, except in rare cases. However, lower levels of zstd should
>> outperform zlib level 9 in all aspects except memory usage. I would
>> expect
>> zstd level 7 would compress as well as or better than zlib 9 with faster
>> compression and decompression speed. It's worth benchmarking to
>> ensure that
>> it holds for many different workloads, but I wouldn't expect zlib 9 to
>> compress better than zstd 7 often. zstd up to level 12 should
>> compress as
>> fast as or faster than zlib level 9. zstd levels 12 and beyond allow
>> stronger compression than zlib, at the cost of slow compression and more
>> memory usage.
> While I generally agree that most people probably won't use zstd
> levels above 3, it shouldn't be hard to support them if we're going to
> have configurable compression levels, so I would argue that it's still
> worth supporting anyway.

One use case for the higher compression levels would be manual
defragmentation with recompression for a subset of data (typically files
that won't be updated and are stored for long periods). The filesystem
would be mounted with a low level for low-latency general usage, and the
subset of files would be recompressed asynchronously with a high level.
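A sketch of that workflow (the selection policy, the thresholds, and the helper names here are hypothetical; `btrfs filesystem defragment -czstd` is the existing btrfs-progs interface for rewriting a file with a given compression algorithm, without per-level selection):

```python
import subprocess
import time
from pathlib import Path
from typing import Optional

RECOMPRESS_AGE_DAYS = 30   # hypothetical policy: "cold" after a month
MIN_SIZE = 128 * 1024      # skip small files; little to gain

def is_recompress_candidate(mtime: float, size: int,
                            now: Optional[float] = None) -> bool:
    """Cold, reasonably large files are worth a stronger recompression."""
    now = time.time() if now is None else now
    return (now - mtime) > RECOMPRESS_AGE_DAYS * 86400 and size >= MIN_SIZE

def recompress_cold_files(root: Path) -> None:
    """Walk a subtree and rewrite cold files with zstd compression."""
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        if is_recompress_candidate(st.st_mtime, st.st_size):
            subprocess.run(["btrfs", "filesystem", "defragment",
                            "-czstd", str(path)], check=True)
```

Run from cron or a timer against the archive directories, this leaves the mount-time default low while cold data gets the stronger compression.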

Best regards,

Lionel


* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-06 12:09                         ` Lionel Bouton
@ 2017-07-06 12:27                           ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2017-07-06 12:27 UTC (permalink / raw)
  To: Lionel Bouton, Nick Terrell, Adam Borowski
  Cc: dsterba@suse.cz, E V, linux-btrfs, Yann Collet

On 2017-07-06 08:09, Lionel Bouton wrote:
> Le 06/07/2017 à 13:59, Austin S. Hemmelgarn a écrit :
>> On 2017-07-05 20:25, Nick Terrell wrote:
>>> On 7/5/17, 12:57 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
>>> wrote:
>>>> It's the slower compression speed that has me arguing for the
>>>> possibility of configurable levels on zlib.  11MB/s is painfully slow
>>>> considering that most decent HDD's these days can get almost 5-10x that
>>>> speed with no compression.  There are cases (WORM pattern archival
>>>> storage for example) where slow writes to that degree may be
>>>> acceptable,
>>>> but for most users they won't be, and zlib at level 9 would probably be
>>>> a better choice.  I don't think it can beat zstd at level 15 for
>>>> compression ratio, but if they're even close, then zlib would still
>>>> be a
>>>> better option at that high of a compression level most of the time.
>>>
>>> I don't imagine the very high zstd levels would be useful to too many
>>> btrfs users, except in rare cases. However, lower levels of zstd should
>>> outperform zlib level 9 in all aspects except memory usage. I would
>>> expect
>>> zstd level 7 would compress as well as or better than zlib 9 with faster
>>> compression and decompression speed. It's worth benchmarking to
>>> ensure that
>>> it holds for many different workloads, but I wouldn't expect zlib 9 to
>>> compress better than zstd 7 often. zstd up to level 12 should
>>> compress as
>>> fast as or faster than zlib level 9. zstd levels 12 and beyond allow
>>> stronger compression than zlib, at the cost of slow compression and more
>>> memory usage.
>> While I generally agree that most people probably won't use zstd
>> levels above 3, it shouldn't be hard to support them if we're going to
>> have configurable compression levels, so I would argue that it's still
>> worth supporting anyway.
> 
> One use case for the higher compression levels would be manual
> defragmentation with recompression for a subset of data (files that
> won't be updated and are stored for long periods typically). The
> filesystem would be mounted with a low level for general usage low
> latencies and the subset of files would be recompressed with a high
> level asynchronously.
That's one of the cases I was thinking of; we actually do that on a 
couple of systems where I work that see mostly WORM access patterns, so 
zstd level 15's insanely good decompression speed would be great for this.



* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-29 19:41 ` [PATCH v2 3/4] btrfs: Add zstd support Nick Terrell
  2017-06-30  3:24   ` Adam Borowski
  2017-06-30 12:16   ` E V
@ 2017-07-06 16:32   ` Adam Borowski
  2017-07-07 23:17     ` Nick Terrell
  2017-07-18 18:21   ` David Sterba
  3 siblings, 1 reply; 32+ messages in thread
From: Adam Borowski @ 2017-07-06 16:32 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, David Sterba, linux-btrfs,
	linux-kernel

On Thu, Jun 29, 2017 at 12:41:07PM -0700, Nick Terrell wrote:
> Add zstd compression and decompression support to BtrFS. zstd at its
> fastest level compresses almost as well as zlib, while offering much
> faster compression and decompression, approaching lzo speeds.

Got a reproducible crash on amd64:

[98235.266511] BUG: unable to handle kernel paging request at ffffc90001251000
[98235.267485] IP: ZSTD_storeSeq.constprop.24+0x67/0xe0
[98235.269395] PGD 227034067 
[98235.269397] P4D 227034067 
[98235.271587] PUD 227035067 
[98235.273657] PMD 223323067 
[98235.275744] PTE 0

[98235.281545] Oops: 0002 [#1] SMP
[98235.283353] Modules linked in: loop veth tun fuse arc4 rtl8xxxu mac80211 cfg80211 cp210x pl2303 rfkill usbserial nouveau video mxm_wmi ttm
[98235.285203] CPU: 0 PID: 10850 Comm: kworker/u12:9 Not tainted 4.12.0+ #1
[98235.287070] Hardware name: System manufacturer System Product Name/M4A77T, BIOS 2401    05/18/2011
[98235.288964] Workqueue: btrfs-delalloc btrfs_delalloc_helper
[98235.290934] task: ffff880224984140 task.stack: ffffc90007e5c000
[98235.292731] RIP: 0010:ZSTD_storeSeq.constprop.24+0x67/0xe0
[98235.294579] RSP: 0018:ffffc90007e5fa68 EFLAGS: 00010282
[98235.296395] RAX: ffffc90001251001 RBX: 0000000000000094 RCX: ffffc9000118f930
[98235.298380] RDX: 0000000000000006 RSI: ffffc900011b06b0 RDI: ffffc9000118d1e0
[98235.300321] RBP: 000000000000009f R08: 1fffffffffffbe58 R09: 0000000000000000
[98235.302282] R10: ffffc9000118f970 R11: 0000000000000005 R12: ffffc9000118f878
[98235.304221] R13: 000000000000005b R14: ffffc9000118f915 R15: ffffc900011cfe88
[98235.306147] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
[98235.308162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[98235.310129] CR2: ffffc90001251000 CR3: 000000021018d000 CR4: 00000000000006f0
[98235.312095] Call Trace:
[98235.314008]  ? ZSTD_compressBlock_fast+0x94b/0xb30
[98235.315975]  ? ZSTD_compressContinue_internal+0x1a0/0x580
[98235.317938]  ? ZSTD_compressStream_generic+0x248/0x2f0
[98235.319877]  ? ZSTD_compressStream+0x41/0x60
[98235.321821]  ? zstd_compress_pages+0x236/0x5d0
[98235.323724]  ? btrfs_compress_pages+0x5e/0x80
[98235.325684]  ? compress_file_range.constprop.79+0x1eb/0x750
[98235.327668]  ? async_cow_start+0x2e/0x50
[98235.329594]  ? btrfs_worker_helper+0x1b9/0x1d0
[98235.331486]  ? process_one_work+0x158/0x2f0
[98235.333361]  ? worker_thread+0x45/0x3a0
[98235.335253]  ? process_one_work+0x2f0/0x2f0
[98235.337189]  ? kthread+0x10e/0x130
[98235.339020]  ? kthread_park+0x60/0x60
[98235.340819]  ? ret_from_fork+0x22/0x30
[98235.342637] Code: 8b 4e d0 4c 89 48 d0 4c 8b 4e d8 4c 89 48 d8 4c 8b 4e e0 4c 89 48 e0 4c 8b 4e e8 4c 89 48 e8 4c 8b 4e f0 4c 89 48 f0 4c 8b 4e f8 <4c> 89 48 f8 48 39 f1 75 a2 4e 8d 04 c0 48 8b 31 48 83 c0 08 48 
[98235.346773] RIP: ZSTD_storeSeq.constprop.24+0x67/0xe0 RSP: ffffc90007e5fa68
[98235.348809] CR2: ffffc90001251000
[98235.363216] ---[ end trace 5fb3ad0f2aec0605 ]---
[98235.363218] BUG: unable to handle kernel paging request at ffffc9000393a000
[98235.363239] IP: ZSTD_storeSeq.constprop.24+0x67/0xe0
[98235.363241] PGD 227034067 
[98235.363242] P4D 227034067 
[98235.363243] PUD 227035067 
[98235.363244] PMD 21edec067 
[98235.363245] PTE 0
(More of the above follows.)

My reproducer copies an uncompressed tarball onto a fresh filesystem:
.----
#!/bin/sh
set -e

losetup -D; umount /mnt/vol1 ||:
dd if=/dev/zero of=/tmp/disk bs=2048 seek=1048575 count=1
mkfs.btrfs -msingle /tmp/disk
losetup -f /tmp/disk
sleep 1 # yay udev races
mount -onoatime,compress=$1 /dev/loop0 /mnt/vol1
time sh -c 'cp -p ~kilobyte/tmp/kernel.tar /mnt/vol1 && umount /mnt/vol1'
losetup -D
`----
(run it with arg of "zstd")

Kernel is 4.12.0 + btrfs-for-4.13 + v4 of Qu's chunk check + some unrelated
stuff + zstd; in case it matters I've pushed my tree to
https://github.com/kilobyte/linux/tree/zstd-crash

The payload is a tarball of the above, but, for debugging compression you
need the exact byte stream.  https://angband.pl/tmp/kernel.tar.xz --
without xz, I compressed it for transport.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-06 16:32   ` Adam Borowski
@ 2017-07-07 23:17     ` Nick Terrell
  2017-07-07 23:40       ` Adam Borowski
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Terrell @ 2017-07-07 23:17 UTC (permalink / raw)
  To: Adam Borowski
  Cc: Kernel Team, Chris Mason, Yann Collet, David Sterba,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org


On 7/6/17, 9:32 AM, "Adam Borowski" <kilobyte@angband.pl> wrote:
> On Thu, Jun 29, 2017 at 12:41:07PM -0700, Nick Terrell wrote:
>> Add zstd compression and decompression support to BtrFS. zstd at its
>> fastest level compresses almost as well as zlib, while offering much
>> faster compression and decompression, approaching lzo speeds.
>
> Got a reproducible crash on amd64:
>
> [98235.266511] BUG: unable to handle kernel paging request at ffffc90001251000
> [98235.267485] IP: ZSTD_storeSeq.constprop.24+0x67/0xe0
> [98235.269395] PGD 227034067 
> [98235.269397] P4D 227034067 
> [98235.271587] PUD 227035067 
> [98235.273657] PMD 223323067 
> [98235.275744] PTE 0
>
> [98235.281545] Oops: 0002 [#1] SMP
> [98235.283353] Modules linked in: loop veth tun fuse arc4 rtl8xxxu mac80211 cfg80211 cp210x pl2303 rfkill usbserial nouveau video mxm_wmi ttm
> [98235.285203] CPU: 0 PID: 10850 Comm: kworker/u12:9 Not tainted 4.12.0+ #1
> [98235.287070] Hardware name: System manufacturer System Product Name/M4A77T, BIOS 2401    05/18/2011
> [98235.288964] Workqueue: btrfs-delalloc btrfs_delalloc_helper
> [98235.290934] task: ffff880224984140 task.stack: ffffc90007e5c000
> [98235.292731] RIP: 0010:ZSTD_storeSeq.constprop.24+0x67/0xe0
> [98235.294579] RSP: 0018:ffffc90007e5fa68 EFLAGS: 00010282
> [98235.296395] RAX: ffffc90001251001 RBX: 0000000000000094 RCX: ffffc9000118f930
> [98235.298380] RDX: 0000000000000006 RSI: ffffc900011b06b0 RDI: ffffc9000118d1e0
> [98235.300321] RBP: 000000000000009f R08: 1fffffffffffbe58 R09: 0000000000000000
> [98235.302282] R10: ffffc9000118f970 R11: 0000000000000005 R12: ffffc9000118f878
> [98235.304221] R13: 000000000000005b R14: ffffc9000118f915 R15: ffffc900011cfe88
> [98235.306147] FS:  0000000000000000(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000
> [98235.308162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [98235.310129] CR2: ffffc90001251000 CR3: 000000021018d000 CR4: 00000000000006f0
> [98235.312095] Call Trace:
> [98235.314008]  ? ZSTD_compressBlock_fast+0x94b/0xb30
> [98235.315975]  ? ZSTD_compressContinue_internal+0x1a0/0x580
> [98235.317938]  ? ZSTD_compressStream_generic+0x248/0x2f0
> [98235.319877]  ? ZSTD_compressStream+0x41/0x60
> [98235.321821]  ? zstd_compress_pages+0x236/0x5d0
> [98235.323724]  ? btrfs_compress_pages+0x5e/0x80
> [98235.325684]  ? compress_file_range.constprop.79+0x1eb/0x750
> [98235.327668]  ? async_cow_start+0x2e/0x50
> [98235.329594]  ? btrfs_worker_helper+0x1b9/0x1d0
> [98235.331486]  ? process_one_work+0x158/0x2f0
> [98235.333361]  ? worker_thread+0x45/0x3a0
> [98235.335253]  ? process_one_work+0x2f0/0x2f0
> [98235.337189]  ? kthread+0x10e/0x130
> [98235.339020]  ? kthread_park+0x60/0x60
> [98235.340819]  ? ret_from_fork+0x22/0x30
> [98235.342637] Code: 8b 4e d0 4c 89 48 d0 4c 8b 4e d8 4c 89 48 d8 4c 8b 4e e0 4c 89 48 e0 4c 8b 4e e8 4c 89 48 e8 4c 8b 4e f0 4c 89 48 f0 4c 8b 4e f8 <4c> 89 48 f8 48 39 f1 75 a2 4e 8d 04 c0 48 8b 31 48 83 c0 08 48 
> [98235.346773] RIP: ZSTD_storeSeq.constprop.24+0x67/0xe0 RSP: ffffc90007e5fa68
> [98235.348809] CR2: ffffc90001251000
> [98235.363216] ---[ end trace 5fb3ad0f2aec0605 ]---
> [98235.363218] BUG: unable to handle kernel paging request at ffffc9000393a000
> [98235.363239] IP: ZSTD_storeSeq.constprop.24+0x67/0xe0
> [98235.363241] PGD 227034067 
> [98235.363242] P4D 227034067 
> [98235.363243] PUD 227035067 
> [98235.363244] PMD 21edec067 
> [98235.363245] PTE 0
> (More of the above follows.)
>
> My reproducer copies an uncompressed tarball onto a fresh filesystem:
> .----
> #!/bin/sh
> set -e
>
> losetup -D; umount /mnt/vol1 ||:
> dd if=/dev/zero of=/tmp/disk bs=2048 seek=1048575 count=1
> mkfs.btrfs -msingle /tmp/disk
> losetup -f /tmp/disk
> sleep 1 # yay udev races
> mount -onoatime,compress=$1 /dev/loop0 /mnt/vol1
> time sh -c 'cp -p ~kilobyte/tmp/kernel.tar /mnt/vol1 && umount /mnt/vol1'
> losetup -D
> `----
> (run it with arg of "zstd")
>
> Kernel is 4.12.0 + btrfs-for-4.13 + v4 of Qu's chunk check + some unrelated
> stuff + zstd; in case it matters I've pushed my tree to
> https://github.com/kilobyte/linux/tree/zstd-crash
>
> The payload is a tarball of the above, but, for debugging compression you
> need the exact byte stream.  https://angband.pl/tmp/kernel.tar.xz  --
> without xz, I compressed it for transport.

Thanks for the bug report Adam! I'm looking into the failure, and haven't
been able to reproduce it yet. I've built my kernel from your tree, and
I ran your script with the kernel.tar tarball 100 times, but haven't gotten
a failure yet. I have a few questions to guide my debugging.

- How many cores are you running with? I’ve run the script with 1, 2, and 4 cores.
- Which version of gcc are you using to compile the kernel? I’m using gcc-6.2.0-5ubuntu12.
- Are the failures always in exactly the same place, and does it fail 100%
  of the time or just regularly?



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-07 23:17     ` Nick Terrell
@ 2017-07-07 23:40       ` Adam Borowski
  2017-07-08  3:07         ` Adam Borowski
  0 siblings, 1 reply; 32+ messages in thread
From: Adam Borowski @ 2017-07-07 23:40 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Kernel Team, Chris Mason, Yann Collet, David Sterba,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Jul 07, 2017 at 11:17:49PM +0000, Nick Terrell wrote:
> On 7/6/17, 9:32 AM, "Adam Borowski" <kilobyte@angband.pl> wrote:
> > Got a reproducible crash on amd64:
> >
> > [98235.266511] BUG: unable to handle kernel paging request at ffffc90001251000

> > [98235.314008]  ? ZSTD_compressBlock_fast+0x94b/0xb30
> > [98235.315975]  ? ZSTD_compressContinue_internal+0x1a0/0x580
> > [98235.317938]  ? ZSTD_compressStream_generic+0x248/0x2f0
> > [98235.319877]  ? ZSTD_compressStream+0x41/0x60
> > [98235.321821]  ? zstd_compress_pages+0x236/0x5d0
> > [98235.323724]  ? btrfs_compress_pages+0x5e/0x80
> > [98235.325684]  ? compress_file_range.constprop.79+0x1eb/0x750
> 
> Thanks for the bug report Adam! I'm looking into the failure, and haven't
> been able to reproduce it yet. I've built my kernel from your tree, and
> I ran your script with the kernel.tar tarball 100 times, but haven't gotten
> a failure yet.

Crashed the same way 4 out of 4 tries for me.

> I have a few questions to guide my debugging.
> 
> - How many cores are you running with? I’ve run the script with 1, 2, and 4 cores.
> - Which version of gcc are you using to compile the kernel? I’m using gcc-6.2.0-5ubuntu12.
> - Are the failures always in exactly the same place, and does it fail 100%
>   of the time or just regularly?

6 cores -- all on bare metal.  gcc-7.1.0-9.

Lemme try with gcc-6, a different config or in a VM.

-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-07 23:40       ` Adam Borowski
@ 2017-07-08  3:07         ` Adam Borowski
  2017-07-10 12:36           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 32+ messages in thread
From: Adam Borowski @ 2017-07-08  3:07 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Kernel Team, Chris Mason, Yann Collet, David Sterba,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On Sat, Jul 08, 2017 at 01:40:18AM +0200, Adam Borowski wrote:
> On Fri, Jul 07, 2017 at 11:17:49PM +0000, Nick Terrell wrote:
> > On 7/6/17, 9:32 AM, "Adam Borowski" <kilobyte@angband.pl> wrote:
> > > Got a reproducible crash on amd64:
>
> > Thanks for the bug report Adam! I'm looking into the failure, and haven't
> > been able to reproduce it yet. I've built my kernel from your tree, and
> > I ran your script with the kernel.tar tarball 100 times, but haven't gotten
> > a failure yet.
> 
> > I have a few questions to guide my debugging.
> > 
> > - How many cores are you running with? I’ve run the script with 1, 2, and 4 cores.
> > - Which version of gcc are you using to compile the kernel? I’m using gcc-6.2.0-5ubuntu12.
> > - Are the failures always in exactly the same place, and does it fail 100%
> >   of the time or just regularly?
> 
> 6 cores -- all on bare metal.  gcc-7.1.0-9.
> Lemme try with gcc-6, a different config or in a VM.

I've tried the following:
* gcc-6, defconfig (+btrfs obviously)
* gcc-7, defconfig
* gcc-6, my regular config
* gcc-7, my regular config
* gcc-7, debug + UBSAN + etc
* gcc-7, defconfig, qemu-kvm with only 1 core

Every build with gcc-7 reproduces the crash, every with gcc-6 does not.

-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-08  3:07         ` Adam Borowski
@ 2017-07-10 12:36           ` Austin S. Hemmelgarn
  2017-07-10 20:57             ` Nick Terrell
  2017-07-11  4:57             ` Nick Terrell
  0 siblings, 2 replies; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2017-07-10 12:36 UTC (permalink / raw)
  To: Adam Borowski, Nick Terrell
  Cc: Kernel Team, Chris Mason, Yann Collet, David Sterba,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On 2017-07-07 23:07, Adam Borowski wrote:
> On Sat, Jul 08, 2017 at 01:40:18AM +0200, Adam Borowski wrote:
>> On Fri, Jul 07, 2017 at 11:17:49PM +0000, Nick Terrell wrote:
>>> On 7/6/17, 9:32 AM, "Adam Borowski" <kilobyte@angband.pl> wrote:
>>>> Got a reproducible crash on amd64:
>>
>>> Thanks for the bug report Adam! I'm looking into the failure, and haven't
>>> been able to reproduce it yet. I've built my kernel from your tree, and
>>> I ran your script with the kernel.tar tarball 100 times, but haven't gotten
>>> a failure yet.
>>
>>> I have a few questions to guide my debugging.
>>>
>>> - How many cores are you running with? I’ve run the script with 1, 2, and 4 cores.
>>> - Which version of gcc are you using to compile the kernel? I’m using gcc-6.2.0-5ubuntu12.
>>> - Are the failures always in exactly the same place, and does it fail 100%
>>>    of the time or just regularly?
>>
>> 6 cores -- all on bare metal.  gcc-7.1.0-9.
>> Lemme try with gcc-6, a different config or in a VM.
> 
> I've tried the following:
> * gcc-6, defconfig (+btrfs obviously)
> * gcc-7, defconfig
> * gcc-6, my regular config
> * gcc-7, my regular config
> * gcc-7, debug + UBSAN + etc
> * gcc-7, defconfig, qemu-kvm with only 1 core
> 
> Every build with gcc-7 reproduces the crash, every with gcc-6 does not.
> 
Got a GCC7 tool-chain built, and I can confirm this here too, tested 
with various numbers of cores ranging from 1-32 in a QEMU+KVM VM, with 
various combinations of debug options and other config switches.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-10 12:36           ` Austin S. Hemmelgarn
@ 2017-07-10 20:57             ` Nick Terrell
  2017-07-11  4:57             ` Nick Terrell
  1 sibling, 0 replies; 32+ messages in thread
From: Nick Terrell @ 2017-07-10 20:57 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Adam Borowski
  Cc: Kernel Team, Chris Mason, Yann Collet, David Sterba,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org


On 7/10/17, 5:36 AM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> On 2017-07-07 23:07, Adam Borowski wrote:
>> On Sat, Jul 08, 2017 at 01:40:18AM +0200, Adam Borowski wrote:
>>> On Fri, Jul 07, 2017 at 11:17:49PM +0000, Nick Terrell wrote:
>>>> On 7/6/17, 9:32 AM, "Adam Borowski" <kilobyte@angband.pl> wrote:
>>>>> Got a reproducible crash on amd64:
>>>
>>>> Thanks for the bug report Adam! I'm looking into the failure, and haven't
>>>> been able to reproduce it yet. I've built my kernel from your tree, and
>>>> I ran your script with the kernel.tar tarball 100 times, but haven't gotten
>>>> a failure yet.
>>>
>>>> I have a few questions to guide my debugging.
>>>>
>>>> - How many cores are you running with? I’ve run the script with 1, 2, and 4 cores.
>>>> - Which version of gcc are you using to compile the kernel? I’m using gcc-6.2.0-5ubuntu12.
>>>> - Are the failures always in exactly the same place, and does it fail 100%
>>>>    of the time or just regularly?
>>>
>>> 6 cores -- all on bare metal.  gcc-7.1.0-9.
>>> Lemme try with gcc-6, a different config or in a VM.
>> 
>> I've tried the following:
>> * gcc-6, defconfig (+btrfs obviously)
>> * gcc-7, defconfig
>> * gcc-6, my regular config
>> * gcc-7, my regular config
>> * gcc-7, debug + UBSAN + etc
>> * gcc-7, defconfig, qemu-kvm with only 1 core
>>
>> Every build with gcc-7 reproduces the crash, every with gcc-6 does not.
>>
> Got a GCC7 tool-chain built, and I can confirm this here too, tested 
> with various numbers of cores ranging from 1-32 in a QEMU+KVM VM, with 
> various combinations of debug options and other config switches.

I was running in an Ubuntu 16.10 VM on a MacBook Pro. I built with gcc-6.2
with KASAN and couldn't trigger it, as expected. I also built gcc-7.1.0 from
source, but still couldn't reproduce it there. However, when I set up
qemu-kvm on another device and compiled with the same gcc-7.1.0 built from
source, I was able to reproduce the bug. Now that I can reproduce it, I'll
look into a fix. Thanks Adam and Austin for finding, reproducing, and
verifying the bug.
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-30 12:16   ` E V
  2017-06-30 14:21     ` David Sterba
@ 2017-07-10 21:11     ` Clemens Eisserer
  1 sibling, 0 replies; 32+ messages in thread
From: Clemens Eisserer @ 2017-07-10 21:11 UTC (permalink / raw)
  To: linux-btrfs

> So, I don't see any problem making the level configurable.

+1 - a configurable compression level would be much appreciated from my side.
Can't wait until zstd support is mainline :)

Thanks and br, Clemens

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-10 12:36           ` Austin S. Hemmelgarn
  2017-07-10 20:57             ` Nick Terrell
@ 2017-07-11  4:57             ` Nick Terrell
  2017-07-11  6:01               ` Nick Terrell
  1 sibling, 1 reply; 32+ messages in thread
From: Nick Terrell @ 2017-07-11  4:57 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Adam Borowski
  Cc: Kernel Team, Chris Mason, Yann Collet, David Sterba,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org


On 7/10/17, 5:36 AM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> On 2017-07-07 23:07, Adam Borowski wrote:
>> On Sat, Jul 08, 2017 at 01:40:18AM +0200, Adam Borowski wrote:
>>> On Fri, Jul 07, 2017 at 11:17:49PM +0000, Nick Terrell wrote:
>>>> On 7/6/17, 9:32 AM, "Adam Borowski" <kilobyte@angband.pl> wrote:
>>>>> Got a reproducible crash on amd64:
>>>
>>>> Thanks for the bug report Adam! I'm looking into the failure, and haven't
>>>> been able to reproduce it yet. I've built my kernel from your tree, and
>>>> I ran your script with the kernel.tar tarball 100 times, but haven't gotten
>>>> a failure yet.
>>>
>>>> I have a few questions to guide my debugging.
>>>>
>>>> - How many cores are you running with? I’ve run the script with 1, 2, and 4 cores.
>>>> - Which version of gcc are you using to compile the kernel? I’m using gcc-6.2.0-5ubuntu12.
>>>> - Are the failures always in exactly the same place, and does it fail 100%
>>>>    of the time or just regularly?
>>>
>>> 6 cores -- all on bare metal.  gcc-7.1.0-9.
>>> Lemme try with gcc-6, a different config or in a VM.
>>
>> I've tried the following:
>> * gcc-6, defconfig (+btrfs obviously)
>> * gcc-7, defconfig
>> * gcc-6, my regular config
>> * gcc-7, my regular config
>> * gcc-7, debug + UBSAN + etc
>> * gcc-7, defconfig, qemu-kvm with only 1 core
>>
>> Every build with gcc-7 reproduces the crash, every with gcc-6 does not.
>>
> Got a GCC7 tool-chain built, and I can confirm this here too, tested 
> with various numbers of cores ranging from 1-32 in a QEMU+KVM VM, with 
> various combinations of debug options and other config switches.

The problem is caused by a gcc-7 bug [1]. It miscompiles
ZSTD_wildcopy(void *dst, void const *src, ptrdiff_t len) when len is 0.
It only happens when it can't analyze ZSTD_copy8(), which is the case in
the kernel, because memcpy() is implemented with inline assembly. The
generated code is slow anyways, so I propose this workaround, which will
be included in the next patch set. I've confirmed that it fixes the bug for
me. This alternative implementation is also 10-20x faster, and compiles to
the same x86 assembly as the original ZSTD_wildcopy() with the userland
memcpy() implementation [2].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388#add_comment
[2] https://godbolt.org/g/q5YpLx

Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/zstd_internal.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
index 6748719..ade0365 100644
--- a/lib/zstd/zstd_internal.h
+++ b/lib/zstd/zstd_internal.h
@@ -126,7 +126,9 @@ static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG;
 /*-*******************************************
 *  Shared functions to include for inlining
 *********************************************/
-static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
+static void ZSTD_copy8(void *dst, const void *src) {
+       ZSTD_write64(dst, ZSTD_read64(src));
+}
 #define COPY8(d, s)               \
        {                         \
                ZSTD_copy8(d, s); \
--
2.9.3



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-11  4:57             ` Nick Terrell
@ 2017-07-11  6:01               ` Nick Terrell
  2017-07-12  3:38                 ` Adam Borowski
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Terrell @ 2017-07-11  6:01 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Adam Borowski
  Cc: Kernel Team, Chris Mason, Yann Collet, David Sterba,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org


On 7/10/17, 9:57 PM, "Nick Terrell" <terrelln@fb.com> wrote:
> The problem is caused by a gcc-7 bug [1]. It miscompiles
> ZSTD_wildcopy(void *dst, void const *src, ptrdiff_t len) when len is 0.
> It only happens when it can't analyze ZSTD_copy8(), which is the case in
> the kernel, because memcpy() is implemented with inline assembly. The
> generated code is slow anyways, so I propose this workaround, which will
> be included in the next patch set. I've confirmed that it fixes the bug for
> me. This alternative implementation is also 10-20x faster, and compiles to
> the same x86 assembly as the original ZSTD_wildcopy() with the userland
> memcpy() implementation [2].
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388#add_comment
> [2] https://godbolt.org/g/q5YpLx
>
> Signed-off-by: Nick Terrell <terrelln@fb.com>
> ---
>  lib/zstd/zstd_internal.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
> index 6748719..ade0365 100644
> --- a/lib/zstd/zstd_internal.h
> +++ b/lib/zstd/zstd_internal.h
> @@ -126,7 +126,9 @@ static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG;
>  /*-*******************************************
>  *  Shared functions to include for inlining
>  *********************************************/
> -static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
> +static void ZSTD_copy8(void *dst, const void *src) {
> +       ZSTD_write64(dst, ZSTD_read64(src));
> +}

Sorry, my previous patch still triggered the gcc bug; I had tested with the
wrong compiler. This patch works: it runs at about the same speed as the
original code for small inputs, and slightly faster for larger inputs
(100+ bytes). I'll look for a faster workaround if benchmarks show it
matters.

Signed-off-by: Nick Terrell <terrelln@fb.com>
---
 lib/zstd/zstd_internal.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
index 6748719..839014d 100644
--- a/lib/zstd/zstd_internal.h
+++ b/lib/zstd/zstd_internal.h
@@ -139,12 +139,8 @@ static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
 #define WILDCOPY_OVERLENGTH 8
 ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length)
 {
-	const BYTE *ip = (const BYTE *)src;
-	BYTE *op = (BYTE *)dst;
-	BYTE *const oend = op + length;
-	do
-		COPY8(op, ip)
-	while (op < oend);
+	if (length > 0)
+		memcpy(dst, src, length);
 }
 
 ZSTD_STATIC void ZSTD_wildcopy_e(void *dst, const void *src, void *dstEnd) /* should be faster for decoding, but strangely, not verified on all platform */
-- 
2.9.3




^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-07-11  6:01               ` Nick Terrell
@ 2017-07-12  3:38                 ` Adam Borowski
  0 siblings, 0 replies; 32+ messages in thread
From: Adam Borowski @ 2017-07-12  3:38 UTC (permalink / raw)
  To: Nick Terrell
  Cc: Austin S. Hemmelgarn, Kernel Team, Chris Mason, Yann Collet,
	David Sterba, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Tue, Jul 11, 2017 at 06:01:38AM +0000, Nick Terrell wrote:
> On 7/10/17, 9:57 PM, "Nick Terrell" <terrelln@fb.com> wrote:
> > The problem is caused by a gcc-7 bug [1]. It miscompiles
> > ZSTD_wildcopy(void *dst, void const *src, ptrdiff_t len) when len is 0.
>
> Sorry, my patch still triggered the gcc bug, I used the wrong compiler.
> This patch works, and runs about the same speed as before the patch for
> small inputs, and slightly faster for larger inputs (100+ bytes). I'll
> look for a faster workaround if benchmarks show it matters.

Confirmed, the fix stops the crash for me.  Yay!

> --- a/lib/zstd/zstd_internal.h
> +++ b/lib/zstd/zstd_internal.h
> @@ -139,12 +139,8 @@ static void ZSTD_copy8(void *dst, const void *src) { memcpy(dst, src, 8); }
>  #define WILDCOPY_OVERLENGTH 8
>  ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length)
>  {
> -	const BYTE *ip = (const BYTE *)src;
> -	BYTE *op = (BYTE *)dst;
> -	BYTE *const oend = op + length;
> -	do
> -		COPY8(op, ip)
> -	while (op < oend);
> +	if (length > 0)
> +		memcpy(dst, src, length);
>  }
>  
>  ZSTD_STATIC void ZSTD_wildcopy_e(void *dst, const void *src, void *dstEnd) /* should be faster for decoding, but strangely, not verified on all platform */

-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/4] btrfs: Add zstd support
  2017-06-29 19:41 ` [PATCH v2 3/4] btrfs: Add zstd support Nick Terrell
                     ` (2 preceding siblings ...)
  2017-07-06 16:32   ` Adam Borowski
@ 2017-07-18 18:21   ` David Sterba
  3 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2017-07-18 18:21 UTC (permalink / raw)
  To: Nick Terrell
  Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski,
	David Sterba, squashfs-devel, linux-btrfs, linux-kernel

On Thu, Jun 29, 2017 at 12:41:07PM -0700, Nick Terrell wrote:
> +static void zstd_free_workspace(struct list_head *ws)
> +{
> +	struct workspace *workspace = list_entry(ws, struct workspace, list);
> +
> +	vfree(workspace->mem);
> +	kfree(workspace->buf);
> +	kfree(workspace);
> +}
> +
> +static struct list_head *zstd_alloc_workspace(void)
> +{
> +	ZSTD_parameters params =
> +			zstd_get_btrfs_parameters(ZSTD_BTRFS_MAX_INPUT);
> +	struct workspace *workspace;
> +
> +	workspace = kzalloc(sizeof(*workspace), GFP_NOFS);
> +	if (!workspace)
> +		return ERR_PTR(-ENOMEM);
> +
> +	workspace->size = max_t(size_t,
> +			ZSTD_CStreamWorkspaceBound(params.cParams),
> +			ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
> +	workspace->mem = vmalloc(workspace->size);
> +	workspace->buf = kmalloc(PAGE_SIZE, GFP_NOFS);
> +	if (!workspace->mem || !workspace->buf)
> +		goto fail;
> +
> +	INIT_LIST_HEAD(&workspace->list);
> +
> +	return &workspace->list;
> +fail:
> +	zstd_free_workspace(&workspace->list);
> +	return ERR_PTR(-ENOMEM);
> +}

In the next iteration, please update the workspace allocations so that
they use kvmalloc/kvfree and GFP_KERNEL (eg. 6acafd1eff426).
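
For reference, the suggested change would look roughly like this (a sketch
only, untested and not part of any posted patch; kvmalloc() tries a
physically contiguous kmalloc() first and falls back to vmalloc(), so the
separate vmalloc()/kmalloc() calls collapse into one allocator and kvfree()
handles both cases on release):

```
	/* zstd_alloc_workspace(), sketched with kvmalloc/GFP_KERNEL */
	workspace = kzalloc(sizeof(*workspace), GFP_KERNEL);
	...
	workspace->mem = kvmalloc(workspace->size, GFP_KERNEL);
	workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);

	/* zstd_free_workspace() */
	kvfree(workspace->mem);
	kfree(workspace->buf);
	kfree(workspace);
```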

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-07-18 18:22 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29 19:41 [PATCH v2 0/4] Add xxhash and zstd modules Nick Terrell
2017-06-29 19:41 ` [PATCH v2 1/4] lib: Add xxhash module Nick Terrell
2017-06-29 19:41 ` [PATCH v2 3/4] btrfs: Add zstd support Nick Terrell
2017-06-30  3:24   ` Adam Borowski
2017-06-30 12:16   ` E V
2017-06-30 14:21     ` David Sterba
2017-06-30 18:25       ` Austin S. Hemmelgarn
2017-06-30 23:01         ` Nick Terrell
2017-07-05 11:43           ` Austin S. Hemmelgarn
2017-07-05 18:18             ` Adam Borowski
2017-07-05 18:45               ` Austin S. Hemmelgarn
2017-07-05 19:35                 ` Nick Terrell
2017-07-05 19:57                   ` Austin S. Hemmelgarn
2017-07-06  0:25                     ` Nick Terrell
2017-07-06 11:59                       ` Austin S. Hemmelgarn
2017-07-06 12:09                         ` Lionel Bouton
2017-07-06 12:27                           ` Austin S. Hemmelgarn
2017-07-10 21:11     ` Clemens Eisserer
2017-07-06 16:32   ` Adam Borowski
2017-07-07 23:17     ` Nick Terrell
2017-07-07 23:40       ` Adam Borowski
2017-07-08  3:07         ` Adam Borowski
2017-07-10 12:36           ` Austin S. Hemmelgarn
2017-07-10 20:57             ` Nick Terrell
2017-07-11  4:57             ` Nick Terrell
2017-07-11  6:01               ` Nick Terrell
2017-07-12  3:38                 ` Adam Borowski
2017-07-18 18:21   ` David Sterba
2017-06-29 19:41 ` [PATCH v2 4/4] squashfs: " Nick Terrell
2017-06-30  7:36 ` [PATCH v2 0/4] Add xxhash and zstd modules David Sterba
2017-06-30 16:46 ` Timofey Titovets
2017-06-30 19:52   ` Nick Terrell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).