* [PATCH v5 0/5] Add xxhash and zstd modules
@ 2017-08-10 2:35 Nick Terrell
2017-08-10 2:35 ` [PATCH v5 1/5] lib: Add xxhash module Nick Terrell
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Nick Terrell @ 2017-08-10 2:35 UTC (permalink / raw)
To: Nick Terrell
Cc: Herbert Xu, kernel-team, squashfs-devel, linux-btrfs,
linux-kernel, linux-crypto
Hi all,
This patch set adds xxhash, zstd compression, and zstd decompression
modules. It also adds zstd support to BtrFS and SquashFS.
Each patch has relevant summaries, benchmarks, and tests.
Best,
Nick Terrell
Changelog:
v1 -> v2:
- Make pointer in lib/xxhash.c:394 non-const (1/5)
- Use div_u64() for division of u64s (2/5)
- Reduce stack usage of ZSTD_compressSequences(), ZSTD_buildSeqTable(),
ZSTD_decompressSequencesLong(), FSE_buildDTable(), FSE_decompress_wksp(),
HUF_writeCTable(), HUF_readStats(), HUF_readCTable(),
HUF_compressWeights(), HUF_readDTableX2(), and HUF_readDTableX4() (2/5)
- No zstd function uses more than 400 B of stack space (2/5)
v2 -> v3:
- Work around gcc-7 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388
(2/5)
- Fix bug in dictionary compression from upstream commit cc1522351f (2/5)
- Port upstream BtrFS commits e1ddce71d6, 389a6cfc2a, and 6acafd1eff (3/5)
- Change default compression level for BtrFS to 3 (3/5)
v3 -> v4:
- Fix compiler warnings (2/5)
- Add missing includes (3/5)
- Fix minor linter warnings (3/5, 4/5)
- Add crypto patch (5/5)
v4 -> v5:
- Fix rare compression bug from upstream commit 308047eb5d (2/5)
- Fix bug introduced in v3 when working around the gcc-7 bug (2/5)
- Fix ZSTD_DStream initialization code in squashfs (4/5)
- Fix patch documentation for patches written by Sean Purcell (4/5)
Nick Terrell (5):
lib: Add xxhash module
lib: Add zstd modules
btrfs: Add zstd support
squashfs: Add zstd support
crypto: Add zstd support
crypto/Kconfig | 9 +
crypto/Makefile | 1 +
crypto/testmgr.c | 10 +
crypto/testmgr.h | 71 +
crypto/zstd.c | 265 ++++
fs/btrfs/Kconfig | 2 +
fs/btrfs/Makefile | 2 +-
fs/btrfs/compression.c | 1 +
fs/btrfs/compression.h | 6 +-
fs/btrfs/ctree.h | 1 +
fs/btrfs/disk-io.c | 2 +
fs/btrfs/ioctl.c | 6 +-
fs/btrfs/props.c | 6 +
fs/btrfs/super.c | 12 +-
fs/btrfs/sysfs.c | 2 +
fs/btrfs/zstd.c | 432 ++++++
fs/squashfs/Kconfig | 14 +
fs/squashfs/Makefile | 1 +
fs/squashfs/decompressor.c | 7 +
fs/squashfs/decompressor.h | 4 +
fs/squashfs/squashfs_fs.h | 1 +
fs/squashfs/zstd_wrapper.c | 151 ++
include/linux/xxhash.h | 236 +++
include/linux/zstd.h | 1157 +++++++++++++++
include/uapi/linux/btrfs.h | 8 +-
lib/Kconfig | 11 +
lib/Makefile | 3 +
lib/xxhash.c | 500 +++++++
lib/zstd/Makefile | 18 +
lib/zstd/bitstream.h | 374 +++++
lib/zstd/compress.c | 3484 ++++++++++++++++++++++++++++++++++++++++++++
lib/zstd/decompress.c | 2528 ++++++++++++++++++++++++++++++++
lib/zstd/entropy_common.c | 243 +++
lib/zstd/error_private.h | 53 +
lib/zstd/fse.h | 575 ++++++++
lib/zstd/fse_compress.c | 795 ++++++++++
lib/zstd/fse_decompress.c | 332 +++++
lib/zstd/huf.h | 212 +++
lib/zstd/huf_compress.c | 770 ++++++++++
lib/zstd/huf_decompress.c | 960 ++++++++++++
lib/zstd/mem.h | 151 ++
lib/zstd/zstd_common.c | 75 +
lib/zstd/zstd_internal.h | 263 ++++
lib/zstd/zstd_opt.h | 1014 +++++++++++++
44 files changed, 14756 insertions(+), 12 deletions(-)
create mode 100644 crypto/zstd.c
create mode 100644 fs/btrfs/zstd.c
create mode 100644 fs/squashfs/zstd_wrapper.c
create mode 100644 include/linux/xxhash.h
create mode 100644 include/linux/zstd.h
create mode 100644 lib/xxhash.c
create mode 100644 lib/zstd/Makefile
create mode 100644 lib/zstd/bitstream.h
create mode 100644 lib/zstd/compress.c
create mode 100644 lib/zstd/decompress.c
create mode 100644 lib/zstd/entropy_common.c
create mode 100644 lib/zstd/error_private.h
create mode 100644 lib/zstd/fse.h
create mode 100644 lib/zstd/fse_compress.c
create mode 100644 lib/zstd/fse_decompress.c
create mode 100644 lib/zstd/huf.h
create mode 100644 lib/zstd/huf_compress.c
create mode 100644 lib/zstd/huf_decompress.c
create mode 100644 lib/zstd/mem.h
create mode 100644 lib/zstd/zstd_common.c
create mode 100644 lib/zstd/zstd_internal.h
create mode 100644 lib/zstd/zstd_opt.h
--
2.9.3
^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v5 1/5] lib: Add xxhash module
From: Nick Terrell @ 2017-08-10 2:35 UTC (permalink / raw)
To: Nick Terrell
Cc: Herbert Xu, kernel-team, squashfs-devel, linux-btrfs,
	linux-kernel, linux-crypto

Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
extremely fast non-cryptographic hash algorithm for checksumming.
The zstd compression and decompression modules added in the next patch
require xxhash. I extracted it out from zstd since it is useful on its
own. I copied the code from the upstream XXHash source repository and
translated it into kernel style.

I ran benchmarks and tests in the kernel and tests in userland.

I benchmarked xxhash as a special character device. I ran it in four
modes: no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the
data to kernel space and ignores it. The xxh32, xxh64, and crc32 modes
compute hashes on the copied data. I also ran it with four different
buffer sizes. The benchmark file is located in the upstream zstd source
repository under `contrib/linux-kernel/xxhash_test.c` [1].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of
RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7
processor, 16 GB of RAM, and an SSD. I benchmarked using the file
`filesystem.squashfs` from `ubuntu-16.10-desktop-amd64.iso`, which is
1,536,217,088 B large. Run the following commands for the benchmark:

	modprobe xxhash_test
	mknod xxhash_test c 245 0
	time cp filesystem.squashfs xxhash_test

The time is reported by `time` for the userland `cp`. The GB/s is
computed with 1,536,217,088 B / time(buffer size, hash), which includes
the time to copy from userland.
The Normalized GB/s is computed with
1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

| Buffer Size (B) | Hash  | Time (s) | GB/s | Normalized GB/s |
|-----------------|-------|----------|------|-----------------|
| 1024            | none  | 0.408    | 3.77 | -               |
| 1024            | xxh32 | 0.649    | 2.37 | 6.37            |
| 1024            | xxh64 | 0.542    | 2.83 | 11.46           |
| 1024            | crc32 | 1.290    | 1.19 | 1.74            |
| 4096            | none  | 0.380    | 4.04 | -               |
| 4096            | xxh32 | 0.645    | 2.38 | 5.79            |
| 4096            | xxh64 | 0.500    | 3.07 | 12.80           |
| 4096            | crc32 | 1.168    | 1.32 | 1.95            |
| 8192            | none  | 0.351    | 4.38 | -               |
| 8192            | xxh32 | 0.614    | 2.50 | 5.84            |
| 8192            | xxh64 | 0.464    | 3.31 | 13.60           |
| 8192            | crc32 | 1.163    | 1.32 | 1.89            |
| 16384           | none  | 0.346    | 4.43 | -               |
| 16384           | xxh32 | 0.590    | 2.60 | 6.30            |
| 16384           | xxh64 | 0.466    | 3.30 | 12.80           |
| 16384           | crc32 | 1.183    | 1.30 | 1.84            |

Tested in userland using the test-suite in the zstd repo under
`contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
kernel functions. A line in each branch of every function in `xxhash.c`
was commented out to ensure that the test-suite fails. Additionally
tested while testing zstd and with SMHasher [3].
[1] https://phabricator.intern.facebook.com/P57526246
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
[3] https://github.com/aappleby/smhasher

zstd source repository: https://github.com/facebook/zstd
XXHash source repository: https://github.com/cyan4973/xxhash

Signed-off-by: Nick Terrell <terrelln@fb.com>
---
v1 -> v2:
- Make pointer in lib/xxhash.c:394 non-const

 include/linux/xxhash.h | 236 ++++++++++++++++++++++
 lib/Kconfig            |   3 +
 lib/Makefile           |   1 +
 lib/xxhash.c           | 500 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 740 insertions(+)
 create mode 100644 include/linux/xxhash.h
 create mode 100644 lib/xxhash.c

diff --git a/include/linux/xxhash.h b/include/linux/xxhash.h
new file mode 100644
index 0000000..9e1f42c
--- /dev/null
+++ b/include/linux/xxhash.h
@@ -0,0 +1,236 @@
+/*
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above
+ *     copyright notice, this list of conditions and the following disclaimer
+ *     in the documentation and/or other materials provided with the
+ *     distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * This program is free software; you can redistribute it and/or modify it under + * the terms of the GNU General Public License version 2 as published by the + * Free Software Foundation. This program is dual-licensed; you may select + * either version 2 of the GNU General Public License ("GPL") or BSD license + * ("BSD"). + * + * You can contact the author at: + * - xxHash homepage: http://cyan4973.github.io/xxHash/ + * - xxHash source repository: https://github.com/Cyan4973/xxHash + */ + +/* + * Notice extracted from xxHash homepage: + * + * xxHash is an extremely fast Hash algorithm, running at RAM speed limits. + * It also successfully passes all tests from the SMHasher suite. + * + * Comparison (single thread, Windows Seven 32 bits, using SMHasher on a Core 2 + * Duo @3GHz) + * + * Name Speed Q.Score Author + * xxHash 5.4 GB/s 10 + * CrapWow 3.2 GB/s 2 Andrew + * MumurHash 3a 2.7 GB/s 10 Austin Appleby + * SpookyHash 2.0 GB/s 10 Bob Jenkins + * SBox 1.4 GB/s 9 Bret Mulvey + * Lookup3 1.2 GB/s 9 Bob Jenkins + * SuperFastHash 1.2 GB/s 1 Paul Hsieh + * CityHash64 1.05 GB/s 10 Pike & Alakuijala + * FNV 0.55 GB/s 5 Fowler, Noll, Vo + * CRC32 0.43 GB/s 9 + * MD5-32 0.33 GB/s 10 Ronald L. Rivest + * SHA1-32 0.28 GB/s 10 + * + * Q.Score is a measure of quality of the hash function. + * It depends on successfully passing SMHasher test set. + * 10 is a perfect score. 
+ * + * A 64-bits version, named xxh64 offers much better speed, + * but for 64-bits applications only. + * Name Speed on 64 bits Speed on 32 bits + * xxh64 13.8 GB/s 1.9 GB/s + * xxh32 6.8 GB/s 6.0 GB/s + */ + +#ifndef XXHASH_H +#define XXHASH_H + +#include <linux/types.h> + +/*-**************************** + * Simple Hash Functions + *****************************/ + +/** + * xxh32() - calculate the 32-bit hash of the input with a given seed. + * + * @input: The data to hash. + * @length: The length of the data to hash. + * @seed: The seed can be used to alter the result predictably. + * + * Speed on Core 2 Duo @ 3 GHz (single thread, SMHasher benchmark) : 5.4 GB/s + * + * Return: The 32-bit hash of the data. + */ +uint32_t xxh32(const void *input, size_t length, uint32_t seed); + +/** + * xxh64() - calculate the 64-bit hash of the input with a given seed. + * + * @input: The data to hash. + * @length: The length of the data to hash. + * @seed: The seed can be used to alter the result predictably. + * + * This function runs 2x faster on 64-bit systems, but slower on 32-bit systems. + * + * Return: The 64-bit hash of the data. + */ +uint64_t xxh64(const void *input, size_t length, uint64_t seed); + +/*-**************************** + * Streaming Hash Functions + *****************************/ + +/* + * These definitions are only meant to allow allocation of XXH state + * statically, on stack, or in a struct for example. + * Do not use members directly. 
+ */ + +/** + * struct xxh32_state - private xxh32 state, do not use members directly + */ +struct xxh32_state { + uint32_t total_len_32; + uint32_t large_len; + uint32_t v1; + uint32_t v2; + uint32_t v3; + uint32_t v4; + uint32_t mem32[4]; + uint32_t memsize; +}; + +/** + * struct xxh32_state - private xxh64 state, do not use members directly + */ +struct xxh64_state { + uint64_t total_len; + uint64_t v1; + uint64_t v2; + uint64_t v3; + uint64_t v4; + uint64_t mem64[4]; + uint32_t memsize; +}; + +/** + * xxh32_reset() - reset the xxh32 state to start a new hashing operation + * + * @state: The xxh32 state to reset. + * @seed: Initialize the hash state with this seed. + * + * Call this function on any xxh32_state to prepare for a new hashing operation. + */ +void xxh32_reset(struct xxh32_state *state, uint32_t seed); + +/** + * xxh32_update() - hash the data given and update the xxh32 state + * + * @state: The xxh32 state to update. + * @input: The data to hash. + * @length: The length of the data to hash. + * + * After calling xxh32_reset() call xxh32_update() as many times as necessary. + * + * Return: Zero on success, otherwise an error code. + */ +int xxh32_update(struct xxh32_state *state, const void *input, size_t length); + +/** + * xxh32_digest() - produce the current xxh32 hash + * + * @state: Produce the current xxh32 hash of this state. + * + * A hash value can be produced at any time. It is still possible to continue + * inserting input into the hash state after a call to xxh32_digest(), and + * generate new hashes later on, by calling xxh32_digest() again. + * + * Return: The xxh32 hash stored in the state. + */ +uint32_t xxh32_digest(const struct xxh32_state *state); + +/** + * xxh64_reset() - reset the xxh64 state to start a new hashing operation + * + * @state: The xxh64 state to reset. + * @seed: Initialize the hash state with this seed. 
+ */ +void xxh64_reset(struct xxh64_state *state, uint64_t seed); + +/** + * xxh64_update() - hash the data given and update the xxh64 state + * @state: The xxh64 state to update. + * @input: The data to hash. + * @length: The length of the data to hash. + * + * After calling xxh64_reset() call xxh64_update() as many times as necessary. + * + * Return: Zero on success, otherwise an error code. + */ +int xxh64_update(struct xxh64_state *state, const void *input, size_t length); + +/** + * xxh64_digest() - produce the current xxh64 hash + * + * @state: Produce the current xxh64 hash of this state. + * + * A hash value can be produced at any time. It is still possible to continue + * inserting input into the hash state after a call to xxh64_digest(), and + * generate new hashes later on, by calling xxh64_digest() again. + * + * Return: The xxh64 hash stored in the state. + */ +uint64_t xxh64_digest(const struct xxh64_state *state); + +/*-************************** + * Utils + ***************************/ + +/** + * xxh32_copy_state() - copy the source state into the destination state + * + * @src: The source xxh32 state. + * @dst: The destination xxh32 state. + */ +void xxh32_copy_state(struct xxh32_state *dst, const struct xxh32_state *src); + +/** + * xxh64_copy_state() - copy the source state into the destination state + * + * @src: The source xxh64 state. + * @dst: The destination xxh64 state. + */ +void xxh64_copy_state(struct xxh64_state *dst, const struct xxh64_state *src); + +#endif /* XXHASH_H */ diff --git a/lib/Kconfig b/lib/Kconfig index 6762529..5e7541f 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -192,6 +192,9 @@ config CRC8 when they need to do cyclic redundancy check according CRC8 algorithm. Module will be called crc8. 
+config XXHASH + tristate + config AUDIT_GENERIC bool depends on AUDIT && !AUDIT_ARCH diff --git a/lib/Makefile b/lib/Makefile index 40c1837..d06b68a 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -102,6 +102,7 @@ obj-$(CONFIG_CRC4) += crc4.o obj-$(CONFIG_CRC7) += crc7.o obj-$(CONFIG_LIBCRC32C) += libcrc32c.o obj-$(CONFIG_CRC8) += crc8.o +obj-$(CONFIG_XXHASH) += xxhash.o obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o obj-$(CONFIG_842_COMPRESS) += 842/ diff --git a/lib/xxhash.c b/lib/xxhash.c new file mode 100644 index 0000000..aa61e2a --- /dev/null +++ b/lib/xxhash.c @@ -0,0 +1,500 @@ +/* + * xxHash - Extremely Fast Hash algorithm + * Copyright (C) 2012-2016, Yann Collet. + * + * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php) + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following disclaimer + * in the documentation and/or other materials provided with the + * distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * This program is free software; you can redistribute it and/or modify it under + * the terms of the GNU General Public License version 2 as published by the + * Free Software Foundation. This program is dual-licensed; you may select + * either version 2 of the GNU General Public License ("GPL") or BSD license + * ("BSD"). + * + * You can contact the author at: + * - xxHash homepage: http://cyan4973.github.io/xxHash/ + * - xxHash source repository: https://github.com/Cyan4973/xxHash + */ + +#include <asm/unaligned.h> +#include <linux/errno.h> +#include <linux/compiler.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/string.h> +#include <linux/xxhash.h> + +/*-************************************* + * Macros + **************************************/ +#define xxh_rotl32(x, r) ((x << r) | (x >> (32 - r))) +#define xxh_rotl64(x, r) ((x << r) | (x >> (64 - r))) + +#ifdef __LITTLE_ENDIAN +# define XXH_CPU_LITTLE_ENDIAN 1 +#else +# define XXH_CPU_LITTLE_ENDIAN 0 +#endif + +/*-************************************* + * Constants + **************************************/ +static const uint32_t PRIME32_1 = 2654435761U; +static const uint32_t PRIME32_2 = 2246822519U; +static const uint32_t PRIME32_3 = 3266489917U; +static const uint32_t PRIME32_4 = 668265263U; +static const uint32_t PRIME32_5 = 374761393U; + +static const uint64_t PRIME64_1 = 11400714785074694791ULL; +static const uint64_t PRIME64_2 = 
14029467366897019727ULL; +static const uint64_t PRIME64_3 = 1609587929392839161ULL; +static const uint64_t PRIME64_4 = 9650029242287828579ULL; +static const uint64_t PRIME64_5 = 2870177450012600261ULL; + +/*-************************** + * Utils + ***************************/ +void xxh32_copy_state(struct xxh32_state *dst, const struct xxh32_state *src) +{ + memcpy(dst, src, sizeof(*dst)); +} +EXPORT_SYMBOL(xxh32_copy_state); + +void xxh64_copy_state(struct xxh64_state *dst, const struct xxh64_state *src) +{ + memcpy(dst, src, sizeof(*dst)); +} +EXPORT_SYMBOL(xxh64_copy_state); + +/*-*************************** + * Simple Hash Functions + ****************************/ +static uint32_t xxh32_round(uint32_t seed, const uint32_t input) +{ + seed += input * PRIME32_2; + seed = xxh_rotl32(seed, 13); + seed *= PRIME32_1; + return seed; +} + +uint32_t xxh32(const void *input, const size_t len, const uint32_t seed) +{ + const uint8_t *p = (const uint8_t *)input; + const uint8_t *b_end = p + len; + uint32_t h32; + + if (len >= 16) { + const uint8_t *const limit = b_end - 16; + uint32_t v1 = seed + PRIME32_1 + PRIME32_2; + uint32_t v2 = seed + PRIME32_2; + uint32_t v3 = seed + 0; + uint32_t v4 = seed - PRIME32_1; + + do { + v1 = xxh32_round(v1, get_unaligned_le32(p)); + p += 4; + v2 = xxh32_round(v2, get_unaligned_le32(p)); + p += 4; + v3 = xxh32_round(v3, get_unaligned_le32(p)); + p += 4; + v4 = xxh32_round(v4, get_unaligned_le32(p)); + p += 4; + } while (p <= limit); + + h32 = xxh_rotl32(v1, 1) + xxh_rotl32(v2, 7) + + xxh_rotl32(v3, 12) + xxh_rotl32(v4, 18); + } else { + h32 = seed + PRIME32_5; + } + + h32 += (uint32_t)len; + + while (p + 4 <= b_end) { + h32 += get_unaligned_le32(p) * PRIME32_3; + h32 = xxh_rotl32(h32, 17) * PRIME32_4; + p += 4; + } + + while (p < b_end) { + h32 += (*p) * PRIME32_5; + h32 = xxh_rotl32(h32, 11) * PRIME32_1; + p++; + } + + h32 ^= h32 >> 15; + h32 *= PRIME32_2; + h32 ^= h32 >> 13; + h32 *= PRIME32_3; + h32 ^= h32 >> 16; + + return h32; +} 
+EXPORT_SYMBOL(xxh32); + +static uint64_t xxh64_round(uint64_t acc, const uint64_t input) +{ + acc += input * PRIME64_2; + acc = xxh_rotl64(acc, 31); + acc *= PRIME64_1; + return acc; +} + +static uint64_t xxh64_merge_round(uint64_t acc, uint64_t val) +{ + val = xxh64_round(0, val); + acc ^= val; + acc = acc * PRIME64_1 + PRIME64_4; + return acc; +} + +uint64_t xxh64(const void *input, const size_t len, const uint64_t seed) +{ + const uint8_t *p = (const uint8_t *)input; + const uint8_t *const b_end = p + len; + uint64_t h64; + + if (len >= 32) { + const uint8_t *const limit = b_end - 32; + uint64_t v1 = seed + PRIME64_1 + PRIME64_2; + uint64_t v2 = seed + PRIME64_2; + uint64_t v3 = seed + 0; + uint64_t v4 = seed - PRIME64_1; + + do { + v1 = xxh64_round(v1, get_unaligned_le64(p)); + p += 8; + v2 = xxh64_round(v2, get_unaligned_le64(p)); + p += 8; + v3 = xxh64_round(v3, get_unaligned_le64(p)); + p += 8; + v4 = xxh64_round(v4, get_unaligned_le64(p)); + p += 8; + } while (p <= limit); + + h64 = xxh_rotl64(v1, 1) + xxh_rotl64(v2, 7) + + xxh_rotl64(v3, 12) + xxh_rotl64(v4, 18); + h64 = xxh64_merge_round(h64, v1); + h64 = xxh64_merge_round(h64, v2); + h64 = xxh64_merge_round(h64, v3); + h64 = xxh64_merge_round(h64, v4); + + } else { + h64 = seed + PRIME64_5; + } + + h64 += (uint64_t)len; + + while (p + 8 <= b_end) { + const uint64_t k1 = xxh64_round(0, get_unaligned_le64(p)); + + h64 ^= k1; + h64 = xxh_rotl64(h64, 27) * PRIME64_1 + PRIME64_4; + p += 8; + } + + if (p + 4 <= b_end) { + h64 ^= (uint64_t)(get_unaligned_le32(p)) * PRIME64_1; + h64 = xxh_rotl64(h64, 23) * PRIME64_2 + PRIME64_3; + p += 4; + } + + while (p < b_end) { + h64 ^= (*p) * PRIME64_5; + h64 = xxh_rotl64(h64, 11) * PRIME64_1; + p++; + } + + h64 ^= h64 >> 33; + h64 *= PRIME64_2; + h64 ^= h64 >> 29; + h64 *= PRIME64_3; + h64 ^= h64 >> 32; + + return h64; +} +EXPORT_SYMBOL(xxh64); + +/*-************************************************** + * Advanced Hash Functions + 
***************************************************/ +void xxh32_reset(struct xxh32_state *statePtr, const uint32_t seed) +{ + /* use a local state for memcpy() to avoid strict-aliasing warnings */ + struct xxh32_state state; + + memset(&state, 0, sizeof(state)); + state.v1 = seed + PRIME32_1 + PRIME32_2; + state.v2 = seed + PRIME32_2; + state.v3 = seed + 0; + state.v4 = seed - PRIME32_1; + memcpy(statePtr, &state, sizeof(state)); +} +EXPORT_SYMBOL(xxh32_reset); + +void xxh64_reset(struct xxh64_state *statePtr, const uint64_t seed) +{ + /* use a local state for memcpy() to avoid strict-aliasing warnings */ + struct xxh64_state state; + + memset(&state, 0, sizeof(state)); + state.v1 = seed + PRIME64_1 + PRIME64_2; + state.v2 = seed + PRIME64_2; + state.v3 = seed + 0; + state.v4 = seed - PRIME64_1; + memcpy(statePtr, &state, sizeof(state)); +} +EXPORT_SYMBOL(xxh64_reset); + +int xxh32_update(struct xxh32_state *state, const void *input, const size_t len) +{ + const uint8_t *p = (const uint8_t *)input; + const uint8_t *const b_end = p + len; + + if (input == NULL) + return -EINVAL; + + state->total_len_32 += (uint32_t)len; + state->large_len |= (len >= 16) | (state->total_len_32 >= 16); + + if (state->memsize + len < 16) { /* fill in tmp buffer */ + memcpy((uint8_t *)(state->mem32) + state->memsize, input, len); + state->memsize += (uint32_t)len; + return 0; + } + + if (state->memsize) { /* some data left from previous update */ + const uint32_t *p32 = state->mem32; + + memcpy((uint8_t *)(state->mem32) + state->memsize, input, + 16 - state->memsize); + + state->v1 = xxh32_round(state->v1, get_unaligned_le32(p32)); + p32++; + state->v2 = xxh32_round(state->v2, get_unaligned_le32(p32)); + p32++; + state->v3 = xxh32_round(state->v3, get_unaligned_le32(p32)); + p32++; + state->v4 = xxh32_round(state->v4, get_unaligned_le32(p32)); + p32++; + + p += 16-state->memsize; + state->memsize = 0; + } + + if (p <= b_end - 16) { + const uint8_t *const limit = b_end - 16; + uint32_t 
v1 = state->v1; + uint32_t v2 = state->v2; + uint32_t v3 = state->v3; + uint32_t v4 = state->v4; + + do { + v1 = xxh32_round(v1, get_unaligned_le32(p)); + p += 4; + v2 = xxh32_round(v2, get_unaligned_le32(p)); + p += 4; + v3 = xxh32_round(v3, get_unaligned_le32(p)); + p += 4; + v4 = xxh32_round(v4, get_unaligned_le32(p)); + p += 4; + } while (p <= limit); + + state->v1 = v1; + state->v2 = v2; + state->v3 = v3; + state->v4 = v4; + } + + if (p < b_end) { + memcpy(state->mem32, p, (size_t)(b_end-p)); + state->memsize = (uint32_t)(b_end-p); + } + + return 0; +} +EXPORT_SYMBOL(xxh32_update); + +uint32_t xxh32_digest(const struct xxh32_state *state) +{ + const uint8_t *p = (const uint8_t *)state->mem32; + const uint8_t *const b_end = (const uint8_t *)(state->mem32) + + state->memsize; + uint32_t h32; + + if (state->large_len) { + h32 = xxh_rotl32(state->v1, 1) + xxh_rotl32(state->v2, 7) + + xxh_rotl32(state->v3, 12) + xxh_rotl32(state->v4, 18); + } else { + h32 = state->v3 /* == seed */ + PRIME32_5; + } + + h32 += state->total_len_32; + + while (p + 4 <= b_end) { + h32 += get_unaligned_le32(p) * PRIME32_3; + h32 = xxh_rotl32(h32, 17) * PRIME32_4; + p += 4; + } + + while (p < b_end) { + h32 += (*p) * PRIME32_5; + h32 = xxh_rotl32(h32, 11) * PRIME32_1; + p++; + } + + h32 ^= h32 >> 15; + h32 *= PRIME32_2; + h32 ^= h32 >> 13; + h32 *= PRIME32_3; + h32 ^= h32 >> 16; + + return h32; +} +EXPORT_SYMBOL(xxh32_digest); + +int xxh64_update(struct xxh64_state *state, const void *input, const size_t len) +{ + const uint8_t *p = (const uint8_t *)input; + const uint8_t *const b_end = p + len; + + if (input == NULL) + return -EINVAL; + + state->total_len += len; + + if (state->memsize + len < 32) { /* fill in tmp buffer */ + memcpy(((uint8_t *)state->mem64) + state->memsize, input, len); + state->memsize += (uint32_t)len; + return 0; + } + + if (state->memsize) { /* tmp buffer is full */ + uint64_t *p64 = state->mem64; + + memcpy(((uint8_t *)p64) + state->memsize, input, + 32 - 
state->memsize); + + state->v1 = xxh64_round(state->v1, get_unaligned_le64(p64)); + p64++; + state->v2 = xxh64_round(state->v2, get_unaligned_le64(p64)); + p64++; + state->v3 = xxh64_round(state->v3, get_unaligned_le64(p64)); + p64++; + state->v4 = xxh64_round(state->v4, get_unaligned_le64(p64)); + + p += 32 - state->memsize; + state->memsize = 0; + } + + if (p + 32 <= b_end) { + const uint8_t *const limit = b_end - 32; + uint64_t v1 = state->v1; + uint64_t v2 = state->v2; + uint64_t v3 = state->v3; + uint64_t v4 = state->v4; + + do { + v1 = xxh64_round(v1, get_unaligned_le64(p)); + p += 8; + v2 = xxh64_round(v2, get_unaligned_le64(p)); + p += 8; + v3 = xxh64_round(v3, get_unaligned_le64(p)); + p += 8; + v4 = xxh64_round(v4, get_unaligned_le64(p)); + p += 8; + } while (p <= limit); + + state->v1 = v1; + state->v2 = v2; + state->v3 = v3; + state->v4 = v4; + } + + if (p < b_end) { + memcpy(state->mem64, p, (size_t)(b_end-p)); + state->memsize = (uint32_t)(b_end - p); + } + + return 0; +} +EXPORT_SYMBOL(xxh64_update); + +uint64_t xxh64_digest(const struct xxh64_state *state) +{ + const uint8_t *p = (const uint8_t *)state->mem64; + const uint8_t *const b_end = (const uint8_t *)state->mem64 + + state->memsize; + uint64_t h64; + + if (state->total_len >= 32) { + const uint64_t v1 = state->v1; + const uint64_t v2 = state->v2; + const uint64_t v3 = state->v3; + const uint64_t v4 = state->v4; + + h64 = xxh_rotl64(v1, 1) + xxh_rotl64(v2, 7) + + xxh_rotl64(v3, 12) + xxh_rotl64(v4, 18); + h64 = xxh64_merge_round(h64, v1); + h64 = xxh64_merge_round(h64, v2); + h64 = xxh64_merge_round(h64, v3); + h64 = xxh64_merge_round(h64, v4); + } else { + h64 = state->v3 + PRIME64_5; + } + + h64 += (uint64_t)state->total_len; + + while (p + 8 <= b_end) { + const uint64_t k1 = xxh64_round(0, get_unaligned_le64(p)); + + h64 ^= k1; + h64 = xxh_rotl64(h64, 27) * PRIME64_1 + PRIME64_4; + p += 8; + } + + if (p + 4 <= b_end) { + h64 ^= (uint64_t)(get_unaligned_le32(p)) * PRIME64_1; + h64 = 
xxh_rotl64(h64, 23) * PRIME64_2 + PRIME64_3; + p += 4; + } + + while (p < b_end) { + h64 ^= (*p) * PRIME64_5; + h64 = xxh_rotl64(h64, 11) * PRIME64_1; + p++; + } + + h64 ^= h64 >> 33; + h64 *= PRIME64_2; + h64 ^= h64 >> 29; + h64 *= PRIME64_3; + h64 ^= h64 >> 32; + + return h64; +} +EXPORT_SYMBOL(xxh64_digest); + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_DESCRIPTION("xxHash"); -- 2.9.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v5 3/5] btrfs: Add zstd support
2017-08-10  2:39 ` Nick Terrell
2017-08-11  2:13   ` Adam Borowski
2017-08-11 11:45   ` Austin S. Hemmelgarn
From: Nick Terrell @ 2017-08-10 2:39 UTC (permalink / raw)
To: Nick Terrell
Cc: Austin S . Hemmelgarn, kernel-team, Chris Mason, Yann Collet,
	Adam Borowski, David Sterba, linux-btrfs, linux-kernel

Add zstd compression and decompression support to BtrFS. zstd at its
fastest level compresses almost as well as zlib, while offering much
faster compression and decompression, approaching lzo speeds.

I benchmarked btrfs with zstd compression against no compression, lzo
compression, and zlib compression. I benchmarked two scenarios: copying
a set of files to btrfs and then reading the files, and copying a
tarball to btrfs, extracting it to btrfs, and then reading the extracted
files. After every operation, I call `sync` and include the sync time.
Between every pair of operations I unmount and remount the filesystem
to avoid caching. The benchmark files can be found in the upstream zstd
source repository under
`contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
[1] [2].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of
RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7
processor, 16 GB of RAM, and an SSD.

The first compression benchmark is copying 10 copies of the unzipped
Silesia corpus [3] into a BtrFS filesystem mounted with
`-o compress-force=Method`. The decompression benchmark times how long
it takes to `tar` all 10 copies into `/dev/null`. The compression ratio
is measured by comparing the output of `df` and `du`. See the benchmark
file [1] for details.
I benchmarked multiple zstd compression levels, although the patch uses
zstd level 1.

| Method  | Ratio | Compression MB/s | Decompression MB/s |
|---------|-------|------------------|--------------------|
| None    | 0.99  | 504              | 686                |
| lzo     | 1.66  | 398              | 442                |
| zlib    | 2.58  | 65               | 241                |
| zstd 1  | 2.57  | 260              | 383                |
| zstd 3  | 2.71  | 174              | 408                |
| zstd 6  | 2.87  | 70               | 398                |
| zstd 9  | 2.92  | 43               | 406                |
| zstd 12 | 2.93  | 21               | 408                |
| zstd 15 | 3.01  | 11               | 354                |

The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
measures the compression ratio, extracts the tar, and deletes the tar.
Then it measures the compression ratio again, and `tar`s the extracted
files into `/dev/null`. See the benchmark file [2] for details.

| Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s) | Read (s) |
|--------|-----------|---------------|----------|-------------|----------|
| None   | 0.97      | 0.78          | 0.981    | 5.501       | 8.807    |
| lzo    | 2.06      | 1.38          | 1.631    | 8.458       | 8.585    |
| zlib   | 3.40      | 1.86          | 7.750    | 21.544      | 11.744   |
| zstd 1 | 3.57      | 1.85          | 2.579    | 11.479      | 9.389    |

[1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz

zstd source repository: https://github.com/facebook/zstd

Signed-off-by: Nick Terrell <terrelln@fb.com>
---
v2 -> v3:
- Port upstream BtrFS commits e1ddce71d6, 389a6cfc2a, and 6acafd1eff
- Change default compression level for BtrFS to 3

v3 -> v4:
- Add missing includes, which fixes the aarch64 build
- Fix minor linter warnings

 fs/btrfs/Kconfig           |   2 +
 fs/btrfs/Makefile          |   2 +-
 fs/btrfs/compression.c     |   1 +
 fs/btrfs/compression.h     |   6 +-
 fs/btrfs/ctree.h           |   1 +
 fs/btrfs/disk-io.c         |   2 +
 fs/btrfs/ioctl.c           |   6 +-
 fs/btrfs/props.c           |   6 +
 fs/btrfs/super.c           |  12 +-
 fs/btrfs/sysfs.c           |   2 +
 fs/btrfs/zstd.c            | 432
+++++++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/btrfs.h | 8 +- 12 files changed, 468 insertions(+), 12 deletions(-) create mode 100644 fs/btrfs/zstd.c diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig index 80e9c18..a26c63b 100644 --- a/fs/btrfs/Kconfig +++ b/fs/btrfs/Kconfig @@ -6,6 +6,8 @@ config BTRFS_FS select ZLIB_DEFLATE select LZO_COMPRESS select LZO_DECOMPRESS + select ZSTD_COMPRESS + select ZSTD_DECOMPRESS select RAID6_PQ select XOR_BLOCKS select SRCU diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 128ce17..962a95a 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -6,7 +6,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ transaction.o inode.o file.o tree-defrag.o \ extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \ extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ - export.o tree-log.o free-space-cache.o zlib.o lzo.o \ + export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ uuid-tree.o props.o hash.o free-space-tree.o diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index d2ef9ac..4ff42d1 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -704,6 +704,7 @@ static struct { static const struct btrfs_compress_op * const btrfs_compress_op[] = { &btrfs_zlib_compress, &btrfs_lzo_compress, + &btrfs_zstd_compress, }; void __init btrfs_init_compress(void) diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h index 87f6d33..2269e00 100644 --- a/fs/btrfs/compression.h +++ b/fs/btrfs/compression.h @@ -99,8 +99,9 @@ enum btrfs_compression_type { BTRFS_COMPRESS_NONE = 0, BTRFS_COMPRESS_ZLIB = 1, BTRFS_COMPRESS_LZO = 2, - BTRFS_COMPRESS_TYPES = 2, - BTRFS_COMPRESS_LAST = 3, + BTRFS_COMPRESS_ZSTD = 3, + BTRFS_COMPRESS_TYPES = 3, + BTRFS_COMPRESS_LAST = 4, }; struct btrfs_compress_op { @@ -128,5 +129,6 @@ 
struct btrfs_compress_op { extern const struct btrfs_compress_op btrfs_zlib_compress; extern const struct btrfs_compress_op btrfs_lzo_compress; +extern const struct btrfs_compress_op btrfs_zstd_compress; #endif diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3f3eb7b..845d77c 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -270,6 +270,7 @@ struct btrfs_super_block { BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS | \ BTRFS_FEATURE_INCOMPAT_BIG_METADATA | \ BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO | \ + BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD | \ BTRFS_FEATURE_INCOMPAT_RAID56 | \ BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \ BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \ diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 080e2eb..04632f4 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2828,6 +2828,8 @@ int open_ctree(struct super_block *sb, features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF; if (fs_info->compress_type == BTRFS_COMPRESS_LZO) features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO; + else if (fs_info->compress_type == BTRFS_COMPRESS_ZSTD) + features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD; if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA) btrfs_info(fs_info, "has skinny extents"); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index fa1b78c..b9963d9 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -327,8 +327,10 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg) if (fs_info->compress_type == BTRFS_COMPRESS_LZO) comp = "lzo"; - else + else if (fs_info->compress_type == BTRFS_COMPRESS_ZLIB) comp = "zlib"; + else + comp = "zstd"; ret = btrfs_set_prop(inode, "btrfs.compression", comp, strlen(comp), 0); if (ret) @@ -1466,6 +1468,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file, if (range->compress_type == BTRFS_COMPRESS_LZO) { btrfs_set_fs_incompat(fs_info, COMPRESS_LZO); + } else if (range->compress_type == BTRFS_COMPRESS_ZSTD) { + btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD); } ret = defrag_count; diff 
--git a/fs/btrfs/props.c b/fs/btrfs/props.c index 4b23ae5..20631e9 100644 --- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -390,6 +390,8 @@ static int prop_compression_validate(const char *value, size_t len) return 0; else if (!strncmp("zlib", value, len)) return 0; + else if (!strncmp("zstd", value, len)) + return 0; return -EINVAL; } @@ -412,6 +414,8 @@ static int prop_compression_apply(struct inode *inode, type = BTRFS_COMPRESS_LZO; else if (!strncmp("zlib", value, len)) type = BTRFS_COMPRESS_ZLIB; + else if (!strncmp("zstd", value, len)) + type = BTRFS_COMPRESS_ZSTD; else return -EINVAL; @@ -429,6 +433,8 @@ static const char *prop_compression_extract(struct inode *inode) return "zlib"; case BTRFS_COMPRESS_LZO: return "lzo"; + case BTRFS_COMPRESS_ZSTD: + return "zstd"; } return NULL; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 12540b6..c370dea 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -513,6 +513,14 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, btrfs_clear_opt(info->mount_opt, NODATASUM); btrfs_set_fs_incompat(info, COMPRESS_LZO); no_compress = 0; + } else if (strcmp(args[0].from, "zstd") == 0) { + compress_type = "zstd"; + info->compress_type = BTRFS_COMPRESS_ZSTD; + btrfs_set_opt(info->mount_opt, COMPRESS); + btrfs_clear_opt(info->mount_opt, NODATACOW); + btrfs_clear_opt(info->mount_opt, NODATASUM); + btrfs_set_fs_incompat(info, COMPRESS_ZSTD); + no_compress = 0; } else if (strncmp(args[0].from, "no", 2) == 0) { compress_type = "no"; btrfs_clear_opt(info->mount_opt, COMPRESS); @@ -1227,8 +1235,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry) if (btrfs_test_opt(info, COMPRESS)) { if (info->compress_type == BTRFS_COMPRESS_ZLIB) compress_type = "zlib"; - else + else if (info->compress_type == BTRFS_COMPRESS_LZO) compress_type = "lzo"; + else + compress_type = "zstd"; if (btrfs_test_opt(info, FORCE_COMPRESS)) seq_printf(seq, ",compress-force=%s", compress_type); else diff --git 
a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index c2d5f35..2b6d37c 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -200,6 +200,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(mixed_backref, MIXED_BACKREF); BTRFS_FEAT_ATTR_INCOMPAT(default_subvol, DEFAULT_SUBVOL); BTRFS_FEAT_ATTR_INCOMPAT(mixed_groups, MIXED_GROUPS); BTRFS_FEAT_ATTR_INCOMPAT(compress_lzo, COMPRESS_LZO); +BTRFS_FEAT_ATTR_INCOMPAT(compress_zstd, COMPRESS_ZSTD); BTRFS_FEAT_ATTR_INCOMPAT(big_metadata, BIG_METADATA); BTRFS_FEAT_ATTR_INCOMPAT(extended_iref, EXTENDED_IREF); BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56); @@ -212,6 +213,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = { BTRFS_FEAT_ATTR_PTR(default_subvol), BTRFS_FEAT_ATTR_PTR(mixed_groups), BTRFS_FEAT_ATTR_PTR(compress_lzo), + BTRFS_FEAT_ATTR_PTR(compress_zstd), BTRFS_FEAT_ATTR_PTR(big_metadata), BTRFS_FEAT_ATTR_PTR(extended_iref), BTRFS_FEAT_ATTR_PTR(raid56), diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c new file mode 100644 index 0000000..607ce47 --- /dev/null +++ b/fs/btrfs/zstd.c @@ -0,0 +1,432 @@ +/* + * Copyright (c) 2016-present, Facebook, Inc. + * All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ */ +#include <linux/bio.h> +#include <linux/err.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/refcount.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/zstd.h> +#include "compression.h" + +#define ZSTD_BTRFS_MAX_WINDOWLOG 17 +#define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG) +#define ZSTD_BTRFS_DEFAULT_LEVEL 3 + +static ZSTD_parameters zstd_get_btrfs_parameters(size_t src_len) +{ + ZSTD_parameters params = ZSTD_getParams(ZSTD_BTRFS_DEFAULT_LEVEL, + src_len, 0); + + if (params.cParams.windowLog > ZSTD_BTRFS_MAX_WINDOWLOG) + params.cParams.windowLog = ZSTD_BTRFS_MAX_WINDOWLOG; + WARN_ON(src_len > ZSTD_BTRFS_MAX_INPUT); + return params; +} + +struct workspace { + void *mem; + size_t size; + char *buf; + struct list_head list; +}; + +static void zstd_free_workspace(struct list_head *ws) +{ + struct workspace *workspace = list_entry(ws, struct workspace, list); + + kvfree(workspace->mem); + kfree(workspace->buf); + kfree(workspace); +} + +static struct list_head *zstd_alloc_workspace(void) +{ + ZSTD_parameters params = + zstd_get_btrfs_parameters(ZSTD_BTRFS_MAX_INPUT); + struct workspace *workspace; + + workspace = kzalloc(sizeof(*workspace), GFP_KERNEL); + if (!workspace) + return ERR_PTR(-ENOMEM); + + workspace->size = max_t(size_t, + ZSTD_CStreamWorkspaceBound(params.cParams), + ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT)); + workspace->mem = kvmalloc(workspace->size, GFP_KERNEL); + workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (!workspace->mem || !workspace->buf) + goto fail; + + INIT_LIST_HEAD(&workspace->list); + + return &workspace->list; +fail: + zstd_free_workspace(&workspace->list); + return ERR_PTR(-ENOMEM); +} + +static int zstd_compress_pages(struct list_head *ws, + struct address_space *mapping, + u64 start, + struct page **pages, + unsigned long *out_pages, + unsigned long *total_in, + unsigned long *total_out) +{ + struct workspace 
*workspace = list_entry(ws, struct workspace, list); + ZSTD_CStream *stream; + int ret = 0; + int nr_pages = 0; + struct page *in_page = NULL; /* The current page to read */ + struct page *out_page = NULL; /* The current page to write to */ + ZSTD_inBuffer in_buf = { NULL, 0, 0 }; + ZSTD_outBuffer out_buf = { NULL, 0, 0 }; + unsigned long tot_in = 0; + unsigned long tot_out = 0; + unsigned long len = *total_out; + const unsigned long nr_dest_pages = *out_pages; + unsigned long max_out = nr_dest_pages * PAGE_SIZE; + ZSTD_parameters params = zstd_get_btrfs_parameters(len); + + *out_pages = 0; + *total_out = 0; + *total_in = 0; + + /* Initialize the stream */ + stream = ZSTD_initCStream(params, len, workspace->mem, + workspace->size); + if (!stream) { + pr_warn("BTRFS: ZSTD_initCStream failed\n"); + ret = -EIO; + goto out; + } + + /* map in the first page of input data */ + in_page = find_get_page(mapping, start >> PAGE_SHIFT); + in_buf.src = kmap(in_page); + in_buf.pos = 0; + in_buf.size = min_t(size_t, len, PAGE_SIZE); + + + /* Allocate and map in the output buffer */ + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM); + if (out_page == NULL) { + ret = -ENOMEM; + goto out; + } + pages[nr_pages++] = out_page; + out_buf.dst = kmap(out_page); + out_buf.pos = 0; + out_buf.size = min_t(size_t, max_out, PAGE_SIZE); + + while (1) { + size_t ret2; + + ret2 = ZSTD_compressStream(stream, &out_buf, &in_buf); + if (ZSTD_isError(ret2)) { + pr_debug("BTRFS: ZSTD_compressStream returned %d\n", + ZSTD_getErrorCode(ret2)); + ret = -EIO; + goto out; + } + + /* Check to see if we are making it bigger */ + if (tot_in + in_buf.pos > 8192 && + tot_in + in_buf.pos < + tot_out + out_buf.pos) { + ret = -E2BIG; + goto out; + } + + /* We've reached the end of our output range */ + if (out_buf.pos >= max_out) { + tot_out += out_buf.pos; + ret = -E2BIG; + goto out; + } + + /* Check if we need more output space */ + if (out_buf.pos == out_buf.size) { + tot_out += PAGE_SIZE; + max_out -= 
PAGE_SIZE; + kunmap(out_page); + if (nr_pages == nr_dest_pages) { + out_page = NULL; + ret = -E2BIG; + goto out; + } + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM); + if (out_page == NULL) { + ret = -ENOMEM; + goto out; + } + pages[nr_pages++] = out_page; + out_buf.dst = kmap(out_page); + out_buf.pos = 0; + out_buf.size = min_t(size_t, max_out, PAGE_SIZE); + } + + /* We've reached the end of the input */ + if (in_buf.pos >= len) { + tot_in += in_buf.pos; + break; + } + + /* Check if we need more input */ + if (in_buf.pos == in_buf.size) { + tot_in += PAGE_SIZE; + kunmap(in_page); + put_page(in_page); + + start += PAGE_SIZE; + len -= PAGE_SIZE; + in_page = find_get_page(mapping, start >> PAGE_SHIFT); + in_buf.src = kmap(in_page); + in_buf.pos = 0; + in_buf.size = min_t(size_t, len, PAGE_SIZE); + } + } + while (1) { + size_t ret2; + + ret2 = ZSTD_endStream(stream, &out_buf); + if (ZSTD_isError(ret2)) { + pr_debug("BTRFS: ZSTD_endStream returned %d\n", + ZSTD_getErrorCode(ret2)); + ret = -EIO; + goto out; + } + if (ret2 == 0) { + tot_out += out_buf.pos; + break; + } + if (out_buf.pos >= max_out) { + tot_out += out_buf.pos; + ret = -E2BIG; + goto out; + } + + tot_out += PAGE_SIZE; + max_out -= PAGE_SIZE; + kunmap(out_page); + if (nr_pages == nr_dest_pages) { + out_page = NULL; + ret = -E2BIG; + goto out; + } + out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM); + if (out_page == NULL) { + ret = -ENOMEM; + goto out; + } + pages[nr_pages++] = out_page; + out_buf.dst = kmap(out_page); + out_buf.pos = 0; + out_buf.size = min_t(size_t, max_out, PAGE_SIZE); + } + + if (tot_out >= tot_in) { + ret = -E2BIG; + goto out; + } + + ret = 0; + *total_in = tot_in; + *total_out = tot_out; +out: + *out_pages = nr_pages; + /* Cleanup */ + if (in_page) { + kunmap(in_page); + put_page(in_page); + } + if (out_page) + kunmap(out_page); + return ret; +} + +static int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb) +{ + struct workspace *workspace = list_entry(ws, struct 
workspace, list); + struct page **pages_in = cb->compressed_pages; + u64 disk_start = cb->start; + struct bio *orig_bio = cb->orig_bio; + size_t srclen = cb->compressed_len; + ZSTD_DStream *stream; + int ret = 0; + unsigned long page_in_index = 0; + unsigned long total_pages_in = DIV_ROUND_UP(srclen, PAGE_SIZE); + unsigned long buf_start; + unsigned long total_out = 0; + ZSTD_inBuffer in_buf = { NULL, 0, 0 }; + ZSTD_outBuffer out_buf = { NULL, 0, 0 }; + + stream = ZSTD_initDStream( + ZSTD_BTRFS_MAX_INPUT, workspace->mem, workspace->size); + if (!stream) { + pr_debug("BTRFS: ZSTD_initDStream failed\n"); + ret = -EIO; + goto done; + } + + in_buf.src = kmap(pages_in[page_in_index]); + in_buf.pos = 0; + in_buf.size = min_t(size_t, srclen, PAGE_SIZE); + + out_buf.dst = workspace->buf; + out_buf.pos = 0; + out_buf.size = PAGE_SIZE; + + while (1) { + size_t ret2; + + ret2 = ZSTD_decompressStream(stream, &out_buf, &in_buf); + if (ZSTD_isError(ret2)) { + pr_debug("BTRFS: ZSTD_decompressStream returned %d\n", + ZSTD_getErrorCode(ret2)); + ret = -EIO; + goto done; + } + buf_start = total_out; + total_out += out_buf.pos; + out_buf.pos = 0; + + ret = btrfs_decompress_buf2page(out_buf.dst, buf_start, + total_out, disk_start, orig_bio); + if (ret == 0) + break; + + if (in_buf.pos >= srclen) + break; + + /* Check if we've hit the end of a frame */ + if (ret2 == 0) + break; + + if (in_buf.pos == in_buf.size) { + kunmap(pages_in[page_in_index++]); + if (page_in_index >= total_pages_in) { + in_buf.src = NULL; + ret = -EIO; + goto done; + } + srclen -= PAGE_SIZE; + in_buf.src = kmap(pages_in[page_in_index]); + in_buf.pos = 0; + in_buf.size = min_t(size_t, srclen, PAGE_SIZE); + } + } + ret = 0; + zero_fill_bio(orig_bio); +done: + if (in_buf.src) + kunmap(pages_in[page_in_index]); + return ret; +} + +static int zstd_decompress(struct list_head *ws, unsigned char *data_in, + struct page *dest_page, + unsigned long start_byte, + size_t srclen, size_t destlen) +{ + struct workspace 
*workspace = list_entry(ws, struct workspace, list); + ZSTD_DStream *stream; + int ret = 0; + size_t ret2; + ZSTD_inBuffer in_buf = { NULL, 0, 0 }; + ZSTD_outBuffer out_buf = { NULL, 0, 0 }; + unsigned long total_out = 0; + unsigned long pg_offset = 0; + char *kaddr; + + stream = ZSTD_initDStream( + ZSTD_BTRFS_MAX_INPUT, workspace->mem, workspace->size); + if (!stream) { + pr_warn("BTRFS: ZSTD_initDStream failed\n"); + ret = -EIO; + goto finish; + } + + destlen = min_t(size_t, destlen, PAGE_SIZE); + + in_buf.src = data_in; + in_buf.pos = 0; + in_buf.size = srclen; + + out_buf.dst = workspace->buf; + out_buf.pos = 0; + out_buf.size = PAGE_SIZE; + + ret2 = 1; + while (pg_offset < destlen && in_buf.pos < in_buf.size) { + unsigned long buf_start; + unsigned long buf_offset; + unsigned long bytes; + + /* Check if the frame is over and we still need more input */ + if (ret2 == 0) { + pr_debug("BTRFS: ZSTD_decompressStream ended early\n"); + ret = -EIO; + goto finish; + } + ret2 = ZSTD_decompressStream(stream, &out_buf, &in_buf); + if (ZSTD_isError(ret2)) { + pr_debug("BTRFS: ZSTD_decompressStream returned %d\n", + ZSTD_getErrorCode(ret2)); + ret = -EIO; + goto finish; + } + + buf_start = total_out; + total_out += out_buf.pos; + out_buf.pos = 0; + + if (total_out <= start_byte) + continue; + + if (total_out > start_byte && buf_start < start_byte) + buf_offset = start_byte - buf_start; + else + buf_offset = 0; + + bytes = min_t(unsigned long, destlen - pg_offset, + out_buf.size - buf_offset); + + kaddr = kmap_atomic(dest_page); + memcpy(kaddr + pg_offset, out_buf.dst + buf_offset, bytes); + kunmap_atomic(kaddr); + + pg_offset += bytes; + } + ret = 0; +finish: + if (pg_offset < destlen) { + kaddr = kmap_atomic(dest_page); + memset(kaddr + pg_offset, 0, destlen - pg_offset); + kunmap_atomic(kaddr); + } + return ret; +} + +const struct btrfs_compress_op btrfs_zstd_compress = { + .alloc_workspace = zstd_alloc_workspace, + .free_workspace = zstd_free_workspace, + 
.compress_pages = zstd_compress_pages, + .decompress_bio = zstd_decompress_bio, + .decompress = zstd_decompress, +}; diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 9aa74f3..378230c 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -255,13 +255,7 @@ struct btrfs_ioctl_fs_info_args { #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1) #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS (1ULL << 2) #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO (1ULL << 3) -/* - * some patches floated around with a second compression method - * lets save that incompat here for when they do get in - * Note we don't actually support it, we're just reserving the - * number - */ -#define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZOv2 (1ULL << 4) +#define BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD (1ULL << 4) /* * older kernels tried to do bigger metadata blocks, but the -- 2.9.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v5 3/5] btrfs: Add zstd support 2017-08-10 2:39 ` [PATCH v5 3/5] btrfs: Add zstd support Nick Terrell @ 2017-08-11 2:13 ` Adam Borowski 2017-08-11 3:23 ` Nick Terrell 2017-08-11 11:45 ` Austin S. Hemmelgarn 1 sibling, 1 reply; 21+ messages in thread From: Adam Borowski @ 2017-08-11 2:13 UTC (permalink / raw) To: Nick Terrell Cc: Austin S . Hemmelgarn, kernel-team, Chris Mason, Yann Collet, David Sterba, linux-btrfs, linux-kernel On Wed, Aug 09, 2017 at 07:39:02PM -0700, Nick Terrell wrote: > Add zstd compression and decompression support to BtrFS. Re-tested on arm64, amd64 and i386, this time everything seems fine so far. As I'm too lazy to have a separate test setup for the zlib level patch, I'm using a dummy implementation: --- a/fs/btrfs/zstd.c +++ b/fs/btrfs/zstd.c @@ -423,10 +423,16 @@ static int zstd_decompress(struct list_head *ws, unsigned char *data_in, return ret; } +static void zstd_set_level(struct list_head *ws, unsigned int type) +{ + // TODO +} + const struct btrfs_compress_op btrfs_zstd_compress = { .alloc_workspace = zstd_alloc_workspace, .free_workspace = zstd_free_workspace, .compress_pages = zstd_compress_pages, .decompress_bio = zstd_decompress_bio, .decompress = zstd_decompress, + .set_level = zstd_set_level, }; It might be worthwhile to do some early testing using different levels, though. 喵! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can. ⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener. ⠈⠳⣄⠀⠀⠀⠀ A master species delegates. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 3/5] btrfs: Add zstd support
  2017-08-11  2:13 ` Adam Borowski
@ 2017-08-11  3:23   ` Nick Terrell
  0 siblings, 0 replies; 21+ messages in thread
From: Nick Terrell @ 2017-08-11  3:23 UTC (permalink / raw)
  To: Adam Borowski
  Cc: Austin S . Hemmelgarn, Kernel Team, Chris Mason, Yann Collet,
	David Sterba, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org

On 8/10/17, 7:13 PM, "Adam Borowski" <kilobyte@angband.pl> wrote:
> On Wed, Aug 09, 2017 at 07:39:02PM -0700, Nick Terrell wrote:
>> Add zstd compression and decompression support to BtrFS.
>
> Re-tested on arm64, amd64 and i386, this time everything seems fine so far.
>
> As I'm too lazy to have a separate test setup for the zlib level patch,
> I'm using a dummy implementation:
>
> --- a/fs/btrfs/zstd.c
> +++ b/fs/btrfs/zstd.c
> @@ -423,10 +423,16 @@ static int zstd_decompress(struct list_head *ws, unsigned char *data_in,
> 	return ret;
> }
>
> +static void zstd_set_level(struct list_head *ws, unsigned int type)
> +{
> +	// TODO
> +}
> +
> const struct btrfs_compress_op btrfs_zstd_compress = {
> 	.alloc_workspace = zstd_alloc_workspace,
> 	.free_workspace = zstd_free_workspace,
> 	.compress_pages = zstd_compress_pages,
> 	.decompress_bio = zstd_decompress_bio,
> 	.decompress = zstd_decompress,
> +	.set_level = zstd_set_level,
> };
>
> It might be worthwhile to do some early testing using different levels,
> though.

I'll run some preliminary tests and make sure nothing blows up and the
compression ratio looks good.

> 喵!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
> ⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
> ⠈⠳⣄⠀⠀⠀⠀ A master species delegates.
^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v5 3/5] btrfs: Add zstd support 2017-08-10 2:39 ` [PATCH v5 3/5] btrfs: Add zstd support Nick Terrell 2017-08-11 2:13 ` Adam Borowski @ 2017-08-11 11:45 ` Austin S. Hemmelgarn 1 sibling, 0 replies; 21+ messages in thread From: Austin S. Hemmelgarn @ 2017-08-11 11:45 UTC (permalink / raw) To: Nick Terrell Cc: kernel-team, Chris Mason, Yann Collet, Adam Borowski, David Sterba, linux-btrfs, linux-kernel On 2017-08-09 22:39, Nick Terrell wrote: > Add zstd compression and decompression support to BtrFS. zstd at its > fastest level compresses almost as well as zlib, while offering much > faster compression and decompression, approaching lzo speeds. > > I benchmarked btrfs with zstd compression against no compression, lzo > compression, and zlib compression. I benchmarked two scenarios. Copying > a set of files to btrfs, and then reading the files. Copying a tarball > to btrfs, extracting it to btrfs, and then reading the extracted files. > After every operation, I call `sync` and include the sync time. > Between every pair of operations I unmount and remount the filesystem > to avoid caching. The benchmark files can be found in the upstream > zstd source repository under > `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}` > [1] [2]. > > I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. > The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, > 16 GB of RAM, and a SSD. > > The first compression benchmark is copying 10 copies of the unzipped > Silesia corpus [3] into a BtrFS filesystem mounted with > `-o compress-force=Method`. The decompression benchmark times how long > it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is > measured by comparing the output of `df` and `du`. See the benchmark file > [1] for details. I benchmarked multiple zstd compression levels, although > the patch uses zstd level 1. 
> > | Method | Ratio | Compression MB/s | Decompression speed | > |---------|-------|------------------|---------------------| > | None | 0.99 | 504 | 686 | > | lzo | 1.66 | 398 | 442 | > | zlib | 2.58 | 65 | 241 | > | zstd 1 | 2.57 | 260 | 383 | > | zstd 3 | 2.71 | 174 | 408 | > | zstd 6 | 2.87 | 70 | 398 | > | zstd 9 | 2.92 | 43 | 406 | > | zstd 12 | 2.93 | 21 | 408 | > | zstd 15 | 3.01 | 11 | 354 | > > The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it > measures the compression ratio, extracts the tar, and deletes the tar. > Then it measures the compression ratio again, and `tar`s the extracted > files into `/dev/null`. See the benchmark file [2] for details. > > | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) | > |--------|-----------|---------------|----------|------------|----------| > | None | 0.97 | 0.78 | 0.981 | 5.501 | 8.807 | > | lzo | 2.06 | 1.38 | 1.631 | 8.458 | 8.585 | > | zlib | 3.40 | 1.86 | 7.750 | 21.544 | 11.744 | > | zstd 1 | 3.57 | 1.85 | 2.579 | 11.479 | 9.389 | > > [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh > [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh > [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia > [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz > > zstd source repository: https://github.com/facebook/zstd > > Signed-off-by: Nick Terrell <terrelln@fb.com> Considering how things went with the previous patch, I've been a bit more aggressive testing this one, but after now almost 72 hours of combined runtime for each of the architectures I have tests set up for with nothing breaking (well, nothing breaking that wasn't already breaking before this patch set, some of the raid56 tests are still failing semi-reliably as expected), I'd say this is reasonably stable and looks good overall. 
> --- > v2 -> v3: > - Port upstream BtrFS commits e1ddce71d6, 389a6cfc2a, and 6acafd1eff > - Change default compression level for BtrFS to 3 > > v3 -> v4: > - Add missing includes, which fixes the aarch64 build > - Fix minor linter warnings > > fs/btrfs/Kconfig | 2 + > fs/btrfs/Makefile | 2 +- > fs/btrfs/compression.c | 1 + > fs/btrfs/compression.h | 6 +- > fs/btrfs/ctree.h | 1 + > fs/btrfs/disk-io.c | 2 + > fs/btrfs/ioctl.c | 6 +- > fs/btrfs/props.c | 6 + > fs/btrfs/super.c | 12 +- > fs/btrfs/sysfs.c | 2 + > fs/btrfs/zstd.c | 432 +++++++++++++++++++++++++++++++++++++++++++++ > include/uapi/linux/btrfs.h | 8 +- > 12 files changed, 468 insertions(+), 12 deletions(-) > create mode 100644 fs/btrfs/zstd.c > > diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig > index 80e9c18..a26c63b 100644 > --- a/fs/btrfs/Kconfig > +++ b/fs/btrfs/Kconfig > @@ -6,6 +6,8 @@ config BTRFS_FS > select ZLIB_DEFLATE > select LZO_COMPRESS > select LZO_DECOMPRESS > + select ZSTD_COMPRESS > + select ZSTD_DECOMPRESS > select RAID6_PQ > select XOR_BLOCKS > select SRCU > diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile > index 128ce17..962a95a 100644 > --- a/fs/btrfs/Makefile > +++ b/fs/btrfs/Makefile > @@ -6,7 +6,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ > transaction.o inode.o file.o tree-defrag.o \ > extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \ > extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ > - export.o tree-log.o free-space-cache.o zlib.o lzo.o \ > + export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \ > compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ > reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ > uuid-tree.o props.o hash.o free-space-tree.o > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > index d2ef9ac..4ff42d1 100644 > --- a/fs/btrfs/compression.c > +++ b/fs/btrfs/compression.c > @@ -704,6 +704,7 @@ static struct { > static const struct 
btrfs_compress_op * const btrfs_compress_op[] = {
 	&btrfs_zlib_compress,
 	&btrfs_lzo_compress,
+	&btrfs_zstd_compress,
 };
 
 void __init btrfs_init_compress(void)
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 87f6d33..2269e00 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -99,8 +99,9 @@ enum btrfs_compression_type {
 	BTRFS_COMPRESS_NONE  = 0,
 	BTRFS_COMPRESS_ZLIB  = 1,
 	BTRFS_COMPRESS_LZO   = 2,
-	BTRFS_COMPRESS_TYPES = 2,
-	BTRFS_COMPRESS_LAST  = 3,
+	BTRFS_COMPRESS_ZSTD  = 3,
+	BTRFS_COMPRESS_TYPES = 3,
+	BTRFS_COMPRESS_LAST  = 4,
 };
 
 struct btrfs_compress_op {
@@ -128,5 +129,6 @@ struct btrfs_compress_op {
 
 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
+extern const struct btrfs_compress_op btrfs_zstd_compress;
 
 #endif
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3f3eb7b..845d77c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -270,6 +270,7 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
 	 BTRFS_FEATURE_INCOMPAT_BIG_METADATA |		\
 	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
+	 BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD |		\
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 080e2eb..04632f4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2828,6 +2828,8 @@ int open_ctree(struct super_block *sb,
 	features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
 	if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
 		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+	else if (fs_info->compress_type == BTRFS_COMPRESS_ZSTD)
+		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD;
 
 	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 		btrfs_info(fs_info, "has skinny extents");
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fa1b78c..b9963d9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -327,8 +327,10 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 
 		if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
 			comp = "lzo";
-		else
+		else if (fs_info->compress_type == BTRFS_COMPRESS_ZLIB)
 			comp = "zlib";
+		else
+			comp = "zstd";
 		ret = btrfs_set_prop(inode, "btrfs.compression",
 				     comp, strlen(comp), 0);
 		if (ret)
@@ -1466,6 +1468,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 
 	if (range->compress_type == BTRFS_COMPRESS_LZO) {
 		btrfs_set_fs_incompat(fs_info, COMPRESS_LZO);
+	} else if (range->compress_type == BTRFS_COMPRESS_ZSTD) {
+		btrfs_set_fs_incompat(fs_info, COMPRESS_ZSTD);
 	}
 
 	ret = defrag_count;
diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 4b23ae5..20631e9 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -390,6 +390,8 @@ static int prop_compression_validate(const char *value, size_t len)
 		return 0;
 	else if (!strncmp("zlib", value, len))
 		return 0;
+	else if (!strncmp("zstd", value, len))
+		return 0;
 
 	return -EINVAL;
 }
@@ -412,6 +414,8 @@ static int prop_compression_apply(struct inode *inode,
 		type = BTRFS_COMPRESS_LZO;
 	else if (!strncmp("zlib", value, len))
 		type = BTRFS_COMPRESS_ZLIB;
+	else if (!strncmp("zstd", value, len))
+		type = BTRFS_COMPRESS_ZSTD;
 	else
 		return -EINVAL;
 
@@ -429,6 +433,8 @@ static const char *prop_compression_extract(struct inode *inode)
 		return "zlib";
 	case BTRFS_COMPRESS_LZO:
 		return "lzo";
+	case BTRFS_COMPRESS_ZSTD:
+		return "zstd";
 	}
 
 	return NULL;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 12540b6..c370dea 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -513,6 +513,14 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 				btrfs_clear_opt(info->mount_opt, NODATASUM);
 				btrfs_set_fs_incompat(info, COMPRESS_LZO);
 				no_compress = 0;
+			} else if (strcmp(args[0].from, "zstd") == 0) {
+				compress_type = "zstd";
+				info->compress_type = BTRFS_COMPRESS_ZSTD;
+				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_clear_opt(info->mount_opt, NODATACOW);
+				btrfs_clear_opt(info->mount_opt, NODATASUM);
+				btrfs_set_fs_incompat(info, COMPRESS_ZSTD);
+				no_compress = 0;
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				btrfs_clear_opt(info->mount_opt, COMPRESS);
@@ -1227,8 +1235,10 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 	if (btrfs_test_opt(info, COMPRESS)) {
 		if (info->compress_type == BTRFS_COMPRESS_ZLIB)
 			compress_type = "zlib";
-		else
+		else if (info->compress_type == BTRFS_COMPRESS_LZO)
 			compress_type = "lzo";
+		else
+			compress_type = "zstd";
 		if (btrfs_test_opt(info, FORCE_COMPRESS))
 			seq_printf(seq, ",compress-force=%s", compress_type);
 		else
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c2d5f35..2b6d37c 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -200,6 +200,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(mixed_backref, MIXED_BACKREF);
 BTRFS_FEAT_ATTR_INCOMPAT(default_subvol, DEFAULT_SUBVOL);
 BTRFS_FEAT_ATTR_INCOMPAT(mixed_groups, MIXED_GROUPS);
 BTRFS_FEAT_ATTR_INCOMPAT(compress_lzo, COMPRESS_LZO);
+BTRFS_FEAT_ATTR_INCOMPAT(compress_zstd, COMPRESS_ZSTD);
 BTRFS_FEAT_ATTR_INCOMPAT(big_metadata, BIG_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(extended_iref, EXTENDED_IREF);
 BTRFS_FEAT_ATTR_INCOMPAT(raid56, RAID56);
@@ -212,6 +213,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(default_subvol),
 	BTRFS_FEAT_ATTR_PTR(mixed_groups),
 	BTRFS_FEAT_ATTR_PTR(compress_lzo),
+	BTRFS_FEAT_ATTR_PTR(compress_zstd),
 	BTRFS_FEAT_ATTR_PTR(big_metadata),
 	BTRFS_FEAT_ATTR_PTR(extended_iref),
 	BTRFS_FEAT_ATTR_PTR(raid56),
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
new file mode 100644
index 0000000..607ce47
--- /dev/null
+++ b/fs/btrfs/zstd.c
@@ -0,0 +1,432 @@
+/*
+ * Copyright (c) 2016-present, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/bio.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/refcount.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/zstd.h>
+#include "compression.h"
+
+#define ZSTD_BTRFS_MAX_WINDOWLOG 17
+#define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG)
+#define ZSTD_BTRFS_DEFAULT_LEVEL 3
+
+static ZSTD_parameters zstd_get_btrfs_parameters(size_t src_len)
+{
+	ZSTD_parameters params = ZSTD_getParams(ZSTD_BTRFS_DEFAULT_LEVEL,
+						src_len, 0);
+
+	if (params.cParams.windowLog > ZSTD_BTRFS_MAX_WINDOWLOG)
+		params.cParams.windowLog = ZSTD_BTRFS_MAX_WINDOWLOG;
+	WARN_ON(src_len > ZSTD_BTRFS_MAX_INPUT);
+	return params;
+}
+
+struct workspace {
+	void *mem;
+	size_t size;
+	char *buf;
+	struct list_head list;
+};
+
+static void zstd_free_workspace(struct list_head *ws)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+
+	kvfree(workspace->mem);
+	kfree(workspace->buf);
+	kfree(workspace);
+}
+
+static struct list_head *zstd_alloc_workspace(void)
+{
+	ZSTD_parameters params =
+			zstd_get_btrfs_parameters(ZSTD_BTRFS_MAX_INPUT);
+	struct workspace *workspace;
+
+	workspace = kzalloc(sizeof(*workspace), GFP_KERNEL);
+	if (!workspace)
+		return ERR_PTR(-ENOMEM);
+
+	workspace->size = max_t(size_t,
+			ZSTD_CStreamWorkspaceBound(params.cParams),
+			ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
+	workspace->mem = kvmalloc(workspace->size, GFP_KERNEL);
+	workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!workspace->mem || !workspace->buf)
+		goto fail;
+
+	INIT_LIST_HEAD(&workspace->list);
+
+	return &workspace->list;
+fail:
+	zstd_free_workspace(&workspace->list);
+	return ERR_PTR(-ENOMEM);
+}
+
+static int zstd_compress_pages(struct list_head *ws,
+		struct address_space *mapping,
+		u64 start,
+		struct page **pages,
+		unsigned long *out_pages,
+		unsigned long *total_in,
+		unsigned long *total_out)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	ZSTD_CStream *stream;
+	int ret = 0;
+	int nr_pages = 0;
+	struct page *in_page = NULL;  /* The current page to read */
+	struct page *out_page = NULL; /* The current page to write to */
+	ZSTD_inBuffer in_buf = { NULL, 0, 0 };
+	ZSTD_outBuffer out_buf = { NULL, 0, 0 };
+	unsigned long tot_in = 0;
+	unsigned long tot_out = 0;
+	unsigned long len = *total_out;
+	const unsigned long nr_dest_pages = *out_pages;
+	unsigned long max_out = nr_dest_pages * PAGE_SIZE;
+	ZSTD_parameters params = zstd_get_btrfs_parameters(len);
+
+	*out_pages = 0;
+	*total_out = 0;
+	*total_in = 0;
+
+	/* Initialize the stream */
+	stream = ZSTD_initCStream(params, len, workspace->mem,
+			workspace->size);
+	if (!stream) {
+		pr_warn("BTRFS: ZSTD_initCStream failed\n");
+		ret = -EIO;
+		goto out;
+	}
+
+	/* map in the first page of input data */
+	in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+	in_buf.src = kmap(in_page);
+	in_buf.pos = 0;
+	in_buf.size = min_t(size_t, len, PAGE_SIZE);
+
+
+	/* Allocate and map in the output buffer */
+	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+	if (out_page == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	pages[nr_pages++] = out_page;
+	out_buf.dst = kmap(out_page);
+	out_buf.pos = 0;
+	out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+
+	while (1) {
+		size_t ret2;
+
+		ret2 = ZSTD_compressStream(stream, &out_buf, &in_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_compressStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto out;
+		}
+
+		/* Check to see if we are making it bigger */
+		if (tot_in + in_buf.pos > 8192 &&
+				tot_in + in_buf.pos <
+				tot_out + out_buf.pos) {
+			ret = -E2BIG;
+			goto out;
+		}
+
+		/* We've reached the end of our output range */
+		if (out_buf.pos >= max_out) {
+			tot_out += out_buf.pos;
+			ret = -E2BIG;
+			goto out;
+		}
+
+		/* Check if we need more output space */
+		if (out_buf.pos == out_buf.size) {
+			tot_out += PAGE_SIZE;
+			max_out -= PAGE_SIZE;
+			kunmap(out_page);
+			if (nr_pages == nr_dest_pages) {
+				out_page = NULL;
+				ret = -E2BIG;
+				goto out;
+			}
+			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+			if (out_page == NULL) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			pages[nr_pages++] = out_page;
+			out_buf.dst = kmap(out_page);
+			out_buf.pos = 0;
+			out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+		}
+
+		/* We've reached the end of the input */
+		if (in_buf.pos >= len) {
+			tot_in += in_buf.pos;
+			break;
+		}
+
+		/* Check if we need more input */
+		if (in_buf.pos == in_buf.size) {
+			tot_in += PAGE_SIZE;
+			kunmap(in_page);
+			put_page(in_page);
+
+			start += PAGE_SIZE;
+			len -= PAGE_SIZE;
+			in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+			in_buf.src = kmap(in_page);
+			in_buf.pos = 0;
+			in_buf.size = min_t(size_t, len, PAGE_SIZE);
+		}
+	}
+	while (1) {
+		size_t ret2;
+
+		ret2 = ZSTD_endStream(stream, &out_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_endStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto out;
+		}
+		if (ret2 == 0) {
+			tot_out += out_buf.pos;
+			break;
+		}
+		if (out_buf.pos >= max_out) {
+			tot_out += out_buf.pos;
+			ret = -E2BIG;
+			goto out;
+		}
+
+		tot_out += PAGE_SIZE;
+		max_out -= PAGE_SIZE;
+		kunmap(out_page);
+		if (nr_pages == nr_dest_pages) {
+			out_page = NULL;
+			ret = -E2BIG;
+			goto out;
+		}
+		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+		if (out_page == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		pages[nr_pages++] = out_page;
+		out_buf.dst = kmap(out_page);
+		out_buf.pos = 0;
+		out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
+	}
+
+	if (tot_out >= tot_in) {
+		ret = -E2BIG;
+		goto out;
+	}
+
+	ret = 0;
+	*total_in = tot_in;
+	*total_out = tot_out;
+out:
+	*out_pages = nr_pages;
+	/* Cleanup */
+	if (in_page) {
+		kunmap(in_page);
+		put_page(in_page);
+	}
+	if (out_page)
+		kunmap(out_page);
+	return ret;
+}
+
+static int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	struct page **pages_in = cb->compressed_pages;
+	u64 disk_start = cb->start;
+	struct bio *orig_bio = cb->orig_bio;
+	size_t srclen = cb->compressed_len;
+	ZSTD_DStream *stream;
+	int ret = 0;
+	unsigned long page_in_index = 0;
+	unsigned long total_pages_in = DIV_ROUND_UP(srclen, PAGE_SIZE);
+	unsigned long buf_start;
+	unsigned long total_out = 0;
+	ZSTD_inBuffer in_buf = { NULL, 0, 0 };
+	ZSTD_outBuffer out_buf = { NULL, 0, 0 };
+
+	stream = ZSTD_initDStream(
+			ZSTD_BTRFS_MAX_INPUT, workspace->mem, workspace->size);
+	if (!stream) {
+		pr_debug("BTRFS: ZSTD_initDStream failed\n");
+		ret = -EIO;
+		goto done;
+	}
+
+	in_buf.src = kmap(pages_in[page_in_index]);
+	in_buf.pos = 0;
+	in_buf.size = min_t(size_t, srclen, PAGE_SIZE);
+
+	out_buf.dst = workspace->buf;
+	out_buf.pos = 0;
+	out_buf.size = PAGE_SIZE;
+
+	while (1) {
+		size_t ret2;
+
+		ret2 = ZSTD_decompressStream(stream, &out_buf, &in_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_decompressStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto done;
+		}
+		buf_start = total_out;
+		total_out += out_buf.pos;
+		out_buf.pos = 0;
+
+		ret = btrfs_decompress_buf2page(out_buf.dst, buf_start,
+				total_out, disk_start, orig_bio);
+		if (ret == 0)
+			break;
+
+		if (in_buf.pos >= srclen)
+			break;
+
+		/* Check if we've hit the end of a frame */
+		if (ret2 == 0)
+			break;
+
+		if (in_buf.pos == in_buf.size) {
+			kunmap(pages_in[page_in_index++]);
+			if (page_in_index >= total_pages_in) {
+				in_buf.src = NULL;
+				ret = -EIO;
+				goto done;
+			}
+			srclen -= PAGE_SIZE;
+			in_buf.src = kmap(pages_in[page_in_index]);
+			in_buf.pos = 0;
+			in_buf.size = min_t(size_t, srclen, PAGE_SIZE);
+		}
+	}
+	ret = 0;
+	zero_fill_bio(orig_bio);
+done:
+	if (in_buf.src)
+		kunmap(pages_in[page_in_index]);
+	return ret;
+}
+
+static int zstd_decompress(struct list_head *ws, unsigned char *data_in,
+		struct page *dest_page,
+		unsigned long start_byte,
+		size_t srclen, size_t destlen)
+{
+	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	ZSTD_DStream *stream;
+	int ret = 0;
+	size_t ret2;
+	ZSTD_inBuffer in_buf = { NULL, 0, 0 };
+	ZSTD_outBuffer out_buf = { NULL, 0, 0 };
+	unsigned long total_out = 0;
+	unsigned long pg_offset = 0;
+	char *kaddr;
+
+	stream = ZSTD_initDStream(
+			ZSTD_BTRFS_MAX_INPUT, workspace->mem, workspace->size);
+	if (!stream) {
+		pr_warn("BTRFS: ZSTD_initDStream failed\n");
+		ret = -EIO;
+		goto finish;
+	}
+
+	destlen = min_t(size_t, destlen, PAGE_SIZE);
+
+	in_buf.src = data_in;
+	in_buf.pos = 0;
+	in_buf.size = srclen;
+
+	out_buf.dst = workspace->buf;
+	out_buf.pos = 0;
+	out_buf.size = PAGE_SIZE;
+
+	ret2 = 1;
+	while (pg_offset < destlen && in_buf.pos < in_buf.size) {
+		unsigned long buf_start;
+		unsigned long buf_offset;
+		unsigned long bytes;
+
+		/* Check if the frame is over and we still need more input */
+		if (ret2 == 0) {
+			pr_debug("BTRFS: ZSTD_decompressStream ended early\n");
+			ret = -EIO;
+			goto finish;
+		}
+		ret2 = ZSTD_decompressStream(stream, &out_buf, &in_buf);
+		if (ZSTD_isError(ret2)) {
+			pr_debug("BTRFS: ZSTD_decompressStream returned %d\n",
+					ZSTD_getErrorCode(ret2));
+			ret = -EIO;
+			goto finish;
+		}
+
+		buf_start = total_out;
+		total_out += out_buf.pos;
+		out_buf.pos = 0;
+
+		if (total_out <= start_byte)
+			continue;
+
+		if (total_out > start_byte && buf_start < start_byte)
+			buf_offset = start_byte - buf_start;
+		else
+			buf_offset = 0;
+
+		bytes = min_t(unsigned long, destlen - pg_offset,
+				out_buf.size - buf_offset);
+
+		kaddr = kmap_atomic(dest_page);
+		memcpy(kaddr + pg_offset, out_buf.dst + buf_offset, bytes);
+		kunmap_atomic(kaddr);
+
+		pg_offset += bytes;
+	}
+	ret = 0;
+finish:
+	if (pg_offset < destlen) {
+		kaddr = kmap_atomic(dest_page);
+		memset(kaddr + pg_offset, 0, destlen - pg_offset);
+		kunmap_atomic(kaddr);
+	}
+	return ret;
+}
+
+const struct btrfs_compress_op btrfs_zstd_compress = {
+	.alloc_workspace = zstd_alloc_workspace,
+	.free_workspace = zstd_free_workspace,
+	.compress_pages = zstd_compress_pages,
+	.decompress_bio = zstd_decompress_bio,
+	.decompress = zstd_decompress,
+};
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 9aa74f3..378230c 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -255,13 +255,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
 #define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO	(1ULL << 3)
-/*
- * some patches floated around with a second compression method
- * lets save that incompat here for when they do get in
- * Note we don't actually support it, we're just reserving the
- * number
- */
-#define BTRFS_FEATURE_INCOMPAT_COMPRESS_LZOv2	(1ULL << 4)
+#define BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD	(1ULL << 4)
 
 /*
 * older kernels tried to do bigger metadata blocks, but the
-- 
2.9.3

^ permalink raw reply	[flat|nested] 21+ messages in thread
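The trickiest arithmetic in `zstd_decompress()` above is the `buf_start` / `buf_offset` / `bytes` bookkeeping that extracts one page's worth of data from the middle of the decompressed stream. A standalone sketch of that same logic, with the streamed output modeled as a list of byte chunks (the function name and chunking here are illustrative, not the kernel API):

```python
def copy_window(chunks, start_byte, destlen):
    """Mirror zstd_decompress()'s bookkeeping: walk the decompressed
    output chunks and copy destlen bytes starting at offset start_byte."""
    dest = bytearray()
    total_out = 0
    for chunk in chunks:
        buf_start = total_out            # offset of this chunk in the stream
        total_out += len(chunk)
        if total_out <= start_byte:      # window not reached yet, skip chunk
            continue
        # For the first overlapping chunk, skip the bytes before start_byte
        buf_offset = start_byte - buf_start if buf_start < start_byte else 0
        n = min(destlen - len(dest), len(chunk) - buf_offset)
        dest += chunk[buf_offset:buf_offset + n]
        if len(dest) == destlen:
            break
    dest += b"\0" * (destlen - len(dest))  # zero-fill a short read
    return bytes(dest)

data = bytes(range(256)) * 64              # 16 KiB of test data
chunks = [data[i:i + 1000] for i in range(0, len(data), 1000)]
assert copy_window(chunks, 4096, 4096) == data[4096:8192]
```

The zero-fill at the end corresponds to the `memset()` in the kernel's `finish:` path, which pads the destination page when the stream ends before `destlen` bytes are produced.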
[parent not found: <20170810023553.3200875-3-terrelln@fb.com>]
* Re: [PATCH v5 2/5] lib: Add zstd modules
  [not found] ` <20170810023553.3200875-3-terrelln@fb.com>
@ 2017-08-10  8:30 ` Eric Biggers
  2017-08-10 11:32   ` Austin S. Hemmelgarn
  ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Eric Biggers @ 2017-08-10 8:30 UTC (permalink / raw)
To: Nick Terrell
Cc: Herbert Xu, kernel-team, squashfs-devel, linux-btrfs,
	linux-kernel, linux-crypto

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>
> It can compress at speeds approaching lz4, and quality approaching lzma.

Well, for a very loose definition of "approaching", and certainly not at
the same time.  I doubt there's a use case for using the highest
compression levels in kernel mode --- especially the ones using zstd_opt.h.

>
> The code was ported from the upstream zstd source repository.

What version?

> `linux/zstd.h` header was modified to match linux kernel style.
> The cross-platform and allocation code was stripped out. Instead zstd
> requires the caller to pass a preallocated workspace. The source files
> were clang-formatted [1] to match the Linux Kernel style as much as
> possible.

It would be easier to compare to the upstream version if it was not all
reformatted.  There is a chance that bugs were introduced by Linux-specific
changes, and it would be nice if they could be easily reviewed.  (Also I
don't know what clang-format settings you used, but there are still a lot
of differences from the Linux coding style.)

>
> I benchmarked zstd compression as a special character device. I ran zstd
> and zlib compression at several levels, as well as performing no
> compression, which measures the time spent copying the data to kernel
> space. Data is passed to the compressor 4096 B at a time. The benchmark
> file is located in the upstream zstd source repository under
> `contrib/linux-kernel/zstd_compress_test.c` [2].
>
> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7
> processor, 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar`
> [3], which is 211,988,480 B large. Run the following commands for the
> benchmark:
>
>     sudo modprobe zstd_compress_test
>     sudo mknod zstd_compress_test c 245 0
>     sudo cp silesia.tar zstd_compress_test
>
> The time is reported by the time of the userland `cp`.
> The MB/s is computed with
>
>     211,988,480 B / time(buffer size, level)
>
> which includes the time to copy from userland.
> The Adjusted MB/s is computed with
>
>     211,988,480 B / (time(buffer size, level) - time(buffer size, none)).
>
> The memory reported is the amount of memory the compressor requests.
>
> | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
> |----------|-----------|----------|-------|---------|----------|----------|
> | none     | 211988480 | 0.100    | 1     | 2119.88 | -        | -        |
> | zstd -1  | 73645762  | 1.044    | 2.878 | 203.05  | 224.56   | 1.23     |
> | zstd -3  | 66988878  | 1.761    | 3.165 | 120.38  | 127.63   | 2.47     |
> | zstd -5  | 65001259  | 2.563    | 3.261 | 82.71   | 86.07    | 2.86     |
> | zstd -10 | 60165346  | 13.242   | 3.523 | 16.01   | 16.13    | 13.22    |
> | zstd -15 | 58009756  | 47.601   | 3.654 | 4.45    | 4.46     | 21.61    |
> | zstd -19 | 54014593  | 102.835  | 3.925 | 2.06    | 2.06     | 60.15    |
> | zlib -1  | 77260026  | 2.895    | 2.744 | 73.23   | 75.85    | 0.27     |
> | zlib -3  | 72972206  | 4.116    | 2.905 | 51.50   | 52.79    | 0.27     |
> | zlib -6  | 68190360  | 9.633    | 3.109 | 22.01   | 22.24    | 0.27     |
> | zlib -9  | 67613382  | 22.554   | 3.135 | 9.40    | 9.44     | 0.27     |
>

These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the
data has to be divided into small chunks that can each be decompressed
independently.  That eliminates one of the primary advantages of Zstandard
(support for large dictionary sizes).

Eric
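As a sanity check on the table, the MB/s and Adjusted MB/s columns can be reproduced from the uncompressed size of silesia.tar (211,988,480 B) and the Time column; the "none" row's 0.100 s is the copy-only baseline that gets subtracted. A quick illustrative check:

```python
SRC_BYTES = 211_988_480  # uncompressed size of silesia.tar

def mb_per_s(time_s):
    """Throughput including the time to copy from userland."""
    return SRC_BYTES / time_s / 1e6

def adj_mb_per_s(time_s, none_time_s=0.100):
    """Throughput with the copy-only baseline (the 'none' row) subtracted."""
    return SRC_BYTES / (time_s - none_time_s) / 1e6

# zstd -1 and zlib -1 rows from the table above
assert round(mb_per_s(1.044), 2) == 203.05
assert round(adj_mb_per_s(1.044), 2) == 224.56
assert round(mb_per_s(2.895), 2) == 73.23
assert round(adj_mb_per_s(2.895), 2) == 75.85
```

The fact that these reproduce the table confirms the formulas use the 211,988,480 B input size rather than the 1.5 GB figure that appeared in the original benchmark text.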
* Re: [PATCH v5 2/5] lib: Add zstd modules
  2017-08-10  8:30 ` [PATCH v5 2/5] lib: Add zstd modules Eric Biggers
@ 2017-08-10 11:32   ` Austin S. Hemmelgarn
  2017-08-10 14:57     ` Austin S. Hemmelgarn
  2017-08-10 17:24     ` Eric Biggers
  2017-08-10 17:41   ` Chris Mason
  2017-08-10 19:16   ` Nick Terrell
  2 siblings, 2 replies; 21+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-10 11:32 UTC (permalink / raw)
To: Eric Biggers, Nick Terrell
Cc: Herbert Xu, kernel-team, squashfs-devel, linux-btrfs,
	linux-kernel, linux-crypto

On 2017-08-10 04:30, Eric Biggers wrote:
> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>
>> It can compress at speeds approaching lz4, and quality approaching lzma.
>
> Well, for a very loose definition of "approaching", and certainly not at
> the same time.  I doubt there's a use case for using the highest
> compression levels in kernel mode --- especially the ones using
> zstd_opt.h.
Large data-sets with WORM access patterns and infrequent writes
immediately come to mind as a use case for the highest compression level.

As a more specific example, the company I work for has a very large
amount of documentation, and we keep all old versions.  This is all
stored on a file server which is currently using BTRFS.  Once a document
is written, it's almost never rewritten, so write performance only
matters for the first write.  However, they're read back pretty
frequently, so we need good read performance.  As of right now, the
system is set to use LZO compression by default, and then when a new
document is added, the previous version of that document gets
re-compressed using zlib compression, which actually results in pretty
significant space savings most of the time.  I would absolutely love to
use zstd compression with this system with the highest compression
level, because most people don't care how long it takes to write the
file out, but they do care how long it takes to read a file (even if
it's an older version).
>
>>
>> The code was ported from the upstream zstd source repository.
>
> What version?
>
>> `linux/zstd.h` header was modified to match linux kernel style.
>> The cross-platform and allocation code was stripped out. Instead zstd
>> requires the caller to pass a preallocated workspace. The source files
>> were clang-formatted [1] to match the Linux Kernel style as much as
>> possible.
>
> It would be easier to compare to the upstream version if it was not all
> reformatted.  There is a chance that bugs were introduced by
> Linux-specific changes, and it would be nice if they could be easily
> reviewed.  (Also I don't know what clang-format settings you used, but
> there are still a lot of differences from the Linux coding style.)
>
>>
>> I benchmarked zstd compression as a special character device. I ran zstd
>> and zlib compression at several levels, as well as performing no
>> compression, which measures the time spent copying the data to kernel
>> space. Data is passed to the compressor 4096 B at a time. The benchmark
>> file is located in the upstream zstd source repository under
>> `contrib/linux-kernel/zstd_compress_test.c` [2].
>>
>> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
>> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7
>> processor, 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar`
>> [3], which is 211,988,480 B large. Run the following commands for the
>> benchmark:
>>
>>     sudo modprobe zstd_compress_test
>>     sudo mknod zstd_compress_test c 245 0
>>     sudo cp silesia.tar zstd_compress_test
>>
>> The time is reported by the time of the userland `cp`.
>> The MB/s is computed with
>>
>>     211,988,480 B / time(buffer size, level)
>>
>> which includes the time to copy from userland.
>> The Adjusted MB/s is computed with
>>
>>     211,988,480 B / (time(buffer size, level) - time(buffer size, none)).
>>
>> The memory reported is the amount of memory the compressor requests.
>>
>> | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
>> |----------|-----------|----------|-------|---------|----------|----------|
>> | none     | 211988480 | 0.100    | 1     | 2119.88 | -        | -        |
>> | zstd -1  | 73645762  | 1.044    | 2.878 | 203.05  | 224.56   | 1.23     |
>> | zstd -3  | 66988878  | 1.761    | 3.165 | 120.38  | 127.63   | 2.47     |
>> | zstd -5  | 65001259  | 2.563    | 3.261 | 82.71   | 86.07    | 2.86     |
>> | zstd -10 | 60165346  | 13.242   | 3.523 | 16.01   | 16.13    | 13.22    |
>> | zstd -15 | 58009756  | 47.601   | 3.654 | 4.45    | 4.46     | 21.61    |
>> | zstd -19 | 54014593  | 102.835  | 3.925 | 2.06    | 2.06     | 60.15    |
>> | zlib -1  | 77260026  | 2.895    | 2.744 | 73.23   | 75.85    | 0.27     |
>> | zlib -3  | 72972206  | 4.116    | 2.905 | 51.50   | 52.79    | 0.27     |
>> | zlib -6  | 68190360  | 9.633    | 3.109 | 22.01   | 22.24    | 0.27     |
>> | zlib -9  | 67613382  | 22.554   | 3.135 | 9.40    | 9.44     | 0.27     |
>>
>
> These benchmarks are misleading because they compress the whole file as a
> single stream without resetting the dictionary, which isn't how data will
> typically be compressed in kernel mode.  With filesystem compression the
> data has to be divided into small chunks that can each be decompressed
> independently.  That eliminates one of the primary advantages of
> Zstandard (support for large dictionary sizes).
* Re: [PATCH v5 2/5] lib: Add zstd modules
  2017-08-10 11:32 ` Austin S. Hemmelgarn
@ 2017-08-10 14:57   ` Austin S. Hemmelgarn
  2017-08-10 17:36     ` Eric Biggers
  0 siblings, 1 reply; 21+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-10 14:57 UTC (permalink / raw)
To: Eric Biggers, Nick Terrell
Cc: Herbert Xu, kernel-team, squashfs-devel, linux-btrfs,
	linux-kernel, linux-crypto

On 2017-08-10 07:32, Austin S. Hemmelgarn wrote:
> On 2017-08-10 04:30, Eric Biggers wrote:
>> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>>
>>> It can compress at speeds approaching lz4, and quality approaching lzma.
>>
>> Well, for a very loose definition of "approaching", and certainly not
>> at the same time.  I doubt there's a use case for using the highest
>> compression levels in kernel mode --- especially the ones using
>> zstd_opt.h.
> Large data-sets with WORM access patterns and infrequent writes
> immediately come to mind as a use case for the highest compression level.
>
> As a more specific example, the company I work for has a very large
> amount of documentation, and we keep all old versions.  This is all
> stored on a file server which is currently using BTRFS.  Once a document
> is written, it's almost never rewritten, so write performance only
> matters for the first write.  However, they're read back pretty
> frequently, so we need good read performance.  As of right now, the
> system is set to use LZO compression by default, and then when a new
> document is added, the previous version of that document gets
> re-compressed using zlib compression, which actually results in pretty
> significant space savings most of the time.  I would absolutely love to
> use zstd compression with this system with the highest compression
> level, because most people don't care how long it takes to write the
> file out, but they do care how long it takes to read a file (even if
> it's an older version).
Also didn't think to mention this, but I could see the max level being
very popular for use with SquashFS root filesystems used in LiveCDs.
Currently, they have to decide between read performance and image size,
while zstd would provide both.
* Re: [PATCH v5 2/5] lib: Add zstd modules
  2017-08-10 14:57 ` Austin S. Hemmelgarn
@ 2017-08-10 17:36   ` Eric Biggers
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Biggers @ 2017-08-10 17:36 UTC (permalink / raw)
To: Austin S. Hemmelgarn
Cc: Nick Terrell, Herbert Xu, kernel-team, squashfs-devel,
	linux-btrfs, linux-kernel, linux-crypto

On Thu, Aug 10, 2017 at 10:57:01AM -0400, Austin S. Hemmelgarn wrote:
> Also didn't think to mention this, but I could see the max level
> being very popular for use with SquashFS root filesystems used in
> LiveCDs.  Currently, they have to decide between read performance
> and image size, while zstd would provide both.

The high compression levels of Zstandard are indeed a great fit for
SquashFS, but SquashFS images are created in userspace by squashfs-tools.
The kernel only needs to be able to decompress them.

(Also, while Zstandard provides very good tradeoffs and will probably
become the preferred algorithm for SquashFS, it's misleading to imply that
users won't have to make decisions anymore.  It does not compress as well
as XZ or decompress as fast as LZ4, except maybe in very carefully crafted
benchmarks.)

Eric
* Re: [PATCH v5 2/5] lib: Add zstd modules
  2017-08-10 11:32 ` Austin S. Hemmelgarn
  2017-08-10 14:57   ` Austin S. Hemmelgarn
@ 2017-08-10 17:24   ` Eric Biggers
  2017-08-10 17:47     ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Biggers @ 2017-08-10 17:24 UTC (permalink / raw)
To: Austin S. Hemmelgarn
Cc: Nick Terrell, Herbert Xu, kernel-team, squashfs-devel,
	linux-btrfs, linux-kernel, linux-crypto

On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-08-10 04:30, Eric Biggers wrote:
>> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>>
>>> It can compress at speeds approaching lz4, and quality approaching lzma.
>>
>> Well, for a very loose definition of "approaching", and certainly not at
>> the same time.  I doubt there's a use case for using the highest
>> compression levels in kernel mode --- especially the ones using
>> zstd_opt.h.
> Large data-sets with WORM access patterns and infrequent writes
> immediately come to mind as a use case for the highest compression
> level.
>
> As a more specific example, the company I work for has a very large
> amount of documentation, and we keep all old versions.  This is all
> stored on a file server which is currently using BTRFS.  Once a
> document is written, it's almost never rewritten, so write
> performance only matters for the first write.  However, they're read
> back pretty frequently, so we need good read performance.  As of
> right now, the system is set to use LZO compression by default, and
> then when a new document is added, the previous version of that
> document gets re-compressed using zlib compression, which actually
> results in pretty significant space savings most of the time.  I
> would absolutely love to use zstd compression with this system with
> the highest compression level, because most people don't care how
> long it takes to write the file out, but they do care how long it
> takes to read a file (even if it's an older version).

This may be a reasonable use case, but note this cannot just be the
regular "zstd" compression setting, since filesystem compression by
default must provide reasonable performance for many different access
patterns.  See the patch in this series which actually adds zstd
compression to btrfs; it only uses level 1.  I do not see a patch which
adds a higher compression mode.  It would need to be a special setting
like "zstdhc" that users could opt-in to on specific directories.  It
also would need to be compared to simply compressing in userspace.  In
many cases compressing in userspace is probably the better solution for
the use case in question because it works on any filesystem, allows using
any compression algorithm, and if random access is not needed it is
possible to compress each file as a single stream (like a .xz file),
which produces a much better compression ratio than the block-by-block
compression that filesystems have to use.

Note also that LZ4HC is in the kernel source tree currently but no one is
using it vs. the regular LZ4.  I think it is the kind of thing that
sounded useful originally, but at the end of the day no one really wants
to use it in kernel mode.  I'd certainly be interested in actual patches,
though.

Eric
* Re: [PATCH v5 2/5] lib: Add zstd modules
  2017-08-10 17:24 ` Eric Biggers
@ 2017-08-10 17:47   ` Austin S. Hemmelgarn
  2017-08-10 19:24     ` Nick Terrell
  0 siblings, 1 reply; 21+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-10 17:47 UTC (permalink / raw)
To: Eric Biggers
Cc: Nick Terrell, Herbert Xu, kernel-team, squashfs-devel,
	linux-btrfs, linux-kernel, linux-crypto

On 2017-08-10 13:24, Eric Biggers wrote:
> On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:
>> On 2017-08-10 04:30, Eric Biggers wrote:
>>> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>>>
>>>> It can compress at speeds approaching lz4, and quality approaching
>>>> lzma.
>>>
>>> Well, for a very loose definition of "approaching", and certainly not
>>> at the same time.  I doubt there's a use case for using the highest
>>> compression levels in kernel mode --- especially the ones using
>>> zstd_opt.h.
>> Large data-sets with WORM access patterns and infrequent writes
>> immediately come to mind as a use case for the highest compression
>> level.
>>
>> As a more specific example, the company I work for has a very large
>> amount of documentation, and we keep all old versions.  This is all
>> stored on a file server which is currently using BTRFS.  Once a
>> document is written, it's almost never rewritten, so write
>> performance only matters for the first write.  However, they're read
>> back pretty frequently, so we need good read performance.  As of
>> right now, the system is set to use LZO compression by default, and
>> then when a new document is added, the previous version of that
>> document gets re-compressed using zlib compression, which actually
>> results in pretty significant space savings most of the time.  I
>> would absolutely love to use zstd compression with this system with
>> the highest compression level, because most people don't care how
>> long it takes to write the file out, but they do care how long it
>> takes to read a file (even if it's an older version).
>
> This may be a reasonable use case, but note this cannot just be the
> regular "zstd" compression setting, since filesystem compression by
> default must provide reasonable performance for many different access
> patterns.  See the patch in this series which actually adds zstd
> compression to btrfs; it only uses level 1.  I do not see a patch which
> adds a higher compression mode.  It would need to be a special setting
> like "zstdhc" that users could opt-in to on specific directories.  It
> also would need to be compared to simply compressing in userspace.  In
> many cases compressing in userspace is probably the better solution for
> the use case in question because it works on any filesystem, allows
> using any compression algorithm, and if random access is not needed it
> is possible to compress each file as a single stream (like a .xz file),
> which produces a much better compression ratio than the block-by-block
> compression that filesystems have to use.
There has been discussion, as well as (I think) initial patches merged,
for support of specifying the compression level for algorithms which
support multiple compression levels in BTRFS.  I was actually under the
impression that we had decided to use level 3 as the default for zstd,
but that apparently isn't the case, and with the benchmark issues, it
may not be once proper benchmarks are run.

Also, on the note of compressing in userspace, the use case I quoted at
least can't do that because we have to deal with Windows clients and
users have to be able to open files directly on said Windows clients.  I
entirely agree that real archival storage is better off using userspace
compression, but sometimes real archival storage isn't an option.
>
> Note also that LZ4HC is in the kernel source tree currently but no one
> is using it vs. the regular LZ4.  I think it is the kind of thing that
> sounded useful originally, but at the end of the day no one really
> wants to use it in kernel mode.  I'd certainly be interested in actual
> patches, though.
Part of that is the fact that BTRFS is one of the only consumers (AFAIK)
of this API that can freely choose all aspects of their usage, and the
consensus here (which, I might add, I don't agree with) amounts to the
argument that 'we already have <X> compression with a <Y> compression
ratio, we don't need more things like that'.  I would personally love to
see LZ4HC support in BTRFS (based on testing my own use cases, LZ4 is
more deterministic than LZO for both compression and decompression, and
most of the non-archival usage I have of BTRFS benefits from
determinism), but there's not any point in me writing up such a patch
because it's almost certain to get rejected because BTRFS already has
LZO.  The main reason that zstd is getting considered at all is that the
quoted benchmarks show clear benefits in decompression speed relative to
zlib and far better compression ratios than LZO.
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-10 17:47 ` Austin S. Hemmelgarn @ 2017-08-10 19:24 ` Nick Terrell 0 siblings, 0 replies; 21+ messages in thread From: Nick Terrell @ 2017-08-10 19:24 UTC (permalink / raw) To: Austin S. Hemmelgarn, Eric Biggers Cc: Herbert Xu, Kernel Team, squashfs-devel@lists.sourceforge.net, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3529 bytes --] On 8/10/17, 10:48 AM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote: >On 2017-08-10 13:24, Eric Biggers wrote: >>On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote: >>>On 2017-08-10 04:30, Eric Biggers wrote: >>>>On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote: >>>>> >>>>> It can compress at speeds approaching lz4, and quality approaching lzma. >>>> >>>> Well, for a very loose definition of "approaching", and certainly not at the >>>> same time. I doubt there's a use case for using the highest compression levels >>>> in kernel mode --- especially the ones using zstd_opt.h. >>> Large data-sets with WORM access patterns and infrequent writes >>> immediately come to mind as a use case for the highest compression >>> level. >>> >>> As a more specific example, the company I work for has a very large >>> amount of documentation, and we keep all old versions. This is all >>> stored on a file server which is currently using BTRFS. Once a >>> document is written, it's almost never rewritten, so write >>> performance only matters for the first write. However, they're read >>> back pretty frequently, so we need good read performance. 
As of >>> right now, the system is set to use LZO compression by default, and >>> then when a new document is added, the previous version of that >>> document gets re-compressed using zlib compression, which actually >>> results in pretty significant space savings most of the time. I >>> would absolutely love to use zstd compression with this system with >>> the highest compression level, because most people don't care how >>> long it takes to write the file out, but they do care how long it >>> takes to read a file (even if it's an older version). >> >> This may be a reasonable use case, but note this cannot just be the regular >> "zstd" compression setting, since filesystem compression by default must provide >> reasonable performance for many different access patterns. See the patch in >> this series which actually adds zstd compression to btrfs; it only uses level 1. >> I do not see a patch which adds a higher compression mode. It would need to be >> a special setting like "zstdhc" that users could opt-in to on specific >> directories. It also would need to be compared to simply compressing in >> userspace. In many cases compressing in userspace is probably the better >> solution for the use case in question because it works on any filesystem, allows >> using any compression algorithm, and if random access is not needed it is >> possible to compress each file as a single stream (like a .xz file), which >> produces a much better compression ratio than the block-by-block compression >> that filesystems have to use. > There has been discussion as well as (I think) initial patches merged > for support of specifying the compression level for algorithms which > support multiple compression levels in BTRFS. I was actually under the > impression that we had decided to use level 3 as the default for zstd, > but that apparently isn't the case, and with the benchmark issues, it > may not be once proper benchmarks are run. 
There are some initial patches to add compression levels to BtrFS [1]. Once it's ready, we can add compression levels to zstd. The default compression level in the current patch is 3.

[1] https://lkml.kernel.org/r/20170724172939.24527-1-dsterba@suse.com

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/5] lib: Add zstd modules
  2017-08-10 8:30 ` [PATCH v5 2/5] lib: Add zstd modules Eric Biggers
  2017-08-10 11:32 ` Austin S. Hemmelgarn
@ 2017-08-10 17:41 ` Chris Mason
  2017-08-10 19:00 ` Eric Biggers
  2017-08-10 19:25 ` Hugo Mills
  2017-08-10 19:16 ` Nick Terrell
  2 siblings, 2 replies; 21+ messages in thread

From: Chris Mason @ 2017-08-10 17:41 UTC (permalink / raw)
  To: Eric Biggers, Nick Terrell
  Cc: Herbert Xu, kernel-team, squashfs-devel, linux-btrfs,
      linux-kernel, linux-crypto

On 08/10/2017 04:30 AM, Eric Biggers wrote:
> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:

>> The memory reported is the amount of memory the compressor requests.
>>
>> | Method   | Size (B) | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
>> |----------|----------|----------|-------|---------|----------|----------|
>> | none     | 11988480 | 0.100    | 1     | 2119.88 | -        | -        |
>> | zstd -1  | 73645762 | 1.044    | 2.878 | 203.05  | 224.56   | 1.23     |
>> | zstd -3  | 66988878 | 1.761    | 3.165 | 120.38  | 127.63   | 2.47     |
>> | zstd -5  | 65001259 | 2.563    | 3.261 | 82.71   | 86.07    | 2.86     |
>> | zstd -10 | 60165346 | 13.242   | 3.523 | 16.01   | 16.13    | 13.22    |
>> | zstd -15 | 58009756 | 47.601   | 3.654 | 4.45    | 4.46     | 21.61    |
>> | zstd -19 | 54014593 | 102.835  | 3.925 | 2.06    | 2.06     | 60.15    |
>> | zlib -1  | 77260026 | 2.895    | 2.744 | 73.23   | 75.85    | 0.27     |
>> | zlib -3  | 72972206 | 4.116    | 2.905 | 51.50   | 52.79    | 0.27     |
>> | zlib -6  | 68190360 | 9.633    | 3.109 | 22.01   | 22.24    | 0.27     |
>> | zlib -9  | 67613382 | 22.554   | 3.135 | 9.40    | 9.44     | 0.27     |
>>
>
> Theses benchmarks are misleading because they compress the whole file as a
> single stream without resetting the dictionary, which isn't how data will
> typically be compressed in kernel mode. With filesystem compression the data
> has to be divided into small chunks that can each be decompressed independently.
> That eliminates one of the primary advantages of Zstandard (support for large
> dictionary sizes).
I did btrfs benchmarks of kernel trees and other normal data sets as well. The numbers were in line with what Nick is posting here. zstd is a big win over both lzo and zlib from a btrfs point of view. It's true Nick's patches only support a single compression level in btrfs, but that's because btrfs doesn't have a way to pass in the compression ratio. It could easily be a mount option, it was just outside the scope of Nick's initial work. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-10 17:41 ` Chris Mason @ 2017-08-10 19:00 ` Eric Biggers 2017-08-10 19:07 ` Chris Mason 2017-08-10 19:25 ` Hugo Mills 1 sibling, 1 reply; 21+ messages in thread From: Eric Biggers @ 2017-08-10 19:00 UTC (permalink / raw) To: Chris Mason Cc: Nick Terrell, Herbert Xu, kernel-team, squashfs-devel, linux-btrfs, linux-kernel, linux-crypto On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote: > On 08/10/2017 04:30 AM, Eric Biggers wrote: > >On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote: > > >>The memory reported is the amount of memory the compressor requests. > >> > >>| Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) | > >>|----------|----------|----------|-------|---------|----------|----------| > >>| none | 11988480 | 0.100 | 1 | 2119.88 | - | - | > >>| zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 | > >>| zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 | > >>| zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 | > >>| zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 | > >>| zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 | > >>| zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 | > >>| zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 | > >>| zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 | > >>| zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 | > >>| zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 | > >> > > > >Theses benchmarks are misleading because they compress the whole file as a > >single stream without resetting the dictionary, which isn't how data will > >typically be compressed in kernel mode. With filesystem compression the data > >has to be divided into small chunks that can each be decompressed independently. > >That eliminates one of the primary advantages of Zstandard (support for large > >dictionary sizes). 
> > I did btrfs benchmarks of kernel trees and other normal data sets as > well. The numbers were in line with what Nick is posting here. > zstd is a big win over both lzo and zlib from a btrfs point of view. > > It's true Nick's patches only support a single compression level in > btrfs, but that's because btrfs doesn't have a way to pass in the > compression ratio. It could easily be a mount option, it was just > outside the scope of Nick's initial work. > I am not surprised --- Zstandard is closer to the state of the art, both format-wise and implementation-wise, than the other choices in BTRFS. My point is that benchmarks need to account for how much data is compressed at a time. This is a common mistake when comparing different compression algorithms; the algorithm name and compression level do not tell the whole story. The dictionary size is extremely significant. No one is going to compress or decompress a 200 MB file as a single stream in kernel mode, so it does not make sense to justify adding Zstandard *to the kernel* based on such a benchmark. It is going to be divided into chunks. How big are the chunks in BTRFS? I thought that it compressed only one page (4 KiB) at a time, but I hope that has been, or is being, improved; 32 KiB - 128 KiB should be a better amount. (And if the amount of data compressed at a time happens to be different between the different algorithms, note that BTRFS benchmarks are likely to be measuring that as much as the algorithms themselves.) Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
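Eric's dictionary-size point can be sketched in a few lines of userspace Python; stdlib zlib stands in for the kernel compressors here (an assumption, since the thread benchmarks the kernel modules), and the data is synthetic:

```python
import zlib

# Sketch: independently decompressible chunks reset the history window at
# every chunk boundary, so the compression ratio degrades as chunks shrink.
data = b"".join(b"record %06d: mostly repetitive filesystem payload\n" % i
                for i in range(20000))

def chunked_size(buf, chunk, level=6):
    # compress each chunk as its own stream, as filesystem compression requires
    return sum(len(zlib.compress(buf[i:i + chunk], level))
               for i in range(0, len(buf), chunk))

whole = len(zlib.compress(data, 6))     # one stream, uninterrupted history
c128k = chunked_size(data, 128 * 1024)  # btrfs-sized chunks
c4k = chunked_size(data, 4 * 1024)      # page-sized chunks

assert whole <= c128k < c4k  # smaller chunks -> worse ratio
```

The effect is much larger for zstd than for zlib, since zstd's window can exceed zlib's 32 KiB and therefore loses more to small chunks.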
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-10 19:00 ` Eric Biggers @ 2017-08-10 19:07 ` Chris Mason 0 siblings, 0 replies; 21+ messages in thread From: Chris Mason @ 2017-08-10 19:07 UTC (permalink / raw) To: Eric Biggers Cc: Nick Terrell, Herbert Xu, kernel-team, squashfs-devel, linux-btrfs, linux-kernel, linux-crypto On 08/10/2017 03:00 PM, Eric Biggers wrote: > On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote: >> On 08/10/2017 04:30 AM, Eric Biggers wrote: >>> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote: >> >>>> The memory reported is the amount of memory the compressor requests. >>>> >>>> | Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) | >>>> |----------|----------|----------|-------|---------|----------|----------| >>>> | none | 11988480 | 0.100 | 1 | 2119.88 | - | - | >>>> | zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 | >>>> | zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 | >>>> | zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 | >>>> | zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 | >>>> | zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 | >>>> | zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 | >>>> | zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 | >>>> | zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 | >>>> | zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 | >>>> | zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 | >>>> >>> >>> Theses benchmarks are misleading because they compress the whole file as a >>> single stream without resetting the dictionary, which isn't how data will >>> typically be compressed in kernel mode. With filesystem compression the data >>> has to be divided into small chunks that can each be decompressed independently. >>> That eliminates one of the primary advantages of Zstandard (support for large >>> dictionary sizes). 
>> >> I did btrfs benchmarks of kernel trees and other normal data sets as >> well. The numbers were in line with what Nick is posting here. >> zstd is a big win over both lzo and zlib from a btrfs point of view. >> >> It's true Nick's patches only support a single compression level in >> btrfs, but that's because btrfs doesn't have a way to pass in the >> compression ratio. It could easily be a mount option, it was just >> outside the scope of Nick's initial work. >> > > I am not surprised --- Zstandard is closer to the state of the art, both > format-wise and implementation-wise, than the other choices in BTRFS. My point > is that benchmarks need to account for how much data is compressed at a time. > This is a common mistake when comparing different compression algorithms; the > algorithm name and compression level do not tell the whole story. The > dictionary size is extremely significant. No one is going to compress or > decompress a 200 MB file as a single stream in kernel mode, so it does not make > sense to justify adding Zstandard *to the kernel* based on such a benchmark. It > is going to be divided into chunks. How big are the chunks in BTRFS? I thought > that it compressed only one page (4 KiB) at a time, but I hope that has been, or > is being, improved; 32 KiB - 128 KiB should be a better amount. (And if the > amount of data compressed at a time happens to be different between the > different algorithms, note that BTRFS benchmarks are likely to be measuring that > as much as the algorithms themselves.) Btrfs hooks the compression code into the delayed allocation mechanism we use to gather large extents for COW. So if you write 100MB to a file, we'll have 100MB to compress at a time (within the limits of the amount of pages we allow to collect before forcing it down). But we want to balance how much memory you might need to uncompress during random reads. So we have an artificial limit of 128KB that we send at a time to the compression code. 
It's easy to change this, it's just a tradeoff made to limit the cost of reading small bits. It's the same for zlib,lzo and the new zstd patch. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
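The tradeoff Chris describes can be put in numbers with a small sketch (illustrative arithmetic only; 128 KiB is the per-unit cap mentioned above, and the 100 MB write is his example, taken here as MiB for clean division):

```python
# btrfs hands the compressor at most 128 KiB at a time, so a random read
# only decompresses the unit(s) it touches, not the whole delalloc extent.
UNIT = 128 * 1024

def units_for_extent(extent_len, unit=UNIT):
    # number of independently decompressible units the extent is split into
    return -(-extent_len // unit)  # ceiling division

def bytes_decompressed(offset, length, unit=UNIT):
    # a read must decompress every unit it overlaps
    first = offset // unit
    last = (offset + length - 1) // unit
    return (last - first + 1) * unit

extent = 100 * 1024 * 1024  # the 100 MB write from the example
assert units_for_extent(extent) == 800
# a random 4 KiB read costs one 128 KiB unit, not the whole extent:
assert bytes_decompressed(5 * 1024 * 1024, 4096) == UNIT
```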
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-10 17:41 ` Chris Mason 2017-08-10 19:00 ` Eric Biggers @ 2017-08-10 19:25 ` Hugo Mills 2017-08-10 19:54 ` Austin S. Hemmelgarn 2017-08-11 13:20 ` Chris Mason 1 sibling, 2 replies; 21+ messages in thread From: Hugo Mills @ 2017-08-10 19:25 UTC (permalink / raw) To: Chris Mason Cc: Eric Biggers, Nick Terrell, Herbert Xu, kernel-team, squashfs-devel, linux-btrfs, linux-kernel, linux-crypto [-- Attachment #1: Type: text/plain, Size: 1886 bytes --] On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote: > On 08/10/2017 04:30 AM, Eric Biggers wrote: > > > >Theses benchmarks are misleading because they compress the whole file as a > >single stream without resetting the dictionary, which isn't how data will > >typically be compressed in kernel mode. With filesystem compression the data > >has to be divided into small chunks that can each be decompressed independently. > >That eliminates one of the primary advantages of Zstandard (support for large > >dictionary sizes). > > I did btrfs benchmarks of kernel trees and other normal data sets as > well. The numbers were in line with what Nick is posting here. > zstd is a big win over both lzo and zlib from a btrfs point of view. > > It's true Nick's patches only support a single compression level in > btrfs, but that's because btrfs doesn't have a way to pass in the > compression ratio. It could easily be a mount option, it was just > outside the scope of Nick's initial work. Could we please not add more mount options? I get that they're easy to implement, but it's a very blunt instrument. What we tend to see (with both nodatacow and compress) is people using the mount options, then asking for exceptions, discovering that they can't do that, and then falling back to doing it with attributes or btrfs properties. Could we just start with btrfs properties this time round, and cut out the mount option part of this cycle. 
In the long run, it'd be great to see most of the btrfs-specific mount options get deprecated and ultimately removed entirely, in favour of attributes/properties, where feasible. Hugo. -- Hugo Mills | Klytus! Are your men on the right pills? Maybe you hugo@... carfax.org.uk | should execute their trainer! http://carfax.org.uk/ | PGP: E2AB1DE4 | Ming the Merciless, Flash Gordon [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-10 19:25 ` Hugo Mills @ 2017-08-10 19:54 ` Austin S. Hemmelgarn 2017-08-11 13:20 ` Chris Mason 1 sibling, 0 replies; 21+ messages in thread From: Austin S. Hemmelgarn @ 2017-08-10 19:54 UTC (permalink / raw) To: Hugo Mills, Chris Mason, Eric Biggers, Nick Terrell, Herbert Xu, kernel-team, squashfs-devel, linux-btrfs, linux-kernel, linux-crypto On 2017-08-10 15:25, Hugo Mills wrote: > On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote: >> On 08/10/2017 04:30 AM, Eric Biggers wrote: >>> >>> Theses benchmarks are misleading because they compress the whole file as a >>> single stream without resetting the dictionary, which isn't how data will >>> typically be compressed in kernel mode. With filesystem compression the data >>> has to be divided into small chunks that can each be decompressed independently. >>> That eliminates one of the primary advantages of Zstandard (support for large >>> dictionary sizes). >> >> I did btrfs benchmarks of kernel trees and other normal data sets as >> well. The numbers were in line with what Nick is posting here. >> zstd is a big win over both lzo and zlib from a btrfs point of view. >> >> It's true Nick's patches only support a single compression level in >> btrfs, but that's because btrfs doesn't have a way to pass in the >> compression ratio. It could easily be a mount option, it was just >> outside the scope of Nick's initial work. > > Could we please not add more mount options? I get that they're easy > to implement, but it's a very blunt instrument. What we tend to see > (with both nodatacow and compress) is people using the mount options, > then asking for exceptions, discovering that they can't do that, and > then falling back to doing it with attributes or btrfs properties. > Could we just start with btrfs properties this time round, and cut out > the mount option part of this cycle. 
AFAIUI, the intent is to extend the compression type specification for both the mount options and the property, not to add a new mount option. I think we all agree that `mount -o compress=zstd3` is a lot better than `mount -o compress=zstd,compresslevel=3`. > > In the long run, it'd be great to see most of the btrfs-specific > mount options get deprecated and ultimately removed entirely, in > favour of attributes/properties, where feasible. Are properties set on the root subvolume inherited properly? Because unless they are, we can't get the same semantics. Two other counter arguments on completely removing BTRFS-specific mount options: 1. It's a lot easier and a lot more clearly defined to change things that affect global behavior of the FS by a remount than having to iterate everything in the FS to update properties. If I'm disabling autodefrag, I'd much rather just `mount -o remount,noautodefrag` than `find / -xdev -exec btrfs property set \{\} autodefrag false`, as the first will take effect for everything simultaneously and run exponentially quicker. 2. There are some things that don't make sense as per-object settings or are otherwise nonsensical on objects. Many, but not all, of the BTRFS specific mount options fall into this category IMO, with the notable exception of compress[-force], [no]autodefrag, [no]datacow, and [no]datasum. Some other options do make sense as properties of the filesystem (commit, flushoncommit, {inode,space}_cache, max_inline, metadata_ratio, [no]ssd, and [no]treelog are such options), but many are one-off options that affect behavior on mount (like skip_balance, clear_cache, nologreplay, norecovery, usebbackuproot, and subvol). ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-10 19:25 ` Hugo Mills 2017-08-10 19:54 ` Austin S. Hemmelgarn @ 2017-08-11 13:20 ` Chris Mason 2017-08-14 13:30 ` David Sterba 1 sibling, 1 reply; 21+ messages in thread From: Chris Mason @ 2017-08-11 13:20 UTC (permalink / raw) To: Hugo Mills, Eric Biggers, Nick Terrell, Herbert Xu, kernel-team, squashfs-devel, linux-btrfs, linux-kernel, linux-crypto On 08/10/2017 03:25 PM, Hugo Mills wrote: > On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote: >> On 08/10/2017 04:30 AM, Eric Biggers wrote: >>> >>> Theses benchmarks are misleading because they compress the whole file as a >>> single stream without resetting the dictionary, which isn't how data will >>> typically be compressed in kernel mode. With filesystem compression the data >>> has to be divided into small chunks that can each be decompressed independently. >>> That eliminates one of the primary advantages of Zstandard (support for large >>> dictionary sizes). >> >> I did btrfs benchmarks of kernel trees and other normal data sets as >> well. The numbers were in line with what Nick is posting here. >> zstd is a big win over both lzo and zlib from a btrfs point of view. >> >> It's true Nick's patches only support a single compression level in >> btrfs, but that's because btrfs doesn't have a way to pass in the >> compression ratio. It could easily be a mount option, it was just >> outside the scope of Nick's initial work. > > Could we please not add more mount options? I get that they're easy > to implement, but it's a very blunt instrument. What we tend to see > (with both nodatacow and compress) is people using the mount options, > then asking for exceptions, discovering that they can't do that, and > then falling back to doing it with attributes or btrfs properties. > Could we just start with btrfs properties this time round, and cut out > the mount option part of this cycle. 
> > In the long run, it'd be great to see most of the btrfs-specific > mount options get deprecated and ultimately removed entirely, in > favour of attributes/properties, where feasible. > It's a good point, and as was commented later down I'd just do mount -o compress=zstd:3 or something. But I do prefer properties in general for this. My big point was just that next step is outside of Nick's scope. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-11 13:20 ` Chris Mason @ 2017-08-14 13:30 ` David Sterba 0 siblings, 0 replies; 21+ messages in thread From: David Sterba @ 2017-08-14 13:30 UTC (permalink / raw) To: Chris Mason Cc: Hugo Mills, Eric Biggers, Nick Terrell, Herbert Xu, kernel-team, squashfs-devel, linux-btrfs, linux-kernel, linux-crypto On Fri, Aug 11, 2017 at 09:20:10AM -0400, Chris Mason wrote: > > > On 08/10/2017 03:25 PM, Hugo Mills wrote: > > On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote: > >> On 08/10/2017 04:30 AM, Eric Biggers wrote: > >>> > >>> Theses benchmarks are misleading because they compress the whole file as a > >>> single stream without resetting the dictionary, which isn't how data will > >>> typically be compressed in kernel mode. With filesystem compression the data > >>> has to be divided into small chunks that can each be decompressed independently. > >>> That eliminates one of the primary advantages of Zstandard (support for large > >>> dictionary sizes). > >> > >> I did btrfs benchmarks of kernel trees and other normal data sets as > >> well. The numbers were in line with what Nick is posting here. > >> zstd is a big win over both lzo and zlib from a btrfs point of view. > >> > >> It's true Nick's patches only support a single compression level in > >> btrfs, but that's because btrfs doesn't have a way to pass in the > >> compression ratio. It could easily be a mount option, it was just > >> outside the scope of Nick's initial work. > > > > Could we please not add more mount options? I get that they're easy > > to implement, but it's a very blunt instrument. What we tend to see > > (with both nodatacow and compress) is people using the mount options, > > then asking for exceptions, discovering that they can't do that, and > > then falling back to doing it with attributes or btrfs properties. > > Could we just start with btrfs properties this time round, and cut out > > the mount option part of this cycle. 
> > > > In the long run, it'd be great to see most of the btrfs-specific > > mount options get deprecated and ultimately removed entirely, in > > favour of attributes/properties, where feasible. > > > > It's a good point, and as was commented later down I'd just do mount -o > compress=zstd:3 or something. We've solved that already, here's a proposed patch that extends current mount options, https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg66248.html and a patch that could be backported to older kernels so the new mount options do not break mounts on older kernels with the new syntax https://patchwork.kernel.org/patch/9845697/ ^ permalink raw reply [flat|nested] 21+ messages in thread
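The `compress=zstd:3` syntax under discussion amounts to splitting the option value at a colon; a userspace sketch (the real parsing lives in the btrfs patches David links, and the names here are illustrative):

```python
def parse_compress_opt(value):
    # Split "algo:level" as in the proposed compress=zstd:3 mount syntax.
    algo, sep, level = value.partition(":")
    # no explicit level: leave it None so the filesystem default applies
    return (algo, int(level) if sep else None)

assert parse_compress_opt("zstd:3") == ("zstd", 3)
assert parse_compress_opt("zstd") == ("zstd", None)
assert parse_compress_opt("zlib:9") == ("zlib", 9)
```

Keeping the level inside the existing `compress=` value, rather than adding a second option, is also what lets older kernels reject the whole value cleanly.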
* Re: [PATCH v5 2/5] lib: Add zstd modules 2017-08-10 8:30 ` [PATCH v5 2/5] lib: Add zstd modules Eric Biggers 2017-08-10 11:32 ` Austin S. Hemmelgarn 2017-08-10 17:41 ` Chris Mason @ 2017-08-10 19:16 ` Nick Terrell 2 siblings, 0 replies; 21+ messages in thread From: Nick Terrell @ 2017-08-10 19:16 UTC (permalink / raw) To: Eric Biggers Cc: Herbert Xu, Kernel Team, squashfs-devel@lists.sourceforge.net, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 6015 bytes --] On 8/10/17, 1:30 AM, "Eric Biggers" <ebiggers3@gmail.com> wrote: > On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote: >> >> It can compress at speeds approaching lz4, and quality approaching lzma. > > Well, for a very loose definition of "approaching", and certainly not at the > same time. I doubt there's a use case for using the highest compression levels > in kernel mode --- especially the ones using zstd_opt.h. > >> >> The code was ported from the upstream zstd source repository. > > What version? zstd-1.1.4 with patches applied from upstream. I'll include it in the next patch version. >> `linux/zstd.h` header was modified to match linux kernel style. >> The cross-platform and allocation code was stripped out. Instead zstd >> requires the caller to pass a preallocated workspace. The source files >> were clang-formatted [1] to match the Linux Kernel style as much as >> possible. > > It would be easier to compare to the upstream version if it was not all > reformatted. There is a chance that bugs were introduced by Linux-specific > changes, and it would be nice if they could be easily reviewed. (Also I don't > know what clang-format settings you used, but there are still a lot of > differences from the Linux coding style.) The clang-format settings I used are available in the zstd repo [1]. 
I left the line length long, since it looked terrible otherwise. I set up a branch in my zstd GitHub fork called "original-formatted" [2]. I've taken the source I based the kernel patches off of [3] and ran clang-format without any other changes. If you have any suggestions to improve the clang-formatting please let me know. >> >> I benchmarked zstd compression as a special character device. I ran zstd >> and zlib compression at several levels, as well as performing no >> compression, which measure the time spent copying the data to kernel space. >> Data is passed to the compresser 4096 B at a time. The benchmark file is >> located in the upstream zstd source repository under >> `contrib/linux-kernel/zstd_compress_test.c` [2]. >> >> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. >> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, >> 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is >> 211,988,480 B large. Run the following commands for the benchmark: >> >> sudo modprobe zstd_compress_test >> sudo mknod zstd_compress_test c 245 0 >> sudo cp silesia.tar zstd_compress_test >> >> The time is reported by the time of the userland `cp`. >> The MB/s is computed with >> >> 1,536,217,008 B / time(buffer size, hash) >> >> which includes the time to copy from userland. >> The Adjusted MB/s is computed with >> >> 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)). >> >> The memory reported is the amount of memory the compressor requests. 
>>
>> | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
>> |----------|-----------|----------|-------|---------|----------|----------|
>> | none     | 211988480 | 0.100    | 1     | 2119.88 | -        | -        |
>> | zstd -1  | 73645762  | 1.044    | 2.878 | 203.05  | 224.56   | 1.23     |
>> | zstd -3  | 66988878  | 1.761    | 3.165 | 120.38  | 127.63   | 2.47     |
>> | zstd -5  | 65001259  | 2.563    | 3.261 | 82.71   | 86.07    | 2.86     |
>> | zstd -10 | 60165346  | 13.242   | 3.523 | 16.01   | 16.13    | 13.22    |
>> | zstd -15 | 58009756  | 47.601   | 3.654 | 4.45    | 4.46     | 21.61    |
>> | zstd -19 | 54014593  | 102.835  | 3.925 | 2.06    | 2.06     | 60.15    |
>> | zlib -1  | 77260026  | 2.895    | 2.744 | 73.23   | 75.85    | 0.27     |
>> | zlib -3  | 72972206  | 4.116    | 2.905 | 51.50   | 52.79    | 0.27     |
>> | zlib -6  | 68190360  | 9.633    | 3.109 | 22.01   | 22.24    | 0.27     |
>> | zlib -9  | 67613382  | 22.554   | 3.135 | 9.40    | 9.44     | 0.27     |
>>

> These benchmarks are misleading because they compress the whole file as a
> single stream without resetting the dictionary, which isn't how data will
> typically be compressed in kernel mode. With filesystem compression the
> data has to be divided into small chunks that can each be decompressed
> independently. That eliminates one of the primary advantages of Zstandard
> (support for large dictionary sizes).

This benchmark isn't meant to be representative of a filesystem scenario.
I wanted to show off zstd without anything else going on. Even in
filesystems where the data is chunked, zstd uses the whole chunk as the
window (128 KB in BtrFS and SquashFS by default), whereas zlib uses 32 KB.
I have benchmarks for BtrFS and SquashFS in their respective patches
[4][5], and I've copied the BtrFS table below (which was run with 2
threads).
| Method  | Ratio | Compression MB/s | Decompression MB/s |
|---------|-------|------------------|--------------------|
| None    | 0.99  | 504              | 686                |
| lzo     | 1.66  | 398              | 442                |
| zlib    | 2.58  | 65               | 241                |
| zstd 1  | 2.57  | 260              | 383                |
| zstd 3  | 2.71  | 174              | 408                |
| zstd 6  | 2.87  | 70               | 398                |
| zstd 9  | 2.92  | 43               | 406                |
| zstd 12 | 2.93  | 21               | 408                |
| zstd 15 | 3.01  | 11               | 354                |

>
> Eric

[1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/lib/zstd/.clang-format
[2] https://github.com/terrelln/zstd/tree/original-formatted/contrib/linux-kernel/original-formatted
[3] https://github.com/facebook/zstd/commit/b1c6bb87022404da56cc3015c85494c0ffcec520
[4] https://lkml.kernel.org/r/20170810023902.3231324-1-terrelln@fb.com
[5] https://lkml.kernel.org/r/20170810024236.3243941-1-terrelln@fb.com

^ permalink raw reply	[flat|nested] 21+ messages in thread
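[Editorial note: the MB/s and Adjusted MB/s columns in the character-device
table above can be reproduced from the compressed sizes and `cp` times. A
quick sketch in Python — not code from the thread — using the stated
silesia.tar size of 211,988,480 B and the zstd -1 row (1.044 s total,
0.100 s for the copy-only "none" run):]

```python
# Reproduce the throughput columns of the zstd/zlib benchmark table.
SRC_SIZE = 211_988_480  # silesia.tar, bytes

def mb_per_s(size_b, time_s):
    # Raw throughput, including the userland -> kernel copy time.
    return size_b / time_s / 1e6

def adjusted_mb_per_s(size_b, time_s, none_time_s):
    # Subtract the copy-only ("none") time to isolate compression cost.
    return size_b / (time_s - none_time_s) / 1e6

print(round(mb_per_s(SRC_SIZE, 0.100), 2))                  # "none" row
print(round(mb_per_s(SRC_SIZE, 1.044), 2))                  # zstd -1 MB/s
print(round(adjusted_mb_per_s(SRC_SIZE, 1.044, 0.100), 2))  # zstd -1 Adj MB/s
```

The three printed values match the table's 2119.88, 203.05, and 224.56,
which confirms the table was computed against the silesia.tar size.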
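[Editorial note: Eric's objection — that filesystems compress small,
independently decompressible chunks, resetting the dictionary at each
boundary — can be illustrated in userspace. This sketch uses stdlib zlib
rather than zstd (zstd is not assumed installed), with an arbitrary
repetitive input; it is illustrative only, not code from the patches:]

```python
import zlib

# Filesystem compression (e.g. BtrFS) splits data into chunks that must
# decompress independently, so matches cannot span a chunk boundary and
# each chunk pays its own stream header/trailer overhead.
CHUNK = 128 * 1024  # BtrFS/SquashFS default chunk size cited in the thread
data = b"the quick brown fox jumps over the lazy dog " * 50_000  # ~2.2 MB

# One continuous stream: the dictionary persists across the whole input.
whole = len(zlib.compress(data, 6))

# Independent chunks: the dictionary is reset at every 128 KiB boundary.
chunked = sum(len(zlib.compress(data[i:i + CHUNK], 6))
              for i in range(0, len(data), CHUNK))

# Chunking loses boundary-spanning matches and adds per-stream overhead,
# so the chunked total is at least as large as the single-stream size.
print(whole <= chunked)
```

The same effect is what Nick's reply addresses: even chunked, zstd's window
covers the whole 128 KB chunk, while zlib's window is capped at 32 KB.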
end of thread, other threads:[~2017-08-14 13:32 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-10 2:35 [PATCH v5 0/5] Add xxhash and zstd modules Nick Terrell
2017-08-10 2:35 ` [PATCH v5 1/5] lib: Add xxhash module Nick Terrell
2017-08-10 2:39 ` [PATCH v5 3/5] btrfs: Add zstd support Nick Terrell
2017-08-11 2:13 ` Adam Borowski
2017-08-11 3:23 ` Nick Terrell
2017-08-11 11:45 ` Austin S. Hemmelgarn
[not found] ` <20170810023553.3200875-3-terrelln@fb.com>
2017-08-10 8:30 ` [PATCH v5 2/5] lib: Add zstd modules Eric Biggers
2017-08-10 11:32 ` Austin S. Hemmelgarn
2017-08-10 14:57 ` Austin S. Hemmelgarn
2017-08-10 17:36 ` Eric Biggers
2017-08-10 17:24 ` Eric Biggers
2017-08-10 17:47 ` Austin S. Hemmelgarn
2017-08-10 19:24 ` Nick Terrell
2017-08-10 17:41 ` Chris Mason
2017-08-10 19:00 ` Eric Biggers
2017-08-10 19:07 ` Chris Mason
2017-08-10 19:25 ` Hugo Mills
2017-08-10 19:54 ` Austin S. Hemmelgarn
2017-08-11 13:20 ` Chris Mason
2017-08-14 13:30 ` David Sterba
2017-08-10 19:16 ` Nick Terrell