From: Omar Sandoval <osandov@osandov.com>
To: linux-btrfs@vger.kernel.org
Cc: kernel-team@fb.com, linux-fsdevel@vger.kernel.org,
    Al Viro <viro@zeniv.linux.org.uk>,
    Linus Torvalds <torvalds@linux-foundation.org>,
    linux-api@vger.kernel.org
Subject: Re: [PATCH v10 06/10] btrfs-progs: receive: encoded_write fallback to explicit decode and write
Date: Wed, 18 Aug 2021 11:07:43 -0700
Message-ID: <YR1Mb0i6Fk1LggJb@relinquished.localdomain>
In-Reply-To: <27ad30578c6e4347ff4161183c55ba6dee2e9227.1629234282.git.osandov@fb.com>

On Tue, Aug 17, 2021 at 02:06:52PM -0700, Omar Sandoval wrote:
> From: Boris Burkov <boris@bur.io>
>
> An encoded_write can fail if the file system it is being applied to does
> not support encoded writes or if it can't find enough contiguous space
> to accommodate the encoded extent. In those cases, we can likely still
> process an encoded_write by explicitly decoding the data and doing a
> normal write.
>
> Add the necessary fallback path for decoding data compressed with zlib,
> lzo, or zstd. zlib and zstd have reusable decoding context data
> structures which we cache in the receive context so that we don't have
> to recreate them on every encoded_write.
>
> Finally, add a command line flag for force-decompress which causes
> receive to always use the fallback path rather than first attempting the
> encoded write.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> Documentation/btrfs-receive.asciidoc | 4 +
> cmds/receive.c | 266 ++++++++++++++++++++++++++-
> 2 files changed, 261 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/btrfs-receive.asciidoc b/Documentation/btrfs-receive.asciidoc
> index e4c4d2c0..354a71dc 100644
> --- a/Documentation/btrfs-receive.asciidoc
> +++ b/Documentation/btrfs-receive.asciidoc
> @@ -60,6 +60,10 @@ By default the mountpoint is searched in '/proc/self/mounts'.
> If '/proc' is not accessible, eg. in a chroot environment, use this option to
> tell us where this filesystem is mounted.
>
> +--force-decompress::
> +if the stream contains compressed data (see '--compressed-data' in
> +`btrfs-send`(8)), always decompress it instead of writing it with encoded I/O.
> +
> --dump::
> dump the stream metadata, one line per operation
> +
> diff --git a/cmds/receive.c b/cmds/receive.c
> index b43c298f..7506f992 100644
> --- a/cmds/receive.c
> +++ b/cmds/receive.c
> @@ -40,6 +40,10 @@
> #include <sys/xattr.h>
> #include <uuid/uuid.h>
>
> +#include <lzo/lzo1x.h>
> +#include <zlib.h>
> +#include <zstd.h>
> +
> #include "kernel-shared/ctree.h"
> #include "ioctl.h"
> #include "cmds/commands.h"
> @@ -79,6 +83,12 @@ struct btrfs_receive
> struct subvol_uuid_search sus;
>
> int honor_end_cmd;
> +
> + int force_decompress;
> +
> + /* Reuse stream objects for encoded_write decompression fallback */
> + ZSTD_DStream *zstd_dstream;
> + z_stream *zlib_stream;
> };
>
> static int finish_subvol(struct btrfs_receive *rctx)
> @@ -989,9 +999,222 @@ static int process_update_extent(const char *path, u64 offset, u64 len,
> return 0;
> }
>
> +static int decompress_zlib(struct btrfs_receive *rctx, const char *encoded_data,
> + u64 encoded_len, char *unencoded_data,
> + u64 unencoded_len)
> +{
> + bool init = false;
> + int ret;
> +
> + if (!rctx->zlib_stream) {
> + init = true;
> + rctx->zlib_stream = malloc(sizeof(z_stream));
> + if (!rctx->zlib_stream) {
> + error("failed to allocate zlib stream %m");
> + return -ENOMEM;
> + }
> + }
> + rctx->zlib_stream->next_in = (void *)encoded_data;
> + rctx->zlib_stream->avail_in = encoded_len;
> + rctx->zlib_stream->next_out = (void *)unencoded_data;
> + rctx->zlib_stream->avail_out = unencoded_len;
> +
> + if (init) {
> + rctx->zlib_stream->zalloc = Z_NULL;
> + rctx->zlib_stream->zfree = Z_NULL;
> + rctx->zlib_stream->opaque = Z_NULL;
> + ret = inflateInit(rctx->zlib_stream);
> + } else {
> + ret = inflateReset(rctx->zlib_stream);
> + }
> + if (ret != Z_OK) {
> + error("zlib inflate init failed: %d", ret);
> + return -EIO;
> + }
> +
> + while (rctx->zlib_stream->avail_in > 0 &&
> + rctx->zlib_stream->avail_out > 0) {
> + ret = inflate(rctx->zlib_stream, Z_FINISH);
> + if (ret == Z_STREAM_END) {
> + break;
> + } else if (ret != Z_OK) {
> + error("zlib inflate failed: %d", ret);
> + return -EIO;
> + }
> + }
> + return 0;
> +}
> +
> +static int decompress_zstd(struct btrfs_receive *rctx, const char *encoded_buf,
> + u64 encoded_len, char *unencoded_buf,
> + u64 unencoded_len)
> +{
> + ZSTD_inBuffer in_buf = {
> + .src = encoded_buf,
> + .size = encoded_len
> + };
> + ZSTD_outBuffer out_buf = {
> + .dst = unencoded_buf,
> + .size = unencoded_len
> + };
> + size_t ret;
> +
> + if (!rctx->zstd_dstream) {
> + rctx->zstd_dstream = ZSTD_createDStream();
> + if (!rctx->zstd_dstream) {
> + error("failed to create zstd dstream");
> + return -ENOMEM;
> + }
> + }
> + ret = ZSTD_initDStream(rctx->zstd_dstream);
> + if (ZSTD_isError(ret)) {
> + error("failed to init zstd stream: %s", ZSTD_getErrorName(ret));
> + return -EIO;
> + }
> + while (in_buf.pos < in_buf.size && out_buf.pos < out_buf.size) {
> + ret = ZSTD_decompressStream(rctx->zstd_dstream, &out_buf, &in_buf);
> + if (ret == 0) {
> + break;
> + } else if (ZSTD_isError(ret)) {
> + error("failed to decompress zstd stream: %s",
> + ZSTD_getErrorName(ret));
> + return -EIO;
> + }
> + }
> + return 0;
> +}
> +
> +static int decompress_lzo(const char *encoded_data, u64 encoded_len,
> + char *unencoded_data, u64 unencoded_len,
> + unsigned int page_size)
> +{
> + uint32_t total_len;
> + size_t in_pos, out_pos;
> +
> + if (encoded_len < 4) {
> + error("lzo header is truncated");
> + return -EIO;
> + }
> + memcpy(&total_len, encoded_data, 4);
> + total_len = le32toh(total_len);
> + if (total_len > encoded_len) {
> + error("lzo header is invalid");
> + return -EIO;
> + }
> +
> + in_pos = 4;
> + out_pos = 0;
> + while (in_pos < total_len && out_pos < unencoded_len) {
> + size_t page_remaining;
> + uint32_t src_len;
> + lzo_uint dst_len;
> + int ret;
> +
> + page_remaining = -in_pos % page_size;
> + if (page_remaining < 4) {
> + if (total_len - in_pos <= page_remaining)
> + break;
> + in_pos += page_remaining;
> + }
> +
> + if (total_len - in_pos < 4) {
> + error("lzo segment header is truncated");
> + return -EIO;
> + }
> +
> + memcpy(&src_len, encoded_data + in_pos, 4);
> + src_len = le32toh(src_len);
> + in_pos += 4;
> + if (src_len > total_len - in_pos) {
> + error("lzo segment header is invalid");
> + return -EIO;
> + }
> +
> + dst_len = page_size;
> + ret = lzo1x_decompress_safe((void *)(encoded_data + in_pos),
> + src_len,
> + (void *)(unencoded_data + out_pos),
> + &dst_len, NULL);
> + if (ret != LZO_E_OK) {
> + error("lzo1x_decompress_safe failed: %d", ret);
> + return -EIO;
> + }
> +
> + in_pos += src_len;
> + out_pos += dst_len;
> + }
> + return 0;
> +}
> +
> +static int decompress_and_write(struct btrfs_receive *rctx,
> + const char *encoded_data, u64 offset,
> + u64 encoded_len, u64 unencoded_file_len,
> + u64 unencoded_len, u64 unencoded_offset,
> + u32 compression)
> +{
> + int ret = 0;
> + size_t pos;
> + ssize_t w;
> + char *unencoded_data;
> + int page_shift;
> +
> + unencoded_data = calloc(unencoded_len, 1);
> + if (!unencoded_data) {
> + error("allocating space for unencoded data failed: %m");
> + return -errno;
> + }
> +
> + switch (compression) {
> + case BTRFS_ENCODED_IO_COMPRESSION_ZLIB:
> + ret = decompress_zlib(rctx, encoded_data, encoded_len,
> + unencoded_data, unencoded_len);
> + if (ret)
> + goto out;
> + break;
> + case BTRFS_ENCODED_IO_COMPRESSION_ZSTD:
> + ret = decompress_zstd(rctx, encoded_data, encoded_len,
> + unencoded_data, unencoded_len);
> + if (ret)
> + goto out;
> + break;
> + case BTRFS_ENCODED_IO_COMPRESSION_LZO_4K:
> + case BTRFS_ENCODED_IO_COMPRESSION_LZO_8K:
> + case BTRFS_ENCODED_IO_COMPRESSION_LZO_16K:
> + case BTRFS_ENCODED_IO_COMPRESSION_LZO_32K:
> + case BTRFS_ENCODED_IO_COMPRESSION_LZO_64K:
> + page_shift = compression - BTRFS_ENCODED_IO_COMPRESSION_LZO_4K + 12;
> + ret = decompress_lzo(encoded_data, encoded_len, unencoded_data,
> + unencoded_len, 1U << page_shift);
> + if (ret)
> + goto out;
> + break;
> + default:
> + error("unknown compression: %d", compression);
> + ret = -EOPNOTSUPP;
> + goto out;
> + }
> +
> + pos = unencoded_offset;
> + while (pos < unencoded_file_len) {
> + w = pwrite(rctx->write_fd, unencoded_data + pos,
> + unencoded_file_len - pos, offset);
> + if (w < 0) {
> + ret = -errno;
> + error("writing unencoded data failed: %m");
> + goto out;
> + }
> + pos += w;
> + offset += w;
> + }
> +out:
> + free(unencoded_data);
> + return ret;
> +}
> +
> static int process_encoded_write(const char *path, const void *data, u64 offset,
> - u64 len, u64 unencoded_file_len, u64 unencoded_len,
> - u64 unencoded_offset, u32 compression, u32 encryption, void *user)
> + u64 len, u64 unencoded_file_len,
> + u64 unencoded_len, u64 unencoded_offset,
> + u32 compression, u32 encryption, void *user)
> {
> int ret;
> struct btrfs_receive *rctx = user;
> @@ -1007,6 +1230,7 @@ static int process_encoded_write(const char *path, const void *data, u64 offset,
> .compression = compression,
> .encryption = encryption,
> };
> + bool encoded_write = !rctx->force_decompress;
>
> if (encryption) {
> error("encoded_write: encryption not supported");
> @@ -1023,13 +1247,21 @@ static int process_encoded_write(const char *path, const void *data, u64 offset,
> if (ret < 0)
> return ret;
>
> - ret = ioctl(rctx->write_fd, BTRFS_IOC_ENCODED_WRITE, &encoded);
> - if (ret < 0) {
> - ret = -errno;
> - error("encoded_write: writing to %s failed: %m", path);
> - return ret;
> + if (encoded_write) {
> + ret = ioctl(rctx->write_fd, BTRFS_IOC_ENCODED_WRITE, &encoded);
> + if (ret >= 0)
> + return 0;
> + /* Fall back for these errors, fail hard for anything else. */
> + if (errno != ENOSPC && errno != EOPNOTSUPP && errno != EINVAL) {

Just caught something that I missed in the conversion: this needs to be
ENOTTY instead of EOPNOTSUPP.
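
To make the intended check concrete, here is a minimal sketch of the
fallback decision with that change applied. The helper name
encoded_write_should_fall_back() is hypothetical and not part of the
patch. The errno list is taken from the quoted check with EOPNOTSUPP
swapped for ENOTTY, which is what ioctl() typically reports when the
kernel or filesystem does not implement the request. The per-errno
comments are my reading of the commit message, not something the patch
states:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: decide whether a failed
 * BTRFS_IOC_ENCODED_WRITE should fall back to decompress-and-write.
 * Anything not listed here is treated as a hard failure.
 */
static bool encoded_write_should_fall_back(int err)
{
	switch (err) {
	case ENOSPC:	/* assumed: no contiguous space for the encoded extent */
	case ENOTTY:	/* ioctl not implemented by this kernel or filesystem */
	case EINVAL:	/* assumed: kernel rejected the encoded write arguments */
		return true;
	default:
		return false;
	}
}
```

With something like this, the quoted condition could be written as
"if (!encoded_write_should_fall_back(errno))" before failing hard.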