From: Gao Xiang <xiang@kernel.org>
To: Nithurshen <nithurshen.dev@gmail.com>
Cc: linux-erofs@lists.ozlabs.org, hsiangkao@linux.alibaba.com,
xiang@kernel.org
Subject: Re: [PATCH 1/2] fsck.erofs: introduce multi-threaded decompression with static batching
Date: Sun, 7 Jun 2026 09:50:39 +0800 [thread overview]
Message-ID: <aiTObzdT2gThkwln@debian> (raw)
In-Reply-To: <20260523003757.13078-2-nithurshen.dev@gmail.com>
Hi Nithurshen,
On Sat, May 23, 2026 at 06:07:56AM +0530, Nithurshen wrote:
> Currently, fsck.erofs extracts files synchronously. When decompressing
> heavily packed images (like LZ4HC with 4K pclusters), the main thread
> spends the majority of its time blocked on a combination of synchronous
> vfs_write() syscalls and LZ4_decompress_safe(), bottlenecking overall
> extraction speed.
>
> This patch introduces a scalable, multi-threaded decompression framework
> using the existing erofs_workqueue infrastructure to decouple compute
> from the main thread's I/O.
>
> To prevent massive scheduling overhead (futex contention) where worker
> threads spend more CPU time waking up than actually decompressing small
> 4KB clusters, this implementation introduces a batching context. The
> main thread collects an array of sequential pclusters (temporarily hard-
> capped at Z_EROFS_PCLUSTER_BATCH_SIZE = 32) before submitting a single
> erofs_work unit.
>
> Key details of this implementation:
> - The worker pool is dynamically sized based on available system CPUs.
> - Decompression tasks take strict ownership of the raw and output
> buffers (safely tracking memory via a `free_out` flag) to prevent
> data races and memory leaks.
> - Output buffers are explicitly zero-initialized via calloc() to
> prevent trailing garbage bytes from leaking into extracted files.
> - Tail-end packed fragments are processed synchronously by the main
> thread, as their minimal overhead does not benefit from asynchronous
> offloading.
>
> Signed-off-by: Nithurshen <nithurshen.dev@gmail.com>
> ---
> fsck/main.c | 234 +++++++++++++++++----------------------
> include/erofs/internal.h | 18 ++-
> lib/data.c | 203 +++++++++++++++++++++++----------
> 3 files changed, 265 insertions(+), 190 deletions(-)
>
> diff --git a/fsck/main.c b/fsck/main.c
> index 16cc627..d7810e8 100644
> --- a/fsck/main.c
> +++ b/fsck/main.c
> @@ -8,14 +8,18 @@
> #include <time.h>
> #include <utime.h>
> #include <unistd.h>
> +#include <pthread.h>
> #include <sys/stat.h>
> #include "erofs/print.h"
> #include "erofs/decompress.h"
> #include "erofs/dir.h"
> #include "erofs/xattr.h"
> +#include "erofs/workqueue.h"
> #include "../lib/compressor.h"
> #include "../lib/liberofs_compress.h"
>
> +extern struct erofs_workqueue erofs_wq;
> +
> static int erofsfsck_check_inode(erofs_nid_t pnid, erofs_nid_t nid);
>
> struct erofsfsck_dirstack {
> @@ -505,135 +509,95 @@ out:
>
> static int erofs_verify_inode_data(struct erofs_inode *inode, int outfd)
> {
> - struct erofs_map_blocks map = {
> - .buf = __EROFS_BUF_INITIALIZER,
> - };
> - bool needdecode = fsckcfg.check_decomp && !erofs_is_packed_inode(inode);
> - int ret = 0;
> - bool compressed;
> - erofs_off_t pos = 0;
> - u64 pchunk_len = 0;
> - unsigned int raw_size = 0, buffer_size = 0;
> - char *raw = NULL, *buffer = NULL;
> -
> - erofs_dbg("verify data chunk of nid(%llu): type(%d)",
> - inode->nid | 0ULL, inode->datalayout);
> -
> - compressed = erofs_inode_is_data_compressed(inode->datalayout);
> - while (pos < inode->i_size) {
> - unsigned int alloc_rawsize;
> -
> - map.m_la = pos;
> - ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_FIEMAP);
> - if (ret)
> - goto out;
> -
> - if (!compressed && map.m_llen != map.m_plen) {
> - erofs_err("broken chunk length m_la %" PRIu64 " m_llen %" PRIu64 " m_plen %" PRIu64,
> - map.m_la, map.m_llen, map.m_plen);
> - ret = -EFSCORRUPTED;
> - goto out;
> - }
> -
> - /* the last lcluster can be divided into 3 parts */
> - if (map.m_la + map.m_llen > inode->i_size)
> - map.m_llen = inode->i_size - map.m_la;
> -
> - pchunk_len += map.m_plen;
> - pos += map.m_llen;
> -
> - /* should skip decomp? */
> - if (map.m_la >= inode->i_size || !needdecode)
> - continue;
> + struct erofs_map_blocks map = { .buf = __EROFS_BUF_INITIALIZER };
> + bool needdecode = fsckcfg.check_decomp && !erofs_is_packed_inode(inode);
> + int ret = 0;
> + bool compressed = erofs_inode_is_data_compressed(inode->datalayout);
> + erofs_off_t pos = 0;
> + u64 pchunk_len = 0;
> +
> + struct z_erofs_read_ctx ctx = {
> + .pending_tasks = 0,
> + .final_err = 0,
> + .outfd = outfd,
> + .free_out = true,
> + .current_task = NULL
> + };
> + pthread_mutex_init(&ctx.lock, NULL);
> + pthread_cond_init(&ctx.cond, NULL);
Please avoid barely used pthread interface.
For example, erofs_mutex_t is needed for erofs-utils
instead of pthread_mutex_t;
pthread_cond_init needs to be replaced by an abstract
too.
Also, I may forget to mention that, your new
implementation should be workable for both EROFS_MT_ENABLED
and !EROFS_MT_ENABLED, not only EROFS_MT_ENABLED.
> +
> + erofs_dbg("verify data chunk of nid(%llu): type(%d)", inode->nid | 0ULL, inode->datalayout);
> +
> + while (pos < inode->i_size) {
> + map.m_la = pos;
> + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_FIEMAP);
> + if (ret) goto out;
> +
> + if (map.m_la + map.m_llen > inode->i_size)
> + map.m_llen = inode->i_size - map.m_la;
> +
> + pchunk_len += map.m_plen;
> + pos += map.m_llen;
> +
> + if (map.m_la >= inode->i_size || !needdecode)
> + continue;
> +
> + if (outfd >= 0 && !(map.m_flags & EROFS_MAP_MAPPED)) {
> + ret = lseek(outfd, map.m_llen, SEEK_CUR);
> + if (ret < 0) {
> + ret = -errno;
> + goto out;
> + }
> + continue;
> + }
> +
> + if (compressed) {
> + char *raw = malloc(map.m_plen);
> + size_t buffer_size = map.m_llen > erofs_blksiz(inode->sbi) ? map.m_llen : erofs_blksiz(inode->sbi);
> + char *buffer = calloc(1, buffer_size);
> +
> + if (!raw || !buffer) {
> + free(raw); free(buffer);
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + ret = z_erofs_read_one_data(inode, &map, raw, buffer, 0, map.m_llen, false, map.m_la, &ctx);
> + if (ret) {
> + /* DO NOT free(raw) or free(buffer) here. z_erofs_read_one_data took ownership! */
> + goto out;
> + }
> + } else {
> + char *raw = calloc(1, map.m_llen);
> + ret = erofs_read_one_data(inode, &map, raw, 0, map.m_llen);
> + if (ret >= 0 && outfd >= 0)
> + pwrite(outfd, raw, map.m_llen, map.m_la);
> + free(raw);
> + if (ret) goto out;
> + }
I think the file read should be multithreaded too, especially the inode
could be large.
Thanks,
Gao Xiang
next prev parent reply other threads:[~2026-06-07 1:50 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-23 0:37 [PATCH 0/2] fsck.erofs: introduce multi-threaded decompression Nithurshen
2026-05-23 0:37 ` [PATCH 1/2] fsck.erofs: introduce multi-threaded decompression with static batching Nithurshen
2026-06-07 1:50 ` Gao Xiang [this message]
2026-05-23 0:37 ` [PATCH 2/2] fsck.erofs: implement dynamic pcluster batching based on algorithm complexity Nithurshen
2026-06-07 1:52 ` Gao Xiang
2026-06-08 5:07 ` [PATCH v2 0/2] fsck.erofs: add multi-threaded decompression Nithurshen
2026-06-08 5:07 ` [PATCH v2 1/2] fsck.erofs: introduce multi-threaded decompression with static batching Nithurshen
2026-06-08 6:25 ` Gao Xiang
2026-06-08 5:07 ` [PATCH v2 2/2] fsck.erofs: implement algorithm-aware pcluster batching Nithurshen
2026-06-09 9:17 ` [PATCH v3 0/2] fsck.erofs: add multi-threaded decompression Nithurshen
2026-06-09 9:17 ` [PATCH v3 1/2] fsck.erofs: introduce multi-threaded decompression with static batching Nithurshen
2026-06-09 9:17 ` [PATCH v3 2/2] fsck.erofs: implement algorithm-aware pcluster batching Nithurshen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aiTObzdT2gThkwln@debian \
--to=xiang@kernel.org \
--cc=hsiangkao@linux.alibaba.com \
--cc=linux-erofs@lists.ozlabs.org \
--cc=nithurshen.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox