public inbox for linux-erofs@ozlabs.org
* [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode
@ 2026-04-14 19:10 Lucas Karpinski
  2026-04-14 19:10 ` [PATCH v3 1/4] erofs-utils: lib: remove redundant if check Lucas Karpinski
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Lucas Karpinski @ 2026-04-14 19:10 UTC (permalink / raw)
  To: linux-erofs; +Cc: jcalmels, Lucas Karpinski

Currently, erofs-utils supports backing blobs for multi-image setups.  This
series implements the FULLDATA import mode, which merges multiple source
images into a single self-contained EROFS image.

To optimize the rebuild process, erofs_io_xcopy() is used so that
copy_file_range(2) can be leveraged where available. This bypasses userspace
buffering and enables kernel-side data transfers.
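
As a rough illustration of the copy_file_range(2) approach described above,
here is a minimal sketch of a copy helper that tries the kernel-side path
first and falls back to a buffered pread/write loop. The function name
copy_range_with_fallback() and the 64 KiB buffer size are assumptions made
for this example; it is not the erofs_io_xcopy() implementation, and it
assumes Linux with a glibc that exposes copy_file_range().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the erofs-utils code): copy `len` bytes from
 * `fdin` starting at offset `pos` to the current file position of `fdout`.
 * Returns 0 on success or a negative errno value.
 */
static int copy_range_with_fallback(int fdin, off64_t pos, int fdout,
				    size_t len)
{
	char buf[65536];

	/* Fast path: let the kernel move the bytes. */
	while (len > 0) {
		ssize_t ret = copy_file_range(fdin, &pos, fdout, NULL, len, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			break;		/* e.g. ENOSYS or EXDEV: use the slow path */
		}
		if (ret == 0)
			return -EIO;	/* source ended before `len` bytes */
		len -= (size_t)ret;	/* `pos` was advanced by the kernel */
	}

	/* Slow path: ordinary buffered copy for whatever is left. */
	while (len > 0) {
		size_t want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t rd = pread(fdin, buf, want, pos);
		ssize_t done = 0;

		if (rd < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (rd == 0)
			return -EIO;
		while (done < rd) {
			ssize_t wr = write(fdout, buf + done,
					   (size_t)(rd - done));

			if (wr < 0) {
				if (errno == EINTR)
					continue;
				return -errno;
			}
			done += wr;
		}
		pos += rd;
		len -= (size_t)rd;
	}
	return 0;
}
```

Both paths write at the current file position of the destination, so a
partial fast-path copy can be completed by the slow path without seeking.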
 
Verification: built the same image with the default rebuild mode and with
FULLDATA, then compared the two using F-i-f/tdiff.

changes in v3:
- adhere to uniaddress semantics.
- take advantage of existing infrastructure which allows us to drop a
  significant amount of complexity + code.

changes in v2:
- reworked erofs_rebuild_load_trees_full into
  erofs_mkfs_rebuild_load_trees.
- removed --merge option (use --clean=data instead).
- updated man.

Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>
---
Lucas Karpinski (4):
      erofs-utils: lib: remove redundant if check
      erofs-utils: lib: add helper function erofs_uuid_unparse_as_tag
      erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images
      erofs-utils: manpages: update to reflect fulldata support

 include/erofs/internal.h |  3 +++
 lib/inode.c              | 39 ++++++++++++++++++++-------
 lib/liberofs_uuid.h      |  1 +
 lib/rebuild.c            | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/uuid_unparse.c       | 16 ++++++++++-
 man/mkfs.erofs.1         |  7 ++++-
 mkfs/main.c              | 16 ++++-------
 7 files changed, 130 insertions(+), 22 deletions(-)
---
base-commit: 58c3351d5b4b0fc5e4a05d2200c1cf9f85902899
change-id: 20260220-merge-fs-e6231a3a3a1c

Best regards,
-- 
Lucas Karpinski <lkarpinski@nvidia.com>



* [PATCH v3 1/4] erofs-utils: lib: remove redundant if check
  2026-04-14 19:10 [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Lucas Karpinski
@ 2026-04-14 19:10 ` Lucas Karpinski
  2026-04-14 19:10 ` [PATCH v3 2/4] erofs-utils: lib: add helper function erofs_uuid_unparse_as_tag Lucas Karpinski
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Lucas Karpinski @ 2026-04-14 19:10 UTC (permalink / raw)
  To: linux-erofs; +Cc: jcalmels, Lucas Karpinski

Remove the if check since erofs_set_inode_fingerprint is already protected
by the same if statement.

Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>
---
 lib/inode.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/lib/inode.c b/lib/inode.c
index c932981a..2f78d9b8 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -2018,11 +2018,9 @@ static int erofs_mkfs_begin_nondirectory(const struct erofs_mkfs_btctx *btctx,
 			goto out;
 		}
 
-		if (S_ISREG(inode->i_mode) && inode->i_size) {
-			ret = erofs_set_inode_fingerprint(inode, ctx.fd, ctx.fpos);
-			if (ret < 0)
-				return ret;
-		}
+		ret = erofs_set_inode_fingerprint(inode, ctx.fd, ctx.fpos);
+		if (ret < 0)
+			return ret;
 
 		if (inode->sbi->available_compr_algs &&
 		    erofs_file_is_compressible(im, inode)) {

-- 
Git-155)



* [PATCH v3 2/4] erofs-utils: lib: add helper function erofs_uuid_unparse_as_tag
  2026-04-14 19:10 [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Lucas Karpinski
  2026-04-14 19:10 ` [PATCH v3 1/4] erofs-utils: lib: remove redundant if check Lucas Karpinski
@ 2026-04-14 19:10 ` Lucas Karpinski
  2026-04-14 19:10 ` [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images Lucas Karpinski
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Lucas Karpinski @ 2026-04-14 19:10 UTC (permalink / raw)
  To: linux-erofs; +Cc: jcalmels, Lucas Karpinski

Add a helper function for converting a UUID to a tag string.
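
For readers comparing the two output formats touched by this patch, the
following is a small standalone sketch. The names uuid_to_tag() and
uuid_to_lower() are made up for this example; the patch itself uses
sprintf() with "%04x" pairs, whereas this sketch formats one byte at a time,
which should produce the same strings.

```c
#include <stdio.h>

#define UUID_BYTES 16

/*
 * Hypothetical equivalent of the tag format: 32 lowercase hex digits with
 * no separators. `out` must hold at least 33 bytes.
 */
static void uuid_to_tag(const unsigned char uuid[UUID_BYTES], char out[33])
{
	for (int i = 0; i < UUID_BYTES; i++)
		snprintf(out + 2 * i, 3, "%02x", uuid[i]);
}

/*
 * Hypothetical equivalent of the canonical 8-4-4-4-12 format.
 * `out` must hold at least 37 bytes.
 */
static void uuid_to_lower(const unsigned char uuid[UUID_BYTES], char out[37])
{
	char *p = out;

	for (int i = 0; i < UUID_BYTES; i++) {
		/* dashes go before bytes 4, 6, 8 and 10 */
		if (i == 4 || i == 6 || i == 8 || i == 10)
			*p++ = '-';
		snprintf(p, 3, "%02x", uuid[i]);
		p += 2;
	}
}
```

The only difference between the two is the dash placement, which is why the
tag variant fits in a fixed 32-character field.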

Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>
---
 lib/liberofs_uuid.h |  1 +
 lib/uuid_unparse.c  | 16 +++++++++++++++-
 mkfs/main.c         | 11 +----------
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/lib/liberofs_uuid.h b/lib/liberofs_uuid.h
index e8bb1be9..1a2a067a 100644
--- a/lib/liberofs_uuid.h
+++ b/lib/liberofs_uuid.h
@@ -4,6 +4,7 @@
 
 void erofs_uuid_generate(unsigned char *out);
 void erofs_uuid_unparse_lower(const unsigned char *buf, char *out);
+void erofs_uuid_unparse_as_tag(const unsigned char *buf, char *out);
 int erofs_uuid_parse(const char *in, unsigned char *uu);
 
 #endif
diff --git a/lib/uuid_unparse.c b/lib/uuid_unparse.c
index 890acda8..53e5f936 100644
--- a/lib/uuid_unparse.c
+++ b/lib/uuid_unparse.c
@@ -8,7 +8,8 @@
 #include "erofs/config.h"
 #include "liberofs_uuid.h"
 
-void erofs_uuid_unparse_lower(const unsigned char *buf, char *out) {
+void erofs_uuid_unparse_lower(const unsigned char *buf, char *out)
+{
 	sprintf(out, "%04x%04x-%04x-%04x-%04x-%04x%04x%04x",
 			(buf[0] << 8) | buf[1],
 			(buf[2] << 8) | buf[3],
@@ -19,3 +20,16 @@ void erofs_uuid_unparse_lower(const unsigned char *buf, char *out) {
 			(buf[12] << 8) | buf[13],
 			(buf[14] << 8) | buf[15]);
 }
+
+void erofs_uuid_unparse_as_tag(const unsigned char *buf, char *out)
+{
+	sprintf(out, "%04x%04x%04x%04x%04x%04x%04x%04x",
+			(buf[0] << 8) | buf[1],
+			(buf[2] << 8) | buf[3],
+			(buf[4] << 8) | buf[5],
+			(buf[6] << 8) | buf[7],
+			(buf[8] << 8) | buf[9],
+			(buf[10] << 8) | buf[11],
+			(buf[12] << 8) | buf[13],
+			(buf[14] << 8) | buf[15]);
+}
diff --git a/mkfs/main.c b/mkfs/main.c
index 5006f76f..6867478b 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1788,16 +1788,7 @@ static int erofs_mkfs_rebuild_load_trees(struct erofs_inode *root)
 			memcpy(devs[idx].tag, tag, sizeof(devs[0].tag));
 		else
 			/* convert UUID of the source image to a hex string */
-			sprintf((char *)g_sbi.devs[idx].tag,
-				"%04x%04x%04x%04x%04x%04x%04x%04x",
-				(src->uuid[0] << 8) | src->uuid[1],
-				(src->uuid[2] << 8) | src->uuid[3],
-				(src->uuid[4] << 8) | src->uuid[5],
-				(src->uuid[6] << 8) | src->uuid[7],
-				(src->uuid[8] << 8) | src->uuid[9],
-				(src->uuid[10] << 8) | src->uuid[11],
-				(src->uuid[12] << 8) | src->uuid[13],
-				(src->uuid[14] << 8) | src->uuid[15]);
+			erofs_uuid_unparse_as_tag(src->uuid, (char *)g_sbi.devs[idx].tag);
 	}
 	return 0;
 }

-- 
Git-155)



* [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images
  2026-04-14 19:10 [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Lucas Karpinski
  2026-04-14 19:10 ` [PATCH v3 1/4] erofs-utils: lib: remove redundant if check Lucas Karpinski
  2026-04-14 19:10 ` [PATCH v3 2/4] erofs-utils: lib: add helper function erofs_uuid_unparse_as_tag Lucas Karpinski
@ 2026-04-14 19:10 ` Lucas Karpinski
  2026-04-15  3:35   ` zhaoyifan (H)
  2026-04-14 19:10 ` [PATCH v3 4/4] erofs-utils: manpages: update to reflect fulldata support Lucas Karpinski
  2026-04-15  2:07 ` [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Gao Xiang
  4 siblings, 1 reply; 10+ messages in thread
From: Lucas Karpinski @ 2026-04-14 19:10 UTC (permalink / raw)
  To: linux-erofs; +Cc: jcalmels, Lucas Karpinski

This patch introduces experimental support for merging multiple source
images in mkfs. Each regular file records the source image path and its byte
offset. During the build, mkfs opens the source blob and pulls the payload in
via erofs_io_xcopy().

This does not yet support chunk-based files or compressed images.
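
To make the "source image path and byte offset" bookkeeping more concrete,
here is a minimal sketch of how a flat-inline file could be split into its
block-aligned part and its inline tail. The struct and function names are
hypothetical, and the sketch assumes a power-of-two block size expressed as
blkszbits (for example 12 for 4 KiB blocks); it only mirrors the arithmetic
visible in the diff below.

```c
#include <stdint.h>

/* Hypothetical record; field names are illustrative only. */
struct src_extent {
	uint64_t data_off;	/* byte offset of the first data block in the source image */
	uint64_t nblocks;	/* number of full blocks stored out of line */
	uint64_t tail_off;	/* offset within the file where the inline tail starts */
	uint32_t tail_size;	/* bytes in the inline tail (0 if block-aligned) */
};

static struct src_extent locate_flat_inline(uint64_t blkaddr, uint64_t i_size,
					    unsigned int blkszbits)
{
	uint64_t blksz = UINT64_C(1) << blkszbits;
	struct src_extent e;

	e.nblocks = i_size >> blkszbits;		/* full blocks */
	e.tail_size = (uint32_t)(i_size & (blksz - 1));	/* remainder */
	e.data_off = blkaddr << blkszbits;		/* block address to bytes */
	e.tail_off = e.nblocks << blkszbits;
	return e;
}
```

For example, with 4 KiB blocks a file of 0x2000 + 100 bytes has two full
blocks and a 100-byte tail that starts at file offset 0x2000.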

Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>
---
 include/erofs/internal.h |  3 +++
 lib/inode.c              | 31 ++++++++++++++++++---
 lib/rebuild.c            | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
 mkfs/main.c              |  7 +++--
 4 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index c780228c..450e2647 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -208,6 +208,7 @@ struct erofs_diskbuf;
 #define EROFS_INODE_DATA_SOURCE_LOCALPATH	1
 #define EROFS_INODE_DATA_SOURCE_DISKBUF		2
 #define EROFS_INODE_DATA_SOURCE_RESVSP		3
+#define EROFS_INODE_DATA_SOURCE_REBUILD_BLOB	4
 
 #define EROFS_I_BLKADDR_DEV_ID_BIT		48
 
@@ -253,6 +254,8 @@ struct erofs_inode {
 		char *i_link;
 		struct erofs_diskbuf *i_diskbuf;
 	};
+	char *rebuild_blobpath;
+	erofs_off_t rebuild_src_dataoff;
 	unsigned char datalayout;
 	unsigned char inode_isize;
 	/* inline tail-end packing size */
diff --git a/lib/inode.c b/lib/inode.c
index 2f78d9b8..bd10e267 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -158,6 +158,8 @@ unsigned int erofs_iput(struct erofs_inode *inode)
 	if (inode->datasource == EROFS_INODE_DATA_SOURCE_DISKBUF) {
 		erofs_diskbuf_close(inode->i_diskbuf);
 		free(inode->i_diskbuf);
+	} else if (inode->datasource == EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
+		free(inode->rebuild_blobpath);
 	} else {
 		free(inode->i_link);
 	}
@@ -697,7 +699,10 @@ static int erofs_write_unencoded_data(struct erofs_inode *inode,
 
 int erofs_write_unencoded_file(struct erofs_inode *inode, int fd, u64 fpos)
 {
-	if (cfg.c_chunkbits) {
+	struct erofs_vfile vf = { .fd = fd };
+
+	if (cfg.c_chunkbits &&
+	    inode->datasource != EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
 		inode->u.chunkbits = cfg.c_chunkbits;
 		/* chunk indexes when explicitly specified */
 		inode->u.chunkformat = 0;
@@ -706,10 +711,15 @@ int erofs_write_unencoded_file(struct erofs_inode *inode, int fd, u64 fpos)
 		return erofs_blob_write_chunked_file(inode, fd, fpos);
 	}
 
+	if (inode->datasource == EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
+		if (erofs_io_lseek(&vf, fpos, SEEK_SET) != (off_t)fpos)
+			return -EIO;
+		return erofs_write_unencoded_data(inode, &vf, fpos, true, false);
+	}
+
 	inode->datalayout = EROFS_INODE_FLAT_INLINE;
 	/* fallback to all data uncompressed */
-	return erofs_write_unencoded_data(inode,
-			&(struct erofs_vfile){ .fd = fd }, fpos,
+	return erofs_write_unencoded_data(inode, &vf, fpos,
 			inode->datasource == EROFS_INODE_DATA_SOURCE_DISKBUF, false);
 }
 
@@ -1508,6 +1518,12 @@ out:
 		free(inode->i_diskbuf);
 		inode->i_diskbuf = NULL;
 		inode->datasource = EROFS_INODE_DATA_SOURCE_NONE;
+	} else if (inode->datasource == EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
+		free(inode->rebuild_blobpath);
+		inode->rebuild_blobpath = NULL;
+		inode->datasource = EROFS_INODE_DATA_SOURCE_NONE;
+		DBG_BUGON(ctx->fd < 0);
+		close(ctx->fd);
 	} else {
 		DBG_BUGON(ctx->fd < 0);
 		close(ctx->fd);
@@ -2014,6 +2030,12 @@ static int erofs_mkfs_begin_nondirectory(const struct erofs_mkfs_btctx *btctx,
 			if (ctx.fd < 0)
 				return -errno;
 			break;
+		case EROFS_INODE_DATA_SOURCE_REBUILD_BLOB:
+			ctx.fd = open(inode->rebuild_blobpath, O_RDONLY | O_BINARY);
+			if (ctx.fd < 0)
+				return -errno;
+			ctx.fpos = inode->rebuild_src_dataoff;
+			break;
 		default:
 			goto out;
 		}
@@ -2022,7 +2044,8 @@ static int erofs_mkfs_begin_nondirectory(const struct erofs_mkfs_btctx *btctx,
 		if (ret < 0)
 			return ret;
 
-		if (inode->sbi->available_compr_algs &&
+		if (inode->datasource != EROFS_INODE_DATA_SOURCE_REBUILD_BLOB &&
+		    inode->sbi->available_compr_algs &&
 		    erofs_file_is_compressible(im, inode)) {
 			ctx.ictx = erofs_prepare_compressed_file(im, inode);
 			if (IS_ERR(ctx.ictx))
diff --git a/lib/rebuild.c b/lib/rebuild.c
index 7ab2b499..3785afd0 100644
--- a/lib/rebuild.c
+++ b/lib/rebuild.c
@@ -14,8 +14,10 @@
 #include "erofs/xattr.h"
 #include "erofs/blobchunk.h"
 #include "erofs/internal.h"
+#include "erofs/io.h"
 #include "liberofs_rebuild.h"
 #include "liberofs_uuid.h"
+#include "liberofs_cache.h"
 
 #ifdef HAVE_LINUX_AUFS_TYPE_H
 #include <linux/aufs_type.h>
@@ -221,6 +223,71 @@ err:
 	return ret;
 }
 
+static int erofs_rebuild_write_full_data(struct erofs_inode *inode)
+{
+	struct erofs_sb_info *src_sbi = inode->sbi;
+	int err = 0;
+
+	if (inode->datalayout == EROFS_INODE_FLAT_PLAIN) {
+		if (inode->u.i_blkaddr == EROFS_NULL_ADDR) {
+			if (inode->i_size)
+				return -EFSCORRUPTED;
+			return 0;
+		}
+		inode->rebuild_blobpath = strdup(src_sbi->devname);
+		if (!inode->rebuild_blobpath)
+			return -ENOMEM;
+		inode->rebuild_src_dataoff =
+			erofs_pos(src_sbi, erofs_inode_dev_baddr(inode));
+		inode->datasource = EROFS_INODE_DATA_SOURCE_REBUILD_BLOB;
+	} else if (inode->datalayout == EROFS_INODE_FLAT_INLINE) {
+		erofs_blk_t nblocks = erofs_blknr(src_sbi, inode->i_size);
+		unsigned int inline_size = inode->i_size % erofs_blksiz(src_sbi);
+
+		if (nblocks > 0 && inode->u.i_blkaddr != EROFS_NULL_ADDR) {
+			inode->rebuild_blobpath = strdup(src_sbi->devname);
+			if (!inode->rebuild_blobpath)
+				return -ENOMEM;
+			inode->rebuild_src_dataoff =
+				erofs_pos(src_sbi,
+					  erofs_inode_dev_baddr(inode));
+			inode->datasource = EROFS_INODE_DATA_SOURCE_REBUILD_BLOB;
+		}
+
+		inode->idata_size = inline_size;
+		if (inline_size > 0) {
+			struct erofs_vfile vf;
+			erofs_off_t tail_offset = erofs_pos(src_sbi, nblocks);
+
+			inode->idata = malloc(inline_size);
+			if (!inode->idata)
+				return -ENOMEM;
+			err = erofs_iopen(&vf, inode);
+			if (err) {
+				free(inode->idata);
+				inode->idata = NULL;
+				return err;
+			}
+			err = erofs_pread(&vf, inode->idata, inline_size,
+					  tail_offset);
+			if (err) {
+				free(inode->idata);
+				inode->idata = NULL;
+				return err;
+			}
+		}
+	} else if (inode->datalayout == EROFS_INODE_CHUNK_BASED) {
+		erofs_err("chunk-based files not yet supported: %s",
+			  inode->i_srcpath);
+		err = -EOPNOTSUPP;
+	} else if (is_inode_layout_compression(inode)) {
+		erofs_err("compressed files not yet supported: %s",
+			  inode->i_srcpath);
+		err = -EOPNOTSUPP;
+	}
+	return err;
+}
+
 static int erofs_rebuild_update_inode(struct erofs_sb_info *dst_sb,
 				      struct erofs_inode *inode,
 				      enum erofs_rebuild_datamode datamode)
@@ -265,6 +332,8 @@ static int erofs_rebuild_update_inode(struct erofs_sb_info *dst_sb,
 			err = erofs_rebuild_write_blob_index(dst_sb, inode);
 		else if (datamode == EROFS_REBUILD_DATA_RESVSP)
 			inode->datasource = EROFS_INODE_DATA_SOURCE_RESVSP;
+		else if (datamode == EROFS_REBUILD_DATA_FULL)
+			err = erofs_rebuild_write_full_data(inode);
 		else
 			err = -EOPNOTSUPP;
 		break;
@@ -553,3 +622,4 @@ int erofs_rebuild_load_basedir(struct erofs_inode *dir, u64 *nr_subdirs,
 	};
 	return erofs_iterate_dir(&ctx.ctx, false);
 }
+
diff --git a/mkfs/main.c b/mkfs/main.c
index 6867478b..d75c97b2 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1756,7 +1756,7 @@ static int erofs_mkfs_rebuild_load_trees(struct erofs_inode *root)
 		extra_devices += src->extra_devices;
 	}
 
-	if (datamode != EROFS_REBUILD_DATA_BLOB_INDEX)
+	if (datamode == EROFS_REBUILD_DATA_RESVSP)
 		return 0;
 
 	/* Each blob has either no extra device or only one device for TarFS */
@@ -1766,6 +1766,9 @@ static int erofs_mkfs_rebuild_load_trees(struct erofs_inode *root)
 		return -EOPNOTSUPP;
 	}
 
+	if (datamode == EROFS_REBUILD_DATA_FULL)
+		return 0;
+
 	ret = erofs_mkfs_init_devices(&g_sbi, rebuild_src_count);
 	if (ret)
 		return ret;
@@ -1788,7 +1791,7 @@ static int erofs_mkfs_rebuild_load_trees(struct erofs_inode *root)
 			memcpy(devs[idx].tag, tag, sizeof(devs[0].tag));
 		else
 			/* convert UUID of the source image to a hex string */
-			erofs_uuid_unparse_as_tag(src->uuid, (char *)g_sbi.devs[idx].tag);
+			erofs_uuid_unparse_as_tag(src->uuid, (char *)devs[idx].tag);
 	}
 	return 0;
 }

-- 
Git-155)



* [PATCH v3 4/4] erofs-utils: manpages: update to reflect fulldata support
  2026-04-14 19:10 [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Lucas Karpinski
                   ` (2 preceding siblings ...)
  2026-04-14 19:10 ` [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images Lucas Karpinski
@ 2026-04-14 19:10 ` Lucas Karpinski
  2026-04-15  2:07 ` [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Gao Xiang
  4 siblings, 0 replies; 10+ messages in thread
From: Lucas Karpinski @ 2026-04-14 19:10 UTC (permalink / raw)
  To: linux-erofs; +Cc: jcalmels, Lucas Karpinski

Specify that data (fulldata) mode is now supported alongside rvsp when
using --clean={data|rvsp}.

Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>
---
 man/mkfs.erofs.1 | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/man/mkfs.erofs.1 b/man/mkfs.erofs.1
index a102e65e..65ec8079 100644
--- a/man/mkfs.erofs.1
+++ b/man/mkfs.erofs.1
@@ -229,7 +229,7 @@ Only \fBdata\fR is supported. \fBrvsp\fR and \fB0\fR will be ignored.
 Note that \fBrvsp\fR takes precedence over \fB--tar=i\fR or \fB--tar=headerball\fR.
 .TP
 .I Rebuild mode
-Only \fBrvsp\fR is supported.
+\fBdata\fR and \fBrvsp\fR are supported.
 .TP
 .I S3 source (\fB\-\-s3\fR)
 \fBdata\fR and \fB0\fR are supported.
@@ -521,6 +521,11 @@ source images, which act as external blob devices. This creates a compact
 metadata layer suitable for layered filesystem scenarios, similar to container
 image layers.
 .TP
+.I data mode
+\fB\-\-clean=data\fR: Import complete file data from all source images into
+the destination image, producing a fully self-contained EROFS image that does
+not depend on external blob devices.
+.TP
 .I rvsp mode
 \fB\-\-clean=rvsp\fR or \fB\-\-incremental=rvsp\fR: Reserve space for file
 data without copying actual content, useful for creating sparse images.

-- 
Git-155)



* Re: [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode
  2026-04-14 19:10 [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Lucas Karpinski
                   ` (3 preceding siblings ...)
  2026-04-14 19:10 ` [PATCH v3 4/4] erofs-utils: manpages: update to reflect fulldata support Lucas Karpinski
@ 2026-04-15  2:07 ` Gao Xiang
  2026-04-15 14:09   ` Lucas Karpinski
  4 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2026-04-15  2:07 UTC (permalink / raw)
  To: Lucas Karpinski, linux-erofs; +Cc: jcalmels



On 2026/4/15 03:10, Lucas Karpinski wrote:
> Currently, erofs-utils supports backing blobs for multi-image setups.  This
> implements the FULLDATA import which allows for the merging of multiple
> source images into a single self-contained erofs image.
> 
> To optimize the rebuild process, erofs_io_xcopy() is used to leverage the
> copy_file_range(2) if available. This bypasses userspace buffering and
> enables kernel side data transfers.
>   
> Verification: Built same image with default rebuild and rebuild with
> FULLDATA. Then ran F-i-f/tdiff comparing the two.
> 
> changes in v3:
> - adhere to uniaddress semantics.
> - take advantage of existing infrastructure which allows us to drop a
>    significant amount of complexity + code.
> 
> changes in v2:
> - reworked erofs_rebuild_load_trees_full into
>    erofs_mkfs_rebuild_load_trees.
> - removed --merge option (use --clean=data instead).
> - updated man.
> 
> Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>

Overall, this looks good to me; I will apply it to the -experimental branch.

... Would you mind taking some time to work on tests in the
experimental-tests branch?

Thanks,
Gao Xiang



* Re: [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images
  2026-04-14 19:10 ` [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images Lucas Karpinski
@ 2026-04-15  3:35   ` zhaoyifan (H)
  2026-04-15  7:47     ` Gao Xiang
  0 siblings, 1 reply; 10+ messages in thread
From: zhaoyifan (H) @ 2026-04-15  3:35 UTC (permalink / raw)
  To: Lucas Karpinski, linux-erofs, Gao Xiang; +Cc: jcalmels

This patch handles inline inodes incorrectly.

To reproduce, run the following in the erofs-utils directory:
     mkfs/mkfs.erofs 1.erofs man/
     mkfs/mkfs.erofs 2.erofs docs/
     mkfs/mkfs.erofs --clean=data merged.erofs 1.erofs 2.erofs

PERFORMANCE.md in merged.erofs then contains incorrect data after offset
0x2000.

The following diff fixes it:

diff --git a/lib/inode.c b/lib/inode.c
index bd10e26..36dce56 100644
--- a/lib/inode.c
+++ b/lib/inode.c
@@ -683,6 +683,13 @@ static int erofs_write_unencoded_data(struct erofs_inode *inode,
 
 	/* read the tail-end data */
 	if (inode->idata_size) {
+		/*
+		 * If inode->idata is already present, the caller has prepared
+		 * the tail data and nothing more needs to be done here.
+		 */
+		if (inode->idata)
+			return 0;
+
 		inode->idata = malloc(inode->idata_size);
 		if (!inode->idata)
 			return -ENOMEM;


On 2026/4/15 3:10, Lucas Karpinski wrote:
> This patch introduces experimental support for merging multiple source
> images in mkfs. Each regular file records the source image path and its byte
> offset. During the build, mkfs opens the source blob and pulls the payload in
> via erofs_io_xcopy().
>
> This does not yet support chunk-based files or compressed images.
>
> Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>
> ---
>   include/erofs/internal.h |  3 +++
>   lib/inode.c              | 31 ++++++++++++++++++---
>   lib/rebuild.c            | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
>   mkfs/main.c              |  7 +++--
>   4 files changed, 105 insertions(+), 6 deletions(-)
>
> diff --git a/include/erofs/internal.h b/include/erofs/internal.h
> index c780228c..450e2647 100644
> --- a/include/erofs/internal.h
> +++ b/include/erofs/internal.h
> @@ -208,6 +208,7 @@ struct erofs_diskbuf;
>   #define EROFS_INODE_DATA_SOURCE_LOCALPATH	1
>   #define EROFS_INODE_DATA_SOURCE_DISKBUF		2
>   #define EROFS_INODE_DATA_SOURCE_RESVSP		3
> +#define EROFS_INODE_DATA_SOURCE_REBUILD_BLOB	4
>   
>   #define EROFS_I_BLKADDR_DEV_ID_BIT		48
>   
> @@ -253,6 +254,8 @@ struct erofs_inode {
>   		char *i_link;
>   		struct erofs_diskbuf *i_diskbuf;
>   	};
> +	char *rebuild_blobpath;
> +	erofs_off_t rebuild_src_dataoff;
>   	unsigned char datalayout;
>   	unsigned char inode_isize;
>   	/* inline tail-end packing size */
> diff --git a/lib/inode.c b/lib/inode.c
> index 2f78d9b8..bd10e267 100644
> --- a/lib/inode.c
> +++ b/lib/inode.c
> @@ -158,6 +158,8 @@ unsigned int erofs_iput(struct erofs_inode *inode)
>   	if (inode->datasource == EROFS_INODE_DATA_SOURCE_DISKBUF) {
>   		erofs_diskbuf_close(inode->i_diskbuf);
>   		free(inode->i_diskbuf);
> +	} else if (inode->datasource == EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
> +		free(inode->rebuild_blobpath);
>   	} else {
>   		free(inode->i_link);
>   	}
> @@ -697,7 +699,10 @@ static int erofs_write_unencoded_data(struct erofs_inode *inode,
>   
>   int erofs_write_unencoded_file(struct erofs_inode *inode, int fd, u64 fpos)
>   {
> -	if (cfg.c_chunkbits) {
> +	struct erofs_vfile vf = { .fd = fd };
> +
> +	if (cfg.c_chunkbits &&
> +	    inode->datasource != EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
>   		inode->u.chunkbits = cfg.c_chunkbits;
>   		/* chunk indexes when explicitly specified */
>   		inode->u.chunkformat = 0;
> @@ -706,10 +711,15 @@ int erofs_write_unencoded_file(struct erofs_inode *inode, int fd, u64 fpos)
>   		return erofs_blob_write_chunked_file(inode, fd, fpos);
>   	}
>   
> +	if (inode->datasource == EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
> +		if (erofs_io_lseek(&vf, fpos, SEEK_SET) != (off_t)fpos)
> +			return -EIO;
> +		return erofs_write_unencoded_data(inode, &vf, fpos, true, false);
> +	}
> +
>   	inode->datalayout = EROFS_INODE_FLAT_INLINE;
>   	/* fallback to all data uncompressed */
> -	return erofs_write_unencoded_data(inode,
> -			&(struct erofs_vfile){ .fd = fd }, fpos,
> +	return erofs_write_unencoded_data(inode, &vf, fpos,
>   			inode->datasource == EROFS_INODE_DATA_SOURCE_DISKBUF, false);
>   }
>   
> @@ -1508,6 +1518,12 @@ out:
>   		free(inode->i_diskbuf);
>   		inode->i_diskbuf = NULL;
>   		inode->datasource = EROFS_INODE_DATA_SOURCE_NONE;
> +	} else if (inode->datasource == EROFS_INODE_DATA_SOURCE_REBUILD_BLOB) {
> +		free(inode->rebuild_blobpath);
> +		inode->rebuild_blobpath = NULL;
> +		inode->datasource = EROFS_INODE_DATA_SOURCE_NONE;
> +		DBG_BUGON(ctx->fd < 0);
> +		close(ctx->fd);
>   	} else {
>   		DBG_BUGON(ctx->fd < 0);
>   		close(ctx->fd);
> @@ -2014,6 +2030,12 @@ static int erofs_mkfs_begin_nondirectory(const struct erofs_mkfs_btctx *btctx,
>   			if (ctx.fd < 0)
>   				return -errno;
>   			break;
> +		case EROFS_INODE_DATA_SOURCE_REBUILD_BLOB:
> +			ctx.fd = open(inode->rebuild_blobpath, O_RDONLY | O_BINARY);
> +			if (ctx.fd < 0)
> +				return -errno;
> +			ctx.fpos = inode->rebuild_src_dataoff;
> +			break;
>   		default:
>   			goto out;
>   		}
> @@ -2022,7 +2044,8 @@ static int erofs_mkfs_begin_nondirectory(const struct erofs_mkfs_btctx *btctx,
>   		if (ret < 0)
>   			return ret;
>   
> -		if (inode->sbi->available_compr_algs &&
> +		if (inode->datasource != EROFS_INODE_DATA_SOURCE_REBUILD_BLOB &&
> +		    inode->sbi->available_compr_algs &&
>   		    erofs_file_is_compressible(im, inode)) {
>   			ctx.ictx = erofs_prepare_compressed_file(im, inode);
>   			if (IS_ERR(ctx.ictx))
> diff --git a/lib/rebuild.c b/lib/rebuild.c
> index 7ab2b499..3785afd0 100644
> --- a/lib/rebuild.c
> +++ b/lib/rebuild.c
> @@ -14,8 +14,10 @@
>   #include "erofs/xattr.h"
>   #include "erofs/blobchunk.h"
>   #include "erofs/internal.h"
> +#include "erofs/io.h"
>   #include "liberofs_rebuild.h"
>   #include "liberofs_uuid.h"
> +#include "liberofs_cache.h"

Unnecessary include `liberofs_cache.h`


Thanks,

Yifan Zhao

>   
>   #ifdef HAVE_LINUX_AUFS_TYPE_H
>   #include <linux/aufs_type.h>
> @@ -221,6 +223,71 @@ err:
>   	return ret;
>   }
>   
> +static int erofs_rebuild_write_full_data(struct erofs_inode *inode)
> +{
> +	struct erofs_sb_info *src_sbi = inode->sbi;
> +	int err = 0;
> +
> +	if (inode->datalayout == EROFS_INODE_FLAT_PLAIN) {
> +		if (inode->u.i_blkaddr == EROFS_NULL_ADDR) {
> +			if (inode->i_size)
> +				return -EFSCORRUPTED;
> +			return 0;
> +		}
> +		inode->rebuild_blobpath = strdup(src_sbi->devname);
> +		if (!inode->rebuild_blobpath)
> +			return -ENOMEM;
> +		inode->rebuild_src_dataoff =
> +			erofs_pos(src_sbi, erofs_inode_dev_baddr(inode));
> +		inode->datasource = EROFS_INODE_DATA_SOURCE_REBUILD_BLOB;
> +	} else if (inode->datalayout == EROFS_INODE_FLAT_INLINE) {
> +		erofs_blk_t nblocks = erofs_blknr(src_sbi, inode->i_size);
> +		unsigned int inline_size = inode->i_size % erofs_blksiz(src_sbi);
> +
> +		if (nblocks > 0 && inode->u.i_blkaddr != EROFS_NULL_ADDR) {
> +			inode->rebuild_blobpath = strdup(src_sbi->devname);
> +			if (!inode->rebuild_blobpath)
> +				return -ENOMEM;
> +			inode->rebuild_src_dataoff =
> +				erofs_pos(src_sbi,
> +					  erofs_inode_dev_baddr(inode));
> +			inode->datasource = EROFS_INODE_DATA_SOURCE_REBUILD_BLOB;
> +		}
> +
> +		inode->idata_size = inline_size;
> +		if (inline_size > 0) {
> +			struct erofs_vfile vf;
> +			erofs_off_t tail_offset = erofs_pos(src_sbi, nblocks);
> +
> +			inode->idata = malloc(inline_size);
> +			if (!inode->idata)
> +				return -ENOMEM;
> +			err = erofs_iopen(&vf, inode);
> +			if (err) {
> +				free(inode->idata);
> +				inode->idata = NULL;
> +				return err;
> +			}
> +			err = erofs_pread(&vf, inode->idata, inline_size,
> +					  tail_offset);
> +			if (err) {
> +				free(inode->idata);
> +				inode->idata = NULL;
> +				return err;
> +			}
> +		}
> +	} else if (inode->datalayout == EROFS_INODE_CHUNK_BASED) {
> +		erofs_err("chunk-based files not yet supported: %s",
> +			  inode->i_srcpath);
> +		err = -EOPNOTSUPP;
> +	} else if (is_inode_layout_compression(inode)) {
> +		erofs_err("compressed files not yet supported: %s",
> +			  inode->i_srcpath);
> +		err = -EOPNOTSUPP;
> +	}
> +	return err;
> +}
> +
>   static int erofs_rebuild_update_inode(struct erofs_sb_info *dst_sb,
>   				      struct erofs_inode *inode,
>   				      enum erofs_rebuild_datamode datamode)
> @@ -265,6 +332,8 @@ static int erofs_rebuild_update_inode(struct erofs_sb_info *dst_sb,
>   			err = erofs_rebuild_write_blob_index(dst_sb, inode);
>   		else if (datamode == EROFS_REBUILD_DATA_RESVSP)
>   			inode->datasource = EROFS_INODE_DATA_SOURCE_RESVSP;
> +		else if (datamode == EROFS_REBUILD_DATA_FULL)
> +			err = erofs_rebuild_write_full_data(inode);
>   		else
>   			err = -EOPNOTSUPP;
>   		break;
> @@ -553,3 +622,4 @@ int erofs_rebuild_load_basedir(struct erofs_inode *dir, u64 *nr_subdirs,
>   	};
>   	return erofs_iterate_dir(&ctx.ctx, false);
>   }
> +
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 6867478b..d75c97b2 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -1756,7 +1756,7 @@ static int erofs_mkfs_rebuild_load_trees(struct erofs_inode *root)
>   		extra_devices += src->extra_devices;
>   	}
>   
> -	if (datamode != EROFS_REBUILD_DATA_BLOB_INDEX)
> +	if (datamode == EROFS_REBUILD_DATA_RESVSP)
>   		return 0;
>   
>   	/* Each blob has either no extra device or only one device for TarFS */
> @@ -1766,6 +1766,9 @@ static int erofs_mkfs_rebuild_load_trees(struct erofs_inode *root)
>   		return -EOPNOTSUPP;
>   	}
>   
> +	if (datamode == EROFS_REBUILD_DATA_FULL)
> +		return 0;
> +
>   	ret = erofs_mkfs_init_devices(&g_sbi, rebuild_src_count);
>   	if (ret)
>   		return ret;
> @@ -1788,7 +1791,7 @@ static int erofs_mkfs_rebuild_load_trees(struct erofs_inode *root)
>   			memcpy(devs[idx].tag, tag, sizeof(devs[0].tag));
>   		else
>   			/* convert UUID of the source image to a hex string */
> -			erofs_uuid_unparse_as_tag(src->uuid, (char *)g_sbi.devs[idx].tag);
> +			erofs_uuid_unparse_as_tag(src->uuid, (char *)devs[idx].tag);
>   	}
>   	return 0;
>   }
>



* Re: [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images
  2026-04-15  3:35   ` zhaoyifan (H)
@ 2026-04-15  7:47     ` Gao Xiang
  2026-04-15 13:30       ` Lucas Karpinski
  0 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2026-04-15  7:47 UTC (permalink / raw)
  To: zhaoyifan (H), Lucas Karpinski, linux-erofs; +Cc: jcalmels



On 2026/4/15 11:35, zhaoyifan (H) wrote:
> This patch incorrectly handles inline inode:
> 
> Reproduce in erofs-utils directory:
>      mkfs/mkfs.erofs 1.erofs man/
>      mkfs/mkfs.erofs 2.erofs docs/
>      mkfs/mkfs.erofs --clean=data merged.erofs 1.erofs 2.erofs
> 
> Then PERFORMANCE.md in merged.erofs contains incorrect data after offset 0x2000.
> 
> Fixed with following diff:
> 
> diff --git a/lib/inode.c b/lib/inode.c
>    index bd10e26..36dce56 100644
>    --- a/lib/inode.c
>    +++ b/lib/inode.c
>    @@ -683,6 +683,13 @@ static int erofs_write_unencoded_data(struct erofs_inode *inode,
> 
>          /* read the tail-end data */
>          if (inode->idata_size) {
>    +             /*
>    +              * If inode->idata is already present, the caller has prepared
>    +              * the tail data and nothing more needs to be done here.
>    +              */
>    +             if (inode->idata)
>    +                     return 0;

Yes, it should be fixed as:
	/*
	 * Read the tail-end data if inode->idata is NULL; if the tail data
	 * has been prepared then nothing more needs to be done here.
	 */
	if (inode->idata_size && !inode->idata) {
	}
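
The guard suggested above can be sketched in isolation as follows. This is
only an illustration: struct tail_state and load_tail_if_missing() are
hypothetical stand-ins for the relevant erofs_inode fields and the tail-read
step, and the data source is a plain memory buffer rather than a file.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the idata fields of struct erofs_inode. */
struct tail_state {
	void *idata;		/* tail buffer, possibly prepared by the caller */
	size_t idata_size;	/* tail length in bytes */
};

/*
 * Read the tail from `src` only if there is a tail and nobody has filled
 * it in yet. A tail prepared by the caller is left untouched.
 */
static int load_tail_if_missing(struct tail_state *t, const void *src)
{
	if (t->idata_size && !t->idata) {
		t->idata = malloc(t->idata_size);
		if (!t->idata)
			return -ENOMEM;
		memcpy(t->idata, src, t->idata_size);
	}
	/* any follow-up work placed here still runs in both cases */
	return 0;
}
```

Compared with an early return inside the block, folding the check into the
condition only skips the read itself, so any code after the block is
presumably still reached when the tail was prepared in advance.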

...

>> +#include "liberofs_cache.h"
> Unnecessary include `liberofs_cache.h`

It would be nice to address that too.


I've applied patches 1 and 2; could you send a v4 to address this?

Thanks,
Gao Xiang



* Re: [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images
  2026-04-15  7:47     ` Gao Xiang
@ 2026-04-15 13:30       ` Lucas Karpinski
  0 siblings, 0 replies; 10+ messages in thread
From: Lucas Karpinski @ 2026-04-15 13:30 UTC (permalink / raw)
  To: Gao Xiang, zhaoyifan (H), linux-erofs; +Cc: jcalmels

On 2026-04-15 3:47 a.m., Gao Xiang wrote:
> 
> 
> On 2026/4/15 11:35, zhaoyifan (H) wrote:
>> This patch incorrectly handles inline inode:
>>
>> Reproduce in erofs-utils directory:
>>      mkfs/mkfs.erofs 1.erofs man/
>>      mkfs/mkfs.erofs 2.erofs docs/
>>      mkfs/mkfs.erofs --clean=data merged.erofs 1.erofs 2.erofs
>>
>> Then PERFORMANCE.md in merged.erofs contains incorrect data after
>> offset 0x2000.
>>
>> Fixed with following diff:
>>
>> diff --git a/lib/inode.c b/lib/inode.c
>>    index bd10e26..36dce56 100644
>>    --- a/lib/inode.c
>>    +++ b/lib/inode.c
>>    @@ -683,6 +683,13 @@ static int erofs_write_unencoded_data(struct
>> erofs_inode *inode,
>>
>>          /* read the tail-end data */
>>          if (inode->idata_size) {
>>    +             /*
>>    +              * If inode->idata is already present, the caller has
>> prepared
>>    +              * the tail data and nothing more needs to be done here.
>>    +              */
>>    +             if (inode->idata)
>>    +                     return 0;
> 
> Yes, it should be fixed as:
>     /*
>      * Read the tail-end data if inode->idata is NULL; if the tail data
>      * has been prepared then nothing more needs to be done here.
>      */
>     if (inode->idata_size && !inode->idata) {
>     }
> 
> ...
> 
>>> +#include "liberofs_cache.h"
>> Unnecessary include `liberofs_cache.h`
> 
> That would be nice to be addressed too.
> 
> 
> I've applied [PATCH 1 and 2], could you send v4 to address this?
> 
> Thanks,
> Gao Xiang

Thanks for the catch, Zhao. ACK, I'll add the changes.



* Re: [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode
  2026-04-15  2:07 ` [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Gao Xiang
@ 2026-04-15 14:09   ` Lucas Karpinski
  0 siblings, 0 replies; 10+ messages in thread
From: Lucas Karpinski @ 2026-04-15 14:09 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs; +Cc: jcalmels

On 2026-04-14 10:07 p.m., Gao Xiang wrote:
> 
> 
> On 2026/4/15 03:10, Lucas Karpinski wrote:
>> Currently, erofs-utils supports backing blobs for multi-image setups. 
>> This
>> implements the FULLDATA import which allows for the merging of multiple
>> source images into a single self-contained erofs image.
>>
>> To optimize the rebuild process, erofs_io_xcopy() is used to leverage the
>> copy_file_range(2) if available. This bypasses userspace buffering and
>> enables kernel side data transfers.
>>
>> Verification: Built same image with default rebuild and rebuild with
>> FULLDATA. Then ran F-i-f/tdiff comparing the two.
>>
>> changes in v3:
>> - adhere to uniaddress semantics.
>> - take advantage of existing infrastructure which allows us to drop a
>>    significant amount of complexity + code.
>>
>> changes in v2:
>> - reworked erofs_rebuild_load_trees_full into
>>    erofs_mkfs_rebuild_load_trees.
>> - removed --merge option (use --clean=data instead).
>> - updated man.
>>
>> Signed-off-by: Lucas Karpinski <lkarpinski@nvidia.com>
> 
> Overall looks good to me, will apply to -experimental branch.
> 
> ... Would you mind taking some time working on some tests
> in experimental-tests branch?
> 
> Thanks,
> Gao Xiang

Yes, I'll work on adding some tests.

Regards,
Lucas



Thread overview: 10+ messages
2026-04-14 19:10 [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Lucas Karpinski
2026-04-14 19:10 ` [PATCH v3 1/4] erofs-utils: lib: remove redundant if check Lucas Karpinski
2026-04-14 19:10 ` [PATCH v3 2/4] erofs-utils: lib: add helper function erofs_uuid_unparse_as_tag Lucas Karpinski
2026-04-14 19:10 ` [PATCH v3 3/4] erofs-utils: mfks: add rebuild FULLDATA for combined EROFS images Lucas Karpinski
2026-04-15  3:35   ` zhaoyifan (H)
2026-04-15  7:47     ` Gao Xiang
2026-04-15 13:30       ` Lucas Karpinski
2026-04-14 19:10 ` [PATCH v3 4/4] erofs-utils: manpages: update to reflect fulldata support Lucas Karpinski
2026-04-15  2:07 ` [PATCH v3 0/4] erofs-utils: implement the FULLDATA rebuild mode Gao Xiang
2026-04-15 14:09   ` Lucas Karpinski
