public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/2] mkfs: add ability to populate filesystem from directory
@ 2025-04-26 13:55 Luca Di Maio
  2025-04-26 13:55 ` [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory Luca Di Maio
  2025-04-26 13:55 ` [PATCH v7 2/2] mkfs: modify -p flag " Luca Di Maio
  0 siblings, 2 replies; 8+ messages in thread
From: Luca Di Maio @ 2025-04-26 13:55 UTC (permalink / raw)
  To: linux-xfs; +Cc: Luca Di Maio, dimitri.ledkov, smoser, djwong, hch

Currently the only way to pre populate an XFS partition is via the
prototype file. While it works it has some limitations like:
  - not allowed spaces in file names
  - not preserving timestamps of original inodes

This series adds a new -P option to mkfs.xfs that allows users to
populate a newly created filesystem directly from an existing directory.
While similar to the prototype functionality, this doesn't require
writing a prototype file.
The implementation preserves file and directory attributes (ownership,
permissions, timestamps) from the source directory when copying content
to the new filesystem.

[v1] -> [v2]
  remove changes to protofile spec
  ensure backward compatibility
[v2] -> [v3]
  use inode_set_[acm]time() as suggested
  avoid copying atime and ctime
  they are often problematic for reproducibility, and
  mtime is the important information to preserve anyway
[v3] -> [v4]
  rewrite functionality to populate directly from an input directory
  this is similar to mkfs.ext4 option.
[v4] -> [v5]
  reorder patch to make it easier to review
  reflow to keep code below 80 chars
  use _() macro in prints
  add SPDX headers to new files
  fix comment styling
  move from typedef to structs
  move direntry handling to own function
[v5] -> [v6]
  rebase on 6.14
[v6] -> [v7]
  move functionality to common -p flag
  add noatime flag to skip atime copy and set to current time
  set ctime/crtime to current time
  preserve hardlinks
  preserve extended attributes for all file/dir types
  add fsxattr to copied files/dirs


Luca Di Maio (2):
  proto: add ability to populate a filesystem from a directory
  mkfs: modify -p flag to populate a filesystem from a directory

 man/man8/mkfs.xfs.8.in |  41 ++-
 mkfs/proto.c           | 643 ++++++++++++++++++++++++++++++++++++++++-
 mkfs/proto.h           |   2 +-
 mkfs/xfs_mkfs.c        |  19 +-
 4 files changed, 682 insertions(+), 23 deletions(-)

--
2.49.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory
  2025-04-26 13:55 [PATCH v7 0/2] mkfs: add ability to populate filesystem from directory Luca Di Maio
@ 2025-04-26 13:55 ` Luca Di Maio
  2025-04-28 17:16   ` Darrick J. Wong
  2025-04-26 13:55 ` [PATCH v7 2/2] mkfs: modify -p flag " Luca Di Maio
  1 sibling, 1 reply; 8+ messages in thread
From: Luca Di Maio @ 2025-04-26 13:55 UTC (permalink / raw)
  To: linux-xfs; +Cc: Luca Di Maio, dimitri.ledkov, smoser, djwong, hch

This patch implements the functionality to populate a newly created XFS
filesystem directly from an existing directory structure.

It resuses existing protofile logic, it branches if input is a
directory.

The population process steps are as follows:
  - create the root inode before populating content
  - recursively process nested directories
  - handle regular files, directories, symlinks, char devices, block
    devices, and fifos
  - preserve attributes (ownership, permissions)
  - preserve mtime timestamps from source files to maintain file history
    - possible to specify noatime=1 to use current time also for atime
    - use current time for ctime/crtime
  - preserve extended attributes and fsxattrs for all file types
  - preserve hardlinks

This functionality makes it easier to create populated filesystems
without having to write protofiles manually.
It's particularly useful for reproducible builds.

Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com>
---
 mkfs/proto.c | 653 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 mkfs/proto.h |   2 +-
 2 files changed, 646 insertions(+), 9 deletions(-)

diff --git a/mkfs/proto.c b/mkfs/proto.c
index 7f56a3d8..23f7998b 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -5,11 +5,17 @@
  */

 #include "libxfs.h"
+#include "xfs_inode.h"
+#include <fcntl.h>
+#include <linux/limits.h>
+#include <stdio.h>
+#include <sys/resource.h>
 #include <sys/stat.h>
 #include <sys/xattr.h>
 #include <linux/xattr.h>
 #include "libfrog/convert.h"
 #include "proto.h"
+#include <dirent.h>

 /*
  * Prototypes for internal functions.
@@ -22,6 +28,11 @@ static int newregfile(char **pp, char **fname);
 static void rtinit(xfs_mount_t *mp);
 static off_t filesize(int fd);
 static int slashes_are_spaces;
+static int noatime;
+static void populate_from_dir(struct xfs_mount *mp, struct xfs_inode *pip,
+				struct fsxattr *fsxp, char *cur_path);
+static void walk_dir(struct xfs_mount *mp, struct xfs_inode *pip,
+				struct fsxattr *fsxp, char *cur_path);

 /*
  * Use this for block reservations needed for mkfs's conditions
@@ -65,6 +76,18 @@ setup_proto(

 	if (!fname)
 		return dflt;
+
+	/*
+	 * handle directory inputs
+	 * in this case we noop and let successive
+	 * parse_proto() to handle the directory
+	 * input.
+	 */
+	if ((fd = open(fname, O_DIRECTORY)) > 0) {
+		close(fd);
+		return fname;
+	}
+
 	if ((fd = open(fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) {
 		fprintf(stderr, _("%s: failed to open %s: %s\n"),
 			progname, fname, strerror(errno));
@@ -380,9 +403,17 @@ writeattr(

 	ret = fgetxattr(fd, attrname, valuebuf, valuelen);
 	if (ret < 0) {
-		if (errno == EOPNOTSUPP)
-			return;
-		fail(_("error collecting xattr value"), errno);
+		/*
+		 * in case of filedescriptors with O_PATH, fgetxattr() will
+		 * fail. let's try to fallback to lgetxattr() using input
+		 * path.
+		 */
+		ret = lgetxattr(fname, attrname, valuebuf, valuelen);
+		if (ret < 0) {
+			if (errno == EOPNOTSUPP)
+				return;
+			fail(_("error collecting xattr value"), errno);
+		}
 	}
 	if (ret == 0)
 		return;
@@ -426,9 +457,17 @@ writeattrs(

 	ret = flistxattr(fd, namebuf, XATTR_LIST_MAX);
 	if (ret < 0) {
-		if (errno == EOPNOTSUPP)
-			goto out_namebuf;
-		fail(_("error collecting xattr names"), errno);
+		/*
+		 * in case of filedescriptors with O_PATH, flistxattr() will
+		 * fail. let's try to fallback to llistxattr() using input
+		 * path.
+		 */
+		ret = llistxattr(fname, namebuf, XATTR_LIST_MAX);
+		if (ret < 0) {
+			if (errno == EOPNOTSUPP)
+				goto out_namebuf;
+			fail(_("error collecting xattr names"), errno);
+		}
 	}

 	p = namebuf;
@@ -934,10 +973,24 @@ parse_proto(
 	xfs_mount_t	*mp,
 	struct fsxattr	*fsx,
 	char		**pp,
-	int		proto_slashes_are_spaces)
+	int proto_slashes_are_spaces,
+	int proto_noatime)
 {
 	slashes_are_spaces = proto_slashes_are_spaces;
-	parseproto(mp, NULL, fsx, pp, NULL);
+	noatime = proto_noatime;
+
+	/*
+	 * in case of a file input, we will use the prototype file logic
+	 * else we will fallback to populate from dir.
+	 */
+	int fd;
+	if ((fd = open(*pp,  O_DIRECTORY)) < 0) {
+		parseproto(mp, NULL, fsx, pp, NULL);
+		return;
+	}
+
+	close(fd);
+	populate_from_dir(mp, NULL, fsx, *pp);
 }

 /* Create a sb-rooted metadata file. */
@@ -1171,3 +1224,587 @@ filesize(
 		return -1;
 	return stb.st_size;
 }
+
+/* Try to allow as many memfds as possible. */
+static void
+bump_max_fds(void)
+{
+	struct rlimit rlim = {};
+	int ret;
+
+	ret = getrlimit(RLIMIT_NOFILE, &rlim);
+	if (!ret) {
+		rlim.rlim_cur = rlim.rlim_max;
+		setrlimit(RLIMIT_NOFILE, &rlim);
+	}
+}
+
+static void
+writefsxattrs(
+		struct fsxattr *fsxp,
+		struct xfs_inode *ip)
+{
+	ip->i_projid = fsxp->fsx_projid;
+	ip->i_extsize = fsxp->fsx_extsize;
+	ip->i_diflags = xfs_flags2diflags(ip, fsxp->fsx_xflags);
+	if (xfs_has_v3inodes(ip->i_mount)) {
+		ip->i_diflags2 = xfs_flags2diflags2(ip, fsxp->fsx_xflags);
+		ip->i_cowextsize = fsxp->fsx_cowextsize;
+	}
+}
+
+static void
+writetimestamps(
+		struct xfs_inode *ip,
+		struct stat statbuf)
+{
+	struct timespec64 ts;
+
+	/*
+	 * Copy timestamps from source file to destination inode.
+	 * Usually reproducible archives will delete or not register
+	 * atime and ctime, for example:
+	 *    https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html
+	 * hence we will only copy mtime, and let ctime/crtime be set to
+	 * current time.
+	 * atime will be copied over if noatime is false.
+	 */
+	ts.tv_sec = statbuf.st_mtim.tv_sec;
+	ts.tv_nsec = statbuf.st_mtim.tv_nsec;
+	inode_set_mtime_to_ts(VFS_I(ip), ts);
+
+	/*
+	 * in case of noatime option, we will not copy the atime
+	 * timestamp from source, but let it be set from gettimeofday()
+	 */
+	if (!noatime) {
+		ts.tv_sec = statbuf.st_atim.tv_sec;
+		ts.tv_nsec = statbuf.st_atim.tv_nsec;
+		inode_set_atime_to_ts(VFS_I(ip), ts);
+	}
+
+	return;
+}
+
+struct hardlink {
+	unsigned long i_ino;
+	struct xfs_inode *existing_ip;
+};
+
+struct hardlinks {
+	int count;
+	size_t size;
+	struct hardlink *entries;
+};
+
+/*
+ * keep track of source inodes that are from hardlinks
+ * so we can retrieve them when needed to setup in
+ * destination.
+ */
+static struct hardlinks *hardlink_tracker = { 0 };
+
+static void
+init_hardlink_tracker(void) {
+	hardlink_tracker = malloc(sizeof(struct hardlinks));
+	if (!hardlink_tracker)
+		fail(_("error allocating hardlinks tracking array"), errno);
+	memset(hardlink_tracker, 0, sizeof(struct hardlinks));
+
+	hardlink_tracker->count = 0;
+	hardlink_tracker->size = PATH_MAX;
+
+	hardlink_tracker->entries = malloc(
+			hardlink_tracker->size * sizeof(struct hardlink));
+	if (!hardlink_tracker->entries)
+		fail(_("error allocating hardlinks tracking array"), errno);
+}
+
+static void
+cleanup_hardlink_tracker(void) {
+	/*
+	 * cleanup all pending inodes, call libxfs_irele() on them
+	 * before freeing memory.
+	 */
+	for (int i = 0; i < hardlink_tracker->count; i++)
+		libxfs_irele(hardlink_tracker->entries[i].existing_ip);
+
+	free(hardlink_tracker->entries);
+	free(hardlink_tracker);
+}
+
+static struct xfs_inode*
+get_hardlink_src_inode(
+		unsigned long i_ino)
+{
+	for (int i = 0; i < hardlink_tracker->count; i++) {
+		if (hardlink_tracker->entries[i].i_ino == i_ino) {
+			return hardlink_tracker->entries[i].existing_ip;
+		}
+	}
+	return NULL;
+}
+
+static void
+track_hardlink_inode(
+		unsigned long i_ino,
+		struct xfs_inode *ip)
+{
+	if (hardlink_tracker->count >= hardlink_tracker->size) {
+		/*
+		 * double for smaller capacity.
+		 * instead grow by 25% steps for larger capacities.
+		 */
+		const size_t old_size = hardlink_tracker->size;
+		size_t new_size = old_size * 2;
+		if (old_size > 1024)
+			new_size = old_size + (old_size / 4);
+
+		struct hardlink *resized_array = realloc(
+				hardlink_tracker->entries,
+				new_size * sizeof(struct hardlink));
+		if (!resized_array) {
+			fail(_("error enlarging hardlinks tracking array"), errno);
+		}
+		memset(&resized_array[old_size],
+				0, (new_size - old_size) * sizeof(struct hardlink));
+
+		hardlink_tracker->entries = resized_array;
+		hardlink_tracker->size = new_size;
+	}
+
+	hardlink_tracker->entries[hardlink_tracker->count].i_ino = i_ino;
+	hardlink_tracker->entries[hardlink_tracker->count].existing_ip = ip;
+	hardlink_tracker->count++;
+}
+
+static int
+handle_hardlink(
+		struct xfs_mount *mp,
+		struct xfs_inode *pip,
+		struct fsxattr *fsxp,
+		int mode,
+		struct cred creds,
+		struct xfs_name xname,
+		int flags,
+		struct stat file_stat,
+		xfs_dev_t rdev,
+		int fd,
+		char *fname,
+		char *path)
+{
+	int error;
+	struct xfs_parent_args *ppargs = NULL;
+	struct xfs_inode *ip;
+	struct xfs_trans *tp;
+	tp = getres(mp, 0);
+	ppargs = newpptr(mp);
+
+	ip = get_hardlink_src_inode(file_stat.st_ino);
+	if (!ip) {
+		/*
+		 * we didn't find the hardlink inode, this means
+		 * it's the first time we see it, report error
+		 * so create_file() can continue handling the inode
+		 * as a regular file type, and later save
+		 * *ip in our buffer for future consumption.
+		 */
+		return 1;
+	}
+	/*
+	* In case the inode was already in our tracker
+	* we need to setup the hardlink and skip file
+	* copy.
+	*/
+	libxfs_trans_ijoin(tp, pip, 0);
+	libxfs_trans_ijoin(tp, ip, 0);
+
+	newdirent(mp, tp, pip, &xname, ip, ppargs);
+
+	/*
+	 * Increment the link count
+	 */
+	libxfs_bumplink(tp, ip);
+
+	/*
+	 * we won't need fd for hardlinks
+	 * so we close and reset it.
+	 */
+	if (fd >= 0)
+		close(fd);
+
+	libxfs_trans_log_inode(tp, ip, flags);
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		fail(_("Error encountered creating file from prototype file"), error);
+
+	libxfs_parent_finish(mp, ppargs);
+
+	return 0;
+}
+
+static void
+create_file(
+		struct xfs_mount *mp,
+		struct xfs_inode *pip,
+		struct fsxattr *fsxp,
+		int mode,
+		struct cred creds,
+		struct xfs_name xname,
+		int flags,
+		struct stat file_stat,
+		xfs_dev_t rdev,
+		int fd,
+		char *fname,
+		char *path)
+{
+
+	int error;
+	struct xfs_parent_args *ppargs = NULL;
+	struct xfs_inode *ip;
+	struct xfs_trans *tp;
+
+	if (file_stat.st_nlink > 1) {
+		error = handle_hardlink(mp, pip, fsxp, mode, creds,
+			    xname, flags, file_stat,
+			    rdev, fd, fname, path);
+		/*
+		 * if no error is reported it means the hardlink has
+		 * been correctly found and set, so we don't need to
+		 * do anything else.
+		 */
+		if (!error)
+			return;
+		/*
+		 * if instead we have an error it means the hardlink
+		 * was not registered, so we proceed to treat it like
+		 * a regular file, and save it to our tracker later.
+		 */
+	}
+
+	tp = getres(mp, 0);
+	ppargs = newpptr(mp);
+
+	error = creatproto(&tp, pip, mode, rdev, &creds, fsxp, &ip);
+	if (error)
+		fail(_("Inode allocation failed"), error);
+	libxfs_trans_ijoin(tp, pip, 0);
+	newdirent(mp, tp, pip, &xname, ip, ppargs);
+
+	/*
+	 * copy over timestamps
+	 */
+	writetimestamps(ip, file_stat);
+	libxfs_trans_log_inode(tp, ip, flags);
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		fail(_("Error encountered creating file from prototype file"), error);
+
+	libxfs_parent_finish(mp, ppargs);
+
+	/*
+	 * copy over file content, attributes,
+	 * extended attributes and timestamps
+	 *
+	 * hardlinks will be skipped as fd will
+	 * be closed before this.
+	 */
+	if (fd >= 0) {
+		writefile(ip, fname, fd);
+		writeattrs(ip, fname, fd);
+		writefsxattrs(fsxp, ip);
+		close(fd);
+	}
+
+	if (file_stat.st_nlink > 1)
+		/*
+		 * if we're here it means this is the first time we're
+		 * encountering an hardlink, so we need to store it and
+		 * skpi libxfs_irele() to keep it around.
+		 */
+		track_hardlink_inode(file_stat.st_ino, ip);
+	else
+		/*
+		 * We release the inode pointer only if we're dealing
+		 * with a regular file, we need to keep the original
+		 * inode pointer for hardlinks, they'll be released
+		 * at the end of the lifecycle when we cleanup the
+		 * hardlink_tracker.
+		 */
+		libxfs_irele(ip);
+}
+
+static void
+handle_direntry(
+		struct xfs_mount *mp,
+		struct xfs_inode *pip,
+		struct fsxattr *fsxp,
+		char *cur_path,
+		struct dirent *entry)
+{
+	char link_target[PATH_MAX];
+	char path[PATH_MAX];
+	int error;
+	int fd = -1;
+	int flags;
+	int majdev;
+	int mindev;
+	int mode;
+	off_t len;
+	struct cred creds;
+	struct stat file_stat;
+	struct xfs_name xname;
+	struct xfs_parent_args *ppargs = NULL;
+	struct xfs_inode *ip;
+	struct xfs_trans *tp;
+
+	/*
+	 * Skip "." and ".." directories to avoid looping
+	 */
+	if (strcmp(entry->d_name, ".") == 0 ||
+	    strcmp(entry->d_name, "..") == 0) {
+		return;
+	}
+
+	/*
+	 * Create the full path to the original file or directory
+	 */
+	snprintf(path, sizeof(path), "%s/%s", cur_path, entry->d_name);
+
+	if (lstat(path, &file_stat) < 0) {
+		fprintf(stderr, _("%s (error accessing)\n"), entry->d_name);
+		exit(1);
+	}
+
+	/*
+	 * symlinks will need to be opened with O_PATH to work, so we handle this
+	 * special case.
+	 */
+	int open_flags = O_NOFOLLOW | O_RDONLY | O_NOATIME;
+	if ((file_stat.st_mode & S_IFMT) == S_IFLNK) {
+		open_flags = O_NOFOLLOW | O_PATH;
+	}
+	if ((fd = open(path, open_flags)) < 0) {
+		fprintf(stderr, _("%s: cannot open %s: %s\n"), progname, path,
+			strerror(errno));
+		exit(1);
+	}
+
+	memset(&creds, 0, sizeof(creds));
+	creds.cr_uid = file_stat.st_uid;
+	creds.cr_gid = file_stat.st_gid;
+	xname.name = (unsigned char *)entry->d_name;
+	xname.len = strlen(entry->d_name);
+	xname.type = 0;
+	mode = file_stat.st_mode;
+	flags = XFS_ILOG_CORE;
+	switch (file_stat.st_mode & S_IFMT) {
+	case S_IFDIR:
+		tp = getres(mp, 0);
+		ppargs = newpptr(mp);
+		error = creatproto(&tp, pip, mode, 0, &creds, fsxp, &ip);
+		if (error)
+			fail(_("Inode allocation failed"), error);
+		libxfs_trans_ijoin(tp, pip, 0);
+		xname.type = XFS_DIR3_FT_DIR;
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
+		libxfs_bumplink(tp, pip);
+		libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
+		newdirectory(mp, tp, ip, pip);
+
+		/*
+		 * copy over timestamps
+		 */
+		writetimestamps(ip, file_stat);
+
+		libxfs_trans_log_inode(tp, ip, flags);
+		error = -libxfs_trans_commit(tp);
+		if (error)
+			fail(_("Directory inode allocation failed."), error);
+
+		libxfs_parent_finish(mp, ppargs);
+		tp = NULL;
+
+		/*
+		 * copy over attributes
+		 */
+		writeattrs(ip, entry->d_name, fd);
+		writefsxattrs(fsxp, ip);
+		close(fd);
+
+		walk_dir(mp, ip, fsxp, path);
+
+		libxfs_irele(ip);
+		break;
+	case S_IFLNK:
+		len = readlink(path, link_target, PATH_MAX - 1);
+		tp = getres(mp, XFS_B_TO_FSB(mp, len));
+		ppargs = newpptr(mp);
+		error = creatproto(&tp, pip, mode, 0, &creds, fsxp, &ip);
+		if (error)
+			fail(_("Inode allocation failed"), error);
+		writesymlink(tp, ip, link_target, len);
+		libxfs_trans_ijoin(tp, pip, 0);
+		xname.type = XFS_DIR3_FT_SYMLINK;
+		newdirent(mp, tp, pip, &xname, ip, ppargs);
+
+		/*
+		 * copy over timestamps
+		 */
+		writetimestamps(ip, file_stat);
+
+		libxfs_trans_log_inode(tp, ip, flags);
+		error = -libxfs_trans_commit(tp);
+		if (error)
+			fail(_("Error encountered creating file from prototype file"),
+			     error);
+		libxfs_parent_finish(mp, ppargs);
+
+		/*
+		 * copy over attributes
+		 *
+		 * being a symlink we opened the filedescriptor with O_PATH
+		 * this will make flistxattr() and fgetxattr() fail, so we
+		 * will need to fallback to llistxattr() and lgetxattr(), this
+		 * will need the full path to the original file, not just the
+		 * entry name.
+		 */
+		writeattrs(ip, path, fd);
+		writefsxattrs(fsxp, ip);
+		close(fd);
+
+		libxfs_irele(ip);
+		break;
+	case S_IFREG:
+		xname.type = XFS_DIR3_FT_REG_FILE;
+		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
+			    0, fd, entry->d_name, path);
+		break;
+	case S_IFCHR:
+		xname.type = XFS_DIR3_FT_CHRDEV;
+		majdev = major(file_stat.st_rdev);
+		mindev = minor(file_stat.st_rdev);
+		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
+			    IRIX_MKDEV(majdev, mindev), fd, entry->d_name,
+			    path);
+		break;
+	case S_IFBLK:
+		xname.type = XFS_DIR3_FT_BLKDEV;
+		majdev = major(file_stat.st_rdev);
+		mindev = minor(file_stat.st_rdev);
+		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
+			    IRIX_MKDEV(majdev, mindev), fd, entry->d_name,
+			    path);
+		break;
+	case S_IFIFO:
+		flags |= XFS_ILOG_DEV;
+		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
+			    0, fd, entry->d_name, path);
+		break;
+	default:
+		break;
+	}
+}
+
+/*
+ * walk_dir will recursively list files and directories
+ * and populate the mountpoint *mp with them using handle_direntry().
+ */
+static void
+walk_dir(
+		struct xfs_mount *mp,
+		struct xfs_inode *pip,
+		struct fsxattr *fsxp,
+		char *cur_path)
+{
+	DIR *dir;
+	struct dirent *entry;
+
+	/*
+	 * open input directory and iterate over all entries in it.
+	 * when another directory is found, we will recursively call
+	 * walk_dir.
+	 */
+	if ((dir = opendir(cur_path)) == NULL)
+		fail(_("cannot open input dir"), 1);
+	while ((entry = readdir(dir)) != NULL) {
+		handle_direntry(mp, pip, fsxp, cur_path, entry);
+	}
+	closedir(dir);
+}
+
+static void
+populate_from_dir(
+		struct xfs_mount *mp,
+		struct xfs_inode *pip,
+		struct fsxattr *fsxp,
+		char *cur_path)
+{
+	int error;
+	int mode;
+	struct cred creds;
+	struct xfs_inode *ip;
+	struct xfs_trans *tp;
+
+	/*
+	 * we first ensure we have the root inode
+	 */
+	memset(&creds, 0, sizeof(creds));
+	creds.cr_uid = 0;
+	creds.cr_gid = 0;
+	mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
+	tp = getres(mp, 0);
+	error = creatproto(&tp, pip, mode | S_IFDIR, 0, &creds, fsxp, &ip);
+	if (error)
+		fail(_("Inode allocation failed"), error);
+	pip = ip;
+	mp->m_sb.sb_rootino = ip->i_ino;
+	libxfs_log_sb(tp);
+	newdirectory(mp, tp, ip, pip);
+	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		fail(_("Inode allocation failed"), error);
+
+	libxfs_parent_finish(mp, NULL);
+
+	/*
+	 * RT initialization.  Do this here to ensure that
+	 * the RT inodes get placed after the root inode.
+	 */
+	error = create_metadir(mp);
+	if (error)
+		fail(_("Creation of the metadata directory inode failed"), error);
+
+	rtinit(mp);
+
+	/*
+	 * by nature of walk_dir() we could be opening
+	 * a great number of fds for deeply nested directory
+	 * trees.
+	 * try to bump max fds limit.
+	 */
+	bump_max_fds();
+
+	/*
+	 * initialize the hardlinks tracker
+	 */
+	init_hardlink_tracker();
+	/*
+	 * now that we have a root inode, let's
+	 * walk the input dir and populate the partition
+	 */
+	walk_dir(mp, ip, fsxp, cur_path);
+
+	/*
+	 * cleanup hardlinks tracker
+	 */
+	cleanup_hardlink_tracker();
+
+	/*
+	 * we free up our root inode
+	 * only when we finished populating the
+	 * root filesystem
+	 */
+	libxfs_irele(ip);
+}
diff --git a/mkfs/proto.h b/mkfs/proto.h
index be1ceb45..fea416f6 100644
--- a/mkfs/proto.h
+++ b/mkfs/proto.h
@@ -8,7 +8,7 @@

 char *setup_proto(char *fname);
 void parse_proto(struct xfs_mount *mp, struct fsxattr *fsx, char **pp,
-		int proto_slashes_are_spaces);
+		int proto_slashes_are_spaces, int proto_noatime);
 void res_failed(int err);

 #endif /* MKFS_PROTO_H_ */
--
2.49.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v7 2/2] mkfs: modify -p flag to populate a filesystem from a directory
  2025-04-26 13:55 [PATCH v7 0/2] mkfs: add ability to populate filesystem from directory Luca Di Maio
  2025-04-26 13:55 ` [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory Luca Di Maio
@ 2025-04-26 13:55 ` Luca Di Maio
  2025-04-28 17:23   ` Darrick J. Wong
  1 sibling, 1 reply; 8+ messages in thread
From: Luca Di Maio @ 2025-04-26 13:55 UTC (permalink / raw)
  To: linux-xfs; +Cc: Luca Di Maio, dimitri.ledkov, smoser, djwong, hch

right now the `-p` flag only supports a file input.
this patch will add support to input a directory.
on directory input, the populate functionality to copy files into
the root filesystem.

add `noatime` flag to popts, that will let the user choose if copy the
atime timestamps from source directory.

add documentation for new functionalities in man pages.

Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com>
---
 man/man8/mkfs.xfs.8.in | 41 +++++++++++++++++++++++++++++------------
 mkfs/xfs_mkfs.c        | 19 +++++++++++++++++--
 2 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 37e3a88e..f06a3c9c 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -28,7 +28,7 @@ mkfs.xfs \- construct an XFS filesystem
 .I naming_options
 ] [
 .B \-p
-.I protofile_options
+.I prototype_options
 ] [
 .B \-q
 ] [
@@ -977,30 +977,39 @@ option set.
 .PP
 .PD 0
 .TP
-.BI \-p " protofile_options"
+.BI \-p " prototype_options"
 .TP
 .BI "Section Name: " [proto]
 .PD
-These options specify the protofile parameters for populating the filesystem.
+These options specify the prototype parameters for populating the filesystem.
 The valid
-.I protofile_options
+.I prototype_options
 are:
 .RS 1.2i
 .TP
-.BI [file=] protofile
+.BI [file=]
 The
 .B file=
 prefix is not required for this CLI argument for legacy reasons.
 If specified as a config file directive, the prefix is required.
-
+.TP
+.BI [file=] directory
 If the optional
 .PD
-.I protofile
-argument is given,
+.I prototype
+argument is given, and it's a directory,
 .B mkfs.xfs
-uses
-.I protofile
-as a prototype file and takes its directions from that file.
+will populate the root file system with the contents of the given directory.
+Content, timestamps, attributes and extended attributes are preserved
+for all file types.
+.TP
+.BI [file=] protofile
+If the optional
+.PD
+.I prototype
+argument is given, and it's a file,
+.B mkfs.xfs
+uses it as a prototype file and takes its directions from that file.
 The blocks and inodes specifiers in the
 .I protofile
 are provided for backwards compatibility, but are otherwise unused.
@@ -1136,8 +1145,16 @@ always terminated with the dollar (
 .B $
 ) token.
 .TP
+.BI noatime= value
+If set to 1, when we're populating the root filesystem from a directory (
+.B file=directory
+option)
+access times are NOT preserved and are set to the current time instead.
+Set to 0 to preserve access times from source files.
+By default, this is set to 1.
+.TP
 .BI slashes_are_spaces= value
-If set to 1, slashes ("/") in the first token of each line of the protofile
+If set to 1, slashes ("/") in the first token of each line of the prototype file
 are converted to spaces.
 This enables the creation of a filesystem containing filenames with spaces.
 By default, this is set to 0.
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 3f4455d4..1715e3fb 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -121,6 +121,7 @@ enum {

 enum {
 	P_FILE = 0,
+	P_NOATIME,
 	P_SLASHES,
 	P_MAX_OPTS,
 };
@@ -709,6 +710,7 @@ static struct opt_params popts = {
 	.ini_section = "proto",
 	.subopts = {
 		[P_FILE] = "file",
+		[P_NOATIME] = "noatime",
 		[P_SLASHES] = "slashes_are_spaces",
 		[P_MAX_OPTS] = NULL,
 	},
@@ -717,6 +719,12 @@ static struct opt_params popts = {
 		  .conflicts = { { NULL, LAST_CONFLICT } },
 		  .defaultval = SUBOPT_NEEDS_VAL,
 		},
+		{ .index = P_NOATIME,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
 		{ .index = P_SLASHES,
 		  .conflicts = { { NULL, LAST_CONFLICT } },
 		  .minval = 0,
@@ -1022,7 +1030,6 @@ struct cli_params {

 	char	*cfgfile;
 	char	*protofile;
-
 	enum fsprop_autofsck autofsck;

 	/* parameters that depend on sector/block size being validated. */
@@ -1045,6 +1052,7 @@ struct cli_params {
 	int	lsunit;
 	int	is_supported;
 	int	proto_slashes_are_spaces;
+	int	proto_noatime;
 	int	data_concurrency;
 	int	log_concurrency;
 	int	rtvol_concurrency;
@@ -1170,6 +1178,7 @@ usage( void )
 /* naming */		[-n size=num,version=2|ci,ftype=0|1,parent=0|1]]\n\
 /* no-op info only */	[-N]\n\
 /* prototype file */	[-p fname]\n\
+/* populate from directory */	[-p dirname]\n\
 /* quiet */		[-q]\n\
 /* realtime subvol */	[-r extsize=num,size=num,rtdev=xxx,rgcount=n,rgsize=n,\n\
 			    concurrency=num]\n\
@@ -2067,6 +2076,9 @@ proto_opts_parser(
 	case P_SLASHES:
 		cli->proto_slashes_are_spaces = getnum(value, opts, subopt);
 		break;
+	case P_NOATIME:
+		cli->proto_noatime = getnum(value, opts, subopt);
+		break;
 	case P_FILE:
 		fallthrough;
 	default:
@@ -5181,6 +5193,7 @@ main(
 		.log_concurrency = -1, /* auto detect non-mechanical ddev */
 		.rtvol_concurrency = -1, /* auto detect non-mechanical rtdev */
 		.autofsck = FSPROP_AUTOFSCK_UNSET,
+		.proto_noatime = 1,
 	};
 	struct mkfs_params	cfg = {};

@@ -5480,7 +5493,9 @@ main(
 	/*
 	 * Allocate the root inode and anything else in the proto file.
 	 */
-	parse_proto(mp, &cli.fsx, &protostring, cli.proto_slashes_are_spaces);
+	parse_proto(mp, &cli.fsx, &protostring,
+			cli.proto_slashes_are_spaces,
+			cli.proto_noatime);

 	/*
 	 * Protect ourselves against possible stupidity
--
2.49.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory
  2025-04-26 13:55 ` [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory Luca Di Maio
@ 2025-04-28 17:16   ` Darrick J. Wong
  2025-04-29 17:33     ` Luca Di Maio
  0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2025-04-28 17:16 UTC (permalink / raw)
  To: Luca Di Maio; +Cc: linux-xfs, dimitri.ledkov, smoser, hch

On Sat, Apr 26, 2025 at 03:55:34PM +0200, Luca Di Maio wrote:
> This patch implements the functionality to populate a newly created XFS
> filesystem directly from an existing directory structure.
> 
> It resuses existing protofile logic, it branches if input is a
> directory.
> 
> The population process steps are as follows:
>   - create the root inode before populating content
>   - recursively process nested directories
>   - handle regular files, directories, symlinks, char devices, block
>     devices, and fifos
>   - preserve attributes (ownership, permissions)
>   - preserve mtime timestamps from source files to maintain file history
>     - possible to specify noatime=1 to use current time also for atime
>     - use current time for ctime/crtime
>   - preserve extended attributes and fsxattrs for all file types
>   - preserve hardlinks
> 
> This functionality makes it easier to create populated filesystems
> without having to write protofiles manually.
> It's particularly useful for reproducible builds.
> 
> Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com>
> ---
>  mkfs/proto.c | 653 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  mkfs/proto.h |   2 +-
>  2 files changed, 646 insertions(+), 9 deletions(-)
> 
> diff --git a/mkfs/proto.c b/mkfs/proto.c
> index 7f56a3d8..23f7998b 100644
> --- a/mkfs/proto.c
> +++ b/mkfs/proto.c
> @@ -5,11 +5,17 @@
>   */
> 
>  #include "libxfs.h"
> +#include "xfs_inode.h"
> +#include <fcntl.h>
> +#include <linux/limits.h>
> +#include <stdio.h>
> +#include <sys/resource.h>
>  #include <sys/stat.h>
>  #include <sys/xattr.h>
>  #include <linux/xattr.h>
>  #include "libfrog/convert.h"
>  #include "proto.h"
> +#include <dirent.h>
> 
>  /*
>   * Prototypes for internal functions.
> @@ -22,6 +28,11 @@ static int newregfile(char **pp, char **fname);
>  static void rtinit(xfs_mount_t *mp);
>  static off_t filesize(int fd);
>  static int slashes_are_spaces;
> +static int noatime;
> +static void populate_from_dir(struct xfs_mount *mp, struct xfs_inode *pip,
> +				struct fsxattr *fsxp, char *cur_path);
> +static void walk_dir(struct xfs_mount *mp, struct xfs_inode *pip,
> +				struct fsxattr *fsxp, char *cur_path);
> 
>  /*
>   * Use this for block reservations needed for mkfs's conditions
> @@ -65,6 +76,18 @@ setup_proto(
> 
>  	if (!fname)
>  		return dflt;
> +
> +	/*
> +	 * handle directory inputs
> +	 * in this case we noop and let successive
> +	 * parse_proto() to handle the directory
> +	 * input.
> +	 */
> +	if ((fd = open(fname, O_DIRECTORY)) > 0) {

0 is a valid (if unlikely) fd return value for open.  But the bigger
problem is...

> +		close(fd);
> +		return fname;

...that now you've reduced the coherence of this function by making it
return either a buffer containing the contents of a protofile minus the
first two lines; a default protofile excerpt if there was no filename,
or ... the path argument, if the path happened to point to a directory.

How does the caller (or parse_proto) figure out what they're supposed to
do with the protostring?  (More on this below.)

> +	}
> +
>  	if ((fd = open(fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) {
>  		fprintf(stderr, _("%s: failed to open %s: %s\n"),
>  			progname, fname, strerror(errno));
> @@ -380,9 +403,17 @@ writeattr(
> 
>  	ret = fgetxattr(fd, attrname, valuebuf, valuelen);
>  	if (ret < 0) {
> -		if (errno == EOPNOTSUPP)
> -			return;
> -		fail(_("error collecting xattr value"), errno);
> +		/*
> +		 * in case of filedescriptors with O_PATH, fgetxattr() will
> +		 * fail. let's try to fallback to lgetxattr() using input
> +		 * path.

Fail how?  Shouldn't we gate this retry on the specific error code
returned by fgetxattr on an O_PATH fd?

> +		 */
> +		ret = lgetxattr(fname, attrname, valuebuf, valuelen);
> +		if (ret < 0) {
> +			if (errno == EOPNOTSUPP)
> +				return;
> +			fail(_("error collecting xattr value"), errno);
> +		}
>  	}
>  	if (ret == 0)
>  		return;
> @@ -426,9 +457,17 @@ writeattrs(
> 
>  	ret = flistxattr(fd, namebuf, XATTR_LIST_MAX);
>  	if (ret < 0) {
> -		if (errno == EOPNOTSUPP)
> -			goto out_namebuf;
> -		fail(_("error collecting xattr names"), errno);
> +		/*
> +		 * in case of filedescriptors with O_PATH, flistxattr() will
> +		 * fail. let's try to fallback to llistxattr() using input
> +		 * path.
> +		 */
> +		ret = llistxattr(fname, namebuf, XATTR_LIST_MAX);

Same here.

> +		if (ret < 0) {
> +			if (errno == EOPNOTSUPP)
> +				goto out_namebuf;
> +			fail(_("error collecting xattr names"), errno);
> +		}
>  	}
> 
>  	p = namebuf;
> @@ -934,10 +973,24 @@ parse_proto(
>  	xfs_mount_t	*mp,
>  	struct fsxattr	*fsx,
>  	char		**pp,
> -	int		proto_slashes_are_spaces)
> +	int proto_slashes_are_spaces,
> +	int proto_noatime)
>  {
>  	slashes_are_spaces = proto_slashes_are_spaces;
> -	parseproto(mp, NULL, fsx, pp, NULL);
> +	noatime = proto_noatime;
> +
> +	/*
> +	 * in case of a file input, we will use the prototype file logic
> +	 * else we will fallback to populate from dir.
> +	 */
> +	int fd;
> +	if ((fd = open(*pp,  O_DIRECTORY)) < 0) {

Urrrk, here we just open the protostring returned by setup_proto and
passed in here as @pp?  Recall that if -p file= pointed to a regular
file, then protostring contains everything except the first two lines.

So if you pass in a corrupt protofile:

/stand/diskboot
4872 110
/etc/

this line will see /etc, the directory open succeeds, and now we start
importing /etc.  That isn't the documented behavior of a protofile.

Also, you open the passed-in path twice.  This is a minor point for
mkfs, but what if the path changes from a directory to a regular file in
between the two open(..., O_DIRECTORY) calls?

> +		parseproto(mp, NULL, fsx, pp, NULL);
> +		return;
> +	}
> +
> +	close(fd);
> +	populate_from_dir(mp, NULL, fsx, *pp);

I'm not sure why you don't pass in the opened directory here.  I'm also
not sure why the second parameter exists here; it's always NULL.

>  }
> 
>  /* Create a sb-rooted metadata file. */
> @@ -1171,3 +1224,587 @@ filesize(
>  		return -1;
>  	return stb.st_size;
>  }
> +
> +/* Try to allow as many memfds as possible. */
> +static void
> +bump_max_fds(void)
> +{
> +	struct rlimit rlim = {};
> +	int ret;
> +
> +	ret = getrlimit(RLIMIT_NOFILE, &rlim);
> +	if (!ret) {
> +		rlim.rlim_cur = rlim.rlim_max;
> +		setrlimit(RLIMIT_NOFILE, &rlim);
> +	}
> +}
> +
> +static void
> +writefsxattrs(
> +		struct fsxattr *fsxp,
> +		struct xfs_inode *ip)

Strange indenting here, the usual parameter declaration convention in
xfs is:

	struct xfs_inode	*ip

^ one tab               ^ another tab

also we usually put the inode first because this function modifies the
inode...

> +{
> +	ip->i_projid = fsxp->fsx_projid;
> +	ip->i_extsize = fsxp->fsx_extsize;
> +	ip->i_diflags = xfs_flags2diflags(ip, fsxp->fsx_xflags);
> +	if (xfs_has_v3inodes(ip->i_mount)) {
> +		ip->i_diflags2 = xfs_flags2diflags2(ip, fsxp->fsx_xflags);
> +		ip->i_cowextsize = fsxp->fsx_cowextsize;
> +	}
> +}
> +
> +static void
> +writetimestamps(
> +		struct xfs_inode *ip,
> +		struct stat statbuf)

...like you do here.

> +{
> +	struct timespec64 ts;

                         ^ tab indent here
> +
> +	/*
> +	 * Copy timestamps from source file to destination inode.
> +	 * Usually reproducible archives will delete or not register
> +	 * atime and ctime, for example:
> +	 *    https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html
> +	 * hence we will only copy mtime, and let ctime/crtime be set to
> +	 * current time.
> +	 * atime will be copied over if noatime is false.
> +	 */
> +	ts.tv_sec = statbuf.st_mtim.tv_sec;
> +	ts.tv_nsec = statbuf.st_mtim.tv_nsec;
> +	inode_set_mtime_to_ts(VFS_I(ip), ts);
> +
> +	/*
> +	 * in case of noatime option, we will not copy the atime
> +	 * timestamp from source, but let it be set from gettimeofday()
> +	 */
> +	if (!noatime) {
> +		ts.tv_sec = statbuf.st_atim.tv_sec;
> +		ts.tv_nsec = statbuf.st_atim.tv_nsec;
> +		inode_set_atime_to_ts(VFS_I(ip), ts);
> +	}
> +
> +	return;

No need to explicitly return from the bottom of a void-return function.

> +}
> +
> +struct hardlink {
> +	unsigned long i_ino;

ino_t ?  Since that's what stat returns.

also struct field declarations need a tab between the type and the field
name.

> +	struct xfs_inode *existing_ip;
> +};
> +
> +struct hardlinks {
> +	int count;

This can overflow to negative if there are more than 2^32 hardlinked
files in the source filesystem.  My guess is that this will be rare, but
you ought to program against that anyway...

> +	size_t size;

...since I'm guessing the upper limit on this data structure is whatever
can be held in size_t.

> +	struct hardlink *entries;
> +};
> +
> +/*
> + * keep track of source inodes that are from hardlinks
> + * so we can retrieve them when needed to setup in
> + * destination.
> + */
> +static struct hardlinks *hardlink_tracker = { 0 };
> +
> +static void
> +init_hardlink_tracker(void) {
> +	hardlink_tracker = malloc(sizeof(struct hardlinks));
> +	if (!hardlink_tracker)
> +		fail(_("error allocating hardlinks tracking array"), errno);
> +	memset(hardlink_tracker, 0, sizeof(struct hardlinks));
> +
> +	hardlink_tracker->count = 0;

You just memset the object to zero, this isn't necessary.

(use calloc to elide the memset?)

> +	hardlink_tracker->size = PATH_MAX;

Wait, why is the size being set to PATH_MAX?  Are we tracking path
strings somehow?

> +
> +	hardlink_tracker->entries = malloc(
> +			hardlink_tracker->size * sizeof(struct hardlink));
> +	if (!hardlink_tracker->entries)
> +		fail(_("error allocating hardlinks tracking array"), errno);
> +}
> +
> +static void
> +cleanup_hardlink_tracker(void) {
> +	/*
> +	 * cleanup all pending inodes, call libxfs_irele() on them
> +	 * before freeing memory.
> +	 */
> +	for (int i = 0; i < hardlink_tracker->count; i++)
> +		libxfs_irele(hardlink_tracker->entries[i].existing_ip);
> +
> +	free(hardlink_tracker->entries);
> +	free(hardlink_tracker);

Probably want to set hardlink_tracker back to NULL to prevent UAF.

> +}
> +
> +static struct xfs_inode*
> +get_hardlink_src_inode(
> +		unsigned long i_ino)
> +{
> +	for (int i = 0; i < hardlink_tracker->count; i++) {

Urk, linear search.  Oh well, I guess we can switch to a hashtable if
we get complaints about issues.

> +		if (hardlink_tracker->entries[i].i_ino == i_ino) {
> +			return hardlink_tracker->entries[i].existing_ip;

/me notes that this pins the hardlinked xfs_inode objects in memory.
That's a risky thing for an xfsprogs utility to do because there's no
inode cache like there is in the kernel.  In other words, xfs_iget
creates a brand new xfs_inode object even for the same inumber.

I /think/ that's not technically an issue in mkfs because it's
single-threaded and never goes back to an existing inode.  But you might
consider changing struct hardlink to:

/*
 * Map an inumber in the source filesystem to an inumber in the new
 * filesystem
 */
struct hardlink {
	ino_t		src_ino;
	xfs_ino_t	dst_ino;
};

since xfs_inodes are fairly large objects.

> +		}
> +	}
> +	return NULL;
> +}
> +
> +static void
> +track_hardlink_inode(
> +		unsigned long i_ino,
> +		struct xfs_inode *ip)
> +{
> +	if (hardlink_tracker->count >= hardlink_tracker->size) {
> +		/*
> +		 * double for smaller capacity.
> +		 * instead grow by 25% steps for larger capacities.
> +		 */
> +		const size_t old_size = hardlink_tracker->size;
> +		size_t new_size = old_size * 2;
> +		if (old_size > 1024)
> +			new_size = old_size + (old_size / 4);
> +
> +		struct hardlink *resized_array = realloc(
> +				hardlink_tracker->entries,
> +				new_size * sizeof(struct hardlink));

At this point, is it safe to use reallocarray?  Or will that cause
problems with musl?

> +		if (!resized_array) {
> +			fail(_("error enlarging hardlinks tracking array"), errno);
> +		}
> +		memset(&resized_array[old_size],
> +				0, (new_size - old_size) * sizeof(struct hardlink));
> +
> +		hardlink_tracker->entries = resized_array;
> +		hardlink_tracker->size = new_size;
> +	}
> +
> +	hardlink_tracker->entries[hardlink_tracker->count].i_ino = i_ino;
> +	hardlink_tracker->entries[hardlink_tracker->count].existing_ip = ip;
> +	hardlink_tracker->count++;
> +}
> +
> +static int
> +handle_hardlink(
> +		struct xfs_mount *mp,
> +		struct xfs_inode *pip,
> +		struct fsxattr *fsxp,
> +		int mode,
> +		struct cred creds,
> +		struct xfs_name xname,
> +		int flags,
> +		struct stat file_stat,
> +		xfs_dev_t rdev,
> +		int fd,
> +		char *fname,
> +		char *path)

How many of these parameters are actually needed to create a hardlink?

> +{
> +	int error;
> +	struct xfs_parent_args *ppargs = NULL;
> +	struct xfs_inode *ip;
> +	struct xfs_trans *tp;

Same odd indenting style.

> +	tp = getres(mp, 0);
> +	ppargs = newpptr(mp);

	struct xfs_trans	*tp = getres(mp, 0);
	struct xfs_parent_args	*ppargs = newpptr(mp);

> +
> +	ip = get_hardlink_src_inode(file_stat.st_ino);
> +	if (!ip) {
> +		/*
> +		 * we didn't find the hardlink inode, this means
> +		 * it's the first time we see it, report error
> +		 * so create_file() can continue handling the inode
> +		 * as a regular file type, and later save
> +		 * *ip in our buffer for future consumption.

Documentation of the return value should be in a comment attached to the
top of the function.

> +		 */
> +		return 1;
> +	}
> +	/*
> +	* In case the inode was already in our tracker
> +	* we need to setup the hardlink and skip file
> +	* copy.
> +	*/
> +	libxfs_trans_ijoin(tp, pip, 0);
> +	libxfs_trans_ijoin(tp, ip, 0);
> +
> +	newdirent(mp, tp, pip, &xname, ip, ppargs);
> +
> +	/*
> +	 * Increment the link count
> +	 */
> +	libxfs_bumplink(tp, ip);
> +
> +	/*
> +	 * we won't need fd for hardlinks
> +	 * so we close and reset it.
> +	 */
> +	if (fd >= 0)
> +		close(fd);
> +
> +	libxfs_trans_log_inode(tp, ip, flags);
> +	error = -libxfs_trans_commit(tp);
> +	if (error)
> +		fail(_("Error encountered creating file from prototype file"), error);
> +
> +	libxfs_parent_finish(mp, ppargs);
> +
> +	return 0;
> +}
> +
> +static void
> +create_file(
> +		struct xfs_mount *mp,
> +		struct xfs_inode *pip,
> +		struct fsxattr *fsxp,
> +		int mode,
> +		struct cred creds,
> +		struct xfs_name xname,
> +		int flags,
> +		struct stat file_stat,
> +		xfs_dev_t rdev,
> +		int fd,
> +		char *fname,
> +		char *path)
> +{
> +
> +	int error;
> +	struct xfs_parent_args *ppargs = NULL;
> +	struct xfs_inode *ip;
> +	struct xfs_trans *tp;
> +
> +	if (file_stat.st_nlink > 1) {
> +		error = handle_hardlink(mp, pip, fsxp, mode, creds,
> +			    xname, flags, file_stat,
> +			    rdev, fd, fname, path);
> +		/*
> +		 * if no error is reported it means the hardlink has
> +		 * been correctly found and set, so we don't need to
> +		 * do anything else.
> +		 */
> +		if (!error)
> +			return;

You know, if you inverted the polarity of handle_hardlink's return
value, you could do:

	if (file_stat.st_nlink > 1 && handle_hardlink(...)) {
		close(fd);
		return;
	}

> +		/*
> +		 * if instead we have an error it means the hardlink
> +		 * was not registered, so we proceed to treat it like
> +		 * a regular file, and save it to our tracker later.
> +		 */
> +	}
> +
> +	tp = getres(mp, 0);
> +	ppargs = newpptr(mp);
> +
> +	error = creatproto(&tp, pip, mode, rdev, &creds, fsxp, &ip);
> +	if (error)
> +		fail(_("Inode allocation failed"), error);
> +	libxfs_trans_ijoin(tp, pip, 0);
> +	newdirent(mp, tp, pip, &xname, ip, ppargs);
> +
> +	/*
> +	 * copy over timestamps
> +	 */
> +	writetimestamps(ip, file_stat);
> +	libxfs_trans_log_inode(tp, ip, flags);
> +	error = -libxfs_trans_commit(tp);
> +	if (error)
> +		fail(_("Error encountered creating file from prototype file"), error);
> +
> +	libxfs_parent_finish(mp, ppargs);
> +
> +	/*
> +	 * copy over file content, attributes,
> +	 * extended attributes and timestamps
> +	 *
> +	 * hardlinks will be skipped as fd will
> +	 * be closed before this.

Aren't hardlinks skipped because this function returns if
handle_hardlink() returns 0?

> +	 */
> +	if (fd >= 0) {

Do callers actually pass us negative fd numbers?

> +		writefile(ip, fname, fd);
> +		writeattrs(ip, fname, fd);
> +		writefsxattrs(fsxp, ip);
> +		close(fd);
> +	}
> +
> +	if (file_stat.st_nlink > 1)
> +		/*
> +		 * if we're here it means this is the first time we're
> +		 * encountering an hardlink, so we need to store it and
> +		 * skpi libxfs_irele() to keep it around.
> +		 */
> +		track_hardlink_inode(file_stat.st_ino, ip);
> +	else
> +		/*
> +		 * We release the inode pointer only if we're dealing
> +		 * with a regular file, we need to keep the original
> +		 * inode pointer for hardlinks, they'll be released
> +		 * at the end of the lifecycle when we cleanup the
> +		 * hardlink_tracker.
> +		 */
> +		libxfs_irele(ip);
> +}
> +
> +static void
> +handle_direntry(
> +		struct xfs_mount *mp,
> +		struct xfs_inode *pip,
> +		struct fsxattr *fsxp,
> +		char *cur_path,
> +		struct dirent *entry)
> +{
> +	char link_target[PATH_MAX];
> +	char path[PATH_MAX];
> +	int error;
> +	int fd = -1;
> +	int flags;
> +	int majdev;
> +	int mindev;
> +	int mode;
> +	off_t len;
> +	struct cred creds;
> +	struct stat file_stat;
> +	struct xfs_name xname;
> +	struct xfs_parent_args *ppargs = NULL;
> +	struct xfs_inode *ip;
> +	struct xfs_trans *tp;
> +
> +	/*
> +	 * Skip "." and ".." directories to avoid looping
> +	 */
> +	if (strcmp(entry->d_name, ".") == 0 ||
> +	    strcmp(entry->d_name, "..") == 0) {
> +		return;
> +	}
> +
> +	/*
> +	 * Create the full path to the original file or directory
> +	 */
> +	snprintf(path, sizeof(path), "%s/%s", cur_path, entry->d_name);

Urrk, check the return value here!

/me wonders if you should declare /one/ char path[PATH_MAX] at the top
of the call chain and modify that as we walk down the directory tree,
rather than declaring a new (4k) stack variable every time we walk down
another level.

> +
> +	if (lstat(path, &file_stat) < 0) {
> +		fprintf(stderr, _("%s (error accessing)\n"), entry->d_name);
> +		exit(1);
> +	}
> +
> +	/*
> +	 * symlinks will need to be opened with O_PATH to work, so we handle this
> +	 * special case.
> +	 */
> +	int open_flags = O_NOFOLLOW | O_RDONLY | O_NOATIME;
> +	if ((file_stat.st_mode & S_IFMT) == S_IFLNK) {

S_ISLNK() ?

> +		open_flags = O_NOFOLLOW | O_PATH;
> +	}
> +	if ((fd = open(path, open_flags)) < 0) {
> +		fprintf(stderr, _("%s: cannot open %s: %s\n"), progname, path,
> +			strerror(errno));
> +		exit(1);
> +	}

Not sure you want to open() on a fifo, those will block until someone
opens the other end.

> +
> +	memset(&creds, 0, sizeof(creds));
> +	creds.cr_uid = file_stat.st_uid;
> +	creds.cr_gid = file_stat.st_gid;
> +	xname.name = (unsigned char *)entry->d_name;
> +	xname.len = strlen(entry->d_name);
> +	xname.type = 0;
> +	mode = file_stat.st_mode;

Most of these could be moved to the variable declarations:

	struct cred creds = {
		.cr_uid	= file-stat.st_uid,
		.cr_gid	= file_stat.st_gid,
	};

and now future programmers don't have to be careful about uninitialized
variables.

> +	flags = XFS_ILOG_CORE;
> +	switch (file_stat.st_mode & S_IFMT) {
> +	case S_IFDIR:
> +		tp = getres(mp, 0);
> +		ppargs = newpptr(mp);
> +		error = creatproto(&tp, pip, mode, 0, &creds, fsxp, &ip);
> +		if (error)
> +			fail(_("Inode allocation failed"), error);
> +		libxfs_trans_ijoin(tp, pip, 0);
> +		xname.type = XFS_DIR3_FT_DIR;
> +		newdirent(mp, tp, pip, &xname, ip, ppargs);
> +		libxfs_bumplink(tp, pip);
> +		libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
> +		newdirectory(mp, tp, ip, pip);
> +
> +		/*
> +		 * copy over timestamps
> +		 */
> +		writetimestamps(ip, file_stat);
> +
> +		libxfs_trans_log_inode(tp, ip, flags);
> +		error = -libxfs_trans_commit(tp);
> +		if (error)
> +			fail(_("Directory inode allocation failed."), error);
> +
> +		libxfs_parent_finish(mp, ppargs);
> +		tp = NULL;
> +
> +		/*
> +		 * copy over attributes
> +		 */
> +		writeattrs(ip, entry->d_name, fd);
> +		writefsxattrs(fsxp, ip);
> +		close(fd);
> +
> +		walk_dir(mp, ip, fsxp, path);
> +
> +		libxfs_irele(ip);
> +		break;
> +	case S_IFLNK:
> +		len = readlink(path, link_target, PATH_MAX - 1);
> +		tp = getres(mp, XFS_B_TO_FSB(mp, len));
> +		ppargs = newpptr(mp);
> +		error = creatproto(&tp, pip, mode, 0, &creds, fsxp, &ip);
> +		if (error)
> +			fail(_("Inode allocation failed"), error);
> +		writesymlink(tp, ip, link_target, len);
> +		libxfs_trans_ijoin(tp, pip, 0);
> +		xname.type = XFS_DIR3_FT_SYMLINK;
> +		newdirent(mp, tp, pip, &xname, ip, ppargs);
> +
> +		/*
> +		 * copy over timestamps
> +		 */
> +		writetimestamps(ip, file_stat);
> +
> +		libxfs_trans_log_inode(tp, ip, flags);
> +		error = -libxfs_trans_commit(tp);
> +		if (error)
> +			fail(_("Error encountered creating file from prototype file"),
> +			     error);
> +		libxfs_parent_finish(mp, ppargs);
> +
> +		/*
> +		 * copy over attributes
> +		 *
> +		 * being a symlink we opened the filedescriptor with O_PATH
> +		 * this will make flistxattr() and fgetxattr() fail, so we
> +		 * will need to fallback to llistxattr() and lgetxattr(), this
> +		 * will need the full path to the original file, not just the
> +		 * entry name.
> +		 */
> +		writeattrs(ip, path, fd);
> +		writefsxattrs(fsxp, ip);
> +		close(fd);

PS: symlink (really, non-directory) files can be hardlinked too.

$ ln -s moo cow
$ ls -d cow
lrwxrwxrwx 1 djwong djwong 3 Apr 28 10:05 cow -> moo
$ ln cow bar
$ ls -d cow bar
lrwxrwxrwx 2 djwong djwong 3 Apr 28 10:05 bar -> moo
lrwxrwxrwx 2 djwong djwong 3 Apr 28 10:05 cow -> moo

> +
> +		libxfs_irele(ip);
> +		break;
> +	case S_IFREG:
> +		xname.type = XFS_DIR3_FT_REG_FILE;
> +		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
> +			    0, fd, entry->d_name, path);
> +		break;
> +	case S_IFCHR:
> +		xname.type = XFS_DIR3_FT_CHRDEV;
> +		majdev = major(file_stat.st_rdev);
> +		mindev = minor(file_stat.st_rdev);
> +		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
> +			    IRIX_MKDEV(majdev, mindev), fd, entry->d_name,
> +			    path);
> +		break;
> +	case S_IFBLK:
> +		xname.type = XFS_DIR3_FT_BLKDEV;
> +		majdev = major(file_stat.st_rdev);
> +		mindev = minor(file_stat.st_rdev);
> +		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
> +			    IRIX_MKDEV(majdev, mindev), fd, entry->d_name,
> +			    path);
> +		break;
> +	case S_IFIFO:
> +		flags |= XFS_ILOG_DEV;
> +		create_file(mp, pip, fsxp, mode, creds, xname, flags, file_stat,
> +			    0, fd, entry->d_name, path);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +/*
> + * walk_dir will recursively list files and directories
> + * and populate the mountpoint *mp with them using handle_direntry().
> + */
> +static void
> +walk_dir(
> +		struct xfs_mount *mp,
> +		struct xfs_inode *pip,
> +		struct fsxattr *fsxp,
> +		char *cur_path)
> +{
> +	DIR *dir;
> +	struct dirent *entry;
> +
> +	/*
> +	 * open input directory and iterate over all entries in it.
> +	 * when another directory is found, we will recursively call
> +	 * walk_dir.
> +	 */
> +	if ((dir = opendir(cur_path)) == NULL)
> +		fail(_("cannot open input dir"), 1);

Please report *which* directory couldn't be opened.

> +	while ((entry = readdir(dir)) != NULL) {
> +		handle_direntry(mp, pip, fsxp, cur_path, entry);
> +	}
> +	closedir(dir);
> +}
> +
> +static void
> +populate_from_dir(
> +		struct xfs_mount *mp,
> +		struct xfs_inode *pip,
> +		struct fsxattr *fsxp,
> +		char *cur_path)
> +{
> +	int error;
> +	int mode;
> +	struct cred creds;
> +	struct xfs_inode *ip;
> +	struct xfs_trans *tp;
> +
> +	/*
> +	 * we first ensure we have the root inode
> +	 */
> +	memset(&creds, 0, sizeof(creds));
> +	creds.cr_uid = 0;
> +	creds.cr_gid = 0;
> +	mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
> +	tp = getres(mp, 0);
> +	error = creatproto(&tp, pip, mode | S_IFDIR, 0, &creds, fsxp, &ip);
> +	if (error)
> +		fail(_("Inode allocation failed"), error);
> +	pip = ip;
> +	mp->m_sb.sb_rootino = ip->i_ino;
> +	libxfs_log_sb(tp);
> +	newdirectory(mp, tp, ip, pip);
> +	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +	error = -libxfs_trans_commit(tp);
> +	if (error)
> +		fail(_("Inode allocation failed"), error);
> +
> +	libxfs_parent_finish(mp, NULL);
> +
> +	/*
> +	 * RT initialization.  Do this here to ensure that
> +	 * the RT inodes get placed after the root inode.
> +	 */
> +	error = create_metadir(mp);
> +	if (error)
> +		fail(_("Creation of the metadata directory inode failed"), error);
> +
> +	rtinit(mp);
> +
> +	/*
> +	 * by nature of walk_dir() we could be opening
> +	 * a great number of fds for deeply nested directory
> +	 * trees.
> +	 * try to bump max fds limit.
> +	 */
> +	bump_max_fds();
> +
> +	/*
> +	 * initialize the hardlinks tracker
> +	 */
> +	init_hardlink_tracker();
> +	/*
> +	 * now that we have a root inode, let's
> +	 * walk the input dir and populate the partition
> +	 */
> +	walk_dir(mp, ip, fsxp, cur_path);
> +
> +	/*
> +	 * cleanup hardlinks tracker
> +	 */
> +	cleanup_hardlink_tracker();
> +
> +	/*
> +	 * we free up our root inode
> +	 * only when we finished populating the
> +	 * root filesystem
> +	 */
> +	libxfs_irele(ip);
> +}
> diff --git a/mkfs/proto.h b/mkfs/proto.h
> index be1ceb45..fea416f6 100644
> --- a/mkfs/proto.h
> +++ b/mkfs/proto.h
> @@ -8,7 +8,7 @@
> 
>  char *setup_proto(char *fname);
>  void parse_proto(struct xfs_mount *mp, struct fsxattr *fsx, char **pp,
> -		int proto_slashes_are_spaces);
> +		int proto_slashes_are_spaces, int proto_noatime);

Kinda wondering if parse_proto should just take a const struct
cli_params pointer.

--D

>  void res_failed(int err);
> 
>  #endif /* MKFS_PROTO_H_ */
> --
> 2.49.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v7 2/2] mkfs: modify -p flag to populate a filesystem from a directory
  2025-04-26 13:55 ` [PATCH v7 2/2] mkfs: modify -p flag " Luca Di Maio
@ 2025-04-28 17:23   ` Darrick J. Wong
  2025-04-28 17:55     ` Luca Di Maio
  0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2025-04-28 17:23 UTC (permalink / raw)
  To: Luca Di Maio; +Cc: linux-xfs, dimitri.ledkov, smoser, hch

On Sat, Apr 26, 2025 at 03:55:35PM +0200, Luca Di Maio wrote:
> right now the `-p` flag only supports a file input.
> this patch will add support to input a directory.
> on directory input, the populate functionality to copy files into
> the root filesystem.
> 
> add `noatime` flag to popts, that will let the user choose if copy the
> atime timestamps from source directory.
> 
> add documentation for new functionalities in man pages.
> 
> Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com>
> ---
>  man/man8/mkfs.xfs.8.in | 41 +++++++++++++++++++++++++++++------------
>  mkfs/xfs_mkfs.c        | 19 +++++++++++++++++--
>  2 files changed, 46 insertions(+), 14 deletions(-)
> 
> diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
> index 37e3a88e..f06a3c9c 100644
> --- a/man/man8/mkfs.xfs.8.in
> +++ b/man/man8/mkfs.xfs.8.in
> @@ -28,7 +28,7 @@ mkfs.xfs \- construct an XFS filesystem
>  .I naming_options
>  ] [
>  .B \-p
> -.I protofile_options
> +.I prototype_options
>  ] [
>  .B \-q
>  ] [
> @@ -977,30 +977,39 @@ option set.
>  .PP
>  .PD 0
>  .TP
> -.BI \-p " protofile_options"
> +.BI \-p " prototype_options"
>  .TP
>  .BI "Section Name: " [proto]
>  .PD
> -These options specify the protofile parameters for populating the filesystem.
> +These options specify the prototype parameters for populating the filesystem.
>  The valid
> -.I protofile_options
> +.I prototype_options
>  are:
>  .RS 1.2i
>  .TP
> -.BI [file=] protofile
> +.BI [file=]
>  The
>  .B file=
>  prefix is not required for this CLI argument for legacy reasons.
>  If specified as a config file directive, the prefix is required.
> -
> +.TP
> +.BI [file=] directory
>  If the optional
>  .PD
> -.I protofile
> -argument is given,
> +.I prototype
> +argument is given, and it's a directory,
>  .B mkfs.xfs
> -uses
> -.I protofile
> -as a prototype file and takes its directions from that file.
> +will populate the root file system with the contents of the given directory.
> +Content, timestamps, attributes and extended attributes are preserved

I thought this only preserved the atime and mtime timestamps?

> +for all file types.
> +.TP
> +.BI [file=] protofile
> +If the optional
> +.PD
> +.I prototype
> +argument is given, and it's a file,

"If the optional prototype argument is given and points to a regular
file..."

(I think, technically speaking, you store a protofile on a block device
and pass that in, but let's not encourage that kind of absurdity)

> +.B mkfs.xfs
> +uses it as a prototype file and takes its directions from that file.
>  The blocks and inodes specifiers in the
>  .I protofile
>  are provided for backwards compatibility, but are otherwise unused.
> @@ -1136,8 +1145,16 @@ always terminated with the dollar (
>  .B $
>  ) token.
>  .TP
> +.BI noatime= value
> +If set to 1, when we're populating the root filesystem from a directory (
> +.B file=directory
> +option)
> +access times are NOT preserved and are set to the current time instead.
> +Set to 0 to preserve access times from source files.

I would say "Set to 0 to copy access timestamps from source files".

Though if the default is going to be "do not copy atime from sourcev
file" then the option to enable copying of atime ought to be named
"atime".

> +By default, this is set to 1.
> +.TP
>  .BI slashes_are_spaces= value
> -If set to 1, slashes ("/") in the first token of each line of the protofile
> +If set to 1, slashes ("/") in the first token of each line of the prototype file
>  are converted to spaces.
>  This enables the creation of a filesystem containing filenames with spaces.
>  By default, this is set to 0.
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 3f4455d4..1715e3fb 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -121,6 +121,7 @@ enum {
> 
>  enum {
>  	P_FILE = 0,
> +	P_NOATIME,
>  	P_SLASHES,
>  	P_MAX_OPTS,
>  };
> @@ -709,6 +710,7 @@ static struct opt_params popts = {
>  	.ini_section = "proto",
>  	.subopts = {
>  		[P_FILE] = "file",
> +		[P_NOATIME] = "noatime",
>  		[P_SLASHES] = "slashes_are_spaces",
>  		[P_MAX_OPTS] = NULL,
>  	},
> @@ -717,6 +719,12 @@ static struct opt_params popts = {
>  		  .conflicts = { { NULL, LAST_CONFLICT } },
>  		  .defaultval = SUBOPT_NEEDS_VAL,
>  		},
> +		{ .index = P_NOATIME,
> +		  .conflicts = { { NULL, LAST_CONFLICT } },
> +		  .minval = 0,
> +		  .maxval = 1,
> +		  .defaultval = 1,
> +		},
>  		{ .index = P_SLASHES,
>  		  .conflicts = { { NULL, LAST_CONFLICT } },
>  		  .minval = 0,
> @@ -1022,7 +1030,6 @@ struct cli_params {
> 
>  	char	*cfgfile;
>  	char	*protofile;
> -

Extraneous change?

>  	enum fsprop_autofsck autofsck;
> 
>  	/* parameters that depend on sector/block size being validated. */
> @@ -1045,6 +1052,7 @@ struct cli_params {
>  	int	lsunit;
>  	int	is_supported;
>  	int	proto_slashes_are_spaces;
> +	int	proto_noatime;
>  	int	data_concurrency;
>  	int	log_concurrency;
>  	int	rtvol_concurrency;
> @@ -1170,6 +1178,7 @@ usage( void )
>  /* naming */		[-n size=num,version=2|ci,ftype=0|1,parent=0|1]]\n\
>  /* no-op info only */	[-N]\n\
>  /* prototype file */	[-p fname]\n\
> +/* populate from directory */	[-p dirname]\n\

/* populate from directory */		[-p dirname,noatime=0|1]

--D

>  /* quiet */		[-q]\n\
>  /* realtime subvol */	[-r extsize=num,size=num,rtdev=xxx,rgcount=n,rgsize=n,\n\
>  			    concurrency=num]\n\
> @@ -2067,6 +2076,9 @@ proto_opts_parser(
>  	case P_SLASHES:
>  		cli->proto_slashes_are_spaces = getnum(value, opts, subopt);
>  		break;
> +	case P_NOATIME:
> +		cli->proto_noatime = getnum(value, opts, subopt);
> +		break;
>  	case P_FILE:
>  		fallthrough;
>  	default:
> @@ -5181,6 +5193,7 @@ main(
>  		.log_concurrency = -1, /* auto detect non-mechanical ddev */
>  		.rtvol_concurrency = -1, /* auto detect non-mechanical rtdev */
>  		.autofsck = FSPROP_AUTOFSCK_UNSET,
> +		.proto_noatime = 1,
>  	};
>  	struct mkfs_params	cfg = {};
> 
> @@ -5480,7 +5493,9 @@ main(
>  	/*
>  	 * Allocate the root inode and anything else in the proto file.
>  	 */
> -	parse_proto(mp, &cli.fsx, &protostring, cli.proto_slashes_are_spaces);
> +	parse_proto(mp, &cli.fsx, &protostring,
> +			cli.proto_slashes_are_spaces,
> +			cli.proto_noatime);
> 
>  	/*
>  	 * Protect ourselves against possible stupidity
> --
> 2.49.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v7 2/2] mkfs: modify -p flag to populate a filesystem from a directory
  2025-04-28 17:23   ` Darrick J. Wong
@ 2025-04-28 17:55     ` Luca Di Maio
  0 siblings, 0 replies; 8+ messages in thread
From: Luca Di Maio @ 2025-04-28 17:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, dimitri.ledkov, smoser, hch

Thanks Darrick for the review

On Mon, Apr 28, 2025 at 10:23:55AM -0700, Darrick J. Wong wrote:
> On Sat, Apr 26, 2025 at 03:55:35PM +0200, Luca Di Maio wrote:
> > +Content, timestamps, attributes and extended attributes are preserved
>
> I thought this only preserved the atime and mtime timestamps?

Right, I'll be more specific

> > +for all file types.
> > +.TP
> > +.BI [file=] protofile
> > +If the optional
> > +.PD
> > +.I prototype
> > +argument is given, and it's a file,
>
> "If the optional prototype argument is given and points to a regular
> file..."
>
> (I think, technically speaking, you store a protofile on a block device
> and pass that in, but let's not encourage that kind of absurdity)

Will modify, thanks

> > +.B mkfs.xfs
> > +uses it as a prototype file and takes its directions from that file.
> >  The blocks and inodes specifiers in the
> >  .I protofile
> >  are provided for backwards compatibility, but are otherwise unused.
> > @@ -1136,8 +1145,16 @@ always terminated with the dollar (
> >  .B $
> >  ) token.
> >  .TP
> > +.BI noatime= value
> > +If set to 1, when we're populating the root filesystem from a directory (
> > +.B file=directory
> > +option)
> > +access times are NOT preserved and are set to the current time instead.
> > +Set to 0 to preserve access times from source files.
>
> I would say "Set to 0 to copy access timestamps from source files".
>
> Though if the default is going to be "do not copy atime from sourcev
> file" then the option to enable copying of atime ought to be named
> "atime".

I was also thinking `atime=1` would be better to not invert the logic, I
was going with the `noatime` option for consistency with mount options,
but I assume this has nothing to do with that, so we can use atime

Will modify to be atime

> > -
>
> Extraneous change?

Yep sorry

> > +/* populate from directory */	[-p dirname]\n\
>
> /* populate from directory */		[-p dirname,noatime=0|1]
>
> --D

Ack thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory
  2025-04-28 17:16   ` Darrick J. Wong
@ 2025-04-29 17:33     ` Luca Di Maio
  2025-05-02  7:04       ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Luca Di Maio @ 2025-04-29 17:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, dimitri.ledkov, smoser, hch

Thanks again Darrick,

On Mon, Apr 28, 2025 at 10:16:06AM -0700, Darrick J. Wong wrote:
> On Sat, Apr 26, 2025 at 03:55:34PM +0200, Luca Di Maio wrote:
> > +	if ((fd = open(fname, O_DIRECTORY)) > 0) {
>
> 0 is a valid (if unlikely) fd return value for open.  But the bigger
> problem is...
>
> > +		return fname;
>
> ...that now you've reduced the coherence of this function by making it
> return either a buffer containing the contents of a protofile minus the
> first two lines; a default protofile excerpt if there was no filename,
> or ... the path argument, if the path happened to point to a directory.
>
> How does the caller (or parse_proto) figure out what they're supposed to
> do with the protostring?  (More on this below.)
>
> > +	if ((fd = open(*pp,  O_DIRECTORY)) < 0) {
>
> Urrrk, here we just open the protostring returned by setup_proto and
> passed in here as @pp?  Recall that if -p file= pointed to a regular
> file, then protostring contains everything except the first two lines.
>
> So if you pass in a corrupt protofile:
>
> /stand/diskboot
> 4872 110
> /etc/
>
> this line will see /etc, the directory open succeeds, and now we start
> importing /etc.  That isn't the documented behavior of a protofile.
>
> Also, you open the passed-in path twice.  This is a minor point for
> mkfs, but what if the path changes from a directory to a regular file in
> between the two open(..., O_DIRECTORY) calls?

Right, this is confusing, I'll do a more structured return type
for the various cases

> > -		fail(_("error collecting xattr value"), errno);
> > +		/*
> > +		 * in case of filedescriptors with O_PATH, fgetxattr() will
> > +		 * fail. let's try to fallback to lgetxattr() using input
> > +		 * path.
>
> Fail how?  Shouldn't we gate this retry on the specific error code
> returned by fgetxattr on an O_PATH fd?
>
> > +		ret = llistxattr(fname, namebuf, XATTR_LIST_MAX);
>
> Same here.
>

Right, will fix and explain the error

> > +		parseproto(mp, NULL, fsx, pp, NULL);
> > +		return;
> > +	}
> > +
> > +	close(fd);
> > +	populate_from_dir(mp, NULL, fsx, *pp);
>
> I'm not sure why you don't pass in the opened directory here.  I'm also
> not sure why the second parameter exists here; it's always NULL.

Right, will cleanup

> > +static void
> > +writefsxattrs(
> > +		struct fsxattr *fsxp,
> > +		struct xfs_inode *ip)
>
> Strange indenting here, the usual parameter declaration convention in
> xfs is:
>
> 	struct xfs_inode	*ip
>
> ^ one tab               ^ another tab
>
> also we usually put the inode first because this function modifies the
> inode...
>

Ack will fix those

> > +}
> > +
> > +struct hardlink {
> > +	unsigned long i_ino;
>
> ino_t ?  Since that's what stat returns.
>
> also struct field declarations need a tab between the type and the field
> name.

Ack

> > +	struct xfs_inode *existing_ip;
> > +};
> > +
> > +struct hardlinks {
> > +	int count;
>
> This can overflow to negative if there are more than 2^32 hardlinked
> files in the source filesystem.  My guess is that this will be rare, but
> you ought to program against that anyway...
>
> > +	size_t size;
>
> ...since I'm guessing the upper limit on this data structure is whatever
> can be held in size_t.

Right, will upgrate it to size_t

> > +	struct hardlink *entries;
> > +};
> > +
> > +/*
> > + * keep track of source inodes that are from hardlinks
> > + * so we can retrieve them when needed to setup in
> > + * destination.
> > + */
> > +static struct hardlinks *hardlink_tracker = { 0 };
> > +
> > +static void
> > +init_hardlink_tracker(void) {
> > +	hardlink_tracker = malloc(sizeof(struct hardlinks));
> > +	if (!hardlink_tracker)
> > +		fail(_("error allocating hardlinks tracking array"), errno);
> > +	memset(hardlink_tracker, 0, sizeof(struct hardlinks));
> > +
> > +	hardlink_tracker->count = 0;
>
> You just memset the object to zero, this isn't necessary.
>
> (use calloc to elide the memset?)

Yep will switch to calloc, thanks

> > +	hardlink_tracker->size = PATH_MAX;
>
> Wait, why is the size being set to PATH_MAX?  Are we tracking path
> strings somehow?
>

Right, I tought to just reuse a constant already defined, I'll just set
it explicitly to a number, it will be resized most likely anyway.

> > +}
> > +
> > +static struct xfs_inode*
> > +get_hardlink_src_inode(
> > +		unsigned long i_ino)
> > +{
> > +	for (int i = 0; i < hardlink_tracker->count; i++) {
>
> Urk, linear search.  Oh well, I guess we can switch to a hashtable if
> we get complaints about issues.

Yea, if it's a problem I can implement an hashtable, from my local
testing using larger source directories (1.3mln inodes, ~400k hardlinks)
the difference was actually just a few seconds (given that most of the
time is doing i/o)

If we get complains we can come back to this

> > +		if (hardlink_tracker->entries[i].i_ino == i_ino) {
> > +			return hardlink_tracker->entries[i].existing_ip;
>
> /me notes that this pins the hardlinked xfs_inode objects in memory.
> That's a risky thing for an xfsprogs utility to do because there's no
> inode cache like there is in the kernel.  In other words, xfs_iget
> creates a brand new xfs_inode object even for the same inumber.
>
> I /think/ that's not technically an issue in mkfs because it's
> single-threaded and never goes back to an existing inode.  But you might
> consider changing struct hardlink to:
>
> /*
>  * Map an inumber in the source filesystem to an inumber in the new
>  * filesystem
>  */
> struct hardlink {
> 	ino_t		src_ino;
> 	xfs_ino_t	dst_ino;
> };
>
> since xfs_inodes are fairly large objects.

Ack, didn't know about libxfs_iget()
I'll switch to only store src/dst ino_t, thanks

> > +		struct hardlink *resized_array = realloc(
> > +				hardlink_tracker->entries,
> > +				new_size * sizeof(struct hardlink));
>
> At this point, is it safe to use reallocarray?  Or will that cause
> problems with musl?

I don't think it's a problem, reading around I see that modern
versions of musl (since around 2020-2021) do include reallocarray().
I'll switch to that

> > +static int
> > +handle_hardlink(
> > +		struct xfs_mount *mp,
> > +		struct xfs_inode *pip,
> > +		struct fsxattr *fsxp,
> > +		int mode,
> > +		struct cred creds,
> > +		struct xfs_name xname,
> > +		int flags,
> > +		struct stat file_stat,
> > +		xfs_dev_t rdev,
> > +		int fd,
> > +		char *fname,
> > +		char *path)
>
> How many of these parameters are actually needed to create a hardlink?

Will cleanup

> > +		/*
> > +		 * if no error is reported it means the hardlink has
> > +		 * been correctly found and set, so we don't need to
> > +		 * do anything else.
> > +		 */
> > +		if (!error)
> > +			return;
>
> You know, if you inverted the polarity of handle_hardlink's return
> value, you could do:
>
> 	if (file_stat.st_nlink > 1 && handle_hardlink(...)) {
> 		close(fd);
> 		return;
> 	}

Right, that's better, thanks

> > +
> > +	/*
> > +	 * copy over file content, attributes,
> > +	 * extended attributes and timestamps
> > +	 *
> > +	 * hardlinks will be skipped as fd will
> > +	 * be closed before this.
>
> Aren't hardlinks skipped because this function returns if
> handle_hardlink() returns 0?
>
> > +	 */
> > +	if (fd >= 0) {
>
> Do callers actually pass us negative fd numbers?

Hardlink of a FIFO should be fd=-1 as we're going to skip open() for
FIFOs (as you suggested)

> > +
> > +	/*
> > +	 * Create the full path to the original file or directory
> > +	 */
> > +	snprintf(path, sizeof(path), "%s/%s", cur_path, entry->d_name);
>
> Urrk, check the return value here!
>
> /me wonders if you should declare /one/ char path[PATH_MAX] at the top
> of the call chain and modify that as we walk down the directory tree,
> rather than declaring a new (4k) stack variable every time we walk down
> another level.

Yes it's better to have one buffer PATH_MAX long and handle one instance
of that, will modify accordingly

> > +
> > +	if (lstat(path, &file_stat) < 0) {
> > +		fprintf(stderr, _("%s (error accessing)\n"), entry->d_name);
> > +		exit(1);
> > +	}
> > +
> > +	/*
> > +	 * symlinks will need to be opened with O_PATH to work, so we handle this
> > +	 * special case.
> > +	 */
> > +	int open_flags = O_NOFOLLOW | O_RDONLY | O_NOATIME;
> > +	if ((file_stat.st_mode & S_IFMT) == S_IFLNK) {
>
> S_ISLNK() ?

Ack

> > +		open_flags = O_NOFOLLOW | O_PATH;
> > +	}
> > +	if ((fd = open(path, open_flags)) < 0) {
> > +		fprintf(stderr, _("%s: cannot open %s: %s\n"), progname, path,
> > +			strerror(errno));
> > +		exit(1);
> > +	}
>
> Not sure you want to open() on a fifo, those will block until someone
> opens the other end.

Right, will fix

> > +
> > +	memset(&creds, 0, sizeof(creds));
> > +	creds.cr_uid = file_stat.st_uid;
> > +	creds.cr_gid = file_stat.st_gid;
> > +	xname.name = (unsigned char *)entry->d_name;
> > +	xname.len = strlen(entry->d_name);
> > +	xname.type = 0;
> > +	mode = file_stat.st_mode;
>
> Most of these could be moved to the variable declarations:
>
> 	struct cred creds = {
> 		.cr_uid	= file-stat.st_uid,
> 		.cr_gid	= file_stat.st_gid,
> 	};
>
> and now future programmers don't have to be careful about uninitialized
> variables.

Right, thanks

> > +		/*
> > +		 * copy over attributes
> > +		 *
> > +		 * being a symlink we opened the filedescriptor with O_PATH
> > +		 * this will make flistxattr() and fgetxattr() fail, so we
> > +		 * will need to fallback to llistxattr() and lgetxattr(), this
> > +		 * will need the full path to the original file, not just the
> > +		 * entry name.
> > +		 */
> > +		writeattrs(ip, path, fd);
> > +		writefsxattrs(fsxp, ip);
> > +		close(fd);
>
> PS: symlink (really, non-directory) files can be hardlinked too.
>
> $ ln -s moo cow
> $ ls -d cow
> lrwxrwxrwx 1 djwong djwong 3 Apr 28 10:05 cow -> moo
> $ ln cow bar
> $ ls -d cow bar
> lrwxrwxrwx 2 djwong djwong 3 Apr 28 10:05 bar -> moo
> lrwxrwxrwx 2 djwong djwong 3 Apr 28 10:05 cow -> moo

Right, I didn't think of this, will modify accordingly

> > +
> > +	/*
> > +	 * open input directory and iterate over all entries in it.
> > +	 * when another directory is found, we will recursively call
> > +	 * walk_dir.
> > +	 */
> > +	if ((dir = opendir(cur_path)) == NULL)
> > +		fail(_("cannot open input dir"), 1);
>
> Please report *which* directory couldn't be opened.

Ack

Thanks for the review
L.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory
  2025-04-29 17:33     ` Luca Di Maio
@ 2025-05-02  7:04       ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2025-05-02  7:04 UTC (permalink / raw)
  To: Luca Di Maio; +Cc: Darrick J. Wong, linux-xfs, dimitri.ledkov, smoser, hch

On the way back from meetings and long distance travel,
so just chiming in with a few points.

One is that the patch split still doesn't make sense to me.  Patch 1 adds
all the infrastructure which remains unused, and then patch 2 trivially
wires it up.  Please merge the two.

> > Urk, linear search.  Oh well, I guess we can switch to a hashtable if
> > we get complaints about issues.
> 
> Yea, if it's a problem I can implement an hashtable, from my local
> testing using larger source directories (1.3mln inodes, ~400k hardlinks)
> the difference was actually just a few seconds (given that most of the
> time is doing i/o)

Please add this to the commit log, it's always good to know about the
trade offs and the data supporting it when later looking at the commit
history.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-05-02  7:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-26 13:55 [PATCH v7 0/2] mkfs: add ability to populate filesystem from directory Luca Di Maio
2025-04-26 13:55 ` [PATCH v7 1/2] proto: add ability to populate a filesystem from a directory Luca Di Maio
2025-04-28 17:16   ` Darrick J. Wong
2025-04-29 17:33     ` Luca Di Maio
2025-05-02  7:04       ` Christoph Hellwig
2025-04-26 13:55 ` [PATCH v7 2/2] mkfs: modify -p flag " Luca Di Maio
2025-04-28 17:23   ` Darrick J. Wong
2025-04-28 17:55     ` Luca Di Maio

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox