linux-btrfs.vger.kernel.org archive mirror
* [PATCH v2 0/2] btrfs-progs: mkfs/rootdir: add hard link support
@ 2024-08-20  3:45 Qu Wenruo
  2024-08-20  3:45 ` [PATCH v2 1/2] " Qu Wenruo
  2024-08-20  3:45 ` [PATCH v2 2/2] btrfs-progs: mkfs-tests: add hardlink related tests for --subvol Qu Wenruo
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2024-08-20  3:45 UTC
  To: linux-btrfs

[Changelog]
v2:
- Fix several grammar errors and change "can not" to more formal
  "cannot"
- Initialize the temporary hardlink_entry structure as const
- Double quote the $nr_hardlink used in test case

The recently reworked --rootdir support solves several hard link related
problems, but it splits hard links into new inodes.

And on each split, it shows a warning for each file with hard links.

Although the split behavior doesn't cause any data corruption, it can
still be pretty noisy for rootfs creation, as there are a lot of distros
storing timezone files as hardlinks.

This patchset adds back hard link detection and creation, with enhanced
handling to cooperate with the --subvol option.

The details can be found in the first patch, along with the new corner
case introduced by the --subvol option.

The second patch extends the existing --rootdir and --subvol test case
with extra corner cases, such as hard links and hard links split by a
subvolume boundary.

Qu Wenruo (2):
  btrfs-progs: mkfs/rootdir: add hard link support
  btrfs-progs: mkfs-tests: add hardlink related tests for --subvol

 Documentation/mkfs.btrfs.rst                |  13 ++
 mkfs/rootdir.c                              | 199 +++++++++++++++++---
 tests/mkfs-tests/036-rootdir-subvol/test.sh |  78 ++++++--
 3 files changed, 247 insertions(+), 43 deletions(-)

--
2.46.0



* [PATCH v2 1/2] btrfs-progs: mkfs/rootdir: add hard link support
  2024-08-20  3:45 [PATCH v2 0/2] btrfs-progs: mkfs/rootdir: add hard link support Qu Wenruo
@ 2024-08-20  3:45 ` Qu Wenruo
  2024-08-20  3:45 ` [PATCH v2 2/2] btrfs-progs: mkfs-tests: add hardlink related tests for --subvol Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2024-08-20  3:45 UTC
  To: linux-btrfs

The new hard link detection and creation support is done by maintaining
an rb tree whose entries have the following members:

- st_ino, st_dev
  These record the stat() result from the host fs.
  With these two, we can detect whether it's really a hard link (st_dev
  identifies one filesystem/subvolume, and st_ino identifies the inode
  within that fs).

- root
  This is the btrfs root pointer. It is a special requirement of the
  recently introduced "--subvol" option.

  We can have the following corner case:

  rootdir/
  |- foobar_hardlink1
  |- foobar_hardlink2
  |- subv/		<- To be a subvolume inside btrfs
     |- foobar_hardlink3

  In the above case, on the host fs, the `subv/` directory is just a
  regular directory, but in the new btrfs it will be a subvolume.

  In that case, `foobar_hardlink3` cannot be created as a hard link and
  must become a new inode.

- st_nlink and found_nlink
  These record the link count originally reported by the host fs and the
  number of links we have created inside btrfs.
  They are tracked so that, once all hard links have been created, we can
  remove the entry early.

- btrfs_ino
  This is the inode number inside btrfs.

And since we can now handle hard links safely, remove all the related
warnings, and add a new note for the `--subvol` option, warning about the
case where we need to split hard links due to a subvolume boundary.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/mkfs.btrfs.rst |  13 +++
 mkfs/rootdir.c               | 203 +++++++++++++++++++++++++++++------
 2 files changed, 185 insertions(+), 31 deletions(-)

diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
index 0e9e84adffc8..18c491da4c94 100644
--- a/Documentation/mkfs.btrfs.rst
+++ b/Documentation/mkfs.btrfs.rst
@@ -160,6 +160,19 @@ OPTIONS
         directory.  The option *--rootdir* must also be specified, and *subdir* must be an
         existing subdirectory within it.  This option can be specified multiple times.
 
+	If there are hard links inside *rootdir* and *subdir* splits them across
+	subvolumes, as in the following case::
+
+		rootdir/
+		|- hardlink1
+		|- hardlink2
+		|- subdir/  <- will be a subvolume
+		   |- hardlink3
+
+	In that case we cannot create `hardlink3` as a hard link of
+	`hardlink1` and `hardlink2`, because `hardlink3` will be inside a new
+	subvolume.
+
 --shrink
         Shrink the filesystem to its minimal size, only works with *--rootdir* option.
 
diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index 3cc94316be4d..e1ca00f57e60 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -42,6 +42,7 @@
 #include "common/extent-tree-utils.h"
 #include "common/root-tree-utils.h"
 #include "common/path-utils.h"
+#include "common/rbtree-utils.h"
 #include "mkfs/rootdir.h"
 
 static u32 fs_block_size;
@@ -74,6 +75,52 @@ struct inode_entry {
 	struct list_head list;
 };
 
+/*
+ * Record all the hard links we found for a specific file inside
+ * rootdir.
+ *
+ * The search is based on (root, st_dev, st_ino).
+ * The reason for @root as a search index is, for hard links separated by
+ * subvolume boundaries:
+ *
+ * rootdir/
+ * |- foobar_hardlink1
+ * |- foobar_hardlink2
+ * |- subv/	<- Will be created as a subvolume
+ *    |- foobar_hardlink3.
+ *
+ * Since all the 3 hard links are inside the same rootdir and the same
+ * filesystem, on the host fs they are all hard links to the same inode.
+ *
+ * But for the btrfs we are building, only hardlink1 and hardlink2 can be
+ * created as hard links, since we cannot create hard links across subvolumes.
+ * So we need @root as a search index to handle such a case.
+ */
+struct hardlink_entry {
+	struct rb_node node;
+	/*
+	 * The following three members are reported from the stat() of the
+	 * host filesystem.
+	 *
+	 * For st_nlink we cannot trust it unconditionally, as
+	 * some hard links may be out of rootdir.
+	 * If @found_nlink reached @st_nlink, we know we have created all
+	 * the hard links and can remove the entry.
+	 */
+	dev_t st_dev;
+	ino_t st_ino;
+	nlink_t st_nlink;
+
+	/* The following two are inside the new btrfs. */
+	struct btrfs_root *root;
+	u64 btrfs_ino;
+
+	/* How many hard links we have created. */
+	nlink_t found_nlink;
+};
+
+static struct rb_root hardlink_root = RB_ROOT;
+
 /*
  * The path towards the rootdir.
  *
@@ -93,9 +140,6 @@ static struct rootdir_path current_path = {
 	.level = 0,
 };
 
-/* Track if a hardlink was found and a warning was printed. */
-static bool g_hardlink_warning;
-static u64 g_hardlink_count;
 static struct btrfs_trans_handle *g_trans = NULL;
 static struct list_head *g_subvols;
 static u64 next_subvol_id = BTRFS_FIRST_FREE_OBJECTID;
@@ -133,6 +177,82 @@ static int rootdir_path_push(struct rootdir_path *path, struct btrfs_root *root,
 	return 0;
 }
 
+static int hardlink_compare_nodes(const struct rb_node *node1,
+				  const struct rb_node *node2)
+{
+	const struct hardlink_entry *entry1;
+	const struct hardlink_entry *entry2;
+
+	entry1 = rb_entry(node1, struct hardlink_entry, node);
+	entry2 = rb_entry(node2, struct hardlink_entry, node);
+	UASSERT(entry1->root);
+	UASSERT(entry2->root);
+
+	if (entry1->st_dev < entry2->st_dev)
+		return -1;
+	if (entry1->st_dev > entry2->st_dev)
+		return 1;
+	if (entry1->st_ino < entry2->st_ino)
+		return -1;
+	if (entry1->st_ino > entry2->st_ino)
+		return 1;
+	if (entry1->root < entry2->root)
+		return -1;
+	if (entry1->root > entry2->root)
+		return 1;
+	return 0;
+}
+
+static struct hardlink_entry *find_hard_link(struct btrfs_root *root,
+					     const struct stat *st)
+{
+	struct rb_node *node;
+	const struct hardlink_entry tmp = {
+		.st_dev = st->st_dev,
+		.st_ino = st->st_ino,
+		.root = root,
+	};
+
+	node = rb_search(&hardlink_root, &tmp,
+			(rb_compare_keys)hardlink_compare_nodes, NULL);
+	if (node)
+		return rb_entry(node, struct hardlink_entry, node);
+	return NULL;
+}
+
+static int add_hard_link(struct btrfs_root *root, u64 btrfs_ino,
+			 const struct stat *st)
+{
+	struct hardlink_entry *new;
+	int ret;
+
+	UASSERT(st->st_nlink > 1);
+
+	new = calloc(1, sizeof(*new));
+	if (!new)
+		return -ENOMEM;
+
+	new->root = root;
+	new->btrfs_ino = btrfs_ino;
+	new->found_nlink = 1;
+	new->st_dev = st->st_dev;
+	new->st_ino = st->st_ino;
+	new->st_nlink = st->st_nlink;
+	ret = rb_insert(&hardlink_root, &new->node, hardlink_compare_nodes);
+	if (ret) {
+		free(new);
+		return -EEXIST;
+	}
+	return 0;
+}
+
+static void free_one_hardlink(struct rb_node *node)
+{
+	struct hardlink_entry *entry = rb_entry(node, struct hardlink_entry, node);
+
+	free(entry);
+}
+
 static void stat_to_inode_item(struct btrfs_inode_item *dst, const struct stat *st)
 {
 	/*
@@ -498,29 +618,10 @@ static int ftw_add_inode(const char *full_path, const struct stat *st,
 	struct btrfs_inode_item inode_item = { 0 };
 	struct inode_entry *parent;
 	struct rootdir_subvol *rds;
+	const bool have_hard_links = (!S_ISDIR(st->st_mode) && st->st_nlink > 1);
 	u64 ino;
 	int ret;
 
-	/*
-	 * Hard link needs extra detection code, not supported for now, but
-	 * it's not to break anything but splitting the hard links into new
-	 * inodes.  And we do not even know if the hard links are inside the
-	 * rootdir.
-	 *
-	 * So here we only need to do extra warning.
-	 *
-	 * On most filesystems st_nlink of a directory is the number of
-	 * subdirs, including "." and "..", so skip directory inodes.
-	 */
-	if (unlikely(!S_ISDIR(st->st_mode) && st->st_nlink > 1)) {
-		if (!g_hardlink_warning) {
-			warning("'%s' has extra hardlinks, they will be converted into new inodes",
-				full_path);
-			g_hardlink_warning = true;
-		}
-		g_hardlink_count++;
-	}
-
 	/* The rootdir itself. */
 	if (unlikely(ftwbuf->level == 0)) {
 		u64 root_ino;
@@ -620,6 +721,37 @@ static int ftw_add_inode(const char *full_path, const struct stat *st,
 	parent = rootdir_path_last(&current_path);
 	root = parent->root;
 
+	/* For non-directory inode, check if there is already any hard link. */
+	if (have_hard_links) {
+		struct hardlink_entry *found;
+
+		found = find_hard_link(root, st);
+		/*
+		 * Can only add the hard link if it doesn't cross subvolume
+		 * boundary.
+		 */
+		if (found && found->root == root) {
+			ret = btrfs_add_link(g_trans, root, found->btrfs_ino,
+					     parent->ino, full_path + ftwbuf->base,
+					     strlen(full_path) - ftwbuf->base,
+					     ftype_to_btrfs_type(st->st_mode),
+					     NULL, 1, 0);
+			if (ret < 0) {
+				errno = -ret;
+				error(
+			"failed to add link for hard link ('%s'): %m", full_path);
+				return ret;
+			}
+			found->found_nlink++;
+			/* We found all hard links for it. Can remove the entry. */
+			if (found->found_nlink >= found->st_nlink) {
+				rb_erase(&found->node, &hardlink_root);
+				free(found);
+			}
+			return 0;
+		}
+	}
+
 	ret = btrfs_find_free_objectid(g_trans, root,
 				       BTRFS_FIRST_FREE_OBJECTID, &ino);
 	if (ret < 0) {
@@ -635,7 +767,6 @@ static int ftw_add_inode(const char *full_path, const struct stat *st,
 		error("failed to insert inode item %llu for '%s': %m", ino, full_path);
 		return ret;
 	}
-
 	ret = btrfs_add_link(g_trans, root, ino, parent->ino,
 			     full_path + ftwbuf->base,
 			     strlen(full_path) - ftwbuf->base,
@@ -646,6 +777,22 @@ static int ftw_add_inode(const char *full_path, const struct stat *st,
 		error("failed to add link for inode %llu ('%s'): %m", ino, full_path);
 		return ret;
 	}
+
+	/*
+	 * Found a possible hard link, add it into the hard link rb tree for
+	 * future detection.
+	 */
+	if (have_hard_links) {
+		ret = add_hard_link(root, ino, st);
+		if (ret < 0) {
+			errno = -ret;
+			error("failed to add hard link record for '%s': %m",
+			       full_path);
+			return ret;
+		}
+		ret = 0;
+	}
+
 	/*
 	 * btrfs_add_link() has increased the nlink to 1 in the metadata.
 	 * Also update the value in case we need to update the inode item
@@ -714,8 +861,6 @@ int btrfs_mkfs_fill_dir(struct btrfs_trans_handle *trans, const char *source_dir
 	}
 
 	g_trans = trans;
-	g_hardlink_warning = false;
-	g_hardlink_count = 0;
 	g_subvols = subvols;
 	INIT_LIST_HEAD(&current_path.inode_list);
 
@@ -725,13 +870,9 @@ int btrfs_mkfs_fill_dir(struct btrfs_trans_handle *trans, const char *source_dir
 		return ret;
 	}
 
-	if (g_hardlink_warning)
-		warning("%llu hardlinks were detected in %s, all converted to new inodes",
-			g_hardlink_count, source_dir);
-
 	while (current_path.level > 0)
 		rootdir_path_pop(&current_path);
-
+	rb_free_nodes(&hardlink_root, free_one_hardlink);
 	return 0;
 }
 
-- 
2.46.0



* [PATCH v2 2/2] btrfs-progs: mkfs-tests: add hardlink related tests for --subvol
  2024-08-20  3:45 [PATCH v2 0/2] btrfs-progs: mkfs/rootdir: add hard link support Qu Wenruo
  2024-08-20  3:45 ` [PATCH v2 1/2] " Qu Wenruo
@ 2024-08-20  3:45 ` Qu Wenruo
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2024-08-20  3:45 UTC
  To: linux-btrfs

This introduces two new cases:

- 3 hardlinks without any subvolume
  This should result in 3 hard links inside the btrfs.

- 3 hardlinks, where a subvolume boundary separates one from the other 2
  The 2 inside the same subvolume should still report 2 nlinks,
  but the lone one inside the new subvolume can only report 1 nlink.
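
The expected link counts in both cases follow from keying the hard link
records on (st_dev, st_ino, root). The following is a small stand-alone
model of that bookkeeping, not the code from mkfs/rootdir.c: it swaps the
rb tree for a fixed-size array, uses an integer root_id in place of the
btrfs root pointer, and all names (link_table, find_record(), add_name())
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDS 16

/* One record per (host device, host inode, destination subvolume). */
struct link_record {
	unsigned long long st_dev;
	unsigned long long st_ino;
	int root_id;               /* stand-in for the btrfs root pointer */
	unsigned int st_nlink;     /* link count reported by the host fs */
	unsigned int found_nlink;  /* links created so far in this subvolume */
	unsigned long long new_ino; /* inode number in the new filesystem */
	int in_use;
};

struct link_table {
	struct link_record rec[MAX_RECORDS];
	unsigned long long next_ino;
};

static struct link_record *find_record(struct link_table *t,
				       unsigned long long dev,
				       unsigned long long ino, int root_id)
{
	for (size_t i = 0; i < MAX_RECORDS; i++) {
		struct link_record *r = &t->rec[i];

		if (r->in_use && r->st_dev == dev && r->st_ino == ino &&
		    r->root_id == root_id)
			return r;
	}
	return NULL;
}

/*
 * Add one name. Returns the inode number used in the new filesystem and
 * stores in *links how many names that inode has in this model so far.
 */
static unsigned long long add_name(struct link_table *t,
				   unsigned long long dev,
				   unsigned long long ino,
				   unsigned int nlink, int root_id,
				   unsigned int *links)
{
	struct link_record *r = find_record(t, dev, ino, root_id);
	unsigned long long new_ino;

	if (r) {
		/* Same host inode, same subvolume: link to the existing inode. */
		new_ino = r->new_ino;
		*links = ++r->found_nlink;
		/* All links found, the record can be dropped early. */
		if (r->found_nlink >= r->st_nlink)
			r->in_use = 0;
		return new_ino;
	}

	/* First name in this subvolume: allocate a new inode. */
	new_ino = t->next_ino++;
	*links = 1;
	if (nlink > 1) {
		struct link_record *slot = NULL;

		for (size_t i = 0; i < MAX_RECORDS && !slot; i++)
			if (!t->rec[i].in_use)
				slot = &t->rec[i];
		assert(slot != NULL);
		slot->st_dev = dev;
		slot->st_ino = ino;
		slot->root_id = root_id;
		slot->st_nlink = nlink;
		slot->found_nlink = 1;
		slot->new_ino = new_ino;
		slot->in_use = 1;
	}
	return new_ino;
}
```

Feeding three names of one host inode (st_nlink == 3) into the same
root_id yields one inode with 3 links in this model. Putting the third
name under a different root_id yields a second inode with 1 link, and the
first inode stays at 2 links.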

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 mkfs/rootdir.c                              |  8 +--
 tests/mkfs-tests/036-rootdir-subvol/test.sh | 78 +++++++++++++++++----
 2 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index e1ca00f57e60..a24afe0715bb 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -721,7 +721,7 @@ static int ftw_add_inode(const char *full_path, const struct stat *st,
 	parent = rootdir_path_last(&current_path);
 	root = parent->root;
 
-	/* For non-directory inode, check if there is already any hard link. */
+	/* Check if there is already a hard link record for this. */
 	if (have_hard_links) {
 		struct hardlink_entry *found;
 
@@ -767,6 +767,7 @@ static int ftw_add_inode(const char *full_path, const struct stat *st,
 		error("failed to insert inode item %llu for '%s': %m", ino, full_path);
 		return ret;
 	}
+
 	ret = btrfs_add_link(g_trans, root, ino, parent->ino,
 			     full_path + ftwbuf->base,
 			     strlen(full_path) - ftwbuf->base,
@@ -778,10 +779,7 @@ static int ftw_add_inode(const char *full_path, const struct stat *st,
 		return ret;
 	}
 
-	/*
-	 * Found a possible hard link, add it into the hard link rb tree for
-	 * future detection.
-	 */
+	/* Record this new hard link. */
 	if (have_hard_links) {
 		ret = add_hard_link(root, ino, st);
 		if (ret < 0) {
diff --git a/tests/mkfs-tests/036-rootdir-subvol/test.sh b/tests/mkfs-tests/036-rootdir-subvol/test.sh
index 63ba928f348a..38491332170d 100755
--- a/tests/mkfs-tests/036-rootdir-subvol/test.sh
+++ b/tests/mkfs-tests/036-rootdir-subvol/test.sh
@@ -11,23 +11,75 @@ prepare_test_dev
 
 tmp=$(_mktemp_dir mkfs-rootdir)
 
-run_check touch "$tmp/foo"
-run_check mkdir "$tmp/dir"
-run_check mkdir "$tmp/dir/subvol"
-run_check touch "$tmp/dir/subvol/bar"
+basic()
+{
+	run_check touch "$tmp/foo"
+	run_check mkdir "$tmp/dir"
+	run_check mkdir "$tmp/dir/subvol"
+	run_check touch "$tmp/dir/subvol/bar"
 
-run_check_mkfs_test_dev --rootdir "$tmp" --subvol dir/subvol
-run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+	run_check_mkfs_test_dev --rootdir "$tmp" --subvol dir/subvol
+	run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
 
-run_check_mount_test_dev
-run_check_stdout $SUDO_HELPER "$TOP/btrfs" subvolume list "$TEST_MNT" | \
+	run_check_mount_test_dev
+	run_check_stdout $SUDO_HELPER "$TOP/btrfs" subvolume list "$TEST_MNT" | \
 	cut -d\  -f9 > "$tmp/output"
-run_check_umount_test_dev
+	run_check_umount_test_dev
 
-result=$(cat "$tmp/output")
+	result=$(cat "$tmp/output")
 
-if [ "$result" != "dir/subvol" ]; then
-	_fail "dir/subvol not in subvolume list"
-fi
+	if [ "$result" != "dir/subvol" ]; then
+		_fail "dir/subvol not in subvolume list"
+	fi
+	rm -rf -- "$tmp/foo" "$tmp/dir"
+}
 
+basic_hardlinks()
+{
+	run_check touch "$tmp/hl1"
+	run_check ln "$tmp/hl1" "$tmp/hl2"
+	run_check mkdir "$tmp/dir"
+	run_check ln "$tmp/hl1" "$tmp/dir/hl3"
+
+	run_check_mkfs_test_dev --rootdir "$tmp"
+	run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+
+	run_check_mount_test_dev
+	nr_hardlink=$(run_check_stdout $SUDO_HELPER stat -c "%h" "$TEST_MNT/hl1")
+
+	if [ "$nr_hardlink" -ne 3 ]; then
+		_fail "hard link number incorrect, has ${nr_hardlink} expect 3"
+	fi
+	run_check_umount_test_dev
+	rm -rf -- "$tmp/hl1" "$tmp/hl2" "$tmp/dir"
+}
+
+split_by_subvolume_hardlinks()
+{
+	run_check touch "$tmp/hl1"
+	run_check ln "$tmp/hl1" "$tmp/hl2"
+	run_check mkdir "$tmp/subv"
+	run_check ln "$tmp/hl1" "$tmp/subv/hl3"
+
+	run_check_mkfs_test_dev --rootdir "$tmp" --subvol subv
+	run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+
+	run_check_mount_test_dev
+	nr_hardlink=$(run_check_stdout $SUDO_HELPER stat -c "%h" "$TEST_MNT/hl1")
+
+	if [ "$nr_hardlink" -ne 2 ]; then
+		_fail "hard link number incorrect for hl1, has ${nr_hardlink} expect 2"
+	fi
+
+	nr_hardlink=$(run_check_stdout $SUDO_HELPER stat -c "%h" "$TEST_MNT/subv/hl3")
+	if [ "$nr_hardlink" -ne 1 ]; then
+		_fail "hard link number incorrect for subv/hl3, has ${nr_hardlink} expect 1"
+	fi
+	run_check_umount_test_dev
+	rm -rf -- "$tmp/hl1" "$tmp/hl2" "$tmp/subv"
+}
+
+basic
+basic_hardlinks
+split_by_subvolume_hardlinks
 rm -rf -- "$tmp"
-- 
2.46.0

