public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] io path options + reflink (mild security implications)
@ 2024-11-12  3:35 Kent Overstreet
  2024-11-12  3:35 ` [PATCH 1/3] bcachefs: BCH_SB_VERSION_INCOMPAT Kent Overstreet
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-11-12  3:35 UTC (permalink / raw)
  To: linux-bcachefs, linux-fsdevel, linux-kernel
  Cc: Kent Overstreet, brauner, sforshee, viro, jack

so, I've been fleshing out various things at the intersection of io path
options + rebalance + reflink, and this is the last little bit

background: bcachefs has io path options that can be set filesystem
wide, or per inode, and when changed rebalance automatically picks them
up and does the right thing

reflink adds a wrinkle, which is that we'd like e.g. recursively setting
the foreground/background targets on some files to move them to the
appropriate device (or nr_replicas etc.), like other data - but if a
user did a reflink copy of some other user's data and then set
nr_replicas=1, that would be bad.

so this series adds a flag to reflink pointers for "may propagate option
changes", which can then be set at remap_file_range() time based on
vfs level permission checks.

so, question for everyone: is write access to the source file what we
want? or should it be stricter, i.e. ownership matches?

then, we're currently missing mnt_idmap plumbing to remap_file_range()
to do said permissions checks - do we want to do that? or is there an
easier way?

Kent Overstreet (3):
  bcachefs: BCH_SB_VERSION_INCOMPAT
  bcachefs: bcachefs_metadata_version_reflink_p_may_update_opts
  bcachefs: Option changes now get propagated to reflinked data

 fs/bcachefs/bcachefs.h        |  2 ++
 fs/bcachefs/bcachefs_format.h | 28 +++++++++++--------
 fs/bcachefs/fs-io.c           |  9 ++++++-
 fs/bcachefs/move.c            | 51 ++++++++++++++++++++++++++++++-----
 fs/bcachefs/recovery.c        | 27 ++++++++++++++++---
 fs/bcachefs/reflink.c         | 18 ++++++++++---
 fs/bcachefs/reflink.h         |  3 ++-
 fs/bcachefs/reflink_format.h  |  2 ++
 fs/bcachefs/super-io.c        | 48 ++++++++++++++++++++++++++++++---
 fs/bcachefs/super-io.h        | 18 ++++++++++---
 10 files changed, 172 insertions(+), 34 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] bcachefs: BCH_SB_VERSION_INCOMPAT
  2024-11-12  3:35 [PATCH 0/3] io path options + reflink (mild security implications) Kent Overstreet
@ 2024-11-12  3:35 ` Kent Overstreet
  2024-11-12  3:35 ` [PATCH 2/3] bcachefs: bcachefs_metadata_version_reflink_p_may_update_opts Kent Overstreet
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-11-12  3:35 UTC (permalink / raw)
  To: linux-bcachefs, linux-fsdevel, linux-kernel
  Cc: Kent Overstreet, brauner, sforshee, viro, jack

We've been getting away from feature bits: they don't have any kind of
ordering, and thus it's possible for people to enable weird combinations
of features that were never tested or intended to be run.

Much better to just give every new feature, compatible or incompatible,
a version number.

Additionally, we probably won't ever rev the major version number: major
version numbers represent incompatible versions, but that doesn't really
fit with how we actually roll out incompatible features - we need a
better way of rolling out incompatible features.

So, this patch adds two new superblock fields:
- BCH_SB_VERSION_INCOMPAT
- BCH_SB_VERSION_INCOMPAT_ALLOWED

BCH_SB_VERSION_INCOMPAT_ALLOWED indicates that incompatible features up
to version number x are allowed to be used without user prompting, but
it does not by itself deny old versions from mounting.

BCH_SB_VERSION_INCOMPAT does deny old versions from mounting, and must
be <= BCH_SB_VERSION_INCOMPAT_ALLOWED.

BCH_SB_VERSION_INCOMPAT will only be set when a codepath attempts to use
an incompatible feature, so as to not unnecessarily break compatibility
with old versions.

bch2_request_incompat_feature() is the new interface to check if an
incompatible feature may be used.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/bcachefs.h        |  2 ++
 fs/bcachefs/bcachefs_format.h | 25 ++++++++++--------
 fs/bcachefs/recovery.c        | 27 +++++++++++++++++---
 fs/bcachefs/super-io.c        | 48 ++++++++++++++++++++++++++++++++---
 fs/bcachefs/super-io.h        | 18 ++++++++++---
 5 files changed, 99 insertions(+), 21 deletions(-)

diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 18479787f31a..2110c79b276b 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -771,6 +771,8 @@ struct bch_fs {
 		__uuid_t	user_uuid;
 
 		u16		version;
+		u16		version_incompat;
+		u16		version_incompat_allowed;
 		u16		version_min;
 		u16		version_upgrade_complete;
 
diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
index dc14bfe37e3b..043390ce4b53 100644
--- a/fs/bcachefs/bcachefs_format.h
+++ b/fs/bcachefs/bcachefs_format.h
@@ -842,6 +842,9 @@ LE64_BITMASK(BCH_SB_VERSION_UPGRADE_COMPLETE,
 					struct bch_sb, flags[5],  0, 16);
 LE64_BITMASK(BCH_SB_ALLOCATOR_STUCK_TIMEOUT,
 					struct bch_sb, flags[5], 16, 32);
+LE64_BITMASK(BCH_SB_VERSION_INCOMPAT,	struct bch_sb, flags[5], 32, 48);
+LE64_BITMASK(BCH_SB_VERSION_INCOMPAT_ALLOWED,
+					struct bch_sb, flags[5], 48, 64);
 
 static inline __u64 BCH_SB_COMPRESSION_TYPE(const struct bch_sb *sb)
 {
@@ -894,21 +897,23 @@ static inline void SET_BCH_SB_BACKGROUND_COMPRESSION_TYPE(struct bch_sb *sb, __u
 	x(new_varint,			15)	\
 	x(journal_no_flush,		16)	\
 	x(alloc_v2,			17)	\
-	x(extents_across_btree_nodes,	18)
+	x(extents_across_btree_nodes,	18)	\
+	x(incompat_version_field,	19)
 
 #define BCH_SB_FEATURES_ALWAYS				\
-	((1ULL << BCH_FEATURE_new_extent_overwrite)|	\
-	 (1ULL << BCH_FEATURE_extents_above_btree_updates)|\
-	 (1ULL << BCH_FEATURE_btree_updates_journalled)|\
-	 (1ULL << BCH_FEATURE_alloc_v2)|\
-	 (1ULL << BCH_FEATURE_extents_across_btree_nodes))
+	(BIT_ULL(BCH_FEATURE_new_extent_overwrite)|	\
+	 BIT_ULL(BCH_FEATURE_extents_above_btree_updates)|\
+	 BIT_ULL(BCH_FEATURE_btree_updates_journalled)|\
+	 BIT_ULL(BCH_FEATURE_alloc_v2)|\
+	 BIT_ULL(BCH_FEATURE_extents_across_btree_nodes))
 
 #define BCH_SB_FEATURES_ALL				\
 	(BCH_SB_FEATURES_ALWAYS|			\
-	 (1ULL << BCH_FEATURE_new_siphash)|		\
-	 (1ULL << BCH_FEATURE_btree_ptr_v2)|		\
-	 (1ULL << BCH_FEATURE_new_varint)|		\
-	 (1ULL << BCH_FEATURE_journal_no_flush))
+	 BIT_ULL(BCH_FEATURE_new_siphash)|		\
+	 BIT_ULL(BCH_FEATURE_btree_ptr_v2)|		\
+	 BIT_ULL(BCH_FEATURE_new_varint)|		\
+	 BIT_ULL(BCH_FEATURE_journal_no_flush)|		\
+	 BIT_ULL(BCH_FEATURE_incompat_version_field))
 
 enum bch_sb_feature {
 #define x(f, n) BCH_FEATURE_##f,
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 7086a7226989..e252b6ce0923 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -595,6 +595,7 @@ static bool check_version_upgrade(struct bch_fs *c)
 					 bch2_latest_compatible_version(c->sb.version));
 	unsigned old_version = c->sb.version_upgrade_complete ?: c->sb.version;
 	unsigned new_version = 0;
+	bool ret = false;
 
 	if (old_version < bcachefs_metadata_required_upgrade_below) {
 		if (c->opts.version_upgrade == BCH_VERSION_UPGRADE_incompatible ||
@@ -650,14 +651,32 @@ static bool check_version_upgrade(struct bch_fs *c)
 		}
 
 		bch_info(c, "%s", buf.buf);
+		printbuf_exit(&buf);
+
+		ret = true;
+	}
 
-		bch2_sb_upgrade(c, new_version);
+	if (new_version > c->sb.version_incompat &&
+	    c->opts.version_upgrade == BCH_VERSION_UPGRADE_incompatible) {
+		struct printbuf buf = PRINTBUF;
+
+		prt_str(&buf, "Now allowing incompatible features up to ");
+		bch2_version_to_text(&buf, new_version);
+		prt_str(&buf, ", previously allowed up to ");
+		bch2_version_to_text(&buf, c->sb.version_incompat_allowed);
+		prt_newline(&buf);
 
+		bch_info(c, "%s", buf.buf);
 		printbuf_exit(&buf);
-		return true;
+
+		ret = true;
 	}
 
-	return false;
+	if (ret)
+		bch2_sb_upgrade(c, new_version,
+				c->opts.version_upgrade == BCH_VERSION_UPGRADE_incompatible);
+
+	return ret;
 }
 
 int bch2_fs_recovery(struct bch_fs *c)
@@ -1056,7 +1075,7 @@ int bch2_fs_initialize(struct bch_fs *c)
 	bch2_check_version_downgrade(c);
 
 	if (c->opts.version_upgrade != BCH_VERSION_UPGRADE_none) {
-		bch2_sb_upgrade(c, bcachefs_metadata_version_current);
+		bch2_sb_upgrade(c, bcachefs_metadata_version_current, true);
 		SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, bcachefs_metadata_version_current);
 		bch2_write_super(c);
 	}
diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
index 4c29f8215d54..0b5f0ce199de 100644
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@ -42,7 +42,7 @@ static const struct bch2_metadata_version bch2_metadata_versions[] = {
 #undef x
 };
 
-void bch2_version_to_text(struct printbuf *out, unsigned v)
+void bch2_version_to_text(struct printbuf *out, enum bcachefs_metadata_version v)
 {
 	const char *str = "(unknown version)";
 
@@ -55,7 +55,7 @@ void bch2_version_to_text(struct printbuf *out, unsigned v)
 	prt_printf(out, "%u.%u: %s", BCH_VERSION_MAJOR(v), BCH_VERSION_MINOR(v), str);
 }
 
-unsigned bch2_latest_compatible_version(unsigned v)
+enum bcachefs_metadata_version bch2_latest_compatible_version(enum bcachefs_metadata_version v)
 {
 	if (!BCH_VERSION_MAJOR(v))
 		return v;
@@ -69,6 +69,16 @@ unsigned bch2_latest_compatible_version(unsigned v)
 	return v;
 }
 
+void bch2_set_version_incompat(struct bch_fs *c, enum bcachefs_metadata_version version)
+{
+	mutex_lock(&c->sb_lock);
+	SET_BCH_SB_VERSION_INCOMPAT(c->disk_sb.sb,
+		max(BCH_SB_VERSION_INCOMPAT(c->disk_sb.sb), version));
+	c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_FEATURE_incompat_version_field);
+	bch2_write_super(c);
+	mutex_unlock(&c->sb_lock);
+}
+
 const char * const bch2_sb_fields[] = {
 #define x(name, nr)	#name,
 	BCH_SB_FIELDS()
@@ -364,7 +374,8 @@ static int bch2_sb_validate(struct bch_sb_handle *disk_sb,
 		return ret;
 
 	if (sb->features[1] ||
-	    (le64_to_cpu(sb->features[0]) & (~0ULL << BCH_FEATURE_NR))) {
+	    (le64_to_cpu(sb->features[0]) & (~0ULL << BCH_FEATURE_NR)) ||
+	    BCH_SB_VERSION_INCOMPAT(sb) > bcachefs_metadata_version_current) {
 		prt_printf(out, "Filesystem has incompatible features");
 		return -BCH_ERR_invalid_sb_features;
 	}
@@ -407,6 +418,20 @@ static int bch2_sb_validate(struct bch_sb_handle *disk_sb,
 		return -BCH_ERR_invalid_sb_time_precision;
 	}
 
+	if (BCH_SB_VERSION_INCOMPAT(sb) > BCH_SB_VERSION_INCOMPAT_ALLOWED(sb)) {
+		prt_printf(out, "Invalid version_incompat > incompat_allowed: %llu > %llu",
+			   BCH_SB_VERSION_INCOMPAT(sb),
+			   BCH_SB_VERSION_INCOMPAT_ALLOWED(sb));
+		return -BCH_ERR_invalid_sb_version;
+	}
+
+	if (BCH_SB_VERSION_INCOMPAT_ALLOWED(sb) > le16_to_cpu(sb->version)) {
+		prt_printf(out, "Invalid incompat_allowed > version: %llu > %u",
+			   BCH_SB_VERSION_INCOMPAT_ALLOWED(sb),
+			   le16_to_cpu(sb->version));
+		return -BCH_ERR_invalid_sb_version;
+	}
+
 	if (!flags) {
 		/*
 		 * Been seeing a bug where these are getting inexplicably
@@ -520,6 +545,9 @@ static void bch2_sb_update(struct bch_fs *c)
 	c->sb.uuid		= src->uuid;
 	c->sb.user_uuid		= src->user_uuid;
 	c->sb.version		= le16_to_cpu(src->version);
+	c->sb.version_incompat	= BCH_SB_VERSION_INCOMPAT(src);
+	c->sb.version_incompat_allowed
+				= BCH_SB_VERSION_INCOMPAT_ALLOWED(src);
 	c->sb.version_min	= le16_to_cpu(src->version_min);
 	c->sb.version_upgrade_complete = BCH_SB_VERSION_UPGRADE_COMPLETE(src);
 	c->sb.nr_devices	= src->nr_devices;
@@ -1159,7 +1187,7 @@ bool bch2_check_version_downgrade(struct bch_fs *c)
 	return ret;
 }
 
-void bch2_sb_upgrade(struct bch_fs *c, unsigned new_version)
+void bch2_sb_upgrade(struct bch_fs *c, unsigned new_version, bool incompat)
 {
 	lockdep_assert_held(&c->sb_lock);
 
@@ -1169,6 +1197,10 @@ void bch2_sb_upgrade(struct bch_fs *c, unsigned new_version)
 
 	c->disk_sb.sb->version = cpu_to_le16(new_version);
 	c->disk_sb.sb->features[0] |= cpu_to_le64(BCH_SB_FEATURES_ALL);
+
+	if (incompat)
+		SET_BCH_SB_VERSION_INCOMPAT_ALLOWED(c->disk_sb.sb,
+			max(BCH_SB_VERSION_INCOMPAT_ALLOWED(c->disk_sb.sb), new_version));
 }
 
 static int bch2_sb_ext_validate(struct bch_sb *sb, struct bch_sb_field *f,
@@ -1333,6 +1365,14 @@ void bch2_sb_to_text(struct printbuf *out, struct bch_sb *sb,
 	bch2_version_to_text(out, le16_to_cpu(sb->version));
 	prt_newline(out);
 
+	prt_printf(out, "Incompatible features allowed:\t");
+	bch2_version_to_text(out, BCH_SB_VERSION_INCOMPAT_ALLOWED(sb));
+	prt_newline(out);
+
+	prt_printf(out, "Incompatible features in use:\t");
+	bch2_version_to_text(out, BCH_SB_VERSION_INCOMPAT(sb));
+	prt_newline(out);
+
 	prt_printf(out, "Version upgrade complete:\t");
 	bch2_version_to_text(out, BCH_SB_VERSION_UPGRADE_COMPLETE(sb));
 	prt_newline(out);
diff --git a/fs/bcachefs/super-io.h b/fs/bcachefs/super-io.h
index 90e7b176cdd0..25b4bfe7a25e 100644
--- a/fs/bcachefs/super-io.h
+++ b/fs/bcachefs/super-io.h
@@ -18,8 +18,20 @@ static inline bool bch2_version_compatible(u16 version)
 		version >= bcachefs_metadata_version_min;
 }
 
-void bch2_version_to_text(struct printbuf *, unsigned);
-unsigned bch2_latest_compatible_version(unsigned);
+void bch2_version_to_text(struct printbuf *, enum bcachefs_metadata_version);
+enum bcachefs_metadata_version bch2_latest_compatible_version(enum bcachefs_metadata_version);
+
+void bch2_set_version_incompat(struct bch_fs *, enum bcachefs_metadata_version);
+
+static inline bool bch2_request_incompat_feature(struct bch_fs *c,
+						 enum bcachefs_metadata_version version)
+{
+	if (version < c->sb.version_incompat_allowed)
+		return false;
+	if (version > c->sb.version_incompat)
+		bch2_set_version_incompat(c, version);
+	return true;
+}
 
 static inline size_t bch2_sb_field_bytes(struct bch_sb_field *f)
 {
@@ -94,7 +106,7 @@ static inline void bch2_check_set_feature(struct bch_fs *c, unsigned feat)
 }
 
 bool bch2_check_version_downgrade(struct bch_fs *);
-void bch2_sb_upgrade(struct bch_fs *, unsigned);
+void bch2_sb_upgrade(struct bch_fs *, unsigned, bool);
 
 void __bch2_sb_field_to_text(struct printbuf *, struct bch_sb *,
 			     struct bch_sb_field *);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] bcachefs: bcachefs_metadata_version_reflink_p_may_update_opts
  2024-11-12  3:35 [PATCH 0/3] io path options + reflink (mild security implications) Kent Overstreet
  2024-11-12  3:35 ` [PATCH 1/3] bcachefs: BCH_SB_VERSION_INCOMPAT Kent Overstreet
@ 2024-11-12  3:35 ` Kent Overstreet
  2024-11-12  3:35 ` [PATCH 3/3] bcachefs: Option changes now get propagated to reflinked data Kent Overstreet
  2024-11-12 10:53 ` [PATCH 0/3] io path options + reflink (mild security implications) Jan Kara
  3 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-11-12  3:35 UTC (permalink / raw)
  To: linux-bcachefs, linux-fsdevel, linux-kernel
  Cc: Kent Overstreet, brauner, sforshee, viro, jack

Previously, io path option changes on a file would be picked up
automatically and applied to existing data - but not for reflinked data,
as we had no way of doing this safely. A user may have had permission to
copy (and reflink) a given file, but not write to it, and if so they
shouldn't be allowed to change e.g. nr_replicas or other options.

This uses the incompat feature mechanism in the previous patch to add a
new incompatible flag to bch_reflink_p, indicating whether a given
reflink pointer may propagate io path option changes back to the
indirect extent.

In this initial patch we're only setting it for the source extents.

We'd like to set it for the destination in a reflink copy, when the user
has write access to the source, but that requires mnt_idmap which is not
curretly plumbed up to remap_file_range.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/bcachefs_format.h |  3 ++-
 fs/bcachefs/fs-io.c           |  9 ++++++++-
 fs/bcachefs/reflink.c         | 18 +++++++++++++++---
 fs/bcachefs/reflink.h         |  3 ++-
 fs/bcachefs/reflink_format.h  |  2 ++
 5 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
index 043390ce4b53..8cdf4ddf2d5c 100644
--- a/fs/bcachefs/bcachefs_format.h
+++ b/fs/bcachefs/bcachefs_format.h
@@ -677,7 +677,8 @@ struct bch_sb_field_ext {
 	x(disk_accounting_v3,		BCH_VERSION(1, 10))		\
 	x(disk_accounting_inum,		BCH_VERSION(1, 11))		\
 	x(rebalance_work_acct_fix,	BCH_VERSION(1, 12))		\
-	x(inode_has_child_snapshots,	BCH_VERSION(1, 13))
+	x(inode_has_child_snapshots,	BCH_VERSION(1, 13))		\
+	x(reflink_p_may_update_opts,	BCH_VERSION(1, 14))
 
 enum bcachefs_metadata_version {
 	bcachefs_metadata_version_min = 9,
diff --git a/fs/bcachefs/fs-io.c b/fs/bcachefs/fs-io.c
index c6fdfec51082..12a747f20f6d 100644
--- a/fs/bcachefs/fs-io.c
+++ b/fs/bcachefs/fs-io.c
@@ -877,11 +877,18 @@ loff_t bch2_remap_file_range(struct file *file_src, loff_t pos_src,
 	bch2_mark_pagecache_unallocated(src, pos_src >> 9,
 				   (pos_src + aligned_len) >> 9);
 
+	/*
+	 * XXX: we'd like to be telling bch2_remap_range() if we have
+	 * permission to write to the source file, and thus if io path option
+	 * changes should be propagated through the copy, but we need mnt_idmap
+	 * from the pathwalk, awkward
+	 */
 	ret = bch2_remap_range(c,
 			       inode_inum(dst), pos_dst >> 9,
 			       inode_inum(src), pos_src >> 9,
 			       aligned_len >> 9,
-			       pos_dst + len, &i_sectors_delta);
+			       pos_dst + len, &i_sectors_delta,
+			       false);
 	if (ret < 0)
 		goto err;
 
diff --git a/fs/bcachefs/reflink.c b/fs/bcachefs/reflink.c
index 38db5a011702..c243cd9aa63b 100644
--- a/fs/bcachefs/reflink.c
+++ b/fs/bcachefs/reflink.c
@@ -482,7 +482,8 @@ int bch2_trigger_indirect_inline_data(struct btree_trans *trans,
 
 static int bch2_make_extent_indirect(struct btree_trans *trans,
 				     struct btree_iter *extent_iter,
-				     struct bkey_i *orig)
+				     struct bkey_i *orig,
+				     bool reflink_p_may_update_opts_field)
 {
 	struct bch_fs *c = trans->c;
 	struct btree_iter reflink_iter = { NULL };
@@ -548,6 +549,9 @@ static int bch2_make_extent_indirect(struct btree_trans *trans,
 
 	SET_REFLINK_P_IDX(&r_p->v, bkey_start_offset(&r_v->k));
 
+	if (reflink_p_may_update_opts_field)
+		SET_REFLINK_P_MAY_UPDATE_OPTIONS(&r_p->v, true);
+
 	ret = bch2_trans_update(trans, extent_iter, &r_p->k_i,
 				BTREE_UPDATE_internal_snapshot_node);
 err:
@@ -578,7 +582,8 @@ s64 bch2_remap_range(struct bch_fs *c,
 		     subvol_inum dst_inum, u64 dst_offset,
 		     subvol_inum src_inum, u64 src_offset,
 		     u64 remap_sectors,
-		     u64 new_i_size, s64 *i_sectors_delta)
+		     u64 new_i_size, s64 *i_sectors_delta,
+		     bool may_change_src_io_path_opts)
 {
 	struct btree_trans *trans;
 	struct btree_iter dst_iter, src_iter;
@@ -591,6 +596,8 @@ s64 bch2_remap_range(struct bch_fs *c,
 	struct bpos src_want;
 	u64 dst_done = 0;
 	u32 dst_snapshot, src_snapshot;
+	bool reflink_p_may_update_opts_field =
+		bch2_request_incompat_feature(c, bcachefs_metadata_version_reflink_p_may_update_opts);
 	int ret = 0, ret2 = 0;
 
 	if (!bch2_write_ref_tryget(c, BCH_WRITE_REF_reflink))
@@ -672,7 +679,8 @@ s64 bch2_remap_range(struct bch_fs *c,
 			src_k = bkey_i_to_s_c(new_src.k);
 
 			ret = bch2_make_extent_indirect(trans, &src_iter,
-						new_src.k);
+						new_src.k,
+						reflink_p_may_update_opts_field);
 			if (ret)
 				continue;
 
@@ -690,6 +698,10 @@ s64 bch2_remap_range(struct bch_fs *c,
 				 bkey_start_offset(src_k.k));
 
 			SET_REFLINK_P_IDX(&dst_p->v, offset);
+
+			if (reflink_p_may_update_opts_field &&
+			    may_change_src_io_path_opts)
+				SET_REFLINK_P_MAY_UPDATE_OPTIONS(&dst_p->v, true);
 		} else {
 			BUG();
 		}
diff --git a/fs/bcachefs/reflink.h b/fs/bcachefs/reflink.h
index b61a4bdd8e82..ddeb0803af67 100644
--- a/fs/bcachefs/reflink.h
+++ b/fs/bcachefs/reflink.h
@@ -78,7 +78,8 @@ struct bkey_s_c bch2_lookup_indirect_extent(struct btree_trans *, struct btree_i
 					    bool, unsigned);
 
 s64 bch2_remap_range(struct bch_fs *, subvol_inum, u64,
-		     subvol_inum, u64, u64, u64, s64 *);
+		     subvol_inum, u64, u64, u64, s64 *,
+		     bool);
 
 int bch2_gc_reflink_done(struct bch_fs *);
 int bch2_gc_reflink_start(struct bch_fs *);
diff --git a/fs/bcachefs/reflink_format.h b/fs/bcachefs/reflink_format.h
index 53502627b2c5..92995e4f898e 100644
--- a/fs/bcachefs/reflink_format.h
+++ b/fs/bcachefs/reflink_format.h
@@ -19,6 +19,8 @@ struct bch_reflink_p {
 
 LE64_BITMASK(REFLINK_P_IDX,	struct bch_reflink_p, idx_flags,  0, 56);
 LE64_BITMASK(REFLINK_P_ERROR,	struct bch_reflink_p, idx_flags, 56, 57);
+LE64_BITMASK(REFLINK_P_MAY_UPDATE_OPTIONS,
+				struct bch_reflink_p, idx_flags, 57, 58);
 
 struct bch_reflink_v {
 	struct bch_val		v;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] bcachefs: Option changes now get propagated to reflinked data
  2024-11-12  3:35 [PATCH 0/3] io path options + reflink (mild security implications) Kent Overstreet
  2024-11-12  3:35 ` [PATCH 1/3] bcachefs: BCH_SB_VERSION_INCOMPAT Kent Overstreet
  2024-11-12  3:35 ` [PATCH 2/3] bcachefs: bcachefs_metadata_version_reflink_p_may_update_opts Kent Overstreet
@ 2024-11-12  3:35 ` Kent Overstreet
  2024-11-12 10:53 ` [PATCH 0/3] io path options + reflink (mild security implications) Jan Kara
  3 siblings, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-11-12  3:35 UTC (permalink / raw)
  To: linux-bcachefs, linux-fsdevel, linux-kernel
  Cc: Kent Overstreet, brauner, sforshee, viro, jack

Now that bch2_move_get_io_opts() re-propagates changed inode io options
to bch_extent_rebalance, we can properly suport changing IO path options
for reflinked data.

Changing a per-file IO path option, either via the xattr interface or
via the BCHFS_IOC_REINHERIT_ATTRS ioctl, will now trigger a scan (the
inode number is marked as needing a scan, via
bch2_set_rebalance_needs_scan()), and rebalance will use
bch2_move_data(), which will walk the inode number and pick up the new
options.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/move.c | 51 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index a6b503278519..27f885cf998a 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -22,6 +22,7 @@
 #include "keylist.h"
 #include "move.h"
 #include "rebalance.h"
+#include "reflink.h"
 #include "replicas.h"
 #include "snapshot.h"
 #include "super-io.h"
@@ -389,6 +390,7 @@ int bch2_move_extent(struct moving_context *ctxt,
 
 static struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
 			  struct per_snapshot_io_opts *io_opts,
+			  struct bpos extent_pos, /* extent_iter, extent_k may be in reflink btree */
 			  struct btree_iter *extent_iter,
 			  struct bkey_s_c extent_k)
 {
@@ -400,12 +402,12 @@ static struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
 	if (extent_k.k->type == KEY_TYPE_reflink_v)
 		goto out;
 
-	if (io_opts->cur_inum != extent_k.k->p.inode) {
+	if (io_opts->cur_inum != extent_pos.inode) {
 		io_opts->d.nr = 0;
 
-		ret = for_each_btree_key(trans, iter, BTREE_ID_inodes, POS(0, extent_k.k->p.inode),
+		ret = for_each_btree_key(trans, iter, BTREE_ID_inodes, POS(0, extent_pos.inode),
 					 BTREE_ITER_all_snapshots, k, ({
-			if (k.k->p.offset != extent_k.k->p.inode)
+			if (k.k->p.offset != extent_pos.inode)
 				break;
 
 			if (!bkey_is_inode(k.k))
@@ -419,7 +421,7 @@ static struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
 
 			darray_push(&io_opts->d, e);
 		}));
-		io_opts->cur_inum = extent_k.k->p.inode;
+		io_opts->cur_inum = extent_pos.inode;
 	}
 
 	ret = ret ?: trans_was_restarted(trans, restart_count);
@@ -525,9 +527,15 @@ static int bch2_move_data_btree(struct moving_context *ctxt,
 	struct per_snapshot_io_opts snapshot_io_opts;
 	struct bch_io_opts *io_opts;
 	struct bkey_buf sk;
-	struct btree_iter iter;
+	struct btree_iter iter, reflink_iter = {};
 	struct bkey_s_c k;
 	struct data_update_opts data_opts;
+	/*
+	 * If we're moving a single file, also process reflinked data it points
+	 * to (this includes propagating changed io_opts from the inode to the
+	 * extent):
+	 */
+	bool walk_indirect = start.inode == end.inode;
 	int ret = 0, ret2;
 
 	per_snapshot_io_opts_init(&snapshot_io_opts, c);
@@ -547,6 +555,8 @@ static int bch2_move_data_btree(struct moving_context *ctxt,
 		bch2_ratelimit_reset(ctxt->rate);
 
 	while (!bch2_move_ratelimit(ctxt)) {
+		struct btree_iter *extent_iter = &iter;
+
 		bch2_trans_begin(trans);
 
 		k = bch2_btree_iter_peek(&iter);
@@ -565,10 +575,36 @@ static int bch2_move_data_btree(struct moving_context *ctxt,
 		if (ctxt->stats)
 			ctxt->stats->pos = BBPOS(iter.btree_id, iter.pos);
 
+		if (walk_indirect &&
+		    k.k->type == KEY_TYPE_reflink_p &&
+		    REFLINK_P_MAY_UPDATE_OPTIONS(bkey_s_c_to_reflink_p(k).v)) {
+			struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k);
+			s64 offset_into_extent	= iter.pos.offset - bkey_start_offset(k.k);
+
+			bch2_trans_iter_exit(trans, &reflink_iter);
+			k = bch2_lookup_indirect_extent(trans, &reflink_iter, &offset_into_extent, p, true, 0);
+			ret = bkey_err(k);
+			if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
+				continue;
+			if (ret)
+				break;
+
+			if (bkey_deleted(k.k))
+				goto next_nondata;
+
+			/*
+			 * XXX: reflink pointers may point to multiple indirect
+			 * extents, so don't advance past the entire reflink
+			 * pointer - need to fixup iter->k
+			 */
+			extent_iter = &reflink_iter;
+		}
+
 		if (!bkey_extent_is_direct_data(k.k))
 			goto next_nondata;
 
-		io_opts = bch2_move_get_io_opts(trans, &snapshot_io_opts, &iter, k);
+		io_opts = bch2_move_get_io_opts(trans, &snapshot_io_opts,
+						iter.pos, extent_iter, k);
 		ret = PTR_ERR_OR_ZERO(io_opts);
 		if (ret)
 			continue;
@@ -584,7 +620,7 @@ static int bch2_move_data_btree(struct moving_context *ctxt,
 		bch2_bkey_buf_reassemble(&sk, c, k);
 		k = bkey_i_to_s_c(sk.k);
 
-		ret2 = bch2_move_extent(ctxt, NULL, &iter, k, *io_opts, data_opts);
+		ret2 = bch2_move_extent(ctxt, NULL, extent_iter, k, *io_opts, data_opts);
 		if (ret2) {
 			if (bch2_err_matches(ret2, BCH_ERR_transaction_restart))
 				continue;
@@ -605,6 +641,7 @@ static int bch2_move_data_btree(struct moving_context *ctxt,
 		bch2_btree_iter_advance(&iter);
 	}
 
+	bch2_trans_iter_exit(trans, &reflink_iter);
 	bch2_trans_iter_exit(trans, &iter);
 	bch2_bkey_buf_exit(&sk, c);
 	per_snapshot_io_opts_exit(&snapshot_io_opts);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/3] io path options + reflink (mild security implications)
  2024-11-12  3:35 [PATCH 0/3] io path options + reflink (mild security implications) Kent Overstreet
                   ` (2 preceding siblings ...)
  2024-11-12  3:35 ` [PATCH 3/3] bcachefs: Option changes now get propagated to reflinked data Kent Overstreet
@ 2024-11-12 10:53 ` Jan Kara
  3 siblings, 0 replies; 5+ messages in thread
From: Jan Kara @ 2024-11-12 10:53 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcachefs, linux-fsdevel, linux-kernel, brauner, sforshee,
	viro, jack

On Mon 11-11-24 22:35:32, Kent Overstreet wrote:
> so, I've been fleshing out various things at the intersection of io path
> options + rebalance + reflink, and this is the last little bit
> 
> background: bcachefs has io path options that can be set filesystem
> wide, or per inode, and when changed rebalance automatically picks them
> up and does the right thing
> 
> reflink adds a wrinkle, which is that we'd like e.g. recursively setting
> the foreground/background targets on some files to move them to the
> appropriate device (or nr_replicas etc.), like other data - but if a
> user did a reflink copy of some other user's data and then set
> nr_replicas=1, that would be bad.
> 
> so this series adds a flag to reflink pointers for "may propagate option
> changes", which can then be set at remap_file_range() time based on
> vfs level permission checks.
> 
> so, question for everyone: is write access to the source file what we
> want? or should it be stricter, i.e. ownership matches?

Well, if I understand the impact properly, this seems similar to the
effects vfs_fileattr_set() can have on a file and there we have:

        if (!inode_owner_or_capable(idmap, inode))
                return -EPERM;

So I'd say ownership match would be more consistent.

> then, we're currently missing mnt_idmap plumbing to remap_file_range()
> to do said permissions checks - do we want to do that? or is there an
> easier way?

Well, if you have struct file, you have the mount and thus idmap through
file_mnt_idmap(file). And struct file is available in remap_file_range()
call stack so I'm not sure what is the problem exactly.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-11-12 10:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-12  3:35 [PATCH 0/3] io path options + reflink (mild security implications) Kent Overstreet
2024-11-12  3:35 ` [PATCH 1/3] bcachefs: BCH_SB_VERSION_INCOMPAT Kent Overstreet
2024-11-12  3:35 ` [PATCH 2/3] bcachefs: bcachefs_metadata_version_reflink_p_may_update_opts Kent Overstreet
2024-11-12  3:35 ` [PATCH 3/3] bcachefs: Option changes now get propagated to reflinked data Kent Overstreet
2024-11-12 10:53 ` [PATCH 0/3] io path options + reflink (mild security implications) Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox