linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] vfs: security: Parse dev_name before calling security_sb_mount
@ 2025-07-08 23:05 Song Liu
  2025-07-09 10:24 ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-07-08 23:05 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel, linux-security-module, apparmor,
	selinux, tomoyo-users_en, tomoyo-users_ja
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, mic, gnoack, m, john.johansen, john, stephen.smalley.work,
	omosnace, takedakn, penguin-kernel, enlightened, Song Liu

security_sb_mount handles multiple types of mounts: new mount, bind
mount, etc. When parameter dev_name is a path, it need to be parsed
with kern_path.

Move the parsing of dev_name to path_mount, and pass the result to
security_sb_mount, so that:
1. The LSMs do not need to call kern_path again;
2. For BPF LSM, we can use struct path dev_path, which is much easier to
   use than a string.
3. We can now remove do_move_mount_old.

Also, move may_mount check to before security_sb_mount and potential
kern_path, so that requests without proper capability will be rejected
sooner.

Signed-off-by: Song Liu <song@kernel.org>

---
The primary motivation of this change is to monitor bind mount and move
mount in BPF LSM. There are a few options for this to work:
1. Introduce bpf_kern_path kfunc.
2. Add new hook(s), such as [1].
3. Something like this patch.

At this moment, I think this patch is the best solution.

New mount for filesystems with FS_REQUIRES_DEV also need kern_path for
dev_name. apparmor and tomoyo still call kern_path in such cases.
However, it is a bit tricky to move this kern_path call to path_mount,
so new mount path is not changed in this version.

[1] https://lore.kernel.org/linux-security-module/20250110021008.2704246-1-enlightened@chromium.org/
---
 fs/namespace.c                    | 142 ++++++++++++++++++------------
 include/linux/lsm_hook_defs.h     |   3 +-
 include/linux/security.h          |   7 +-
 security/apparmor/include/mount.h |   5 +-
 security/apparmor/lsm.c           |   8 +-
 security/apparmor/mount.c         |  31 +------
 security/landlock/fs.c            |   2 +-
 security/security.c               |   6 +-
 security/selinux/hooks.c          |   1 +
 security/tomoyo/common.h          |   3 +-
 security/tomoyo/mount.c           |  36 +++++---
 security/tomoyo/tomoyo.c          |   6 +-
 12 files changed, 136 insertions(+), 114 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e13d9ab4f564..89f7fcd23b6c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3056,37 +3056,28 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 /*
  * do loopback mount.
  */
-static int do_loopback(struct path *path, const char *old_name,
+static int do_loopback(struct path *path, struct path *old_path,
 				int recurse)
 {
-	struct path old_path;
 	struct mount *mnt = NULL, *parent;
 	struct mountpoint *mp;
-	int err;
-	if (!old_name || !*old_name)
-		return -EINVAL;
-	err = kern_path(old_name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &old_path);
-	if (err)
-		return err;
+	int err = -EINVAL;
 
-	err = -EINVAL;
-	if (mnt_ns_loop(old_path.dentry))
-		goto out;
+	if (mnt_ns_loop(old_path->dentry))
+		return -EINVAL;
 
 	mp = lock_mount(path);
-	if (IS_ERR(mp)) {
-		err = PTR_ERR(mp);
-		goto out;
-	}
+	if (IS_ERR(mp))
+		return PTR_ERR(mp);
 
 	parent = real_mount(path->mnt);
 	if (!check_mnt(parent))
-		goto out2;
+		goto out;
 
-	mnt = __do_loopback(&old_path, recurse);
+	mnt = __do_loopback(old_path, recurse);
 	if (IS_ERR(mnt)) {
 		err = PTR_ERR(mnt);
-		goto out2;
+		goto out;
 	}
 
 	err = graft_tree(mnt, parent, mp);
@@ -3095,10 +3086,8 @@ static int do_loopback(struct path *path, const char *old_name,
 		umount_tree(mnt, UMOUNT_SYNC);
 		unlock_mount_hash();
 	}
-out2:
-	unlock_mount(mp);
 out:
-	path_put(&old_path);
+	unlock_mount(mp);
 	return err;
 }
 
@@ -3741,23 +3730,6 @@ static int do_move_mount(struct path *old_path,
 	return err;
 }
 
-static int do_move_mount_old(struct path *path, const char *old_name)
-{
-	struct path old_path;
-	int err;
-
-	if (!old_name || !*old_name)
-		return -EINVAL;
-
-	err = kern_path(old_name, LOOKUP_FOLLOW, &old_path);
-	if (err)
-		return err;
-
-	err = do_move_mount(&old_path, path, 0);
-	path_put(&old_path);
-	return err;
-}
-
 /*
  * add a mount into a namespace's mount tree
  */
@@ -4117,6 +4089,31 @@ static char *copy_mount_string(const void __user *data)
 	return data ? strndup_user(data, PATH_MAX) : NULL;
 }
 
+enum mount_operation {
+	MOUNT_OP_RECONFIGURE,
+	MOUNT_OP_REMOUNT,
+	MOUNT_OP_BIND,
+	MOUNT_OP_CHANGE_TYPE,
+	MOUNT_OP_MOVE,
+	MOUNT_OP_NEW,
+};
+
+static enum mount_operation get_mount_op(unsigned long flags)
+{
+	if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
+		return MOUNT_OP_RECONFIGURE;
+	if (flags & MS_REMOUNT)
+		return MOUNT_OP_REMOUNT;
+	if (flags & MS_BIND)
+		return MOUNT_OP_BIND;
+	if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
+		return MOUNT_OP_CHANGE_TYPE;
+	if (flags & MS_MOVE)
+		return MOUNT_OP_MOVE;
+
+	return MOUNT_OP_NEW;
+}
+
 /*
  * Flags is a 32-bit value that allows up to 31 non-fs dependent flags to
  * be given to the mount() call (ie: read-only, no-dev, no-suid etc).
@@ -4135,6 +4132,8 @@ int path_mount(const char *dev_name, struct path *path,
 		const char *type_page, unsigned long flags, void *data_page)
 {
 	unsigned int mnt_flags = 0, sb_flags;
+	enum mount_operation op;
+	struct path dev_path = {};
 	int ret;
 
 	/* Discard magic */
@@ -4148,11 +4147,29 @@ int path_mount(const char *dev_name, struct path *path,
 	if (flags & MS_NOUSER)
 		return -EINVAL;
 
-	ret = security_sb_mount(dev_name, path, type_page, flags, data_page);
+	if (!may_mount()) {
+		ret = -EPERM;
+		goto out;
+	}
+
+	op = get_mount_op(flags);
+
+	if (op == MOUNT_OP_BIND || op == MOUNT_OP_MOVE) {
+		unsigned int lookup_flags = LOOKUP_FOLLOW;
+
+		if (!dev_name || !*dev_name)
+			return -EINVAL;
+
+		if (op == MOUNT_OP_BIND)
+			lookup_flags |= LOOKUP_AUTOMOUNT;
+		ret = kern_path(dev_name, lookup_flags, &dev_path);
+		if (ret)
+			return ret;
+	}
+
+	ret = security_sb_mount(dev_name, &dev_path, path, type_page, flags, data_page);
 	if (ret)
-		return ret;
-	if (!may_mount())
-		return -EPERM;
+		goto out;
 	if (flags & SB_MANDLOCK)
 		warn_mandlock();
 
@@ -4195,19 +4212,34 @@ int path_mount(const char *dev_name, struct path *path,
 			    SB_LAZYTIME |
 			    SB_I_VERSION);
 
-	if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
-		return do_reconfigure_mnt(path, mnt_flags);
-	if (flags & MS_REMOUNT)
-		return do_remount(path, flags, sb_flags, mnt_flags, data_page);
-	if (flags & MS_BIND)
-		return do_loopback(path, dev_name, flags & MS_REC);
-	if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
-		return do_change_type(path, flags);
-	if (flags & MS_MOVE)
-		return do_move_mount_old(path, dev_name);
-
-	return do_new_mount(path, type_page, sb_flags, mnt_flags, dev_name,
-			    data_page);
+	switch (op) {
+	case MOUNT_OP_RECONFIGURE:
+		ret = do_reconfigure_mnt(path, mnt_flags);
+		break;
+	case MOUNT_OP_REMOUNT:
+		ret = do_remount(path, flags, sb_flags, mnt_flags, data_page);
+		break;
+	case MOUNT_OP_BIND:
+		ret = do_loopback(path, &dev_path, flags & MS_REC);
+		break;
+	case MOUNT_OP_CHANGE_TYPE:
+		ret = do_change_type(path, flags);
+		break;
+	case MOUNT_OP_MOVE:
+		ret = do_move_mount(&dev_path, path, 0);
+		break;
+	case MOUNT_OP_NEW:
+		ret = do_new_mount(path, type_page, sb_flags, mnt_flags, dev_name,
+				   data_page);
+		break;
+	default:
+		/* unknown op? */
+		WARN_ON_ONCE(1);
+		break;
+	}
+out:
+	path_put(&dev_path);
+	return ret;
 }
 
 int do_mount(const char *dev_name, const char __user *dir_name,
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index bf3bbac4e02a..b5b9f6fc78e2 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -69,7 +69,8 @@ LSM_HOOK(int, 0, sb_remount, struct super_block *sb, void *mnt_opts)
 LSM_HOOK(int, 0, sb_kern_mount, const struct super_block *sb)
 LSM_HOOK(int, 0, sb_show_options, struct seq_file *m, struct super_block *sb)
 LSM_HOOK(int, 0, sb_statfs, struct dentry *dentry)
-LSM_HOOK(int, 0, sb_mount, const char *dev_name, const struct path *path,
+LSM_HOOK(int, 0, sb_mount, const char *dev_name, const struct path *dev_path,
+	 const struct path *path,
 	 const char *type, unsigned long flags, void *data)
 LSM_HOOK(int, 0, sb_umount, struct vfsmount *mnt, int flags)
 LSM_HOOK(int, 0, sb_pivotroot, const struct path *old_path,
diff --git a/include/linux/security.h b/include/linux/security.h
index e8d9f6069f0c..af3f38c33ba0 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -373,7 +373,8 @@ int security_sb_remount(struct super_block *sb, void *mnt_opts);
 int security_sb_kern_mount(const struct super_block *sb);
 int security_sb_show_options(struct seq_file *m, struct super_block *sb);
 int security_sb_statfs(struct dentry *dentry);
-int security_sb_mount(const char *dev_name, const struct path *path,
+int security_sb_mount(const char *dev_name, const struct path *dev_path,
+		      const struct path *path,
 		      const char *type, unsigned long flags, void *data);
 int security_sb_umount(struct vfsmount *mnt, int flags);
 int security_sb_pivotroot(const struct path *old_path, const struct path *new_path);
@@ -803,7 +804,9 @@ static inline int security_sb_statfs(struct dentry *dentry)
 	return 0;
 }
 
-static inline int security_sb_mount(const char *dev_name, const struct path *path,
+static inline int security_sb_mount(const char *dev_name,
+				    const struct path *dev_path,
+				    const struct path *path,
 				    const char *type, unsigned long flags,
 				    void *data)
 {
diff --git a/security/apparmor/include/mount.h b/security/apparmor/include/mount.h
index 46834f828179..0beb626f1e8b 100644
--- a/security/apparmor/include/mount.h
+++ b/security/apparmor/include/mount.h
@@ -31,16 +31,13 @@ int aa_remount(const struct cred *subj_cred,
 
 int aa_bind_mount(const struct cred *subj_cred,
 		  struct aa_label *label, const struct path *path,
-		  const char *old_name, unsigned long flags);
+		  const struct path *old_path, unsigned long flags);
 
 
 int aa_mount_change_type(const struct cred *subj_cred,
 			 struct aa_label *label, const struct path *path,
 			 unsigned long flags);
 
-int aa_move_mount_old(const struct cred *subj_cred,
-		      struct aa_label *label, const struct path *path,
-		      const char *old_name);
 int aa_move_mount(const struct cred *subj_cred,
 		  struct aa_label *label, const struct path *from_path,
 		  const struct path *to_path);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 9b6c2f157f83..b9f2bd8e9d3a 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -701,7 +701,8 @@ static int apparmor_uring_sqpoll(void)
 }
 #endif /* CONFIG_IO_URING */
 
-static int apparmor_sb_mount(const char *dev_name, const struct path *path,
+static int apparmor_sb_mount(const char *dev_name, const struct path *dev_path,
+			     const struct path *path,
 			     const char *type, unsigned long flags, void *data)
 {
 	struct aa_label *label;
@@ -720,14 +721,13 @@ static int apparmor_sb_mount(const char *dev_name, const struct path *path,
 					   data);
 		else if (flags & MS_BIND)
 			error = aa_bind_mount(current_cred(), label, path,
-					      dev_name, flags);
+					      dev_path, flags);
 		else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE |
 				  MS_UNBINDABLE))
 			error = aa_mount_change_type(current_cred(), label,
 						     path, flags);
 		else if (flags & MS_MOVE)
-			error = aa_move_mount_old(current_cred(), label, path,
-						  dev_name);
+			error = aa_move_mount(current_cred(), label, dev_path, path);
 		else
 			error = aa_new_mount(current_cred(), label, dev_name,
 					     path, type, flags, data);
diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
index bf8863253e07..00c8acf9d8f9 100644
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -421,25 +421,17 @@ int aa_remount(const struct cred *subj_cred,
 
 int aa_bind_mount(const struct cred *subj_cred,
 		  struct aa_label *label, const struct path *path,
-		  const char *dev_name, unsigned long flags)
+		  const struct path *old_path, unsigned long flags)
 {
 	struct aa_profile *profile;
 	char *buffer = NULL, *old_buffer = NULL;
-	struct path old_path;
 	int error;
 
 	AA_BUG(!label);
 	AA_BUG(!path);
 
-	if (!dev_name || !*dev_name)
-		return -EINVAL;
-
 	flags &= MS_REC | MS_BIND;
 
-	error = kern_path(dev_name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &old_path);
-	if (error)
-		return error;
-
 	buffer = aa_get_buffer(false);
 	old_buffer = aa_get_buffer(false);
 	error = -ENOMEM;
@@ -447,12 +439,11 @@ int aa_bind_mount(const struct cred *subj_cred,
 		goto out;
 
 	error = fn_for_each_confined(label, profile,
-			match_mnt(subj_cred, profile, path, buffer, &old_path,
+			match_mnt(subj_cred, profile, path, buffer, old_path,
 				  old_buffer, NULL, flags, NULL, false));
 out:
 	aa_put_buffer(buffer);
 	aa_put_buffer(old_buffer);
-	path_put(&old_path);
 
 	return error;
 }
@@ -516,24 +507,6 @@ int aa_move_mount(const struct cred *subj_cred,
 	return error;
 }
 
-int aa_move_mount_old(const struct cred *subj_cred, struct aa_label *label,
-		      const struct path *path, const char *orig_name)
-{
-	struct path old_path;
-	int error;
-
-	if (!orig_name || !*orig_name)
-		return -EINVAL;
-	error = kern_path(orig_name, LOOKUP_FOLLOW, &old_path);
-	if (error)
-		return error;
-
-	error = aa_move_mount(subj_cred, label, &old_path, path);
-	path_put(&old_path);
-
-	return error;
-}
-
 int aa_new_mount(const struct cred *subj_cred, struct aa_label *label,
 		 const char *dev_name, const struct path *path,
 		 const char *type, unsigned long flags, void *data)
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 6fee7c20f64d..508ec65f3297 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1418,7 +1418,7 @@ static void log_fs_change_topology_dentry(
  * inherit these new constraints.  Anyway, for backward compatibility reasons,
  * a dedicated user space option would be required (e.g. as a ruleset flag).
  */
-static int hook_sb_mount(const char *const dev_name,
+static int hook_sb_mount(const char *const dev_name, const struct path *dev_path,
 			 const struct path *const path, const char *const type,
 			 const unsigned long flags, void *const data)
 {
diff --git a/security/security.c b/security/security.c
index fc8405928cc7..e52a1195e91e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1550,6 +1550,7 @@ int security_sb_statfs(struct dentry *dentry)
 /**
  * security_sb_mount() - Check permission for mounting a filesystem
  * @dev_name: filesystem backing device
+ * @dev_path: path of filesystem backing device
  * @path: mount point
  * @type: filesystem type
  * @flags: mount flags
@@ -1564,10 +1565,11 @@ int security_sb_statfs(struct dentry *dentry)
  *
  * Return: Returns 0 if permission is granted.
  */
-int security_sb_mount(const char *dev_name, const struct path *path,
+int security_sb_mount(const char *dev_name, const struct path *dev_path,
+		      const struct path *path,
 		      const char *type, unsigned long flags, void *data)
 {
-	return call_int_hook(sb_mount, dev_name, path, type, flags, data);
+	return call_int_hook(sb_mount, dev_name, dev_path, path, type, flags, data);
 }
 
 /**
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 595ceb314aeb..a2c240ffd1e1 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2764,6 +2764,7 @@ static int selinux_sb_statfs(struct dentry *dentry)
 }
 
 static int selinux_mount(const char *dev_name,
+			 const struct path *dev_path,
 			 const struct path *path,
 			 const char *type,
 			 unsigned long flags,
diff --git a/security/tomoyo/common.h b/security/tomoyo/common.h
index 0e8e2e959aef..9b7d190f573e 100644
--- a/security/tomoyo/common.h
+++ b/security/tomoyo/common.h
@@ -980,7 +980,8 @@ int tomoyo_init_request_info(struct tomoyo_request_info *r,
 			     const u8 index);
 int tomoyo_mkdev_perm(const u8 operation, const struct path *path,
 		      const unsigned int mode, unsigned int dev);
-int tomoyo_mount_permission(const char *dev_name, const struct path *path,
+int tomoyo_mount_permission(const char *dev_name, const struct path *dev_path,
+			    const struct path *path,
 			    const char *type, unsigned long flags,
 			    void *data_page);
 int tomoyo_open_control(const u8 type, struct file *file);
diff --git a/security/tomoyo/mount.c b/security/tomoyo/mount.c
index 2755971f50df..ee10cbfbf798 100644
--- a/security/tomoyo/mount.c
+++ b/security/tomoyo/mount.c
@@ -76,11 +76,12 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
  */
 static int tomoyo_mount_acl(struct tomoyo_request_info *r,
 			    const char *dev_name,
+			    const struct path *dev_path,
 			    const struct path *dir, const char *type,
 			    unsigned long flags)
 {
 	struct tomoyo_obj_info obj = { };
-	struct path path;
+	struct path path = { };
 	struct file_system_type *fstype = NULL;
 	const char *requested_type = NULL;
 	const char *requested_dir_name = NULL;
@@ -132,17 +133,25 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
 			need_dev = 1;
 	}
 	if (need_dev) {
-		/* Get mount point or device file. */
-		if (!dev_name || kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
-			error = -ENOENT;
+		error = -ENOENT;
+		if (!dev_name)
 			goto out;
+
+		if (need_dev == -1) {
+			/* bind mount or move mount */
+			if (!dev_path->dentry)
+				goto out;
+
+			obj.path1 = *dev_path;
+		} else {
+			/* new mount */
+			if (kern_path(dev_name, LOOKUP_FOLLOW, &path))
+				goto out;
+			obj.path1 = path;
 		}
-		obj.path1 = path;
-		requested_dev_name = tomoyo_realpath_from_path(&path);
-		if (!requested_dev_name) {
-			error = -ENOENT;
+		requested_dev_name = tomoyo_realpath_from_path(&obj.path1);
+		if (!requested_dev_name)
 			goto out;
-		}
 	} else {
 		/* Map dev_name to "<NULL>" if no dev_name given. */
 		if (!dev_name)
@@ -172,8 +181,8 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
 		put_filesystem(fstype);
 	kfree(requested_type);
 	/* Drop refcount obtained by kern_path(). */
-	if (obj.path1.dentry)
-		path_put(&obj.path1);
+	if (path.dentry)
+		path_put(&path);
 	return error;
 }
 
@@ -188,7 +197,8 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
  *
  * Returns 0 on success, negative value otherwise.
  */
-int tomoyo_mount_permission(const char *dev_name, const struct path *path,
+int tomoyo_mount_permission(const char *dev_name, const struct path *dev_path,
+			    const struct path *path,
 			    const char *type, unsigned long flags,
 			    void *data_page)
 {
@@ -234,7 +244,7 @@ int tomoyo_mount_permission(const char *dev_name, const struct path *path,
 	if (!type)
 		type = "<NULL>";
 	idx = tomoyo_read_lock();
-	error = tomoyo_mount_acl(&r, dev_name, path, type, flags);
+	error = tomoyo_mount_acl(&r, dev_name, dev_path, path, type, flags);
 	tomoyo_read_unlock(idx);
 	return error;
 }
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index d6ebcd9db80a..58e7a295f9b9 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -402,6 +402,7 @@ static int tomoyo_path_chroot(const struct path *path)
  * tomoyo_sb_mount - Target for security_sb_mount().
  *
  * @dev_name: Name of device file. Maybe NULL.
+ * @dev_path: Path to of device file. Maybe zero'ed.
  * @path:     Pointer to "struct path".
  * @type:     Name of filesystem type. Maybe NULL.
  * @flags:    Mount options.
@@ -409,10 +410,11 @@ static int tomoyo_path_chroot(const struct path *path)
  *
  * Returns 0 on success, negative value otherwise.
  */
-static int tomoyo_sb_mount(const char *dev_name, const struct path *path,
+static int tomoyo_sb_mount(const char *dev_name, const struct path *dev_path,
+			   const struct path *path,
 			   const char *type, unsigned long flags, void *data)
 {
-	return tomoyo_mount_permission(dev_name, path, type, flags, data);
+	return tomoyo_mount_permission(dev_name, dev_path, path, type, flags, data);
 }
 
 /**
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-08 23:05 [RFC] vfs: security: Parse dev_name before calling security_sb_mount Song Liu
@ 2025-07-09 10:24 ` Al Viro
  2025-07-09 15:19   ` Paul Moore
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2025-07-09 10:24 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module, apparmor,
	selinux, tomoyo-users_en, tomoyo-users_ja, kernel-team, andrii,
	eddyz87, ast, daniel, martin.lau, brauner, jack, kpsingh,
	mattbobrowski, amir73il, repnop, jlayton, josef, mic, gnoack, m,
	john.johansen, john, stephen.smalley.work, omosnace, takedakn,
	penguin-kernel, enlightened

On Tue, Jul 08, 2025 at 04:05:04PM -0700, Song Liu wrote:
> security_sb_mount handles multiple types of mounts: new mount, bind
> mount, etc. When parameter dev_name is a path, it need to be parsed
> with kern_path.
> 
> Move the parsing of dev_name to path_mount, and pass the result to
> security_sb_mount, so that:
> 1. The LSMs do not need to call kern_path again;
> 2. For BPF LSM, we can use struct path dev_path, which is much easier to
>    use than a string.
> 3. We can now remove do_move_mount_old.
> 
> Also, move may_mount check to before security_sb_mount and potential
> kern_path, so that requests without proper capability will be rejected
> sooner.
> 
> Signed-off-by: Song Liu <song@kernel.org>
> 
> ---
> The primary motivation of this change is to monitor bind mount and move
> mount in BPF LSM. There are a few options for this to work:
> 1. Introduce bpf_kern_path kfunc.
> 2. Add new hook(s), such as [1].
> 3. Something like this patch.
> 
> At this moment, I think this patch is the best solution.
> 
> New mount for filesystems with FS_REQUIRES_DEV also need kern_path for
> dev_name. apparmor and tomoyo still call kern_path in such cases.
> However, it is a bit tricky to move this kern_path call to path_mount,
> so new mount path is not changed in this version.

security_sb_mount() is and had always been a mind-boggling trash of an API.

It makes no sense in terms of operations being requested.  And any questions
regarding its semantics had been consistently met with blanket "piss off,
LSM gets to do whatever it wants to do, you are not to question the sanity
and you are not to request any kind of rules - give us the fucking syscall
arguments and let us at it".

Come up with a saner API.  We are done accomodating that idiocy.  The only
changes you get to make in fs/namespace.c are "here's our better-defined
hooks, please call <this hook> when you do <that>".

NAKed-by: Al Viro <viro@zeniv.linux.org.uk>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-09 10:24 ` Al Viro
@ 2025-07-09 15:19   ` Paul Moore
  2025-07-09 17:06     ` Song Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Moore @ 2025-07-09 15:19 UTC (permalink / raw)
  To: Al Viro
  Cc: Song Liu, bpf, linux-fsdevel, linux-kernel, linux-security-module,
	apparmor, selinux, tomoyo-users_en, tomoyo-users_ja, kernel-team,
	andrii, eddyz87, ast, daniel, martin.lau, brauner, jack, kpsingh,
	mattbobrowski, amir73il, repnop, jlayton, josef, mic, gnoack, m,
	john.johansen, john, stephen.smalley.work, omosnace, takedakn,
	penguin-kernel, enlightened

On Wed, Jul 9, 2025 at 6:24 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jul 08, 2025 at 04:05:04PM -0700, Song Liu wrote:
> > security_sb_mount handles multiple types of mounts: new mount, bind
> > mount, etc. When parameter dev_name is a path, it need to be parsed
> > with kern_path.

...

> security_sb_mount() is and had always been a mind-boggling trash of an API.
>
> It makes no sense in terms of operations being requested.  And any questions
> regarding its semantics had been consistently met with blanket "piss off,
> LSM gets to do whatever it wants to do, you are not to question the sanity
> and you are not to request any kind of rules - give us the fucking syscall
> arguments and let us at it".

I'm not going to comment on past remarks made by other devs, but I do
want to make it clear that I am interested in making sure we have LSM
hooks which satisfy both the needs of the existing in-tree LSMs while
also presenting a sane API to the kernel subsystems in which they are
placed.  I'm happy to revisit any of our existing LSM hooks to
restructure them to better fit these goals; simply send some patches
and let's discuss them.

> Come up with a saner API.  We are done accomodating that idiocy.  The only
> changes you get to make in fs/namespace.c are "here's our better-defined
> hooks, please call <this hook> when you do <that>".

I don't have the cycles to revisit the security_sb_mount() hook
myself, but perhaps Song Liu does with some suggestions/guidance from
you or Christian on what an improved LSM hook would look like from a
VFS perspective.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-09 15:19   ` Paul Moore
@ 2025-07-09 17:06     ` Song Liu
  2025-07-10 11:46       ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-07-09 17:06 UTC (permalink / raw)
  To: Paul Moore
  Cc: Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, brauner@kernel.org,
	jack@suse.cz, kpsingh@kernel.org, mattbobrowski@google.com,
	amir73il@gmail.com, repnop@google.com, jlayton@kernel.org,
	josef@toxicpanda.com, mic@digikod.net, gnoack@google.com,
	m@maowtm.org, john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org

Hi Al and Paul, 

Thanks for your comments!

> On Jul 9, 2025, at 8:19 AM, Paul Moore <paul@paul-moore.com> wrote:
> 
> On Wed, Jul 9, 2025 at 6:24 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Tue, Jul 08, 2025 at 04:05:04PM -0700, Song Liu wrote:
>>> security_sb_mount handles multiple types of mounts: new mount, bind
>>> mount, etc. When parameter dev_name is a path, it need to be parsed
>>> with kern_path.
> 
> ...
> 
>> security_sb_mount() is and had always been a mind-boggling trash of an API.
>> 
>> It makes no sense in terms of operations being requested.  And any questions
>> regarding its semantics had been consistently met with blanket "piss off,
>> LSM gets to do whatever it wants to do, you are not to question the sanity
>> and you are not to request any kind of rules - give us the fucking syscall
>> arguments and let us at it".
> 
> I'm not going to comment on past remarks made by other devs, but I do
> want to make it clear that I am interested in making sure we have LSM
> hooks which satisfy both the needs of the existing in-tree LSMs while
> also presenting a sane API to the kernel subsystems in which they are
> placed.  I'm happy to revisit any of our existing LSM hooks to
> restructure them to better fit these goals; simply send some patches
> and let's discuss them.
> 
>> Come up with a saner API.  We are done accomodating that idiocy.  The only
>> changes you get to make in fs/namespace.c are "here's our better-defined
>> hooks, please call <this hook> when you do <that>".

Right now, we have security_sb_mount and security_move_mount, for 
syscall “mount” and “move_mount” respectively. This is confusing 
because we can also do move mount with syscall “mount”. How about 
we create 5 different security hooks:

security_bind_mount
security_new_mount
security_reconfigure_mount
security_remount
security_change_type_mount

and remove security_sb_mount. After this, we will have 6 hooks for
each type of mount (the 5 above plus security_move_mount). 

> 
> I don't have the cycles to revisit the security_sb_mount() hook
> myself, but perhaps Song Liu does with some suggestions/guidance from
> you or Christian on what an improved LSM hook would look like from a
> VFS perspective.

Thanks,
Song



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-09 17:06     ` Song Liu
@ 2025-07-10 11:46       ` Christian Brauner
  2025-07-10 17:00         ` Song Liu
  2025-07-10 21:40         ` Paul Moore
  0 siblings, 2 replies; 16+ messages in thread
From: Christian Brauner @ 2025-07-10 11:46 UTC (permalink / raw)
  To: Song Liu
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org

On Wed, Jul 09, 2025 at 05:06:36PM +0000, Song Liu wrote:
> Hi Al and Paul, 
> 
> Thanks for your comments!
> 
> > On Jul 9, 2025, at 8:19 AM, Paul Moore <paul@paul-moore.com> wrote:
> > 
> > On Wed, Jul 9, 2025 at 6:24 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> On Tue, Jul 08, 2025 at 04:05:04PM -0700, Song Liu wrote:
> >>> security_sb_mount handles multiple types of mounts: new mount, bind
> >>> mount, etc. When parameter dev_name is a path, it need to be parsed
> >>> with kern_path.
> > 
> > ...
> > 
> >> security_sb_mount() is and had always been a mind-boggling trash of an API.
> >> 
> >> It makes no sense in terms of operations being requested.  And any questions
> >> regarding its semantics had been consistently met with blanket "piss off,
> >> LSM gets to do whatever it wants to do, you are not to question the sanity
> >> and you are not to request any kind of rules - give us the fucking syscall
> >> arguments and let us at it".
> > 
> > I'm not going to comment on past remarks made by other devs, but I do
> > want to make it clear that I am interested in making sure we have LSM
> > hooks which satisfy both the needs of the existing in-tree LSMs while
> > also presenting a sane API to the kernel subsystems in which they are
> > placed.  I'm happy to revisit any of our existing LSM hooks to
> > restructure them to better fit these goals; simply send some patches
> > and let's discuss them.
> > 
> >> Come up with a saner API.  We are done accomodating that idiocy.  The only
> >> changes you get to make in fs/namespace.c are "here's our better-defined
> >> hooks, please call <this hook> when you do <that>".
> 
> Right now, we have security_sb_mount and security_move_mount, for 
> syscall “mount” and “move_mount” respectively. This is confusing 
> because we can also do move mount with syscall “mount”. How about 
> we create 5 different security hooks:
> 
> security_bind_mount
> security_new_mount
> security_reconfigure_mount
> security_remount
> security_change_type_mount
> 
> and remove security_sb_mount. After this, we will have 6 hooks for
> each type of mount (the 5 above plus security_move_mount).

I've multiple times pointed out that the current mount security hooks
aren't working and basically everything in the new mount api is
unsupervised from an LSM perspective.

My recommendation is make a list of all the currently supported
security_*() hooks in the mount code (I certainly don't have them in my
head). Figure out what each of them allow to mediate effectively and how
the callchains are related.

Then make a proposal how to replace them with something that a) doesn't
cause regressions which is probably something that the LSMs care about
and b) that covers the new mount API sufficiently to be properly
mediated.

I'll happily review proposals. Fwiw, I'm pretty sure that this is
something that Mickael is interested in as well.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-10 11:46       ` Christian Brauner
@ 2025-07-10 17:00         ` Song Liu
  2025-07-11  9:36           ` Christian Brauner
  2025-07-10 21:40         ` Paul Moore
  1 sibling, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-07-10 17:00 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org



> On Jul 10, 2025, at 4:46 AM, Christian Brauner <brauner@kernel.org> wrote:

[...]

>> Right now, we have security_sb_mount and security_move_mount, for 
>> syscall “mount” and “move_mount” respectively. This is confusing 
>> because we can also do move mount with syscall “mount”. How about 
>> we create 5 different security hooks:
>> 
>> security_bind_mount
>> security_new_mount
>> security_reconfigure_mount
>> security_remount
>> security_change_type_mount
>> 
>> and remove security_sb_mount. After this, we will have 6 hooks for
>> each type of mount (the 5 above plus security_move_mount).
> 
> I've multiple times pointed out that the current mount security hooks
> aren't working and basically everything in the new mount api is
> unsupervised from an LSM perspective.

To make sure I understand the comment. By “new mount api”, do you mean 
the code path under do_new_mount()? 

> My recommendation is make a list of all the currently supported
> security_*() hooks in the mount code (I certainly don't have them in my
> head). Figure out what each of them allow to mediate effectively and how
> the callchains are related.
> 
> Then make a proposal how to replace them with something that a) doesn't
> cause regressions which is probably something that the LSMs care about
> and b) that covers the new mount API sufficiently to be properly
> mediated.
> 
> I'll happily review proposals. Fwiw, I'm pretty sure that this is
> something that Mickael is interested in as well.

So we will consider a proper redesign of LSM hooks for mount syscalls, 
but we do not want incremental improvements like this one. Do I get 
the direction right?

Thanks,
Song


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-10 11:46       ` Christian Brauner
  2025-07-10 17:00         ` Song Liu
@ 2025-07-10 21:40         ` Paul Moore
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Moore @ 2025-07-10 21:40 UTC (permalink / raw)
  To: Song Liu
  Cc: Christian Brauner, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org

On Thu, Jul 10, 2025 at 7:46 AM Christian Brauner <brauner@kernel.org> wrote:
> On Wed, Jul 09, 2025 at 05:06:36PM +0000, Song Liu wrote:

...

> I'll happily review proposals. Fwiw, I'm pretty sure that this is
> something that Mickael is interested in as well.

As a gentle reminder, please be sure to include the LSM list on these
efforts, at the absolute least I want to review the patches, and I'm
sure the other individual LSM subsystem maintainers will surely want
to take a look too.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-10 17:00         ` Song Liu
@ 2025-07-11  9:36           ` Christian Brauner
  2025-07-11 16:22             ` Song Liu
  2025-07-11 23:09             ` Song Liu
  0 siblings, 2 replies; 16+ messages in thread
From: Christian Brauner @ 2025-07-11  9:36 UTC (permalink / raw)
  To: Song Liu
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org

On Thu, Jul 10, 2025 at 05:00:18PM +0000, Song Liu wrote:
> 
> 
> > On Jul 10, 2025, at 4:46 AM, Christian Brauner <brauner@kernel.org> wrote:
> 
> [...]
> 
> >> Right now, we have security_sb_mount and security_move_mount, for 
> >> syscall “mount” and “move_mount” respectively. This is confusing 
> >> because we can also do move mount with syscall “mount”. How about 
> >> we create 5 different security hooks:
> >> 
> >> security_bind_mount
> >> security_new_mount
> >> security_reconfigure_mount
> >> security_remount
> >> security_change_type_mount
> >> 
> >> and remove security_sb_mount. After this, we will have 6 hooks for
> >> each type of mount (the 5 above plus security_move_mount).
> > 
> > I've multiple times pointed out that the current mount security hooks
> > aren't working and basically everything in the new mount api is
> > unsupervised from an LSM perspective.
> 
> To make sure I understand the comment. By “new mount api”, do you mean 
> the code path under do_new_mount()? 

fsopen()
fsconfig()
fsmount()
open_tree()
open_tree_attr()
move_mount()
statmount()
listmount()

I think that's all.

> 
> > My recommendation is make a list of all the currently supported
> > security_*() hooks in the mount code (I certainly don't have them in my
> > head). Figure out what each of them allow to mediate effectively and how
> > the callchains are related.
> > 
> > Then make a proposal how to replace them with something that a) doesn't
> > cause regressions which is probably something that the LSMs care about
> > and b) that covers the new mount API sufficiently to be properly
> > mediated.
> > 
> > I'll happily review proposals. Fwiw, I'm pretty sure that this is
> > something that Mickael is interested in as well.
> 
> So we will consider a proper redesign of LSM hooks for mount syscalls, 
> but we do not want incremental improvements like this one. Do I get 
> the direction right?

If incremental is workable then I think so yes. But it would be great to
get a consistent picture of what people want/need.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-11  9:36           ` Christian Brauner
@ 2025-07-11 16:22             ` Song Liu
  2025-07-14  8:45               ` Christian Brauner
  2025-07-11 23:09             ` Song Liu
  1 sibling, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-07-11 16:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org



> On Jul 11, 2025, at 2:36 AM, Christian Brauner <brauner@kernel.org> wrote:

[...]

>>> 
>> To make sure I understand the comment. By “new mount api”, do you mean 
>> the code path under do_new_mount()?
> 
> fsopen()
> fsconfig()
> fsmount()
> open_tree()
> open_tree_attr()
> move_mount()
> statmount()
> listmount()
> 
> I think that's all.

Thanks for the clarification and pointer!

> 
>> 
>>> My recommendation is make a list of all the currently supported
>>> security_*() hooks in the mount code (I certainly don't have them in my
>>> head). Figure out what each of them allow to mediate effectively and how
>>> the callchains are related.
>>> 
>>> Then make a proposal how to replace them with something that a) doesn't
>>> cause regressions which is probably something that the LSMs care about
>>> and b) that covers the new mount API sufficiently to be properly
>>> mediated.
>>> 
>>> I'll happily review proposals. Fwiw, I'm pretty sure that this is
>>> something that Mickael is interested in as well.
>> 
>> So we will consider a proper redesign of LSM hooks for mount syscalls, 
>> but we do not want incremental improvements like this one. Do I get 
>> the direction right?
> 
> If incremental is workable then I think so yes. But it would be great to
> get a consistent picture of what people want/need.

In short term, we would like a way to get struct path of dev_name for  
bind mount. AFAICT, there are a few options:

1. Introduce bpf_kern_path kfunc.
2. Add new hook(s), such as [1].
3. Something like this patch.

[1] https://lore.kernel.org/linux-security-module/20250110021008.2704246-1-enlightened@chromium.org/

Do you think we can ship one of them? 

Thanks,
Song


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-11  9:36           ` Christian Brauner
  2025-07-11 16:22             ` Song Liu
@ 2025-07-11 23:09             ` Song Liu
  1 sibling, 0 replies; 16+ messages in thread
From: Song Liu @ 2025-07-11 23:09 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org



> On Jul 11, 2025, at 2:36 AM, Christian Brauner <brauner@kernel.org> wrote:
> 
> On Thu, Jul 10, 2025 at 05:00:18PM +0000, Song Liu wrote:
>> 
>> 
>>> On Jul 10, 2025, at 4:46 AM, Christian Brauner <brauner@kernel.org> wrote:
>> 
>> [...]
>> 
>>>> Right now, we have security_sb_mount and security_move_mount, for 
>>>> syscall “mount” and “move_mount” respectively. This is confusing 
>>>> because we can also do move mount with syscall “mount”. How about 
>>>> we create 5 different security hooks:
>>>> 
>>>> security_bind_mount
>>>> security_new_mount
>>>> security_reconfigure_mount
>>>> security_remount
>>>> security_change_type_mount
>>>> 
>>>> and remove security_sb_mount. After this, we will have 6 hooks for
>>>> each type of mount (the 5 above plus security_move_mount).
>>> 
>>> I've multiple times pointed out that the current mount security hooks
>>> aren't working and basically everything in the new mount api is
>>> unsupervised from an LSM perspective.
>> 
>> To make sure I understand the comment. By “new mount api”, do you mean 
>> the code path under do_new_mount()?
> 
> fsopen()
> fsconfig()
> fsmount()
> open_tree()
> open_tree_attr()
> move_mount()
> statmount()
> listmount()
> 
> I think that's all.

Reading the code, I think we also need to cover fspick. 

Thanks,
Song




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-11 16:22             ` Song Liu
@ 2025-07-14  8:45               ` Christian Brauner
  2025-07-14 15:10                 ` Song Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2025-07-14  8:45 UTC (permalink / raw)
  To: Song Liu
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org

On Fri, Jul 11, 2025 at 04:22:52PM +0000, Song Liu wrote:
> 
> 
> > On Jul 11, 2025, at 2:36 AM, Christian Brauner <brauner@kernel.org> wrote:
> 
> [...]
> 
> >>> 
> >> To make sure I understand the comment. By “new mount api”, do you mean 
> >> the code path under do_new_mount()?
> > 
> > fsopen()
> > fsconfig()
> > fsmount()
> > open_tree()
> > open_tree_attr()
> > move_mount()
> > statmount()
> > listmount()
> > 
> > I think that's all.
> 
> Thanks for the clarification and pointer!
> 
> > 
> >> 
> >>> My recommendation is make a list of all the currently supported
> >>> security_*() hooks in the mount code (I certainly don't have them in my
> >>> head). Figure out what each of them allow to mediate effectively and how
> >>> the callchains are related.
> >>> 
> >>> Then make a proposal how to replace them with something that a) doesn't
> >>> cause regressions which is probably something that the LSMs care about
> >>> and b) that covers the new mount API sufficiently to be properly
> >>> mediated.
> >>> 
> >>> I'll happily review proposals. Fwiw, I'm pretty sure that this is
> >>> something that Mickael is interested in as well.
> >> 
> >> So we will consider a proper redesign of LSM hooks for mount syscalls, 
> >> but we do not want incremental improvements like this one. Do I get 
> >> the direction right?
> > 
> > If incremental is workable then I think so yes. But it would be great to
> > get a consistent picture of what people want/need.
> 
> In short term, we would like a way to get struct path of dev_name for  

You scared me for a second. By "dev_name" you mean the source path.

> bind mount. AFAICT, there are a few options:
> 
> 1. Introduce bpf_kern_path kfunc.
> 2. Add new hook(s), such as [1].
> 3. Something like this patch.
> 
> [1] https://lore.kernel.org/linux-security-module/20250110021008.2704246-1-enlightened@chromium.org/
> 
> Do you think we can ship one of them? 

If you place a new security hook into __do_loopback() the only thing
that I'm not excited about is that we're holding the global namespace
semaphore at that point. And I want to have as little LSM hook calls
under the namespace semaphore as possible.

If you have 1000 containers each calling into
security_something_something_bind_mount() and then you do your "walk
upwards towards the root stuff" and that root is 100000 directories away
you've introduced a proper DOS or at least a severe new bottleneck into
the system. And because of mount namespace propagation that needs to be
serialized across all mount namespaces the namespace semaphore isn't
something we can just massage away.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-14  8:45               ` Christian Brauner
@ 2025-07-14 15:10                 ` Song Liu
  2025-07-15 10:18                   ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-07-14 15:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org



> On Jul 14, 2025, at 1:45 AM, Christian Brauner <brauner@kernel.org> wrote:
> 
> On Fri, Jul 11, 2025 at 04:22:52PM +0000, Song Liu wrote:
>> 
>> 
>>> On Jul 11, 2025, at 2:36 AM, Christian Brauner <brauner@kernel.org> wrote:
>> 
>> [...]
>> 
>>>>> 
>>>> To make sure I understand the comment. By “new mount api”, do you mean 
>>>> the code path under do_new_mount()?
>>> 
>>> fsopen()
>>> fsconfig()
>>> fsmount()
>>> open_tree()
>>> open_tree_attr()
>>> move_mount()
>>> statmount()
>>> listmount()
>>> 
>>> I think that's all.
>> 
>> Thanks for the clarification and pointer!
>> 
>>> 
>>>> 
>>>>> My recommendation is make a list of all the currently supported
>>>>> security_*() hooks in the mount code (I certainly don't have them in my
>>>>> head). Figure out what each of them allow to mediate effectively and how
>>>>> the callchains are related.
>>>>> 
>>>>> Then make a proposal how to replace them with something that a) doesn't
>>>>> cause regressions which is probably something that the LSMs care about
>>>>> and b) that covers the new mount API sufficiently to be properly
>>>>> mediated.
>>>>> 
>>>>> I'll happily review proposals. Fwiw, I'm pretty sure that this is
>>>>> something that Mickael is interested in as well.
>>>> 
>>>> So we will consider a proper redesign of LSM hooks for mount syscalls, 
>>>> but we do not want incremental improvements like this one. Do I get 
>>>> the direction right?
>>> 
>>> If incremental is workable then I think so yes. But it would be great to
>>> get a consistent picture of what people want/need.
>> 
>> In short term, we would like a way to get struct path of dev_name for  
> 
> You scared me for a second. By "dev_name" you mean the source path.

Right, we need to get struct path for the source path specified by 
string “dev_name”.

> 
>> bind mount. AFAICT, there are a few options:
>> 
>> 1. Introduce bpf_kern_path kfunc.
>> 2. Add new hook(s), such as [1].
>> 3. Something like this patch.
>> 
>> [1] https://lore.kernel.org/linux-security-module/20250110021008.2704246-1-enlightened@chromium.org/ 
>> 
>> Do you think we can ship one of them?
> 
> If you place a new security hook into __do_loopback() the only thing
> that I'm not excited about is that we're holding the global namespace
> semaphore at that point. And I want to have as little LSM hook calls
> under the namespace semaphore as possible.

do_loopback() changed a bit since [1]. But if we put the new hook 
in do_loopback() before lock_mount(), we don’t have the problem with
the namespace semaphore, right? Also, this RFC doesn’t seem to have 
this issue either. 


> If you have 1000 containers each calling into
> security_something_something_bind_mount() and then you do your "walk
> upwards towards the root stuff" and that root is 100000 directories away
> you've introduced a proper DOS or at least a severe new bottleneck into
> the system. And because of mount namespace propagation that needs to be
> serialized across all mount namespaces the namespace semaphore isn't
> something we can just massage away.

AFAICT, a poorly designed LSM can easily DoS a system. Therefore, I 
don’t think we need to overthink about a LSM helper causing DoS in 
some special scenarios. The owner of the LSM, either built-in LSM or 
BPF LSM, need to be aware of such risks and design the LSM rules 
properly to avoid DoS risks. For example, if the path tree is really 
deep, the LSM may decide to block the mount after walking a preset 
number of steps. 

Thanks,
Song



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-14 15:10                 ` Song Liu
@ 2025-07-15 10:18                   ` Christian Brauner
  2025-07-15 22:31                     ` Song Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2025-07-15 10:18 UTC (permalink / raw)
  To: Song Liu
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org

On Mon, Jul 14, 2025 at 03:10:57PM +0000, Song Liu wrote:
> 
> 
> > On Jul 14, 2025, at 1:45 AM, Christian Brauner <brauner@kernel.org> wrote:
> > 
> > On Fri, Jul 11, 2025 at 04:22:52PM +0000, Song Liu wrote:
> >> 
> >> 
> >>> On Jul 11, 2025, at 2:36 AM, Christian Brauner <brauner@kernel.org> wrote:
> >> 
> >> [...]
> >> 
> >>>>> 
> >>>> To make sure I understand the comment. By “new mount api”, do you mean 
> >>>> the code path under do_new_mount()?
> >>> 
> >>> fsopen()
> >>> fsconfig()
> >>> fsmount()
> >>> open_tree()
> >>> open_tree_attr()
> >>> move_mount()
> >>> statmount()
> >>> listmount()
> >>> 
> >>> I think that's all.
> >> 
> >> Thanks for the clarification and pointer!
> >> 
> >>> 
> >>>> 
> >>>>> My recommendation is make a list of all the currently supported
> >>>>> security_*() hooks in the mount code (I certainly don't have them in my
> >>>>> head). Figure out what each of them allow to mediate effectively and how
> >>>>> the callchains are related.
> >>>>> 
> >>>>> Then make a proposal how to replace them with something that a) doesn't
> >>>>> cause regressions which is probably something that the LSMs care about
> >>>>> and b) that covers the new mount API sufficiently to be properly
> >>>>> mediated.
> >>>>> 
> >>>>> I'll happily review proposals. Fwiw, I'm pretty sure that this is
> >>>>> something that Mickael is interested in as well.
> >>>> 
> >>>> So we will consider a proper redesign of LSM hooks for mount syscalls, 
> >>>> but we do not want incremental improvements like this one. Do I get 
> >>>> the direction right?
> >>> 
> >>> If incremental is workable then I think so yes. But it would be great to
> >>> get a consistent picture of what people want/need.
> >> 
> >> In short term, we would like a way to get struct path of dev_name for  
> > 
> > You scared me for a second. By "dev_name" you mean the source path.
> 
> Right, we need to get struct path for the source path specified by 
> string “dev_name”.
> 
> > 
> >> bind mount. AFAICT, there are a few options:
> >> 
> >> 1. Introduce bpf_kern_path kfunc.
> >> 2. Add new hook(s), such as [1].
> >> 3. Something like this patch.
> >> 
> >> [1] https://lore.kernel.org/linux-security-module/20250110021008.2704246-1-enlightened@chromium.org/ 
> >> 
> >> Do you think we can ship one of them?
> > 
> > If you place a new security hook into __do_loopback() the only thing
> > that I'm not excited about is that we're holding the global namespace
> > semaphore at that point. And I want to have as little LSM hook calls
> > under the namespace semaphore as possible.
> 
> do_loopback() changed a bit since [1]. But if we put the new hook 
> in do_loopback() before lock_mount(), we don’t have the problem with
> the namespace semaphore, right? Also, this RFC doesn’t seem to have 
> this issue either. 

While the mount isn't locked another mount can still be mounted on top
of it. lock_mount() will detect this and lookup the topmost mount and
use that. IOW, the value of old_path->mnt may have changed after
lock_mount().

> > If you have 1000 containers each calling into
> > security_something_something_bind_mount() and then you do your "walk
> > upwards towards the root stuff" and that root is 100000 directories away
> > you've introduced a proper DOS or at least a severe new bottleneck into
> > the system. And because of mount namespace propagation that needs to be
> > serialized across all mount namespaces the namespace semaphore isn't
> > something we can just massage away.
> 
> AFAICT, a poorly designed LSM can easily DoS a system. Therefore, I 
> don’t think we need to overthink about a LSM helper causing DoS in 
> some special scenarios. The owner of the LSM, either built-in LSM or 
> BPF LSM, need to be aware of such risks and design the LSM rules 
> properly to avoid DoS risks. For example, if the path tree is really 
> deep, the LSM may decide to block the mount after walking a preset 
> number of steps. 

The scope of the lock matters _a lot_. If a poorly designed LSM happens
to take exorbitant amount of time under the inode_lock() it's annoying:
to anyone else wanting to grab the inode_lock() _for that single inode_.

If a poorly designed LSM does broken stuff under the namespace semaphore
any mount event on the whole system will block, effectively deadlocking
the system in an instant. For example, if anything even glances at
/proc/<pid>/mountinfo it's game over. It's already iffy that we allow
security_sb_statfs() under there but that's at least guaranteed to be
fast.

If you can make it work so that we don't have to place security_*()
under the namespace semaphore and you can figure out how to deal with a
potential overmount racing you then this would be ideal for everyone.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-15 10:18                   ` Christian Brauner
@ 2025-07-15 22:31                     ` Song Liu
  2025-07-16  8:31                       ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-07-15 22:31 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org


> On Jul 15, 2025, at 3:18 AM, Christian Brauner <brauner@kernel.org> wrote:
> On Mon, Jul 14, 2025 at 03:10:57PM +0000, Song Liu wrote:


[...]

>>> If you place a new security hook into __do_loopback() the only thing
>>> that I'm not excited about is that we're holding the global namespace
>>> semaphore at that point. And I want to have as little LSM hook calls
>>> under the namespace semaphore as possible.
>> 
>> do_loopback() changed a bit since [1]. But if we put the new hook 
>> in do_loopback() before lock_mount(), we don’t have the problem with
>> the namespace semaphore, right? Also, this RFC doesn’t seem to have 
>> this issue either.
> 
> While the mount isn't locked another mount can still be mounted on top
> of it. lock_mount() will detect this and lookup the topmost mount and
> use that. IOW, the value of old_path->mnt may have changed after
> lock_mount().

I am probably confused. Do you mean path->mnt (instead of old_path->mnt) 
may have changed after lock_mount()? 

> If you have 1000 containers each calling into
>>> security_something_something_bind_mount() and then you do your "walk
>>> upwards towards the root stuff" and that root is 100000 directories away
>>> you've introduced a proper DOS or at least a severe new bottleneck into
>>> the system. And because of mount namespace propagation that needs to be
>>> serialized across all mount namespaces the namespace semaphore isn't
>>> something we can just massage away.
>> 
>> AFAICT, a poorly designed LSM can easily DoS a system. Therefore, I 
>> don’t think we need to overthink about a LSM helper causing DoS in 
>> some special scenarios. The owner of the LSM, either built-in LSM or 
>> BPF LSM, need to be aware of such risks and design the LSM rules 
>> properly to avoid DoS risks. For example, if the path tree is really 
>> deep, the LSM may decide to block the mount after walking a preset 
>> number of steps.
> 
> The scope of the lock matters _a lot_. If a poorly designed LSM happens
> to take exorbitant amount of time under the inode_lock() it's annoying:
> to anyone else wanting to grab the inode_lock() _for that single inode_.
> 
> If a poorly designed LSM does broken stuff under the namespace semaphore
> any mount event on the whole system will block, effectively deadlocking
> the system in an instant. For example, if anything even glances at
> /proc/<pid>/mountinfo it's game over. It's already iffy that we allow
> security_sb_statfs() under there but that's at least guaranteed to be
> fast.
> 
> If you can make it work so that we don't have to place security_*()
> under the namespace semaphore and you can figure out how to deal with a
> potential overmount racing you then this would be ideal for everyone.

I am trying to understand all the challenges here. 

It appears to me that do_loopback() has the tricky issue:

static int do_loopback(struct path *path, ...)
{
	...
	/* 
	 * path may still change, so not a good point to add
	 * security hook 
	 */
	mp = lock_mount(path);
	if (IS_ERR(mp)) {
		/* ... */
	}
	/* 
	 * namespace_sem is locked, so not a good point to add
	 * security hook
	 */
	...
}

Basically, without major work with locking, there is no good 
spot to insert a security hook into do_loopback(). Or, maybe 
we can add a hook somewhere in lock_mount()? 

Did I get the challenge correct?

Thanks,
Song


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-15 22:31                     ` Song Liu
@ 2025-07-16  8:31                       ` Christian Brauner
  2025-07-16 17:12                         ` Song Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2025-07-16  8:31 UTC (permalink / raw)
  To: Song Liu
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org

On Tue, Jul 15, 2025 at 10:31:39PM +0000, Song Liu wrote:
> 
> > On Jul 15, 2025, at 3:18 AM, Christian Brauner <brauner@kernel.org> wrote:
> > On Mon, Jul 14, 2025 at 03:10:57PM +0000, Song Liu wrote:
> 
> 
> [...]
> 
> >>> If you place a new security hook into __do_loopback() the only thing
> >>> that I'm not excited about is that we're holding the global namespace
> >>> semaphore at that point. And I want to have as little LSM hook calls
> >>> under the namespace semaphore as possible.
> >> 
> >> do_loopback() changed a bit since [1]. But if we put the new hook 
> >> in do_loopback() before lock_mount(), we don’t have the problem with
> >> the namespace semaphore, right? Also, this RFC doesn’t seem to have 
> >> this issue either.
> > 
> > While the mount isn't locked another mount can still be mounted on top
> > of it. lock_mount() will detect this and lookup the topmost mount and
> > use that. IOW, the value of old_path->mnt may have changed after
> > lock_mount().
> 
> I am probably confused. Do you mean path->mnt (instead of old_path->mnt) 
> may have changed after lock_mount()? 

I mean the target path. I forgot that the code uses @old_path to mean
the source path not the target path. And you're interested in the source
path, not the target path.

> 
> > If you have 1000 containers each calling into
> >>> security_something_something_bind_mount() and then you do your "walk
> >>> upwards towards the root stuff" and that root is 100000 directories away
> >>> you've introduced a proper DOS or at least a severe new bottleneck into
> >>> the system. And because of mount namespace propagation that needs to be
> >>> serialized across all mount namespaces the namespace semaphore isn't
> >>> something we can just massage away.
> >> 
> >> AFAICT, a poorly designed LSM can easily DoS a system. Therefore, I 
> >> don’t think we need to overthink about a LSM helper causing DoS in 
> >> some special scenarios. The owner of the LSM, either built-in LSM or 
> >> BPF LSM, need to be aware of such risks and design the LSM rules 
> >> properly to avoid DoS risks. For example, if the path tree is really 
> >> deep, the LSM may decide to block the mount after walking a preset 
> >> number of steps.
> > 
> > The scope of the lock matters _a lot_. If a poorly designed LSM happens
> > to take exorbitant amount of time under the inode_lock() it's annoying:
> > to anyone else wanting to grab the inode_lock() _for that single inode_.
> > 
> > If a poorly designed LSM does broken stuff under the namespace semaphore
> > any mount event on the whole system will block, effectively deadlocking
> > the system in an instant. For example, if anything even glances at
> > /proc/<pid>/mountinfo it's game over. It's already iffy that we allow
> > security_sb_statfs() under there but that's at least guaranteed to be
> > fast.
> > 
> > If you can make it work so that we don't have to place security_*()
> > under the namespace semaphore and you can figure out how to deal with a
> > potential overmount racing you then this would be ideal for everyone.
> 
> I am trying to understand all the challenges here. 

As long as you're only interested in the source path's mount, you're
fine.

> 
> It appears to me that do_loopback() has the tricky issue:
> 
> static int do_loopback(struct path *path, ...)
> {
> 	...
> 	/* 
> 	 * path may still change, so not a good point to add
> 	 * security hook 
> 	 */
> 	mp = lock_mount(path);
> 	if (IS_ERR(mp)) {
> 		/* ... */
> 	}
> 	/* 
> 	 * namespace_sem is locked, so not a good point to add
> 	 * security hook
> 	 */
> 	...
> }
> 
> Basically, without major work with locking, there is no good 
> spot to insert a security hook into do_loopback(). Or, maybe 
> we can add a hook somewhere in lock_mount()? 

You can't really because the lookup_mnt() call in lock_mount() happens
under the namespace semaphore already and if it's the topmost mount it
won't be dropped again and you can't drop it again without risking
overmounts again.

But again, as long as you are interested in the source mount you should
be fine.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] vfs: security: Parse dev_name before calling security_sb_mount
  2025-07-16  8:31                       ` Christian Brauner
@ 2025-07-16 17:12                         ` Song Liu
  0 siblings, 0 replies; 16+ messages in thread
From: Song Liu @ 2025-07-16 17:12 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Paul Moore, Al Viro, Song Liu, bpf@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org, apparmor@lists.ubuntu.com,
	selinux@vger.kernel.org, tomoyo-users_en@lists.sourceforge.net,
	tomoyo-users_ja@lists.sourceforge.net, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, amir73il@gmail.com,
	repnop@google.com, jlayton@kernel.org, josef@toxicpanda.com,
	mic@digikod.net, gnoack@google.com, m@maowtm.org,
	john.johansen@canonical.com, john@apparmor.net,
	stephen.smalley.work@gmail.com, omosnace@redhat.com,
	takedakn@nttdata.co.jp, penguin-kernel@i-love.sakura.ne.jp,
	enlightened@chromium.org



> On Jul 16, 2025, at 1:31 AM, Christian Brauner <brauner@kernel.org> wrote:
> 
> On Tue, Jul 15, 2025 at 10:31:39PM +0000, Song Liu wrote:
>> 
>>> On Jul 15, 2025, at 3:18 AM, Christian Brauner <brauner@kernel.org> wrote:
>>> On Mon, Jul 14, 2025 at 03:10:57PM +0000, Song Liu wrote:
>> 
>> 
>> [...]
>> 
>>>>> If you place a new security hook into __do_loopback() the only thing
>>>>> that I'm not excited about is that we're holding the global namespace
>>>>> semaphore at that point. And I want to have as little LSM hook calls
>>>>> under the namespace semaphore as possible.
>>>> 
>>>> do_loopback() changed a bit since [1]. But if we put the new hook 
>>>> in do_loopback() before lock_mount(), we don’t have the problem with
>>>> the namespace semaphore, right? Also, this RFC doesn’t seem to have 
>>>> this issue either.
>>> 
>>> While the mount isn't locked another mount can still be mounted on top
>>> of it. lock_mount() will detect this and lookup the topmost mount and
>>> use that. IOW, the value of old_path->mnt may have changed after
>>> lock_mount().
>> 
>> I am probably confused. Do you mean path->mnt (instead of old_path->mnt) 
>> may have changed after lock_mount()?
> 
> I mean the target path. I forgot that the code uses @old_path to mean
> the source path not the target path. And you're interested in the source
> path, not the target path.

Both security_sb_mount and security_move_mount has the overmount issue 
for target path. 

[...]

>> 
>> It appears to me that do_loopback() has the tricky issue:
>> 
>> static int do_loopback(struct path *path, ...)
>> {
>> ...
>> /* 
>> * path may still change, so not a good point to add
>> * security hook 
>> */
>> mp = lock_mount(path);
>> if (IS_ERR(mp)) {
>> /* ... */
>> }
>> /* 
>> * namespace_sem is locked, so not a good point to add
>> * security hook
>> */
>> ...
>> }
>> 
>> Basically, without major work with locking, there is no good 
>> spot to insert a security hook into do_loopback(). Or, maybe 
>> we can add a hook somewhere in lock_mount()?
> 
> You can't really because the lookup_mnt() call in lock_mount() happens
> under the namespace semaphore already and if it's the topmost mount it
> won't be dropped again and you can't drop it again without risking
> overmounts again.

We probably have to accept the overmount issue for security_ hooks 
that covers the new mount APIs. 

> But again, as long as you are interested in the source mount you should
> be fine.

For the source path, we are back to the issue we want to address 
in this RFC: to get struct path of dev_name (source path) for bind 
mount. Among these proposals:

1. Introduce bpf_kern_path kfunc.
We will probably limit this kfunc to only work on security_sb_mount.

2. Add new hook(s), such as [1].
[1] https://lore.kernel.org/linux-security-module/20250110021008.2704246-1-enlightened@chromium.org/
This is not a complete solution. However, given security_sb_mount 
as-is handles so many different cases, we will likely split it in 
the future. Therefore, this new hook can be a reasonable incremental 
step. 

3. Something like this patch.

Does any proposal look acceptable?

Thanks,
Song


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-07-16 17:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-08 23:05 [RFC] vfs: security: Parse dev_name before calling security_sb_mount Song Liu
2025-07-09 10:24 ` Al Viro
2025-07-09 15:19   ` Paul Moore
2025-07-09 17:06     ` Song Liu
2025-07-10 11:46       ` Christian Brauner
2025-07-10 17:00         ` Song Liu
2025-07-11  9:36           ` Christian Brauner
2025-07-11 16:22             ` Song Liu
2025-07-14  8:45               ` Christian Brauner
2025-07-14 15:10                 ` Song Liu
2025-07-15 10:18                   ` Christian Brauner
2025-07-15 22:31                     ` Song Liu
2025-07-16  8:31                       ` Christian Brauner
2025-07-16 17:12                         ` Song Liu
2025-07-11 23:09             ` Song Liu
2025-07-10 21:40         ` Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).