* [PATCH 1/3] selftests/mount_setattr: Add a test to test locking mount attrs
@ 2023-08-10 9:00 Sargun Dhillon
2023-08-10 9:00 ` [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr Sargun Dhillon
2023-08-10 9:00 ` [PATCH 3/3] selftests/mount_setattr: Add tests for mount locking API Sargun Dhillon
0 siblings, 2 replies; 13+ messages in thread
From: Sargun Dhillon @ 2023-08-10 9:00 UTC (permalink / raw)
To: linux-fsdevel, linux-api
Cc: Aleksa Sarai, Christoph Hellwig, Christian Brauner
Certain mount attributes are meant to be locked when sharing mounts with
another mount namespace. This validates that behaviour holds as expected.
- Locked attributes are not changeable
- Non-locked attributes can be changed, and changed back
Test output:
sudo ./mount_setattr_test -t mount_attr_lock
make: Nothing to be done for 'all'.
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN mount_setattr.mount_attr_lock ...
# OK mount_setattr.mount_attr_lock
ok 1 mount_setattr.mount_attr_lock
# PASSED: 1 / 1 tests passed.
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
---
.../mount_setattr/mount_setattr_test.c | 54 +++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index c6a8c732b802..2aaa4aae41f5 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -400,6 +400,11 @@ FIXTURE_SETUP(mount_setattr)
ASSERT_EQ(mount("testing", "/tmp/B/BB", "tmpfs", MS_NOATIME | MS_NODEV,
"size=100000,mode=700"), 0);
+ ASSERT_EQ(mkdir("/tmp/C", 0777), 0);
+
+ ASSERT_EQ(mount("testing", "/tmp/C", "tmpfs", MS_NOATIME,
+ "size=100000,mode=700"), 0);
+
ASSERT_EQ(mount("testing", "/mnt", "tmpfs", MS_NOATIME | MS_NODEV,
"size=100000,mode=700"), 0);
@@ -1497,4 +1502,53 @@ TEST_F(mount_setattr, mount_attr_nosymfollow)
ASSERT_EQ(close(fd), 0);
}
+TEST_F(mount_setattr, mount_attr_lock)
+{
+ struct mount_attr attr = {
+ .attr_set = MOUNT_ATTR_RDONLY|MOUNT_ATTR_NOSUID|MOUNT_ATTR_NODEV,
+ };
+
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
+ ASSERT_EQ(prepare_unpriv_mountns(), 0);
+
+ attr.attr_set = 0;
+ attr.attr_clr = MOUNT_ATTR_RDONLY;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+
+ attr.attr_clr = MOUNT_ATTR_NOSUID;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+
+ attr.attr_clr = MOUNT_ATTR_NODEV;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+
+ /* Do not allow changing any atime flags after locking */
+ attr.attr_set = MOUNT_ATTR_RELATIME;
+ attr.attr_clr = MOUNT_ATTR__ATIME;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+
+ attr.attr_set = MOUNT_ATTR_STRICTATIME;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+
+ attr.attr_set = MOUNT_ATTR_NODIRATIME;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+
+ /*
+ * "re-setting" the atime setting to the same value should work.
+ * Also, to make sure this isn't a no-op, try making things less permissive
+ */
+ attr.attr_set = MOUNT_ATTR_NOATIME | MOUNT_ATTR_NOEXEC;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
+
+ /* We should still be allowed to clear the attribute we set */
+ attr.attr_set = 0;
+ attr.attr_clr = MOUNT_ATTR_NOEXEC;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
+}
+
TEST_HARNESS_MAIN
--
2.39.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-10 9:00 [PATCH 1/3] selftests/mount_setattr: Add a test to test locking mount attrs Sargun Dhillon
@ 2023-08-10 9:00 ` Sargun Dhillon
2023-08-14 4:40 ` Aleksa Sarai
2023-08-15 9:30 ` Christian Brauner
2023-08-10 9:00 ` [PATCH 3/3] selftests/mount_setattr: Add tests for mount locking API Sargun Dhillon
1 sibling, 2 replies; 13+ messages in thread
From: Sargun Dhillon @ 2023-08-10 9:00 UTC (permalink / raw)
To: linux-fsdevel, linux-api
Cc: Aleksa Sarai, Christoph Hellwig, Christian Brauner
We support locking certain mount attributes in the kernel. This API
isn't directly exposed to users. Right now, users can lock mount
attributes by going through the process of creating a new user
namespaces, and when the mounts are copied to the "lower privilege"
domain, they're locked. The mount can be reopened, and passed around
as a "locked mount".
Locked mounts are useful, for example, in container execution without
user namespaces, where you may want to expose some host data as read
only without allowing the container to remount the mount as mutable.
The API currently requires that the given privilege is taken away
while or before locking the flag in the less privileged position.
This could be relaxed in the future, where the user is allowed to
remount the mount as read only, but once they do, they cannot make
it read only again.
Right now, this allows for all flags that are lockable via the
userns unshare trick to be locked, other than the atime related
ones. This is because the semantics of what the "less privileged"
position is around the atime flags is unclear.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
---
fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
include/uapi/linux/mount.h | 2 ++
2 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 54847db5b819..5396e544ac84 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
struct mount_kattr {
unsigned int attr_set;
unsigned int attr_clr;
+ unsigned int attr_lock;
unsigned int propagation;
unsigned int lookup_flags;
bool recurse;
@@ -3608,6 +3609,9 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
#define MOUNT_SETATTR_PROPAGATION_FLAGS \
(MS_UNBINDABLE | MS_PRIVATE | MS_SLAVE | MS_SHARED)
+#define MOUNT_SETATTR_VALID_LOCK_FLAGS \
+ (MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID | MOUNT_ATTR_NODEV | \
+ MOUNT_ATTR_NOEXEC)
static unsigned int attr_flags_to_mnt_flags(u64 attr_flags)
{
@@ -3629,6 +3633,22 @@ static unsigned int attr_flags_to_mnt_flags(u64 attr_flags)
return mnt_flags;
}
+static unsigned int attr_flags_to_mnt_lock_flags(u64 attr_flags)
+{
+ unsigned int mnt_flags = 0;
+
+ if (attr_flags & MOUNT_ATTR_RDONLY)
+ mnt_flags |= MNT_LOCK_READONLY;
+ if (attr_flags & MOUNT_ATTR_NOSUID)
+ mnt_flags |= MNT_LOCK_NOSUID;
+ if (attr_flags & MOUNT_ATTR_NODEV)
+ mnt_flags |= MNT_LOCK_NODEV;
+ if (attr_flags & MOUNT_ATTR_NOEXEC)
+ mnt_flags |= MNT_LOCK_NOEXEC;
+
+ return mnt_flags;
+}
+
/*
* Create a kernel mount representation for a new, prepared superblock
* (specified by fs_fd) and attach to an open_tree-like file descriptor.
@@ -4037,11 +4057,18 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
int err;
for (m = mnt; m; m = next_mnt(m, mnt)) {
- if (!can_change_locked_flags(m, recalc_flags(kattr, m))) {
+ int new_mount_flags = recalc_flags(kattr, m);
+
+ if (!can_change_locked_flags(m, new_mount_flags)) {
err = -EPERM;
break;
}
+ if ((new_mount_flags & kattr->attr_lock) != kattr->attr_lock) {
+ err = -EINVAL;
+ break;
+ }
+
err = can_idmap_mount(kattr, m);
if (err)
break;
@@ -4278,8 +4305,14 @@ static int build_mount_kattr(const struct mount_attr *attr, size_t usize,
if ((attr->attr_set | attr->attr_clr) & ~MOUNT_SETATTR_VALID_FLAGS)
return -EINVAL;
+ if (attr->attr_lock & ~MOUNT_SETATTR_VALID_LOCK_FLAGS)
+ return -EINVAL;
+
kattr->attr_set = attr_flags_to_mnt_flags(attr->attr_set);
kattr->attr_clr = attr_flags_to_mnt_flags(attr->attr_clr);
+ kattr->attr_lock = attr_flags_to_mnt_flags(attr->attr_lock);
+ kattr->attr_set |= attr_flags_to_mnt_lock_flags(attr->attr_lock);
+
/*
* Since the MOUNT_ATTR_<atime> values are an enum, not a bitmap,
@@ -4337,7 +4370,7 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
struct mount_attr attr;
struct mount_kattr kattr;
- BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_VER0);
+ BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_VER1);
if (flags & ~(AT_EMPTY_PATH |
AT_RECURSIVE |
@@ -4360,7 +4393,8 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
/* Don't bother walking through the mounts if this is a nop. */
if (attr.attr_set == 0 &&
attr.attr_clr == 0 &&
- attr.propagation == 0)
+ attr.propagation == 0 &&
+ attr.attr_lock == 0)
return 0;
err = build_mount_kattr(&attr, usize, &kattr, flags);
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 4d93967f8aea..de667c4f852d 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -131,9 +131,11 @@ struct mount_attr {
__u64 attr_clr;
__u64 propagation;
__u64 userns_fd;
+ __u64 attr_lock;
};
/* List of all mount_attr versions. */
#define MOUNT_ATTR_SIZE_VER0 32 /* sizeof first published struct */
+#define MOUNT_ATTR_SIZE_VER1 40
#endif /* _UAPI_LINUX_MOUNT_H */
--
2.39.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/3] selftests/mount_setattr: Add tests for mount locking API
2023-08-10 9:00 [PATCH 1/3] selftests/mount_setattr: Add a test to test locking mount attrs Sargun Dhillon
2023-08-10 9:00 ` [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr Sargun Dhillon
@ 2023-08-10 9:00 ` Sargun Dhillon
1 sibling, 0 replies; 13+ messages in thread
From: Sargun Dhillon @ 2023-08-10 9:00 UTC (permalink / raw)
To: linux-fsdevel, linux-api
Cc: Aleksa Sarai, Christoph Hellwig, Christian Brauner
This adds tests to lock specific flags in place, and verifies that
the expected rules hold.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
---
tools/include/uapi/linux/mount.h | 1 +
.../mount_setattr/mount_setattr_test.c | 42 +++++++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/tools/include/uapi/linux/mount.h b/tools/include/uapi/linux/mount.h
index 4d93967f8aea..66f302019373 100644
--- a/tools/include/uapi/linux/mount.h
+++ b/tools/include/uapi/linux/mount.h
@@ -135,5 +135,6 @@ struct mount_attr {
/* List of all mount_attr versions. */
#define MOUNT_ATTR_SIZE_VER0 32 /* sizeof first published struct */
+#define MOUNT_ATTR_SIZE_VER1 40
#endif /* _UAPI_LINUX_MOUNT_H */
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 2aaa4aae41f5..3411fe17400b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -1551,4 +1551,46 @@ TEST_F(mount_setattr, mount_attr_lock)
ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
}
+TEST_F(mount_setattr, mount_attr_do_lock)
+{
+ struct mount_attr attr = {};
+
+ attr.attr_lock = MOUNT_ATTR_NODIRATIME;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EINVAL);
+
+ attr.attr_lock = MOUNT_ATTR__ATIME;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EINVAL);
+
+ /* Do not allow locking unset locks */
+ attr.attr_lock = MOUNT_ATTR_NOEXEC;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EINVAL);
+
+ /* Set and lock at the same time */
+ attr.attr_set = MOUNT_ATTR_NOEXEC;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
+ ASSERT_EQ(errno, EINVAL);
+
+ memset(&attr, 0, sizeof(attr));
+ /* Make sure we can't clear noexec now (that locking worked) */
+ attr.attr_clr = MOUNT_ATTR_NOEXEC;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+
+ memset(&attr, 0, sizeof(attr));
+ attr.attr_set = MOUNT_ATTR_NODEV;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
+
+ memset(&attr, 0, sizeof(attr));
+ attr.attr_lock = MOUNT_ATTR_NODEV;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
+
+ /* Make sure we can't clear MOUNT_ATTR_NODEV */
+ memset(&attr, 0, sizeof(attr));
+ attr.attr_clr = MOUNT_ATTR_NODEV;
+ ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), -1);
+ ASSERT_EQ(errno, EPERM);
+}
TEST_HARNESS_MAIN
--
2.39.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-10 9:00 ` [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr Sargun Dhillon
@ 2023-08-14 4:40 ` Aleksa Sarai
2023-08-14 6:18 ` Aleksa Sarai
2023-08-14 18:15 ` Sargun Dhillon
2023-08-15 9:30 ` Christian Brauner
1 sibling, 2 replies; 13+ messages in thread
From: Aleksa Sarai @ 2023-08-14 4:40 UTC (permalink / raw)
To: Sargun Dhillon
Cc: linux-fsdevel, linux-api, Christoph Hellwig, Christian Brauner
[-- Attachment #1: Type: text/plain, Size: 6695 bytes --]
On 2023-08-10, Sargun Dhillon <sargun@sargun.me> wrote:
> We support locking certain mount attributes in the kernel. This API
> isn't directly exposed to users. Right now, users can lock mount
> attributes by going through the process of creating a new user
> namespaces, and when the mounts are copied to the "lower privilege"
> domain, they're locked. The mount can be reopened, and passed around
> as a "locked mount".
>
> Locked mounts are useful, for example, in container execution without
> user namespaces, where you may want to expose some host data as read
> only without allowing the container to remount the mount as mutable.
>
> The API currently requires that the given privilege is taken away
> while or before locking the flag in the less privileged position.
> This could be relaxed in the future, where the user is allowed to
> remount the mount as read only, but once they do, they cannot make
> it read only again.
>
> Right now, this allows for all flags that are lockable via the
> userns unshare trick to be locked, other than the atime related
> ones. This is because the semantics of what the "less privileged"
> position is around the atime flags is unclear.
>
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> ---
> fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> include/uapi/linux/mount.h | 2 ++
> 2 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 54847db5b819..5396e544ac84 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> struct mount_kattr {
> unsigned int attr_set;
> unsigned int attr_clr;
> + unsigned int attr_lock;
> unsigned int propagation;
> unsigned int lookup_flags;
> bool recurse;
> @@ -3608,6 +3609,9 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
>
> #define MOUNT_SETATTR_PROPAGATION_FLAGS \
> (MS_UNBINDABLE | MS_PRIVATE | MS_SLAVE | MS_SHARED)
> +#define MOUNT_SETATTR_VALID_LOCK_FLAGS \
> + (MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID | MOUNT_ATTR_NODEV | \
> + MOUNT_ATTR_NOEXEC)
>
> static unsigned int attr_flags_to_mnt_flags(u64 attr_flags)
> {
> @@ -3629,6 +3633,22 @@ static unsigned int attr_flags_to_mnt_flags(u64 attr_flags)
> return mnt_flags;
> }
>
> +static unsigned int attr_flags_to_mnt_lock_flags(u64 attr_flags)
> +{
> + unsigned int mnt_flags = 0;
> +
> + if (attr_flags & MOUNT_ATTR_RDONLY)
> + mnt_flags |= MNT_LOCK_READONLY;
> + if (attr_flags & MOUNT_ATTR_NOSUID)
> + mnt_flags |= MNT_LOCK_NOSUID;
> + if (attr_flags & MOUNT_ATTR_NODEV)
> + mnt_flags |= MNT_LOCK_NODEV;
> + if (attr_flags & MOUNT_ATTR_NOEXEC)
> + mnt_flags |= MNT_LOCK_NOEXEC;
> +
> + return mnt_flags;
> +}
> +
> /*
> * Create a kernel mount representation for a new, prepared superblock
> * (specified by fs_fd) and attach to an open_tree-like file descriptor.
> @@ -4037,11 +4057,18 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
> int err;
>
> for (m = mnt; m; m = next_mnt(m, mnt)) {
> - if (!can_change_locked_flags(m, recalc_flags(kattr, m))) {
> + int new_mount_flags = recalc_flags(kattr, m);
> +
> + if (!can_change_locked_flags(m, new_mount_flags)) {
> err = -EPERM;
> break;
> }
It just occurred to me that the whole MNT_LOCK_* machinery has the
unfortunate consequence of restricting the host root user from being
able to modify the locked flags. Since this change will let you do this
without creating a userns, do we want to make can_change_locked_flags()
do capable(CAP_SYS_MOUNT)?
> + if ((new_mount_flags & kattr->attr_lock) != kattr->attr_lock) {
> + err = -EINVAL;
> + break;
> + }
Since the MNT_LOCK_* flags are invisible to userspace, it seems more
reasonable to have the attr_lock set be added to the existing set rather
than requiring userspace to pass the same set of flags.
Actually, AFAICS this implementation breaks backwards compatibility
because with this change you now need to pass MNT_LOCK_* flags if
operating on a mount that has locks applied already. So existing
programs (which have .attr_lock=0) will start getting -EINVAL when
operating on mounts with locked flags (such as those locked in the
userns case). Or am I missing something?
In any case, the most reasonable behaviour would be to OR the requested
lock flags with the existing ones IMHO.
> +
> err = can_idmap_mount(kattr, m);
> if (err)
> break;
> @@ -4278,8 +4305,14 @@ static int build_mount_kattr(const struct mount_attr *attr, size_t usize,
> if ((attr->attr_set | attr->attr_clr) & ~MOUNT_SETATTR_VALID_FLAGS)
> return -EINVAL;
>
> + if (attr->attr_lock & ~MOUNT_SETATTR_VALID_LOCK_FLAGS)
> + return -EINVAL;
> +
> kattr->attr_set = attr_flags_to_mnt_flags(attr->attr_set);
> kattr->attr_clr = attr_flags_to_mnt_flags(attr->attr_clr);
> + kattr->attr_lock = attr_flags_to_mnt_flags(attr->attr_lock);
> + kattr->attr_set |= attr_flags_to_mnt_lock_flags(attr->attr_lock);
> +
>
> /*
> * Since the MOUNT_ATTR_<atime> values are an enum, not a bitmap,
> @@ -4337,7 +4370,7 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
> struct mount_attr attr;
> struct mount_kattr kattr;
>
> - BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_VER0);
> + BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_VER1);
>
> if (flags & ~(AT_EMPTY_PATH |
> AT_RECURSIVE |
> @@ -4360,7 +4393,8 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
> /* Don't bother walking through the mounts if this is a nop. */
> if (attr.attr_set == 0 &&
> attr.attr_clr == 0 &&
> - attr.propagation == 0)
> + attr.propagation == 0 &&
> + attr.attr_lock == 0)
> return 0;
>
> err = build_mount_kattr(&attr, usize, &kattr, flags);
> diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
> index 4d93967f8aea..de667c4f852d 100644
> --- a/include/uapi/linux/mount.h
> +++ b/include/uapi/linux/mount.h
> @@ -131,9 +131,11 @@ struct mount_attr {
> __u64 attr_clr;
> __u64 propagation;
> __u64 userns_fd;
> + __u64 attr_lock;
> };
>
> /* List of all mount_attr versions. */
> #define MOUNT_ATTR_SIZE_VER0 32 /* sizeof first published struct */
> +#define MOUNT_ATTR_SIZE_VER1 40
>
> #endif /* _UAPI_LINUX_MOUNT_H */
> --
> 2.39.3
>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-14 4:40 ` Aleksa Sarai
@ 2023-08-14 6:18 ` Aleksa Sarai
2023-08-14 18:15 ` Sargun Dhillon
1 sibling, 0 replies; 13+ messages in thread
From: Aleksa Sarai @ 2023-08-14 6:18 UTC (permalink / raw)
To: Sargun Dhillon
Cc: linux-fsdevel, linux-api, Christoph Hellwig, Christian Brauner
[-- Attachment #1: Type: text/plain, Size: 7618 bytes --]
On 2023-08-14, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2023-08-10, Sargun Dhillon <sargun@sargun.me> wrote:
> > We support locking certain mount attributes in the kernel. This API
> > isn't directly exposed to users. Right now, users can lock mount
> > attributes by going through the process of creating a new user
> > namespaces, and when the mounts are copied to the "lower privilege"
> > domain, they're locked. The mount can be reopened, and passed around
> > as a "locked mount".
> >
> > Locked mounts are useful, for example, in container execution without
> > user namespaces, where you may want to expose some host data as read
> > only without allowing the container to remount the mount as mutable.
> >
> > The API currently requires that the given privilege is taken away
> > while or before locking the flag in the less privileged position.
> > This could be relaxed in the future, where the user is allowed to
> > remount the mount as read only, but once they do, they cannot make
> > it read only again.
> >
> > Right now, this allows for all flags that are lockable via the
> > userns unshare trick to be locked, other than the atime related
> > ones. This is because the semantics of what the "less privileged"
> > position is around the atime flags is unclear.
> >
> > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > ---
> > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> > include/uapi/linux/mount.h | 2 ++
> > 2 files changed, 39 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 54847db5b819..5396e544ac84 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> > struct mount_kattr {
> > unsigned int attr_set;
> > unsigned int attr_clr;
> > + unsigned int attr_lock;
> > unsigned int propagation;
> > unsigned int lookup_flags;
> > bool recurse;
> > @@ -3608,6 +3609,9 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
> >
> > #define MOUNT_SETATTR_PROPAGATION_FLAGS \
> > (MS_UNBINDABLE | MS_PRIVATE | MS_SLAVE | MS_SHARED)
> > +#define MOUNT_SETATTR_VALID_LOCK_FLAGS \
> > + (MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID | MOUNT_ATTR_NODEV | \
> > + MOUNT_ATTR_NOEXEC)
> >
> > static unsigned int attr_flags_to_mnt_flags(u64 attr_flags)
> > {
> > @@ -3629,6 +3633,22 @@ static unsigned int attr_flags_to_mnt_flags(u64 attr_flags)
> > return mnt_flags;
> > }
> >
> > +static unsigned int attr_flags_to_mnt_lock_flags(u64 attr_flags)
> > +{
> > + unsigned int mnt_flags = 0;
> > +
> > + if (attr_flags & MOUNT_ATTR_RDONLY)
> > + mnt_flags |= MNT_LOCK_READONLY;
> > + if (attr_flags & MOUNT_ATTR_NOSUID)
> > + mnt_flags |= MNT_LOCK_NOSUID;
> > + if (attr_flags & MOUNT_ATTR_NODEV)
> > + mnt_flags |= MNT_LOCK_NODEV;
> > + if (attr_flags & MOUNT_ATTR_NOEXEC)
> > + mnt_flags |= MNT_LOCK_NOEXEC;
> > +
> > + return mnt_flags;
> > +}
> > +
> > /*
> > * Create a kernel mount representation for a new, prepared superblock
> > * (specified by fs_fd) and attach to an open_tree-like file descriptor.
> > @@ -4037,11 +4057,18 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
> > int err;
> >
> > for (m = mnt; m; m = next_mnt(m, mnt)) {
> > - if (!can_change_locked_flags(m, recalc_flags(kattr, m))) {
> > + int new_mount_flags = recalc_flags(kattr, m);
> > +
> > + if (!can_change_locked_flags(m, new_mount_flags)) {
> > err = -EPERM;
> > break;
> > }
>
> It just occurred to me that the whole MNT_LOCK_* machinery has the
> unfortunate consequence of restricting the host root user from being
> able to modify the locked flags. Since this change will let you do this
> without creating a userns, do we want to make can_change_locked_flags()
> do capable(CAP_SYS_MOUNT)?
Then again, it seems the semantics of changing locked mount flags would
probably be a bit ugly -- should changing the flag unset the locked bit?
If not, then not being able to clear the flags would make userspace's
existing mechanism for handling locked mounts (inherit the all lockable
mount flags already set on the mountpoint) would not work anymore.
So maybe it's better to just leave this as-is...
> > + if ((new_mount_flags & kattr->attr_lock) != kattr->attr_lock) {
> > + err = -EINVAL;
> > + break;
> > + }
>
> Since the MNT_LOCK_* flags are invisible to userspace, it seems more
> reasonable to have the attr_lock set be added to the existing set rather
> than requiring userspace to pass the same set of flags.
>
> Actually, AFAICS this implementation breaks backwards compatibility
> because with this change you now need to pass MNT_LOCK_* flags if
> operating on a mount that has locks applied already. So existing
> programs (which have .attr_lock=0) will start getting -EINVAL when
> operating on mounts with locked flags (such as those locked in the
> userns case). Or am I missing something?
>
> In any case, the most reasonable behaviour would be to OR the requested
> lock flags with the existing ones IMHO.
>
> > +
> > err = can_idmap_mount(kattr, m);
> > if (err)
> > break;
> > @@ -4278,8 +4305,14 @@ static int build_mount_kattr(const struct mount_attr *attr, size_t usize,
> > if ((attr->attr_set | attr->attr_clr) & ~MOUNT_SETATTR_VALID_FLAGS)
> > return -EINVAL;
> >
> > + if (attr->attr_lock & ~MOUNT_SETATTR_VALID_LOCK_FLAGS)
> > + return -EINVAL;
> > +
> > kattr->attr_set = attr_flags_to_mnt_flags(attr->attr_set);
> > kattr->attr_clr = attr_flags_to_mnt_flags(attr->attr_clr);
> > + kattr->attr_lock = attr_flags_to_mnt_flags(attr->attr_lock);
> > + kattr->attr_set |= attr_flags_to_mnt_lock_flags(attr->attr_lock);
> > +
> >
> > /*
> > * Since the MOUNT_ATTR_<atime> values are an enum, not a bitmap,
> > @@ -4337,7 +4370,7 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
> > struct mount_attr attr;
> > struct mount_kattr kattr;
> >
> > - BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_VER0);
> > + BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_VER1);
> >
> > if (flags & ~(AT_EMPTY_PATH |
> > AT_RECURSIVE |
> > @@ -4360,7 +4393,8 @@ SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
> > /* Don't bother walking through the mounts if this is a nop. */
> > if (attr.attr_set == 0 &&
> > attr.attr_clr == 0 &&
> > - attr.propagation == 0)
> > + attr.propagation == 0 &&
> > + attr.attr_lock == 0)
> > return 0;
> >
> > err = build_mount_kattr(&attr, usize, &kattr, flags);
> > diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
> > index 4d93967f8aea..de667c4f852d 100644
> > --- a/include/uapi/linux/mount.h
> > +++ b/include/uapi/linux/mount.h
> > @@ -131,9 +131,11 @@ struct mount_attr {
> > __u64 attr_clr;
> > __u64 propagation;
> > __u64 userns_fd;
> > + __u64 attr_lock;
> > };
> >
> > /* List of all mount_attr versions. */
> > #define MOUNT_ATTR_SIZE_VER0 32 /* sizeof first published struct */
> > +#define MOUNT_ATTR_SIZE_VER1 40
> >
> > #endif /* _UAPI_LINUX_MOUNT_H */
> > --
> > 2.39.3
> >
>
> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> <https://www.cyphar.com/>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-14 4:40 ` Aleksa Sarai
2023-08-14 6:18 ` Aleksa Sarai
@ 2023-08-14 18:15 ` Sargun Dhillon
2023-08-14 23:58 ` Aleksa Sarai
1 sibling, 1 reply; 13+ messages in thread
From: Sargun Dhillon @ 2023-08-14 18:15 UTC (permalink / raw)
To: Aleksa Sarai
Cc: linux-fsdevel, linux-api, Christoph Hellwig, Christian Brauner
On Sun, Aug 13, 2023 at 10:41 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>
> It just occurred to me that the whole MNT_LOCK_* machinery has the
> unfortunate consequence of restricting the host root user from being
> able to modify the locked flags. Since this change will let you do this
> without creating a userns, do we want to make can_change_locked_flags()
> do capable(CAP_SYS_MOUNT)?
>
Doesn't mount_setattr already require that the user has CAP_SYS_ADMIN
in the mount's user namespace?
I'm not sure how this lets us bypass that.
Or are you saying we should check for
CAP_SYS_MOUNT && CAP_SYS_ADMIN?
> > + if ((new_mount_flags & kattr->attr_lock) != kattr->attr_lock) {
> > + err = -EINVAL;
> > + break;
> > + }
>
> Since the MNT_LOCK_* flags are invisible to userspace, it seems more
> reasonable to have the attr_lock set be added to the existing set rather
> than requiring userspace to pass the same set of flags.
>
IMHO, it's nice to be able to use the existing set of flags. I don't mind
adding new flags though.
> Actually, AFAICS this implementation breaks backwards compatibility
> because with this change you now need to pass MNT_LOCK_* flags if
> operating on a mount that has locks applied already. So existing
> programs (which have .attr_lock=0) will start getting -EINVAL when
> operating on mounts with locked flags (such as those locked in the
> userns case). Or am I missing something?
I don't think so, because if attr_lock is 0, then
new_mount_flags & kattr->attr_lock is 0. kattr->attr_lock is only
flags to *newly lock*, and doesn't inherit the set of current locks.
>
> In any case, the most reasonable behaviour would be to OR the requested
> lock flags with the existing ones IMHO.
>
I can append this to the mount_attr_do_lock test in the next patch, and it
lets me flip the NOSUID bit on and off without attr_lock being set, unless
you're referring to something else. You shouldn't be able to attr_clr a
locked attribute (even as CAP_SYS_ADMIN in the init_user_ns).
attr_set will noop.
/* Make sure we can set NOSUID (not locked) */
memset(&attr, 0, sizeof(attr));
attr.attr_set = MOUNT_ATTR_NOSUID;
ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
/* Make sure we can clear NOSUID (not locked) */
memset(&attr, 0, sizeof(attr));
attr.attr_clr = MOUNT_ATTR_NOSUID;
ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-14 18:15 ` Sargun Dhillon
@ 2023-08-14 23:58 ` Aleksa Sarai
0 siblings, 0 replies; 13+ messages in thread
From: Aleksa Sarai @ 2023-08-14 23:58 UTC (permalink / raw)
To: Sargun Dhillon
Cc: linux-fsdevel, linux-api, Christoph Hellwig, Christian Brauner
[-- Attachment #1: Type: text/plain, Size: 3605 bytes --]
On 2023-08-14, Sargun Dhillon <sargun@sargun.me> wrote:
> On Sun, Aug 13, 2023 at 10:41 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
> >
> > It just occurred to me that the whole MNT_LOCK_* machinery has the
> > unfortunate consequence of restricting the host root user from being
> > able to modify the locked flags. Since this change will let you do this
> > without creating a userns, do we want to make can_change_locked_flags()
> > do capable(CAP_SYS_MOUNT)?
> >
> Doesn't mount_setattr already require that the user has CAP_SYS_ADMIN
> in the mount's user namespace?
>
> I'm not sure how this lets us bypass that.
>
> Or are you saying we should check for
> CAP_SYS_MOUNT && CAP_SYS_ADMIN?
I was talking about the fact that can_change_locked_flags() doesn't have
an escape catch for capable(CAP_SYS_whatever). But as I mentioned in a
later mail, this might be safe but it would have other downsides.
> > > + if ((new_mount_flags & kattr->attr_lock) != kattr->attr_lock) {
> > > + err = -EINVAL;
> > > + break;
> > > + }
> >
> > Since the MNT_LOCK_* flags are invisible to userspace, it seems more
> > reasonable to have the attr_lock set be added to the existing set rather
> > than requiring userspace to pass the same set of flags.
> >
> IMHO, it's nice to be able to use the existing set of flags. I don't mind
> adding new flags though.
This was related to the later paragraph. I agree passing the existing
MOUNT_ATTR_* flags to attr_lock is preferable.
> > Actually, AFAICS this implementation breaks backwards compatibility
> > because with this change you now need to pass MNT_LOCK_* flags if
> > operating on a mount that has locks applied already. So existing
> > programs (which have .attr_lock=0) will start getting -EINVAL when
> > operating on mounts with locked flags (such as those locked in the
> > userns case). Or am I missing something?
> I don't think so, because if attr_lock is 0, then
> new_mount_flags & kattr->attr_lock is 0. kattr->attr_lock is only
> flags to *newly lock*, and doesn't inherit the set of current locks.
Ah, I misread this (I was confused why this check was needed which made
me think it must be checking something else, but looking at it again,
the check is actually making sure that if you lock a flag that the flag
is also set) -- in that case the behaviour is exactly what I was
describing. Oops!
> > In any case, the most reasonable behaviour would be to OR the requested
> > lock flags with the existing ones IMHO.
> >
> I can append this to the mount_attr_do_lock test in the next patch, and it
> lets me flip the NOSUID bit on and off without attr_lock being set, unless
> you're referring to something else.
The behaviour I was talking about was:
memset(&attr, 0, sizeof(attr));
attr.attr_set = attr.attr_lock = MOUNT_ATTR_NOSUID;
ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0);
memset(&attr, 0, sizeof(attr));
attr.attr_set = attr.attr_lock = MOUNT_ATTR_NOEXEC;
ASSERT_EQ(sys_mount_setattr(-1, "/tmp/C", 0, &attr, sizeof(attr)), 0); // should work
/* mount should have MOUNT_ATTR_NOSUID|MOUNT_ATTR_NOEXEC now */
(Which is what the current implementation should do.)
> You shouldn't be able to attr_clr a locked attribute (even as
> CAP_SYS_ADMIN in the init_user_ns).
This is what I was referring to in the escape hatch bit above (whether
we should allow this).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-10 9:00 ` [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr Sargun Dhillon
2023-08-14 4:40 ` Aleksa Sarai
@ 2023-08-15 9:30 ` Christian Brauner
2023-08-15 13:46 ` Sargun Dhillon
1 sibling, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2023-08-15 9:30 UTC (permalink / raw)
To: Sargun Dhillon; +Cc: linux-fsdevel, linux-api, Aleksa Sarai, Christoph Hellwig
On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote:
> We support locking certain mount attributes in the kernel. This API
> isn't directly exposed to users. Right now, users can lock mount
> attributes by going through the process of creating a new user
> namespaces, and when the mounts are copied to the "lower privilege"
> domain, they're locked. The mount can be reopened, and passed around
> as a "locked mount".
Not sure if that's what you're getting at but you can actually fully
create these locked mounts already:
P1 P2
# init userns + init mountns # init userns + init mountns
sudo mount --bind /foo /bar
sudo mount --bind -o ro,nosuid,nodev,noexec /bar
# unprivileged userns + unprivileged mountns
unshare --mount --user --map-root
mount --bind -oremount
fd = open_tree(/bar, OPEN_TREE_CLONE)
send(fd_send, P2);
recv(&fd_recv, P1)
move_mount(fd_recv, /locked-mnt);
and now you have a fully locked mount on the host for P2. Did you mean that?
>
> Locked mounts are useful, for example, in container execution without
> user namespaces, where you may want to expose some host data as read
> only without allowing the container to remount the mount as mutable.
>
> The API currently requires that the given privilege is taken away
> while or before locking the flag in the less privileged position.
> This could be relaxed in the future, where the user is allowed to
> remount the mount as read only, but once they do, they cannot make
> it read only again.
s/read only/read write/
>
> Right now, this allows for all flags that are lockable via the
> userns unshare trick to be locked, other than the atime related
> ones. This is because the semantics of what the "less privileged"
> position is around the atime flags is unclear.
I think that atime stuff doesn't really make sense to expose to
userspace. That seems a bit pointless imho.
>
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> ---
> fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> include/uapi/linux/mount.h | 2 ++
> 2 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 54847db5b819..5396e544ac84 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> struct mount_kattr {
> unsigned int attr_set;
> unsigned int attr_clr;
> + unsigned int attr_lock;
So when I originally noted down this crazy idea
https://github.com/uapi-group/kernel-features
I didn't envision a new struct member but rather a flag that could be
raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the
other flags in attr_set to become locked.
So if we could avoid growing the struct pointlessly I'd prefer that. Is
there a reason that wouldn't work?
I have no strong feelings about this tbh. It seems useful overall to
have this ability. But it deviates a bit from regular mount semantics in
that you can lock mount properties for the lifetime of the mount
explicitly.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-15 9:30 ` Christian Brauner
@ 2023-08-15 13:46 ` Sargun Dhillon
2023-08-16 7:32 ` Christian Brauner
0 siblings, 1 reply; 13+ messages in thread
From: Sargun Dhillon @ 2023-08-15 13:46 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-api, Aleksa Sarai, Christoph Hellwig
On Tue, Aug 15, 2023 at 2:30 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote:
> > We support locking certain mount attributes in the kernel. This API
> > isn't directly exposed to users. Right now, users can lock mount
> > attributes by going through the process of creating a new user
> > namespaces, and when the mounts are copied to the "lower privilege"
> > domain, they're locked. The mount can be reopened, and passed around
> > as a "locked mount".
>
> Not sure if that's what you're getting at but you can actually fully
> create these locked mounts already:
>
> P1 P2
> # init userns + init mountns # init userns + init mountns
> sudo mount --bind /foo /bar
> sudo mount --bind -o ro,nosuid,nodev,noexec /bar
>
> # unprivileged userns + unprivileged mountns
> unshare --mount --user --map-root
>
> mount --bind -oremount
>
> fd = open_tree(/bar, OPEN_TREE_CLONE)
>
> send(fd_send, P2);
>
> recv(&fd_recv, P1)
>
> move_mount(fd_recv, /locked-mnt);
>
> and now you have a fully locked mount on the host for P2. Did you mean that?
>
Yep. Doing this within a program without clone / fork is awkward. Forking and
unsharing in random C++ programs doesn't always go super well, so in my
mind it'd be nice to have an API to do this directly.
In addition, having the superblock continue to be owned by the userns that
its mounted in is nice because then they can toggle the other mount attributes
(nodev, nosuid, noexec are the ones we care about).
> >
> > Locked mounts are useful, for example, in container execution without
> > user namespaces, where you may want to expose some host data as read
> > only without allowing the container to remount the mount as mutable.
> >
> > The API currently requires that the given privilege is taken away
> > while or before locking the flag in the less privileged position.
> > This could be relaxed in the future, where the user is allowed to
> > remount the mount as read only, but once they do, they cannot make
> > it read only again.
>
> s/read only/read write/
>
> >
> > Right now, this allows for all flags that are lockable via the
> > userns unshare trick to be locked, other than the atime related
> > ones. This is because the semantics of what the "less privileged"
> > position is around the atime flags is unclear.
>
> I think that atime stuff doesn't really make sense to expose to
> userspace. That seems a bit pointless imho.
>
> >
> > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > ---
> > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> > include/uapi/linux/mount.h | 2 ++
> > 2 files changed, 39 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 54847db5b819..5396e544ac84 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> > struct mount_kattr {
> > unsigned int attr_set;
> > unsigned int attr_clr;
> > + unsigned int attr_lock;
>
> So when I originally noted down this crazy idea
> https://github.com/uapi-group/kernel-features
> I didn't envision a new struct member but rather a flag that could be
> raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the
> other flags in attr_set to become locked.
>
> So if we could avoid growing the struct pointlessly I'd prefer that. Is
> there a reason that wouldn't work?
No reason. The semantics were just a little more awkward, IMHO.
Specifically:
* This attr could never be cleared, only set, which didn't seem to follow
the attr_set / attr_clr semantics
* If we ever introduced a mount_getattr call, you'd want to expose
each of the locked bits independently, I'd think, and exposing
that through one flag wouldn't give you the same fidelity.
>
> I have no strong feelings about this tbh. It seems useful overall to
> have this ability. But it deviates a bit from regular mount semantics in
> that you can lock mount properties for the lifetime of the mount
> explicitly.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-15 13:46 ` Sargun Dhillon
@ 2023-08-16 7:32 ` Christian Brauner
2023-08-16 8:56 ` Aleksa Sarai
0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2023-08-16 7:32 UTC (permalink / raw)
To: Sargun Dhillon; +Cc: linux-fsdevel, linux-api, Aleksa Sarai, Christoph Hellwig
On Tue, Aug 15, 2023 at 06:46:33AM -0700, Sargun Dhillon wrote:
> On Tue, Aug 15, 2023 at 2:30 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote:
> > > We support locking certain mount attributes in the kernel. This API
> > > isn't directly exposed to users. Right now, users can lock mount
> > > attributes by going through the process of creating a new user
> > > namespaces, and when the mounts are copied to the "lower privilege"
> > > domain, they're locked. The mount can be reopened, and passed around
> > > as a "locked mount".
> >
> > Not sure if that's what you're getting at but you can actually fully
> > create these locked mounts already:
> >
> > P1 P2
> > # init userns + init mountns # init userns + init mountns
> > sudo mount --bind /foo /bar
> > sudo mount --bind -o ro,nosuid,nodev,noexec /bar
> >
> > # unprivileged userns + unprivileged mountns
> > unshare --mount --user --map-root
> >
> > mount --bind -oremount
> >
> > fd = open_tree(/bar, OPEN_TREE_CLONE)
> >
> > send(fd_send, P2);
> >
> > recv(&fd_recv, P1)
> >
> > move_mount(fd_recv, /locked-mnt);
> >
> > and now you have a fully locked mount on the host for P2. Did you mean that?
> >
>
> Yep. Doing this within a program without clone / fork is awkward. Forking and
> unsharing in random C++ programs doesn't always go super well, so in my
> mind it'd be nice to have an API to do this directly.
>
> In addition, having the superblock continue to be owned by the userns that
> its mounted in is nice because then they can toggle the other mount attributes
> (nodev, nosuid, noexec are the ones we care about).
>
> > >
> > > Locked mounts are useful, for example, in container execution without
> > > user namespaces, where you may want to expose some host data as read
> > > only without allowing the container to remount the mount as mutable.
> > >
> > > The API currently requires that the given privilege is taken away
> > > while or before locking the flag in the less privileged position.
> > > This could be relaxed in the future, where the user is allowed to
> > > remount the mount as read only, but once they do, they cannot make
> > > it read only again.
> >
> > s/read only/read write/
> >
> > >
> > > Right now, this allows for all flags that are lockable via the
> > > userns unshare trick to be locked, other than the atime related
> > > ones. This is because the semantics of what the "less privileged"
> > > position is around the atime flags is unclear.
> >
> > I think that atime stuff doesn't really make sense to expose to
> > userspace. That seems a bit pointless imho.
> >
> > >
> > > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > > ---
> > > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> > > include/uapi/linux/mount.h | 2 ++
> > > 2 files changed, 39 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > index 54847db5b819..5396e544ac84 100644
> > > --- a/fs/namespace.c
> > > +++ b/fs/namespace.c
> > > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> > > struct mount_kattr {
> > > unsigned int attr_set;
> > > unsigned int attr_clr;
> > > + unsigned int attr_lock;
> >
> > So when I originally noted down this crazy idea
> > https://github.com/uapi-group/kernel-features
> > I didn't envision a new struct member but rather a flag that could be
> > raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the
> > other flags in attr_set to become locked.
> >
> > So if we could avoid growing the struct pointlessly I'd prefer that. Is
> > there a reason that wouldn't work?
> No reason. The semantics were just a little more awkward, IMHO.
> Specifically:
> * This attr could never be cleared, only set, which didn't seem to follow
> the attr_set / attr_clr semantics
> * If we ever introduced a mount_getattr call, you'd want to expose
> each of the locked bits independently, I'd think, and exposing
> that through one flag wouldn't give you the same fidelity.
Hm, right. So it's either new flags or a new member. @Aleksa?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-16 7:32 ` Christian Brauner
@ 2023-08-16 8:56 ` Aleksa Sarai
2023-08-16 10:36 ` Christian Brauner
0 siblings, 1 reply; 13+ messages in thread
From: Aleksa Sarai @ 2023-08-16 8:56 UTC (permalink / raw)
To: Christian Brauner
Cc: Sargun Dhillon, linux-fsdevel, linux-api, Christoph Hellwig
[-- Attachment #1: Type: text/plain, Size: 4978 bytes --]
On 2023-08-16, Christian Brauner <brauner@kernel.org> wrote:
> On Tue, Aug 15, 2023 at 06:46:33AM -0700, Sargun Dhillon wrote:
> > On Tue, Aug 15, 2023 at 2:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote:
> > > > We support locking certain mount attributes in the kernel. This API
> > > > isn't directly exposed to users. Right now, users can lock mount
> > > > attributes by going through the process of creating a new user
> > > > namespaces, and when the mounts are copied to the "lower privilege"
> > > > domain, they're locked. The mount can be reopened, and passed around
> > > > as a "locked mount".
> > >
> > > Not sure if that's what you're getting at but you can actually fully
> > > create these locked mounts already:
> > >
> > > P1 P2
> > > # init userns + init mountns # init userns + init mountns
> > > sudo mount --bind /foo /bar
> > > sudo mount --bind -o ro,nosuid,nodev,noexec /bar
> > >
> > > # unprivileged userns + unprivileged mountns
> > > unshare --mount --user --map-root
> > >
> > > mount --bind -oremount
> > >
> > > fd = open_tree(/bar, OPEN_TREE_CLONE)
> > >
> > > send(fd_send, P2);
> > >
> > > recv(&fd_recv, P1)
> > >
> > > move_mount(fd_recv, /locked-mnt);
> > >
> > > and now you have a fully locked mount on the host for P2. Did you mean that?
> > >
> >
> > Yep. Doing this within a program without clone / fork is awkward. Forking and
> > unsharing in random C++ programs doesn't always go super well, so in my
> > mind it'd be nice to have an API to do this directly.
> >
> > In addition, having the superblock continue to be owned by the userns that
> > its mounted in is nice because then they can toggle the other mount attributes
> > (nodev, nosuid, noexec are the ones we care about).
> >
> > > >
> > > > Locked mounts are useful, for example, in container execution without
> > > > user namespaces, where you may want to expose some host data as read
> > > > only without allowing the container to remount the mount as mutable.
> > > >
> > > > The API currently requires that the given privilege is taken away
> > > > while or before locking the flag in the less privileged position.
> > > > This could be relaxed in the future, where the user is allowed to
> > > > remount the mount as read only, but once they do, they cannot make
> > > > it read only again.
> > >
> > > s/read only/read write/
> > >
> > > >
> > > > Right now, this allows for all flags that are lockable via the
> > > > userns unshare trick to be locked, other than the atime related
> > > > ones. This is because the semantics of what the "less privileged"
> > > > position is around the atime flags is unclear.
> > >
> > > I think that atime stuff doesn't really make sense to expose to
> > > userspace. That seems a bit pointless imho.
> > >
> > > >
> > > > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > > > ---
> > > > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> > > > include/uapi/linux/mount.h | 2 ++
> > > > 2 files changed, 39 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > index 54847db5b819..5396e544ac84 100644
> > > > --- a/fs/namespace.c
> > > > +++ b/fs/namespace.c
> > > > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> > > > struct mount_kattr {
> > > > unsigned int attr_set;
> > > > unsigned int attr_clr;
> > > > + unsigned int attr_lock;
> > >
> > > So when I originally noted down this crazy idea
> > > https://github.com/uapi-group/kernel-features
> > > I didn't envision a new struct member but rather a flag that could be
> > > raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the
> > > other flags in attr_set to become locked.
> > >
> > > So if we could avoid growing the struct pointlessly I'd prefer that. Is
> > > there a reason that wouldn't work?
> > No reason. The semantics were just a little more awkward, IMHO.
> > Specifically:
> > * This attr could never be cleared, only set, which didn't seem to follow
> > the attr_set / attr_clr semantics
> > * If we ever introduced a mount_getattr call, you'd want to expose
> > each of the locked bits independently, I'd think, and exposing
> > that through one flag wouldn't give you the same fidelity.
>
> Hm, right. So it's either new flags or a new member. @Aleksa?
I like ->attr_lock more tbh, especially since they cannot be cleared.
They are implemented as mount flags internally, but conceptually locking
flags is a separate thing to setting them.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-16 8:56 ` Aleksa Sarai
@ 2023-08-16 10:36 ` Christian Brauner
2023-08-25 17:02 ` Sargun Dhillon
0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2023-08-16 10:36 UTC (permalink / raw)
To: Aleksa Sarai; +Cc: Sargun Dhillon, linux-fsdevel, linux-api, Christoph Hellwig
On Wed, Aug 16, 2023 at 06:56:25PM +1000, Aleksa Sarai wrote:
> On 2023-08-16, Christian Brauner <brauner@kernel.org> wrote:
> > On Tue, Aug 15, 2023 at 06:46:33AM -0700, Sargun Dhillon wrote:
> > > On Tue, Aug 15, 2023 at 2:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote:
> > > > > We support locking certain mount attributes in the kernel. This API
> > > > > isn't directly exposed to users. Right now, users can lock mount
> > > > > attributes by going through the process of creating a new user
> > > > > namespaces, and when the mounts are copied to the "lower privilege"
> > > > > domain, they're locked. The mount can be reopened, and passed around
> > > > > as a "locked mount".
> > > >
> > > > Not sure if that's what you're getting at but you can actually fully
> > > > create these locked mounts already:
> > > >
> > > > P1 P2
> > > > # init userns + init mountns # init userns + init mountns
> > > > sudo mount --bind /foo /bar
> > > > sudo mount --bind -o ro,nosuid,nodev,noexec /bar
> > > >
> > > > # unprivileged userns + unprivileged mountns
> > > > unshare --mount --user --map-root
> > > >
> > > > mount --bind -oremount
> > > >
> > > > fd = open_tree(/bar, OPEN_TREE_CLONE)
> > > >
> > > > send(fd_send, P2);
> > > >
> > > > recv(&fd_recv, P1)
> > > >
> > > > move_mount(fd_recv, /locked-mnt);
> > > >
> > > > and now you have a fully locked mount on the host for P2. Did you mean that?
> > > >
> > >
> > > Yep. Doing this within a program without clone / fork is awkward. Forking and
> > > unsharing in random C++ programs doesn't always go super well, so in my
> > > mind it'd be nice to have an API to do this directly.
> > >
> > > In addition, having the superblock continue to be owned by the userns that
> > > its mounted in is nice because then they can toggle the other mount attributes
> > > (nodev, nosuid, noexec are the ones we care about).
> > >
> > > > >
> > > > > Locked mounts are useful, for example, in container execution without
> > > > > user namespaces, where you may want to expose some host data as read
> > > > > only without allowing the container to remount the mount as mutable.
> > > > >
> > > > > The API currently requires that the given privilege is taken away
> > > > > while or before locking the flag in the less privileged position.
> > > > > This could be relaxed in the future, where the user is allowed to
> > > > > remount the mount as read only, but once they do, they cannot make
> > > > > it read only again.
> > > >
> > > > s/read only/read write/
> > > >
> > > > >
> > > > > Right now, this allows for all flags that are lockable via the
> > > > > userns unshare trick to be locked, other than the atime related
> > > > > ones. This is because the semantics of what the "less privileged"
> > > > > position is around the atime flags is unclear.
> > > >
> > > > I think that atime stuff doesn't really make sense to expose to
> > > > userspace. That seems a bit pointless imho.
> > > >
> > > > >
> > > > > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > > > > ---
> > > > > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> > > > > include/uapi/linux/mount.h | 2 ++
> > > > > 2 files changed, 39 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > > index 54847db5b819..5396e544ac84 100644
> > > > > --- a/fs/namespace.c
> > > > > +++ b/fs/namespace.c
> > > > > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> > > > > struct mount_kattr {
> > > > > unsigned int attr_set;
> > > > > unsigned int attr_clr;
> > > > > + unsigned int attr_lock;
> > > >
> > > > So when I originally noted down this crazy idea
> > > > https://github.com/uapi-group/kernel-features
> > > > I didn't envision a new struct member but rather a flag that could be
> > > > raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the
> > > > other flags in attr_set to become locked.
> > > >
> > > > So if we could avoid growing the struct pointlessly I'd prefer that. Is
> > > > there a reason that wouldn't work?
> > > No reason. The semantics were just a little more awkward, IMHO.
> > > Specifically:
> > > * This attr could never be cleared, only set, which didn't seem to follow
> > > the attr_set / attr_clr semantics
> > > * If we ever introduced a mount_getattr call, you'd want to expose
> > > each of the locked bits independently, I'd think, and exposing
> > > that through one flag wouldn't give you the same fidelity.
> >
> > Hm, right. So it's either new flags or a new member. @Aleksa?
>
> I like ->attr_lock more tbh, especially since they cannot be cleared.
> They are implemented as mount flags internally, but conceptually locking
> flags is a separate thing to setting them.
Ok, it'd be neat if could do the sanity review of this api then as you
know the eventual users probably best.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr
2023-08-16 10:36 ` Christian Brauner
@ 2023-08-25 17:02 ` Sargun Dhillon
0 siblings, 0 replies; 13+ messages in thread
From: Sargun Dhillon @ 2023-08-25 17:02 UTC (permalink / raw)
To: Christian Brauner
Cc: Aleksa Sarai, linux-fsdevel, linux-api, Christoph Hellwig
On Wed, Aug 16, 2023 at 4:36 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Aug 16, 2023 at 06:56:25PM +1000, Aleksa Sarai wrote:
> > On 2023-08-16, Christian Brauner <brauner@kernel.org> wrote:
> > > On Tue, Aug 15, 2023 at 06:46:33AM -0700, Sargun Dhillon wrote:
> > > > On Tue, Aug 15, 2023 at 2:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > > > >
> > > > > On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote:
> > > > > > We support locking certain mount attributes in the kernel. This API
> > > > > > isn't directly exposed to users. Right now, users can lock mount
> > > > > > attributes by going through the process of creating a new user
> > > > > > namespaces, and when the mounts are copied to the "lower privilege"
> > > > > > domain, they're locked. The mount can be reopened, and passed around
> > > > > > as a "locked mount".
> > > > >
> > > > > Not sure if that's what you're getting at but you can actually fully
> > > > > create these locked mounts already:
> > > > >
> > > > > P1 P2
> > > > > # init userns + init mountns # init userns + init mountns
> > > > > sudo mount --bind /foo /bar
> > > > > sudo mount --bind -o ro,nosuid,nodev,noexec /bar
> > > > >
> > > > > # unprivileged userns + unprivileged mountns
> > > > > unshare --mount --user --map-root
> > > > >
> > > > > mount --bind -oremount
> > > > >
> > > > > fd = open_tree(/bar, OPEN_TREE_CLONE)
> > > > >
> > > > > send(fd_send, P2);
> > > > >
> > > > > recv(&fd_recv, P1)
> > > > >
> > > > > move_mount(fd_recv, /locked-mnt);
> > > > >
> > > > > and now you have a fully locked mount on the host for P2. Did you mean that?
> > > > >
> > > >
> > > > Yep. Doing this within a program without clone / fork is awkward. Forking and
> > > > unsharing in random C++ programs doesn't always go super well, so in my
> > > > mind it'd be nice to have an API to do this directly.
> > > >
> > > > In addition, having the superblock continue to be owned by the userns that
> > > > its mounted in is nice because then they can toggle the other mount attributes
> > > > (nodev, nosuid, noexec are the ones we care about).
> > > >
> > > > > >
> > > > > > Locked mounts are useful, for example, in container execution without
> > > > > > user namespaces, where you may want to expose some host data as read
> > > > > > only without allowing the container to remount the mount as mutable.
> > > > > >
> > > > > > The API currently requires that the given privilege is taken away
> > > > > > while or before locking the flag in the less privileged position.
> > > > > > This could be relaxed in the future, where the user is allowed to
> > > > > > remount the mount as read only, but once they do, they cannot make
> > > > > > it read only again.
> > > > >
> > > > > s/read only/read write/
> > > > >
> > > > > >
> > > > > > Right now, this allows for all flags that are lockable via the
> > > > > > userns unshare trick to be locked, other than the atime related
> > > > > > ones. This is because the semantics of what the "less privileged"
> > > > > > position is around the atime flags is unclear.
> > > > >
> > > > > I think that atime stuff doesn't really make sense to expose to
> > > > > userspace. That seems a bit pointless imho.
> > > > >
> > > > > >
> > > > > > Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> > > > > > ---
> > > > > > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++---
> > > > > > include/uapi/linux/mount.h | 2 ++
> > > > > > 2 files changed, 39 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > > > > index 54847db5b819..5396e544ac84 100644
> > > > > > --- a/fs/namespace.c
> > > > > > +++ b/fs/namespace.c
> > > > > > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
> > > > > > struct mount_kattr {
> > > > > > unsigned int attr_set;
> > > > > > unsigned int attr_clr;
> > > > > > + unsigned int attr_lock;
> > > > >
> > > > > So when I originally noted down this crazy idea
> > > > > https://github.com/uapi-group/kernel-features
> > > > > I didn't envision a new struct member but rather a flag that could be
> > > > > raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the
> > > > > other flags in attr_set to become locked.
> > > > >
> > > > > So if we could avoid growing the struct pointlessly I'd prefer that. Is
> > > > > there a reason that wouldn't work?
> > > > No reason. The semantics were just a little more awkward, IMHO.
> > > > Specifically:
> > > > * This attr could never be cleared, only set, which didn't seem to follow
> > > > the attr_set / attr_clr semantics
> > > > * If we ever introduced a mount_getattr call, you'd want to expose
> > > > each of the locked bits independently, I'd think, and exposing
> > > > that through one flag wouldn't give you the same fidelity.
> > >
> > > Hm, right. So it's either new flags or a new member. @Aleksa?
> >
> > I like ->attr_lock more tbh, especially since they cannot be cleared.
> > They are implemented as mount flags internally, but conceptually locking
> > flags is a separate thing to setting them.
>
> Ok, it'd be neat if could do the sanity review of this api then as you
> know the eventual users probably best.
Aleksa,
What do you think? The biggest miss / frustration with this API is that
there is no way to introspect which flags are locked from userspace,
but given the absence of a mount_getattr syscall (currently), I think
that we can add that later.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2023-08-25 17:04 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-10 9:00 [PATCH 1/3] selftests/mount_setattr: Add a test to test locking mount attrs Sargun Dhillon
2023-08-10 9:00 ` [PATCH 2/3] fs: Allow user to lock mount attributes with mount_setattr Sargun Dhillon
2023-08-14 4:40 ` Aleksa Sarai
2023-08-14 6:18 ` Aleksa Sarai
2023-08-14 18:15 ` Sargun Dhillon
2023-08-14 23:58 ` Aleksa Sarai
2023-08-15 9:30 ` Christian Brauner
2023-08-15 13:46 ` Sargun Dhillon
2023-08-16 7:32 ` Christian Brauner
2023-08-16 8:56 ` Aleksa Sarai
2023-08-16 10:36 ` Christian Brauner
2023-08-25 17:02 ` Sargun Dhillon
2023-08-10 9:00 ` [PATCH 3/3] selftests/mount_setattr: Add tests for mount locking API Sargun Dhillon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).