* [PATCH v3 0/2] implement OA2_INHERIT_CRED flag for openat2() @ 2024-04-23 22:46 Stas Sergeev 2024-04-23 22:46 ` [PATCH 1/2] fs: reorganize path_openat() Stas Sergeev 2024-04-23 22:46 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev 0 siblings, 2 replies; 11+ messages in thread From: Stas Sergeev @ 2024-04-23 22:46 UTC (permalink / raw) To: linux-kernel Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. It is needed to perform an open operation with the creds that were in effect when the dir_fd was opened. This allows the process to pre-open some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged user, while still retaining the possibility to open/create files within the pre-opened directory set. The more detailed description (including security considerations) is available in the log messages of individual patches. Changes in v3: - partially revert v2 changes to avoid overriding capabilities. Only the bare minimum is overridden: fsuid, fsgid and group_info. Document the fact the full cred override is unwanted, as it may represent an unneeded security risk. Changes in v2: - capture full struct cred instead of just fsuid/fsgid. Suggested by Stefan Metzmacher <metze@samba.org> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> -- 2.44.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] fs: reorganize path_openat() 2024-04-23 22:46 [PATCH v3 0/2] implement OA2_INHERIT_CRED flag for openat2() Stas Sergeev @ 2024-04-23 22:46 ` Stas Sergeev 2024-04-23 22:46 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev 1 sibling, 0 replies; 11+ messages in thread From: Stas Sergeev @ 2024-04-23 22:46 UTC (permalink / raw) To: linux-kernel Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche This patch moves the call to alloc_empty_file() below the call to path_init(). That changes is needed for the next patch, which adds a cred override for alloc_empty_file(). The needed cred info is only available after the call to path_init(). No functional changes are intended by that patch. Signed-off-by: Stas Sergeev <stsp2@yandex.ru> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Andy Lutomirski <luto@kernel.org> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org --- fs/namei.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index c5b2a25be7d0..2fde2c320ae9 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3782,22 +3782,30 @@ static struct file *path_openat(struct nameidata *nd, struct file *file; int error; - file = alloc_empty_file(op->open_flag, current_cred()); - if (IS_ERR(file)) - return file; - - if (unlikely(file->f_flags & __O_TMPFILE)) { + if (unlikely(op->open_flag & __O_TMPFILE)) { + file = alloc_empty_file(op->open_flag, current_cred()); + if (IS_ERR(file)) + return file; error = do_tmpfile(nd, flags, op, file); - } else if (unlikely(file->f_flags & O_PATH)) { + } else if (unlikely(op->open_flag & O_PATH)) { + file = alloc_empty_file(op->open_flag, current_cred()); + if (IS_ERR(file)) + return file; error = do_o_path(nd, flags, file); } else { const char *s = path_init(nd, flags); - while (!(error = link_path_walk(s, nd)) && - (s = open_last_lookups(nd, file, op)) != NULL) - ; + file = alloc_empty_file(op->open_flag, current_cred()); + error = PTR_ERR_OR_ZERO(file); + if (!error) { + while (!(error = link_path_walk(s, nd)) && + (s = open_last_lookups(nd, file, op)) != NULL) + ; + } if (!error) error = do_open(nd, file, op); terminate_walk(nd); + if (IS_ERR(file)) + return file; } if (likely(!error)) { if (likely(file->f_mode & FMODE_OPENED)) -- 2.44.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-23 22:46 [PATCH v3 0/2] implement OA2_INHERIT_CRED flag for openat2() Stas Sergeev 2024-04-23 22:46 ` [PATCH 1/2] fs: reorganize path_openat() Stas Sergeev @ 2024-04-23 22:46 ` Stas Sergeev 1 sibling, 0 replies; 11+ messages in thread From: Stas Sergeev @ 2024-04-23 22:46 UTC (permalink / raw) To: linux-kernel Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche This flag performs the open operation with the fs credentials (fsuid, fsgid, group_info) that were in effect when dir_fd was opened. This allows the process to pre-open some directories and then change eUID (and all other UIDs/GIDs) to a less-privileged user, retaining the ability to open/create files within these directories. Design goal: The idea is to provide a very light-weight sandboxing, where the process, without the use of any heavy-weight techniques like chroot within namespaces, can restrict the access to the set of pre-opened directories. This patch is just a first step to such sandboxing. If things go well, in the future the same extension can be added to more syscalls. These should include at least unlinkat(), renameat2() and the not-yet-upstreamed setxattrat(). Security considerations: - Only the bare minimal set of credentials is overridden: fsuid, fsgid and group_info. The rest, for example capabilities, are not overridden to avoid unneeded security risks. - To avoid sandboxing escape, this patch makes sure the restricted lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT. - To avoid leaking creds across exec, this patch requires O_CLOEXEC flag on a directory. Use cases: Virtual machines that deal with untrusted code, can use that instead of a more heavy-weighted approaches. Currently the approach is being tested on a dosemu2 VM. Signed-off-by: Stas Sergeev <stsp2@yandex.ru> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> --- fs/internal.h | 2 +- fs/namei.c | 61 ++++++++++++++++++++++++++++++++++-- fs/open.c | 2 +- include/linux/fcntl.h | 2 ++ include/uapi/linux/openat2.h | 3 ++ 5 files changed, 66 insertions(+), 4 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 7ca738904e34..692b53b19aad 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -169,7 +169,7 @@ static inline void sb_end_ro_state_change(struct super_block *sb) * open.c */ struct open_flags { - int open_flag; + u64 open_flag; umode_t mode; int acc_mode; int intent; diff --git a/fs/namei.c b/fs/namei.c index 2fde2c320ae9..f34ad2b296c7 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -586,6 +586,9 @@ struct nameidata { int dfd; vfsuid_t dir_vfsuid; umode_t dir_mode; + kuid_t dir_open_fsuid; + kgid_t dir_open_fsgid; + struct group_info *dir_open_groups; } __randomize_layout; #define ND_ROOT_PRESET 1 @@ -695,6 +698,8 @@ static void terminate_walk(struct nameidata *nd) nd->depth = 0; nd->path.mnt = NULL; nd->path.dentry = NULL; + if (nd->dir_open_groups) + put_group_info(nd->dir_open_groups); } /* path_put is needed afterwards regardless of success or failure */ @@ -2414,6 +2419,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) get_fs_pwd(current->fs, &nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_fsuid = current_cred()->fsuid; + nd->dir_open_fsgid = current_cred()->fsgid; + nd->dir_open_groups = get_current_groups(); } else { /* Caller must check execute permissions on the starting path component */ struct fd f = fdget_raw(nd->dfd); @@ -2437,6 +2445,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags) path_get(&nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_fsuid = f.file->f_cred->fsuid; + nd->dir_open_fsgid = f.file->f_cred->fsgid; + nd->dir_open_groups = get_group_info( + f.file->f_cred->group_info); fdput(f); } @@ -3776,6 +3788,29 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file) return error; } +static const struct cred *openat2_override_creds(struct nameidata *nd) +{ + const struct cred *old_cred; + struct cred *override_cred; + + override_cred = prepare_creds(); + if (!override_cred) + return NULL; + + override_cred->fsuid = nd->dir_open_fsuid; + override_cred->fsgid = nd->dir_open_fsgid; + override_cred->group_info = nd->dir_open_groups; + + override_cred->non_rcu = 1; + + old_cred = override_creds(override_cred); + + /* override_cred() gets its own ref */ + put_cred(override_cred); + + return old_cred; +} + static struct file *path_openat(struct nameidata *nd, const struct open_flags *op, unsigned flags) { @@ -3794,8 +3829,28 @@ static struct file *path_openat(struct nameidata *nd, error = do_o_path(nd, flags, file); } else { const char *s = path_init(nd, flags); - file = alloc_empty_file(op->open_flag, current_cred()); - error = PTR_ERR_OR_ZERO(file); + const struct cred *old_cred = NULL; + + error = 0; + if (op->open_flag & OA2_INHERIT_CRED) { + /* Make sure to work only with restricted + * look-up modes. + */ + if (!(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) + error = -EPERM; + /* Only work with O_CLOEXEC dirs. */ + if (!get_close_on_exec(nd->dfd)) + error = -EPERM; + + if (!error) + old_cred = openat2_override_creds(nd); + } + if (!error) { + file = alloc_empty_file(op->open_flag, current_cred()); + error = PTR_ERR_OR_ZERO(file); + } else { + file = ERR_PTR(error); + } if (!error) { while (!(error = link_path_walk(s, nd)) && (s = open_last_lookups(nd, file, op)) != NULL) @@ -3803,6 +3858,8 @@ static struct file *path_openat(struct nameidata *nd, } if (!error) error = do_open(nd, file, op); + if (old_cred) + revert_creds(old_cred); terminate_walk(nd); if (IS_ERR(file)) return file; diff --git a/fs/open.c b/fs/open.c index ee8460c83c77..6be013182a35 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op) * values before calling build_open_flags(), but openat2(2) checks all * of its arguments. */ - if (flags & ~VALID_OPEN_FLAGS) + if (flags & ~VALID_OPENAT2_FLAGS) return -EINVAL; if (how->resolve & ~VALID_RESOLVE_FLAGS) return -EINVAL; diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index a332e79b3207..b71f8b162102 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -12,6 +12,8 @@ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) +#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED) + /* List of all valid flags for the how->resolve argument: */ #define VALID_RESOLVE_FLAGS \ (RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \ diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h index a5feb7604948..cdd676a10b62 100644 --- a/include/uapi/linux/openat2.h +++ b/include/uapi/linux/openat2.h @@ -40,4 +40,7 @@ struct open_how { return -EAGAIN if that's not possible. */ +/* openat2-specific flags go to upper 4 bytes. */ +#define OA2_INHERIT_CRED (1ULL << 32) + #endif /* _UAPI_LINUX_OPENAT2_H */ -- 2.44.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2() @ 2024-04-24 10:52 Stas Sergeev 2024-04-24 10:52 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev 0 siblings, 1 reply; 11+ messages in thread From: Stas Sergeev @ 2024-04-24 10:52 UTC (permalink / raw) To: linux-kernel Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, David Laight, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. It is needed to perform an open operation with the creds that were in effect when the dir_fd was opened. This allows the process to pre-open some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged user, while still retaining the possibility to open/create files within the pre-opened directory set. The sand-boxing is security-oriented: symlinks leading outside of a sand-box are rejected. /proc magic links are rejected. The more detailed description (including security considerations) is available in the log messages of individual patches. Changes in v4: - add optimizations suggested by David Laight <David.Laight@ACULAB.COM> - move security checks to build_open_flags() - force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <luto@kernel.org> Changes in v3: - partially revert v2 changes to avoid overriding capabilities. Only the bare minimum is overridden: fsuid, fsgid and group_info. Document the fact the full cred override is unwanted, as it may represent an unneeded security risk. Changes in v2: - capture full struct cred instead of just fsuid/fsgid. Suggested by Stefan Metzmacher <metze@samba.org> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: David Laight <David.Laight@ACULAB.COM> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> -- 2.44.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-24 10:52 [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2() Stas Sergeev @ 2024-04-24 10:52 ` Stas Sergeev 2024-04-25 2:31 ` Al Viro ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Stas Sergeev @ 2024-04-24 10:52 UTC (permalink / raw) To: linux-kernel Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, David Laight, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche This flag performs the open operation with the fs credentials (fsuid, fsgid, group_info) that were in effect when dir_fd was opened. This allows the process to pre-open some directories and then change eUID (and all other UIDs/GIDs) to a less-privileged user, retaining the ability to open/create files within these directories. Design goal: The idea is to provide a very light-weight sandboxing, where the process, without the use of any heavy-weight techniques like chroot within namespaces, can restrict the access to the set of pre-opened directories. This patch is just a first step to such sandboxing. If things go well, in the future the same extension can be added to more syscalls. These should include at least unlinkat(), renameat2() and the not-yet-upstreamed setxattrat(). Security considerations: - Only the bare minimal set of credentials is overridden: fsuid, fsgid and group_info. The rest, for example capabilities, are not overridden to avoid unneeded security risks. - To avoid sandboxing escape, this patch makes sure the restricted lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT. - To avoid leaking creds across exec, this patch requires O_CLOEXEC flag on a directory. - Magic /proc symlinks are discarded, as suggested by Andy Lutomirski <luto@kernel.org> Use cases: Virtual machines that deal with untrusted code, can use that instead of a more heavy-weighted approaches. Currently the approach is being tested on a dosemu2 VM. Signed-off-by: Stas Sergeev <stsp2@yandex.ru> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> --- fs/internal.h | 2 +- fs/namei.c | 56 ++++++++++++++++++++++++++++++++++-- fs/open.c | 10 ++++++- include/linux/fcntl.h | 2 ++ include/uapi/linux/openat2.h | 3 ++ 5 files changed, 69 insertions(+), 4 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 7ca738904e34..692b53b19aad 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -169,7 +169,7 @@ static inline void sb_end_ro_state_change(struct super_block *sb) * open.c */ struct open_flags { - int open_flag; + u64 open_flag; umode_t mode; int acc_mode; int intent; diff --git a/fs/namei.c b/fs/namei.c index 413eef134234..aeb9f504538e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -586,6 +586,9 @@ struct nameidata { int dfd; vfsuid_t dir_vfsuid; umode_t dir_mode; + kuid_t dir_open_fsuid; + kgid_t dir_open_fsgid; + struct group_info *dir_open_groups; } __randomize_layout; #define ND_ROOT_PRESET 1 @@ -695,6 +698,8 @@ static void terminate_walk(struct nameidata *nd) nd->depth = 0; nd->path.mnt = NULL; nd->path.dentry = NULL; + if (nd->dir_open_groups) + put_group_info(nd->dir_open_groups); } /* path_put is needed afterwards regardless of success or failure */ @@ -2414,6 +2419,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) get_fs_pwd(current->fs, &nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_fsuid = current_cred()->fsuid; + nd->dir_open_fsgid = current_cred()->fsgid; + nd->dir_open_groups = get_current_groups(); } else { /* Caller must check execute permissions on the starting path component */ struct fd f = fdget_raw(nd->dfd); @@ -2437,6 +2445,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags) path_get(&nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_fsuid = f.file->f_cred->fsuid; + nd->dir_open_fsgid = f.file->f_cred->fsgid; + nd->dir_open_groups = get_group_info( + f.file->f_cred->group_info); fdput(f); } @@ -3776,6 +3788,29 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file) return error; } +static const struct cred *openat2_override_creds(struct nameidata *nd) +{ + const struct cred *old_cred; + struct cred *override_cred; + + override_cred = prepare_creds(); + if (!override_cred) + return NULL; + + override_cred->fsuid = nd->dir_open_fsuid; + override_cred->fsgid = nd->dir_open_fsgid; + override_cred->group_info = nd->dir_open_groups; + + override_cred->non_rcu = 1; + + old_cred = override_creds(override_cred); + + /* override_cred() gets its own ref */ + put_cred(override_cred); + + return old_cred; +} + static struct file *path_openat(struct nameidata *nd, const struct open_flags *op, unsigned flags) { @@ -3793,8 +3828,23 @@ static struct file *path_openat(struct nameidata *nd, error = do_o_path(nd, flags, file); } else { const char *s = path_init(nd, flags); - file = alloc_empty_file(open_flags, current_cred()); - error = PTR_ERR_OR_ZERO(file); + const struct cred *old_cred = NULL; + + error = 0; + if (open_flags & OA2_INHERIT_CRED) { + /* Only work with O_CLOEXEC dirs. */ + if (!get_close_on_exec(nd->dfd)) + error = -EPERM; + + if (!error) + old_cred = openat2_override_creds(nd); + } + if (!error) { + file = alloc_empty_file(open_flags, current_cred()); + error = PTR_ERR_OR_ZERO(file); + } else { + file = ERR_PTR(error); + } if (!error) { while (!(error = link_path_walk(s, nd)) && (s = open_last_lookups(nd, file, op)) != NULL) @@ -3802,6 +3852,8 @@ static struct file *path_openat(struct nameidata *nd, } if (!error) error = do_open(nd, file, op); + if (old_cred) + revert_creds(old_cred); terminate_walk(nd); if (IS_ERR(file)) return file; diff --git a/fs/open.c b/fs/open.c index ee8460c83c77..c871ff8fc6e3 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op) * values before calling build_open_flags(), but openat2(2) checks all * of its arguments. */ - if (flags & ~VALID_OPEN_FLAGS) + if (flags & ~VALID_OPENAT2_FLAGS) return -EINVAL; if (how->resolve & ~VALID_RESOLVE_FLAGS) return -EINVAL; @@ -1326,6 +1326,14 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op) lookup_flags |= LOOKUP_CACHED; } + if (flags & OA2_INHERIT_CRED) { + /* Inherit creds only with scoped look-up modes. */ + if (!(lookup_flags & LOOKUP_IS_SCOPED)) + return -EPERM; + /* Reject /proc "magic" links if inheriting creds. */ + lookup_flags |= LOOKUP_NO_MAGICLINKS; + } + op->lookup_flags = lookup_flags; return 0; } diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index a332e79b3207..b71f8b162102 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -12,6 +12,8 @@ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) +#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED) + /* List of all valid flags for the how->resolve argument: */ #define VALID_RESOLVE_FLAGS \ (RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \ diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h index a5feb7604948..cdd676a10b62 100644 --- a/include/uapi/linux/openat2.h +++ b/include/uapi/linux/openat2.h @@ -40,4 +40,7 @@ struct open_how { return -EAGAIN if that's not possible. */ +/* openat2-specific flags go to upper 4 bytes. */ +#define OA2_INHERIT_CRED (1ULL << 32) + #endif /* _UAPI_LINUX_OPENAT2_H */ -- 2.44.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-24 10:52 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev @ 2024-04-25 2:31 ` Al Viro 2024-04-25 7:24 ` stsp 2024-04-25 9:23 ` stsp 2024-04-25 13:50 ` kernel test robot 2024-04-25 14:02 ` Christian Brauner 2 siblings, 2 replies; 11+ messages in thread From: Al Viro @ 2024-04-25 2:31 UTC (permalink / raw) To: Stas Sergeev Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, David Laight, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche On Wed, Apr 24, 2024 at 01:52:48PM +0300, Stas Sergeev wrote: > @@ -3793,8 +3828,23 @@ static struct file *path_openat(struct nameidata *nd, > error = do_o_path(nd, flags, file); > } else { > const char *s = path_init(nd, flags); > - file = alloc_empty_file(open_flags, current_cred()); > - error = PTR_ERR_OR_ZERO(file); > + const struct cred *old_cred = NULL; > + > + error = 0; > + if (open_flags & OA2_INHERIT_CRED) { > + /* Only work with O_CLOEXEC dirs. */ > + if (!get_close_on_exec(nd->dfd)) > + error = -EPERM; > + > + if (!error) > + old_cred = openat2_override_creds(nd); > + } > + if (!error) { > + file = alloc_empty_file(open_flags, current_cred()); Consider the following, currently absolutely harmless situation: * process is owned by luser:students. * descriptor 69 refers to root-opened root directory (O_RDONLY) What's the expected result of fcntl(69, F_SEFTD, O_CLOEXEC); opening "etc/shadow" with dirfd equal to 69 and your flag given subsequent read() from the resulting descriptor? At which point will the kernel say "go fuck yourself, I'm not letting you read that file", provided that attacker passes that new flag of yours? As a bonus question, how about opening it for _write_, seeing that this is an obvious instant roothole? Again, currently the setup that has a root-opened directory in descriptor table of a non-root process is safe. Incidentally, suppose you have the same process run with stdin opened (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with dirfd being 0, pathname - "" and flags - O_RDWR. AFAICS, without an explicit opt-in by the original opener it's a non-starter, and TBH I doubt that even with such opt-in (FMODE_CRED, whatever) it would be a good idea - it gives too much. NAKed-by: Al Viro <viro@zeniv.linux.org.uk> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-25 2:31 ` Al Viro @ 2024-04-25 7:24 ` stsp 2024-04-25 9:23 ` stsp 1 sibling, 0 replies; 11+ messages in thread From: stsp @ 2024-04-25 7:24 UTC (permalink / raw) To: Al Viro Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, David Laight, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche 25.04.2024 05:31, Al Viro пишет: > Consider the following, currently absolutely harmless situation: > * process is owned by luser:students. > * descriptor 69 refers to root-opened root directory (O_RDONLY) > What's the expected result of > fcntl(69, F_SEFTD, O_CLOEXEC); > opening "etc/shadow" with dirfd equal to 69 and your flag given > subsequent read() from the resulting descriptor? > > At which point will the kernel say "go fuck yourself, I'm not letting you > read that file", provided that attacker passes that new flag of yours? > > As a bonus question, how about opening it for _write_, seeing that this > is an obvious instant roothole? > > Again, currently the setup that has a root-opened directory in descriptor > table of a non-root process is safe. > > Incidentally, suppose you have the same process run with stdin opened > (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with > dirfd being 0, pathname - "" and flags - O_RDWR. Ok, F_SETFD, how simple. :( > AFAICS, without an explicit opt-in by the original opener it's > a non-starter, and TBH I doubt that even with such opt-in (FMODE_CRED, > whatever) it would be a good idea - it gives too much. Yes, which is why I am quite sceptical to this FMODE_CRED idea. Please note that my O_CLOEXEC check actually meant to check that exactly this process have opened the dir. It just didn't happen that way, as you pointed. Can I replace the O_CLOEXEC check with some explicit check that makes sure the fd was opened by exactly that process? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-25 2:31 ` Al Viro 2024-04-25 7:24 ` stsp @ 2024-04-25 9:23 ` stsp 1 sibling, 0 replies; 11+ messages in thread From: stsp @ 2024-04-25 9:23 UTC (permalink / raw) To: Al Viro Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, David Laight, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche 25.04.2024 05:31, Al Viro пишет: > Incidentally, suppose you have the same process run with stdin opened > (r/o) by root. F_SETFD it to O_CLOEXEC, then use your open with > dirfd being 0, pathname - "" and flags - O_RDWR. I actually checked this with the test-case. It seems to return ENOENT: Breakpoint 1, openat2 (dirfd=0, pathname=0x7fffffffdbee "", how=0x7fffffffd5e0, size=24) at tst.c:13 13 return syscall(SYS_openat2, dirfd, pathname, how, size); (gdb) fin Run till exit from #0 openat2 (dirfd=0, pathname=0x7fffffffdbee "", how=0x7fffffffd5e0, size=24) at tst.c:13 0x000000000040167b in main (argc=3, argv=0x7fffffffd7b8) at tst.c:140 140 fd = openat2(0, efile, &how1, sizeof(how1)); Value returned is $1 = -1 (gdb) list 135 err = fcntl(0, F_SETFD, O_CLOEXEC); 136 if (err) { 137 perror("fcntl(F_SETFD)"); 138 return EXIT_FAILURE; 139 } 140 fd = openat2(0, efile, &how1, sizeof(how1)); 141 if (fd == -1) { 142 perror("openat2(1)"); 143 // return EXIT_FAILURE; 144 } else { (gdb) p errno $2 = 2 So it seems the creds can't be stolen from a non-dir fd, but I wonder why ENOENT is returned instead of ENOTDIR. Such ENOENT is not dicumented in a man page of openat2(), so I guess there is some problem here even w/o my patch. :) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-24 10:52 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev 2024-04-25 2:31 ` Al Viro @ 2024-04-25 13:50 ` kernel test robot 2024-04-25 14:02 ` Christian Brauner 2 siblings, 0 replies; 11+ messages in thread From: kernel test robot @ 2024-04-25 13:50 UTC (permalink / raw) To: Stas Sergeev Cc: oe-lkp, lkp, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, Paolo Bonzini, Christian Göttsche, linux-fsdevel, linux-kernel, Stas Sergeev, David Laight, linux-api, oliver.sang Hello, kernel test robot noticed "BUG:KASAN:wild-memory-access_in_terminate_walk" on: commit: 97bb54b42b1d6150e9ae11a7bf7833ed9f8c471d ("[PATCH 2/2] openat2: add OA2_INHERIT_CRED flag") url: https://github.com/intel-lab-lkp/linux/commits/Stas-Sergeev/fs-reorganize-path_openat/20240424-185527 base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 9d1ddab261f3e2af7c384dc02238784ce0cf9f98 patch link: https://lore.kernel.org/all/20240424105248.189032-3-stsp2@yandex.ru/ patch subject: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag in testcase: boot compiler: clang-17 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) +---------------------------------------------------------------------------------------+------------+------------+ | | 831d3c6cc6 | 97bb54b42b | +---------------------------------------------------------------------------------------+------------+------------+ | BUG:KASAN:wild-memory-access_in_terminate_walk | 0 | 12 | | canonical_address#:#[##] | 0 | 12 | | RIP:terminate_walk | 0 | 12 | | Kernel_panic-not_syncing:Fatal_exception | 0 | 12 | +---------------------------------------------------------------------------------------+------------+------------+ If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202404252107.3c18eed2-lkp@intel.com [ 2.555857][ T16] BUG: KASAN: wild-memory-access in terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] Write of size 4 at addr aaaaaaaaaaaaaaaa by task kdevtmpfs/16 [ 2.556181][ T16] [ 2.556181][ T16] CPU: 0 PID: 16 Comm: kdevtmpfs Tainted: G T 6.9.0-rc5-00038-g97bb54b42b1d #1 c90cc2d91176f38ca16e85ead0a72934082854cd [ 2.556181][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 2.556181][ T16] Call Trace: [ 2.556181][ T16] <TASK> [ 2.556181][ T16] dump_stack_lvl (lib/dump_stack.c:116) [ 2.556181][ T16] print_report (mm/kasan/report.c:?) [ 2.556181][ T16] ? kasan_report (mm/kasan/report.c:214 mm/kasan/report.c:590) [ 2.556181][ T16] ? terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] kasan_report (mm/kasan/report.c:603) [ 2.556181][ T16] ? terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] kasan_check_range (mm/kasan/generic.c:?) [ 2.556181][ T16] terminate_walk (include/linux/instrumented.h:? include/linux/atomic/atomic-instrumented.h:400 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.556181][ T16] path_lookupat (fs/namei.c:2515) [ 2.556181][ T16] filename_lookup (fs/namei.c:2526) [ 2.556181][ T16] kern_path (fs/namei.c:2634) [ 2.556181][ T16] init_mount (fs/init.c:22) [ 2.556181][ T16] devtmpfs_setup (drivers/base/devtmpfs.c:419) [ 2.556181][ T16] devtmpfsd (drivers/base/devtmpfs.c:436) [ 2.556181][ T16] kthread (kernel/kthread.c:390) [ 2.556181][ T16] ? vclkdev_alloc (drivers/base/devtmpfs.c:435) [ 2.556181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.556181][ T16] ret_from_fork (arch/x86/kernel/process.c:153) [ 2.556181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.556181][ T16] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) [ 2.556181][ T16] </TASK> [ 2.556181][ T16] ================================================================== [ 2.556184][ T16] Disabling lock debugging due to kernel taint [ 2.556901][ T16] general protection fault, probably for non-canonical address 0xaaaaaaaaaaaaaaaa: 0000 [#1] KASAN PTI [ 2.558131][ T16] CPU: 0 PID: 16 Comm: kdevtmpfs Tainted: G B T 6.9.0-rc5-00038-g97bb54b42b1d #1 c90cc2d91176f38ca16e85ead0a72934082854cd [ 2.559653][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 2.560181][ T16] RIP: 0010:terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.560181][ T16] Code: 03 43 80 3c 2e 00 74 08 4c 89 ff e8 01 61 f4 ff 49 8b 1f 48 85 db 74 41 48 89 df be 04 00 00 00 e8 dc 61 f4 ff b8 ff ff ff ff <0f> c1 03 83 f8 01 75 25 43 80 3c 2e 00 74 08 4c 89 ff e8 d0 60 f4 All code ======== 0: 03 43 80 add -0x80(%rbx),%eax 3: 3c 2e cmp $0x2e,%al 5: 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) 9: 89 ff mov %edi,%edi b: e8 01 61 f4 ff call 0xfffffffffff46111 10: 49 8b 1f mov (%r15),%rbx 13: 48 85 db test %rbx,%rbx 16: 74 41 je 0x59 18: 48 89 df mov %rbx,%rdi 1b: be 04 00 00 00 mov $0x4,%esi 20: e8 dc 61 f4 ff call 0xfffffffffff46201 25: b8 ff ff ff ff mov $0xffffffff,%eax 2a:* 0f c1 03 xadd %eax,(%rbx) <-- trapping instruction 2d: 83 f8 01 cmp $0x1,%eax 30: 75 25 jne 0x57 32: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) 37: 74 08 je 0x41 39: 4c 89 ff mov %r15,%rdi 3c: e8 .byte 0xe8 3d: d0 60 f4 shlb -0xc(%rax) Code starting with the faulting instruction =========================================== 0: 0f c1 03 xadd %eax,(%rbx) 3: 83 f8 01 cmp $0x1,%eax 6: 75 25 jne 0x2d 8: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) d: 74 08 je 0x17 f: 4c 89 ff mov %r15,%rdi 12: e8 .byte 0xe8 13: d0 60 f4 shlb -0xc(%rax) [ 2.560181][ T16] RSP: 0000:ffffc9000010fc40 EFLAGS: 00010246 [ 2.560181][ T16] RAX: 00000000ffffffff RBX: aaaaaaaaaaaaaaaa RCX: ffffffff811e4a0f [ 2.560181][ T16] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff8792adc0 [ 2.560181][ T16] RBP: 0000000000000011 R08: ffffffff8792adc7 R09: 1ffffffff0f255b8 [ 2.560181][ T16] R10: dffffc0000000000 R11: fffffbfff0f255b9 R12: 1ffff92000021fc4 [ 2.560181][ T16] R13: dffffc0000000000 R14: 1ffff92000021fc1 R15: ffffc9000010fe08 [ 2.560181][ T16] FS: 0000000000000000(0000) GS:ffffffff878dc000(0000) knlGS:0000000000000000 [ 2.560181][ T16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.560181][ T16] CR2: ffff88843ffff000 CR3: 000000000789c000 CR4: 00000000000406f0 [ 2.560181][ T16] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2.560181][ T16] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2.560181][ T16] Call Trace: [ 2.560181][ T16] <TASK> [ 2.560181][ T16] ? __die_body (arch/x86/kernel/dumpstack.c:421) [ 2.560181][ T16] ? die_addr (arch/x86/kernel/dumpstack.c:?) [ 2.560181][ T16] ? exc_general_protection (arch/x86/kernel/traps.c:?) [ 2.560181][ T16] ? end_report (arch/x86/include/asm/current.h:49 mm/kasan/report.c:240) [ 2.560181][ T16] ? asm_exc_general_protection (arch/x86/include/asm/idtentry.h:617) [ 2.560181][ T16] ? add_taint (arch/x86/include/asm/bitops.h:60 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/panic.c:555) [ 2.560181][ T16] ? terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.560181][ T16] path_lookupat (fs/namei.c:2515) [ 2.560181][ T16] filename_lookup (fs/namei.c:2526) [ 2.560181][ T16] kern_path (fs/namei.c:2634) [ 2.560181][ T16] init_mount (fs/init.c:22) [ 2.560181][ T16] devtmpfs_setup (drivers/base/devtmpfs.c:419) [ 2.560181][ T16] devtmpfsd (drivers/base/devtmpfs.c:436) [ 2.560181][ T16] kthread (kernel/kthread.c:390) [ 2.560181][ T16] ? vclkdev_alloc (drivers/base/devtmpfs.c:435) [ 2.560181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.560181][ T16] ret_from_fork (arch/x86/kernel/process.c:153) [ 2.560181][ T16] ? kthread_unuse_mm (kernel/kthread.c:341) [ 2.560181][ T16] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) [ 2.560181][ T16] </TASK> [ 2.560181][ T16] Modules linked in: [ 2.560183][ T16] ---[ end trace 0000000000000000 ]--- [ 2.560820][ T16] RIP: 0010:terminate_walk (arch/x86/include/asm/atomic.h:103 include/linux/atomic/atomic-arch-fallback.h:949 include/linux/atomic/atomic-instrumented.h:401 include/linux/refcount.h:264 include/linux/refcount.h:307 include/linux/refcount.h:325 fs/namei.c:702) [ 2.561462][ T16] Code: 03 43 80 3c 2e 00 74 08 4c 89 ff e8 01 61 f4 ff 49 8b 1f 48 85 db 74 41 48 89 df be 04 00 00 00 e8 dc 61 f4 ff b8 ff ff ff ff <0f> c1 03 83 f8 01 75 25 43 80 3c 2e 00 74 08 4c 89 ff e8 d0 60 f4 All code ======== 0: 03 43 80 add -0x80(%rbx),%eax 3: 3c 2e cmp $0x2e,%al 5: 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) 9: 89 ff mov %edi,%edi b: e8 01 61 f4 ff call 0xfffffffffff46111 10: 49 8b 1f mov (%r15),%rbx 13: 48 85 db test %rbx,%rbx 16: 74 41 je 0x59 18: 48 89 df mov %rbx,%rdi 1b: be 04 00 00 00 mov $0x4,%esi 20: e8 dc 61 f4 ff call 0xfffffffffff46201 25: b8 ff ff ff ff mov $0xffffffff,%eax 2a:* 0f c1 03 xadd %eax,(%rbx) <-- trapping instruction 2d: 83 f8 01 cmp $0x1,%eax 30: 75 25 jne 0x57 32: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) 37: 74 08 je 0x41 39: 4c 89 ff mov %r15,%rdi 3c: e8 .byte 0xe8 3d: d0 60 f4 shlb -0xc(%rax) Code starting with the faulting instruction =========================================== 0: 0f c1 03 xadd %eax,(%rbx) 3: 83 f8 01 cmp $0x1,%eax 6: 75 25 jne 0x2d 8: 43 80 3c 2e 00 cmpb $0x0,(%r14,%r13,1) d: 74 08 je 0x17 f: 4c 89 ff mov %r15,%rdi 12: e8 .byte 0xe8 13: d0 60 f4 shlb -0xc(%rax) The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240425/202404252107.3c18eed2-lkp@intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-24 10:52 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev 2024-04-25 2:31 ` Al Viro 2024-04-25 13:50 ` kernel test robot @ 2024-04-25 14:02 ` Christian Brauner 2024-04-26 13:36 ` stsp 2 siblings, 1 reply; 11+ messages in thread From: Christian Brauner @ 2024-04-25 14:02 UTC (permalink / raw) To: Stas Sergeev Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, David Laight, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche > struct open_flags { > - int open_flag; > + u64 open_flag; Btw, this change taken together with > +#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED) is also ripe to causes subtle bugs and security issues. This new VALID_OPENAT2_FLAGS define bypasses BUILD_BUG_ON_MSG(upper_32_bits(VALID_OPEN_FLAGS), "struct open_flags doesn't yet handle flags > 32 bits"); in build_open_flags(). And right now lookup_open(), open_last_lookups(), and do_open() just do: int open_flag = op->open_flag; Because op->open_flag was 32bit that was fine. But now ->open_flag is 64bit which means we truncate the upper 32bit including OA2_INHERIT_CRED or any other new flag in the upper 32bits in those functions. So as soon as there's an additional check in e.g., do_open() for OA2_INHERIT_CRED or in any of the other helpers that's security relevant we're fscked because that flag is never seen and no bot will help us here. And it's super easy to miss during review... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-25 14:02 ` Christian Brauner @ 2024-04-26 13:36 ` stsp 0 siblings, 0 replies; 11+ messages in thread From: stsp @ 2024-04-26 13:36 UTC (permalink / raw) To: Christian Brauner Cc: linux-kernel, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, David Laight, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche 25.04.2024 17:02, Christian Brauner пишет: >> struct open_flags { >> - int open_flag; >> + u64 open_flag; > Btw, this change taken together with All fixed in v5. I dropped u64 use. Other comments are addressed as well. Please let me know if I missed some. Thank you. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 0/2] implement OA2_INHERIT_CRED flag for openat2() @ 2024-04-23 11:01 Stas Sergeev 2024-04-23 11:01 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev 0 siblings, 1 reply; 11+ messages in thread From: Stas Sergeev @ 2024-04-23 11:01 UTC (permalink / raw) To: linux-kernel Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. It is needed to perform an open operation with the creds that were in effect when the dir_fd was opened. This allows the process to pre-open some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged user, while still retaining the possibility to open/create files within the pre-opened directory set. Changes in v2: - capture full struct cred instead of just fsuid/fsgid. Suggested by Stefan Metzmacher <metze@samba.org> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> Stas Sergeev (2): fs: reorganize path_openat() openat2: add OA2_INHERIT_CRED flag fs/internal.h | 2 +- fs/namei.c | 52 +++++++++++++++++++++++++++++------- fs/open.c | 2 +- include/linux/fcntl.h | 2 ++ include/uapi/linux/openat2.h | 3 +++ 5 files changed, 50 insertions(+), 11 deletions(-) -- 2.44.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag 2024-04-23 11:01 [PATCH v2 0/2] implement OA2_INHERIT_CRED flag for openat2() Stas Sergeev @ 2024-04-23 11:01 ` Stas Sergeev 0 siblings, 0 replies; 11+ messages in thread From: Stas Sergeev @ 2024-04-23 11:01 UTC (permalink / raw) To: linux-kernel Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro, Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton, Chuck Lever, Alexander Aring, linux-fsdevel, linux-api, Paolo Bonzini, Christian Göttsche This flag performs the open operation with the credentials that were in effect when dir_fd was opened. This allows the process to pre-open some directories and then change eUID (and all other UIDs/GIDs) to a less-privileged user, retaining the ability to open/create files within these directories. Design goal: The idea is to provide a very light-weight sandboxing, where the process, without the use of any heavy-weight techniques like chroot within namespaces, can restrict the access to the set of pre-opened directories. This patch is just a first step to such sandboxing. If things go well, in the future the same extension can be added to more syscalls. These should include at least unlinkat(), renameat2() and the not-yet-upstreamed setxattrat(). Security considerations: To avoid sandboxing escape, this patch makes sure the restricted lookup modes are used. Namely, RESOLVE_BENEATH or RESOLVE_IN_ROOT. To avoid leaking creds across exec, this patch requires O_CLOEXEC flag on a directory. Use cases: Virtual machines that deal with untrusted code, can use that instead of a more heavy-weighted approaches. Currently the approach is being tested on a dosemu2 VM. Signed-off-by: Stas Sergeev <stsp2@yandex.ru> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> --- fs/internal.h | 2 +- fs/namei.c | 30 ++++++++++++++++++++++++++++-- fs/open.c | 2 +- include/linux/fcntl.h | 2 ++ include/uapi/linux/openat2.h | 3 +++ 5 files changed, 35 insertions(+), 4 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 7ca738904e34..692b53b19aad 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -169,7 +169,7 @@ static inline void sb_end_ro_state_change(struct super_block *sb) * open.c */ struct open_flags { - int open_flag; + u64 open_flag; umode_t mode; int acc_mode; int intent; diff --git a/fs/namei.c b/fs/namei.c index 2fde2c320ae9..0e0f2e32ef02 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -586,6 +586,7 @@ struct nameidata { int dfd; vfsuid_t dir_vfsuid; umode_t dir_mode; + const struct cred *dir_open_cred; } __randomize_layout; #define ND_ROOT_PRESET 1 @@ -695,6 +696,7 @@ static void terminate_walk(struct nameidata *nd) nd->depth = 0; nd->path.mnt = NULL; nd->path.dentry = NULL; + put_cred(nd->dir_open_cred); } /* path_put is needed afterwards regardless of success or failure */ @@ -2414,6 +2416,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags) get_fs_pwd(current->fs, &nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_cred = get_current_cred(); } else { /* Caller must check execute permissions on the starting path component */ struct fd f = fdget_raw(nd->dfd); @@ -2437,6 +2440,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags) path_get(&nd->path); nd->inode = nd->path.dentry->d_inode; } + nd->dir_open_cred = get_cred(f.file->f_cred); fdput(f); } @@ -3794,8 +3798,28 @@ static struct file *path_openat(struct nameidata *nd, error = do_o_path(nd, flags, file); } else { const char *s = path_init(nd, flags); - file = alloc_empty_file(op->open_flag, current_cred()); - error = PTR_ERR_OR_ZERO(file); + const struct cred *old_cred = NULL; + + error = 0; + if (op->open_flag & OA2_INHERIT_CRED) { + /* Make sure to work only with restricted + * look-up modes. + */ + if (!(nd->flags & (LOOKUP_BENEATH | LOOKUP_IN_ROOT))) + error = -EPERM; + /* Only work with O_CLOEXEC dirs. */ + if (!get_close_on_exec(nd->dfd)) + error = -EPERM; + + if (!error) + old_cred = override_creds(nd->dir_open_cred); + } + if (!error) { + file = alloc_empty_file(op->open_flag, current_cred()); + error = PTR_ERR_OR_ZERO(file); + } else { + file = ERR_PTR(error); + } if (!error) { while (!(error = link_path_walk(s, nd)) && (s = open_last_lookups(nd, file, op)) != NULL) @@ -3803,6 +3827,8 @@ static struct file *path_openat(struct nameidata *nd, } if (!error) error = do_open(nd, file, op); + if (old_cred) + revert_creds(old_cred); terminate_walk(nd); if (IS_ERR(file)) return file; diff --git a/fs/open.c b/fs/open.c index ee8460c83c77..6be013182a35 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op) * values before calling build_open_flags(), but openat2(2) checks all * of its arguments. */ - if (flags & ~VALID_OPEN_FLAGS) + if (flags & ~VALID_OPENAT2_FLAGS) return -EINVAL; if (how->resolve & ~VALID_RESOLVE_FLAGS) return -EINVAL; diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index a332e79b3207..b71f8b162102 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -12,6 +12,8 @@ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) +#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_INHERIT_CRED) + /* List of all valid flags for the how->resolve argument: */ #define VALID_RESOLVE_FLAGS \ (RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \ diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h index a5feb7604948..cdd676a10b62 100644 --- a/include/uapi/linux/openat2.h +++ b/include/uapi/linux/openat2.h @@ -40,4 +40,7 @@ struct open_how { return -EAGAIN if that's not possible. */ +/* openat2-specific flags go to upper 4 bytes. */ +#define OA2_INHERIT_CRED (1ULL << 32) + #endif /* _UAPI_LINUX_OPENAT2_H */ -- 2.44.0 ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-04-26 13:36 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-23 22:46 [PATCH v3 0/2] implement OA2_INHERIT_CRED flag for openat2() Stas Sergeev 2024-04-23 22:46 ` [PATCH 1/2] fs: reorganize path_openat() Stas Sergeev 2024-04-23 22:46 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev -- strict thread matches above, loose matches on Subject: below -- 2024-04-24 10:52 [PATCH v4 0/2] implement OA2_INHERIT_CRED flag for openat2() Stas Sergeev 2024-04-24 10:52 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev 2024-04-25 2:31 ` Al Viro 2024-04-25 7:24 ` stsp 2024-04-25 9:23 ` stsp 2024-04-25 13:50 ` kernel test robot 2024-04-25 14:02 ` Christian Brauner 2024-04-26 13:36 ` stsp 2024-04-23 11:01 [PATCH v2 0/2] implement OA2_INHERIT_CRED flag for openat2() Stas Sergeev 2024-04-23 11:01 ` [PATCH 2/2] openat2: add OA2_INHERIT_CRED flag Stas Sergeev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).