* [PATCH review 0/11] General unprivileged mount support @ 2016-07-02 17:18 Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:18 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk As well as in these patches the code is also available from: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing It has been a long time in coming but recently in the userns tree the superblock has been expanded with a s_user_ns field indicating the user namespace that owns a superblock. The s_user_ns owner of a superblock has three implications. - Only kuids and kgids that map into s_user_ns are allowed to be sent to a filesystem from the vfs. - If the uid or gid on the filesystem does not map into s_user_ns i_uid is set to INVALID_UID and i_gid is set to INVALID_GID. - The scope of permission checks can be changed from global to a capabilitiy check in s_user_ns. The overall strategy is to handle as much of this as possible in the VFS so that what is happening is consistent between filesystems and has widespread review, and so that individual filesystems don't need to duplicate code. This set of patches inserts checks to ensure only kuids and kgids that map into s_user_ns are sent to filesystems. This set of patches updates the vfs to deal with potentially unmapped uids and gids in the i_uid and i_gid fields. The strategy adopted is to deny any activity that causes inodes with unmapped uids or gid to be written to disk except for a chown that causes makes i_uid and i_gid to map into s_user_ns. Relaxing of the capability checks and adding new filesystems that would benefit from the changes is held off until the vfs support is complete. I believe this work is complete so if there anything questionable you see please let me know. I have included linux-api because several system calls get new failure modes mostly -EOVERFLOW, and that may need separate documenation and review. The target for this work is to enable fully unprivileged fuse mounts and whichever filesystem results from the uid shifting work. These vfs changes should support all kinds of filesystems, but in practice it is an open problem if it is possible to modify a block based filesystem to be safe from people manipulating filesystem images in an attempt to get the kernel to malfunction so only a very limited set of additional filesystems is ever expected to be enabled by this work. Eric W. Biederman (6): userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS vfs: Verify acls are valid within superblock's s_user_ns. vfs: Don't modify inodes with a uid or gid unknown to the vfs vfs: Don't create inodes with a uid or gid unknown to the vfs quota: Ensure qids map to the filesystem quota: Handle quota data stored in s_user_ns. Seth Forshee (5): fs: Refuse uid/gid changes which don't map into s_user_ns fs: Check for invalid i_uid in may_follow_link() cred: Reject inodes with invalid ids in set_create_file_as() evm: Translate user/group ids relative to s_user_ns when computing HMAC fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns drivers/staging/lustre/lustre/mdc/mdc_request.c | 2 +- fs/9p/acl.c | 2 +- fs/attr.c | 19 +++++++++ fs/inode.c | 7 ++++ fs/namei.c | 40 ++++++++++++++---- fs/posix_acl.c | 8 ++-- fs/quota/dquot.c | 7 +++- fs/quota/quota.c | 14 +++---- fs/quota/quota_tree.c | 23 +++++++---- fs/quota/quota_v1.c | 6 ++- fs/quota/quota_v2.c | 10 +++-- fs/xattr.c | 7 ++++ include/linux/fs.h | 55 ++++++++++++++----------- include/linux/posix_acl.h | 2 +- include/linux/quota.h | 10 +++++ include/linux/uidgid.h | 4 +- kernel/cred.c | 2 + security/integrity/evm/evm_crypto.c | 4 +- 18 files changed, 154 insertions(+), 68 deletions(-) Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns 2016-07-02 17:18 [PATCH review 0/11] General unprivileged mount support Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 03/11] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman ` (5 more replies) 2016-07-04 8:52 ` [PATCH review 0/11] General unprivileged mount support Jan Kara [not found] ` <87ziq03qnj.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2 siblings, 6 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee@canonical.com> Add checks to notify_change to verify that uid and gid changes will map into the superblock's user namespace. If they do not fail with -EOVERFLOW. This is mandatory so that fileystems don't have to even think of dealing with ia_uid and ia_gid that --EWB Moved the test from inode_change_ok to notify_change Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- fs/attr.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/attr.c b/fs/attr.c index 25b24d0f6c88..dd723578ddce 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -255,6 +255,17 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de if (!(attr->ia_valid & ~(ATTR_KILL_SUID | ATTR_KILL_SGID))) return 0; + /* + * Verify that uid/gid changes are valid in the target + * namespace of the superblock. + */ + if (ia_valid & ATTR_UID && + !kuid_has_mapping(inode->i_sb->s_user_ns, attr->ia_uid)) + return -EOVERFLOW; + if (ia_valid & ATTR_GID && + !kgid_has_mapping(inode->i_sb->s_user_ns, attr->ia_gid)) + return -EOVERFLOW; + error = security_inode_setattr(dentry, attr); if (error) return error; -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 03/11] vfs: Verify acls are valid within superblock's s_user_ns. 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 05/11] cred: Reject inodes with invalid ids in set_create_file_as() Eric W. Biederman ` (4 subsequent siblings) 5 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk Update posix_acl_valid to verify that an acl is within a user namespace. Update the callers of posix_acl_valid to pass in an appropriate user namespace. For posix_acl_xattr_set and v9fs_xattr_set_acl pass in inode->i_sb->s_user_ns to posix_acl_valid. For md_unpack_acl pass in &init_user_ns as no inode or superblock is in sight. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- drivers/staging/lustre/lustre/mdc/mdc_request.c | 2 +- fs/9p/acl.c | 2 +- fs/posix_acl.c | 8 ++++---- include/linux/posix_acl.h | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c index 86b7445365f4..40cf57fad581 100644 --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c @@ -434,7 +434,7 @@ static int mdc_unpack_acl(struct ptlrpc_request *req, struct lustre_md *md) return rc; } - rc = posix_acl_valid(acl); + rc = posix_acl_valid(&init_user_ns, acl); if (rc) { CERROR("validate acl: %d\n", rc); posix_acl_release(acl); diff --git a/fs/9p/acl.c b/fs/9p/acl.c index 0576eaeb60b9..5b6a1743ea17 100644 --- a/fs/9p/acl.c +++ b/fs/9p/acl.c @@ -266,7 +266,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler, if (IS_ERR(acl)) return PTR_ERR(acl); else if (acl) { - retval = posix_acl_valid(acl); + retval = posix_acl_valid(inode->i_sb->s_user_ns, acl); if (retval) goto err_out; } diff --git a/fs/posix_acl.c b/fs/posix_acl.c index 8a4a266beff3..647c28180675 100644 --- a/fs/posix_acl.c +++ b/fs/posix_acl.c @@ -205,7 +205,7 @@ posix_acl_clone(const struct posix_acl *acl, gfp_t flags) * Check if an acl is valid. Returns 0 if it is, or -E... otherwise. */ int -posix_acl_valid(const struct posix_acl *acl) +posix_acl_valid(struct user_namespace *user_ns, const struct posix_acl *acl) { const struct posix_acl_entry *pa, *pe; int state = ACL_USER_OBJ; @@ -225,7 +225,7 @@ posix_acl_valid(const struct posix_acl *acl) case ACL_USER: if (state != ACL_USER) return -EINVAL; - if (!uid_valid(pa->e_uid)) + if (!kuid_has_mapping(user_ns, pa->e_uid)) return -EINVAL; needs_mask = 1; break; @@ -240,7 +240,7 @@ posix_acl_valid(const struct posix_acl *acl) case ACL_GROUP: if (state != ACL_GROUP) return -EINVAL; - if (!gid_valid(pa->e_gid)) + if (!kgid_has_mapping(user_ns, pa->e_gid)) return -EINVAL; needs_mask = 1; break; @@ -845,7 +845,7 @@ posix_acl_xattr_set(const struct xattr_handler *handler, return PTR_ERR(acl); if (acl) { - ret = posix_acl_valid(acl); + ret = posix_acl_valid(inode->i_sb->s_user_ns, acl); if (ret) goto out; } diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h index 5b5a80cc5926..1af6438fde3e 100644 --- a/include/linux/posix_acl.h +++ b/include/linux/posix_acl.h @@ -81,7 +81,7 @@ posix_acl_release(struct posix_acl *acl) extern void posix_acl_init(struct posix_acl *, int); extern struct posix_acl *posix_acl_alloc(int, gfp_t); -extern int posix_acl_valid(const struct posix_acl *); +extern int posix_acl_valid(struct user_namespace *, const struct posix_acl *); extern int posix_acl_permission(struct inode *, const struct posix_acl *, int); extern struct posix_acl *posix_acl_from_mode(umode_t, gfp_t); extern int posix_acl_equiv_mode(const struct posix_acl *, umode_t *); -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 05/11] cred: Reject inodes with invalid ids in set_create_file_as() 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 03/11] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs Eric W. Biederman ` (3 subsequent siblings) 5 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee@canonical.com> Using INVALID_[UG]ID for the LSM file creation context doesn't make sense, so return an error if the inode passed to set_create_file_as() has an invalid id. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- kernel/cred.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/cred.c b/kernel/cred.c index 0c0cd8a62285..5f264fb5737d 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -689,6 +689,8 @@ EXPORT_SYMBOL(set_security_override_from_ctx); */ int set_create_files_as(struct cred *new, struct inode *inode) { + if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid)) + return -EINVAL; new->fsuid = inode->i_uid; new->fsgid = inode->i_gid; return security_kernel_create_files_as(new, inode); -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 03/11] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 05/11] cred: Reject inodes with invalid ids in set_create_file_as() Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman [not found] ` <20160702172035.19568-7-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-02 17:20 ` [PATCH review 08/11] quota: Ensure qids map to the filesystem Eric W. Biederman ` (2 subsequent siblings) 5 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/namei.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 840201c4c290..629823f19a6a 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2814,16 +2814,22 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) * 1. We can't do it if child already exists (open has special treatment for * this case, but since we are inlined it's OK) * 2. We can't do it if dir is read-only (done in permission()) - * 3. We should have write and exec permissions on dir - * 4. We can't do it if dir is immutable (done in permission()) + * 3. We can't do it if the fs can't represent the fsuid or fsgid. + * 4. We should have write and exec permissions on dir + * 5. We can't do it if dir is immutable (done in permission()) */ static inline int may_create(struct inode *dir, struct dentry *child) { + struct user_namespace *s_user_ns; audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE); if (child->d_inode) return -EEXIST; if (IS_DEADDIR(dir)) return -ENOENT; + s_user_ns = dir->i_sb->s_user_ns; + if (!kuid_has_mapping(s_user_ns, current_fsuid()) || + !kgid_has_mapping(s_user_ns, current_fsgid())) + return -EOVERFLOW; return inode_permission(dir, MAY_WRITE | MAY_EXEC); } -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
[parent not found: <20160702172035.19568-7-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs [not found] ` <20160702172035.19568-7-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> @ 2016-07-04 7:59 ` Jan Kara [not found] ` <20160704075919.GA5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Jan Kara @ 2016-07-04 7:59 UTC (permalink / raw) To: Eric W. Biederman Cc: Seth Forshee, Linux Containers, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk On Sat 02-07-16 12:20:31, Eric W. Biederman wrote: > It is expected that filesystems can not represent uids and gids from > outside of their user namespace. Keep things simple by not even > trying to create filesystem nodes with non-sense uids and gids. > > Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> So if we have sb->s_user_ns that doesn't map UID and GID 0, root cannot directly create files in this filesystem. EOVERFLOW error will at least hint us where the problem is but still I'm suspecting this is going to create hard to debug configuration issues... I'm not sure if we can do anything about this but I wanted to point it out. Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <20160704075919.GA5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs [not found] ` <20160704075919.GA5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> @ 2016-07-05 14:55 ` Eric W. Biederman [not found] ` <87zipwxhgp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-05 14:55 UTC (permalink / raw) To: Jan Kara Cc: Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> writes: > On Sat 02-07-16 12:20:31, Eric W. Biederman wrote: >> It is expected that filesystems can not represent uids and gids from >> outside of their user namespace. Keep things simple by not even >> trying to create filesystem nodes with non-sense uids and gids. >> >> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > > So if we have sb->s_user_ns that doesn't map UID and GID 0, root cannot > directly create files in this filesystem. EOVERFLOW error will at least > hint us where the problem is but still I'm suspecting this is going to > create hard to debug configuration issues... I'm not sure if we can do > anything about this but I wanted to point it out. A reasonable point. A couple of details. - If there is no uid or gid 0 inside of a user namespace there is no root user in that namespace so this is a moot point. - If we are talking about uid 0 in the initial user namespace the scenario you are worried about can occur, but you have to work at it to get into that situation. To create a superblock has s_user_ns != &init_user_ns something like the following has to occur. setuid(1000); unshare(CLONE_NEWUSER); unshare(CLONE_NEWNS); mount(....); At which point the oridinary root user can not see the filesystem with s_user_ns != &init_user_ns as it is located in another mount namespace. Which means root has to use setns to get into that mount namespace, so there are going to be larger hints than -EOVERFLOW. At the same time if there are better error codes that make it clearer what is happening I am all ears. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <87zipwxhgp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs [not found] ` <87zipwxhgp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-06 9:07 ` Jan Kara [not found] ` <20160706090705.GE14067-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Jan Kara @ 2016-07-06 9:07 UTC (permalink / raw) To: Eric W. Biederman Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni On Tue 05-07-16 09:55:18, Eric W. Biederman wrote: > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> writes: > > > On Sat 02-07-16 12:20:31, Eric W. Biederman wrote: > >> It is expected that filesystems can not represent uids and gids from > >> outside of their user namespace. Keep things simple by not even > >> trying to create filesystem nodes with non-sense uids and gids. > >> > >> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > > > > So if we have sb->s_user_ns that doesn't map UID and GID 0, root cannot > > directly create files in this filesystem. EOVERFLOW error will at least > > hint us where the problem is but still I'm suspecting this is going to > > create hard to debug configuration issues... I'm not sure if we can do > > anything about this but I wanted to point it out. > > A reasonable point. > > A couple of details. > > - If there is no uid or gid 0 inside of a user namespace there is > no root user in that namespace so this is a moot point. > > - If we are talking about uid 0 in the initial user namespace the > scenario you are worried about can occur, but you have to work at it > to get into that situation. > > To create a superblock has s_user_ns != &init_user_ns something like > the following has to occur. > > setuid(1000); > unshare(CLONE_NEWUSER); > unshare(CLONE_NEWNS); > mount(....); > > At which point the oridinary root user can not see the filesystem > with s_user_ns != &init_user_ns as it is located in another mount > namespace. > > Which means root has to use setns to get into that mount namespace, > so there are going to be larger hints than -EOVERFLOW. OK, I see. Thanks for explanation. The inability of the admin (UID 0 in init_user_ns) to easily see and change any filesystem mounted in the system makes me somewhat nervous ;). But I guess the complexity is the price for the flexibility... Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <20160706090705.GE14067-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs [not found] ` <20160706090705.GE14067-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> @ 2016-07-06 15:37 ` Eric W. Biederman 0 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 15:37 UTC (permalink / raw) To: Jan Kara Cc: Seth Forshee, Linux Containers, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> writes: > On Tue 05-07-16 09:55:18, Eric W. Biederman wrote: >> To create a superblock has s_user_ns != &init_user_ns something like >> the following has to occur. >> >> setuid(1000); >> unshare(CLONE_NEWUSER); >> unshare(CLONE_NEWNS); >> mount(....); >> >> At which point the oridinary root user can not see the filesystem >> with s_user_ns != &init_user_ns as it is located in another mount >> namespace. >> >> Which means root has to use setns to get into that mount namespace, >> so there are going to be larger hints than -EOVERFLOW. > > OK, I see. Thanks for explanation. The inability of the admin (UID 0 in > init_user_ns) to easily see and change any filesystem mounted in the system > makes me somewhat nervous ;). But I guess the complexity is the price for > the flexibility... Certainly the price of not allowing unprivileged users to confuse existing setuid 0 and setcap executables. There are days I really wish I could do things the plan9 way and throw out any legacy baggage that makes things hard to implement. As I recall plan9 threw out setuid, setgid and only supported the 9p filesystem to make this all simple and almost safe. Wonderfully and horribly I have to keep a lot more real world code usable in this framework. All of that said with the existence of CRIU I expect we will ultimately have all of the interfaces needed for adminstration applications to see everything that is going on. Unfortunately sometimes those interfaces and tools lag behind the rest of the work. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH review 08/11] quota: Ensure qids map to the filesystem 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman ` (2 preceding siblings ...) 2016-07-02 17:20 ` [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 10/11] evm: Translate user/group ids relative to s_user_ns when computing HMAC Eric W. Biederman [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 5 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk Introduce the helper qid_has_mapping and use it to ensure that the quota system only considers qids that map to the filesystems s_user_ns. In practice for quota supporting filesystems today this is the exact same check as qid_valid. As only 0xffffffff aka (qid_t)-1 does not map into init_user_ns. Replace the qid_valid calls with qid_has_mapping as values come in from userspace. This is harmless today and it prepares the quota system to work on filesystems with quotas but mounted by unprivileged users. Call qid_has_mapping from dqget. This ensures the passed in qid has a prepresentation on the underlying filesystem. Previously this was unnecessary as filesystesm never had qids that could not map. With the introduction of filesystems outside of s_user_ns this will not remain true. All of this ensures the quota code never has to deal with qids that don't map to the underlying filesystem. Cc: Jan Kara <jack@suse.cz> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/quota/dquot.c | 3 +++ fs/quota/quota.c | 12 ++++++------ include/linux/quota.h | 10 ++++++++++ 3 files changed, 19 insertions(+), 6 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index ff21980d0119..74706b6aa747 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -841,6 +841,9 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid) unsigned int hashent = hashfn(sb, qid); struct dquot *dquot, *empty = NULL; + if (!qid_has_mapping(sb->s_user_ns, qid)) + return ERR_PTR(-EINVAL); + if (!sb_has_quota_active(sb, qid.type)) return ERR_PTR(-ESRCH); we_slept: diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 0f10ee9892ce..73f6f4cf0a21 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -211,7 +211,7 @@ static int quota_getquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_dqblk(sb, qid, &fdq); if (ret) @@ -237,7 +237,7 @@ static int quota_getnextquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_nextdqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_nextdqblk(sb, &qid, &fdq); if (ret) @@ -288,7 +288,7 @@ static int quota_setquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->set_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; copy_from_if_dqblk(&fdq, &idq); return sb->s_qcop->set_dqblk(sb, qid, &fdq); @@ -581,7 +581,7 @@ static int quota_setxquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->set_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; /* Are we actually setting timer / warning limits for all users? */ if (from_kqid(&init_user_ns, qid) == 0 && @@ -642,7 +642,7 @@ static int quota_getxquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_dqblk(sb, qid, &qdq); if (ret) @@ -669,7 +669,7 @@ static int quota_getnextxquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_nextdqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_nextdqblk(sb, &qid, &qdq); if (ret) diff --git a/include/linux/quota.h b/include/linux/quota.h index 9dfb6bce8c9e..1db16ee39b31 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -179,6 +179,16 @@ static inline struct kqid make_kqid_projid(kprojid_t projid) return kqid; } +/** + * qid_has_mapping - Report if a qid maps into a user namespace. + * @ns: The user namespace to see if a value maps into. + * @qid: The kernel internal quota identifier to test. + */ +static inline bool qid_has_mapping(struct user_namespace *ns, struct kqid qid) +{ + return from_kqid(ns, qid) != (qid_t) -1; +} + extern spinlock_t dq_data_lock; -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 10/11] evm: Translate user/group ids relative to s_user_ns when computing HMAC 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman ` (3 preceding siblings ...) 2016-07-02 17:20 ` [PATCH review 08/11] quota: Ensure qids map to the filesystem Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 5 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee@canonical.com> The EVM HMAC should be calculated using the on disk user and group ids, so the k[ug]ids in the inode must be translated relative to the s_user_ns of the inode's super block. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- security/integrity/evm/evm_crypto.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/security/integrity/evm/evm_crypto.c b/security/integrity/evm/evm_crypto.c index 30b6b7d0429f..11c1d30bd705 100644 --- a/security/integrity/evm/evm_crypto.c +++ b/security/integrity/evm/evm_crypto.c @@ -151,8 +151,8 @@ static void hmac_add_misc(struct shash_desc *desc, struct inode *inode, memset(&hmac_misc, 0, sizeof(hmac_misc)); hmac_misc.ino = inode->i_ino; hmac_misc.generation = inode->i_generation; - hmac_misc.uid = from_kuid(&init_user_ns, inode->i_uid); - hmac_misc.gid = from_kgid(&init_user_ns, inode->i_gid); + hmac_misc.uid = from_kuid(inode->i_sb->s_user_ns, inode->i_uid); + hmac_misc.gid = from_kgid(inode->i_sb->s_user_ns, inode->i_gid); hmac_misc.mode = inode->i_mode; crypto_shash_update(desc, (const u8 *)&hmac_misc, sizeof(hmac_misc)); if (evm_hmac_attrs & EVM_ATTR_FSUUID) -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
[parent not found: <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>]
* [PATCH review 02/11] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> @ 2016-07-02 17:20 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 04/11] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman ` (3 subsequent siblings) 4 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni Refuse to admit any user namespace has a mapping of the INVALID_UID and the INVALID_GID when !CONFIG_USER_NS. Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- include/linux/uidgid.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h index 03835522dfcb..25e9d9216340 100644 --- a/include/linux/uidgid.h +++ b/include/linux/uidgid.h @@ -177,12 +177,12 @@ static inline gid_t from_kgid_munged(struct user_namespace *to, kgid_t kgid) static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid) { - return true; + return uid_valid(uid); } static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid) { - return true; + return gid_valid(gid); } #endif /* CONFIG_USER_NS */ -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 04/11] fs: Check for invalid i_uid in may_follow_link() [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-02 17:20 ` [PATCH review 02/11] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 06/11] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman ` (2 subsequent siblings) 4 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Filesystem uids which don't map into a user namespace may result in inode->i_uid being INVALID_UID. A symlink and its parent could have different owners in the filesystem can both get mapped to INVALID_UID, which may result in following a symlink when this would not have otherwise been permitted when protected symlinks are enabled. Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/namei.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/namei.c b/fs/namei.c index 757a32725d92..8701bd9a5270 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -901,6 +901,7 @@ static inline int may_follow_link(struct nameidata *nd) { const struct inode *inode; const struct inode *parent; + kuid_t puid; if (!sysctl_protected_symlinks) return 0; @@ -916,7 +917,8 @@ static inline int may_follow_link(struct nameidata *nd) return 0; /* Allowed if parent directory and link owner match. */ - if (uid_eq(parent->i_uid, inode->i_uid)) + puid = parent->i_uid; + if (uid_valid(puid) && uid_eq(puid, inode->i_uid)) return 0; if (nd->flags & LOOKUP_RCU) -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 06/11] vfs: Don't modify inodes with a uid or gid unknown to the vfs [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-02 17:20 ` [PATCH review 02/11] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 04/11] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 09/11] quota: Handle quota data stored in s_user_ns Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 11/11] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman 4 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk When a filesystem outside of init_user_ns is mounted it could have uids and gids stored in it that do not map to init_user_ns. The plan is to allow those filesystems to set i_uid to INVALID_UID and i_gid to INVALID_GID for unmapped uids and gids and then to handle that strange case in the vfs to ensure there is consistent robust handling of the weirdness. Upon a careful review of the vfs and filesystems about the only case where there is any possibility of confusion or trouble is when the inode is written back to disk. In that case filesystems typically read the inode->i_uid and inode->i_gid and write them to disk even when just an inode timestamp is being updated. Which leads to a rule that is very simple to implement and understand inodes whose i_uid or i_gid is not valid may not be written. In dealing with access times this means treat those inodes as if the inode flag S_NOATIME was set. Reads of the inodes appear safe and useful, but any write or modification is disallowed. The only inode write that is allowed is a chown that sets the uid and gid on the inode to valid values. After such a chown the inode is normal and may be treated as such. Denying all writes to inodes with uids or gids unknown to the vfs also prevents several oddball cases where corruption would have occurred because the vfs does not have complete information. One problem case that is prevented is attempting to use the gid of a directory for new inodes where the directories sgid bit is set but the directories gid is not mapped. Another problem case avoided is attempting to update the evm hash after setxattr, removexattr, and setattr. As the evm hash includeds the inode->i_uid or inode->i_gid not knowning the uid or gid prevents a correct evm hash from being computed. evm hash verification also fails when i_uid or i_gid is unknown but that is essentially harmless as it does not cause filesystem corruption. Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/attr.c | 8 ++++++++ fs/inode.c | 7 +++++++ fs/namei.c | 26 +++++++++++++++++++++----- fs/xattr.c | 7 +++++++ include/linux/fs.h | 5 +++++ 5 files changed, 48 insertions(+), 5 deletions(-) diff --git a/fs/attr.c b/fs/attr.c index dd723578ddce..42bb42bb3c72 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -266,6 +266,14 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de !kgid_has_mapping(inode->i_sb->s_user_ns, attr->ia_gid)) return -EOVERFLOW; + /* Don't allow modifications of files with invalid uids or + * gids unless those uids & gids are being made valid. + */ + if (!(ia_valid & ATTR_UID) && !uid_valid(inode->i_uid)) + return -EOVERFLOW; + if (!(ia_valid & ATTR_GID) && !gid_valid(inode->i_gid)) + return -EOVERFLOW; + error = security_inode_setattr(dentry, attr); if (error) return error; diff --git a/fs/inode.c b/fs/inode.c index 4ccbc21b30ce..c0ebb97fb085 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1617,6 +1617,13 @@ bool atime_needs_update(const struct path *path, struct inode *inode) if (inode->i_flags & S_NOATIME) return false; + + /* Atime updates will likely cause i_uid and i_gid to be written + * back improprely if their true value is unknown to the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return false; + if (IS_NOATIME(inode)) return false; if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode)) diff --git a/fs/namei.c b/fs/namei.c index 8701bd9a5270..840201c4c290 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -410,6 +410,14 @@ int __inode_permission(struct inode *inode, int mask) */ if (IS_IMMUTABLE(inode)) return -EACCES; + + /* + * Updating mtime will likely cause i_uid and i_gid to be + * written back improperly if their true value is unknown + * to the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return -EACCES; } retval = do_inode_permission(inode, mask); @@ -2759,10 +2767,11 @@ EXPORT_SYMBOL(__check_sticky); * c. have CAP_FOWNER capability * 6. If the victim is append-only or immutable we can't do antyhing with * links pointing to it. - * 7. If we were asked to remove a directory and victim isn't one - ENOTDIR. - * 8. If we were asked to remove a non-directory and victim isn't one - EISDIR. - * 9. We can't remove a root or mountpoint. - * 10. We don't allow removal of NFS sillyrenamed files; it's handled by + * 7. If the victim has an unknown uid or gid we can't change the inode. + * 8. If we were asked to remove a directory and victim isn't one - ENOTDIR. + * 9. If we were asked to remove a non-directory and victim isn't one - EISDIR. + * 10. We can't remove a root or mountpoint. + * 11. We don't allow removal of NFS sillyrenamed files; it's handled by * nfs_async_unlink(). */ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) @@ -2784,7 +2793,7 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) return -EPERM; if (check_sticky(dir, inode) || IS_APPEND(inode) || - IS_IMMUTABLE(inode) || IS_SWAPFILE(inode)) + IS_IMMUTABLE(inode) || IS_SWAPFILE(inode) || HAS_UNMAPPED_ID(inode)) return -EPERM; if (isdir) { if (!d_is_dir(victim)) @@ -4190,6 +4199,13 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de */ if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) return -EPERM; + /* + * Updating the link count will likely cause i_uid and i_gid to + * be writen back improperly if their true value is unknown to + * the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return -EPERM; if (!dir->i_op->link) return -EPERM; if (S_ISDIR(inode->i_mode)) diff --git a/fs/xattr.c b/fs/xattr.c index 4beafc43daa5..c243905835ab 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -38,6 +38,13 @@ xattr_permission(struct inode *inode, const char *name, int mask) if (mask & MAY_WRITE) { if (IS_IMMUTABLE(inode) || IS_APPEND(inode)) return -EPERM; + /* + * Updating an xattr will likely cause i_uid and i_gid + * to be writen back improperly if their true value is + * unknown to the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return -EPERM; } /* diff --git a/include/linux/fs.h b/include/linux/fs.h index 375e37f42cdf..cb25ceb6d1ef 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1874,6 +1874,11 @@ struct super_operations { #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV) +static inline bool HAS_UNMAPPED_ID(struct inode *inode) +{ + return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid); +} + /* * Inode state bits. Protected by inode->i_lock * -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 09/11] quota: Handle quota data stored in s_user_ns. [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> ` (2 preceding siblings ...) 2016-07-02 17:20 ` [PATCH review 06/11] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman [not found] ` <20160702172035.19568-9-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-02 17:20 ` [PATCH review 11/11] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman 4 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Linux Containers, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk In Q_XSETQLIMIT use sb->s_user_ns to detect when we are dealing with the filesystems notion of id 0. For the dquot hashfn translate the qid into sb->s_user_ns to remove extraneous bits before hashing. When dealing with generic quota files in quota_tree.c, quota_v1.c, and quota_v2.c before writing quotas to the file encode them in sb->s_user_ns, and decode ids read from the file from sb->s_user_ns. One complication comes up in qtree_get_next_id, as it is possible in principle for id's to be present in the quota file that do not have a mapping in the filesystems user namespace. If that is the case the code is modified to skip the unmapped ids. Inspired-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/quota/dquot.c | 4 ++-- fs/quota/quota.c | 2 +- fs/quota/quota_tree.c | 23 ++++++++++++++--------- fs/quota/quota_v1.c | 6 ++++-- fs/quota/quota_v2.c | 10 ++++++---- 5 files changed, 27 insertions(+), 18 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 74706b6aa747..9b39925f34f4 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -252,7 +252,7 @@ static int __dquot_initialize(struct inode *inode, int type); static inline unsigned int hashfn(const struct super_block *sb, struct kqid qid) { - unsigned int id = from_kqid(&init_user_ns, qid); + unsigned int id = from_kqid(sb->s_user_ns, qid); int type = qid.type; unsigned long tmp; @@ -748,7 +748,7 @@ void dqput(struct dquot *dquot) if (!atomic_read(&dquot->dq_count)) { quota_error(dquot->dq_sb, "trying to free free dquot of %s %d", quotatypes[dquot->dq_id.type], - from_kqid(&init_user_ns, dquot->dq_id)); + from_kqid(dquot->dq_sb->s_user_ns, dquot->dq_id)); BUG(); } #endif diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 73f6f4cf0a21..35df08ee9c97 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -584,7 +584,7 @@ static int quota_setxquota(struct super_block *sb, int type, qid_t id, if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; /* Are we actually setting timer / warning limits for all users? */ - if (from_kqid(&init_user_ns, qid) == 0 && + if (from_kqid(sb->s_user_ns, qid) == 0 && fdq.d_fieldmask & (FS_DQ_WARNS_MASK | FS_DQ_TIMER_MASK)) { struct qc_info qinfo; int ret; diff --git a/fs/quota/quota_tree.c b/fs/quota/quota_tree.c index 0738972e8d3f..4f8f6bdc698c 100644 --- a/fs/quota/quota_tree.c +++ b/fs/quota/quota_tree.c @@ -34,7 +34,7 @@ static int __get_index(struct qtree_mem_dqinfo *info, qid_t id, int depth) static int get_index(struct qtree_mem_dqinfo *info, struct kqid qid, int depth) { - qid_t id = from_kqid(&init_user_ns, qid); + qid_t id = from_kqid(info->dqi_sb->s_user_ns, qid); return __get_index(info, id, depth); } @@ -554,7 +554,7 @@ static loff_t find_block_dqentry(struct qtree_mem_dqinfo *info, if (i == qtree_dqstr_in_blk(info)) { quota_error(dquot->dq_sb, "Quota for id %u referenced but not present", - from_kqid(&init_user_ns, dquot->dq_id)); + from_kqid(dquot->dq_sb->s_user_ns, dquot->dq_id)); ret = -EIO; goto out_buf; } else { @@ -624,7 +624,7 @@ int qtree_read_dquot(struct qtree_mem_dqinfo *info, struct dquot *dquot) if (offset < 0) quota_error(sb,"Can't read quota structure " "for id %u", - from_kqid(&init_user_ns, + from_kqid(sb->s_user_ns, dquot->dq_id)); dquot->dq_off = 0; set_bit(DQ_FAKE_B, &dquot->dq_flags); @@ -643,7 +643,7 @@ int qtree_read_dquot(struct qtree_mem_dqinfo *info, struct dquot *dquot) if (ret >= 0) ret = -EIO; quota_error(sb, "Error while reading quota structure for id %u", - from_kqid(&init_user_ns, dquot->dq_id)); + from_kqid(sb->s_user_ns, dquot->dq_id)); set_bit(DQ_FAKE_B, &dquot->dq_flags); memset(&dquot->dq_dqb, 0, sizeof(struct mem_dqblk)); kfree(ddquot); @@ -721,13 +721,18 @@ out_buf: int qtree_get_next_id(struct qtree_mem_dqinfo *info, struct kqid *qid) { - qid_t id = from_kqid(&init_user_ns, *qid); + qid_t id = from_kqid(info->dqi_sb->s_user_ns, *qid); + struct kqid next_id; int ret; - ret = find_next_id(info, &id, QT_TREEOFF, 0); - if (ret < 0) - return ret; - *qid = make_kqid(&init_user_ns, qid->type, id); + do { + ret = find_next_id(info, &id, QT_TREEOFF, 0); + if (ret < 0) + return ret; + next_id = make_kqid(info->dqi_sb->s_user_ns, qid->type, id); + } while (!qid_valid(next_id)); /* Loop until a mapped id is found */ + + *qid = next_id; return 0; } EXPORT_SYMBOL(qtree_get_next_id); diff --git a/fs/quota/quota_v1.c b/fs/quota/quota_v1.c index 8fe79beced5c..087e7743b996 100644 --- a/fs/quota/quota_v1.c +++ b/fs/quota/quota_v1.c @@ -54,6 +54,7 @@ static void v1_mem2disk_dqblk(struct v1_disk_dqblk *d, struct mem_dqblk *m) static int v1_read_dqblk(struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; int type = dquot->dq_id.type; struct v1_disk_dqblk dqblk; @@ -64,7 +65,7 @@ static int v1_read_dqblk(struct dquot *dquot) memset(&dqblk, 0, sizeof(struct v1_disk_dqblk)); dquot->dq_sb->s_op->quota_read(dquot->dq_sb, type, (char *)&dqblk, sizeof(struct v1_disk_dqblk), - v1_dqoff(from_kqid(&init_user_ns, dquot->dq_id))); + v1_dqoff(from_kqid(dq_user_ns, dquot->dq_id)); v1_disk2mem_dqblk(&dquot->dq_dqb, &dqblk); if (dquot->dq_dqb.dqb_bhardlimit == 0 && @@ -79,6 +80,7 @@ static int v1_read_dqblk(struct dquot *dquot) static int v1_commit_dqblk(struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; short type = dquot->dq_id.type; ssize_t ret; struct v1_disk_dqblk dqblk; @@ -95,7 +97,7 @@ static int v1_commit_dqblk(struct dquot *dquot) if (sb_dqopt(dquot->dq_sb)->files[type]) ret = dquot->dq_sb->s_op->quota_write(dquot->dq_sb, type, (char *)&dqblk, sizeof(struct v1_disk_dqblk), - v1_dqoff(from_kqid(&init_user_ns, dquot->dq_id))); + v1_dqoff(from_kquid(dq_user_ns, dquot->dq_id))); if (ret != sizeof(struct v1_disk_dqblk)) { quota_error(dquot->dq_sb, "dquota write failed"); if (ret >= 0) diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c index ca71bf881ad1..4ecf91611a87 100644 --- a/fs/quota/quota_v2.c +++ b/fs/quota/quota_v2.c @@ -199,6 +199,7 @@ static void v2r0_disk2memdqb(struct dquot *dquot, void *dp) static void v2r0_mem2diskdqb(void *dp, struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; struct v2r0_disk_dqblk *d = dp; struct mem_dqblk *m = &dquot->dq_dqb; struct qtree_mem_dqinfo *info = @@ -212,7 +213,7 @@ static void v2r0_mem2diskdqb(void *dp, struct dquot *dquot) d->dqb_bsoftlimit = cpu_to_le32(v2_stoqb(m->dqb_bsoftlimit)); d->dqb_curspace = cpu_to_le64(m->dqb_curspace); d->dqb_btime = cpu_to_le64(m->dqb_btime); - d->dqb_id = cpu_to_le32(from_kqid(&init_user_ns, dquot->dq_id)); + d->dqb_id = cpu_to_le32(from_kqid(dq_user_ns, dquot->dq_qid)); if (qtree_entry_unused(info, dp)) d->dqb_itime = cpu_to_le64(1); } @@ -225,7 +226,7 @@ static int v2r0_is_id(void *dp, struct dquot *dquot) if (qtree_entry_unused(info, dp)) return 0; - return qid_eq(make_kqid(&init_user_ns, dquot->dq_id.type, + return qid_eq(make_kqid(dquot->dq_sb->s_user_ns, quot->dq_id.type, le32_to_cpu(d->dqb_id)), dquot->dq_id); } @@ -252,6 +253,7 @@ static void v2r1_disk2memdqb(struct dquot *dquot, void *dp) static void v2r1_mem2diskdqb(void *dp, struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; struct v2r1_disk_dqblk *d = dp; struct mem_dqblk *m = &dquot->dq_dqb; struct qtree_mem_dqinfo *info = @@ -265,7 +267,7 @@ static void v2r1_mem2diskdqb(void *dp, struct dquot *dquot) d->dqb_bsoftlimit = cpu_to_le64(v2_stoqb(m->dqb_bsoftlimit)); d->dqb_curspace = cpu_to_le64(m->dqb_curspace); d->dqb_btime = cpu_to_le64(m->dqb_btime); - d->dqb_id = cpu_to_le32(from_kqid(&init_user_ns, dquot->dq_id)); + d->dqb_id = cpu_to_le32(from_kqid(dq_user_ns, dquot->dq_id)); if (qtree_entry_unused(info, dp)) d->dqb_itime = cpu_to_le64(1); } @@ -278,7 +280,7 @@ static int v2r1_is_id(void *dp, struct dquot *dquot) if (qtree_entry_unused(info, dp)) return 0; - return qid_eq(make_kqid(&init_user_ns, dquot->dq_id.type, + return qid_eq(make_kqid(dquot->dq_sb->s_user_ns, dquot->dq_id.type, le32_to_cpu(d->dqb_id)), dquot->dq_id); } -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
[parent not found: <20160702172035.19568-9-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>]
* [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. [not found] ` <20160702172035.19568-9-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> @ 2016-07-02 17:33 ` Eric W. Biederman [not found] ` <87mvm03pxy.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:33 UTC (permalink / raw) To: Seth Forshee Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni In Q_XSETQLIMIT use sb->s_user_ns to detect when we are dealing with the filesystems notion of id 0. For the dquot hashfn translate the qid into sb->s_user_ns to remove extraneous bits before hashing. When dealing with generic quota files in quota_tree.c, quota_v1.c, and quota_v2.c before writing quotas to the file encode them in sb->s_user_ns, and decode ids read from the file from sb->s_user_ns. One complication comes up in qtree_get_next_id, as it is possible in principle for id's to be present in the quota file that do not have a mapping in the filesystems user namespace. If that is the case the code is modified to skip the unmapped ids. Inspired-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- Grr. My kconfig was wrong and typo's crept into my previous version of this patch. Thank you kbuild test robot from catching this. fs/quota/dquot.c | 4 ++-- fs/quota/quota.c | 2 +- fs/quota/quota_tree.c | 23 ++++++++++++++--------- fs/quota/quota_v1.c | 6 ++++-- fs/quota/quota_v2.c | 10 ++++++---- 5 files changed, 27 insertions(+), 18 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 74706b6aa747..9b39925f34f4 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -252,7 +252,7 @@ static int __dquot_initialize(struct inode *inode, int type); static inline unsigned int hashfn(const struct super_block *sb, struct kqid qid) { - unsigned int id = from_kqid(&init_user_ns, qid); + unsigned int id = from_kqid(sb->s_user_ns, qid); int type = qid.type; unsigned long tmp; @@ -748,7 +748,7 @@ void dqput(struct dquot *dquot) if (!atomic_read(&dquot->dq_count)) { quota_error(dquot->dq_sb, "trying to free free dquot of %s %d", quotatypes[dquot->dq_id.type], - from_kqid(&init_user_ns, dquot->dq_id)); + from_kqid(dquot->dq_sb->s_user_ns, dquot->dq_id)); BUG(); } #endif diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 73f6f4cf0a21..35df08ee9c97 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -584,7 +584,7 @@ static int quota_setxquota(struct super_block *sb, int type, qid_t id, if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; /* Are we actually setting timer / warning limits for all users? */ - if (from_kqid(&init_user_ns, qid) == 0 && + if (from_kqid(sb->s_user_ns, qid) == 0 && fdq.d_fieldmask & (FS_DQ_WARNS_MASK | FS_DQ_TIMER_MASK)) { struct qc_info qinfo; int ret; diff --git a/fs/quota/quota_tree.c b/fs/quota/quota_tree.c index 0738972e8d3f..4f8f6bdc698c 100644 --- a/fs/quota/quota_tree.c +++ b/fs/quota/quota_tree.c @@ -34,7 +34,7 @@ static int __get_index(struct qtree_mem_dqinfo *info, qid_t id, int depth) static int get_index(struct qtree_mem_dqinfo *info, struct kqid qid, int depth) { - qid_t id = from_kqid(&init_user_ns, qid); + qid_t id = from_kqid(info->dqi_sb->s_user_ns, qid); return __get_index(info, id, depth); } @@ -554,7 +554,7 @@ static loff_t find_block_dqentry(struct qtree_mem_dqinfo *info, if (i == qtree_dqstr_in_blk(info)) { quota_error(dquot->dq_sb, "Quota for id %u referenced but not present", - from_kqid(&init_user_ns, dquot->dq_id)); + from_kqid(dquot->dq_sb->s_user_ns, dquot->dq_id)); ret = -EIO; goto out_buf; } else { @@ -624,7 +624,7 @@ int qtree_read_dquot(struct qtree_mem_dqinfo *info, struct dquot *dquot) if (offset < 0) quota_error(sb,"Can't read quota structure " "for id %u", - from_kqid(&init_user_ns, + from_kqid(sb->s_user_ns, dquot->dq_id)); dquot->dq_off = 0; set_bit(DQ_FAKE_B, &dquot->dq_flags); @@ -643,7 +643,7 @@ int qtree_read_dquot(struct qtree_mem_dqinfo *info, struct dquot *dquot) if (ret >= 0) ret = -EIO; quota_error(sb, "Error while reading quota structure for id %u", - from_kqid(&init_user_ns, dquot->dq_id)); + from_kqid(sb->s_user_ns, dquot->dq_id)); set_bit(DQ_FAKE_B, &dquot->dq_flags); memset(&dquot->dq_dqb, 0, sizeof(struct mem_dqblk)); kfree(ddquot); @@ -721,13 +721,18 @@ out_buf: int qtree_get_next_id(struct qtree_mem_dqinfo *info, struct kqid *qid) { - qid_t id = from_kqid(&init_user_ns, *qid); + qid_t id = from_kqid(info->dqi_sb->s_user_ns, *qid); + struct kqid next_id; int ret; - ret = find_next_id(info, &id, QT_TREEOFF, 0); - if (ret < 0) - return ret; - *qid = make_kqid(&init_user_ns, qid->type, id); + do { + ret = find_next_id(info, &id, QT_TREEOFF, 0); + if (ret < 0) + return ret; + next_id = make_kqid(info->dqi_sb->s_user_ns, qid->type, id); + } while (!qid_valid(next_id)); /* Loop until a mapped id is found */ + + *qid = next_id; return 0; } EXPORT_SYMBOL(qtree_get_next_id); diff --git a/fs/quota/quota_v1.c b/fs/quota/quota_v1.c index 8fe79beced5c..a354f8bed96a 100644 --- a/fs/quota/quota_v1.c +++ b/fs/quota/quota_v1.c @@ -54,6 +54,7 @@ static void v1_mem2disk_dqblk(struct v1_disk_dqblk *d, struct mem_dqblk *m) static int v1_read_dqblk(struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; int type = dquot->dq_id.type; struct v1_disk_dqblk dqblk; @@ -64,7 +65,7 @@ static int v1_read_dqblk(struct dquot *dquot) memset(&dqblk, 0, sizeof(struct v1_disk_dqblk)); dquot->dq_sb->s_op->quota_read(dquot->dq_sb, type, (char *)&dqblk, sizeof(struct v1_disk_dqblk), - v1_dqoff(from_kqid(&init_user_ns, dquot->dq_id))); + v1_dqoff(from_kqid(dq_user_ns, dquot->dq_id))); v1_disk2mem_dqblk(&dquot->dq_dqb, &dqblk); if (dquot->dq_dqb.dqb_bhardlimit == 0 && @@ -79,6 +80,7 @@ static int v1_read_dqblk(struct dquot *dquot) static int v1_commit_dqblk(struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; short type = dquot->dq_id.type; ssize_t ret; struct v1_disk_dqblk dqblk; @@ -95,7 +97,7 @@ static int v1_commit_dqblk(struct dquot *dquot) if (sb_dqopt(dquot->dq_sb)->files[type]) ret = dquot->dq_sb->s_op->quota_write(dquot->dq_sb, type, (char *)&dqblk, sizeof(struct v1_disk_dqblk), - v1_dqoff(from_kqid(&init_user_ns, dquot->dq_id))); + v1_dqoff(from_kqid(dq_user_ns, dquot->dq_id))); if (ret != sizeof(struct v1_disk_dqblk)) { quota_error(dquot->dq_sb, "dquota write failed"); if (ret >= 0) diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c index ca71bf881ad1..54e27b296656 100644 --- a/fs/quota/quota_v2.c +++ b/fs/quota/quota_v2.c @@ -199,6 +199,7 @@ static void v2r0_disk2memdqb(struct dquot *dquot, void *dp) static void v2r0_mem2diskdqb(void *dp, struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; struct v2r0_disk_dqblk *d = dp; struct mem_dqblk *m = &dquot->dq_dqb; struct qtree_mem_dqinfo *info = @@ -212,7 +213,7 @@ static void v2r0_mem2diskdqb(void *dp, struct dquot *dquot) d->dqb_bsoftlimit = cpu_to_le32(v2_stoqb(m->dqb_bsoftlimit)); d->dqb_curspace = cpu_to_le64(m->dqb_curspace); d->dqb_btime = cpu_to_le64(m->dqb_btime); - d->dqb_id = cpu_to_le32(from_kqid(&init_user_ns, dquot->dq_id)); + d->dqb_id = cpu_to_le32(from_kqid(dq_user_ns, dquot->dq_id)); if (qtree_entry_unused(info, dp)) d->dqb_itime = cpu_to_le64(1); } @@ -225,7 +226,7 @@ static int v2r0_is_id(void *dp, struct dquot *dquot) if (qtree_entry_unused(info, dp)) return 0; - return qid_eq(make_kqid(&init_user_ns, dquot->dq_id.type, + return qid_eq(make_kqid(dquot->dq_sb->s_user_ns, dquot->dq_id.type, le32_to_cpu(d->dqb_id)), dquot->dq_id); } @@ -252,6 +253,7 @@ static void v2r1_disk2memdqb(struct dquot *dquot, void *dp) static void v2r1_mem2diskdqb(void *dp, struct dquot *dquot) { + struct user_namespace *dq_user_ns = dquot->dq_sb->s_user_ns; struct v2r1_disk_dqblk *d = dp; struct mem_dqblk *m = &dquot->dq_dqb; struct qtree_mem_dqinfo *info = @@ -265,7 +267,7 @@ static void v2r1_mem2diskdqb(void *dp, struct dquot *dquot) d->dqb_bsoftlimit = cpu_to_le64(v2_stoqb(m->dqb_bsoftlimit)); d->dqb_curspace = cpu_to_le64(m->dqb_curspace); d->dqb_btime = cpu_to_le64(m->dqb_btime); - d->dqb_id = cpu_to_le32(from_kqid(&init_user_ns, dquot->dq_id)); + d->dqb_id = cpu_to_le32(from_kqid(dq_user_ns, dquot->dq_id)); if (qtree_entry_unused(info, dp)) d->dqb_itime = cpu_to_le64(1); } @@ -278,7 +280,7 @@ static int v2r1_is_id(void *dp, struct dquot *dquot) if (qtree_entry_unused(info, dp)) return 0; - return qid_eq(make_kqid(&init_user_ns, dquot->dq_id.type, + return qid_eq(make_kqid(dquot->dq_sb->s_user_ns, dquot->dq_id.type, le32_to_cpu(d->dqb_id)), dquot->dq_id); } -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
[parent not found: <87mvm03pxy.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. [not found] ` <87mvm03pxy.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-04 9:11 ` Jan Kara 2016-07-05 14:48 ` Seth Forshee 2016-07-05 15:34 ` Eric W. Biederman 0 siblings, 2 replies; 58+ messages in thread From: Jan Kara @ 2016-07-04 9:11 UTC (permalink / raw) To: Eric W. Biederman Cc: Seth Forshee, Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni On Sat 02-07-16 12:33:29, Eric W. Biederman wrote: > In Q_XSETQLIMIT use sb->s_user_ns to detect when we are dealing with > the filesystems notion of id 0. Hum, is it really usable? Basically the tool calling Q_XSETQLIMIT would have to be aware of the namespace the filesystem is mounted in to be able to perform the desired operation (and if it gets is wrong, there's possibility it would just silently set the timers for some user instead of for all users). > For the dquot hashfn translate the qid into sb->s_user_ns to remove > extraneous bits before hashing. > > When dealing with generic quota files in quota_tree.c, quota_v1.c, and > quota_v2.c before writing quotas to the file encode them in > sb->s_user_ns, and decode ids read from the file from sb->s_user_ns. And here the sb->s_user_ns becomes part of the on-disk format because the numerical ID value is used to compute the position in on-disk data structures. This seems to be in conflict with the idea that anything that is stored on disk is stored in the initial user namespace. > One complication comes up in qtree_get_next_id, as it is possible in > principle for id's to be present in the quota file that do not have a > mapping in the filesystems user namespace. If that is the case the > code is modified to skip the unmapped ids. So I'm not 100% sure how this is going to work. qtree_get_next_id() is used as an interface to report quota information for the whole filesystem. So skipping ids without mappings makes sense. However userspace uses the interface as: id = 0; while (get_next_quota(id, &result)) { do stuff; id = result.dq_id + 1; } If sb->s_user_ns is the current namespace of the process, I suppose this is going to work as expected. If the namespace of the process is different (including the init_user_ns), I expect this breaks, doesn't it? Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. 2016-07-04 9:11 ` Jan Kara @ 2016-07-05 14:48 ` Seth Forshee 2016-07-05 15:34 ` Eric W. Biederman 1 sibling, 0 replies; 58+ messages in thread From: Seth Forshee @ 2016-07-05 14:48 UTC (permalink / raw) To: Jan Kara Cc: Eric W. Biederman, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel, Djalal Harouni On Mon, Jul 04, 2016 at 11:11:00AM +0200, Jan Kara wrote: > On Sat 02-07-16 12:33:29, Eric W. Biederman wrote: > > In Q_XSETQLIMIT use sb->s_user_ns to detect when we are dealing with > > the filesystems notion of id 0. > > Hum, is it really usable? Basically the tool calling Q_XSETQLIMIT would > have to be aware of the namespace the filesystem is mounted in to be able > to perform the desired operation (and if it gets is wrong, there's > possibility it would just silently set the timers for some user instead of > for all users). Generally userspace does not need to be aware of the namespace. The user id passed from userspace is translated based on its namespace, and if that kqid doesn't map into s_user_ns the Q_XSETQLIM operation fails. But it requires going to some trouble and having CAP_SYS_ADMIN towards the relevant namespaces to give processes not in s_user_ns visibility to the mount, so that isn't going to be a common scenario. If some user does set up such a scenario then it doesn't seem to be asking too much for them to be aware of the limitations. Thanks, Seth ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. 2016-07-04 9:11 ` Jan Kara 2016-07-05 14:48 ` Seth Forshee @ 2016-07-05 15:34 ` Eric W. Biederman [not found] ` <87d1msumhy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 1 sibling, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-05 15:34 UTC (permalink / raw) To: Jan Kara Cc: Seth Forshee, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel, Djalal Harouni Jan Kara <jack@suse.cz> writes: > On Sat 02-07-16 12:33:29, Eric W. Biederman wrote: >> In Q_XSETQLIMIT use sb->s_user_ns to detect when we are dealing with >> the filesystems notion of id 0. > > Hum, is it really usable? Basically the tool calling Q_XSETQLIMIT would > have to be aware of the namespace the filesystem is mounted in to be able > to perform the desired operation (and if it gets is wrong, there's > possibility it would just silently set the timers for some user instead of > for all users). In general yes. Because we are talking about the uid of the root user in the container that mounted the filesystem. It is the uid that is stored as 0 on the filesystem. So in the small everything looks normal. >> For the dquot hashfn translate the qid into sb->s_user_ns to remove >> extraneous bits before hashing. >> >> When dealing with generic quota files in quota_tree.c, quota_v1.c, and >> quota_v2.c before writing quotas to the file encode them in >> sb->s_user_ns, and decode ids read from the file from sb->s_user_ns. > > And here the sb->s_user_ns becomes part of the on-disk format because the > numerical ID value is used to compute the position in on-disk data > structures. This seems to be in conflict with the idea that anything that > is stored on disk is stored in the initial user namespace. Up to now it has been the case that all filesystems with a backing store that the kernel could mount stored their data in the initial user namespace. This set of changes is all about making it so that a filesystem with uids and gids stored in s_user_ns can be supported safely by the kernel. >> One complication comes up in qtree_get_next_id, as it is possible in >> principle for id's to be present in the quota file that do not have a >> mapping in the filesystems user namespace. If that is the case the >> code is modified to skip the unmapped ids. > > So I'm not 100% sure how this is going to work. qtree_get_next_id() is used > as an interface to report quota information for the whole filesystem. So > skipping ids without mappings makes sense. However userspace uses the > interface as: > > id = 0; > while (get_next_quota(id, &result)) { > do stuff; > id = result.dq_id + 1; > } > > If sb->s_user_ns is the current namespace of the process, I suppose this is > going to work as expected. If the namespace of the process is different > (including the init_user_ns), I expect this breaks, doesn't it? Possibly. And this is a good question. Where this will break initially in that scenario is in: static int quota_getnextquota(struct super_block *sb, int type, qid_t id, void __user *addr) { ... if (!sb->s_qcop->get_nextdqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_nextdqblk(sb, &qid, &fdq); if (ret) If userspace actually has to perform the increment. The way I read the code in Now it has occurred to me that I am probably getting ahead of things with actually enabling all of this for the quota subsystem as I don't think we have fileystems that use the vfs quota support on our short term list. I don't think these patches are wrong except possibily in handling a quota interface with unmapped ids. Which is sufficiently weird I don't know if there is something better we can do. Given that we are not likely to use quotas for a while I am happy to add a test in vfs_load_quota_inode (the guts of quota_on, and quota_enable) that tests if s_user_ns != &init_user_ns and just fails. Then the quota changes can be left until we figure out how to support a filesystem that uses quota support. That should ensure better testing coverage if nothing else. The more I think of it the more I think that sounds like wisdom. Dropping this patch and replacing it by one that just does: diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index d8fb0fd3ff6f..9c9890fe18b7 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -2273,6 +2273,11 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, error = -EINVAL; goto out_fmt; } + /* Filesystems outside of init_user_ns not yet supported */ + if (sb->s_user_ns != &init_user_ns) { + error = -EINVAL; + goto out_fmt; + } /* Usage always has to be set... */ if (!(flags & DQUOT_USAGE_ENABLED)) { error = -EINVAL; seems a lot more appropriate at this point. That is enough to give a great big hint there is something that needs to be done but won't embrittle the code with untested corner cases. Especially since the code probably needs testing and a very close review to see what problems a hostile quota file can create. I the quota files are simple enough we can't get into to much trouble but that is an important consideration at this point. So I am going to go respin this patchset replacing this change. Eric ^ permalink raw reply related [flat|nested] 58+ messages in thread
[parent not found: <87d1msumhy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. [not found] ` <87d1msumhy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-05 20:57 ` Dave Chinner 2016-07-05 21:28 ` Eric W. Biederman 0 siblings, 1 reply; 58+ messages in thread From: Dave Chinner @ 2016-07-05 20:57 UTC (permalink / raw) To: Eric W. Biederman Cc: Jan Kara, Seth Forshee, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni On Tue, Jul 05, 2016 at 10:34:49AM -0500, Eric W. Biederman wrote: > The more I think of it the more I think that sounds like wisdom. > Dropping this patch and replacing it by one that just does: > > diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c > index d8fb0fd3ff6f..9c9890fe18b7 100644 > --- a/fs/quota/dquot.c > +++ b/fs/quota/dquot.c > @@ -2273,6 +2273,11 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, > error = -EINVAL; > goto out_fmt; > } > + /* Filesystems outside of init_user_ns not yet supported */ > + if (sb->s_user_ns != &init_user_ns) { > + error = -EINVAL; > + goto out_fmt; > + } > /* Usage always has to be set... */ > if (!(flags & DQUOT_USAGE_ENABLED)) { > error = -EINVAL; > > > seems a lot more appropriate at this point. That is enough to give a > great big hint there is something that needs to be done but won't > embrittle the code with untested corner cases. You'll need to propagate that to all filesystems that have their own quota implemenation, too. Cheers, Dave. -- Dave Chinner david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. 2016-07-05 20:57 ` Dave Chinner @ 2016-07-05 21:28 ` Eric W. Biederman [not found] ` <8737nnrcyy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-05 21:28 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Seth Forshee, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> writes: > On Tue, Jul 05, 2016 at 10:34:49AM -0500, Eric W. Biederman wrote: >> The more I think of it the more I think that sounds like wisdom. >> Dropping this patch and replacing it by one that just does: >> >> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c >> index d8fb0fd3ff6f..9c9890fe18b7 100644 >> --- a/fs/quota/dquot.c >> +++ b/fs/quota/dquot.c >> @@ -2273,6 +2273,11 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, >> error = -EINVAL; >> goto out_fmt; >> } >> + /* Filesystems outside of init_user_ns not yet supported */ >> + if (sb->s_user_ns != &init_user_ns) { >> + error = -EINVAL; >> + goto out_fmt; >> + } >> /* Usage always has to be set... */ >> if (!(flags & DQUOT_USAGE_ENABLED)) { >> error = -EINVAL; >> >> >> seems a lot more appropriate at this point. That is enough to give a >> great big hint there is something that needs to be done but won't >> embrittle the code with untested corner cases. > > You'll need to propagate that to all filesystems that have their own > quota implemenation, too. All of the filesytems that have their own quota implementations omit the flag FS_USERNS_MOUNT in fs_flags in struct filesystem so they are protected. The above change is extending to the generic implementation of quota files the same protection that is already enjoyed by filesystems in general. Recognition that being attacked by malicious users is a difficult thing to defend against, and not enabling the code until there is a reasonable certainty the code is robust against maliciuos attack and everyone involved with maintaining the code is comfortable about the situtation. I have every intention of keeping all of the changes to the generic parts of the quota layer so that filesystems who wish to implement unprivileged mounts and implement quotas may proceed if they so desire. Eric p.s. I expect the the generic quota implementation is simple enough that it is not particularly suseptible to problems caused by malicious data. But I don't currently care enough to verify and test that assumption so this is very much the wrong time for me to be enabling the feature. ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <8737nnrcyy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. [not found] ` <8737nnrcyy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-06 6:35 ` Dave Chinner 2016-07-06 8:25 ` Jan Kara 0 siblings, 1 reply; 58+ messages in thread From: Dave Chinner @ 2016-07-06 6:35 UTC (permalink / raw) To: Eric W. Biederman Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni On Tue, Jul 05, 2016 at 04:28:53PM -0500, Eric W. Biederman wrote: > Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> writes: > > > On Tue, Jul 05, 2016 at 10:34:49AM -0500, Eric W. Biederman wrote: > >> The more I think of it the more I think that sounds like wisdom. > >> Dropping this patch and replacing it by one that just does: > >> > >> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c > >> index d8fb0fd3ff6f..9c9890fe18b7 100644 > >> --- a/fs/quota/dquot.c > >> +++ b/fs/quota/dquot.c > >> @@ -2273,6 +2273,11 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, > >> error = -EINVAL; > >> goto out_fmt; > >> } > >> + /* Filesystems outside of init_user_ns not yet supported */ > >> + if (sb->s_user_ns != &init_user_ns) { > >> + error = -EINVAL; > >> + goto out_fmt; > >> + } > >> /* Usage always has to be set... */ > >> if (!(flags & DQUOT_USAGE_ENABLED)) { > >> error = -EINVAL; > >> > >> > >> seems a lot more appropriate at this point. That is enough to give a > >> great big hint there is something that needs to be done but won't > >> embrittle the code with untested corner cases. > > > > You'll need to propagate that to all filesystems that have their own > > quota implemenation, too. > > All of the filesytems that have their own quota implementations omit the > flag FS_USERNS_MOUNT in fs_flags in struct filesystem so they are > protected. Which is the same situation currently for every filesystem that supports quotas, regardless of the implementation infrastructure. > p.s. I expect the the generic quota implementation is simple enough > that it is not particularly suseptible to problems caused by malicious > data. But I don't currently care enough to verify and test that > assumption so this is very much the wrong time for me to be enabling the > feature. All the more reason you should be adding the same guard to all the other filesystems.... All i'm asking you to do is to make this check in a way that all filesystems that implement quotas will execute it. Don't leave landmines with security implications around - make sure all filesystems have the same protections. -Dave. -- Dave Chinner david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. 2016-07-06 6:35 ` Dave Chinner @ 2016-07-06 8:25 ` Jan Kara 2016-07-06 17:51 ` Eric W. Biederman 0 siblings, 1 reply; 58+ messages in thread From: Jan Kara @ 2016-07-06 8:25 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Eric W. Biederman, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni, Michael Kerrisk On Wed 06-07-16 16:35:04, Dave Chinner wrote: > On Tue, Jul 05, 2016 at 04:28:53PM -0500, Eric W. Biederman wrote: > > Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> writes: > > > > > On Tue, Jul 05, 2016 at 10:34:49AM -0500, Eric W. Biederman wrote: > > >> The more I think of it the more I think that sounds like wisdom. > > >> Dropping this patch and replacing it by one that just does: > > >> > > >> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c > > >> index d8fb0fd3ff6f..9c9890fe18b7 100644 > > >> --- a/fs/quota/dquot.c > > >> +++ b/fs/quota/dquot.c > > >> @@ -2273,6 +2273,11 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, > > >> error = -EINVAL; > > >> goto out_fmt; > > >> } > > >> + /* Filesystems outside of init_user_ns not yet supported */ > > >> + if (sb->s_user_ns != &init_user_ns) { > > >> + error = -EINVAL; > > >> + goto out_fmt; > > >> + } > > >> /* Usage always has to be set... */ > > >> if (!(flags & DQUOT_USAGE_ENABLED)) { > > >> error = -EINVAL; > > >> > > >> > > >> seems a lot more appropriate at this point. That is enough to give a > > >> great big hint there is something that needs to be done but won't > > >> embrittle the code with untested corner cases. > > > > > > You'll need to propagate that to all filesystems that have their own > > > quota implemenation, too. > > > > All of the filesytems that have their own quota implementations omit the > > flag FS_USERNS_MOUNT in fs_flags in struct filesystem so they are > > protected. > > Which is the same situation currently for every filesystem that > supports quotas, regardless of the implementation infrastructure. > > > p.s. I expect the the generic quota implementation is simple enough > > that it is not particularly suseptible to problems caused by malicious > > data. But I don't currently care enough to verify and test that > > assumption so this is very much the wrong time for me to be enabling the > > feature. > > All the more reason you should be adding the same guard to all the > other filesystems.... > > All i'm asking you to do is to make this check in a way that all > filesystems that implement quotas will execute it. Don't leave > landmines with security implications around - make sure all > filesystems have the same protections. Well, I'm not sure I follow you here. VFS quotas are a generic code used by a few filesystems. So I can imagine that someone would decide to enable FS_USERNS_MOUNT for one of those filesystems without thinking about quotas and then Eric's check would trigger and possibly save use from some problems. When someone decides to enable FS_USERNS_MOUNT for XFS, he will have presumably made sure all parts of XFS are safe, including its quota implementation. I don't want to stop you or Eric in adding an extra check in XFS, I just have hard time to see how that check would trigger and how XFS quota is different from other XFS parts... Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 review 09/11] quota: Handle quota data stored in s_user_ns. 2016-07-06 8:25 ` Jan Kara @ 2016-07-06 17:51 ` Eric W. Biederman 0 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 17:51 UTC (permalink / raw) To: Jan Kara Cc: Dave Chinner, Seth Forshee, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel, Djalal Harouni Jan Kara <jack@suse.cz> writes: > On Wed 06-07-16 16:35:04, Dave Chinner wrote: >> All the more reason you should be adding the same guard to all the >> other filesystems.... >> >> All i'm asking you to do is to make this check in a way that all >> filesystems that implement quotas will execute it. Don't leave >> landmines with security implications around - make sure all >> filesystems have the same protections. That is exactly what I am doing. Ensuring people don't think the generic quota file code has been closely audited and reviewed and deemed safe against malicious users. > Well, I'm not sure I follow you here. VFS quotas are a generic code used by > a few filesystems. So I can imagine that someone would decide to enable > FS_USERNS_MOUNT for one of those filesystems without thinking about quotas > and then Eric's check would trigger and possibly save use from some > problems. > > When someone decides to enable FS_USERNS_MOUNT for XFS, he will have > presumably made sure all parts of XFS are safe, including its quota > implementation. > > I don't want to stop you or Eric in adding an extra check in XFS, I just > have hard time to see how that check would trigger and how XFS quota is > different from other XFS parts... Exactly. While it is true that the quota file code has not been deemed safe for attack by malicious users on any filesystem. It only matters if the entire filesystem has been deemed safe by setting FS_USERNS_MOUNT. So the only possible avenue of confusion I can see is with the default quota file code in dquot.c and company which I am addressing. The fully generic parts of quota in quota.c that every one uses will handle all of the weird s_user_ns != &init_user_ns cases at the end of this patchset so there is nothing to worry about there either. So I don't see additional checks worth adding anywhere. Dave if you want to send me an XFS patch or point out where such a check would belong in XFS I won't be opposed to carrying it in my tree along with the rest of this change. I probably will be puzzled about what makes that code need an extra check but I won't be opposed to carring such a patch. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH review 11/11] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> ` (3 preceding siblings ...) 2016-07-02 17:20 ` [PATCH review 09/11] quota: Handle quota data stored in s_user_ns Eric W. Biederman @ 2016-07-02 17:20 ` Eric W. Biederman 4 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-02 17:20 UTC (permalink / raw) To: Seth Forshee Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni From: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> For filesystems mounted from a user namespace on-disk ids should be translated relative to s_users_ns rather than init_user_ns. When an id in the filesystem doesn't exist in s_user_ns the associated id in the inode will be set to INVALID_[UG]ID, which turns these into de facto "nobody" ids. This actually maps pretty well into the way most code already works, and those places where it didn't were fixed in previous patches. Moving forward vfs code needs to be careful to handle instances where ids in inodes may be invalid. Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- include/linux/fs.h | 50 +++++++++++++++++++++++++------------------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index cb25ceb6d1ef..8aa9b72e0bc5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -831,31 +831,6 @@ static inline void i_size_write(struct inode *inode, loff_t i_size) #endif } -/* Helper functions so that in most cases filesystems will - * not need to deal directly with kuid_t and kgid_t and can - * instead deal with the raw numeric values that are stored - * in the filesystem. - */ -static inline uid_t i_uid_read(const struct inode *inode) -{ - return from_kuid(&init_user_ns, inode->i_uid); -} - -static inline gid_t i_gid_read(const struct inode *inode) -{ - return from_kgid(&init_user_ns, inode->i_gid); -} - -static inline void i_uid_write(struct inode *inode, uid_t uid) -{ - inode->i_uid = make_kuid(&init_user_ns, uid); -} - -static inline void i_gid_write(struct inode *inode, gid_t gid) -{ - inode->i_gid = make_kgid(&init_user_ns, gid); -} - static inline unsigned iminor(const struct inode *inode) { return MINOR(inode->i_rdev); @@ -1461,6 +1436,31 @@ struct super_block { struct list_head s_inodes; /* all inodes */ }; +/* Helper functions so that in most cases filesystems will + * not need to deal directly with kuid_t and kgid_t and can + * instead deal with the raw numeric values that are stored + * in the filesystem. + */ +static inline uid_t i_uid_read(const struct inode *inode) +{ + return from_kuid(inode->i_sb->s_user_ns, inode->i_uid); +} + +static inline gid_t i_gid_read(const struct inode *inode) +{ + return from_kgid(inode->i_sb->s_user_ns, inode->i_gid); +} + +static inline void i_uid_write(struct inode *inode, uid_t uid) +{ + inode->i_uid = make_kuid(inode->i_sb->s_user_ns, uid); +} + +static inline void i_gid_write(struct inode *inode, gid_t gid) +{ + inode->i_gid = make_kgid(inode->i_sb->s_user_ns, gid); +} + extern struct timespec current_fs_time(struct super_block *sb); /* -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support 2016-07-02 17:18 [PATCH review 0/11] General unprivileged mount support Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman @ 2016-07-04 8:52 ` Jan Kara [not found] ` <20160704085220.GC5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> [not found] ` <87ziq03qnj.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2 siblings, 1 reply; 58+ messages in thread From: Jan Kara @ 2016-07-04 8:52 UTC (permalink / raw) To: Eric W. Biederman Cc: Seth Forshee, Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk On Sat 02-07-16 12:18:08, Eric W. Biederman wrote: > > As well as in these patches the code is also available from: > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing > > It has been a long time in coming but recently in the userns tree the > superblock has been expanded with a s_user_ns field indicating the user > namespace that owns a superblock. > > The s_user_ns owner of a superblock has three implications. > - Only kuids and kgids that map into s_user_ns are allowed to be sent to a > filesystem from the vfs. > - If the uid or gid on the filesystem does not map into s_user_ns i_uid > is set to INVALID_UID and i_gid is set to INVALID_GID. > - The scope of permission checks can be changed from global to a > capabilitiy check in s_user_ns. OK, to check that I understand it right: So the uids and gids that are stored on disk are still expected to be in the initial id namespace, aren't they? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <20160704085220.GC5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH review 0/11] General unprivileged mount support [not found] ` <20160704085220.GC5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> @ 2016-07-04 16:27 ` Eric W. Biederman [not found] ` <87h9c52wsd.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-04 16:27 UTC (permalink / raw) To: Jan Kara Cc: Seth Forshee, Linux Containers, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> writes: > On Sat 02-07-16 12:18:08, Eric W. Biederman wrote: >> >> As well as in these patches the code is also available from: >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing >> >> It has been a long time in coming but recently in the userns tree the >> superblock has been expanded with a s_user_ns field indicating the user >> namespace that owns a superblock. >> >> The s_user_ns owner of a superblock has three implications. >> - Only kuids and kgids that map into s_user_ns are allowed to be sent to a >> filesystem from the vfs. >> - If the uid or gid on the filesystem does not map into s_user_ns i_uid >> is set to INVALID_UID and i_gid is set to INVALID_GID. >> - The scope of permission checks can be changed from global to a >> capabilitiy check in s_user_ns. > > OK, to check that I understand it right: > > So the uids and gids that are stored on disk are still expected to be in > the initial id namespace, aren't they? No. The general expectation is that the ids on disk are store in s_user_ns. Id's that don't map to the initial id namespace get stored in the generic data structures as INVALID_UID and INVALID_GID. In practice I don't expect anyone will set up a situation knowingly where id's don't map, but the case has to be handled because mistakes and malicious code happens. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <87h9c52wsd.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH review 0/11] General unprivileged mount support [not found] ` <87h9c52wsd.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-06 8:54 ` Jan Kara 2016-07-06 13:54 ` Seth Forshee 0 siblings, 1 reply; 58+ messages in thread From: Jan Kara @ 2016-07-06 8:54 UTC (permalink / raw) To: Eric W. Biederman Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni On Mon 04-07-16 11:27:46, Eric W. Biederman wrote: > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> writes: > > > On Sat 02-07-16 12:18:08, Eric W. Biederman wrote: > >> > >> As well as in these patches the code is also available from: > >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing > >> > >> It has been a long time in coming but recently in the userns tree the > >> superblock has been expanded with a s_user_ns field indicating the user > >> namespace that owns a superblock. > >> > >> The s_user_ns owner of a superblock has three implications. > >> - Only kuids and kgids that map into s_user_ns are allowed to be sent to a > >> filesystem from the vfs. > >> - If the uid or gid on the filesystem does not map into s_user_ns i_uid > >> is set to INVALID_UID and i_gid is set to INVALID_GID. > >> - The scope of permission checks can be changed from global to a > >> capabilitiy check in s_user_ns. > > > > OK, to check that I understand it right: > > > > So the uids and gids that are stored on disk are still expected to be in > > the initial id namespace, aren't they? > > No. > > The general expectation is that the ids on disk are store in s_user_ns. > > Id's that don't map to the initial id namespace get stored in the > generic data structures as INVALID_UID and INVALID_GID. > In practice I don't expect anyone will set up a situation knowingly > where id's don't map, but the case has to be handled because mistakes > and malicious code happens. OK, thanks for explanation. But then the namespace the filesystem is mounted with essentially becomes part of the on-disk format, doesn't it? Because if someone mounts the media from a different namespace, suddently the UID/GIDs may map to different users in initial user namespace and consequences may be weird, right? Shouldn't it thus be somehow stored together with the filesystem to make things more robust? I don't remember the indented uses for user-ns mounts so I may be just wrong. But my experience tells me that external data (such as user namespace ID mappings in your case) that modify meaning of on-disk format tend to cause maintenance difficulties in the long run... Because someone *will* have the idea of migrating these fs images between containers / machines and then they have to make sure mappings get migrated as well and it all becomes cumbersome. Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support 2016-07-06 8:54 ` Jan Kara @ 2016-07-06 13:54 ` Seth Forshee 2016-07-06 14:22 ` Jan Kara 0 siblings, 1 reply; 58+ messages in thread From: Seth Forshee @ 2016-07-06 13:54 UTC (permalink / raw) To: Jan Kara Cc: Eric W. Biederman, Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk On Wed, Jul 06, 2016 at 10:54:40AM +0200, Jan Kara wrote: > On Mon 04-07-16 11:27:46, Eric W. Biederman wrote: > > Jan Kara <jack@suse.cz> writes: > > > > > On Sat 02-07-16 12:18:08, Eric W. Biederman wrote: > > >> > > >> As well as in these patches the code is also available from: > > >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing > > >> > > >> It has been a long time in coming but recently in the userns tree the > > >> superblock has been expanded with a s_user_ns field indicating the user > > >> namespace that owns a superblock. > > >> > > >> The s_user_ns owner of a superblock has three implications. > > >> - Only kuids and kgids that map into s_user_ns are allowed to be sent to a > > >> filesystem from the vfs. > > >> - If the uid or gid on the filesystem does not map into s_user_ns i_uid > > >> is set to INVALID_UID and i_gid is set to INVALID_GID. > > >> - The scope of permission checks can be changed from global to a > > >> capabilitiy check in s_user_ns. > > > > > > OK, to check that I understand it right: > > > > > > So the uids and gids that are stored on disk are still expected to be in > > > the initial id namespace, aren't they? > > > > No. > > > > The general expectation is that the ids on disk are store in s_user_ns. > > > > Id's that don't map to the initial id namespace get stored in the > > generic data structures as INVALID_UID and INVALID_GID. > > > In practice I don't expect anyone will set up a situation knowingly > > where id's don't map, but the case has to be handled because mistakes > > and malicious code happens. > > OK, thanks for explanation. But then the namespace the filesystem is > mounted with essentially becomes part of the on-disk format, doesn't it? > Because if someone mounts the media from a different namespace, suddently > the UID/GIDs may map to different users in initial user namespace and > consequences may be weird, right? Shouldn't it thus be somehow stored > together with the filesystem to make things more robust? > > I don't remember the indented uses for user-ns mounts so I may be just > wrong. But my experience tells me that external data (such as user > namespace ID mappings in your case) that modify meaning of on-disk format > tend to cause maintenance difficulties in the long run... Because someone > *will* have the idea of migrating these fs images between containers / > machines and then they have to make sure mappings get migrated as well and > it all becomes cumbersome. The intended use case for this is containers, with the idea being that I as a user will get the same behavior in the container as I would in init_user_ns without needing any userspace modifications to achieve that. So if I have a filesystem that contains uid 0 and I mount it in my container, I should see uid 0. If I mount the same bits in another container with a different uid mapping I should also see uid 0. If I mkfs a new filesystem in my container then mount it, the root directory of the fs is owned by uid 0 in my container without any modifications to mkfs. I'd argue that this makes it easier to migrate a disk between containers because the ids in the disk show up the same within the container regardless of the id mapping. If someone wants to mount a filesystem in one container and also access it in another container with a completely different id mapping, well I don't think that's ever going to work well. Seth ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support 2016-07-06 13:54 ` Seth Forshee @ 2016-07-06 14:22 ` Jan Kara 2016-07-06 14:46 ` Seth Forshee ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Jan Kara @ 2016-07-06 14:22 UTC (permalink / raw) To: Seth Forshee Cc: Jan Kara, Eric W. Biederman, Linux Containers, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk On Wed 06-07-16 08:54:46, Seth Forshee wrote: > On Wed, Jul 06, 2016 at 10:54:40AM +0200, Jan Kara wrote: > > On Mon 04-07-16 11:27:46, Eric W. Biederman wrote: > > I don't remember the indented uses for user-ns mounts so I may be just > > wrong. But my experience tells me that external data (such as user > > namespace ID mappings in your case) that modify meaning of on-disk format > > tend to cause maintenance difficulties in the long run... Because someone > > *will* have the idea of migrating these fs images between containers / > > machines and then they have to make sure mappings get migrated as well and > > it all becomes cumbersome. > > The intended use case for this is containers, with the idea being that I > as a user will get the same behavior in the container as I would in > init_user_ns without needing any userspace modifications to achieve > that. > > So if I have a filesystem that contains uid 0 and I mount it in my > container, I should see uid 0. If I mount the same bits in another > container with a different uid mapping I should also see uid 0. > > If I mkfs a new filesystem in my container then mount it, the root > directory of the fs is owned by uid 0 in my container without any > modifications to mkfs. > > I'd argue that this makes it easier to migrate a disk between containers > because the ids in the disk show up the same within the container > regardless of the id mapping. If someone wants to mount a filesystem in > one container and also access it in another container with a completely > different id mapping, well I don't think that's ever going to work well. OK, I see how this is supposed to work. However you assume here that both containers have the same set of valid UIDs, don't you? If that is not the case, the mounted image will not be usable in the other container, right? Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support 2016-07-06 14:22 ` Jan Kara @ 2016-07-06 14:46 ` Seth Forshee 2016-07-06 15:01 ` Eric W. Biederman 2016-07-06 15:23 ` James Bottomley 2 siblings, 0 replies; 58+ messages in thread From: Seth Forshee @ 2016-07-06 14:46 UTC (permalink / raw) To: Jan Kara Cc: Eric W. Biederman, Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk On Wed, Jul 06, 2016 at 04:22:55PM +0200, Jan Kara wrote: > On Wed 06-07-16 08:54:46, Seth Forshee wrote: > > On Wed, Jul 06, 2016 at 10:54:40AM +0200, Jan Kara wrote: > > > On Mon 04-07-16 11:27:46, Eric W. Biederman wrote: > > > I don't remember the indented uses for user-ns mounts so I may be just > > > wrong. But my experience tells me that external data (such as user > > > namespace ID mappings in your case) that modify meaning of on-disk format > > > tend to cause maintenance difficulties in the long run... Because someone > > > *will* have the idea of migrating these fs images between containers / > > > machines and then they have to make sure mappings get migrated as well and > > > it all becomes cumbersome. > > > > The intended use case for this is containers, with the idea being that I > > as a user will get the same behavior in the container as I would in > > init_user_ns without needing any userspace modifications to achieve > > that. > > > > So if I have a filesystem that contains uid 0 and I mount it in my > > container, I should see uid 0. If I mount the same bits in another > > container with a different uid mapping I should also see uid 0. > > > > If I mkfs a new filesystem in my container then mount it, the root > > directory of the fs is owned by uid 0 in my container without any > > modifications to mkfs. > > > > I'd argue that this makes it easier to migrate a disk between containers > > because the ids in the disk show up the same within the container > > regardless of the id mapping. If someone wants to mount a filesystem in > > one container and also access it in another container with a completely > > different id mapping, well I don't think that's ever going to work well. > > OK, I see how this is supposed to work. However you assume here that both > containers have the same set of valid UIDs, don't you? If that is not the > case, the mounted image will not be usable in the other container, right? It's possible of course. I'd expect anyone wanting to use this in practice to set up their containers with appropriate mappings. Full OS style containers will at least need some minimal uid mapping to work properly. Seth ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support 2016-07-06 14:22 ` Jan Kara 2016-07-06 14:46 ` Seth Forshee @ 2016-07-06 15:01 ` Eric W. Biederman 2016-07-06 15:23 ` James Bottomley 2 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 15:01 UTC (permalink / raw) To: Jan Kara Cc: Seth Forshee, Linux Containers, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk Jan Kara <jack@suse.cz> writes: > On Wed 06-07-16 08:54:46, Seth Forshee wrote: >> On Wed, Jul 06, 2016 at 10:54:40AM +0200, Jan Kara wrote: >> > On Mon 04-07-16 11:27:46, Eric W. Biederman wrote: >> > I don't remember the indented uses for user-ns mounts so I may be just >> > wrong. But my experience tells me that external data (such as user >> > namespace ID mappings in your case) that modify meaning of on-disk format >> > tend to cause maintenance difficulties in the long run... Because someone >> > *will* have the idea of migrating these fs images between containers / >> > machines and then they have to make sure mappings get migrated as well and >> > it all becomes cumbersome. >> >> The intended use case for this is containers, with the idea being that I >> as a user will get the same behavior in the container as I would in >> init_user_ns without needing any userspace modifications to achieve >> that. >> >> So if I have a filesystem that contains uid 0 and I mount it in my >> container, I should see uid 0. If I mount the same bits in another >> container with a different uid mapping I should also see uid 0. >> >> If I mkfs a new filesystem in my container then mount it, the root >> directory of the fs is owned by uid 0 in my container without any >> modifications to mkfs. >> >> I'd argue that this makes it easier to migrate a disk between containers >> because the ids in the disk show up the same within the container >> regardless of the id mapping. If someone wants to mount a filesystem in >> one container and also access it in another container with a completely >> different id mapping, well I don't think that's ever going to work well. > > OK, I see how this is supposed to work. However you assume here that both > containers have the same set of valid UIDs, don't you? If that is not the > case, the mounted image will not be usable in the other container, right? In practice most uids and gids are going to be < 2^16. We currently have a 32bit id space so it isn't hard to map all of the ids. In practice this is exactly the same problem as moving filesystems between systems and ensuring that the /etc/passwd and /etc/group entries that describe the users and groups on the two systems are similiar enough not to cause problems. The difference between the classic case of moving filesystems between systems and the case with user namespaces is that when there are differences they can be encapuslated in a user namespace. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support 2016-07-06 14:22 ` Jan Kara 2016-07-06 14:46 ` Seth Forshee 2016-07-06 15:01 ` Eric W. Biederman @ 2016-07-06 15:23 ` James Bottomley [not found] ` <1467818630.2369.21.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 2 siblings, 1 reply; 58+ messages in thread From: James Bottomley @ 2016-07-06 15:23 UTC (permalink / raw) To: Jan Kara, Seth Forshee Cc: Eric W. Biederman, Linux Containers, linux-fsdevel, Linux API, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk On Wed, 2016-07-06 at 16:22 +0200, Jan Kara wrote: > On Wed 06-07-16 08:54:46, Seth Forshee wrote: > > On Wed, Jul 06, 2016 at 10:54:40AM +0200, Jan Kara wrote: > > > On Mon 04-07-16 11:27:46, Eric W. Biederman wrote: > > > I don't remember the indented uses for user-ns mounts so I may be > > > just wrong. But my experience tells me that external data (such > > > as user namespace ID mappings in your case) that modify meaning > > > of on-disk format tend to cause maintenance difficulties in the > > > long run... Because someone *will* have the idea of migrating > > > these fs images between containers / machines and then they have > > > to make sure mappings get migrated as well and it all becomes > > > cumbersome. > > > > The intended use case for this is containers, with the idea being > > that I as a user will get the same behavior in the container as I > > would in init_user_ns without needing any userspace modifications > > to achieve that. > > > > So if I have a filesystem that contains uid 0 and I mount it in my > > container, I should see uid 0. If I mount the same bits in another > > container with a different uid mapping I should also see uid 0. > > > > If I mkfs a new filesystem in my container then mount it, the root > > directory of the fs is owned by uid 0 in my container without any > > modifications to mkfs. > > > > I'd argue that this makes it easier to migrate a disk between > > containers because the ids in the disk show up the same within the > > container regardless of the id mapping. If someone wants to mount a > > filesystem in one container and also access it in another container > > with a completely different id mapping, well I don't think that's > > ever going to work well. > > OK, I see how this is supposed to work. However you assume here that > both containers have the same set of valid UIDs, don't you? If that > is not the case, the mounted image will not be usable in the other > container, right? You can always set it up wrongly is the rule of containers. Because the virtualizations are so granular, there are many possible configurations which don't make sense in the real world. The main use case for this is operating system images. For them we have a set of known UID/GIDs in the image (usually 0-1000 plus the nobody/nogroups for both). Using this scheme, we'd set up the container in a userns that mapped all these ids to something unprivileged and then set up a s_user_ns to do the same for the mount location of the image meaning that the unprivileged container can now manipulate the image. There are several self contained proposals on linux-fsdevel for doing this, like shifts, which is what I'm currently using to manipulate images, so for me what it does is allows me to get rid of all the credential shifting when performing operations on the underlying filesystem. In fact, I think it pretty much allows me to get rid of a lot of the upper/lower filesystem distinction in shiftfs and I'd get quotas and other stuff I ignored for free. However, any of the other uid/gid shifting proposals can also use this as the engine. The point here is, this patch set is simply mechanism; it requires a glue layer (like shiftfs, fuse or the vfs remapping proposal) to activate it. The activation decides how much exposure to the underlying filesystem there is, so with shiftfs, there's none, it's a purely volatile system crafted for chosen images. However, it's fully possible to come up with an activation where the filesystem would decide (through some on disk format information) to declare the image to be safely remapped in this uid/gid range and then we could allow it to be mounted unprivileged (without a capability check) into a user_ns that matched the mapping. This latter is a bit of a fantasy since container images are currently little more than tar files and we have no extant way to connect them to linux fs formats, but once the possibility exists, whose to say this won't change? James ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <1467818630.2369.21.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>]
* Re: [PATCH review 0/11] General unprivileged mount support [not found] ` <1467818630.2369.21.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> @ 2016-07-06 16:35 ` Eric W. Biederman 0 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 16:35 UTC (permalink / raw) To: James Bottomley Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, Andy Lutomirski, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Djalal Harouni James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> writes: > The point here is, this patch set is simply mechanism; it requires a > glue layer (like shiftfs, fuse or the vfs remapping proposal) to > activate it. Well ext4 can be used directly and Seth has basic patches for that support. The modifications needed are quite modest. The problem with ext4 is that a malicious ext4 filesystem image might be able to do something nasty to the kernel. How to create a maintainable high performance filesystem that can guard against malicious filesystem images is an open problem right now. Which makes ext4 a poor target for unprivileged mounts. Fuse is a good target because guarding against malicious input from userspace is part of it's orginial design. The new novel mechanism is handling INVALID_UID and INVALID_GID at the VFS layer so that filesystems who have translations in play (which will be anything not mounted by the global root) won't have to get all of the weird corner cases right on their own. To that end I will be very interested to see what shiftfs looks like on top of all of this. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <87ziq03qnj.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH review 0/11] General unprivileged mount support [not found] ` <87ziq03qnj.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-06 13:44 ` Andy Lutomirski [not found] ` <CALCETrVof174gPCZnD2Z-RMjR-P=NcA0mYCU9ki6=o9hpFL-BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2016-07-06 14:01 ` Andy Lutomirski 2016-07-06 18:10 ` [PATCH review 0/12] General unprivileged mount support v2 Eric W. Biederman 2 siblings, 1 reply; 58+ messages in thread From: Andy Lutomirski @ 2016-07-06 13:44 UTC (permalink / raw) To: Eric W. Biederman Cc: Seth Forshee, Linux Containers, Linux FS Devel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Jan Kara, Jann Horn, Michael Kerrisk On Sat, Jul 2, 2016 at 10:18 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > > As well as in these patches the code is also available from: > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing > Minor nit: in "fs: Refuse uid/gid changes which don't map into s_user_ns", the changelog ends in the middle of a sentence. ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <CALCETrVof174gPCZnD2Z-RMjR-P=NcA0mYCU9ki6=o9hpFL-BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH review 0/11] General unprivileged mount support [not found] ` <CALCETrVof174gPCZnD2Z-RMjR-P=NcA0mYCU9ki6=o9hpFL-BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-07-06 15:21 ` Eric W. Biederman 0 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 15:21 UTC (permalink / raw) To: Andy Lutomirski Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, James Bottomley, Seth Forshee, Michael Kerrisk, Linux FS Devel, Djalal Harouni Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes: > On Sat, Jul 2, 2016 at 10:18 AM, Eric W. Biederman > <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >> >> As well as in these patches the code is also available from: >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing >> > > Minor nit: in "fs: Refuse uid/gid changes which don't map into > s_user_ns", the changelog ends in the middle of a sentence. This text? > For filesystems mounted from a user namespace on-disk ids should > be translated relative to s_users_ns rather than init_user_ns. > When an id in the filesystem doesn't exist in s_user_ns the > associated id in the inode will be set to INVALID_[UG]ID, which > turns these into de facto "nobody" ids. This actually maps pretty > well into the way most code already works, and those places where > it didn't were fixed in previous patches. Moving forward vfs code > needs to be careful to handle instances where ids in inodes may > be invalid. I don't see the changelog problem you are talking about. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support [not found] ` <87ziq03qnj.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-06 13:44 ` Andy Lutomirski @ 2016-07-06 14:01 ` Andy Lutomirski 2016-07-06 15:19 ` Eric W. Biederman 2016-07-06 18:10 ` [PATCH review 0/12] General unprivileged mount support v2 Eric W. Biederman 2 siblings, 1 reply; 58+ messages in thread From: Andy Lutomirski @ 2016-07-06 14:01 UTC (permalink / raw) To: Eric W. Biederman Cc: Seth Forshee, Linux Containers, Linux FS Devel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Jan Kara, Jann Horn, Michael Kerrisk On Sat, Jul 2, 2016 at 10:18 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > > As well as in these patches the code is also available from: > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing > > It has been a long time in coming but recently in the userns tree the > superblock has been expanded with a s_user_ns field indicating the user > namespace that owns a superblock. > It would be nice if global root could mount otherwise-unsafe filesystems with s_user_ns set to a non-init namespace. Do you have plans to allow that? --Andy ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 0/11] General unprivileged mount support 2016-07-06 14:01 ` Andy Lutomirski @ 2016-07-06 15:19 ` Eric W. Biederman 0 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 15:19 UTC (permalink / raw) To: Andy Lutomirski Cc: Jan Kara, Jann Horn, Linux API, Linux Containers, James Bottomley, Seth Forshee, Michael Kerrisk, Linux FS Devel, Djalal Harouni Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes: > On Sat, Jul 2, 2016 at 10:18 AM, Eric W. Biederman > <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >> >> As well as in these patches the code is also available from: >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing >> >> It has been a long time in coming but recently in the userns tree the >> superblock has been expanded with a s_user_ns field indicating the user >> namespace that owns a superblock. >> > > It would be nice if global root could mount otherwise-unsafe > filesystems with s_user_ns set to a non-init namespace. Do you have > plans to allow that? Looking at the code global root would pass the permission checks that are present in my tree today. At the same time I don't have a provision for global root to specify the user namespace. The practical limitation is that the filesystems need a little bit of gentle massaging to handle the case when s_user_ns != &init_user_ns. I think i_uid_read and i_uid_write already provide 90% of that. The other limitation is that s_user_ns is defined as the owner of the filesystem and as the default translation. Which will mean root in s_user_ns will be trusted to remount the filesystem. The primary target right now is a safe unprivileged mount of fuse. I am not opposed to extensions allowing the global root to mount otherwise-unsafe filesystems with s_user_ns != &init_user_ns as long as it is for cases where we can trust the underlying filesystem image and don't mind giving root in s_user_ns all interesting permissions over the filesystem. I think that could be an very interesting intermediary step in getting filesystems supported. Still my focus is true unprivileged mounts, there are a huge number of little details that go into getting that right, and my poor brain can't handle looking at cases beyond that. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH review 0/12] General unprivileged mount support v2 [not found] ` <87ziq03qnj.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-06 13:44 ` Andy Lutomirski 2016-07-06 14:01 ` Andy Lutomirski @ 2016-07-06 18:10 ` Eric W. Biederman [not found] ` <874m82bptc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:10 UTC (permalink / raw) To: Linux Containers Cc: Jan Kara, Linux API, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Jann Horn, Djalal Harouni As well as in these patches the code is also available from: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing v2 is slightly simplified. During the review I realized the previous quota changes while perhaps not wrong were in a part of the code I don't want people to think is ready for use on unprivileged filesystems yet. It has been a long time in coming but recently in the userns tree the superblock has been expanded with a s_user_ns field indicating the user namespace that owns a superblock. The s_user_ns owner of a superblock has three implications. - Only kuids and kgids that map into s_user_ns are allowed to be sent to a filesystem from the vfs. - If the uid or gid on the filesystem does not map into s_user_ns i_uid is set to INVALID_UID and i_gid is set to INVALID_GID. - The scope of permission checks can be changed from global to a capabilitiy check in s_user_ns. The overall strategy is to handle as much of this as possible in the VFS so that what is happening is consistent between filesystems and has widespread review, and so that individual filesystems don't need to duplicate code. This set of patches inserts checks to ensure only kuids and kgids that map into s_user_ns are sent to filesystems. This set of patches updates the vfs to deal with potentially unmapped uids and gids in the i_uid and i_gid fields. The strategy adopted is to deny any activity that causes inodes with unmapped uids or gid to be written to disk except for a chown that causes makes i_uid and i_gid to map into s_user_ns. Relaxing of the capability checks and adding new filesystems that would benefit from the changes is held off until the vfs support is complete. I believe this work is complete so if there anything questionable you see please let me know. I have included linux-api because several system calls get new failure modes mostly -EOVERFLOW, and that may need separate documenation and review. The target for this work is to enable fully unprivileged fuse mounts and whichever filesystem results from the uid shifting work. These vfs changes should support all kinds of filesystems, but in practice it is an open problem if it is possible to modify a block based filesystem to be safe from people manipulating filesystem images in an attempt to get the kernel to malfunction so only a very limited set of additional filesystems is ever expected to be enabled by this work. Eric W. Biederman (7): userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS vfs: Verify acls are valid within superblock's s_user_ns. vfs: Don't modify inodes with a uid or gid unknown to the vfs vfs: Don't create inodes with a uid or gid unknown to the vfs quota: Ensure qids map to the filesystem quota: Handle quota data stored in s_user_ns in quota_setxquota dquot: For now explicitly don't support filesystems outside of init_user_ns Seth Forshee (5): fs: Refuse uid/gid changes which don't map into s_user_ns fs: Check for invalid i_uid in may_follow_link() cred: Reject inodes with invalid ids in set_create_file_as() evm: Translate user/group ids relative to s_user_ns when computing HMAC fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns drivers/staging/lustre/lustre/mdc/mdc_request.c | 2 +- fs/9p/acl.c | 2 +- fs/attr.c | 19 +++++++++ fs/inode.c | 7 ++++ fs/namei.c | 40 ++++++++++++++---- fs/posix_acl.c | 8 ++-- fs/quota/dquot.c | 8 ++++ fs/quota/quota.c | 14 +++---- fs/xattr.c | 7 ++++ include/linux/fs.h | 55 ++++++++++++++----------- include/linux/posix_acl.h | 2 +- include/linux/quota.h | 10 +++++ include/linux/uidgid.h | 4 +- kernel/cred.c | 2 + security/integrity/evm/evm_crypto.c | 4 +- 15 files changed, 133 insertions(+), 51 deletions(-) Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <874m82bptc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns [not found] ` <874m82bptc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 02/12] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman ` (7 more replies) 0 siblings, 8 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Jan Kara, Linux API, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Jann Horn, Djalal Harouni From: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Add checks to notify_change to verify that uid and gid changes will map into the superblock's user namespace. If they do not fail with -EOVERFLOW. This is mandatory so that fileystems don't have to even think of dealing with ia_uid and ia_gid that --EWB Moved the test from inode_change_ok to notify_change Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/attr.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/attr.c b/fs/attr.c index 25b24d0f6c88..dd723578ddce 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -255,6 +255,17 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de if (!(attr->ia_valid & ~(ATTR_KILL_SUID | ATTR_KILL_SGID))) return 0; + /* + * Verify that uid/gid changes are valid in the target + * namespace of the superblock. + */ + if (ia_valid & ATTR_UID && + !kuid_has_mapping(inode->i_sb->s_user_ns, attr->ia_uid)) + return -EOVERFLOW; + if (ia_valid & ATTR_GID && + !kgid_has_mapping(inode->i_sb->s_user_ns, attr->ia_gid)) + return -EOVERFLOW; + error = security_inode_setattr(dentry, attr); if (error) return error; -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 02/12] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 03/12] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman ` (6 subsequent siblings) 7 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk Refuse to admit any user namespace has a mapping of the INVALID_UID and the INVALID_GID when !CONFIG_USER_NS. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- include/linux/uidgid.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h index 03835522dfcb..25e9d9216340 100644 --- a/include/linux/uidgid.h +++ b/include/linux/uidgid.h @@ -177,12 +177,12 @@ static inline gid_t from_kgid_munged(struct user_namespace *to, kgid_t kgid) static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid) { - return true; + return uid_valid(uid); } static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid) { - return true; + return gid_valid(gid); } #endif /* CONFIG_USER_NS */ -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 03/12] vfs: Verify acls are valid within superblock's s_user_ns. 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 02/12] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 04/12] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman ` (5 subsequent siblings) 7 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk Update posix_acl_valid to verify that an acl is within a user namespace. Update the callers of posix_acl_valid to pass in an appropriate user namespace. For posix_acl_xattr_set and v9fs_xattr_set_acl pass in inode->i_sb->s_user_ns to posix_acl_valid. For md_unpack_acl pass in &init_user_ns as no inode or superblock is in sight. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- drivers/staging/lustre/lustre/mdc/mdc_request.c | 2 +- fs/9p/acl.c | 2 +- fs/posix_acl.c | 8 ++++---- include/linux/posix_acl.h | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c index 86b7445365f4..40cf57fad581 100644 --- a/drivers/staging/lustre/lustre/mdc/mdc_request.c +++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c @@ -434,7 +434,7 @@ static int mdc_unpack_acl(struct ptlrpc_request *req, struct lustre_md *md) return rc; } - rc = posix_acl_valid(acl); + rc = posix_acl_valid(&init_user_ns, acl); if (rc) { CERROR("validate acl: %d\n", rc); posix_acl_release(acl); diff --git a/fs/9p/acl.c b/fs/9p/acl.c index 0576eaeb60b9..5b6a1743ea17 100644 --- a/fs/9p/acl.c +++ b/fs/9p/acl.c @@ -266,7 +266,7 @@ static int v9fs_xattr_set_acl(const struct xattr_handler *handler, if (IS_ERR(acl)) return PTR_ERR(acl); else if (acl) { - retval = posix_acl_valid(acl); + retval = posix_acl_valid(inode->i_sb->s_user_ns, acl); if (retval) goto err_out; } diff --git a/fs/posix_acl.c b/fs/posix_acl.c index 8a4a266beff3..647c28180675 100644 --- a/fs/posix_acl.c +++ b/fs/posix_acl.c @@ -205,7 +205,7 @@ posix_acl_clone(const struct posix_acl *acl, gfp_t flags) * Check if an acl is valid. Returns 0 if it is, or -E... otherwise. */ int -posix_acl_valid(const struct posix_acl *acl) +posix_acl_valid(struct user_namespace *user_ns, const struct posix_acl *acl) { const struct posix_acl_entry *pa, *pe; int state = ACL_USER_OBJ; @@ -225,7 +225,7 @@ posix_acl_valid(const struct posix_acl *acl) case ACL_USER: if (state != ACL_USER) return -EINVAL; - if (!uid_valid(pa->e_uid)) + if (!kuid_has_mapping(user_ns, pa->e_uid)) return -EINVAL; needs_mask = 1; break; @@ -240,7 +240,7 @@ posix_acl_valid(const struct posix_acl *acl) case ACL_GROUP: if (state != ACL_GROUP) return -EINVAL; - if (!gid_valid(pa->e_gid)) + if (!kgid_has_mapping(user_ns, pa->e_gid)) return -EINVAL; needs_mask = 1; break; @@ -845,7 +845,7 @@ posix_acl_xattr_set(const struct xattr_handler *handler, return PTR_ERR(acl); if (acl) { - ret = posix_acl_valid(acl); + ret = posix_acl_valid(inode->i_sb->s_user_ns, acl); if (ret) goto out; } diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h index 5b5a80cc5926..1af6438fde3e 100644 --- a/include/linux/posix_acl.h +++ b/include/linux/posix_acl.h @@ -81,7 +81,7 @@ posix_acl_release(struct posix_acl *acl) extern void posix_acl_init(struct posix_acl *, int); extern struct posix_acl *posix_acl_alloc(int, gfp_t); -extern int posix_acl_valid(const struct posix_acl *); +extern int posix_acl_valid(struct user_namespace *, const struct posix_acl *); extern int posix_acl_permission(struct inode *, const struct posix_acl *, int); extern struct posix_acl *posix_acl_from_mode(umode_t, gfp_t); extern int posix_acl_equiv_mode(const struct posix_acl *, umode_t *); -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 04/12] fs: Check for invalid i_uid in may_follow_link() 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 02/12] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 03/12] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 05/12] cred: Reject inodes with invalid ids in set_create_file_as() Eric W. Biederman ` (4 subsequent siblings) 7 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee@canonical.com> Filesystem uids which don't map into a user namespace may result in inode->i_uid being INVALID_UID. A symlink and its parent could have different owners in the filesystem can both get mapped to INVALID_UID, which may result in following a symlink when this would not have otherwise been permitted when protected symlinks are enabled. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- fs/namei.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/namei.c b/fs/namei.c index 757a32725d92..8701bd9a5270 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -901,6 +901,7 @@ static inline int may_follow_link(struct nameidata *nd) { const struct inode *inode; const struct inode *parent; + kuid_t puid; if (!sysctl_protected_symlinks) return 0; @@ -916,7 +917,8 @@ static inline int may_follow_link(struct nameidata *nd) return 0; /* Allowed if parent directory and link owner match. */ - if (uid_eq(parent->i_uid, inode->i_uid)) + puid = parent->i_uid; + if (uid_valid(puid) && uid_eq(puid, inode->i_uid)) return 0; if (nd->flags & LOOKUP_RCU) -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 05/12] cred: Reject inodes with invalid ids in set_create_file_as() 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman ` (2 preceding siblings ...) 2016-07-06 18:12 ` [PATCH review 04/12] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman [not found] ` <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> ` (3 subsequent siblings) 7 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee@canonical.com> Using INVALID_[UG]ID for the LSM file creation context doesn't make sense, so return an error if the inode passed to set_create_file_as() has an invalid id. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- kernel/cred.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/cred.c b/kernel/cred.c index 0c0cd8a62285..5f264fb5737d 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -689,6 +689,8 @@ EXPORT_SYMBOL(set_security_override_from_ctx); */ int set_create_files_as(struct cred *new, struct inode *inode) { + if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid)) + return -EINVAL; new->fsuid = inode->i_uid; new->fsgid = inode->i_gid; return security_kernel_create_files_as(new, inode); -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
[parent not found: <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>]
* [PATCH review 06/12] vfs: Don't modify inodes with a uid or gid unknown to the vfs [not found] ` <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 07/12] vfs: Don't create " Eric W. Biederman ` (2 subsequent siblings) 3 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk When a filesystem outside of init_user_ns is mounted it could have uids and gids stored in it that do not map to init_user_ns. The plan is to allow those filesystems to set i_uid to INVALID_UID and i_gid to INVALID_GID for unmapped uids and gids and then to handle that strange case in the vfs to ensure there is consistent robust handling of the weirdness. Upon a careful review of the vfs and filesystems about the only case where there is any possibility of confusion or trouble is when the inode is written back to disk. In that case filesystems typically read the inode->i_uid and inode->i_gid and write them to disk even when just an inode timestamp is being updated. Which leads to a rule that is very simple to implement and understand inodes whose i_uid or i_gid is not valid may not be written. In dealing with access times this means treat those inodes as if the inode flag S_NOATIME was set. Reads of the inodes appear safe and useful, but any write or modification is disallowed. The only inode write that is allowed is a chown that sets the uid and gid on the inode to valid values. After such a chown the inode is normal and may be treated as such. Denying all writes to inodes with uids or gids unknown to the vfs also prevents several oddball cases where corruption would have occurred because the vfs does not have complete information. One problem case that is prevented is attempting to use the gid of a directory for new inodes where the directories sgid bit is set but the directories gid is not mapped. Another problem case avoided is attempting to update the evm hash after setxattr, removexattr, and setattr. As the evm hash includeds the inode->i_uid or inode->i_gid not knowning the uid or gid prevents a correct evm hash from being computed. evm hash verification also fails when i_uid or i_gid is unknown but that is essentially harmless as it does not cause filesystem corruption. Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/attr.c | 8 ++++++++ fs/inode.c | 7 +++++++ fs/namei.c | 26 +++++++++++++++++++++----- fs/xattr.c | 7 +++++++ include/linux/fs.h | 5 +++++ 5 files changed, 48 insertions(+), 5 deletions(-) diff --git a/fs/attr.c b/fs/attr.c index dd723578ddce..42bb42bb3c72 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -266,6 +266,14 @@ int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **de !kgid_has_mapping(inode->i_sb->s_user_ns, attr->ia_gid)) return -EOVERFLOW; + /* Don't allow modifications of files with invalid uids or + * gids unless those uids & gids are being made valid. + */ + if (!(ia_valid & ATTR_UID) && !uid_valid(inode->i_uid)) + return -EOVERFLOW; + if (!(ia_valid & ATTR_GID) && !gid_valid(inode->i_gid)) + return -EOVERFLOW; + error = security_inode_setattr(dentry, attr); if (error) return error; diff --git a/fs/inode.c b/fs/inode.c index 4ccbc21b30ce..c0ebb97fb085 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1617,6 +1617,13 @@ bool atime_needs_update(const struct path *path, struct inode *inode) if (inode->i_flags & S_NOATIME) return false; + + /* Atime updates will likely cause i_uid and i_gid to be written + * back improprely if their true value is unknown to the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return false; + if (IS_NOATIME(inode)) return false; if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode)) diff --git a/fs/namei.c b/fs/namei.c index 8701bd9a5270..840201c4c290 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -410,6 +410,14 @@ int __inode_permission(struct inode *inode, int mask) */ if (IS_IMMUTABLE(inode)) return -EACCES; + + /* + * Updating mtime will likely cause i_uid and i_gid to be + * written back improperly if their true value is unknown + * to the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return -EACCES; } retval = do_inode_permission(inode, mask); @@ -2759,10 +2767,11 @@ EXPORT_SYMBOL(__check_sticky); * c. have CAP_FOWNER capability * 6. If the victim is append-only or immutable we can't do antyhing with * links pointing to it. - * 7. If we were asked to remove a directory and victim isn't one - ENOTDIR. - * 8. If we were asked to remove a non-directory and victim isn't one - EISDIR. - * 9. We can't remove a root or mountpoint. - * 10. We don't allow removal of NFS sillyrenamed files; it's handled by + * 7. If the victim has an unknown uid or gid we can't change the inode. + * 8. If we were asked to remove a directory and victim isn't one - ENOTDIR. + * 9. If we were asked to remove a non-directory and victim isn't one - EISDIR. + * 10. We can't remove a root or mountpoint. + * 11. We don't allow removal of NFS sillyrenamed files; it's handled by * nfs_async_unlink(). */ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) @@ -2784,7 +2793,7 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) return -EPERM; if (check_sticky(dir, inode) || IS_APPEND(inode) || - IS_IMMUTABLE(inode) || IS_SWAPFILE(inode)) + IS_IMMUTABLE(inode) || IS_SWAPFILE(inode) || HAS_UNMAPPED_ID(inode)) return -EPERM; if (isdir) { if (!d_is_dir(victim)) @@ -4190,6 +4199,13 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de */ if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) return -EPERM; + /* + * Updating the link count will likely cause i_uid and i_gid to + * be writen back improperly if their true value is unknown to + * the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return -EPERM; if (!dir->i_op->link) return -EPERM; if (S_ISDIR(inode->i_mode)) diff --git a/fs/xattr.c b/fs/xattr.c index 4beafc43daa5..c243905835ab 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -38,6 +38,13 @@ xattr_permission(struct inode *inode, const char *name, int mask) if (mask & MAY_WRITE) { if (IS_IMMUTABLE(inode) || IS_APPEND(inode)) return -EPERM; + /* + * Updating an xattr will likely cause i_uid and i_gid + * to be writen back improperly if their true value is + * unknown to the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return -EPERM; } /* diff --git a/include/linux/fs.h b/include/linux/fs.h index 375e37f42cdf..cb25ceb6d1ef 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1874,6 +1874,11 @@ struct super_operations { #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV) +static inline bool HAS_UNMAPPED_ID(struct inode *inode) +{ + return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid); +} + /* * Inode state bits. Protected by inode->i_lock * -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 07/12] vfs: Don't create inodes with a uid or gid unknown to the vfs [not found] ` <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-06 18:12 ` [PATCH review 06/12] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 10/12] dquot: For now explicitly don't support filesystems outside of init_user_ns Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 11/12] evm: Translate user/group ids relative to s_user_ns when computing HMAC Eric W. Biederman 3 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids. Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/namei.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 840201c4c290..629823f19a6a 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2814,16 +2814,22 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) * 1. We can't do it if child already exists (open has special treatment for * this case, but since we are inlined it's OK) * 2. We can't do it if dir is read-only (done in permission()) - * 3. We should have write and exec permissions on dir - * 4. We can't do it if dir is immutable (done in permission()) + * 3. We can't do it if the fs can't represent the fsuid or fsgid. + * 4. We should have write and exec permissions on dir + * 5. We can't do it if dir is immutable (done in permission()) */ static inline int may_create(struct inode *dir, struct dentry *child) { + struct user_namespace *s_user_ns; audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE); if (child->d_inode) return -EEXIST; if (IS_DEADDIR(dir)) return -ENOENT; + s_user_ns = dir->i_sb->s_user_ns; + if (!kuid_has_mapping(s_user_ns, current_fsuid()) || + !kgid_has_mapping(s_user_ns, current_fsgid())) + return -EOVERFLOW; return inode_permission(dir, MAY_WRITE | MAY_EXEC); } -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 10/12] dquot: For now explicitly don't support filesystems outside of init_user_ns [not found] ` <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-06 18:12 ` [PATCH review 06/12] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 07/12] vfs: Don't create " Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-11 10:09 ` Jan Kara 2016-07-06 18:12 ` [PATCH review 11/12] evm: Translate user/group ids relative to s_user_ns when computing HMAC Eric W. Biederman 3 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk Mostly supporting filesystems outside of init_user_ns is s/&init_usre_ns/dquot->dq_sb->s_user_ns/. An actual need for supporting quotas on filesystems outside of s_user_ns is quite a ways away and to be done responsibily needs an audit on what can happen with hostile quota files. Until that audit is complete don't attempt to support quota files on filesystems outside of s_user_ns. Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/quota/dquot.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 74706b6aa747..87197d13cc76 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -2271,6 +2271,11 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, error = -EINVAL; goto out_fmt; } + /* Filesystems outside of init_user_ns not yet supported */ + if (sb->s_user_ns != &init_user_ns) { + error = -EINVAL; + goto out_fmt; + } /* Usage always has to be set... */ if (!(flags & DQUOT_USAGE_ENABLED)) { error = -EINVAL; -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH review 10/12] dquot: For now explicitly don't support filesystems outside of init_user_ns 2016-07-06 18:12 ` [PATCH review 10/12] dquot: For now explicitly don't support filesystems outside of init_user_ns Eric W. Biederman @ 2016-07-11 10:09 ` Jan Kara 0 siblings, 0 replies; 58+ messages in thread From: Jan Kara @ 2016-07-11 10:09 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk On Wed 06-07-16 13:12:10, Eric W. Biederman wrote: > Mostly supporting filesystems outside of init_user_ns is > s/&init_usre_ns/dquot->dq_sb->s_user_ns/. An actual need for > supporting quotas on filesystems outside of s_user_ns is quite a ways > away and to be done responsibily needs an audit on what can happen > with hostile quota files. Until that audit is complete don't attempt > to support quota files on filesystems outside of s_user_ns. > > Cc: Jan Kara <jack@suse.cz> > Acked-by: Seth Forshee <seth.forshee@canonical.com> > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Looks good. You can add: Acked-by: Jan Kara <jack@suse.cz> Honza > --- > fs/quota/dquot.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c > index 74706b6aa747..87197d13cc76 100644 > --- a/fs/quota/dquot.c > +++ b/fs/quota/dquot.c > @@ -2271,6 +2271,11 @@ static int vfs_load_quota_inode(struct inode *inode, int type, int format_id, > error = -EINVAL; > goto out_fmt; > } > + /* Filesystems outside of init_user_ns not yet supported */ > + if (sb->s_user_ns != &init_user_ns) { > + error = -EINVAL; > + goto out_fmt; > + } > /* Usage always has to be set... */ > if (!(flags & DQUOT_USAGE_ENABLED)) { > error = -EINVAL; > -- > 2.8.3 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH review 11/12] evm: Translate user/group ids relative to s_user_ns when computing HMAC [not found] ` <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> ` (2 preceding siblings ...) 2016-07-06 18:12 ` [PATCH review 10/12] dquot: For now explicitly don't support filesystems outside of init_user_ns Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 3 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> The EVM HMAC should be calculated using the on disk user and group ids, so the k[ug]ids in the inode must be translated relative to the s_user_ns of the inode's super block. Signed-off-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- security/integrity/evm/evm_crypto.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/security/integrity/evm/evm_crypto.c b/security/integrity/evm/evm_crypto.c index 30b6b7d0429f..11c1d30bd705 100644 --- a/security/integrity/evm/evm_crypto.c +++ b/security/integrity/evm/evm_crypto.c @@ -151,8 +151,8 @@ static void hmac_add_misc(struct shash_desc *desc, struct inode *inode, memset(&hmac_misc, 0, sizeof(hmac_misc)); hmac_misc.ino = inode->i_ino; hmac_misc.generation = inode->i_generation; - hmac_misc.uid = from_kuid(&init_user_ns, inode->i_uid); - hmac_misc.gid = from_kgid(&init_user_ns, inode->i_gid); + hmac_misc.uid = from_kuid(inode->i_sb->s_user_ns, inode->i_uid); + hmac_misc.gid = from_kgid(inode->i_sb->s_user_ns, inode->i_gid); hmac_misc.mode = inode->i_mode; crypto_shash_update(desc, (const u8 *)&hmac_misc, sizeof(hmac_misc)); if (evm_hmac_attrs & EVM_ATTR_FSUUID) -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 08/12] quota: Ensure qids map to the filesystem 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman ` (4 preceding siblings ...) [not found] ` <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> @ 2016-07-06 18:12 ` Eric W. Biederman [not found] ` <20160706181212.16267-8-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-06 18:12 ` [PATCH review 09/12] quota: Handle quota data stored in s_user_ns in quota_setxquota Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 12/12] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman 7 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk Introduce the helper qid_has_mapping and use it to ensure that the quota system only considers qids that map to the filesystems s_user_ns. In practice for quota supporting filesystems today this is the exact same check as qid_valid. As only 0xffffffff aka (qid_t)-1 does not map into init_user_ns. Replace the qid_valid calls with qid_has_mapping as values come in from userspace. This is harmless today and it prepares the quota system to work on filesystems with quotas but mounted by unprivileged users. Call qid_has_mapping from dqget. This ensures the passed in qid has a prepresentation on the underlying filesystem. Previously this was unnecessary as filesystesm never had qids that could not map. With the introduction of filesystems outside of s_user_ns this will not remain true. All of this ensures the quota code never has to deal with qids that don't map to the underlying filesystem. Cc: Jan Kara <jack@suse.cz> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/quota/dquot.c | 3 +++ fs/quota/quota.c | 12 ++++++------ include/linux/quota.h | 10 ++++++++++ 3 files changed, 19 insertions(+), 6 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index ff21980d0119..74706b6aa747 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -841,6 +841,9 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid) unsigned int hashent = hashfn(sb, qid); struct dquot *dquot, *empty = NULL; + if (!qid_has_mapping(sb->s_user_ns, qid)) + return ERR_PTR(-EINVAL); + if (!sb_has_quota_active(sb, qid.type)) return ERR_PTR(-ESRCH); we_slept: diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 0f10ee9892ce..73f6f4cf0a21 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -211,7 +211,7 @@ static int quota_getquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_dqblk(sb, qid, &fdq); if (ret) @@ -237,7 +237,7 @@ static int quota_getnextquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_nextdqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_nextdqblk(sb, &qid, &fdq); if (ret) @@ -288,7 +288,7 @@ static int quota_setquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->set_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; copy_from_if_dqblk(&fdq, &idq); return sb->s_qcop->set_dqblk(sb, qid, &fdq); @@ -581,7 +581,7 @@ static int quota_setxquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->set_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; /* Are we actually setting timer / warning limits for all users? */ if (from_kqid(&init_user_ns, qid) == 0 && @@ -642,7 +642,7 @@ static int quota_getxquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_dqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_dqblk(sb, qid, &qdq); if (ret) @@ -669,7 +669,7 @@ static int quota_getnextxquota(struct super_block *sb, int type, qid_t id, if (!sb->s_qcop->get_nextdqblk) return -ENOSYS; qid = make_kqid(current_user_ns(), type, id); - if (!qid_valid(qid)) + if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; ret = sb->s_qcop->get_nextdqblk(sb, &qid, &qdq); if (ret) diff --git a/include/linux/quota.h b/include/linux/quota.h index 9dfb6bce8c9e..1db16ee39b31 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -179,6 +179,16 @@ static inline struct kqid make_kqid_projid(kprojid_t projid) return kqid; } +/** + * qid_has_mapping - Report if a qid maps into a user namespace. + * @ns: The user namespace to see if a value maps into. + * @qid: The kernel internal quota identifier to test. + */ +static inline bool qid_has_mapping(struct user_namespace *ns, struct kqid qid) +{ + return from_kqid(ns, qid) != (qid_t) -1; +} + extern spinlock_t dq_data_lock; -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
[parent not found: <20160706181212.16267-8-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem [not found] ` <20160706181212.16267-8-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> @ 2016-07-11 10:14 ` Jan Kara [not found] ` <20160711101424.GH12410-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Jan Kara @ 2016-07-11 10:14 UTC (permalink / raw) To: Eric W. Biederman Cc: Jan Kara, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Jann Horn, Djalal Harouni On Wed 06-07-16 13:12:08, Eric W. Biederman wrote: > Introduce the helper qid_has_mapping and use it to ensure that the > quota system only considers qids that map to the filesystems > s_user_ns. > > In practice for quota supporting filesystems today this is the exact > same check as qid_valid. As only 0xffffffff aka (qid_t)-1 does not > map into init_user_ns. > > Replace the qid_valid calls with qid_has_mapping as values come in > from userspace. This is harmless today and it prepares the quota > system to work on filesystems with quotas but mounted by unprivileged > users. > > Call qid_has_mapping from dqget. This ensures the passed in qid has a > prepresentation on the underlying filesystem. Previously this was > unnecessary as filesystesm never had qids that could not map. With > the introduction of filesystems outside of s_user_ns this will not > remain true. > > All of this ensures the quota code never has to deal with qids that > don't map to the underlying filesystem. Does this and the following patch make sense when we actually don't allow quotas when s_user_ns is set? If things are untested, they will get broken so I think it is wiser to do the conversion only once we decide quota should be supported for namespaces != init_user_ns. Or do I miss something? Honza > > Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> > Acked-by: Seth Forshee <seth.forshee-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> > Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > --- > fs/quota/dquot.c | 3 +++ > fs/quota/quota.c | 12 ++++++------ > include/linux/quota.h | 10 ++++++++++ > 3 files changed, 19 insertions(+), 6 deletions(-) > > diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c > index ff21980d0119..74706b6aa747 100644 > --- a/fs/quota/dquot.c > +++ b/fs/quota/dquot.c > @@ -841,6 +841,9 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid) > unsigned int hashent = hashfn(sb, qid); > struct dquot *dquot, *empty = NULL; > > + if (!qid_has_mapping(sb->s_user_ns, qid)) > + return ERR_PTR(-EINVAL); > + > if (!sb_has_quota_active(sb, qid.type)) > return ERR_PTR(-ESRCH); > we_slept: > diff --git a/fs/quota/quota.c b/fs/quota/quota.c > index 0f10ee9892ce..73f6f4cf0a21 100644 > --- a/fs/quota/quota.c > +++ b/fs/quota/quota.c > @@ -211,7 +211,7 @@ static int quota_getquota(struct super_block *sb, int type, qid_t id, > if (!sb->s_qcop->get_dqblk) > return -ENOSYS; > qid = make_kqid(current_user_ns(), type, id); > - if (!qid_valid(qid)) > + if (!qid_has_mapping(sb->s_user_ns, qid)) > return -EINVAL; > ret = sb->s_qcop->get_dqblk(sb, qid, &fdq); > if (ret) > @@ -237,7 +237,7 @@ static int quota_getnextquota(struct super_block *sb, int type, qid_t id, > if (!sb->s_qcop->get_nextdqblk) > return -ENOSYS; > qid = make_kqid(current_user_ns(), type, id); > - if (!qid_valid(qid)) > + if (!qid_has_mapping(sb->s_user_ns, qid)) > return -EINVAL; > ret = sb->s_qcop->get_nextdqblk(sb, &qid, &fdq); > if (ret) > @@ -288,7 +288,7 @@ static int quota_setquota(struct super_block *sb, int type, qid_t id, > if (!sb->s_qcop->set_dqblk) > return -ENOSYS; > qid = make_kqid(current_user_ns(), type, id); > - if (!qid_valid(qid)) > + if (!qid_has_mapping(sb->s_user_ns, qid)) > return -EINVAL; > copy_from_if_dqblk(&fdq, &idq); > return sb->s_qcop->set_dqblk(sb, qid, &fdq); > @@ -581,7 +581,7 @@ static int quota_setxquota(struct super_block *sb, int type, qid_t id, > if (!sb->s_qcop->set_dqblk) > return -ENOSYS; > qid = make_kqid(current_user_ns(), type, id); > - if (!qid_valid(qid)) > + if (!qid_has_mapping(sb->s_user_ns, qid)) > return -EINVAL; > /* Are we actually setting timer / warning limits for all users? */ > if (from_kqid(&init_user_ns, qid) == 0 && > @@ -642,7 +642,7 @@ static int quota_getxquota(struct super_block *sb, int type, qid_t id, > if (!sb->s_qcop->get_dqblk) > return -ENOSYS; > qid = make_kqid(current_user_ns(), type, id); > - if (!qid_valid(qid)) > + if (!qid_has_mapping(sb->s_user_ns, qid)) > return -EINVAL; > ret = sb->s_qcop->get_dqblk(sb, qid, &qdq); > if (ret) > @@ -669,7 +669,7 @@ static int quota_getnextxquota(struct super_block *sb, int type, qid_t id, > if (!sb->s_qcop->get_nextdqblk) > return -ENOSYS; > qid = make_kqid(current_user_ns(), type, id); > - if (!qid_valid(qid)) > + if (!qid_has_mapping(sb->s_user_ns, qid)) > return -EINVAL; > ret = sb->s_qcop->get_nextdqblk(sb, &qid, &qdq); > if (ret) > diff --git a/include/linux/quota.h b/include/linux/quota.h > index 9dfb6bce8c9e..1db16ee39b31 100644 > --- a/include/linux/quota.h > +++ b/include/linux/quota.h > @@ -179,6 +179,16 @@ static inline struct kqid make_kqid_projid(kprojid_t projid) > return kqid; > } > > +/** > + * qid_has_mapping - Report if a qid maps into a user namespace. > + * @ns: The user namespace to see if a value maps into. > + * @qid: The kernel internal quota identifier to test. > + */ > +static inline bool qid_has_mapping(struct user_namespace *ns, struct kqid qid) > +{ > + return from_kqid(ns, qid) != (qid_t) -1; > +} > + > > extern spinlock_t dq_data_lock; > > -- > 2.8.3 > -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <20160711101424.GH12410-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem [not found] ` <20160711101424.GH12410-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> @ 2016-07-11 18:12 ` Eric W. Biederman [not found] ` <878tx8dowu.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 58+ messages in thread From: Eric W. Biederman @ 2016-07-11 18:12 UTC (permalink / raw) To: Jan Kara Cc: Linux Containers, Seth Forshee, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jann Horn, Michael Kerrisk Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> writes: > On Wed 06-07-16 13:12:08, Eric W. Biederman wrote: >> Introduce the helper qid_has_mapping and use it to ensure that the >> quota system only considers qids that map to the filesystems >> s_user_ns. >> >> In practice for quota supporting filesystems today this is the exact >> same check as qid_valid. As only 0xffffffff aka (qid_t)-1 does not >> map into init_user_ns. >> >> Replace the qid_valid calls with qid_has_mapping as values come in >> from userspace. This is harmless today and it prepares the quota >> system to work on filesystems with quotas but mounted by unprivileged >> users. >> >> Call qid_has_mapping from dqget. This ensures the passed in qid has a >> prepresentation on the underlying filesystem. Previously this was >> unnecessary as filesystesm never had qids that could not map. With >> the introduction of filesystems outside of s_user_ns this will not >> remain true. >> >> All of this ensures the quota code never has to deal with qids that >> don't map to the underlying filesystem. > > Does this and the following patch make sense when we actually don't allow > quotas when s_user_ns is set? If things are untested, they will get broken > so I think it is wiser to do the conversion only once we decide quota > should be supported for namespaces != init_user_ns. Or do I miss > something? All of the code will get exercised in the success case. So I don't think there is a strong chance of bitrot. The patches are trivially correct so I am not concerned about them. What the patches add are a logical consistency with the rest of the vfs. In my tree the rest of the vfs now handles uids and gids that can not be mapped to a filesystem and returns an error. In my tree the rest of the vfs now effectively assumes that uids and gids are stored in s_user_ns on a filesystem. So I freely admit there is a slight chance that some bit-rot will occur and the quota code will be changed in such a way that those assumptions are broken. Bugs happen. I think it much less likely if all of the generic code in the vfs makes the same assumptions. The place where I am concerned about thorough review and testing is someone poisoning quota files and then the kernel trying to use them. In the preliminary work we have done in other places in the kernel and for other filesystems there almost always winds up being some way to confuse the kernel and get it to misbave if you can poison the disk based inputs. As poison disk based inputs is not something filesystems are stronlgy concerned about. In most cases the disk the filesystem resides on is in the box and therefore under control of the OS at all times. Dave Chinner has even said he will never consider handling poisoned disk based inputs for XFS as the run time cost is too high. Between actually finding issues that can cause problems, and the increased amount of kernel code executed (and thus the increase in kernel attack surface) I am very paranoid about enabling code that trusts data that could be poisoned data from a hostile party. At the same time I am very uncomfortable with the fact the kernel does not protect against malicious disks and poisoned disk images. As poisoned disk images are a well known exploit vector in the wild. A well known and demonstrated attack vector that works is to leave a usb stick in a public place, and helpful people will place it into their computer to try and figure out who it belongs to. In trying to be helpful their computer will unbeknownst to them start executing code that does not serve the interests of the computer owner. I hate that we can not currently protect people from shenanigans like that. So I intend to be as responsible and as careful as I can, but also to push forward in the hopes that we can eventually not have to worry about our computers betraying us when we as human beings perform reasonable actions. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <878tx8dowu.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem [not found] ` <878tx8dowu.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2016-07-13 1:34 ` Dave Chinner 2016-07-13 3:45 ` Dave Chinner 2016-07-13 5:43 ` Jann Horn 0 siblings, 2 replies; 58+ messages in thread From: Dave Chinner @ 2016-07-13 1:34 UTC (permalink / raw) To: Eric W. Biederman Cc: Jan Kara, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Jann Horn, Djalal Harouni On Mon, Jul 11, 2016 at 01:12:49PM -0500, Eric W. Biederman wrote: > The place where I am concerned about thorough review and testing is > someone poisoning quota files and then the kernel trying to use them. > In the preliminary work we have done in other places in the kernel and > for other filesystems there almost always winds up being some way to > confuse the kernel and get it to misbave if you can poison the disk > based inputs. As poison disk based inputs is not something filesystems > are stronlgy concerned about. In most cases the disk the filesystem > resides on is in the box and therefore under control of the OS at all > times. Dave Chinner has even said he will never consider handling > poisoned disk based inputs for XFS as the run time cost is too high. I didn't say that. I said that comprehensive checks to catch all possible malicious inputs is too expensive to consider a viable solution for allowing user-mounts of arbitrary filesystem images through the kernel. We already have runtime validation that bounds check most on-disk fields when they are read - that deals with fuzz based poisoning testing quite well and provides some protection against directed structural attacks as well. IOWs, we already handle a large scope of poisoned inputs safely, but it's not comprehensive because we can't easily determine cross-object reference validity within the format-determined limits. e.g. we can check that the number of records in a btree block is within the valid bounds of a block, but we cannot determine that the record count has been incremented by 1 and a bogus record has been inserted somewhere inside the block and the CRC recalculated to match the modification. We can also check the records themselves for being within bound (e.g. we can check a freespace record points to block within range and of valid length) but we can't check that the extent is actually free space. That requires doing a full filesystem traversal to determine if the extent is actually free space or not. Of course, we could look up the rmap tree (if we have one) to determine if the space is actually used or not, but an attacker can also insert/remove records in that tree, too, so if we can't trust the free space tree, we can't trust the rmap tree either. Hence we have to fall back to brute force validation if we want to be certain that the metadata has not been tampered with. To bring this back to quota files, the only way to validate that a quota file has not been tampered with is to run a quotacheck on the filesystem once it has been mounted. This requires visiting every inode in the filesystem, so it an expensive operation. Only XFS has this functionality in kernel, so for untrusted mounts we could simply run it on every mount that has quotas enabled. Of course, users won't care that mounting their filesystem now takes several minutes (hours, even, when we have millions of inodes in the fs) while these checks are run... Detecting malicious corruptions that specifically manipulate the on-disk structure within the bounds of format validity are difficult to detect and costly to protect against. We'd need to move large parts of fsck into the kernel and run it to validate every piece of metadata read into the kernel. Then we've got a much larger attack surface in the kernel (all the validity checking code needs to be robust against invalid structures, too!), a lot more complexity (more bugs!) and a lot of additional runtime overhead (slow filesystem = unhappy users!). It's just not a practical solution to the problem. > Between actually finding issues that can cause problems, and the > increased amount of kernel code executed (and thus the increase in > kernel attack surface) I am very paranoid about enabling code that > trusts data that could be poisoned data from a hostile party. > > At the same time I am very uncomfortable with the fact the kernel does > not protect against malicious disks and poisoned disk images. As > poisoned disk images are a well known exploit vector in the wild. A > well known and demonstrated attack vector that works is to leave a usb > stick in a public place, and helpful people will place it into their > computer to try and figure out who it belongs to. In trying to be > helpful their computer will unbeknownst to them start executing code > that does not serve the interests of the computer owner. I hate that we > can not currently protect people from shenanigans like that. Yes, we know all about these problems. Unfortunately someone appears to not be listening when they being repeatedly told that hardening all the kernel filesystem implementations against poisoned images is simply not a viable solution to the problem. Move the parsing of untrusted structures out of the kernel - work with the various filesystem teams to build viable FUSE implementations (where it's much easier to incorporate parts of the userspace fsck code) and provide the FUSE filesystems to container users wanting to mount their own filesystem images. Cheers, Dave. -- Dave Chinner david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem 2016-07-13 1:34 ` Dave Chinner @ 2016-07-13 3:45 ` Dave Chinner 2016-07-13 5:43 ` Jann Horn 1 sibling, 0 replies; 58+ messages in thread From: Dave Chinner @ 2016-07-13 3:45 UTC (permalink / raw) To: Eric W. Biederman Cc: Jan Kara, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Michael Kerrisk, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Jann Horn, Djalal Harouni On Wed, Jul 13, 2016 at 11:34:36AM +1000, Dave Chinner wrote: > On Mon, Jul 11, 2016 at 01:12:49PM -0500, Eric W. Biederman wrote: > > The place where I am concerned about thorough review and testing is > > someone poisoning quota files and then the kernel trying to use them. > > In the preliminary work we have done in other places in the kernel and > > for other filesystems there almost always winds up being some way to > > confuse the kernel and get it to misbave if you can poison the disk > > based inputs. As poison disk based inputs is not something filesystems > > are stronlgy concerned about. In most cases the disk the filesystem > > resides on is in the box and therefore under control of the OS at all > > times. Dave Chinner has even said he will never consider handling > > poisoned disk based inputs for XFS as the run time cost is too high. > > I didn't say that. I said that comprehensive checks to catch all > possible malicious inputs is too expensive to consider a viable > solution for allowing user-mounts of arbitrary filesystem images > through the kernel. [.....] > To bring this back to quota files, the only way to validate that a > quota file has not been tampered with is to run a quotacheck on the > filesystem once it has been mounted. This requires visiting every > inode in the filesystem, so it an expensive operation. Only XFS has > this functionality in kernel, so for untrusted mounts we could > simply run it on every mount that has quotas enabled. Of course, > users won't care that mounting their filesystem now takes several > minutes (hours, even, when we have millions of inodes in the fs) > while these checks are run... So, over lunch I realised the problem with this. quotacheck is verifying the contents of the quota file, but we haven't verified the structure of the quota file to begin with. Hence just enabling quotas could cause the filesystem to do bad things in the kernel on mount if the quota file metadata has been tampered with. IOWs, it's not just quota data parsing that we have to be concerned with here - parsing the quota file structure itself could be an attack vector that triggers on mount. Cheers, Dave. -- Dave Chinner david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem 2016-07-13 1:34 ` Dave Chinner 2016-07-13 3:45 ` Dave Chinner @ 2016-07-13 5:43 ` Jann Horn 2016-07-14 17:03 ` Eric W. Biederman 1 sibling, 1 reply; 58+ messages in thread From: Jann Horn @ 2016-07-13 5:43 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Linux API, Linux Containers, Andy Lutomirski, James Bottomley, Seth Forshee, Eric W. Biederman, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk, Djalal Harouni [-- Attachment #1.1: Type: text/plain, Size: 2537 bytes --] On Wed, Jul 13, 2016 at 11:34:36AM +1000, Dave Chinner wrote: > On Mon, Jul 11, 2016 at 01:12:49PM -0500, Eric W. Biederman wrote: > > The place where I am concerned about thorough review and testing is > > someone poisoning quota files and then the kernel trying to use them. > > In the preliminary work we have done in other places in the kernel and > > for other filesystems there almost always winds up being some way to > > confuse the kernel and get it to misbave if you can poison the disk > > based inputs. As poison disk based inputs is not something filesystems > > are stronlgy concerned about. In most cases the disk the filesystem > > resides on is in the box and therefore under control of the OS at all > > times. Dave Chinner has even said he will never consider handling > > poisoned disk based inputs for XFS as the run time cost is too high. > > I didn't say that. I said that comprehensive checks to catch all > possible malicious inputs is too expensive to consider a viable > solution for allowing user-mounts of arbitrary filesystem images > through the kernel. > [...] > > To bring this back to quota files, the only way to validate that a > quota file has not been tampered with is to run a quotacheck on the > filesystem once it has been mounted. This requires visiting every > inode in the filesystem, so it an expensive operation. Only XFS has > this functionality in kernel, so for untrusted mounts we could > simply run it on every mount that has quotas enabled. Of course, > users won't care that mounting their filesystem now takes several > minutes (hours, even, when we have millions of inodes in the fs) > while these checks are run... > > Detecting malicious corruptions that specifically manipulate the > on-disk structure within the bounds of format validity are difficult > to detect and costly to protect against. We'd need to move large > parts of fsck into the kernel and run it to validate every piece of > metadata read into the kernel. Then we've got a much larger attack > surface in the kernel (all the validity checking code needs to be > robust against invalid structures, too!), a lot more complexity > (more bugs!) and a lot of additional runtime overhead (slow > filesystem = unhappy users!). It's just not a practical solution to > the problem. And ideally, you'd want to also guard against an evil disk that suddenly changes its contents after you've run fsck on it, and you can't easily do that without making things complicated. [-- Attachment #1.2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 205 bytes --] _______________________________________________ Containers mailing list Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org https://lists.linuxfoundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem 2016-07-13 5:43 ` Jann Horn @ 2016-07-14 17:03 ` Eric W. Biederman 0 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-14 17:03 UTC (permalink / raw) To: Jann Horn Cc: Dave Chinner, Jan Kara, Linux Containers, Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Michael Kerrisk Jann Horn <jann@thejh.net> writes: > On Wed, Jul 13, 2016 at 11:34:36AM +1000, Dave Chinner wrote: >> On Mon, Jul 11, 2016 at 01:12:49PM -0500, Eric W. Biederman wrote: >> > The place where I am concerned about thorough review and testing is >> > someone poisoning quota files and then the kernel trying to use them. >> > In the preliminary work we have done in other places in the kernel and >> > for other filesystems there almost always winds up being some way to >> > confuse the kernel and get it to misbave if you can poison the disk >> > based inputs. As poison disk based inputs is not something filesystems >> > are stronlgy concerned about. In most cases the disk the filesystem >> > resides on is in the box and therefore under control of the OS at all >> > times. Dave Chinner has even said he will never consider handling >> > poisoned disk based inputs for XFS as the run time cost is too high. >> >> I didn't say that. I said that comprehensive checks to catch all >> possible malicious inputs is too expensive to consider a viable >> solution for allowing user-mounts of arbitrary filesystem images >> through the kernel. Dave you said that speaking as the XFS maintainer. So I take that to be your position and to refer to XFS. > [...] >> >> To bring this back to quota files, the only way to validate that a >> quota file has not been tampered with is to run a quotacheck on the >> filesystem once it has been mounted. This requires visiting every >> inode in the filesystem, so it an expensive operation. Only XFS has >> this functionality in kernel, so for untrusted mounts we could >> simply run it on every mount that has quotas enabled. Of course, >> users won't care that mounting their filesystem now takes several >> minutes (hours, even, when we have millions of inodes in the fs) >> while these checks are run... >> >> Detecting malicious corruptions that specifically manipulate the >> on-disk structure within the bounds of format validity are difficult >> to detect and costly to protect against. We'd need to move large >> parts of fsck into the kernel and run it to validate every piece of >> metadata read into the kernel. Then we've got a much larger attack >> surface in the kernel (all the validity checking code needs to be >> robust against invalid structures, too!), a lot more complexity >> (more bugs!) and a lot of additional runtime overhead (slow >> filesystem = unhappy users!). It's just not a practical solution to >> the problem. This critiqute as I read it confuses data integrity and safety from privilege escalation. If a filesystem image or it's backing store are malicious there is no need to be concerned about data integrity. > And ideally, you'd want to also guard against an evil disk that > suddenly changes its contents after you've run fsck on it, and you > can't easily do that without making things complicated. I agree an evil disk definitely should be part of any threat analysis. I also agree that anything that is complicated is not a practical as complicated means lots of code, and bugs are in general a function of the amount of code. At the same time my only concern when analyzing something for safety against a malicious filesystem is can the malicious data cause the kernel to misbehave. This includes things like stack overflows, and memory corruption. In the specific case of a quota file, if there is a quota file that has little to no resemblence to reality but the kernel doesn't misbehave, I don't think that is a problem from an unprivileged mount perspective. On the flip side if a malicious quota file allows an in kernel quota variable to go negative and that causes the kernel to misbehave that is a show stopper for allowing an unprivileged mount. Personally I see the quantity of code in current filesystems as making it hard to have a low enough probability of problems to allow unprivileged mounts. Changes for all of the weird cases a backing store for unprivileged mounts brings with it have been added to the VFS because that is the right place to implement them. Working at a filesystem independent level allows all of the right people involved in the review, and it allows clean and general solutions to the weird cases that come up with a uid or gid does not map into the kernel. All of that said the only filesystem with backing store that I see as a reasonable target to support right now is fuse. As fuse was designed from the get go to support filesystems from unprivileged users. Will fuse someday support quotas? I don't know. But the there is no extra cost to support that case in fs/quota/quota.c so I have added the necessary code. Eric ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH review 09/12] quota: Handle quota data stored in s_user_ns in quota_setxquota 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman ` (5 preceding siblings ...) 2016-07-06 18:12 ` [PATCH review 08/12] quota: Ensure qids map to the filesystem Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 12/12] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman 7 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk In Q_XSETQLIMIT use sb->s_user_ns to detect when we are dealing with the filesystems notion of id 0. Cc: Jan Kara <jack@suse.cz> Acked-by: Seth Forshee <seth.forshee@canonical.com> Inspired-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/quota/quota.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 73f6f4cf0a21..35df08ee9c97 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -584,7 +584,7 @@ static int quota_setxquota(struct super_block *sb, int type, qid_t id, if (!qid_has_mapping(sb->s_user_ns, qid)) return -EINVAL; /* Are we actually setting timer / warning limits for all users? */ - if (from_kqid(&init_user_ns, qid) == 0 && + if (from_kqid(sb->s_user_ns, qid) == 0 && fdq.d_fieldmask & (FS_DQ_WARNS_MASK | FS_DQ_TIMER_MASK)) { struct qc_info qinfo; int ret; -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH review 12/12] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman ` (6 preceding siblings ...) 2016-07-06 18:12 ` [PATCH review 09/12] quota: Handle quota data stored in s_user_ns in quota_setxquota Eric W. Biederman @ 2016-07-06 18:12 ` Eric W. Biederman 7 siblings, 0 replies; 58+ messages in thread From: Eric W. Biederman @ 2016-07-06 18:12 UTC (permalink / raw) To: Linux Containers Cc: Seth Forshee, linux-fsdevel, Linux API, James Bottomley, Djalal Harouni, Serge E. Hallyn, Andy Lutomirski, Jan Kara, Jann Horn, Michael Kerrisk From: Seth Forshee <seth.forshee@canonical.com> For filesystems mounted from a user namespace on-disk ids should be translated relative to s_users_ns rather than init_user_ns. When an id in the filesystem doesn't exist in s_user_ns the associated id in the inode will be set to INVALID_[UG]ID, which turns these into de facto "nobody" ids. This actually maps pretty well into the way most code already works, and those places where it didn't were fixed in previous patches. Moving forward vfs code needs to be careful to handle instances where ids in inodes may be invalid. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- include/linux/fs.h | 50 +++++++++++++++++++++++++------------------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index cb25ceb6d1ef..8aa9b72e0bc5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -831,31 +831,6 @@ static inline void i_size_write(struct inode *inode, loff_t i_size) #endif } -/* Helper functions so that in most cases filesystems will - * not need to deal directly with kuid_t and kgid_t and can - * instead deal with the raw numeric values that are stored - * in the filesystem. - */ -static inline uid_t i_uid_read(const struct inode *inode) -{ - return from_kuid(&init_user_ns, inode->i_uid); -} - -static inline gid_t i_gid_read(const struct inode *inode) -{ - return from_kgid(&init_user_ns, inode->i_gid); -} - -static inline void i_uid_write(struct inode *inode, uid_t uid) -{ - inode->i_uid = make_kuid(&init_user_ns, uid); -} - -static inline void i_gid_write(struct inode *inode, gid_t gid) -{ - inode->i_gid = make_kgid(&init_user_ns, gid); -} - static inline unsigned iminor(const struct inode *inode) { return MINOR(inode->i_rdev); @@ -1461,6 +1436,31 @@ struct super_block { struct list_head s_inodes; /* all inodes */ }; +/* Helper functions so that in most cases filesystems will + * not need to deal directly with kuid_t and kgid_t and can + * instead deal with the raw numeric values that are stored + * in the filesystem. + */ +static inline uid_t i_uid_read(const struct inode *inode) +{ + return from_kuid(inode->i_sb->s_user_ns, inode->i_uid); +} + +static inline gid_t i_gid_read(const struct inode *inode) +{ + return from_kgid(inode->i_sb->s_user_ns, inode->i_gid); +} + +static inline void i_uid_write(struct inode *inode, uid_t uid) +{ + inode->i_uid = make_kuid(inode->i_sb->s_user_ns, uid); +} + +static inline void i_gid_write(struct inode *inode, gid_t gid) +{ + inode->i_gid = make_kgid(inode->i_sb->s_user_ns, gid); +} + extern struct timespec current_fs_time(struct super_block *sb); /* -- 2.8.3 ^ permalink raw reply related [flat|nested] 58+ messages in thread
end of thread, other threads:[~2016-07-14 17:03 UTC | newest] Thread overview: 58+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-07-02 17:18 [PATCH review 0/11] General unprivileged mount support Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 03/11] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 05/11] cred: Reject inodes with invalid ids in set_create_file_as() Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 07/11] vfs: Don't create inodes with a uid or gid unknown to the vfs Eric W. Biederman [not found] ` <20160702172035.19568-7-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-04 7:59 ` Jan Kara [not found] ` <20160704075919.GA5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> 2016-07-05 14:55 ` Eric W. Biederman [not found] ` <87zipwxhgp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-06 9:07 ` Jan Kara [not found] ` <20160706090705.GE14067-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> 2016-07-06 15:37 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 08/11] quota: Ensure qids map to the filesystem Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 10/11] evm: Translate user/group ids relative to s_user_ns when computing HMAC Eric W. Biederman [not found] ` <20160702172035.19568-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-02 17:20 ` [PATCH review 02/11] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 04/11] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 06/11] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 09/11] quota: Handle quota data stored in s_user_ns Eric W. Biederman [not found] ` <20160702172035.19568-9-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-02 17:33 ` [PATCH v2 " Eric W. Biederman [not found] ` <87mvm03pxy.fsf_-_-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-04 9:11 ` Jan Kara 2016-07-05 14:48 ` Seth Forshee 2016-07-05 15:34 ` Eric W. Biederman [not found] ` <87d1msumhy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-05 20:57 ` Dave Chinner 2016-07-05 21:28 ` Eric W. Biederman [not found] ` <8737nnrcyy.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-06 6:35 ` Dave Chinner 2016-07-06 8:25 ` Jan Kara 2016-07-06 17:51 ` Eric W. Biederman 2016-07-02 17:20 ` [PATCH review 11/11] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman 2016-07-04 8:52 ` [PATCH review 0/11] General unprivileged mount support Jan Kara [not found] ` <20160704085220.GC5200-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> 2016-07-04 16:27 ` Eric W. Biederman [not found] ` <87h9c52wsd.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-06 8:54 ` Jan Kara 2016-07-06 13:54 ` Seth Forshee 2016-07-06 14:22 ` Jan Kara 2016-07-06 14:46 ` Seth Forshee 2016-07-06 15:01 ` Eric W. Biederman 2016-07-06 15:23 ` James Bottomley [not found] ` <1467818630.2369.21.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> 2016-07-06 16:35 ` Eric W. Biederman [not found] ` <87ziq03qnj.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-06 13:44 ` Andy Lutomirski [not found] ` <CALCETrVof174gPCZnD2Z-RMjR-P=NcA0mYCU9ki6=o9hpFL-BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2016-07-06 15:21 ` Eric W. Biederman 2016-07-06 14:01 ` Andy Lutomirski 2016-07-06 15:19 ` Eric W. Biederman 2016-07-06 18:10 ` [PATCH review 0/12] General unprivileged mount support v2 Eric W. Biederman [not found] ` <874m82bptc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-06 18:12 ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 02/12] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 03/12] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 04/12] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 05/12] cred: Reject inodes with invalid ids in set_create_file_as() Eric W. Biederman [not found] ` <20160706181212.16267-1-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-06 18:12 ` [PATCH review 06/12] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 07/12] vfs: Don't create " Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 10/12] dquot: For now explicitly don't support filesystems outside of init_user_ns Eric W. Biederman 2016-07-11 10:09 ` Jan Kara 2016-07-06 18:12 ` [PATCH review 11/12] evm: Translate user/group ids relative to s_user_ns when computing HMAC Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 08/12] quota: Ensure qids map to the filesystem Eric W. Biederman [not found] ` <20160706181212.16267-8-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> 2016-07-11 10:14 ` Jan Kara [not found] ` <20160711101424.GH12410-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org> 2016-07-11 18:12 ` Eric W. Biederman [not found] ` <878tx8dowu.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-07-13 1:34 ` Dave Chinner 2016-07-13 3:45 ` Dave Chinner 2016-07-13 5:43 ` Jann Horn 2016-07-14 17:03 ` Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 09/12] quota: Handle quota data stored in s_user_ns in quota_setxquota Eric W. Biederman 2016-07-06 18:12 ` [PATCH review 12/12] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).