From: "Serge E. Hallyn" <serge@hallyn.com>
To: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Jann Horn <jann@thejh.net>,
"Serge E. Hallyn" <serge.hallyn@ubuntu.com>,
Linux API <linux-api@vger.kernel.org>,
Linux Containers <containers@lists.linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Andy Lutomirski <luto@amacapital.net>,
"Andrew G. Morgan" <morgan@kernel.org>
Subject: Re: [PATCH 1/1] simplified security.nscapability xattr
Date: Tue, 26 Apr 2016 23:39:24 -0500 [thread overview]
Message-ID: <20160427043923.GA25140@mail.hallyn.com> (raw)
In-Reply-To: <CAGXu5jJbmSKst_RiM84-7OaX=2XettzpTh34uFFoevvoPRO76Q@mail.gmail.com>
Quoting Kees Cook (keescook@chromium.org):
> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > Quoting Kees Cook (keescook@chromium.org):
> >> On Fri, Apr 22, 2016 at 10:26 AM, <serge.hallyn@ubuntu.com> wrote:
> >> > From: Serge Hallyn <serge.hallyn@ubuntu.com>
> >> >
> >> > This can only be set by root in his own namespace, and will
> >> > only be respected by namespaces with that same root kuid
> >> > mapped as root, or namespaces descended from it.
> >> >
> >> > This allows a simple setxattr to work, allows tar/untar to
> >> > work, and allows us to tar in one namespace and untar in
> >> > another while preserving the capability, without risking
> >> > leaking privilege into a parent namespace.
> >>
> >> The concept seems sensible to me. Various notes below...
> >>
> >> >
> >> > Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
> >> > ---
> >> > include/linux/capability.h | 5 ++-
> >> > include/uapi/linux/capability.h | 18 ++++++++
> >> > include/uapi/linux/xattr.h | 3 ++
> >> > security/commoncap.c | 91 +++++++++++++++++++++++++++++++++++++--
> >> > 4 files changed, 112 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/include/linux/capability.h b/include/linux/capability.h
> >> > index 00690ff..cf533ff 100644
> >> > --- a/include/linux/capability.h
> >> > +++ b/include/linux/capability.h
> >> > @@ -13,7 +13,7 @@
> >> > #define _LINUX_CAPABILITY_H
> >> >
> >> > #include <uapi/linux/capability.h>
> >> > -
> >> > +#include <linux/uidgid.h>
> >> >
> >> > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
> >> > #define _KERNEL_CAPABILITY_U32S _LINUX_CAPABILITY_U32S_3
> >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data {
> >> > kernel_cap_t inheritable;
> >> > };
> >> >
> >> > +#define NS_CAPS_VERSION(x) (x & 0xFF)
> >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF)
> >> > +
> >> > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct))
> >> > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t))
> >> >
> >> > diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> >> > index 12c37a1..f0b4a66 100644
> >> > --- a/include/uapi/linux/capability.h
> >> > +++ b/include/uapi/linux/capability.h
> >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct {
> >> > #define VFS_CAP_U32_2 2
> >> > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2))
> >> >
> >> > +/* version number for security.nscapability xattrs hdr->hdr_info */
> >> > +#define VFS_NS_CAP_REVISION 1
> >> > +
> >> > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2
> >> > #define VFS_CAP_U32 VFS_CAP_U32_2
> >> > #define VFS_CAP_REVISION VFS_CAP_REVISION_2
> >> > @@ -74,6 +77,21 @@ struct vfs_cap_data {
> >> > } data[VFS_CAP_U32];
> >> > };
> >> >
> >> > +#define VFS_NS_CAP_EFFECTIVE 0x1
> >> > +/*
> >> > + * 32-bit hdr_info contains
> >> > + * 16 leftmost: reserved
> >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far)
> >> > + * last 8: version
> >> > + */
> >> > +struct vfs_ns_cap_data {
> >> > + __le32 magic_etc;
> >> > + struct {
> >> > + __le32 permitted; /* Little endian */
> >> > + __le32 inheritable; /* Little endian */
> >> > + } data[VFS_CAP_U32];
> >> > +};
> >>
> >> This is identical to vfs_cap_data. Is there a reason not to re-use the
> >> existing one?
> >
> > Hm. I used to have them completely different. ATM the only difference
> > is what goes into the magic_etc, and that not very (different). So
> > yeah it probably should be re-used.
> >
> >> > +
> >> > #ifndef __KERNEL__
> >> >
> >> > /*
> >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
> >> > index 1590c49..67c80ab 100644
> >> > --- a/include/uapi/linux/xattr.h
> >> > +++ b/include/uapi/linux/xattr.h
> >> > @@ -68,6 +68,9 @@
> >> > #define XATTR_CAPS_SUFFIX "capability"
> >> > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX
> >> >
> >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability"
> >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX
> >> > +
> >> > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access"
> >> > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX XATTR_POSIX_ACL_ACCESS
> >> > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default"
> >>
> >> Are these documented anywhere in Documentation/ or in man pages? This
> >> seems like it'd need a man-page update too.
> >
> > Yeah, if we decide we're ok with this strategy I'll update those.
> >
> >> > diff --git a/security/commoncap.c b/security/commoncap.c
> >> > index 48071ed..8f3f34a 100644
> >> > --- a/security/commoncap.c
> >> > +++ b/security/commoncap.c
> >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry)
> >> > if (!inode->i_op->getxattr)
> >> > return 0;
> >> >
> >> > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 0);
> >> > + if (error > 0)
> >> > + return 1;
> >> > +
> >> > error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0);
> >> > if (error <= 0)
> >> > return 0;
> >>
> >> I think this might be more readable if the getxattr calls were
> >> standardized (one returns 1, the other returns 0, with inverted tests
> >> -- hard to read). IIUC, if either returns > 0, return 1, otherwise
> >> return 0.
> >
> > Hm. Yeah I can see where that would be confusing. I can change the flow
> > to make the checks the same.
> >
> >> > @@ -330,11 +334,17 @@ int cap_inode_need_killpriv(struct dentry *dentry)
> >> > int cap_inode_killpriv(struct dentry *dentry)
> >> > {
> >> > struct inode *inode = d_backing_inode(dentry);
> >> > + int ret1, ret2;
> >> >
> >> > if (!inode->i_op->removexattr)
> >> > return 0;
> >> >
> >> > - return inode->i_op->removexattr(dentry, XATTR_NAME_CAPS);
> >> > + ret1 = inode->i_op->removexattr(dentry, XATTR_NAME_CAPS);
> >> > + ret2 = inode->i_op->removexattr(dentry, XATTR_NAME_NS_CAPS);
> >> > +
> >> > + if (ret1 != 0)
> >> > + return ret1;
> >> > + return ret2;
> >> > }
> >> >
> >> > /*
> >> > @@ -438,6 +448,65 @@ int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data
> >> > return 0;
> >> > }
> >> >
> >> > +int get_vfs_ns_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps)
> >> > +{
> >> > + struct inode *inode = d_backing_inode(dentry);
> >> > + unsigned i;
> >> > + u32 magic_etc;
> >> > + ssize_t size;
> >> > + struct vfs_ns_cap_data nscap;
> >> > + bool foundroot = false;
> >> > + struct user_namespace *ns;
> >> > +
> >> > + memset(cpu_caps, 0, sizeof(struct cpu_vfs_cap_data));
> >> > +
> >> > + if (!inode || !inode->i_op->getxattr)
> >> > + return -ENODATA;
> >> > +
> >> > + /* verify that current or ancestor userns root owns this file */
> >> > + for (ns = current_user_ns(); ; ns = ns->parent) {
> >> > + if (from_kuid(ns, dentry->d_inode->i_uid) == 0) {
> >> > + foundroot = true;
> >> > + break;
> >> > + }
> >> > + if (ns == &init_user_ns)
> >> > + break;
> >> > + }
> >> > + if (!foundroot)
> >> > + return -ENODATA;
> >> > +
> >> > + size = inode->i_op->getxattr((struct dentry *)dentry, XATTR_NAME_NS_CAPS,
> >> > + &nscap, sizeof(nscap));
> >> > + if (size == -ENODATA || size == -EOPNOTSUPP)
> >> > + /* no data, that's ok */
> >> > + return -ENODATA;
> >> > + if (size < 0)
> >> > + return size;
> >> > + if (size != sizeof(nscap))
> >> > + return -EINVAL;
> >> > +
> >> > + magic_etc = le32_to_cpu(nscap.magic_etc);
> >> > +
> >> > + if (NS_CAPS_VERSION(magic_etc) != VFS_NS_CAP_REVISION)
> >> > + return -EINVAL;
> >> > +
> >> > + cpu_caps->magic_etc = VFS_CAP_REVISION_2;
> >> > + if (NS_CAPS_FLAGS(magic_etc) & VFS_NS_CAP_EFFECTIVE)
> >> > + cpu_caps->magic_etc |= VFS_CAP_FLAGS_EFFECTIVE;
> >> > + /* copy the entry */
> >> > + CAP_FOR_EACH_U32(i) {
> >> > + if (i >= VFS_CAP_U32_2)
> >> > + break;
> >> > + cpu_caps->permitted.cap[i] = le32_to_cpu(nscap.data[i].permitted);
> >> > + cpu_caps->inheritable.cap[i] = le32_to_cpu(nscap.data[i].inheritable);
> >> > + }
> >> > +
> >> > + cpu_caps->permitted.cap[CAP_LAST_U32] &= CAP_LAST_U32_VALID_MASK;
> >> > + cpu_caps->inheritable.cap[CAP_LAST_U32] &= CAP_LAST_U32_VALID_MASK;
> >> > +
> >> > + return 0;
> >> > +}
> >> > +
> >> > /*
> >> > * Attempt to get the on-exec apply capability sets for an executable file from
> >> > * its xattrs and, if present, apply them to the proposed credentials being
> >> > @@ -456,11 +525,13 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_c
> >> > if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> >> > return 0;
> >> >
> >> > - rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
> >> > + rc = get_vfs_ns_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
> >> > + if (rc == -ENODATA)
> >> > + rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
> >>
> >> So nscaps overrides a "regular" file cap? That might seem worth
> >> mentioning in the change log or somewhere.
> >
> > If it applies, yes. Now maybe that's not what we want. Maybe so long as
> > a regular one exists (which must have been set by the init_user_ns admin)
> > we should use it? I think that makes sense, what do you think?
>
> I struggle to see why the non-ns fscap should be honored... this is
Mainly because that's how it is right now - if there is a security.capability
then it is honored in all namespaces. Changing that *could* change the
behavior which some poor schlep is depending on. And in general we don't
do that.
> effectively fscap stacking, and there's no way that I see for the cap
> to leak "up" to init_user_ns. It only leaks up to nested user_nses...
> but they need to already be priv-equiv. Hmmm.
>
> I'm CC'ing Jann who like these mind-benders... :)
>
> >
> > (In either case agreed it should be clearly documented)
> >
> >> > if (rc < 0) {
> >> > if (rc == -EINVAL)
> >> > - printk(KERN_NOTICE "%s: get_vfs_caps_from_disk returned %d for %s\n",
> >> > - __func__, rc, bprm->filename);
> >> > + printk(KERN_NOTICE "Invalid argument reading file caps for %s\n",
> >> > + bprm->filename);
> >> > else if (rc == -ENODATA)
> >> > rc = 0;
> >> > goto out;
> >> > @@ -662,6 +733,12 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
> >> > return 0;
> >> > }
> >> >
> >> > + if (!strcmp(name, XATTR_NAME_NS_CAPS)) {
> >> > + if (!capable_wrt_inode_uidgid(dentry->d_inode, CAP_SETFCAP))
> >> > + return -EPERM;
> >> > + return 0;
> >> > + }
> >> > +
> >> > if (!strncmp(name, XATTR_SECURITY_PREFIX,
> >> > sizeof(XATTR_SECURITY_PREFIX) - 1) &&
> >> > !capable(CAP_SYS_ADMIN))
> >> > @@ -688,6 +765,12 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
> >> > return 0;
> >> > }
> >> >
> >> > + if (!strcmp(name, XATTR_NAME_NS_CAPS)) {
> >> > + if (!capable_wrt_inode_uidgid(dentry->d_inode, CAP_SETFCAP))
> >> > + return -EPERM;
> >> > + return 0;
> >> > + }
> >> > +
> >>
> >> This looks like userspace must knowingly be aware that it is in a
> >> namespace and to DTRT instead of it being translated by the kernel
> >> when setxattr is called under !init_user_ns?
> >
> > Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If that
> > shows you are in init_user_ns then it uses security.capability, otherwise
> > it uses security.nscapability.
> >
> > I've occasionally considered having the xattr code do the quiet
> > substitution if need be.
> >
> > In fact, much of this structure comes from when I was still trying to
> > do multiple values per xattr. Given what we're doing here, we could
> > keep the xattr contents exactly the same, just changing the name.
> > So userspace could just get and set security.capability; if you are
> > in a non-init user_ns, if security.capability is set then you cannot
> > set it; if security.capability is not set, then the kernel writes
> > security.nscapability instead and returns success.
> >
> > I don't like magic, but this might be just straightforward enough
> > to not be offensive. Thoughts?
>
> Yeah, I think it might be better to have the magic in this case, since
> it seems weird to just reject setxattr if a tool didn't realize it was
> in a namespace. I'm not sure -- it is also nice to have an explicit
> API here.
>
> I would defer to Eric or Michael on that. I keep going back and forth,
> though I suspect it's probably best to do what you already have
> (explicit API).
>
> -Kees
>
> >
> >> > if (!strncmp(name, XATTR_SECURITY_PREFIX,
> >> > sizeof(XATTR_SECURITY_PREFIX) - 1) &&
> >> > !capable(CAP_SYS_ADMIN))
> >> > --
> >> > 1.7.9.5
> >> >
> >>
> >>
> >>
> >> --
> >> Kees Cook
> >> Chrome OS & Brillo Security
> >> _______________________________________________
> >> Containers mailing list
> >> Containers@lists.linux-foundation.org
> >> https://lists.linuxfoundation.org/mailman/listinfo/containers
>
>
>
> --
> Kees Cook
> Chrome OS & Brillo Security
next prev parent reply other threads:[~2016-04-27 4:39 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-22 17:26 namespaced file capabilities serge.hallyn
2016-04-22 17:26 ` [PATCH 1/1] simplified security.nscapability xattr serge.hallyn
2016-04-26 19:46 ` Seth Forshee
2016-04-26 21:59 ` Kees Cook
2016-04-26 22:26 ` Serge E. Hallyn
2016-04-26 22:39 ` Kees Cook
2016-04-27 4:39 ` Serge E. Hallyn [this message]
2016-04-27 8:09 ` Jann Horn
2016-05-02 3:54 ` Serge E. Hallyn
2016-05-02 18:31 ` Michael Kerrisk (man-pages)
2016-05-02 21:31 ` Eric W. Biederman
[not found] ` <CALQRfL7mfpyudWs4Z8W5Zi8CTG-9O0OvrCnRU7pk0MXtsLBd0A@mail.gmail.com>
2016-05-03 4:50 ` Eric W. Biederman
2016-05-10 19:00 ` Serge E. Hallyn
2016-05-03 5:19 ` Serge E. Hallyn
2016-05-03 5:54 ` Eric W. Biederman
2016-05-03 14:25 ` Serge E. Hallyn
2016-05-10 19:03 ` Serge E. Hallyn
2016-05-07 23:10 ` Jann Horn
2016-05-11 21:02 ` Serge E. Hallyn
2016-05-16 21:15 ` Serge E. Hallyn
2016-05-16 21:48 ` Serge E. Hallyn
2016-05-18 21:57 ` [PATCH RFC] user-namespaced file capabilities - now with more magic Serge E. Hallyn
2016-05-19 20:53 ` Mimi Zohar
2016-05-20 3:40 ` Serge E. Hallyn
2016-05-20 11:19 ` Mimi Zohar
2016-05-20 18:28 ` Eric W. Biederman
2016-05-20 19:09 ` Mimi Zohar
2016-05-20 19:11 ` Eric W. Biederman
2016-05-20 19:26 ` Serge E. Hallyn
2016-05-20 19:42 ` Eric W. Biederman
2016-05-20 19:59 ` Serge E. Hallyn
2016-05-20 23:23 ` Mimi Zohar
2016-05-20 23:32 ` Serge E. Hallyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160427043923.GA25140@mail.hallyn.com \
--to=serge@hallyn.com \
--cc=containers@lists.linux-foundation.org \
--cc=ebiederm@xmission.com \
--cc=jann@thejh.net \
--cc=keescook@chromium.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=morgan@kernel.org \
--cc=mtk.manpages@gmail.com \
--cc=serge.hallyn@ubuntu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox