linux-api.vger.kernel.org archive mirror
* [manpages PATCH] capabilities.7: describe namespaced file capabilities
@ 2018-01-09 18:52 Serge E. Hallyn
       [not found] ` <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  2018-01-16 17:26 ` Jann Horn
  0 siblings, 2 replies; 6+ messages in thread
From: Serge E. Hallyn @ 2018-01-09 18:52 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Eric W. Biederman
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, Seth Forshee,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Kees Cook,
	Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

Update the capabilities(7)  manpage with a description of the
new-ish namespaced file capability support.

A note on userspace tools:  since the kernel will automatically
convert between v2 and v3 xattrs, and translate nsroot between
v3 xattrs, we can make do with the current getcap(8) and setcap(8)
tools. I.e. a user on the host can create a transient user namespace
with the appropriate mappings and run setcap(8) there.  The kernel
will automatically write a v3 xattr with the transient namespace's
root user as nsroot.

Signed-off-by: Serge Hallyn <shallyn-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
---
 man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/man7/capabilities.7 b/man7/capabilities.7
index 166eaaf..76e7e02 100644
--- a/man7/capabilities.7
+++ b/man7/capabilities.7
@@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
 then the effective flag must also be specified as enabled
 for all other capabilities for which the corresponding permitted or
 inheritable flags is enabled.
+.PP
+Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported.  These store only
+the capabilities to be applied to the file, with no record of the writer's
+credentials.  Therefore only privileged users can be trusted to write them, and
+.BR CAP_SETFCAP
+over the user namespace which mounted the filesystem (usually the initial user
+namespace) is required.  This makes it impossible to write file capabilities
+from a user namespaced container, which causes some package updates to fail.
+.PP
+In order to support setting file capabilities in containers, the
+kernel must be able to identify whether the task executing the
+file will be constrained to a subset of the resources over which
+the writer of the file capabilities has privilege.  To this end,
+since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
+of the root user in the writer's namespace ("nsroot").  Hence the writer only
+requires
+.IP 1.
+.BR CAP_SETFCAP
+over the file inode, meaning the writing task must have
+.BR CAP_SETFCAP
+over a user namespace into which the inode's owning user ID is mapped.
+.PP
+and
+.IP 2.
+.BR CAP_SETFCAP
+over the writer's own user namespace.
+.PP
+A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
+whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
+.PP
+Users with the required privilege may use
+.BR setxattr(2)
+to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
+The kernel will automatically convert a VFS_CAP_REVISION_2 to a
+VFS_CAP_REVISION_3 extended attribute with the "nsroot"
+set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
+extended attribute is specified, then the kernel will map the
+specified root user ID (which must be a valid user ID mapped in the caller's
+user namespace) into the initial user namespace.  Likewise,
+.BR getxattr(2)
+results will be converted and simplified to show a VFS_CAP_REVISION_2
+extended attribute, if a VFS_CAP_REVISION_3 applies to the caller's
+namespace, or to map the VFS_CAP_REVISION_3 root user ID into the
+caller's namespace.
 .\"
 .SS Transformation of capabilities during execve()
 .PP
-- 
1.9.1


* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
       [not found] ` <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2018-01-14  9:40   ` Michael Kerrisk (man-pages)
  2018-01-15  4:31     ` Serge E. Hallyn
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2018-01-14  9:40 UTC (permalink / raw)
  To: Serge E. Hallyn, Eric W. Biederman
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-man-u79uwXL29TY76Z2rM5mHXA, Seth Forshee,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Kees Cook,
	Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

Hello Serge,

On 01/09/2018 07:52 PM, Serge E. Hallyn wrote:
> Update the capabilities(7)  manpage with a description of the
> new-ish namespaced file capability support.

Thanks for this patch. I'm trying to craft a modified version
based on your text, so no need to send a new version at this
stage, but I do have some questions below.

> A note on userspace tools:  since the kernel will automatically
> convert between v2 and v3 xattrs, and translate nsroot between
> v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> tools. I.e. a user on the host can create a transient user namespace
> with the appropriate mappings and run setcap(8) there.  The kernel
> will automatically write a v3 xattr with the transient namespace's
> root user as nsroot.
>
> Signed-off-by: Serge Hallyn <shallyn-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> ---
>  man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/man7/capabilities.7 b/man7/capabilities.7
> index 166eaaf..76e7e02 100644
> --- a/man7/capabilities.7
> +++ b/man7/capabilities.7
> @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
>  then the effective flag must also be specified as enabled
>  for all other capabilities for which the corresponding permitted or
>  inheritable flags is enabled.
> +.PP
> +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported.  These store only
> +the capabilities to be applied to the file, with no record of the writer's
> +credentials.  Therefore only privileged users can be trusted to write them, and
> +.BR CAP_SETFCAP
> +over the user namespace which mounted the filesystem (usually the initial user
> +namespace) is required.  This makes it impossible to write file capabilities
> +from a user namespaced container, which causes some package updates to fail.
> +.PP
> +In order to support setting file capabilities in containers, the
> +kernel must be able to identify whether the task executing the
> +file will be constrained to a subset of the resources over which
> +the writer of the file capabilities has privilege.  To this end,
> +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> +of the root user in the writer's namespace ("nsroot").

Here, "nsroot" means the UID 0 in the namespace as it would be mapped
into the initial userns, right?

> Hence the writer only
> +requires
> +.IP 1.
> +.BR CAP_SETFCAP
> +over the file inode, meaning the writing task must have
> +.BR CAP_SETFCAP
> +over a user namespace into which the inode's owning user ID is mapped.

I don't understand the above line. Could you explain with an example?

Cheers,

Michael

> +.PP
> +and
> +.IP 2.
> +.BR CAP_SETFCAP
> +over the writer's own user namespace.
> +.PP
> +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
> +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
> +.PP
> +Users with the required privilege may use
> +.BR setxattr(2)
> +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
> +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
> +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
> +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
> +extended attribute is specified, then the kernel will map the
> +specified root user ID (which must be a valid user ID mapped in the caller's
> +user namespace) into the initial user namespace.  Likewise,
> +.BR getxattr(2)
> +results will be converted and simplified to show a VFS_CAP_REVISION_2
> +extended attribute, if a VFS_CAP_REVISION_3 applies to the caller's
> +namespace, or to map the VFS_CAP_REVISION_3 root user ID into the
> +caller's namespace.


>  .\"
>  .SS Transformation of capabilities during execve()
>  .PP
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
  2018-01-14  9:40   ` Michael Kerrisk (man-pages)
@ 2018-01-15  4:31     ` Serge E. Hallyn
  0 siblings, 0 replies; 6+ messages in thread
From: Serge E. Hallyn @ 2018-01-15  4:31 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Serge E. Hallyn, Eric W. Biederman, linux-man, Seth Forshee,
	linux-api, linux-security-module, Kees Cook, Andreas Gruenbacher,
	Andy Lutomirski, Andrew G. Morgan

Quoting Michael Kerrisk (man-pages) (mtk.manpages@gmail.com):
> Hello Serge,
> 
> On 01/09/2018 07:52 PM, Serge E. Hallyn wrote:
> > Update the capabilities(7)  manpage with a description of the
> > new-ish namespaced file capability support.
> 
> Thanks for this patch. I'm trying to craft a modified version
> based on your text, so no need to send a new version at this
> stage, but I do have some questions below.

Awesome, thanks.

> > A note on userspace tools:  since the kernel will automatically
> > convert between v2 and v3 xattrs, and translate nsroot between
> > v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> > tools. I.e. a user on the host can create a transient user namespace
> > with the appropriate mappings and run setcap(8) there.  The kernel
> > will automatically write a v3 xattr with the transient namespace's
> > root user as nsroot.
> >
> > Signed-off-by: Serge Hallyn <shallyn@cisco.com>
> > ---
> >  man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> > 
> > diff --git a/man7/capabilities.7 b/man7/capabilities.7
> > index 166eaaf..76e7e02 100644
> > --- a/man7/capabilities.7
> > +++ b/man7/capabilities.7
> > @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
> >  then the effective flag must also be specified as enabled
> >  for all other capabilities for which the corresponding permitted or
> >  inheritable flags is enabled.
> > +.PP
> > +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported.  These store only
> > +the capabilities to be applied to the file, with no record of the writer's
> > +credentials.  Therefore only privileged users can be trusted to write them, and
> > +.BR CAP_SETFCAP
> > +over the user namespace which mounted the filesystem (usually the initial user
> > +namespace) is required.  This makes it impossible to write file capabilities
> > +from a user namespaced container, which causes some package updates to fail.
> > +.PP
> > +In order to support setting file capabilities in containers, the
> > +kernel must be able to identify whether the task executing the
> > +file will be constrained to a subset of the resources over which
> > +the writer of the file capabilities has privilege.  To this end,
> > +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> > +of the root user in the writer's namespace ("nsroot").
> 
> Here, "nsroot" means the UID 0 in the namespace as it would be mapped
> into the initial userns, right?

Right.  If we can come up with a better name that would be great.

> > Hence the writer only
> > +requires
> > +.IP 1.
> > +.BR CAP_SETFCAP
> > +over the file inode, meaning the writing task must have
> > +.BR CAP_SETFCAP
> > +over a user namespace into which the inode's owning user ID is mapped.
> 
> I don't understand the above line. Could you explain with an example?

If the file is owned by uid 1000, then uid 1000 can create a new user
ns in which 1000 is mapped to 0.  In this namespace, the new task has
CAP_SETFCAP over the user ns, and 1000 is mapped into the userns (as
0), so the write is allowed.

In the above example, if the xattr being written was v2, then the
actual written xattr will be v3 with nsroot=1000.

If the xattr was v3, with nsroot=0, then nsroot=1000 will be written.

If the xattr was v3, with nsroot=500, where 500 is not mapped from
the userns, then the write will be forbidden.
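
The three cases above can be sketched in a few lines of Python.  This is
only an illustration of the translation rule as described in this mail,
not kernel code: the names Extent, to_parent and stored_nsroot are made
up for the example, and the parent of the new user ns is assumed to be
the initial user namespace, so that "parent uid" and "host uid" coincide.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    """One line of a uid_map: `inside` ids start at `outside` in the parent."""
    inside: int
    outside: int
    count: int


def to_parent(uid_map: List[Extent], uid: int) -> Optional[int]:
    """Translate a uid seen inside the namespace to the parent's view.

    Returns None when the uid is not mapped.
    """
    for ext in uid_map:
        if ext.inside <= uid < ext.inside + ext.count:
            return ext.outside + (uid - ext.inside)
    return None


def stored_nsroot(uid_map: List[Extent], version: int,
                  requested_nsroot: int = 0) -> Optional[int]:
    """Return the nsroot that would end up in the written v3 xattr.

    A v2 request is converted to v3 with nsroot set to the writer's
    namespace root.  A v3 request has its nsroot translated, and the
    write is refused (None) if that nsroot is not mapped.
    """
    if version == 2:
        return to_parent(uid_map, 0)
    return to_parent(uid_map, requested_nsroot)


# uid 1000 creates a user ns in which host uid 1000 is mapped to 0.
ns = [Extent(inside=0, outside=1000, count=1)]

print(stored_nsroot(ns, version=2))                        # v2 request
print(stored_nsroot(ns, version=3, requested_nsroot=0))    # v3, nsroot=0
print(stored_nsroot(ns, version=3, requested_nsroot=500))  # v3, unmapped
```

The first two calls yield 1000 and the third yields None, matching the
v2, v3 nsroot=0, and v3 nsroot=500 cases above.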

As another allowed case, if I'm uid 1000 and setting up a container
where 100005 is mapped to uid 5;  I create a userns where hostuids
100000-165535 map to namespace uids 0-65535, then as root in the
namespace I have CAP_SETFCAP over the namespace, and 100005 is
mapped in the namespace, so I can write to the file.

As a final, nested example:  I'm uid 1000 and have uids 100000-300000
as my delegated subuids.  I create a container with that full range,
and am running as root there (100000).  Now I create a nested container
where 100000-165535 (which are really 200000-265535 on the host) will
be mapped to 0-65535.  In its rootfs I write /bin/ping with cap_net_raw=pe
and just for fun make it owned by nested uid 5.

So /bin/ping is owned by
	hostuid 200005 = c1 uid 100005 = c2 uid 5
As root in the container I have CAP_SETFCAP over a userns where c2 uid 5
is mapped, so I'm allowed to write a filecap.
If I write it as v2 xattr, then the actual written xattr will be v3 with
nsroot=100000, if I simply write it as root in c1, or nsroot=200000 if
I enter the nested container before writing it.
There are several more options, but let's just pick one - and assume that
as root in the first container (hostuid 100000) I request a v3 xattr
with nsroot=100000.  The actual written xattr will have nsroot=200000.
Now when uid 1000 in the nested container runs /bin/ping, the kernel will
see that that task's user namespace has uid 0 mapped to 200000, and so
the fscap will be honored.
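
The arithmetic in the nested example can be checked with a small Python
sketch.  It is a simplification under stated assumptions: the helper
names (Extent, to_parent, to_host, fscap_honored) are hypothetical, the
first container's range 100000-300000 is taken as inclusive, and the
"descendant of such a namespace" case is not modelled.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    """One uid_map line: `inside` ids start at `outside` in the parent."""
    inside: int
    outside: int
    count: int


def to_parent(uid_map: List[Extent], uid: int) -> Optional[int]:
    """Translate a uid from a namespace into its parent, None if unmapped."""
    for ext in uid_map:
        if ext.inside <= uid < ext.inside + ext.count:
            return ext.outside + (uid - ext.inside)
    return None


def to_host(chain: List[List[Extent]], uid: int) -> Optional[int]:
    """Walk the maps from the innermost namespace out to the host."""
    current: Optional[int] = uid
    for uid_map in chain:
        if current is None:
            return None
        current = to_parent(uid_map, current)
    return current


def fscap_honored(chain: List[List[Extent]], nsroot_on_host: int) -> bool:
    """True when uid 0 of the executing task's namespace maps to nsroot."""
    return to_host(chain, 0) == nsroot_on_host


# c1: host uids 100000-300000 (assumed inclusive) appear as 0-200000.
C1_MAP = [Extent(inside=0, outside=100000, count=200001)]
# c2: c1 uids 100000-165535 appear as 0-65535.
C2_MAP = [Extent(inside=0, outside=100000, count=65536)]

owner = to_host([C2_MAP, C1_MAP], 5)       # /bin/ping owner, c2 uid 5
written = to_host([C1_MAP], 100000)        # v3 nsroot=100000 asked in c1
print(owner, written)
print(fscap_honored([C2_MAP, C1_MAP], written))  # task runs in c2
print(fscap_honored([C1_MAP], written))          # task runs in c1
```

With these maps the owner comes out as hostuid 200005 and the written
nsroot as 200000, so a task in the nested container is honored while a
task running directly in the first container (whose uid 0 is hostuid
100000) is not.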

-serge


* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
  2018-01-09 18:52 [manpages PATCH] capabilities.7: describe namespaced file capabilities Serge E. Hallyn
       [not found] ` <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2018-01-16 17:26 ` Jann Horn
  2018-01-16 17:38   ` Serge E. Hallyn
  1 sibling, 1 reply; 6+ messages in thread
From: Jann Horn @ 2018-01-16 17:26 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Michael Kerrisk-manpages, Eric W. Biederman, linux-man,
	Seth Forshee, Linux API, linux-security-module, Kees Cook,
	Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

On Tue, Jan 9, 2018 at 7:52 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Update the capabilities(7)  manpage with a description of the
> new-ish namespaced file capability support.
>
> A note on userspace tools:  since the kernel will automatically
> convert between v2 and v3 xattrs, and translate nsroot between
> v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> tools. I.e. a user on the host can create a transient user namespace
> with the appropriate mappings and run setcap(8) there.  The kernel
> will automatically write a v3 xattr with the transient namespace's
> root user as nsroot.
>
> Signed-off-by: Serge Hallyn <shallyn@cisco.com>
> ---
>  man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
>
> diff --git a/man7/capabilities.7 b/man7/capabilities.7
> index 166eaaf..76e7e02 100644
> --- a/man7/capabilities.7
> +++ b/man7/capabilities.7
> @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
>  then the effective flag must also be specified as enabled
>  for all other capabilities for which the corresponding permitted or
>  inheritable flags is enabled.
> +.PP
> +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported.  These store only
> +the capabilities to be applied to the file, with no record of the writer's
> +credentials.  Therefore only privileged users can be trusted to write them, and
> +.BR CAP_SETFCAP
> +over the user namespace which mounted the filesystem (usually the initial user
> +namespace) is required.  This makes it impossible to write file capabilities
> +from a user namespaced container, which causes some package updates to fail.
> +.PP
> +In order to support setting file capabilities in containers, the
> +kernel must be able to identify whether the task executing the
> +file will be constrained to a subset of the resources over which
> +the writer of the file capabilities has privilege.  To this end,
> +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> +of the root user in the writer's namespace ("nsroot").  Hence the writer only
> +requires
> +.IP 1.
> +.BR CAP_SETFCAP
> +over the file inode, meaning the writing task must have
> +.BR CAP_SETFCAP
> +over a user namespace into which the inode's owning user ID is mapped.
> +.PP
> +and
> +.IP 2.
> +.BR CAP_SETFCAP
> +over the writer's own user namespace.

I think that the following would be clearer (but technically
equivalent): "Hence the writer only requires CAP_SETFCAP over the file
inode, meaning that the writing task must have CAP_SETFCAP in its own
user namespace and the UID and GID of the file inode must be mapped in
the writing task's user namespace.".

> +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
> +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
> +.PP
> +Users with the required privilege may use
> +.BR setxattr(2)
> +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
> +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
> +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
> +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
> +extended attribute is specified, then the kernel will map the
> +specified root user ID (which must be a valid user ID mapped in the caller's
> +user namespace) into the initial user namespace.

Really, "into the initial user namespace"? That may be true for the
kernel-internal representation, but the on-disk representation is the
mapping into the user namespace that contains the mount namespace into
which the file system was mounted, right? This would become observable
when a file system is mounted in a different namespace than before, or
when working with FUSE in a namespace.

> Likewise,
> +.BR getxattr(2)
> +results will be converted and simplified to show a VFS_CAP_REVISION_2
> +extended attribute, if a VFS_CAP_REVISION_3 applies to the caller's
> +namespace, or to map the VFS_CAP_REVISION_3 root user ID into the
> +caller's namespace.


* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
  2018-01-16 17:26 ` Jann Horn
@ 2018-01-16 17:38   ` Serge E. Hallyn
       [not found]     ` <20180116173803.GA15538-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Serge E. Hallyn @ 2018-01-16 17:38 UTC (permalink / raw)
  To: Jann Horn
  Cc: Serge E. Hallyn, Michael Kerrisk-manpages, Eric W. Biederman,
	linux-man, Seth Forshee, Linux API, linux-security-module,
	Kees Cook, Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

Quoting Jann Horn (jannh@google.com):
> On Tue, Jan 9, 2018 at 7:52 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > Update the capabilities(7)  manpage with a description of the
> > new-ish namespaced file capability support.
> >
> > A note on userspace tools:  since the kernel will automatically
> > convert between v2 and v3 xattrs, and translate nsroot between
> > v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> > tools. I.e. a user on the host can create a transient user namespace
> > with the appropriate mappings and run setcap(8) there.  The kernel
> > will automatically write a v3 xattr with the transient namespace's
> > root user as nsroot.
> >
> > Signed-off-by: Serge Hallyn <shallyn@cisco.com>
> > ---
> >  man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> >
> > diff --git a/man7/capabilities.7 b/man7/capabilities.7
> > index 166eaaf..76e7e02 100644
> > --- a/man7/capabilities.7
> > +++ b/man7/capabilities.7
> > @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
> >  then the effective flag must also be specified as enabled
> >  for all other capabilities for which the corresponding permitted or
> >  inheritable flags is enabled.
> > +.PP
> > +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported.  These store only
> > +the capabilities to be applied to the file, with no record of the writer's
> > +credentials.  Therefore only privileged users can be trusted to write them, and
> > +.BR CAP_SETFCAP
> > +over the user namespace which mounted the filesystem (usually the initial user
> > +namespace) is required.  This makes it impossible to write file capabilities
> > +from a user namespaced container, which causes some package updates to fail.
> > +.PP
> > +In order to support setting file capabilities in containers, the
> > +kernel must be able to identify whether the task executing the
> > +file will be constrained to a subset of the resources over which
> > +the writer of the file capabilities has privilege.  To this end,
> > +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> > +of the root user in the writer's namespace ("nsroot").  Hence the writer only
> > +requires
> > +.IP 1.
> > +.BR CAP_SETFCAP
> > +over the file inode, meaning the writing task must have
> > +.BR CAP_SETFCAP
> > +over a user namespace into which the inode's owning user ID is mapped.
> > +.PP
> > +and
> > +.IP 2.
> > +.BR CAP_SETFCAP
> > +over the writer's own user namespace.
> 
> I think that the following would be clearer (but technically
> equivalent): "Hence the writer only requires CAP_SETFCAP over the file
> inode, meaning that the writing task must have CAP_SETFCAP in its own
> user namespace and the UID and GID of the file inode must be mapped in
> the writing task's user namespace.".

Looks good to me.

> > +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
> > +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
> > +.PP
> > +Users with the required privilege may use
> > +.BR setxattr(2)
> > +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
> > +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
> > +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
> > +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
> > +extended attribute is specified, then the kernel will map the
> > +specified root user ID (which must be a valid user ID mapped in the caller's
> > +user namespace) into the initial user namespace.
> 
> Really, "into the initial user namespace"? That may be true for the
> kernel-internal representation, but the on-disk representation is the
> mapping into the user namespace that contains the mount namespace into
> which the file system was mounted, right?

Ah, yes, it is.

>  This would become observable
> when a file system is mounted in a different namespace than before, or
> when working with FUSE in a namespace.

Yes it would.

Michael, you said you were reworking it, do you mind working this into
it as well?

thanks Jann,
-serge


* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
       [not found]     ` <20180116173803.GA15538-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2018-01-17 23:44       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2018-01-17 23:44 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Jann Horn, Eric W. Biederman, linux-man, Seth Forshee, Linux API,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Kees Cook,
	Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

On 16 January 2018 at 18:38, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Jann Horn (jannh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org):
>> On Tue, Jan 9, 2018 at 7:52 PM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
>> > Update the capabilities(7)  manpage with a description of the
>> > new-ish namespaced file capability support.
>> >
>> > A note on userspace tools:  since the kernel will automatically
>> > convert between v2 and v3 xattrs, and translate nsroot between
>> > v3 xattrs, we can make do with the current getcap(8) and setcap(8)
>> > tools. I.e. a user on the host can create a transient user namespace
>> > with the appropriate mappings and run setcap(8) there.  The kernel
>> > will automatically write a v3 xattr with the transient namespace's
>> > root user as nsroot.
>> >
>> > Signed-off-by: Serge Hallyn <shallyn-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
>> > ---
>> >  man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
>> >  1 file changed, 44 insertions(+)
>> >
>> > diff --git a/man7/capabilities.7 b/man7/capabilities.7
>> > index 166eaaf..76e7e02 100644
>> > --- a/man7/capabilities.7
>> > +++ b/man7/capabilities.7
>> > @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
>> >  then the effective flag must also be specified as enabled
>> >  for all other capabilities for which the corresponding permitted or
>> >  inheritable flags is enabled.
>> > +.PP
>> > +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported.  These store only
>> > +the capabilities to be applied to the file, with no record of the writer's
>> > +credentials.  Therefore only privileged users can be trusted to write them, and
>> > +.BR CAP_SETFCAP
>> > +over the user namespace which mounted the filesystem (usually the initial user
>> > +namespace) is required.  This makes it impossible to write file capabilities
>> > +from a user namespaced container, which causes some package updates to fail.
>> > +.PP
>> > +In order to support setting file capabilities in containers, the
>> > +kernel must be able to identify whether the task executing the
>> > +file will be constrained to a subset of the resources over which
>> > +the writer of the file capabilities has privilege.  To this end,
>> > +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
>> > +of the root user in the writer's namespace ("nsroot").  Hence the writer only
>> > +requires
>> > +.IP 1.
>> > +.BR CAP_SETFCAP
>> > +over the file inode, meaning the writing task must have
>> > +.BR CAP_SETFCAP
>> > +over a user namespace into which the inode's owning user ID is mapped.
>> > +.PP
>> > +and
>> > +.IP 2.
>> > +.BR CAP_SETFCAP
>> > +over the writer's own user namespace.
>>
>> I think that the following would be clearer (but technically
>> equivalent): "Hence the writer only requires CAP_SETFCAP over the file
>> inode, meaning that the writing task must have CAP_SETFCAP in its own
>> user namespace and the UID and GID of the file inode must be mapped in
>> the writing task's user namespace.".
>
> Looks good to me.
>
>> > +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
>> > +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
>> > +.PP
>> > +Users with the required privilege may use
>> > +.BR setxattr(2)
>> > +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
>> > +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
>> > +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
>> > +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
>> > +extended attribute is specified, then the kernel will map the
>> > +specified root user ID (which must be a valid user ID mapped in the caller's
>> > +user namespace) into the initial user namespace.
>>
>> Really, "into the initial user namespace"? That may be true for the
>> kernel-internal representation, but the on-disk representation is the
>> mapping into the user namespace that contains the mount namespace into
>> which the file system was mounted, right?
>
> Ah, yes, it is.
>
>>  This would become observable
>> when a file system is mounted in a different namespace than before, or
>> when working with FUSE in a namespace.
>
> Yes it would.
>
> Michael, you said you were reworking it, do you mind working this into
> it as well?

Yes, I'll do that. It may be a couple of weeks before I get some more
cycles for this, however.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

