* [manpages PATCH] capabilities.7: describe namespaced file capabilities
@ 2018-01-09 18:52 Serge E. Hallyn
[not found] ` <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2018-01-16 17:26 ` Jann Horn
0 siblings, 2 replies; 6+ messages in thread
From: Serge E. Hallyn @ 2018-01-09 18:52 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Eric W. Biederman
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, Seth Forshee,
linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-security-module-u79uwXL29TY76Z2rM5mHXA, Kees Cook,
Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan
Update the capabilities(7) manpage with a description of the
new-ish namespaced file capability support.
A note on userspace tools: since the kernel will automatically
convert between v2 and v3 xattrs, and translate nsroot between
v3 xattrs, we can make do with the current getcap(8) and setcap(8)
tools. I.e. a user on the host can create a transient user namespace
with the appropriate mappings and run setcap(8) there. The kernel
will automatically write a v3 xattr with the transient namespace's
root user as nsroot.
Signed-off-by: Serge Hallyn <shallyn-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
---
man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/man7/capabilities.7 b/man7/capabilities.7
index 166eaaf..76e7e02 100644
--- a/man7/capabilities.7
+++ b/man7/capabilities.7
@@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
then the effective flag must also be specified as enabled
for all other capabilities for which the corresponding permitted or
inheritable flags is enabled.
+.PP
+Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported. These store only
+the capabilities to be applied to the file, with no record of the writer's
+credentials. Therefore only privileged users can be trusted to write them, and
+.BR CAP_SETFCAP
+over the user namespace which mounted the filesystem (usually the initial user
+namespace) is required. This makes it impossible to write file capabilities
+from a user namespaced container, which causes some package updates to fail.
+.PP
+In order to support setting file capabilities in containers, the
+kernel must be able to identify whether the task executing the
+file will be constrained to a subset of the resources over which
+the writer of the file capabilities has privilege. To this end,
+since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
+of the root user in the writer's namespace ("nsroot"). Hence the writer only
+requires
+.IP 1.
+.BR CAP_SETFCAP
+over the file inode, meaning the writing task must have
+.BR CAP_SETFCAP
+over a user namespace into which the inode's owning user ID is mapped.
+.PP
+and
+.IP 2.
+.BR CAP_SETFCAP
+over the writer's own user namespace.
+.PP
+A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
+whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
+.PP
+Users with the required privilege may use
+.BR setxattr(2)
+to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
+The kernel will automatically convert a VFS_CAP_REVISION_2 to a
+VFS_CAP_REVISION_3 extended attribute with the "nsroot"
+set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
+extended attribute is specified, then the kernel will map the
+specified root user ID (which must be a valid user ID mapped in the caller's
+user namespace) into the initial user namespace. Likewise,
+.BR getxattr(2)
+results will be converted and simplified to show a VFS_CAP_REVISION_2
+extended attribute, if a VFS_CAP_REVISION_3 applies to the caller's
+namespace, or to map the VFS_CAP_REVISION_3 root user ID into the
+caller's namespace.
.\"
.SS Transformation of capabilities during execve()
.PP
--
1.9.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
[parent not found: <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
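The difference between the two xattr revisions described in the patch can be illustrated by decoding the raw `security.capability` value. The following is a minimal sketch, not part of the patch: the constants and the field order follow the kernel's include/uapi/linux/capability.h, while the function name `parse_security_capability`, the returned dictionary keys, and the example value are assumptions made for illustration.

```python
import struct

# Constants from the kernel's include/uapi/linux/capability.h.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

XATTR_CAPS_SZ_2 = 20  # magic_etc + 2 x (permitted, inheritable)
XATTR_CAPS_SZ_3 = 24  # the v2 layout plus a 32-bit rootid


def parse_security_capability(raw: bytes) -> dict:
    """Decode a v2 or v3 security.capability value (all fields little-endian)."""
    if len(raw) < 4:
        raise ValueError("value too short to hold magic_etc")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK

    if revision == VFS_CAP_REVISION_2 and len(raw) == XATTR_CAPS_SZ_2:
        p_lo, i_lo, p_hi, i_hi = struct.unpack_from("<4I", raw, 4)
        rootid = None  # v2 keeps no record of the writer's namespace
    elif revision == VFS_CAP_REVISION_3 and len(raw) == XATTR_CAPS_SZ_3:
        p_lo, i_lo, p_hi, i_hi, rootid = struct.unpack_from("<5I", raw, 4)
    else:
        raise ValueError("not a v2 or v3 capability xattr")

    return {
        "version": revision >> 24,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": (p_hi << 32) | p_lo,
        "inheritable": (i_hi << 32) | i_lo,
        "rootid": rootid,  # the "nsroot" of the thread; None for v2
    }


CAP_NET_RAW = 13  # capability number from the same header

# Hypothetical v3 value: cap_net_raw=pe with nsroot 200000.
example = struct.pack(
    "<6I",
    VFS_CAP_REVISION_3 | VFS_CAP_FLAGS_EFFECTIVE,
    1 << CAP_NET_RAW, 0, 0, 0,
    200000,
)
parsed = parse_security_capability(example)
print(parsed)
```

On Linux the raw bytes could be read with `os.getxattr(path, "security.capability")`. As the patch text notes, the kernel may convert what getxattr(2) returns depending on the caller's user namespace, so the decoded revision and rootid are not necessarily what is stored on disk.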
* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
[not found] ` <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2018-01-14 9:40 ` Michael Kerrisk (man-pages)
2018-01-15 4:31 ` Serge E. Hallyn
0 siblings, 1 reply; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2018-01-14 9:40 UTC (permalink / raw)
To: Serge E. Hallyn, Eric W. Biederman
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-man-u79uwXL29TY76Z2rM5mHXA,
Seth Forshee, linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-security-module-u79uwXL29TY76Z2rM5mHXA, Kees Cook,
Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

Hello Serge,

On 01/09/2018 07:52 PM, Serge E. Hallyn wrote:
> Update the capabilities(7) manpage with a description of the
> new-ish namespaced file capability support.

Thanks for this patch. I'm trying to craft a modified version
based on your text, so no need to send a new version at this
stage, but I do have some questions below.

> A note on userspace tools: since the kernel will automatically
> convert between v2 and v3 xattrs, and translate nsroot between
> v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> tools. I.e. a user on the host can create a transient user namespace
> with the appropriate mappings and run setcap(8) there. The kernel
> will automatically write a v3 xattr with the transient namespace's
> root user as nsroot.
>
> Signed-off-by: Serge Hallyn <shallyn-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> ---
> man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/man7/capabilities.7 b/man7/capabilities.7
> index 166eaaf..76e7e02 100644
> --- a/man7/capabilities.7
> +++ b/man7/capabilities.7
> @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
> then the effective flag must also be specified as enabled
> for all other capabilities for which the corresponding permitted or
> inheritable flags is enabled.
> +.PP
> +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported. These store only
> +the capabilities to be applied to the file, with no record of the writer's
> +credentials. Therefore only privileged users can be trusted to write them, and
> +.BR CAP_SETFCAP
> +over the user namespace which mounted the filesystem (usually the initial user
> +namespace) is required. This makes it impossible to write file capabilities
> +from a user namespaced container, which causes some package updates to fail.
> +.PP
> +In order to support setting file capabilities in containers, the
> +kernel must be able to identify whether the task executing the
> +file will be constrained to a subset of the resources over which
> +the writer of the file capabilities has privilege. To this end,
> +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> +of the root user in the writer's namespace ("nsroot").

Here, "nsroot" means the UID 0 in the namespace as it would be mapped
into the initial userns, right?

> Hence the writer only
> +requires
> +.IP 1.
> +.BR CAP_SETFCAP
> +over the file inode, meaning the writing task must have
> +.BR CAP_SETFCAP
> +over a user namespace into which the inode's owning user ID is mapped.

I don't understand the above line. Could you explain with an example?

Cheers,

Michael

> +.PP
> +and
> +.IP 2.
> +.BR CAP_SETFCAP
> +over the writer's own user namespace.
> +.PP
> +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
> +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
> +.PP
> +Users with the required privilege may use
> +.BR setxattr(2)
> +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
> +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
> +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
> +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
> +extended attribute is specified, then the kernel will map the
> +specified root user ID (which must be a valid user ID mapped in the caller's
> +user namespace) into the initial user namespace. Likewise,
> +.BR getxattr(2)
> +results will be converted and simplified to show a VFS_CAP_REVISION_2
> +extended attribute, if a VFS_CAP_REVISION_3 applies to the caller's
> +namespace, or to map the VFS_CAP_REVISION_3 root user ID into the
> +caller's namespace.
> .\"
> .SS Transformation of capabilities during execve()
> .PP
>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
2018-01-14 9:40 ` Michael Kerrisk (man-pages)
@ 2018-01-15 4:31 ` Serge E. Hallyn
0 siblings, 0 replies; 6+ messages in thread
From: Serge E. Hallyn @ 2018-01-15 4:31 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Serge E. Hallyn, Eric W. Biederman, linux-man, Seth Forshee,
linux-api, linux-security-module, Kees Cook, Andreas Gruenbacher,
Andy Lutomirski, Andrew G. Morgan

Quoting Michael Kerrisk (man-pages) (mtk.manpages@gmail.com):
> Hello Serge,
>
> On 01/09/2018 07:52 PM, Serge E. Hallyn wrote:
> > Update the capabilities(7) manpage with a description of the
> > new-ish namespaced file capability support.
>
> Thanks for this patch. I'm trying to craft a modified version
> based on your text, so no need to send a new version at this
> stage, but I do have some questions below.

Awesome, thanks.

> > A note on userspace tools: since the kernel will automatically
> > convert between v2 and v3 xattrs, and translate nsroot between
> > v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> > tools. I.e. a user on the host can create a transient user namespace
> > with the appropriate mappings and run setcap(8) there. The kernel
> > will automatically write a v3 xattr with the transient namespace's
> > root user as nsroot.
> >
> > Signed-off-by: Serge Hallyn <shallyn@cisco.com>
> > ---
> > man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 44 insertions(+)
> >
> > diff --git a/man7/capabilities.7 b/man7/capabilities.7
> > index 166eaaf..76e7e02 100644
> > --- a/man7/capabilities.7
> > +++ b/man7/capabilities.7
> > @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
> > then the effective flag must also be specified as enabled
> > for all other capabilities for which the corresponding permitted or
> > inheritable flags is enabled.
> > +.PP
> > +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported. These store only
> > +the capabilities to be applied to the file, with no record of the writer's
> > +credentials. Therefore only privileged users can be trusted to write them, and
> > +.BR CAP_SETFCAP
> > +over the user namespace which mounted the filesystem (usually the initial user
> > +namespace) is required. This makes it impossible to write file capabilities
> > +from a user namespaced container, which causes some package updates to fail.
> > +.PP
> > +In order to support setting file capabilities in containers, the
> > +kernel must be able to identify whether the task executing the
> > +file will be constrained to a subset of the resources over which
> > +the writer of the file capabilities has privilege. To this end,
> > +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> > +of the root user in the writer's namespace ("nsroot").
>
> Here, "nsroot" means the UID 0 in the namespace as it would be mapped
> into the initial userns, right?

Right. If we can come up with a better name that would be great.

> > Hence the writer only
> > +requires
> > +.IP 1.
> > +.BR CAP_SETFCAP
> > +over the file inode, meaning the writing task must have
> > +.BR CAP_SETFCAP
> > +over a user namespace into which the inode's owning user ID is mapped.
>
> I don't understand the above line. Could you explain with an example?

If the file is owned by uid 1000, then uid 1000 can create a new user ns
in which 1000 is mapped to 0. In this namespace, the new task has
CAP_SETFCAP over the user ns, and 1000 is mapped into the userns (as 0),
so the write is allowed.

In the above example, if the xattr being written was v2, then the actual
written xattr will be v3 with nsroot=1000.

If the xattr was v3, with nsroot=0, then nsroot=1000 will be written.

If the xattr was v3, with nsroot=500, where 500 is not mapped from the
userns, then the write will be forbidden.

As another allowed case, if I'm uid 1000 and setting up a container
where 100005 is mapped to uid 5: I create a userns where hostuids
100000-165535 map to namespace uids 0-65535. Then as root in the
namespace I have CAP_SETFCAP over the namespace, and 100005 is mapped
in the namespace, so I can write to the file.

As a final, nested example: I'm uid 1000 and have uids 100000-300000 as
my delegated subuids. I create a container with that full range, and am
running as root there (100000). Now I create a nested container where
100000-165535 (which are really 200000-265535 on the host) will be mapped
to 0-65535. In its rootfs I write /bin/ping with cap_net_raw=pe and just
for fun make it owned by nested uid 5. So /bin/ping is owned by

hostuid 200005 = c1 uid 100005 = c2 uid 5

As root in the container I have CAP_SETFCAP over a userns where c2 uid 5
is mapped, so I'm allowed to write a filecap. If I write it as a v2
xattr, then the actual written xattr will be v3 with nsroot=100000 if I
simply write it as root in c1, or nsroot=200000 if I enter the nested
container before writing it.

There are several more options, but let's just pick one and assume that
as root in the first container (hostuid 100000) I request a v3 xattr
with nsroot=100000. The actual written xattr will have nsroot=200000.
Now when uid 1000 in the nested container runs /bin/ping, the kernel
will see that that task's user namespace has uid 0 mapped to 200000, and
so the fscap will be honored.

-serge
^ permalink raw reply [flat|nested] 6+ messages in thread
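Serge's examples can be replayed with a small model of uid_map translation. This is a sketch under stated assumptions, not kernel code. Every helper name (`to_parent`, `to_host`, `stored_nsroot`, `fscap_honored`) is hypothetical, each namespace is reduced to a list of (inside, outside, count) extents, and nsroot is translated all the way to host uids, which assumes the filesystem was mounted from the initial user namespace.

```python
from typing import List, Optional, Tuple

# One uid_map extent: (first uid inside the namespace, first uid in the parent, count).
Extent = Tuple[int, int, int]
UidMap = List[Extent]


def to_parent(uid_map: UidMap, uid: int) -> Optional[int]:
    """Translate a uid one level outward; None if it is not mapped."""
    for inside, outside, count in uid_map:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return None


def to_host(chain: List[UidMap], uid: int) -> Optional[int]:
    """Translate uid outward through chain (innermost namespace first)."""
    current: Optional[int] = uid
    for uid_map in chain:
        if current is None:
            return None
        current = to_parent(uid_map, current)
    return current


def stored_nsroot(chain: List[UidMap], requested_rootid: Optional[int] = None) -> Optional[int]:
    """nsroot written for a request made inside chain[0].

    A v2 request (requested_rootid=None) uses the writer's uid 0; a v3
    request uses the given rootid. None models a refused write.
    """
    root = 0 if requested_rootid is None else requested_rootid
    return to_host(chain, root)


def fscap_honored(chain: List[UidMap], nsroot: int) -> bool:
    """True if uid 0 of the task's namespace, or of one of its ancestors
    in chain, is the stored nsroot (non-initial namespaces only)."""
    return any(to_host(chain[depth:], 0) == nsroot for depth in range(len(chain)))


# Hypothetical maps matching the examples in the message above.
SIMPLE = [(0, 1000, 1)]      # uid 1000 mapped to 0
C1 = [(0, 100000, 200001)]   # host 100000-300000 -> c1 0-200000
C2 = [(0, 100000, 65536)]    # c1 100000-165535 -> c2 0-65535

print(to_host([C2, C1], 5))          # owner of /bin/ping as a host uid
print(stored_nsroot([SIMPLE], 500))  # v3 request with an unmapped rootid
print(fscap_honored([C2, C1], 200000))
```

Passing the namespaces innermost first keeps each translation step identical to the single-level case, so the nested container needs no special handling.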
* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
2018-01-09 18:52 [manpages PATCH] capabilities.7: describe namespaced file capabilities Serge E. Hallyn
[not found] ` <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2018-01-16 17:26 ` Jann Horn
2018-01-16 17:38 ` Serge E. Hallyn
1 sibling, 1 reply; 6+ messages in thread
From: Jann Horn @ 2018-01-16 17:26 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Michael Kerrisk-manpages, Eric W. Biederman, linux-man, Seth Forshee,
Linux API, linux-security-module, Kees Cook, Andreas Gruenbacher,
Andy Lutomirski, Andrew G. Morgan

On Tue, Jan 9, 2018 at 7:52 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Update the capabilities(7) manpage with a description of the
> new-ish namespaced file capability support.
>
> A note on userspace tools: since the kernel will automatically
> convert between v2 and v3 xattrs, and translate nsroot between
> v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> tools. I.e. a user on the host can create a transient user namespace
> with the appropriate mappings and run setcap(8) there. The kernel
> will automatically write a v3 xattr with the transient namespace's
> root user as nsroot.
>
> Signed-off-by: Serge Hallyn <shallyn@cisco.com>
> ---
> man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/man7/capabilities.7 b/man7/capabilities.7
> index 166eaaf..76e7e02 100644
> --- a/man7/capabilities.7
> +++ b/man7/capabilities.7
> @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
> then the effective flag must also be specified as enabled
> for all other capabilities for which the corresponding permitted or
> inheritable flags is enabled.
> +.PP
> +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported. These store only
> +the capabilities to be applied to the file, with no record of the writer's
> +credentials. Therefore only privileged users can be trusted to write them, and
> +.BR CAP_SETFCAP
> +over the user namespace which mounted the filesystem (usually the initial user
> +namespace) is required. This makes it impossible to write file capabilities
> +from a user namespaced container, which causes some package updates to fail.
> +.PP
> +In order to support setting file capabilities in containers, the
> +kernel must be able to identify whether the task executing the
> +file will be constrained to a subset of the resources over which
> +the writer of the file capabilities has privilege. To this end,
> +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> +of the root user in the writer's namespace ("nsroot"). Hence the writer only
> +requires
> +.IP 1.
> +.BR CAP_SETFCAP
> +over the file inode, meaning the writing task must have
> +.BR CAP_SETFCAP
> +over a user namespace into which the inode's owning user ID is mapped.
> +.PP
> +and
> +.IP 2.
> +.BR CAP_SETFCAP
> +over the writer's own user namespace.

I think that the following would be clearer (but technically
equivalent): "Hence the writer only requires CAP_SETFCAP over the file
inode, meaning that the writing task must have CAP_SETFCAP in its own
user namespace and the UID and GID of the file inode must be mapped in
the writing task's user namespace.".

> +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
> +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
> +.PP
> +Users with the required privilege may use
> +.BR setxattr(2)
> +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
> +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
> +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
> +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
> +extended attribute is specified, then the kernel will map the
> +specified root user ID (which must be a valid user ID mapped in the caller's
> +user namespace) into the initial user namespace.

Really, "into the initial user namespace"? That may be true for the
kernel-internal representation, but the on-disk representation is the
mapping into the user namespace that contains the mount namespace into
which the file system was mounted, right? This would become observable
when a file system is mounted in a different namespace than before, or
when working with FUSE in a namespace.

> Likewise,
> +.BR getxattr(2)
> +results will be converted and simplified to show a VFS_CAP_REVISION_2
> +extended attribute, if a VFS_CAP_REVISION_3 applies to the caller's
> +namespace, or to map the VFS_CAP_REVISION_3 root user ID into the
> +caller's namespace.
^ permalink raw reply [flat|nested] 6+ messages in thread
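The write rule as Jann restates it (CAP_SETFCAP in the writer's own user namespace, plus the inode's UID and GID both mapped there) can be written as a simple predicate. This is a minimal sketch, not how the kernel implements the check: the function names are hypothetical, the capability test is reduced to a boolean argument, and the maps use the three-column text format of /proc/[pid]/uid_map and /proc/[pid]/gid_map.

```python
from typing import List, Tuple

# (first id inside the namespace, first id outside, count)
Extent = Tuple[int, int, int]


def parse_id_map(text: str) -> List[Extent]:
    """Parse uid_map/gid_map text: one 'inside outside count' triple per line."""
    extents = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, count = (int(field) for field in line.split())
        extents.append((inside, outside, count))
    return extents


def is_mapped(id_map: List[Extent], outside_id: int) -> bool:
    """True if an id from the outer namespace has a mapping inside."""
    return any(outside <= outside_id < outside + count
               for _inside, outside, count in id_map)


def may_write_fscap(has_cap_setfcap: bool,
                    uid_map: List[Extent], gid_map: List[Extent],
                    inode_uid: int, inode_gid: int) -> bool:
    """Jann's formulation: capability in the writer's own user namespace,
    and the inode's UID and GID both mapped in that namespace."""
    return (has_cap_setfcap
            and is_mapped(uid_map, inode_uid)
            and is_mapped(gid_map, inode_gid))


# Hypothetical container map: outer ids 100000-165535 appear as 0-65535.
container_map = parse_id_map("         0     100000      65536\n")

print(may_write_fscap(True, container_map, container_map, 100005, 100005))
print(may_write_fscap(True, container_map, container_map, 1000, 1000))
```

In this model a file owned by outer uid 100005 is writable from inside the container, while a file owned by outer uid 1000 is not, because 1000 has no mapping in the container's namespace.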
* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
2018-01-16 17:26 ` Jann Horn
@ 2018-01-16 17:38 ` Serge E. Hallyn
[not found] ` <20180116173803.GA15538-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Serge E. Hallyn @ 2018-01-16 17:38 UTC (permalink / raw)
To: Jann Horn
Cc: Serge E. Hallyn, Michael Kerrisk-manpages, Eric W. Biederman,
linux-man, Seth Forshee, Linux API, linux-security-module, Kees Cook,
Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

Quoting Jann Horn (jannh@google.com):
> On Tue, Jan 9, 2018 at 7:52 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > Update the capabilities(7) manpage with a description of the
> > new-ish namespaced file capability support.
> >
> > A note on userspace tools: since the kernel will automatically
> > convert between v2 and v3 xattrs, and translate nsroot between
> > v3 xattrs, we can make do with the current getcap(8) and setcap(8)
> > tools. I.e. a user on the host can create a transient user namespace
> > with the appropriate mappings and run setcap(8) there. The kernel
> > will automatically write a v3 xattr with the transient namespace's
> > root user as nsroot.
> >
> > Signed-off-by: Serge Hallyn <shallyn@cisco.com>
> > ---
> > man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 44 insertions(+)
> >
> > diff --git a/man7/capabilities.7 b/man7/capabilities.7
> > index 166eaaf..76e7e02 100644
> > --- a/man7/capabilities.7
> > +++ b/man7/capabilities.7
> > @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
> > then the effective flag must also be specified as enabled
> > for all other capabilities for which the corresponding permitted or
> > inheritable flags is enabled.
> > +.PP
> > +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported. These store only
> > +the capabilities to be applied to the file, with no record of the writer's
> > +credentials. Therefore only privileged users can be trusted to write them, and
> > +.BR CAP_SETFCAP
> > +over the user namespace which mounted the filesystem (usually the initial user
> > +namespace) is required. This makes it impossible to write file capabilities
> > +from a user namespaced container, which causes some package updates to fail.
> > +.PP
> > +In order to support setting file capabilities in containers, the
> > +kernel must be able to identify whether the task executing the
> > +file will be constrained to a subset of the resources over which
> > +the writer of the file capabilities has privilege. To this end,
> > +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
> > +of the root user in the writer's namespace ("nsroot"). Hence the writer only
> > +requires
> > +.IP 1.
> > +.BR CAP_SETFCAP
> > +over the file inode, meaning the writing task must have
> > +.BR CAP_SETFCAP
> > +over a user namespace into which the inode's owning user ID is mapped.
> > +.PP
> > +and
> > +.IP 2.
> > +.BR CAP_SETFCAP
> > +over the writer's own user namespace.
>
> I think that the following would be clearer (but technically
> equivalent): "Hence the writer only requires CAP_SETFCAP over the file
> inode, meaning that the writing task must have CAP_SETFCAP in its own
> user namespace and the UID and GID of the file inode must be mapped in
> the writing task's user namespace.".

Looks good to me.

> > +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
> > +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
> > +.PP
> > +Users with the required privilege may use
> > +.BR setxattr(2)
> > +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
> > +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
> > +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
> > +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
> > +extended attribute is specified, then the kernel will map the
> > +specified root user ID (which must be a valid user ID mapped in the caller's
> > +user namespace) into the initial user namespace.
>
> Really, "into the initial user namespace"? That may be true for the
> kernel-internal representation, but the on-disk representation is the
> mapping into the user namespace that contains the mount namespace into
> which the file system was mounted, right?

Ah, yes, it is.

> This would become observable
> when a file system is mounted in a different namespace than before, or
> when working with FUSE in a namespace.

Yes it would.

Michael, you said you were reworking it, do you mind working this into
it as well?

thanks Jann,
-serge
^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20180116173803.GA15538-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [manpages PATCH] capabilities.7: describe namespaced file capabilities
[not found] ` <20180116173803.GA15538-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2018-01-17 23:44 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2018-01-17 23:44 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Jann Horn, Eric W. Biederman, linux-man, Seth Forshee, Linux API,
linux-security-module-u79uwXL29TY76Z2rM5mHXA, Kees Cook,
Andreas Gruenbacher, Andy Lutomirski, Andrew G. Morgan

On 16 January 2018 at 18:38, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Jann Horn (jannh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org):
>> On Tue, Jan 9, 2018 at 7:52 PM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
>> > Update the capabilities(7) manpage with a description of the
>> > new-ish namespaced file capability support.
>> >
>> > A note on userspace tools: since the kernel will automatically
>> > convert between v2 and v3 xattrs, and translate nsroot between
>> > v3 xattrs, we can make do with the current getcap(8) and setcap(8)
>> > tools. I.e. a user on the host can create a transient user namespace
>> > with the appropriate mappings and run setcap(8) there. The kernel
>> > will automatically write a v3 xattr with the transient namespace's
>> > root user as nsroot.
>> >
>> > Signed-off-by: Serge Hallyn <shallyn-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
>> > ---
>> > man7/capabilities.7 | 44 ++++++++++++++++++++++++++++++++++++++++++++
>> > 1 file changed, 44 insertions(+)
>> >
>> > diff --git a/man7/capabilities.7 b/man7/capabilities.7
>> > index 166eaaf..76e7e02 100644
>> > --- a/man7/capabilities.7
>> > +++ b/man7/capabilities.7
>> > @@ -936,6 +936,50 @@ if we specify the effective flag as being enabled for any capability,
>> > then the effective flag must also be specified as enabled
>> > for all other capabilities for which the corresponding permitted or
>> > inheritable flags is enabled.
>> > +.PP
>> > +Until 4.13, only VFS_CAP_REVISION_2 xattrs were supported. These store only
>> > +the capabilities to be applied to the file, with no record of the writer's
>> > +credentials. Therefore only privileged users can be trusted to write them, and
>> > +.BR CAP_SETFCAP
>> > +over the user namespace which mounted the filesystem (usually the initial user
>> > +namespace) is required. This makes it impossible to write file capabilities
>> > +from a user namespaced container, which causes some package updates to fail.
>> > +.PP
>> > +In order to support setting file capabilities in containers, the
>> > +kernel must be able to identify whether the task executing the
>> > +file will be constrained to a subset of the resources over which
>> > +the writer of the file capabilities has privilege. To this end,
>> > +since 4.13, VFS_CAP_REVISION_3 capabilities store the user ID
>> > +of the root user in the writer's namespace ("nsroot"). Hence the writer only
>> > +requires
>> > +.IP 1.
>> > +.BR CAP_SETFCAP
>> > +over the file inode, meaning the writing task must have
>> > +.BR CAP_SETFCAP
>> > +over a user namespace into which the inode's owning user ID is mapped.
>> > +.PP
>> > +and
>> > +.IP 2.
>> > +.BR CAP_SETFCAP
>> > +over the writer's own user namespace.
>>
>> I think that the following would be clearer (but technically
>> equivalent): "Hence the writer only requires CAP_SETFCAP over the file
>> inode, meaning that the writing task must have CAP_SETFCAP in its own
>> user namespace and the UID and GID of the file inode must be mapped in
>> the writing task's user namespace.".
>
> Looks good to me.
>
>> > +A VFS_CAP_REVISION_3 file capability will take effect only when run in a user namespace
>> > +whose UID 0 maps to the saved "nsroot", or a descendant of such a namespace.
>> > +.PP
>> > +Users with the required privilege may use
>> > +.BR setxattr(2)
>> > +to request either a VFS_CAP_REVISION_2 or VFS_CAP_REVISION_3 write.
>> > +The kernel will automatically convert a VFS_CAP_REVISION_2 to a
>> > +VFS_CAP_REVISION_3 extended attribute with the "nsroot"
>> > +set to the root user in the writer's user namespace, or, if a VFS_CAP_REVISION_3
>> > +extended attribute is specified, then the kernel will map the
>> > +specified root user ID (which must be a valid user ID mapped in the caller's
>> > +user namespace) into the initial user namespace.
>>
>> Really, "into the initial user namespace"? That may be true for the
>> kernel-internal representation, but the on-disk representation is the
>> mapping into the user namespace that contains the mount namespace into
>> which the file system was mounted, right?
>
> Ah, yes, it is.
>
>> This would become observable
>> when a file system is mounted in a different namespace than before, or
>> when working with FUSE in a namespace.
>
> Yes it would.
>
> Michael, you said you were reworking it, do you mind working this into
> it as well?

Yes, I'll do that. It may be a couple of weeks before I get some more
cycles for this, however.

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 6+ messages in thread
Thread overview: 6+ messages
2018-01-09 18:52 [manpages PATCH] capabilities.7: describe namespaced file capabilities Serge E. Hallyn
[not found] ` <20180109185218.GA21753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2018-01-14 9:40 ` Michael Kerrisk (man-pages)
2018-01-15 4:31 ` Serge E. Hallyn
2018-01-16 17:26 ` Jann Horn
2018-01-16 17:38 ` Serge E. Hallyn
[not found] ` <20180116173803.GA15538-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2018-01-17 23:44 ` Michael Kerrisk (man-pages)