* Could not mount sysfs when enable userns but disable netns
@ 2014-07-11 7:27 chenhanxiao-BthXqXjhjHXQFUHtdCDX3A
From: chenhanxiao-BthXqXjhjHXQFUHtdCDX3A @ 2014-07-11 7:27 UTC
To: Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org),
Serge Hallyn (serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org),
Greg Kroah-Hartman
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hello,
How to reproduce:
1. Prepare a container with userns enabled and netns disabled.
2. Use libvirt-lxc to start the container.
3. libvirt cannot mount sysfs and fails to start the container.
Then I found that
commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
"Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
over the net namespace."
But why should we check sysfs mount permission against the net namespace?
We've already checked CAP_SYS_ADMIN, though.
What is the relationship between sysfs and the net namespace?
Or is this check a little redundant?
Any insights on this?
Thanks,
- Chen
PS: the code below could be a workaround:
@@ -34,7 +35,8 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
return ERR_PTR(-EPERM);
- if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
+ if (current->nsproxy->net_ns != &init_net &&
+ !kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
return ERR_PTR(-EPERM);
}
* Re: Could not mount sysfs when enable userns but disable netns
@ 2014-07-11 14:28 Serge E. Hallyn
From: Serge E. Hallyn @ 2014-07-11 14:28 UTC
To: chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org
Cc: Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Serge Hallyn (serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org),
Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org),
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Quoting chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org (chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
> Hello,
>
> How to reproduce:
> 1. Prepare a container with userns enabled and netns disabled.
> 2. Use libvirt-lxc to start the container.
> 3. libvirt cannot mount sysfs and fails to start the container.
>
> Then I found that
> commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
> "Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
> over the net namespace."
>
> But why should we check sysfs mount permission against the net namespace?
> We've already checked CAP_SYS_ADMIN, though.
>
> What is the relationship between sysfs and the net namespace?
> Or is this check a little redundant?

It is not redundant. The whole point is that after clone(CLONE_NEWUSER)
you get a newly filled set of capabilities. But you should not have
privileges over the host's network namespace. After you unshare a new
network namespace, you *should* have privilege over it. So the fact
that we've already checked CAP_SYS_ADMIN means nothing, because the
capabilities need to be targeted.

> Any insights on this?
>
> Thanks,
> - Chen
>
> PS: the code below could be a workaround:
>
> @@ -34,7 +35,8 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
> if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
> return ERR_PTR(-EPERM);
>
> - if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
> + if (current->nsproxy->net_ns != &init_net &&
> + !kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
> return ERR_PTR(-EPERM);
> }
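
Editorial sketch: the idea of "targeted" capabilities described above can be modelled in a few lines of userspace C. This is not kernel code. The structs `user_ns` and `task_cred` and the function `model_ns_capable` are hypothetical names invented for illustration, uids are treated as global kernel-internal ids, and the loop only approximates how a capability check walks up from the target user namespace toward the initial one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a user namespace. */
struct user_ns {
    const struct user_ns *parent; /* NULL for the initial user namespace */
    unsigned int owner_uid;       /* global uid of the creator */
};

/* Hypothetical, simplified stand-in for a task's credentials. */
struct task_cred {
    const struct user_ns *user_ns; /* namespace the task runs in */
    unsigned int euid;             /* global effective uid */
    bool cap_sys_admin;            /* CAP_SYS_ADMIN in the effective set */
};

/*
 * Model of a targeted check: does `cred` hold CAP_SYS_ADMIN over the
 * user namespace `target` (for example, the one owning a net namespace)?
 */
bool model_ns_capable(const struct task_cred *cred,
                      const struct user_ns *target)
{
    assert(cred != NULL && cred->user_ns != NULL);

    for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent) {
        /* Same namespace: the task's own effective set decides. */
        if (ns == cred->user_ns)
            return cred->cap_sys_admin;
        /* The creator of a namespace, running in its parent, holds
         * every capability over it. */
        if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
            return true;
    }
    /* Walked past the initial namespace without finding the task's own
     * namespace: the task lives below the target and has no privilege. */
    return false;
}
```

In this model a task that is root inside a freshly created user namespace holds CAP_SYS_ADMIN over that namespace, but not over the initial one. A network namespace that was not unshared is still owned by the initial user namespace, which is why the check fails in the reported setup.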
* Re: Could not mount sysfs when enable userns but disable netns
@ 2014-07-11 16:29 Eric W. Biederman
From: Eric W. Biederman @ 2014-07-11 16:29 UTC
To: Serge E. Hallyn
Cc: Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Serge Hallyn (serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org),
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org (chenhanxiao-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
>> Hello,
>>
>> How to reproduce:
>> 1. Prepare a container with userns enabled and netns disabled.
>> 2. Use libvirt-lxc to start the container.
>> 3. libvirt cannot mount sysfs and fails to start the container.
>>
>> Then I found that
>> commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
>> "Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
>> over the net namespace."
>>
>> But why should we check sysfs mount permission against the net namespace?
>> We've already checked CAP_SYS_ADMIN, though.

We already checked capable(CAP_SYS_ADMIN) and it failed.

>> What is the relationship between sysfs and the net namespace?
>> Or is this check a little redundant?

You want a bind mount, not a new fresh mount.

When looking at how evil actors could abuse things it turned out that in
some circumstances the root user (before a user namespace is created)
needs to control the policy on which filesystems may be mounted. There
are files in sysfs and in proc that you never want to see in a chroot
jail, as they just create more surface area to attack.

The only reason for creating a new fresh mount of sysfs is to get access
to /sys/class/net. So to keep things simple we restrict creation of
that mount to cases where the mounter has permissions over the network
namespace, and cases where nothing interesting is mounted on top of
sysfs.

If a new /sys/class/net is not needed it is possible to bind mount the
existing copy of sysfs to the new location without loss of
functionality.

> It is not redundant. The whole point is that after clone(CLONE_NEWUSER)
> you get a newly filled set of capabilities. But you should not have
> privileges over the host's network namespace. After you unshare a new
> network namespace, you *should* have privilege over it. So the fact
> that we've already checked CAP_SYS_ADMIN means nothing, because the
> capabilities need to be targeted.

Exactly. The tests are failing because the caller is not the global root,
and so the code is properly failing the permission checks.

Eric
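
Editorial sketch: the choice between a fresh sysfs mount and a bind mount could look roughly like this in a container runtime. It is a minimal illustration, not libvirt's actual code. The names `sysfs_mount_plan`, `plan_sysfs_mount` and `apply_sysfs_mount` are hypothetical, and using `MS_BIND | MS_REC` (a recursive bind, so that submounts of the existing sysfs come along) is an assumption about what such a runtime would want.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mount.h> /* mount(2), MS_BIND, MS_REC (Linux only) */

/* Hypothetical description of how to populate the container's /sys. */
struct sysfs_mount_plan {
    const char *source;   /* first argument to mount(2) */
    const char *fstype;   /* NULL for bind mounts, where it is ignored */
    unsigned long flags;  /* mount flags */
};

/*
 * A fresh sysfs is only needed (and only permitted) when the container
 * has its own network namespace. Otherwise reuse the sysfs instance
 * that is already mounted at `existing_sysfs`.
 */
struct sysfs_mount_plan plan_sysfs_mount(bool has_private_netns,
                                         const char *existing_sysfs)
{
    struct sysfs_mount_plan plan;

    if (has_private_netns) {
        plan.source = "sysfs";
        plan.fstype = "sysfs";
        plan.flags = 0;
    } else {
        plan.source = existing_sysfs;
        plan.fstype = NULL;
        plan.flags = MS_BIND | MS_REC;
    }
    return plan;
}

/* Returns 0 on success, or -1 with errno set, exactly as mount(2) does. */
int apply_sysfs_mount(const struct sysfs_mount_plan *plan, const char *target)
{
    assert(plan != NULL && target != NULL);
    return mount(plan->source, target, plan->fstype, plan->flags, NULL);
}
```

A caller would build the plan once it knows whether a new network namespace was requested, then call `apply_sysfs_mount(&plan, "<new root>/sys")` before switching roots. Keeping the decision separate from the `mount(2)` call makes the policy easy to check without privileges.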
* RE: Could not mount sysfs when enable userns but disable netns
@ 2014-07-14 9:32 chenhanxiao-BthXqXjhjHXQFUHtdCDX3A
From: chenhanxiao-BthXqXjhjHXQFUHtdCDX3A @ 2014-07-14 9:32 UTC
To: Eric W. Biederman, Serge E. Hallyn,
'Daniel P. Berrange (berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org)'
Cc: Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm@xmission.com]
> Sent: Saturday, July 12, 2014 12:29 AM
> To: Serge E. Hallyn
> Cc: Chen, Hanxiao/陈 晗霄; Serge Hallyn (serge.hallyn@ubuntu.com); Greg
> Kroah-Hartman; containers@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org
> Subject: Re: Could not mount sysfs when enable userns but disable netns
>
> "Serge E. Hallyn" <serge@hallyn.com> writes:
>
> > Quoting chenhanxiao@cn.fujitsu.com (chenhanxiao@cn.fujitsu.com):
> >> Hello,
> >>
> >> How to reproduce:
> >> 1. Prepare a container with userns enabled and netns disabled.
> >> 2. Use libvirt-lxc to start the container.
> >> 3. libvirt cannot mount sysfs and fails to start the container.
> >>
> >> Then I found that
> >> commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
> >> "Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
> >> over the net namespace."
> >>
> >> But why should we check sysfs mount permission against the net namespace?
> >> We've already checked CAP_SYS_ADMIN, though.
>
> We already checked capable(CAP_SYS_ADMIN) and it failed.

But on my machine, the capable(CAP_SYS_ADMIN) check passed
and the mount failed in kobj_ns_current_may_mount.

I added some printks in sysfs_mount:

 if (!(flags & MS_KERNMOUNT)) {
-    if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
+    if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type)) {
+        printk(KERN_WARNING "Failed in capable\n");
         return ERR_PTR(-EPERM);
+    }

-    if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
+    if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET)) {
+        printk(KERN_WARNING "Failed in kobj_ns_current_may_mount\n");
         return ERR_PTR(-EPERM);
+    }

And found:

Jul 14 09:55:26 localhost systemd: Starting Container lxc-chx.
Jul 14 09:55:26 localhost systemd-machined: New machine lxc-chx.
Jul 14 09:55:26 localhost systemd: Started Container lxc-chx.
Jul 14 09:55:26 localhost kernel: [ 784.044709] Failed in kobj_ns_current_may_mount
Jul 14 09:55:26 localhost systemd-machined: Machine lxc-chx terminated.

> >
> >> What is the relationship between sysfs and the net namespace?
> >> Or is this check a little redundant?
>
> You want a bind mount, not a new fresh mount.
>

Yes, we need to modify libvirt's code to handle sysfs
when userns is enabled but netns is disabled.

Thanks,
- Chen

> When looking at how evil actors could abuse things it turned out that in
> some circumstances the root user (before a user namespace is created)
> needs to control the policy on which filesystems may be mounted. There
> are files in sysfs and in proc that you never want to see in a chroot
> jail, as they just create more surface area to attack.
>
> The only reason for creating a new fresh mount of sysfs is to get access
> to /sys/class/net. So to keep things simple we restrict creation of
> that mount to cases where the mounter has permissions over the network
> namespace, and cases where nothing interesting is mounted on top of
> sysfs.
>
> If a new /sys/class/net is not needed it is possible to bind mount the
> existing copy of sysfs to the new location without loss of
> functionality.
>
> > It is not redundant. The whole point is that after clone(CLONE_NEWUSER)
> > you get a newly filled set of capabilities. But you should not have
> > privileges over the host's network namespace. After you unshare a new
> > network namespace, you *should* have privilege over it. So the fact
> > that we've already checked CAP_SYS_ADMIN means nothing, because the
> > capabilities need to be targeted.
>
> Exactly. The tests are failing because the caller is not the global root,
> and so the code is properly failing the permission checks.
>
> Eric
* Re: Could not mount sysfs when enable userns but disable netns
@ 2014-07-14 17:23 Eric W. Biederman
From: Eric W. Biederman @ 2014-07-14 17:23 UTC
To: chenhanxiao@cn.fujitsu.com
Cc: Greg Kroah-Hartman,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

"chenhanxiao@cn.fujitsu.com" <chenhanxiao@cn.fujitsu.com> writes:

>> -----Original Message-----
>> From: Eric W. Biederman [mailto:ebiederm@xmission.com]
>> Sent: Saturday, July 12, 2014 12:29 AM
>> To: Serge E. Hallyn
>> Cc: Chen, Hanxiao/陈 晗霄; Serge Hallyn (serge.hallyn@ubuntu.com); Greg
>> Kroah-Hartman; containers@lists.linux-foundation.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: Could not mount sysfs when enable userns but disable netns
>>
>> "Serge E. Hallyn" <serge@hallyn.com> writes:
>>
>> > Quoting chenhanxiao@cn.fujitsu.com (chenhanxiao@cn.fujitsu.com):
>> >> Hello,
>> >>
>> >> How to reproduce:
>> >> 1. Prepare a container with userns enabled and netns disabled.
>> >> 2. Use libvirt-lxc to start the container.
>> >> 3. libvirt cannot mount sysfs and fails to start the container.
>> >>
>> >> Then I found that
>> >> commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
>> >> "Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
>> >> over the net namespace."
>> >>
>> >> But why should we check sysfs mount permission against the net namespace?
>> >> We've already checked CAP_SYS_ADMIN, though.
>>
>> We already checked capable(CAP_SYS_ADMIN) and it failed.
>
> But on my machine, the capable(CAP_SYS_ADMIN) check passed
> and the mount failed in kobj_ns_current_may_mount.

No. capable(CAP_SYS_ADMIN) did not pass. fs_fully_visible did pass.
There is a significant distinction.

If capable(CAP_SYS_ADMIN) had passed, kobj_ns_current_may_mount (which is
a fancy way of saying ns_capable(net->user_ns, CAP_SYS_ADMIN)) would also
have passed.

> I added some printks in sysfs_mount:
>  if (!(flags & MS_KERNMOUNT)) {
> -    if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
> +    if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type)) {
> +        printk(KERN_WARNING "Failed in capable\n");
>          return ERR_PTR(-EPERM);
> +    }
>
> -    if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
> +    if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET)) {
> +        printk(KERN_WARNING "Failed in kobj_ns_current_may_mount\n");
>          return ERR_PTR(-EPERM);
> +    }
>
> And found:
> Jul 14 09:55:26 localhost systemd: Starting Container lxc-chx.
> Jul 14 09:55:26 localhost systemd-machined: New machine lxc-chx.
> Jul 14 09:55:26 localhost systemd: Started Container lxc-chx.
> Jul 14 09:55:26 localhost kernel: [ 784.044709] Failed in kobj_ns_current_may_mount
> Jul 14 09:55:26 localhost systemd-machined: Machine lxc-chx terminated.
>
>> >> What is the relationship between sysfs and the net namespace?
>> >> Or is this check a little redundant?
>>
>> You want a bind mount, not a new fresh mount.
>>
>
> Yes, we need to modify libvirt's code to handle sysfs
> when userns is enabled but netns is disabled.

Please go for it. I don't have any insight into libvirt, so I can't help
you there.

Eric
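
Editorial sketch: the distinction drawn above is easier to see when the two checks in sysfs_mount are modelled as a small decision function. This is a userspace model rather than the kernel function. The type and function names (`mounter`, `sysfs_mount_result`, `model_sysfs_mount`) are hypothetical, and the three booleans stand in for the results of capable(CAP_SYS_ADMIN), fs_fully_visible() and kobj_ns_current_may_mount() as described in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which of the two checks rejected the mount, if any. */
enum sysfs_mount_result {
    SYSFS_MOUNT_OK,
    SYSFS_MOUNT_EPERM_FIRST_CHECK, /* not capable AND sysfs not fully visible */
    SYSFS_MOUNT_EPERM_NETNS        /* no CAP_SYS_ADMIN over the net namespace */
};

/* Hypothetical summary of what the kernel would find for the caller. */
struct mounter {
    bool capable_sys_admin;     /* capable(CAP_SYS_ADMIN) */
    bool sysfs_fully_visible;   /* fs_fully_visible(fs_type) */
    bool ns_capable_over_netns; /* ns_capable(net->user_ns, CAP_SYS_ADMIN) */
};

/* Mirrors the order of the two checks quoted in the thread. */
enum sysfs_mount_result model_sysfs_mount(const struct mounter *m)
{
    assert(m != NULL);

    /* The first check passes if EITHER condition holds, so passing it
     * does not prove that capable(CAP_SYS_ADMIN) succeeded. */
    if (!m->capable_sys_admin && !m->sysfs_fully_visible)
        return SYSFS_MOUNT_EPERM_FIRST_CHECK;

    if (!m->ns_capable_over_netns)
        return SYSFS_MOUNT_EPERM_NETNS;

    return SYSFS_MOUNT_OK;
}
```

With `{ false, true, false }` (container root, visible sysfs, host network namespace) the model returns `SYSFS_MOUNT_EPERM_NETNS`, matching the log above: the first check was satisfied by visibility alone. The model treats the three inputs as independent. In the kernel, as noted above, a caller for whom capable(CAP_SYS_ADMIN) succeeds would pass the network namespace check as well.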