public inbox for linux-kernel@vger.kernel.org
* ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
@ 2014-04-29 13:49 Marian Marinov
  2014-04-29 18:35 ` Theodore Ts'o
  0 siblings, 1 reply; 28+ messages in thread
From: Marian Marinov @ 2014-04-29 13:49 UTC
  To: containers, LXC development mailing-list,
	linux-kernel@vger.kernel.org

Hello,
While using user namespaces, I found a bug in the capability checks done by ioctl.

If someone runs chattr +i from inside a different (non-initial) user namespace, they get the following:

ioctl(3, EXT2_IOC_SETFLAGS, 0x7fffa4fedacc) = -1 EPERM (Operation not permitted)
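For reference, chattr +i is roughly a read-modify-write of the inode flags through two ioctls. The sketch below assumes a Linux system with <linux/fs.h>; as far as I know, the request strace decodes as EXT2_IOC_SETFLAGS has the same number as the generic FS_IOC_SETFLAGS used here. The helper names are made up for illustration.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IMMUTABLE_FL */

/* Hypothetical helper: compute the new flag word from the old one. */
static unsigned int with_immutable(unsigned int oldflags, int on)
{
	return on ? (oldflags | FS_IMMUTABLE_FL) : (oldflags & ~FS_IMMUTABLE_FL);
}

/* Hypothetical helper: read the current flags, toggle FS_IMMUTABLE_FL and
 * write them back. Returns 0 on success or -errno, e.g. -EPERM when the
 * kernel's CAP_LINUX_IMMUTABLE check fails as in the strace output above. */
static int set_immutable(int fd, int on)
{
	int flags;  /* the kernel transfers an int here despite the header's long */

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -errno;
	flags = (int)with_immutable((unsigned int)flags, on);
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		return -errno;
	return 0;
}
```

Called on an ext4 file from a process that lacks CAP_LINUX_IMMUTABLE in init_user_ns, set_immutable(fd, 1) would be expected to return -EPERM, matching the trace.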

I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE) check with 
ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).

If you agree I can send patches for all filesystems.

I'm proposing the following patch:

---
  fs/ext4/ioctl.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index d011b69..25683d0 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -265,7 +265,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
                  * This test looks nicer. Thanks to Pauline Middelink
                  */
                 if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
-                       if (!capable(CAP_LINUX_IMMUTABLE))
+                       if (!ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE))
                                 goto flags_out;
                 }

-- 
1.8.4
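The guard in this hunk only demands CAP_LINUX_IMMUTABLE when the append-only or immutable bit actually changes. A standalone sketch of that test, using the generic FS_APPEND_FL/FS_IMMUTABLE_FL from <linux/fs.h> (which, as far as I know, share their values with EXT4_APPEND_FL/EXT4_IMMUTABLE_FL); the function name is hypothetical:

```c
#include <stdbool.h>
#include <linux/fs.h>   /* FS_APPEND_FL, FS_IMMUTABLE_FL */

/* True when going from oldflags to newflags flips FS_APPEND_FL or
 * FS_IMMUTABLE_FL, i.e. when this branch would demand CAP_LINUX_IMMUTABLE. */
static bool needs_linux_immutable(unsigned int oldflags, unsigned int newflags)
{
	/* XOR keeps exactly the bits that differ; mask to the two guarded ones. */
	return ((oldflags ^ newflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) != 0;
}
```

So a flag update that leaves both bits as they were, such as adding FS_NOATIME_FL to an already-immutable file, does not trigger this particular check.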


-- 
Marian Marinov
Founder & CEO of 1H Ltd.
Jabber/GTalk: hackman@jabber.org
ICQ: 7556201
Mobile: +359 886 660 270


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 13:49 ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace Marian Marinov
@ 2014-04-29 18:35 ` Theodore Ts'o
  2014-04-29 18:52   ` Serge Hallyn
  0 siblings, 1 reply; 28+ messages in thread
From: Theodore Ts'o @ 2014-04-29 18:35 UTC
  To: Marian Marinov
  Cc: containers, LXC development mailing-list,
	linux-kernel@vger.kernel.org

On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
> 
> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).

Um, wouldn't it be better to simply fix the capable() function?

/**
 * capable - Determine if the current task has a superior capability in effect
 * @cap: The capability to be tested for
 *
 * Return true if the current task has the given superior capability currently
 * available for use, false if not.
 *
 * This sets PF_SUPERPRIV on the task if the capability is available on the
 * assumption that it's about to be used.
 */
bool capable(int cap)
{
	return ns_capable(&init_user_ns, cap);
}
EXPORT_SYMBOL(capable);

The documentation states that it is for "the current task", and I
can't imagine any use case where user namespaces are in effect and
using init_user_ns would ever make sense.

No?  Otherwise, pretty much every single use of capable() would be
broken, not just this one instance in ext4/ioctl.c.

					- Ted


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 18:35 ` Theodore Ts'o
@ 2014-04-29 18:52   ` Serge Hallyn
  2014-04-29 21:49     ` Marian Marinov
  0 siblings, 1 reply; 28+ messages in thread
From: Serge Hallyn @ 2014-04-29 18:52 UTC
  To: Theodore Ts'o, Marian Marinov, containers,
	LXC development mailing-list, linux-kernel@vger.kernel.org

Quoting Theodore Ts'o (tytso@mit.edu):
> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
> > 
> > I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
> > check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
> 
> Um, wouldn't it be better to simply fix the capable() function?
> 
> /**
>  * capable - Determine if the current task has a superior capability in effect
>  * @cap: The capability to be tested for
>  *
>  * Return true if the current task has the given superior capability currently
>  * available for use, false if not.
>  *
>  * This sets PF_SUPERPRIV on the task if the capability is available on the
>  * assumption that it's about to be used.
>  */
> bool capable(int cap)
> {
> 	return ns_capable(&init_user_ns, cap);
> }
> EXPORT_SYMBOL(capable);
> 
> The documentation states that it is for "the current task", and I
> can't imagine any use case, where user namespaces are in effect, where
> using init_user_ns would ever make sense.

The init_user_ns represents the user_ns owning the object, not the
subject.
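A small userspace model may help make the subject/object distinction concrete. It loosely mirrors the loop in the 3.x-era cap_capable() (security/commoncap.c) that ns_capable() ends up in; every type and name is a simplified stand-in invented for this sketch, and a single bool replaces the effective capability set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Model types, not kernel structures. */
struct model_userns {
	const struct model_userns *parent;  /* NULL only for init_user_ns */
	uid_t owner;                        /* global uid (kuid) of the creator */
};

struct model_cred {
	const struct model_userns *user_ns; /* namespace the task lives in */
	uid_t euid;                         /* global uid, like a kuid_t */
	bool has_cap;                       /* capability is in cap_effective */
};

static const struct model_userns model_init_user_ns = { NULL, 0 };

/* Roughly the cap_capable() walk: start at the target (object) namespace
 * and move toward init_user_ns looking for the subject's namespace. */
static bool model_ns_capable(const struct model_cred *cred,
			     const struct model_userns *targ)
{
	for (const struct model_userns *ns = targ; ; ns = ns->parent) {
		if (ns == cred->user_ns)
			return cred->has_cap;  /* same ns: use the task's own caps */
		if (ns == &model_init_user_ns)
			return false;          /* reached the top without a match */
		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return true;
	}
}

/* capable(cap) is ns_capable(&init_user_ns, cap). */
static bool model_capable(const struct model_cred *cred)
{
	return model_ns_capable(cred, &model_init_user_ns);
}
```

With a container root whose namespace was created by host uid 1000, model_ns_capable() against its own namespace succeeds while model_capable() fails, which is the EPERM in the original report; plain uid 1000 on the host also counts as capable inside that child namespace because it owns it.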

The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->user_ns,
CAP_LINUX_IMMUTABLE)' by definition.

So NACK to that particular patch.  I'm not sure, but IIUC it should be
safe to check against the userns owning the inode?
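This objection is easy to check from userspace. The sketch below uses unshare(CLONE_NEWUSER) rather than the clone()/setuid()/execve() sequence described above, but for this purpose the effect is the same: the caller ends up with a full effective capability set in the new namespace, so an ns_capable(current_cred()->user_ns, ...) check passes without any privilege on the host. It assumes Linux with user namespaces enabled; many container sandboxes reject unshare() with EPERM, which the code reports as -1. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>              /* unshare, CLONE_NEWUSER */
#include <sys/syscall.h>        /* SYS_capget */
#include <unistd.h>             /* syscall */
#include <linux/capability.h>   /* cap header/data structs, CAP_LINUX_IMMUTABLE */

/* 1 if `cap` is in the calling thread's effective set, 0 if not, -1 on error.
 * Uses the raw capget(2) syscall so no libcap is needed. */
static int cap_effective(int cap)
{
	struct __user_cap_header_struct hdr = { _LINUX_CAPABILITY_VERSION_3, 0 };
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;
	return (data[cap / 32].effective >> (cap % 32)) & 1;
}

/* Enter a new user namespace, then report whether CAP_LINUX_IMMUTABLE is
 * effective. Returns -1 if unshare() is refused. */
static int immutable_cap_after_unshare(void)
{
	if (unshare(CLONE_NEWUSER) != 0)
		return -1;
	return cap_effective(CAP_LINUX_IMMUTABLE);
}
```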

> No?  Otherwise, pretty much every single use of capable() would be
> broken, not just this once instances in ext4/ioctl.c.
> 
> 					- Ted


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 18:52   ` Serge Hallyn
@ 2014-04-29 21:49     ` Marian Marinov
  2014-04-29 22:02       ` Serge Hallyn
  0 siblings, 1 reply; 28+ messages in thread
From: Marian Marinov @ 2014-04-29 21:49 UTC
  To: Serge Hallyn, Theodore Ts'o, containers,
	LXC development mailing-list, linux-kernel@vger.kernel.org

On 04/29/2014 09:52 PM, Serge Hallyn wrote:
> Quoting Theodore Ts'o (tytso@mit.edu):
>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
>>>
>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
>>
>> Um, wouldn't it be better to simply fix the capable() function?
>>
>> /**
>>   * capable - Determine if the current task has a superior capability in effect
>>   * @cap: The capability to be tested for
>>   *
>>   * Return true if the current task has the given superior capability currently
>>   * available for use, false if not.
>>   *
>>   * This sets PF_SUPERPRIV on the task if the capability is available on the
>>   * assumption that it's about to be used.
>>   */
>> bool capable(int cap)
>> {
>> 	return ns_capable(&init_user_ns, cap);
>> }
>> EXPORT_SYMBOL(capable);
>>
>> The documentation states that it is for "the current task", and I
>> can't imagine any use case, where user namespaces are in effect, where
>> using init_user_ns would ever make sense.
>
> the init_user_ns represents the user_ns owning the object, not the
> subject.
>
> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
> CAP_SYS_IMMUTABLE)' by definition.
>
> So NACK to that particular patch.  I'm not sure, but IIUC it should be
> safe to check against the userns owning the inode?
>

So what you are proposing is to replace 'ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE)' with
'inode_capable(inode, CAP_LINUX_IMMUTABLE)'?

I agree that this is more sane.

Marian

>> No?  Otherwise, pretty much every single use of capable() would be
>> broken, not just this once instances in ext4/ioctl.c.
>>
>> 					- Ted
>




* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 21:49     ` Marian Marinov
@ 2014-04-29 22:02       ` Serge Hallyn
  2014-04-29 22:24         ` Marian Marinov
  0 siblings, 1 reply; 28+ messages in thread
From: Serge Hallyn @ 2014-04-29 22:02 UTC
  To: Marian Marinov
  Cc: Theodore Ts'o, containers, LXC development mailing-list,
	linux-kernel@vger.kernel.org

Quoting Marian Marinov (mm@1h.com):
> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
> >Quoting Theodore Ts'o (tytso@mit.edu):
> >>On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
> >>>
> >>>I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
> >>>check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
> >>
> >>Um, wouldn't it be better to simply fix the capable() function?
> >>
> >>/**
> >>  * capable - Determine if the current task has a superior capability in effect
> >>  * @cap: The capability to be tested for
> >>  *
> >>  * Return true if the current task has the given superior capability currently
> >>  * available for use, false if not.
> >>  *
> >>  * This sets PF_SUPERPRIV on the task if the capability is available on the
> >>  * assumption that it's about to be used.
> >>  */
> >>bool capable(int cap)
> >>{
> >>	return ns_capable(&init_user_ns, cap);
> >>}
> >>EXPORT_SYMBOL(capable);
> >>
> >>The documentation states that it is for "the current task", and I
> >>can't imagine any use case, where user namespaces are in effect, where
> >>using init_user_ns would ever make sense.
> >
> >the init_user_ns represents the user_ns owning the object, not the
> >subject.
> >
> >The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
> >setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
> >CAP_SYS_IMMUTABLE)' by definition.
> >
> >So NACK to that particular patch.  I'm not sure, but IIUC it should be
> >safe to check against the userns owning the inode?
> >
> 
> So what you are proposing is to replace 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
> 
> I agree that this is more sane.

Right, and I think the two operations you're looking at are sane
to allow.

thanks,
-serge


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 22:02       ` Serge Hallyn
@ 2014-04-29 22:24         ` Marian Marinov
  2014-04-29 22:29           ` Serge Hallyn
  0 siblings, 1 reply; 28+ messages in thread
From: Marian Marinov @ 2014-04-29 22:24 UTC
  To: Serge Hallyn
  Cc: Theodore Ts'o, containers, LXC development mailing-list,
	linux-kernel@vger.kernel.org

On 04/30/2014 01:02 AM, Serge Hallyn wrote:
> Quoting Marian Marinov (mm@1h.com):
>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
>>> Quoting Theodore Ts'o (tytso@mit.edu):
>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
>>>>>
>>>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
>>>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
>>>>
>>>> Um, wouldn't it be better to simply fix the capable() function?
>>>>
>>>> /**
>>>>   * capable - Determine if the current task has a superior capability in effect
>>>>   * @cap: The capability to be tested for
>>>>   *
>>>>   * Return true if the current task has the given superior capability currently
>>>>   * available for use, false if not.
>>>>   *
>>>>   * This sets PF_SUPERPRIV on the task if the capability is available on the
>>>>   * assumption that it's about to be used.
>>>>   */
>>>> bool capable(int cap)
>>>> {
>>>> 	return ns_capable(&init_user_ns, cap);
>>>> }
>>>> EXPORT_SYMBOL(capable);
>>>>
>>>> The documentation states that it is for "the current task", and I
>>>> can't imagine any use case, where user namespaces are in effect, where
>>>> using init_user_ns would ever make sense.
>>>
>>> the init_user_ns represents the user_ns owning the object, not the
>>> subject.
>>>
>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
>>> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
>>> CAP_SYS_IMMUTABLE)' by definition.
>>>
>>> So NACK to that particular patch.  I'm not sure, but IIUC it should be
>>> safe to check against the userns owning the inode?
>>>
>>
>> So what you are proposing is to replace 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
>>
>> I agree that this is more sane.
>
> Right, and I think the two operations you're looking at seem sane
> to allow.

If you are OK with this patch, I will fix all filesystems and send patches.

Signed-off-by: Marian Marinov <mm@yuhu.biz>
---
  fs/ext4/ioctl.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index d011b69..9418634 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -265,7 +265,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
                  * This test looks nicer. Thanks to Pauline Middelink
                  */
                 if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
-                       if (!capable(CAP_LINUX_IMMUTABLE))
+                       if (!inode_capable(inode, CAP_LINUX_IMMUTABLE))
                                 goto flags_out;
                 }

---
1.8.4

Marian


>
> thanks,
> -serge
>




* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 22:24         ` Marian Marinov
@ 2014-04-29 22:29           ` Serge Hallyn
  2014-04-29 22:45             ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Serge Hallyn @ 2014-04-29 22:29 UTC
  To: Marian Marinov
  Cc: Theodore Ts'o, containers, LXC development mailing-list,
	linux-kernel@vger.kernel.org

Quoting Marian Marinov (mm@1h.com):
> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
> >Quoting Marian Marinov (mm@1h.com):
> >>On 04/29/2014 09:52 PM, Serge Hallyn wrote:
> >>>Quoting Theodore Ts'o (tytso@mit.edu):
> >>>>On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
> >>>>>
> >>>>>I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
> >>>>>check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
> >>>>
> >>>>Um, wouldn't it be better to simply fix the capable() function?
> >>>>
> >>>>/**
> >>>>  * capable - Determine if the current task has a superior capability in effect
> >>>>  * @cap: The capability to be tested for
> >>>>  *
> >>>>  * Return true if the current task has the given superior capability currently
> >>>>  * available for use, false if not.
> >>>>  *
> >>>>  * This sets PF_SUPERPRIV on the task if the capability is available on the
> >>>>  * assumption that it's about to be used.
> >>>>  */
> >>>>bool capable(int cap)
> >>>>{
> >>>>	return ns_capable(&init_user_ns, cap);
> >>>>}
> >>>>EXPORT_SYMBOL(capable);
> >>>>
> >>>>The documentation states that it is for "the current task", and I
> >>>>can't imagine any use case, where user namespaces are in effect, where
> >>>>using init_user_ns would ever make sense.
> >>>
> >>>the init_user_ns represents the user_ns owning the object, not the
> >>>subject.
> >>>
> >>>The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
> >>>setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
> >>>CAP_SYS_IMMUTABLE)' by definition.
> >>>
> >>>So NACK to that particular patch.  I'm not sure, but IIUC it should be
> >>>safe to check against the userns owning the inode?
> >>>
> >>
> >>So what you are proposing is to replace 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
> >>'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
> >>
> >>I agree that this is more sane.
> >
> >Right, and I think the two operations you're looking at seem sane
> >to allow.
> 
> If you are ok with this patch, I will fix all file systems and send patches.

Sounds good, thanks.

> Signed-off-by: Marian Marinov <mm@yuhu.biz>

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

> ---
>  fs/ext4/ioctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index d011b69..9418634 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -265,7 +265,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>                  * This test looks nicer. Thanks to Pauline Middelink
>                  */
>                 if ((flags ^ oldflags) & (EXT4_APPEND_FL | EXT4_IMMUTABLE_FL)) {
> -                   if (!capable(CAP_LINUX_IMMUTABLE))
> +                 if (!inode_capable(inode, CAP_LINUX_IMMUTABLE))
>                                 goto flags_out;
>                 }
> 
> ---
> 1.8.4
> 
> Marian
> 
> 
> >
> >thanks,
> >-serge
> >
> 
> 


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 22:29           ` Serge Hallyn
@ 2014-04-29 22:45             ` Andy Lutomirski
  2014-04-29 23:06               ` Theodore Ts'o
                                 ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-29 22:45 UTC
  To: Serge Hallyn, Marian Marinov
  Cc: containers, Ted Ts'o, Linux Kernel Mailing List, lxc-devel

On 04/29/2014 03:29 PM, Serge Hallyn wrote:
> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@public.gmane.org):
>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
>>>>>>>
>>>>>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
>>>>>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
>>>>>>
>>>>>> Um, wouldn't it be better to simply fix the capable() function?
>>>>>>
>>>>>> /**
>>>>>>  * capable - Determine if the current task has a superior capability in effect
>>>>>>  * @cap: The capability to be tested for
>>>>>>  *
>>>>>>  * Return true if the current task has the given superior capability currently
>>>>>>  * available for use, false if not.
>>>>>>  *
>>>>>>  * This sets PF_SUPERPRIV on the task if the capability is available on the
>>>>>>  * assumption that it's about to be used.
>>>>>>  */
>>>>>> bool capable(int cap)
>>>>>> {
>>>>>> 	return ns_capable(&init_user_ns, cap);
>>>>>> }
>>>>>> EXPORT_SYMBOL(capable);
>>>>>>
>>>>>> The documentation states that it is for "the current task", and I
>>>>>> can't imagine any use case, where user namespaces are in effect, where
>>>>>> using init_user_ns would ever make sense.
>>>>>
>>>>> the init_user_ns represents the user_ns owning the object, not the
>>>>> subject.
>>>>>
>>>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
>>>>> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
>>>>> CAP_SYS_IMMUTABLE)' by definition.
>>>>>
>>>>> So NACK to that particular patch.  I'm not sure, but IIUC it should be
>>>>> safe to check against the userns owning the inode?
>>>>>
>>>>
>>>> So what you are proposing is to replace 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
>>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
>>>>
>>>> I agree that this is more sane.
>>>
>>> Right, and I think the two operations you're looking at seem sane
>>> to allow.
>>
>> If you are ok with this patch, I will fix all file systems and send patches.
> 
> Sounds good, thanks.
> 
>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
> 
> Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>

Wait, what?

Inodes aren't owned by user namespaces; they're owned by users.  And any
user can arrange to have a user namespace in which they pass an
inode_capable check on any inode that they own.

Presumably there's a reason that CAP_LINUX_IMMUTABLE is needed.  If this
gets merged, then it would be better to just drop CAP_LINUX_IMMUTABLE
entirely.

Nacked-by: Andy Lutomirski <luto@amacapital.net>
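The objection can be seen with a model of inode_capable(). As far as I recall, in kernels of this era it was ns_capable(current_user_ns(), cap) && kuid_has_mapping(current_user_ns(), inode->i_uid); the structs below are simplified stand-ins invented for this sketch, with a single uid_map extent and namespaces assumed to be direct children of init_user_ns.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Model user namespace with one uid_map line: "first lower_first count". */
struct model_userns {
	bool is_init;                     /* true only for init_user_ns */
	uid_t first, lower_first, count;  /* ignored for init_user_ns */
};

struct model_cred {
	const struct model_userns *user_ns;
	bool has_cap;                     /* cap present in cap_effective */
};

static const struct model_userns model_init_user_ns = { true, 0, 0, 0 };

/* kuid_has_mapping(): does global uid `kuid` map into `ns`? */
static bool model_kuid_has_mapping(const struct model_userns *ns, uid_t kuid)
{
	if (ns->is_init)
		return true;
	return kuid >= ns->lower_first && kuid - ns->lower_first < ns->count;
}

/* inode_capable() as modeled here: ns_capable() on the caller's own
 * namespace (which reduces to its effective set) plus a mapped owner. */
static bool model_inode_capable(const struct model_cred *cred, uid_t i_uid)
{
	return cred->has_cap && model_kuid_has_mapping(cred->user_ns, i_uid);
}
```

An unprivileged uid 1000 that maps itself to root in a fresh namespace passes the check on its own files but not on root's, so under the proposed patch any user could set or clear the immutable flag on files they own.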




* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 22:45             ` Andy Lutomirski
@ 2014-04-29 23:06               ` Theodore Ts'o
  2014-04-29 23:07                 ` Andy Lutomirski
  2014-04-29 23:20               ` Marian Marinov
  2014-04-30  0:16               ` Serge Hallyn
  2 siblings, 1 reply; 28+ messages in thread
From: Theodore Ts'o @ 2014-04-29 23:06 UTC
  To: Andy Lutomirski
  Cc: Serge Hallyn, Marian Marinov, containers,
	Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 03:45:24PM -0700, Andy Lutomirski wrote:
> 
> Wait, what?
> 
> Inodes aren't owned by user namespaces; they're owned by users.  And any
> user can arrange to have a user namespace in which they pass an
> inode_capable check on any inode that they own.
> 
> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed.  If this
> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE
> entirely.
> 
> Nacked-by: Andy Lutomirski <luto@amacapital.net>

Right, but you can't set a mapping in a child namespace unless you
have CAP_SETUID in the parent namespace, right?  Otherwise user
namespaces are completely broken from a security perspective, since
inode_capable() could never do the right thing.

Personally, reading how user namespaces work, it makes the hair rise
on the back of my neck.  I'm not sure the concept works at all from a
security perspective, but hey, I'm not using user namespaces, and some
fool thought it was worth merging.  :-)

						- Ted




* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 23:06               ` Theodore Ts'o
@ 2014-04-29 23:07                 ` Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-29 23:07 UTC
  To: Theodore Ts'o, Andy Lutomirski, Serge Hallyn, Marian Marinov,
	Linux Containers, Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 4:06 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Tue, Apr 29, 2014 at 03:45:24PM -0700, Andy Lutomirski wrote:
>>
>> Wait, what?
>>
>> Inodes aren't owned by user namespaces; they're owned by users.  And any
>> user can arrange to have a user namespace in which they pass an
>> inode_capable check on any inode that they own.
>>
>> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed.  If this
>> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE
>> entirely.
>>
>> Nacked-by: Andy Lutomirski <luto@amacapital.net>
>
> Right, but you can't set a mapping in a child namespace unless you
> have CAP_SETUID in the parent namespace, right?

Nope.  You can't set a mapping for someone else's uid, but you can
certainly map your own.
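A sketch of the self-mapping described here: an unprivileged process creates a user namespace and writes a one-line uid_map that maps only its own euid to 0, which needs no CAP_SETUID in the parent namespace. It assumes Linux with unprivileged user namespaces enabled; the function name and its return codes are this sketch's own convention.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Returns the uid seen after mapping (0 on success), -1 if unshare() is
 * refused, -2 if the uid_map write is rejected. */
static int map_self_to_root(void)
{
	uid_t outer = geteuid();  /* read before unshare(), while still mapped */
	char line[64];
	int fd, len, ok;

	if (unshare(CLONE_NEWUSER) != 0)
		return -1;

	/* Single extent: uid 0 inside <-> our own uid outside. Mapping any
	 * other outer uid would need CAP_SETUID in the parent namespace. */
	len = snprintf(line, sizeof(line), "0 %u 1\n", (unsigned int)outer);
	fd = open("/proc/self/uid_map", O_WRONLY);
	if (fd < 0)
		return -2;
	ok = write(fd, line, (size_t)len) == len;
	close(fd);
	return ok ? (int)getuid() : -2;
}
```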

> Otherwise user
> namespaces are completely broken from a security perspective, since
> inode_capable() could never do the right thing.

I don't know what inode_capable's "right thing" is, but at least one
of the existing callers is blatantly wrong.  Patches coming shortly.

>
> Personally, reading how user namespaces work, it makes the hair rise
> on the back of my neck.  I'm not sure the concept works at all from a
> security perspective, but hey, I'm not using user namespaces, and some
> fool thought it was worth merging.  :-)

I like them.  I've also found quite a few serious bugs in them.  So go figure :)

--Andy


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 22:45             ` Andy Lutomirski
  2014-04-29 23:06               ` Theodore Ts'o
@ 2014-04-29 23:20               ` Marian Marinov
  2014-04-29 23:22                 ` Andy Lutomirski
  2014-04-30  0:16               ` Serge Hallyn
  2 siblings, 1 reply; 28+ messages in thread
From: Marian Marinov @ 2014-04-29 23:20 UTC
  To: Andy Lutomirski, Serge Hallyn
  Cc: containers, Ted Ts'o, Linux Kernel Mailing List, lxc-devel

On 04/30/2014 01:45 AM, Andy Lutomirski wrote:
> On 04/29/2014 03:29 PM, Serge Hallyn wrote:
>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>>> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
>>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
>>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@public.gmane.org):
>>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
>>>>>>>>
>>>>>>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
>>>>>>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
>>>>>>>
>>>>>>> Um, wouldn't it be better to simply fix the capable() function?
>>>>>>>
>>>>>>> /**
>>>>>>>   * capable - Determine if the current task has a superior capability in effect
>>>>>>>   * @cap: The capability to be tested for
>>>>>>>   *
>>>>>>>   * Return true if the current task has the given superior capability currently
>>>>>>>   * available for use, false if not.
>>>>>>>   *
>>>>>>>   * This sets PF_SUPERPRIV on the task if the capability is available on the
>>>>>>>   * assumption that it's about to be used.
>>>>>>>   */
>>>>>>> bool capable(int cap)
>>>>>>> {
>>>>>>> 	return ns_capable(&init_user_ns, cap);
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(capable);
>>>>>>>
>>>>>>> The documentation states that it is for "the current task", and I
>>>>>>> can't imagine any use case, where user namespaces are in effect, where
>>>>>>> using init_user_ns would ever make sense.
>>>>>>
>>>>>> the init_user_ns represents the user_ns owning the object, not the
>>>>>> subject.
>>>>>>
>>>>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
>>>>>> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
>>>>>> CAP_SYS_IMMUTABLE)' by definition.
>>>>>>
>>>>>> So NACK to that particular patch.  I'm not sure, but IIUC it should be
>>>>>> safe to check against the userns owning the inode?
>>>>>>
>>>>>
>>>>> So what you are proposing is to replace 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
>>>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
>>>>>
>>>>> I agree that this is more sane.
>>>>
>>>> Right, and I think the two operations you're looking at seem sane
>>>> to allow.
>>>
>>> If you are ok with this patch, I will fix all file systems and send patches.
>>
>> Sounds good, thanks.
>>
>>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
>>
>> Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
>
> Wait, what?
>
> Inodes aren't owned by user namespaces; they're owned by users.  And any
> user can arrange to have a user namespace in which they pass an
> inode_capable check on any inode that they own.
>
> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed.  If this
> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE
> entirely.

The problem I'm trying to solve is this:

A container with its own user namespace and CAP_LINUX_IMMUTABLE should be able to use chattr on all files which this
container has access to.

Unfortunately, with the capable(CAP_LINUX_IMMUTABLE) check, this does not work.

With either of the two proposed fixes, CAP_LINUX_IMMUTABLE started working in the container.

The first solution took its user namespace from the currently running process, and the second takes its user namespace from
the currently opened inode.

So what would be the best solution in this case?

Marian

>
> Nacked-by: Andy Lutomirski <luto@amacapital.net>
>
>
>


-- 
Marian Marinov
Founder & CEO of 1H Ltd.
Jabber/GTalk: hackman@jabber.org
ICQ: 7556201
Mobile: +359 886 660 270


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 23:20               ` Marian Marinov
@ 2014-04-29 23:22                 ` Andy Lutomirski
  2014-04-29 23:47                   ` Stéphane Graber
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-29 23:22 UTC
  To: Marian Marinov, Eric W. Biederman
  Cc: Serge Hallyn, Linux Containers, Ted Ts'o,
	Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 4:20 PM, Marian Marinov <mm@1h.com> wrote:
> On 04/30/2014 01:45 AM, Andy Lutomirski wrote:
>>
>> On 04/29/2014 03:29 PM, Serge Hallyn wrote:
>>>
>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>>>>
>>>> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
>>>>>
>>>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>>>>>>
>>>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
>>>>>>>
>>>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@public.gmane.org):
>>>>>>>>
>>>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
>>>>>>>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
>>>>>>>>
>>>>>>>>
>>>>>>>> Um, wouldn't it be better to simply fix the capable() function?
>>>>>>>>
>>>>>>>> /**
>>>>>>>>   * capable - Determine if the current task has a superior capability in effect
>>>>>>>>   * @cap: The capability to be tested for
>>>>>>>>   *
>>>>>>>>   * Return true if the current task has the given superior capability currently
>>>>>>>>   * available for use, false if not.
>>>>>>>>   *
>>>>>>>>   * This sets PF_SUPERPRIV on the task if the capability is available on the
>>>>>>>>   * assumption that it's about to be used.
>>>>>>>>   */
>>>>>>>> bool capable(int cap)
>>>>>>>> {
>>>>>>>>         return ns_capable(&init_user_ns, cap);
>>>>>>>> }
>>>>>>>> EXPORT_SYMBOL(capable);
>>>>>>>>
>>>>>>>> The documentation states that it is for "the current task", and I
>>>>>>>> can't imagine any use case, where user namespaces are in effect, where
>>>>>>>> using init_user_ns would ever make sense.
>>>>>>>
>>>>>>>
>>>>>>> the init_user_ns represents the user_ns owning the object, not the
>>>>>>> subject.
>>>>>>>
>>>>>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
>>>>>>> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
>>>>>>> CAP_SYS_IMMUTABLE)' by definition.
>>>>>>>
>>>>>>> So NACK to that particular patch.  I'm not sure, but IIUC it should be
>>>>>>> safe to check against the userns owning the inode?
>>>>>>>
>>>>>>
>>>>>> So what you are proposing is to replace 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
>>>>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
>>>>>>
>>>>>> I agree that this is more sane.
>>>>>
>>>>>
>>>>> Right, and I think the two operations you're looking at seem sane
>>>>> to allow.
>>>>
>>>>
>>>> If you are ok with this patch, I will fix all file systems and send patches.
>>>
>>>
>>> Sounds good, thanks.
>>>
>>>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
>>>
>>>
>>> Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
>>
>>
>> Wait, what?
>>
>> Inodes aren't owned by user namespaces; they're owned by users.  And any
>> user can arrange to have a user namespace in which they pass an
>> inode_capable check on any inode that they own.
>>
>> Presumably there's a reason that CAP_LINUX_IMMUTABLE is needed.  If this
>> gets merged, then it would be better to just drop CAP_LINUX_IMMUTABLE
>> entirely.
>
>
> The problem I'm trying to solve is this:
>
> A container with its own user namespace and CAP_LINUX_IMMUTABLE should be able
> to use chattr on all files which this container has access to.
>
> Unfortunately, with the capable(CAP_LINUX_IMMUTABLE) check this does not work.
>
> With either of the two proposed fixes, CAP_LINUX_IMMUTABLE started working in
> the container.
>
> The first solution takes its user namespace from the currently running process
> and the second takes it from the inode of the currently opened file.
>
> So what would be the best solution in this case?

I'd suggest adding a mount option like fs_owner_uid that names a uid
that owns, in the sense of having unlimited access to, a filesystem.
Then anyone with caps on a namespace owned by that uid could do
whatever.

Eric?

--Andy
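To make the subject/object distinction above concrete, here is a userspace model of how a capability check is resolved against a target user namespace. It is a simplified sketch, not kernel code: it follows my understanding of the 3.x-era cap_capable() namespace walk, and the struct layouts and the model_ns_capable()/model_capable() names are hypothetical. A single capability is tracked as a bool, and all uids are treated as global (kernel) uids.

```c
/*
 * Userspace model (not kernel code) of the namespace walk behind a
 * capability check. Hypothetical, simplified types: one capability is
 * tracked as a bool and uids are global (kernel) uids.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct user_ns {
	const struct user_ns *parent; /* NULL only for the initial namespace */
	unsigned int owner;           /* global euid of the namespace creator */
};

struct cred {
	const struct user_ns *user_ns; /* namespace the task lives in */
	unsigned int euid;             /* global effective uid */
	bool cap_effective;            /* is the tested capability raised? */
};

/* True if @cred holds the capability with respect to @target. */
static bool model_ns_capable(const struct cred *cred,
			     const struct user_ns *target)
{
	const struct user_ns *ns = target;

	assert(cred != NULL && target != NULL);
	for (;;) {
		/* The task's own namespace: its effective set decides. */
		if (ns == cred->user_ns)
			return cred->cap_effective;
		/* Reached the initial namespace without meeting the task's. */
		if (ns->parent == NULL)
			return false;
		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return true;
		ns = ns->parent;
	}
}

/* capable() is the special case of checking against the initial namespace. */
static bool model_capable(const struct cred *cred,
			  const struct user_ns *init_ns)
{
	return model_ns_capable(cred, init_ns);
}
```

In this model, root in a namespace created by host uid 1000 passes a check against its own namespace, which is all that ns_capable(current_cred()->user_ns, ...) asks, yet fails capable(). Host uid 1000 without any capabilities also passes against that child namespace, because it created it.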


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 23:22                 ` Andy Lutomirski
@ 2014-04-29 23:47                   ` Stéphane Graber
  2014-04-29 23:51                     ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Stéphane Graber @ 2014-04-29 23:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Marian Marinov, Eric W. Biederman, Linux Containers, Serge Hallyn,
	Ted Ts'o, Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 04:22:55PM -0700, Andy Lutomirski wrote:
> I'd suggest adding a mount option like fs_owner_uid that names a uid
> that owns, in the sense of having unlimited access to, a filesystem.
> Then anyone with caps on a namespace owned by that uid could do
> whatever.
> 
> Eric?
> 
> --Andy

The most obvious problem I can think of with "do whatever" is that this
will likely include mknod of char and block devices which you can then
chown/chmod as you wish and use to access any devices on the system from
an unprivileged container.
This can however be mitigated by using the devices cgroup controller.

You also probably wouldn't want any unprivileged user from the host to
find a way to access that mounted filesystem but so long as you do the
mount in a separate mountns and don't share uids between the host and
the container, that should be fine too.
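The devices cgroup mitigation can be pictured as a default-deny whitelist that, as I understand it, is consulted both on mknod and on open of a device node. The sketch below is a hypothetical, simplified model of that idea, not the controller's code. The rule fields loosely mirror the "type major:minor access" lines written to devices.allow, and device 253:4 is a made-up device-mapper number standing in for a container's LVM volume.

```c
/*
 * Hypothetical model of a devices-cgroup style whitelist. Default deny:
 * a request is allowed only if a single rule covers the device type,
 * major, minor and every requested access bit.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ACC_READ = 1, ACC_WRITE = 2, ACC_MKNOD = 4 };
#define ANY_ID (-1) /* plays the role of '*' in "b 8:* rwm" */

struct dev_rule {
	char type;       /* 'a' (any), 'b' (block) or 'c' (char) */
	int major;       /* ANY_ID or a specific major number */
	int minor;       /* ANY_ID or a specific minor number */
	unsigned access; /* OR of ACC_* bits this rule grants */
};

static bool dev_access_allowed(const struct dev_rule *rules, size_t n,
			       char type, int major, int minor,
			       unsigned access)
{
	assert(type == 'b' || type == 'c');
	for (size_t i = 0; i < n; i++) {
		const struct dev_rule *r = &rules[i];

		if (r->type != 'a' && r->type != type)
			continue;
		if (r->major != ANY_ID && r->major != major)
			continue;
		if (r->minor != ANY_ID && r->minor != minor)
			continue;
		/* Every requested bit must be granted by this one rule. */
		if ((r->access & access) == access)
			return true;
	}
	return false; /* nothing matched: deny */
}
```

With a rule for /dev/null (c 1:3 rwm) and one for the container's volume (b 253:4 rw), a node for 8:0 (a host disk) can be neither created nor opened, while the container's own volume can be read and written but not re-created with mknod.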

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 23:47                   ` Stéphane Graber
@ 2014-04-29 23:51                     ` Andy Lutomirski
  2014-04-30  0:01                       ` Stéphane Graber
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-29 23:51 UTC (permalink / raw)
  To: Stéphane Graber
  Cc: Marian Marinov, Eric W. Biederman, Linux Containers, Serge Hallyn,
	Ted Ts'o, Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@ubuntu.com> wrote:
> The most obvious problem I can think of with "do whatever" is that this
> will likely include mknod of char and block devices which you can then
> chown/chmod as you wish and use to access any devices on the system from
> an unprivileged container.
> This can however be mitigated by using the devices cgroup controller.

Or 'nodev'.  setuid/setgid may have the same problem, too.

Implementing something like this would also make CAP_DAC_READ_SEARCH
and CAP_DAC_OVERRIDE work.

Arguably it should be impossible to mount such a thing in the first
place without global privilege.

>
> You also probably wouldn't want any unprivileged user from the host to
> find a way to access that mounted filesystem but so long as you do the
> mount in a separate mountns and don't share uids between the host and
> the container, that should be fine too.

This part should be a nonissue -- an unprivileged user who has the
right uid owns the namespace anyway, so this is the least of your
worries.

--Andy


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 23:51                     ` Andy Lutomirski
@ 2014-04-30  0:01                       ` Stéphane Graber
  2014-04-30  0:10                         ` Marian Marinov
  2014-04-30  0:11                         ` Andy Lutomirski
  0 siblings, 2 replies; 28+ messages in thread
From: Stéphane Graber @ 2014-04-30  0:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Marian Marinov, Eric W. Biederman, Linux Containers, Serge Hallyn,
	Ted Ts'o, Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 04:51:54PM -0700, Andy Lutomirski wrote:
> On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@ubuntu.com> wrote:
> > The most obvious problem I can think of with "do whatever" is that this
> > will likely include mknod of char and block devices which you can then
> > chown/chmod as you wish and use to access any devices on the system from
> > an unprivileged container.
> > This can however be mitigated by using the devices cgroup controller.
> 
> Or 'nodev'.  setuid/setgid may have the same problem, too.
> 
> Implementing something like this would also make CAP_DAC_READ_SEARCH
> and CAP_DAC_OVERRIDE work.
> 
> Arguably it should be impossible to mount such a thing in the first
> place without global privilege.
> 
> >
> > You also probably wouldn't want any unprivileged user from the host to
> > find a way to access that mounted filesytem but so long as you do the
> > mount in a separate mountns and don't share uids between the host and
> > the container, that should be fine too.
> 
> This part should be a nonissue -- an unprivileged user who has the
> right uid owns the namespace anyway, so this is the least of your
> worries.
> 
> --Andy

It should be a nonissue so long as we make sure that a file owned by a
uid outside the scope of the container may not be changed even though
fs_owner_uid is set. Otherwise, it's just a matter of chmod +s on, say,
a shell, and anyone who can see the fs from the host will be getting a
root shell (assuming said file is owned by the host's uid 0).

So that's restricting slightly what "do whatever" would do in this case.
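The restriction described here reduces to asking whether the file owner's uid has a mapping in the container's user namespace. A hedged sketch, assuming the container's namespace is a direct child of the initial one and using the /proc/<pid>/uid_map layout ("first lower_first count"). The helper names are hypothetical, and only uids are considered, not gids.

```c
/*
 * Is a file owner's host uid visible inside the container's user
 * namespace? Extents follow the uid_map layout "first lower_first count",
 * with lower ids assumed to be host (initial namespace) uids.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct uid_extent {
	uint32_t first;       /* first uid inside the container */
	uint32_t lower_first; /* matching host uid */
	uint32_t count;       /* number of uids in the range */
};

/* Translate host uid @host into the container; false if unmapped. */
static bool host_uid_to_container(const struct uid_extent *map, size_t n,
				  uint32_t host, uint32_t *inner)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* The subtraction cannot wrap once host >= lower_first. */
		if (host >= map[i].lower_first &&
		    host - map[i].lower_first < map[i].count) {
			if (inner != NULL)
				*inner = map[i].first + (host - map[i].lower_first);
			return true;
		}
	}
	return false;
}

/* The proposed rule: only files whose owner is mapped may be changed. */
static bool container_may_change_file(const struct uid_extent *map, size_t n,
				      uint32_t file_owner_host_uid)
{
	return host_uid_to_container(map, n, file_owner_host_uid, NULL);
}
```

With the common "0 100000 65536" mapping, a file owned by host uid 0 (for example a host shell) is out of scope and would stay untouchable even with fs_owner_uid set.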

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:01                       ` Stéphane Graber
@ 2014-04-30  0:10                         ` Marian Marinov
  2014-04-30  0:12                           ` Andy Lutomirski
  2014-04-30  0:11                         ` Andy Lutomirski
  1 sibling, 1 reply; 28+ messages in thread
From: Marian Marinov @ 2014-04-30  0:10 UTC (permalink / raw)
  To: Stéphane Graber, Andy Lutomirski
  Cc: Eric W. Biederman, Linux Containers, Serge Hallyn, Ted Ts'o,
	Linux Kernel Mailing List, lxc-devel

On 04/30/2014 03:01 AM, Stéphane Graber wrote:
> On Tue, Apr 29, 2014 at 04:51:54PM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@ubuntu.com> wrote:
>>> The most obvious problem I can think of with "do whatever" is that this
>>> will likely include mknod of char and block devices which you can then
>>> chown/chmod as you wish and use to access any devices on the system from
>>> an unprivileged container.
>>> This can however be mitigated by using the devices cgroup controller.
>>
>> Or 'nodev'.  setuid/setgid may have the same problem, too.
>>
>> Implementing something like this would also make CAP_DAC_READ_SEARCH
>> and CAP_DAC_OVERRIDE work.
>>
>> Arguably it should be impossible to mount such a thing in the first
>> place without global privilege.
>>
>>>
>>> You also probably wouldn't want any unprivileged user from the host to
>>> find a way to access that mounted filesytem but so long as you do the
>>> mount in a separate mountns and don't share uids between the host and
>>> the container, that should be fine too.
>>
>> This part should be a nonissue -- an unprivileged user who has the
>> right uid owns the namespace anyway, so this is the least of your
>> worries.
>>
>> --Andy
>
> It should be a nonissue so long as we make sure that a file owned by a
> uid outside the scope of the container may not be changed even though
> fs_owner_uid is set. Otherwise, it's just a matter of chmod +s on, say,
> a shell, and anyone who can see the fs from the host will be getting a
> root shell (assuming said file is owned by the host's uid 0).
>
> So that's restricting slightly what "do whatever" would do in this case.
>

In my case I give an LVM volume to each container and use the devices
cgroup to limit the container to only that block device.
So the inode_capable() fix worked like a charm for me.
The container cannot see any filesystem other than its own.
I also have another patch for my kernel that prohibits setns() from any
cgroup other than /, which prevents programs in the container from
getting out. clone() can still be used to create new namespaces, but not
to attach to already created ones.

Marian
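For reference, the inode_capable() check discussed in this thread combined two conditions. As far as I can tell from kernel/capability.c of this period, it returned ns_capable(current_user_ns(), cap) && kuid_has_mapping(ns, inode->i_uid). The userspace model below (hypothetical names, with the caller's namespace assumed to be a direct child of the initial one) shows why it worked for this setup and also why Andy objected.

```c
/*
 * Model of the inode_capable() semantics: the caller needs the capability
 * in its own user namespace, and the inode owner must be mapped into that
 * namespace. Lower ids in the map are assumed to be host uids.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct uid_extent {
	uint32_t first, lower_first, count; /* one uid_map line */
};

struct task_model {
	const struct uid_extent *uid_map; /* caller's namespace uid_map */
	size_t nr_extents;
	bool cap_in_own_ns;               /* capability raised in that ns */
};

static bool host_uid_is_mapped(const struct uid_extent *map, size_t n,
			       uint32_t host_uid)
{
	for (size_t i = 0; i < n; i++)
		if (host_uid >= map[i].lower_first &&
		    host_uid - map[i].lower_first < map[i].count)
			return true;
	return false;
}

static bool model_inode_capable(const struct task_model *t,
				uint32_t inode_owner_host_uid)
{
	assert(t != NULL);
	return t->cap_in_own_ns &&
	       host_uid_is_mapped(t->uid_map, t->nr_extents,
				  inode_owner_host_uid);
}
```

Container root with a "0 100000 65536" map only passes for files owned by uids in that range, but an unprivileged host user who maps just their own uid 1000 into a fresh namespace passes for every file they own, which is the objection raised earlier in the thread.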


-- 
Marian Marinov
Founder & CEO of 1H Ltd.
Jabber/GTalk: hackman@jabber.org
ICQ: 7556201
Mobile: +359 886 660 270


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:01                       ` Stéphane Graber
  2014-04-30  0:10                         ` Marian Marinov
@ 2014-04-30  0:11                         ` Andy Lutomirski
  2014-04-30  0:21                           ` Serge Hallyn
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-30  0:11 UTC (permalink / raw)
  To: Stéphane Graber
  Cc: Marian Marinov, Eric W. Biederman, Linux Containers, Serge Hallyn,
	Ted Ts'o, Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 5:01 PM, Stéphane Graber <stgraber@ubuntu.com> wrote:
> On Tue, Apr 29, 2014 at 04:51:54PM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@ubuntu.com> wrote:
>> > On Tue, Apr 29, 2014 at 04:22:55PM -0700, Andy Lutomirski wrote:
>> >> On Tue, Apr 29, 2014 at 4:20 PM, Marian Marinov <mm@1h.com> wrote:
>> >> > On 04/30/2014 01:45 AM, Andy Lutomirski wrote:
>> >> >>
>> >> >> On 04/29/2014 03:29 PM, Serge Hallyn wrote:
>> >> >>>
>> >> >>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>> >> >>>>
>> >> >>>> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
>> >> >>>>>
>> >> >>>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>> >> >>>>>>
>> >> >>>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
>> >> >>>>>>>
>> >> >>>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@public.gmane.org):
>> >> >>>>>>>>
>> >> >>>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> I'm proposing a fix to this, by replacing the
>> >> >>>>>>>>> capable(CAP_LINUX_IMMUTABLE)
>> >> >>>>>>>>> check with ns_capable(current_cred()->user_ns,
>> >> >>>>>>>>> CAP_LINUX_IMMUTABLE).
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> Um, wouldn't it be better to simply fix the capable() function?
>> >> >>>>>>>>
>> >> >>>>>>>> /**
>> >> >>>>>>>>   * capable - Determine if the current task has a superior
>> >> >>>>>>>> capability in effect
>> >> >>>>>>>>   * @cap: The capability to be tested for
>> >> >>>>>>>>   *
>> >> >>>>>>>>   * Return true if the current task has the given superior
>> >> >>>>>>>> capability currently
>> >> >>>>>>>>   * available for use, false if not.
>> >> >>>>>>>>   *
>> >> >>>>>>>>   * This sets PF_SUPERPRIV on the task if the capability is
>> >> >>>>>>>> available on the
>> >> >>>>>>>>   * assumption that it's about to be used.
>> >> >>>>>>>>   */
>> >> >>>>>>>> bool capable(int cap)
>> >> >>>>>>>> {
>> >> >>>>>>>>         return ns_capable(&init_user_ns, cap);
>> >> >>>>>>>> }
>> >> >>>>>>>> EXPORT_SYMBOL(capable);
>> >> >>>>>>>>
>> >> >>>>>>>> The documentation states that it is for "the current task", and I
>> >> >>>>>>>> can't imagine any use case, where user namespaces are in effect,
>> >> >>>>>>>> where
>> >> >>>>>>>> using init_user_ns would ever make sense.
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> the init_user_ns represents the user_ns owning the object, not the
>> >> >>>>>>> subject.
>> >> >>>>>>>
>> >> >>>>>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
>> >> >>>>>>> setuid(0), execve, and end up satisfying
>> >> >>>>>>> 'ns_capable(current_cred()->userns,
>> >> >>>>>>> CAP_SYS_IMMUTABLE)' by definition.
>> >> >>>>>>>
>> >> >>>>>>> So NACK to that particular patch.  I'm not sure, but IIUC it should
>> >> >>>>>>> be
>> >> >>>>>>> safe to check against the userns owning the inode?
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>> So what you are proposing is to replace
>> >> >>>>>> 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
>> >> >>>>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
>> >> >>>>>>
>> >> >>>>>> I agree that this is more sane.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Right, and I think the two operations you're looking at seem sane
>> >> >>>>> to allow.
>> >> >>>>
>> >> >>>>
>> >> >>>> If you are ok with this patch, I will fix all file systems and send
>> >> >>>> patches.
>> >> >>>
>> >> >>>
>> >> >>> Sounds good, thanks.
>> >> >>>
>> >> >>>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
>> >> >>>
>> >> >>>
>> >> >>> Acked-by: Serge E. Hallyn
>> >> >>> <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
>> >> >>
>> >> >>
>> >> >> Wait, what?
>> >> >>
>> >> >> Inodes aren't owned by user namespaces; they're owned by users.  And any
>> >> >> user can arrange to have a user namespace in which they pass an
>> >> >> inode_capable check on any inode that they own.
>> >> >>
>> >> >> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed.  If this
>> >> >> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE
>> >> >> entirely.
>> >> >
>> >> >
>> >> > The problem I'm trying to solve is this:
>> >> >
>> >> > container with its own user namespace and CAP_SYS_IMMUTABLE should be able
>> >> > to use chattr on all files which this container has access to.
>> >> >
>> >> > Unfortunately with the capable(CAP_SYS_IMMUTABLE) check this is not working.
>> >> >
>> >> > With the proposed two fixes CAP_SYS_IMMUTABLE started working in the
>> >> > container.
>> >> >
>> >> > The first solution got its user namespace from the currently running process
>> >> > and the second gets its user namespace from the currently opened inode.
>> >> >
>> >> > So what would be the best solution in this case?
>> >>
>> >> I'd suggest adding a mount option like fs_owner_uid that names a uid
>> >> that owns, in the sense of having unlimited access to, a filesystem.
>> >> Then anyone with caps on a namespace owned by that uid could do
>> >> whatever.
>> >>
>> >> Eric?
>> >>
>> >> --Andy
>> >
>> > The most obvious problem I can think of with "do whatever" is that this
>> > will likely include mknod of char and block devices which you can then
>> > chown/chmod as you wish and use to access any devices on the system from
>> > an unprivileged container.
>> > This can however be mitigated by using the devices cgroup controller.
>>
>> Or 'nodev'.  setuid/setgid may have the same problem, too.
>>
>> Implementing something like this would also make CAP_DAC_READ_SEARCH
>> and CAP_DAC_OVERRIDE work.
>>
>> Arguably it should be impossible to mount such a thing in the first
>> place without global privilege.
>>
>> >
>> > You also probably wouldn't want any unprivileged user from the host to
>> > find a way to access that mounted filesystem but so long as you do the
>> > mount in a separate mountns and don't share uids between the host and
>> > the container, that should be fine too.
>>
>> This part should be a nonissue -- an unprivileged user who has the
>> right uid owns the namespace anyway, so this is the least of your
>> worries.
>>
>> --Andy
>
> It should be a nonissue so long as we make sure that a file owned by a
> uid outside the scope of the container may not be changed even though
> fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say
> a shell and anyone who can see the fs from the host will be getting a
> root shell (assuming said file is owned by the host's uid 0).

I feel like that's too fragile.  I'd rather add a rule that one of
these filesystems always acts like it's nosuid unless you're inside a
user namespace that matches fs_owner_uid.

Maybe even that is too weird.  How about setuid, setgid, and fcaps
only work on mounts that are in mount namespaces that are owned by the
current user namespace or one of its parents?  IOW, a struct mount is
only trusted if mnt->mnt_ns->user_ns == current user ns or one of its
parents?

Untrusted mounts would act like they are nosuid,nodev.  Someone can
try to figure out a safe way to relax nodev at some point.
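The trust rule proposed above can be modeled in a few lines. This is a minimal userspace sketch, not kernel code: `struct user_ns`, `struct mnt_ns`, `struct mount` and the flag values are simplified, hypothetical stand-ins that keep only the fields the rule needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct user_ns {
	const struct user_ns *parent;	/* NULL for the initial namespace */
};

struct mnt_ns {
	const struct user_ns *user_ns;	/* user namespace owning this mount ns */
};

struct mount {
	const struct mnt_ns *mnt_ns;
	unsigned int flags;		/* flags requested at mount time */
};

/* Named after the kernel's mount flags; the values here are illustrative. */
#define MNT_NOSUID 0x1u
#define MNT_NODEV  0x2u

/* True if 'ns' is 'cur' or one of cur's ancestors. */
static bool ns_is_self_or_ancestor(const struct user_ns *ns,
				   const struct user_ns *cur)
{
	for (; cur != NULL; cur = cur->parent)
		if (cur == ns)
			return true;
	return false;
}

/* Proposed rule: trust a mount only if mnt->mnt_ns->user_ns is the
 * caller's user namespace or one of its parents. */
static bool mount_is_trusted(const struct mount *mnt,
			     const struct user_ns *current_ns)
{
	return ns_is_self_or_ancestor(mnt->mnt_ns->user_ns, current_ns);
}

/* Untrusted mounts behave as if they were mounted nosuid,nodev. */
static unsigned int effective_mnt_flags(const struct mount *mnt,
					const struct user_ns *current_ns)
{
	unsigned int flags = mnt->flags;

	if (!mount_is_trusted(mnt, current_ns))
		flags |= MNT_NOSUID | MNT_NODEV;
	return flags;
}
```

Under this rule a host mount stays fully trusted from inside a container (its owner, the initial namespace, is an ancestor), while a mount created inside the container is treated as nosuid,nodev when used from the host.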

--Andy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:10                         ` Marian Marinov
@ 2014-04-30  0:12                           ` Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-30  0:12 UTC (permalink / raw)
  To: Marian Marinov
  Cc: Stéphane Graber, Eric W. Biederman, Linux Containers,
	Serge Hallyn, Ted Ts'o, Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 5:10 PM, Marian Marinov <mm@1h.com> wrote:
> On 04/30/2014 03:01 AM, Stéphane Graber wrote:
>>
>> On Tue, Apr 29, 2014 at 04:51:54PM -0700, Andy Lutomirski wrote:
>>>
>>> On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@ubuntu.com>
>>> wrote:
>>>>
>>>> On Tue, Apr 29, 2014 at 04:22:55PM -0700, Andy Lutomirski wrote:
>>>>>
>>>>> On Tue, Apr 29, 2014 at 4:20 PM, Marian Marinov <mm@1h.com> wrote:
>>>>>>
>>>>>> On 04/30/2014 01:45 AM, Andy Lutomirski wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 04/29/2014 03:29 PM, Serge Hallyn wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@public.gmane.org):
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
>>>>>>>>>>>>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Um, wouldn't it be better to simply fix the capable() function?
>>>>>>>>>>>>>
>>>>>>>>>>>>> /**
>>>>>>>>>>>>>    * capable - Determine if the current task has a superior capability in effect
>>>>>>>>>>>>>    * @cap: The capability to be tested for
>>>>>>>>>>>>>    *
>>>>>>>>>>>>>    * Return true if the current task has the given superior capability currently
>>>>>>>>>>>>>    * available for use, false if not.
>>>>>>>>>>>>>    *
>>>>>>>>>>>>>    * This sets PF_SUPERPRIV on the task if the capability is available on the
>>>>>>>>>>>>>    * assumption that it's about to be used.
>>>>>>>>>>>>>    */
>>>>>>>>>>>>> bool capable(int cap)
>>>>>>>>>>>>> {
>>>>>>>>>>>>>          return ns_capable(&init_user_ns, cap);
>>>>>>>>>>>>> }
>>>>>>>>>>>>> EXPORT_SYMBOL(capable);
>>>>>>>>>>>>>
>>>>>>>>>>>>> The documentation states that it is for "the current task", and I
>>>>>>>>>>>>> can't imagine any use case, where user namespaces are in effect, where
>>>>>>>>>>>>> using init_user_ns would ever make sense.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> the init_user_ns represents the user_ns owning the object, not the
>>>>>>>>>>>> subject.
>>>>>>>>>>>>
>>>>>>>>>>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
>>>>>>>>>>>> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
>>>>>>>>>>>> CAP_SYS_IMMUTABLE)' by definition.
>>>>>>>>>>>>
>>>>>>>>>>>> So NACK to that particular patch.  I'm not sure, but IIUC it should be
>>>>>>>>>>>> safe to check against the userns owning the inode?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So what you are proposing is to replace
>>>>>>>>>>> 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
>>>>>>>>>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
>>>>>>>>>>>
>>>>>>>>>>> I agree that this is more sane.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Right, and I think the two operations you're looking at seem sane
>>>>>>>>>> to allow.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you are ok with this patch, I will fix all file systems and send
>>>>>>>>> patches.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sounds good, thanks.
>>>>>>>>
>>>>>>>>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Wait, what?
>>>>>>>
>>>>>>> Inodes aren't owned by user namespaces; they're owned by users.  And any
>>>>>>> user can arrange to have a user namespace in which they pass an
>>>>>>> inode_capable check on any inode that they own.
>>>>>>>
>>>>>>> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed.  If this
>>>>>>> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE
>>>>>>> entirely.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The problem I'm trying to solve is this:
>>>>>>
>>>>>> container with its own user namespace and CAP_SYS_IMMUTABLE should be able
>>>>>> to use chattr on all files which this container has access to.
>>>>>>
>>>>>> Unfortunately with the capable(CAP_SYS_IMMUTABLE) check this is not working.
>>>>>>
>>>>>> With the proposed two fixes CAP_SYS_IMMUTABLE started working in the
>>>>>> container.
>>>>>>
>>>>>> The first solution got its user namespace from the currently running process
>>>>>> and the second gets its user namespace from the currently opened inode.
>>>>>>
>>>>>> So what would be the best solution in this case?
>>>>>
>>>>>
>>>>> I'd suggest adding a mount option like fs_owner_uid that names a uid
>>>>> that owns, in the sense of having unlimited access to, a filesystem.
>>>>> Then anyone with caps on a namespace owned by that uid could do
>>>>> whatever.
>>>>>
>>>>> Eric?
>>>>>
>>>>> --Andy
>>>>
>>>>
>>>> The most obvious problem I can think of with "do whatever" is that this
>>>> will likely include mknod of char and block devices which you can then
>>>> chown/chmod as you wish and use to access any devices on the system from
>>>> an unprivileged container.
>>>> This can however be mitigated by using the devices cgroup controller.
>>>
>>>
>>> Or 'nodev'.  setuid/setgid may have the same problem, too.
>>>
>>> Implementing something like this would also make CAP_DAC_READ_SEARCH
>>> and CAP_DAC_OVERRIDE work.
>>>
>>> Arguably it should be impossible to mount such a thing in the first
>>> place without global privilege.
>>>
>>>>
>>>> You also probably wouldn't want any unprivileged user from the host to
>>>> find a way to access that mounted filesystem but so long as you do the
>>>> mount in a separate mountns and don't share uids between the host and
>>>> the container, that should be fine too.
>>>
>>>
>>> This part should be a nonissue -- an unprivileged user who has the
>>> right uid owns the namespace anyway, so this is the least of your
>>> worries.
>>>
>>> --Andy
>>
>>
>> It should be a nonissue so long as we make sure that a file owned by a
>> uid outside the scope of the container may not be changed even though
>> fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say
>> a shell and anyone who can see the fs from the host will be getting a
>> root shell (assuming said file is owned by the host's uid 0).
>>
>> So that's restricting slightly what "do whatever" would do in this case.
>>
>
> In my case I give an LVM volume to each container and limit the container to
> only this block device using the devices cgroup.
> So the inode_capable() fix worked like a charm for me.
> The container can not see any filesystem other than its own.
> And I have another patch for my kernel that prohibits setns from cgroup
> other than / which prevents programs from the container from getting out.
> clone() can be used to create new namespaces but can not be used to attach
> to already created namespaces.

Doesn't matter -- the risk here is that an attacker outside the
namespace can get an fd that points to a directory in the namespace.
SCM_RIGHTS would be the major vector.
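The point here is that file descriptors are not confined by namespaces: any process holding an fd can hand it to another process over an AF_UNIX socket with an SCM_RIGHTS control message. The sketch below shows that mechanism with standard POSIX/Linux calls. For simplicity it passes a directory fd between the two ends of a socketpair() inside a single process rather than across a real container boundary, and the helper names are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 'x';		/* at least one byte of real data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd = -1;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS &&
	    cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Open 'path' as a directory, pass the fd through a socketpair, and
 * report whether the received fd refers to the same inode. */
static bool demo_pass_dir_fd(const char *path)
{
	int sv[2], dfd, rfd;
	struct stat a, b;
	bool same = false;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return false;
	dfd = open(path, O_RDONLY | O_DIRECTORY);
	if (dfd >= 0 && send_fd(sv[0], dfd) == 0) {
		rfd = recv_fd(sv[1]);
		if (rfd >= 0) {
			same = fstat(dfd, &a) == 0 && fstat(rfd, &b) == 0 &&
			       a.st_dev == b.st_dev && a.st_ino == b.st_ino;
			close(rfd);
		}
	}
	if (dfd >= 0)
		close(dfd);
	close(sv[0]);
	close(sv[1]);
	return same;
}
```

The receiver ends up with a working handle on the directory without ever having resolved its path, which is why "the container can't see other filesystems" does not by itself keep container-owned directories away from processes outside it.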

--Andy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-29 22:45             ` Andy Lutomirski
  2014-04-29 23:06               ` Theodore Ts'o
  2014-04-29 23:20               ` Marian Marinov
@ 2014-04-30  0:16               ` Serge Hallyn
  2014-04-30  0:32                 ` Theodore Ts'o
  2 siblings, 1 reply; 28+ messages in thread
From: Serge Hallyn @ 2014-04-30  0:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Marian Marinov, containers, Ted Ts'o,
	Linux Kernel Mailing List, lxc-devel

Quoting Andy Lutomirski (luto@amacapital.net):
> On 04/29/2014 03:29 PM, Serge Hallyn wrote:
> > Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
> >> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
> >>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
> >>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
> >>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@public.gmane.org):
> >>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
> >>>>>>>
> >>>>>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
> >>>>>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
> >>>>>>
> >>>>>> Um, wouldn't it be better to simply fix the capable() function?
> >>>>>>
> >>>>>> /**
> >>>>>>  * capable - Determine if the current task has a superior capability in effect
> >>>>>>  * @cap: The capability to be tested for
> >>>>>>  *
> >>>>>>  * Return true if the current task has the given superior capability currently
> >>>>>>  * available for use, false if not.
> >>>>>>  *
> >>>>>>  * This sets PF_SUPERPRIV on the task if the capability is available on the
> >>>>>>  * assumption that it's about to be used.
> >>>>>>  */
> >>>>>> bool capable(int cap)
> >>>>>> {
> >>>>>> 	return ns_capable(&init_user_ns, cap);
> >>>>>> }
> >>>>>> EXPORT_SYMBOL(capable);
> >>>>>>
> >>>>>> The documentation states that it is for "the current task", and I
> >>>>>> can't imagine any use case, where user namespaces are in effect, where
> >>>>>> using init_user_ns would ever make sense.
> >>>>>
> >>>>> the init_user_ns represents the user_ns owning the object, not the
> >>>>> subject.
> >>>>>
> >>>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
> >>>>> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
> >>>>> CAP_SYS_IMMUTABLE)' by definition.
> >>>>>
> >>>>> So NACK to that particular patch.  I'm not sure, but IIUC it should be
> >>>>> safe to check against the userns owning the inode?
> >>>>>
> >>>>
> >>>> So what you are proposing is to replace 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
> >>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
> >>>>
> >>>> I agree that this is more sane.
> >>>
> >>> Right, and I think the two operations you're looking at seem sane
> >>> to allow.
> >>
> >> If you are ok with this patch, I will fix all file systems and send patches.
> > 
> > Sounds good, thanks.
> > 
> >> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
> > 
> > Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> 
> Wait, what?
> 
> Inodes aren't owned by user namespaces; they're owned by users.  And any
> user can arrange to have a user namespace in which they pass an
> inode_capable check on any inode that they own.
> 
> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed.  If this

Sigh, yeah...  I just don't understand what it is.  But you're right.

> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE
> entirely.
> 
> Nacked-by: Andy Lutomirski <luto@amacapital.net>

I forget the details, but there was another case where I wanted to
have the userns which 'owns' the whole fs available.  I guess we'd
have to check against that instead of using inode_capable.
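To make the objection to inode_capable (and the alternative of checking against the namespace that owns the whole filesystem) concrete, here is a userspace model. `ns_capable_model()` is loosely modeled on the 3.x-era cap_capable() walk in security/commoncap.c, and `inode_capable_model()` on the 3.x inode_capable() helper (capable in the caller's own namespace, plus the inode's uid being mapped there). The structs, the single-extent uid map and the `owner_ns` superblock field are simplified or hypothetical. Note also that the capability this thread calls CAP_SYS_IMMUTABLE is spelled CAP_LINUX_IMMUTABLE in the kernel; the model just tests one unnamed capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins. All uids are global ("kernel") uids. */
struct user_ns {
	const struct user_ns *parent;	/* NULL for the initial namespace */
	unsigned int owner;		/* uid of the namespace's creator */
	unsigned int map_lo, map_count;	/* one-extent uid map: [lo, lo+count) */
};

struct cred {
	const struct user_ns *user_ns;
	unsigned int euid;
	bool cap_effective;		/* holds the capability being tested */
};

struct inode { unsigned int i_uid; };

/* Hypothetical: the user namespace that owns a whole filesystem. */
struct super_block { const struct user_ns *owner_ns; };

/* Loosely modeled on the 3.x cap_capable() walk. */
static bool ns_capable_model(const struct cred *c, const struct user_ns *target)
{
	const struct user_ns *ns = target;

	for (;;) {
		if (ns == c->user_ns)		/* caller lives here */
			return c->cap_effective;
		if (ns->parent == NULL)		/* reached init without a match */
			return false;
		/* The creator of a child namespace has all caps in it. */
		if (ns->parent == c->user_ns && ns->owner == c->euid)
			return true;
		ns = ns->parent;		/* caps in a parent cover children */
	}
}

static bool uid_has_mapping(const struct user_ns *ns, unsigned int uid)
{
	return uid >= ns->map_lo && uid - ns->map_lo < ns->map_count;
}

/* Like 3.x inode_capable(): capable in the caller's own namespace and
 * the inode's owner is mapped into it. */
static bool inode_capable_model(const struct cred *c, const struct inode *inode)
{
	return ns_capable_model(c, c->user_ns) &&
	       uid_has_mapping(c->user_ns, inode->i_uid);
}

/* Alternative: check against the namespace owning the filesystem. */
static bool fs_owner_capable(const struct cred *c, const struct super_block *sb)
{
	return ns_capable_model(c, sb->owner_ns);
}
```

In the model, an ordinary uid 1000 that creates its own namespace (mapping only itself) passes `inode_capable_model()` on every inode it owns, which is the NACK reason above. Checking against the filesystem's owning namespace grants that same process nothing on a host-owned filesystem.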

-serge

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:11                         ` Andy Lutomirski
@ 2014-04-30  0:21                           ` Serge Hallyn
  2014-04-30  0:23                             ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Serge Hallyn @ 2014-04-30  0:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stéphane Graber, Ted Ts'o, Linux Containers,
	Linux Kernel Mailing List, lxc-devel, Eric W. Biederman

Quoting Andy Lutomirski (luto@amacapital.net):
> On Tue, Apr 29, 2014 at 5:01 PM, Stéphane Graber <stgraber@ubuntu.com> wrote:
> > On Tue, Apr 29, 2014 at 04:51:54PM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 29, 2014 at 4:47 PM, Stéphane Graber <stgraber@ubuntu.com> wrote:
> >> > On Tue, Apr 29, 2014 at 04:22:55PM -0700, Andy Lutomirski wrote:
> >> >> On Tue, Apr 29, 2014 at 4:20 PM, Marian Marinov <mm@1h.com> wrote:
> >> >> > On 04/30/2014 01:45 AM, Andy Lutomirski wrote:
> >> >> >>
> >> >> >> On 04/29/2014 03:29 PM, Serge Hallyn wrote:
> >> >> >>>
> >> >> >>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
> >> >> >>>>
> >> >> >>>> On 04/30/2014 01:02 AM, Serge Hallyn wrote:
> >> >> >>>>>
> >> >> >>>>> Quoting Marian Marinov (mm-108MBtLGafw@public.gmane.org):
> >> >> >>>>>>
> >> >> >>>>>> On 04/29/2014 09:52 PM, Serge Hallyn wrote:
> >> >> >>>>>>>
> >> >> >>>>>>> Quoting Theodore Ts'o (tytso-3s7WtUTddSA@public.gmane.org):
> >> >> >>>>>>>>
> >> >> >>>>>>>> On Tue, Apr 29, 2014 at 04:49:14PM +0300, Marian Marinov wrote:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> I'm proposing a fix to this, by replacing the capable(CAP_LINUX_IMMUTABLE)
> >> >> >>>>>>>>> check with ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE).
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>> Um, wouldn't it be better to simply fix the capable() function?
> >> >> >>>>>>>>
> >> >> >>>>>>>> /**
> >> >> >>>>>>>>   * capable - Determine if the current task has a superior capability in effect
> >> >> >>>>>>>>   * @cap: The capability to be tested for
> >> >> >>>>>>>>   *
> >> >> >>>>>>>>   * Return true if the current task has the given superior capability currently
> >> >> >>>>>>>>   * available for use, false if not.
> >> >> >>>>>>>>   *
> >> >> >>>>>>>>   * This sets PF_SUPERPRIV on the task if the capability is available on the
> >> >> >>>>>>>>   * assumption that it's about to be used.
> >> >> >>>>>>>>   */
> >> >> >>>>>>>> bool capable(int cap)
> >> >> >>>>>>>> {
> >> >> >>>>>>>>         return ns_capable(&init_user_ns, cap);
> >> >> >>>>>>>> }
> >> >> >>>>>>>> EXPORT_SYMBOL(capable);
> >> >> >>>>>>>>
> >> >> >>>>>>>> The documentation states that it is for "the current task", and I
> >> >> >>>>>>>> can't imagine any use case, where user namespaces are in effect, where
> >> >> >>>>>>>> using init_user_ns would ever make sense.
> >> >> >>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>> the init_user_ns represents the user_ns owning the object, not the
> >> >> >>>>>>> subject.
> >> >> >>>>>>>
> >> >> >>>>>>> The patch by Marian is wrong.  Anyone can do 'clone(CLONE_NEWUSER)',
> >> >> >>>>>>> setuid(0), execve, and end up satisfying 'ns_capable(current_cred()->userns,
> >> >> >>>>>>> CAP_SYS_IMMUTABLE)' by definition.
> >> >> >>>>>>>
> >> >> >>>>>>> So NACK to that particular patch.  I'm not sure, but IIUC it should be
> >> >> >>>>>>> safe to check against the userns owning the inode?
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>> So what you are proposing is to replace
> >> >> >>>>>> 'ns_capable(current_cred()->userns, CAP_SYS_IMMUTABLE)' with
> >> >> >>>>>> 'inode_capable(inode, CAP_SYS_IMMUTABLE)' ?
> >> >> >>>>>>
> >> >> >>>>>> I agree that this is more sane.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> Right, and I think the two operations you're looking at seem sane
> >> >> >>>>> to allow.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> If you are ok with this patch, I will fix all file systems and send
> >> >> >>>> patches.
> >> >> >>>
> >> >> >>>
> >> >> >>> Sounds good, thanks.
> >> >> >>>
> >> >> >>>> Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
> >> >> >>>
> >> >> >>>
> >> >> >>> Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> >> >> >>
> >> >> >>
> >> >> >> Wait, what?
> >> >> >>
> >> >> >> Inodes aren't owned by user namespaces; they're owned by users.  And any
> >> >> >> user can arrange to have a user namespace in which they pass an
> >> >> >> inode_capable check on any inode that they own.
> >> >> >>
> >> >> >> Presumably there's a reason that CAP_SYS_IMMUTABLE is needed.  If this
> >> >> >> gets merged, then it would be better to just drop CAP_SYS_IMMUTABLE
> >> >> >> entirely.
> >> >> >
> >> >> >
> >> >> > The problem I'm trying to solve is this:
> >> >> >
> >> >> > container with its own user namespace and CAP_SYS_IMMUTABLE should be able
> >> >> > to use chattr on all files which this container has access to.
> >> >> >
> >> >> > Unfortunately with the capable(CAP_SYS_IMMUTABLE) check this is not working.
> >> >> >
> >> >> > With the proposed two fixes CAP_SYS_IMMUTABLE started working in the
> >> >> > container.
> >> >> >
> >> >> > The first solution got its user namespace from the currently running process
> >> >> > and the second gets its user namespace from the currently opened inode.
> >> >> >
> >> >> > So what would be the best solution in this case?
> >> >>
> >> >> I'd suggest adding a mount option like fs_owner_uid that names a uid
> >> >> that owns, in the sense of having unlimited access to, a filesystem.
> >> >> Then anyone with caps on a namespace owned by that uid could do
> >> >> whatever.
> >> >>
> >> >> Eric?
> >> >>
> >> >> --Andy
> >> >
> >> > The most obvious problem I can think of with "do whatever" is that this
> >> > will likely include mknod of char and block devices which you can then
> >> > chown/chmod as you wish and use to access any devices on the system from
> >> > an unprivileged container.
> >> > This can however be mitigated by using the devices cgroup controller.
> >>
> >> Or 'nodev'.  setuid/setgid may have the same problem, too.
> >>
> >> Implementing something like this would also make CAP_DAC_READ_SEARCH
> >> and CAP_DAC_OVERRIDE work.
> >>
> >> Arguably it should be impossible to mount such a thing in the first
> >> place without global privilege.
> >>
> >> >
> >> > You also probably wouldn't want any unprivileged user from the host to
> >> > find a way to access that mounted filesystem but so long as you do the
> >> > mount in a separate mountns and don't share uids between the host and
> >> > the container, that should be fine too.
> >>
> >> This part should be a nonissue -- an unprivileged user who has the
> >> right uid owns the namespace anyway, so this is the least of your
> >> worries.
> >>
> >> --Andy
> >
> > It should be a nonissue so long as we make sure that a file owned by a
> > uid outside the scope of the container may not be changed even though
> > fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say
> > a shell and anyone who can see the fs from the host will be getting a
> > root shell (assuming said file is owned by the host's uid 0).
> 
> I feel like that's too fragile.  I'd rather add a rule that one of

yeah I don't want to rush something like that.  I'd rather stash
the userns of the task which did the mounting and check against
that.  Note that would make it worthless unless and until we allowed
mounting from non-init userns, but then we can only claim "our fs
superblock readers suck and therefore containers can't mount an fs"
so long before we start to feel some shame and audit them...
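The "stash the mounter's userns" idea can be sketched the same way. The `s_owner_ns` field, `mount_fs()` and the `allow_userns_mounts` switch are hypothetical, and the namespace walk is simplified (the creator-owner rule is left out). The sketch only illustrates why the check adds nothing while mounting is limited to the initial namespace: the recorded owner is then always the initial namespace, so the check gives the same answer as capable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct user_ns { const struct user_ns *parent; };	/* NULL parent = init */

struct cred {
	const struct user_ns *user_ns;
	bool has_cap;		/* holds the capability in its own namespace */
};

/* Hypothetical field: the user namespace of whoever mounted the fs. */
struct super_block { const struct user_ns *s_owner_ns; };

/* Simplified: a capability held in a namespace also applies to all of
 * its descendants (the creator-owner rule is left out here). */
static bool ns_capable_model(const struct cred *c, const struct user_ns *target)
{
	for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent)
		if (ns == c->user_ns)
			return c->has_cap;
	return false;
}

/* Mount time: stash the mounter's user namespace in the superblock.
 * 'allow_userns_mounts' models whether non-init namespaces may mount. */
static bool mount_fs(struct super_block *sb, const struct cred *mounter,
		     bool allow_userns_mounts)
{
	bool mounter_is_init = mounter->user_ns->parent == NULL;

	if (!mounter->has_cap || (!mounter_is_init && !allow_userns_mounts))
		return false;
	sb->s_owner_ns = mounter->user_ns;
	return true;
}

/* Later permission checks compare against the recorded namespace. */
static bool sb_capable(const struct cred *c, const struct super_block *sb)
{
	return ns_capable_model(c, sb->s_owner_ns);
}
```

Once non-init namespaces are allowed to mount, the same check lets a container's root act on the filesystem it mounted while host root keeps access through the ancestor walk.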

> these filesystems always acts like it's nosuid unless you're inside a
> user namespace that matches fs_owner_uid.
> 
> Maybe even that is too weird.  How about setuid, setgid, and fcaps
> only work on mounts that are in mount namespaces that are owned by the
> current user namespace or one of its parents?  IOW, a struct mount is
> only trusted if mnt->mnt_ns->user_ns == current user ns or one of its
> parents?
> 
> Untrusted mounts would act like they are nosuid,nodev.  Someone can
> try to figure out a safe way to relax nodev at some point.
> 
> --Andy
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:21                           ` Serge Hallyn
@ 2014-04-30  0:23                             ` Andy Lutomirski
  2014-04-30  0:44                               ` Serge Hallyn
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-30  0:23 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Stéphane Graber, Ted Ts'o, Linux Containers,
	Linux Kernel Mailing List, lxc-devel, Eric W. Biederman

On Tue, Apr 29, 2014 at 5:21 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> Quoting Andy Lutomirski (luto@amacapital.net):
>> > It should be a nonissue so long as we make sure that a file owned by a
>> > uid outside the scope of the container may not be changed even though
>> > fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say
>> > a shell and anyone who can see the fs from the host will be getting a
>> > root shell (assuming said file is owned by the host's uid 0).
>>
>> I feel like that's too fragile.  I'd rather add a rule that one of
>
> yeah I don't want to rush something like that.  I'd rather stash
> the userns of the task which did the mounting and check against
> that.  Note that would make it worthless unless and until we allowed
> mounting from non-init userns, but then we can only claim "our fs
> superblock readers suck and therefore containers can't mount an fs"
> so long before we start to feel some shame and audit them...
>
>> these filesystems always acts like it's nosuid unless you're inside a
>> user namespace that matches fs_owner_uid.
>>
>> Maybe even that is too weird.  How about setuid, setgid, and fcaps
>> only work on mounts that are in mount namespaces that are owned by the
>> current user namespace or one of its parents?  IOW, a struct mount is
>> only trusted if mnt->mnt_ns->user_ns == current user ns or one of its
>> parents?
>>
>> Untrusted mounts would act like they are nosuid,nodev.  Someone can
>> try to figure out a safe way to relax nodev at some point.

Do you like this variant?  We could add a way for global root to mount
an fs on behalf of a userns.  I'd rather this be more explicit than
just mounting it in a mount ns owned by the user namespace, though.

--Andy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:16               ` Serge Hallyn
@ 2014-04-30  0:32                 ` Theodore Ts'o
  2014-04-30  0:33                   ` Andy Lutomirski
                                     ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Theodore Ts'o @ 2014-04-30  0:32 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Andy Lutomirski, Marian Marinov, containers,
	Linux Kernel Mailing List, lxc-devel

On Wed, Apr 30, 2014 at 12:16:41AM +0000, Serge Hallyn wrote:
> I forget the details, but there was another case where I wanted to
> have the userns which 'owns' the whole fs available.  I guess we'd
> have to check against that instead of using inode_capable.

Yes, that sounds right.

And *please* tell me that under no circumstances can anyone other
than root@init_user_ns is allowed to use mknod....

						- Ted

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:32                 ` Theodore Ts'o
@ 2014-04-30  0:33                   ` Andy Lutomirski
  2014-04-30  0:40                   ` Serge Hallyn
  2014-04-30  7:48                   ` Eric W. Biederman
  2 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-30  0:33 UTC (permalink / raw)
  To: Theodore Ts'o, Serge Hallyn, Andy Lutomirski, Marian Marinov,
	Linux Containers, Linux Kernel Mailing List, lxc-devel

On Tue, Apr 29, 2014 at 5:32 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Wed, Apr 30, 2014 at 12:16:41AM +0000, Serge Hallyn wrote:
>> I forget the details, but there was another case where I wanted to
>> have the userns which 'owns' the whole fs available.  I guess we'd
>> have to check against that instead of using inode_capable.
>
> Yes, that sounds right.
>
> And *please* tell me that under no circumstances can anyone other
> than root@init_user_ns is allowed to use mknod....

I haven't read the code, but I tried it the other day, and I got
-EPERM.  So we're okay for now.  (Well, other than the issue I just
sent to security@kernel.org, but that's not quite the same thing.)

--Andy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:32                 ` Theodore Ts'o
  2014-04-30  0:33                   ` Andy Lutomirski
@ 2014-04-30  0:40                   ` Serge Hallyn
  2014-04-30  7:48                   ` Eric W. Biederman
  2 siblings, 0 replies; 28+ messages in thread
From: Serge Hallyn @ 2014-04-30  0:40 UTC (permalink / raw)
  To: Theodore Ts'o, Andy Lutomirski, Marian Marinov, containers,
	Linux Kernel Mailing List, lxc-devel

Quoting Theodore Ts'o (tytso@mit.edu):
> On Wed, Apr 30, 2014 at 12:16:41AM +0000, Serge Hallyn wrote:
> > I forget the details, but there was another case where I wanted to
> > have the userns which 'owns' the whole fs available.  I guess we'd
> > have to check against that instead of using inode_capable.
> 
> Yes, that sounds right.
> 
> And *please* tell me that under no circumstances can anyone other
> than root@init_user_ns is allowed to use mknod....

That's the case.  We've considered making exceptions for things like
/dev/null, but in practice bind-mounting devices from the host has
worked out just fine.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:23                             ` Andy Lutomirski
@ 2014-04-30  0:44                               ` Serge Hallyn
  2014-04-30  1:03                                 ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Serge Hallyn @ 2014-04-30  0:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stéphane Graber, Ted Ts'o, Linux Containers,
	Linux Kernel Mailing List, lxc-devel, Eric W. Biederman

Quoting Andy Lutomirski (luto@amacapital.net):
> On Tue, Apr 29, 2014 at 5:21 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> > Quoting Andy Lutomirski (luto@amacapital.net):
> >> > It should be a nonissue so long as we make sure that a file owned by a
> >> > uid outside the scope of the container may not be changed even though
> >> > fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say
> >> > a shell and anyone who can see the fs from the host will be getting a
> >> > root shell (assuming said file is owned by the host's uid 0).
> >>
> >> I feel like that's too fragile.  I'd rather add a rule that one of
> >
> > yeah I don't want to rush something like that.  I'd rather stash
> > the userns of the task which did the mounting and check against
> > that.  Note that would make it worthless unless and until we allowed
> > mounting from non-init userns, but then we can only claim "our fs
> > superblock readers suck and therefore containers can't mount an fs"
> > so long before we start to feel some shame and audit them...
> >
> >> these filesystems always acts like it's nosuid unless you're inside a
> >> user namespace that matches fs_owner_uid.
> >>
> >> Maybe even that is too weird.  How about setuid, setgid, and fcaps
> >> only work on mounts that are in mount namespaces that are owned by the
> >> current user namespace or one of its parents?  IOW, a struct mount is
> >> only trusted if mnt->mnt_ns->user_ns == current user ns or one of its
> >> parents?
> >>
> >> Untrusted mounts would act like they are nosuid,nodev.  Someone can
> >> try to figure out a safe way to relax nodev at some point.
> 
> Do you like this variant?  We could add a way for global root to mount
> an fs on behalf of a userns.  I'd rather this be more explicit than
> just mounting it in a mount ns owned by the user namespace, though.

I'm missing something.  Which mnt are you talking about?  A user
can just clone a new userns and then clone(CLONE_NEWNS) to get a set
of mounts owned by himself...  We need to get a mnt (or a cred or
straight to a userns) tied to the first mount of the superblock, istm.
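The objection can be shown with the same kind of model. After unshare(CLONE_NEWUSER | CLONE_NEWNS), the copied mounts live in a new mount namespace owned by the new user namespace, but they still point at the same superblock. The structs and the `copy_mount()` helper below are hypothetical simplifications; the point is only that an owner derived from the mount namespace follows the copy, while an owner recorded on the superblock at its first mount does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct user_ns { const struct user_ns *parent; };	/* NULL parent = init */
struct cred { const struct user_ns *user_ns; bool has_cap; };

/* Hypothetical: owner recorded once, when the superblock is first mounted. */
struct super_block { const struct user_ns *s_owner_ns; };
struct mnt_ns { const struct user_ns *user_ns; };
struct mount { const struct mnt_ns *mnt_ns; const struct super_block *sb; };

/* Simplified: caps held in a namespace also apply to its descendants. */
static bool ns_capable_model(const struct cred *c, const struct user_ns *target)
{
	for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent)
		if (ns == c->user_ns)
			return c->has_cap;
	return false;
}

/* Models the copy made by unshare(CLONE_NEWUSER | CLONE_NEWNS): a new
 * mount in a new mount namespace, sharing the original superblock. */
static struct mount copy_mount(const struct mount *old,
			       const struct mnt_ns *new_ns)
{
	struct mount m = { new_ns, old->sb };
	return m;
}

/* Two candidate answers to "which user namespace owns this filesystem?" */
static const struct user_ns *owner_via_mnt_ns(const struct mount *m)
{
	return m->mnt_ns->user_ns;
}

static const struct user_ns *owner_via_sb(const struct mount *m)
{
	return m->sb->s_owner_ns;
}
```

An unprivileged user who is "root" in a fresh namespace passes a check against `owner_via_mnt_ns()` for a copy of a host mount, but not a check against `owner_via_sb()`.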

-serge

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:44                               ` Serge Hallyn
@ 2014-04-30  1:03                                 ` Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-04-30  1:03 UTC
  To: Serge Hallyn
  Cc: Stéphane Graber, Ted Ts'o, Linux Containers,
	Linux Kernel Mailing List, lxc-devel, Eric W. Biederman

On Tue, Apr 29, 2014 at 5:44 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> Quoting Andy Lutomirski (luto@amacapital.net):
>> On Tue, Apr 29, 2014 at 5:21 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
>> > Quoting Andy Lutomirski (luto@amacapital.net):
>> >> > It should be a nonissue so long as we make sure that a file owned by a
>> >> > uid outside the scope of the container may not be changed even though
>> >> > fs_owner_uid is set. Otherwise, it's just a matter of chmod +S on say
>> >> > a shell and anyone who can see the fs from the host will be getting a
>> >> > root shell (assuming said file is owned by the host's uid 0).
>> >>
>> >> I feel like that's too fragile.  I'd rather add a rule that one of
>> >
> >> > yeah I don't want to rush something like that.  I'd rather stash
>> > the userns of the task which did the mounting and check against
>> > that.  Note that would make it worthless unless and until we allowed
>> > mounting from non-init userns, but then we can only claim "our fs
>> > superblock readers suck and therefore containers can't mount an fs"
>> > so long before we start to feel some shame and audit them...
>> >
>> >> these filesystems always acts like it's nosuid unless you're inside a
>> >> user namespace that matches fs_owner_uid.
>> >>
>> >> Maybe even that is too weird.  How about setuid, setgid, and fcaps
>> >> only work on mounts that are in mount namespaces that are owned by the
>> >> current user namespace or one of its parents?  IOW, a struct mount is
>> >> only trusted if mnt->mnt_ns->user_ns == current user ns or one of its
>> >> parents?
>> >>
>> >> Untrusted mounts would act like they are nosuid,nodev.  Someone can
>> >> try to figure out a safe way to relax nodev at some point.
>>
>> Do you like this variant?  We could add a way for global root to mount
>> an fs on behalf of a userns.  I'd rather this be more explicit than
>> just mounting it in a mount ns owned by the user namespace, though.
>
> I'm missing something.  Which mnt are you talking about?  A user
> can just clone a new userns and then clone(CLONE_NEWNS) to get a set
> of mounts owned by himself...  We need to get a mnt (or a cred or
> straight to a userns) tied to the first mount of the superblock, istm.

Sure, but then that user is the only user that ends up trusting the
mount.  This could end up being surprising, though -- it would be
weird for a bind mount of an implicitly nosuid mount to end up not
being nosuid as seen by the mounter.

This still feels a bit overcomplicated.  Grr.  I do like the idea
that, if someone creates a tmpfs mount, sticks a setuid file in it,
and hands someone outside the namespace an fd to the mount, that the
file won't be setuid as seen from outside.  This will make using the
same uids in different containers a lot safer, although it still won't
really be safe.

Another wart: chroot on a directory in someone else's mount namespace
works, I think.  That just seems wrong, although I don't immediately
see how it's a problem.

--Andy
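Andy's proposed rule says a struct mount is trusted only if mnt->mnt_ns->user_ns is the current user namespace or one of its parents. That rule, and his tmpfs example, fit in a few lines. This is a minimal userspace sketch. The *_model structs are hypothetical stand-ins for struct user_namespace, struct mnt_namespace and struct mount, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects named in the thread. */
struct user_ns_model {
    const struct user_ns_model *parent;   /* NULL for the initial userns */
};

struct mnt_ns_model {
    const struct user_ns_model *user_ns;  /* userns owning the mount ns */
};

struct mount_model {
    const struct mnt_ns_model *mnt_ns;
    bool nosuid;                          /* explicit nosuid mount flag */
};

/* True if 'ns' is 'viewer' itself or one of viewer's ancestors. */
static bool ns_is_same_or_ancestor(const struct user_ns_model *ns,
                                   const struct user_ns_model *viewer)
{
    for (const struct user_ns_model *p = viewer; p != NULL; p = p->parent)
        if (p == ns)
            return true;
    return false;
}

/* Proposed rule: trusted only if mnt->mnt_ns->user_ns is the caller's
 * userns or one of its parents. */
static bool mount_trusted(const struct mount_model *mnt,
                          const struct user_ns_model *current_ns)
{
    assert(mnt != NULL && mnt->mnt_ns != NULL);
    return ns_is_same_or_ancestor(mnt->mnt_ns->user_ns, current_ns);
}

/* Untrusted mounts behave as if nosuid: setuid/setgid/fcaps ignored. */
static bool setuid_effective(const struct mount_model *mnt,
                             const struct user_ns_model *current_ns)
{
    return !mnt->nosuid && mount_trusted(mnt, current_ns);
}
```

Consider a tmpfs mounted inside a child namespace. setuid_effective() returns true when evaluated from the child and false from the initial namespace. That is the "won't be setuid as seen from outside" behaviour Andy describes. Host mounts stay trusted from inside the child because the initial namespace is its parent.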


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  0:32                 ` Theodore Ts'o
  2014-04-30  0:33                   ` Andy Lutomirski
  2014-04-30  0:40                   ` Serge Hallyn
@ 2014-04-30  7:48                   ` Eric W. Biederman
  2014-04-30 13:33                     ` Serge Hallyn
  2 siblings, 1 reply; 28+ messages in thread
From: Eric W. Biederman @ 2014-04-30  7:48 UTC
  To: Theodore Ts'o
  Cc: Serge Hallyn, Andy Lutomirski, Marian Marinov, containers,
	Linux Kernel Mailing List, lxc-devel

Theodore Ts'o <tytso@mit.edu> writes:

> On Wed, Apr 30, 2014 at 12:16:41AM +0000, Serge Hallyn wrote:
>> I forget the details, but there was another case where I wanted to
>> have the userns which 'owns' the whole fs available.  I guess we'd
>> have to check against that instead of using inode_capable.
>
> Yes, that sounds right.
>
> And *please* tell me that under no circumstances can anyone other
> than root@init_user_ns is allowed to use mknod....

Nope.  mknod not allowed.  capable(CAP_MKNOD) is required
and I can't see any reason to change that.

As a rule of thumb, the only additional actions allowed in a user
namespace above and beyond what an ordinary unprivileged user would be
allowed to do are those things which we only don't allow because they
could confuse a setuid root executable.


If we ever allow the creation of immutable files by unprivileged users
those files would at least have to be kept completely separate from the
files the global root encounters (aka a disjoint mount namespace).

I do not currently see a path to safely using immutable files with just
user namespace root permission.

Eric
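Eric's point relies on the difference between capable() and ns_capable(), which shows up in a simplified model of the permission walk. The sketch below loosely follows the loop in cap_capable() (security/commoncap.c) in kernels of this period; check the real source before relying on details. The types and function names are hypothetical. The model also shows why the originally proposed ns_capable(current_cred()->user_ns, CAP_LINUX_IMMUTABLE) would always succeed for root inside a user namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum { CAP_LINUX_IMMUTABLE = 9, CAP_MKNOD = 27 };

struct userns {
    const struct userns *parent;   /* NULL for init_user_ns */
    unsigned owner_euid;           /* euid of the creator, in the parent */
};

struct cred_model {
    const struct userns *user_ns;  /* namespace the task lives in */
    unsigned euid;
    uint64_t cap_effective;        /* bit N set => capability N held */
};

/* Simplified model of the cap_capable() walk toward the root. */
static bool model_ns_capable(const struct cred_model *cred,
                             const struct userns *targ_ns, int cap)
{
    assert(cred != NULL && targ_ns != NULL);
    for (;;) {
        /* In the task's own namespace: consult its effective set. */
        if (targ_ns == cred->user_ns)
            return (cred->cap_effective >> cap) & 1;
        /* Reached init_user_ns without finding the task's namespace. */
        if (targ_ns->parent == NULL)
            return false;
        /* The creator of a child namespace has all caps over it. */
        if (targ_ns->parent == cred->user_ns &&
            targ_ns->owner_euid == cred->euid)
            return true;
        targ_ns = targ_ns->parent;
    }
}

/* capable(cap) behaves like ns_capable(&init_user_ns, cap). */
static bool model_capable(const struct cred_model *cred,
                          const struct userns *init_ns, int cap)
{
    return model_ns_capable(cred, init_ns, cap);
}
```

Root inside a child namespace passes a check against its own namespace but fails capable() for both CAP_LINUX_IMMUTABLE and CAP_MKNOD. The unprivileged user who created the namespace holds every capability over it, yet still has none in the initial namespace.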


* Re: ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace
  2014-04-30  7:48                   ` Eric W. Biederman
@ 2014-04-30 13:33                     ` Serge Hallyn
  0 siblings, 0 replies; 28+ messages in thread
From: Serge Hallyn @ 2014-04-30 13:33 UTC
  To: Eric W. Biederman
  Cc: Theodore Ts'o, Andy Lutomirski, Marian Marinov, containers,
	Linux Kernel Mailing List, lxc-devel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Theodore Ts'o <tytso@mit.edu> writes:
> 
> > On Wed, Apr 30, 2014 at 12:16:41AM +0000, Serge Hallyn wrote:
> >> I forget the details, but there was another case where I wanted to
> >> have the userns which 'owns' the whole fs available.  I guess we'd
> >> have to check against that instead of using inode_capable.
> >
> > Yes, that sounds right.
> >
> > And *please* tell me that under no circumstances can anyone other
> > than root@init_user_ns is allowed to use mknod....
> 
> Nope.  mknod not allowed.  capable(CAP_MKNOD) is required
> and I can't see any reason to change that.
> 
> As a rule of thumb, the only additional actions allowed in a user
> namespace above and beyond what an ordinary unprivileged user would be
> allowed to do are those things which we only don't allow because they
> could confuse a setuid root executable.
> 
> 
> If we ever allow the creation of immutable files by unprivileged users
> those files would at least have to be kept completely separate from the
> files the global root encounters (aka a disjoint mount namespace).
> 
> I do not currently see a path to safely using immutable files with just
> user namespace root permission.

It's very far off, but I think the path is:

1. at first mount of a blockdev, note the cred (or just userns) which
mounted it
2. work on auditing superblock readers so we can start allowing some
blockdev mounts in user namespaces :)
3. check for privilege against the userns owning a superblock

-serge
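Serge's steps 1 and 3 can be sketched as follows. The sketch rests on two assumptions. First, a hypothetical s_owner_ns field is recorded on the superblock at first mount. Second, a capability held in a user namespace is treated as extending to that namespace's descendants, ignoring the creator-owner rule. Step 2, auditing superblock readers, has no code equivalent and is left out. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct userns_node {
    const struct userns_node *parent;   /* NULL for the initial userns */
};

/* Hypothetical: the userns of whoever first mounted the device. */
struct super_block_model {
    const struct userns_node *s_owner_ns;   /* NULL until first mount */
};

/* Step 1: note the mounter's userns only on the first mount. */
static void model_mount(struct super_block_model *sb,
                        const struct userns_node *mounter_ns)
{
    assert(sb != NULL && mounter_ns != NULL);
    if (sb->s_owner_ns == NULL)
        sb->s_owner_ns = mounter_ns;
}

/* True if caller_ns is 'owner' or one of owner's ancestors. */
static bool caller_covers(const struct userns_node *caller_ns,
                          const struct userns_node *owner)
{
    for (const struct userns_node *p = owner; p != NULL; p = p->parent)
        if (p == caller_ns)
            return true;
    return false;
}

/* Step 3: check CAP_LINUX_IMMUTABLE against the superblock's owner,
 * given whether the caller holds it in its own namespace. */
static bool may_set_immutable(const struct super_block_model *sb,
                              const struct userns_node *caller_ns,
                              bool has_cap_in_own_ns)
{
    return has_cap_in_own_ns && sb->s_owner_ns != NULL &&
           caller_covers(caller_ns, sb->s_owner_ns);
}
```

Suppose a device is first mounted from the host. It stays owned by the initial namespace even if a container mounts it again later, so container root cannot set the immutable flag on it. On a filesystem first mounted inside the container, container root can set the flag, and so can host root.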


end of thread

Thread overview: 28+ messages
2014-04-29 13:49 ioctl CAP_LINUX_IMMUTABLE is checked in the wrong namespace Marian Marinov
2014-04-29 18:35 ` Theodore Ts'o
2014-04-29 18:52   ` Serge Hallyn
2014-04-29 21:49     ` Marian Marinov
2014-04-29 22:02       ` Serge Hallyn
2014-04-29 22:24         ` Marian Marinov
2014-04-29 22:29           ` Serge Hallyn
2014-04-29 22:45             ` Andy Lutomirski
2014-04-29 23:06               ` Theodore Ts'o
2014-04-29 23:07                 ` Andy Lutomirski
2014-04-29 23:20               ` Marian Marinov
2014-04-29 23:22                 ` Andy Lutomirski
2014-04-29 23:47                   ` Stéphane Graber
2014-04-29 23:51                     ` Andy Lutomirski
2014-04-30  0:01                       ` Stéphane Graber
2014-04-30  0:10                         ` Marian Marinov
2014-04-30  0:12                           ` Andy Lutomirski
2014-04-30  0:11                         ` Andy Lutomirski
2014-04-30  0:21                           ` Serge Hallyn
2014-04-30  0:23                             ` Andy Lutomirski
2014-04-30  0:44                               ` Serge Hallyn
2014-04-30  1:03                                 ` Andy Lutomirski
2014-04-30  0:16               ` Serge Hallyn
2014-04-30  0:32                 ` Theodore Ts'o
2014-04-30  0:33                   ` Andy Lutomirski
2014-04-30  0:40                   ` Serge Hallyn
2014-04-30  7:48                   ` Eric W. Biederman
2014-04-30 13:33                     ` Serge Hallyn
