[PATCH RESEND] userns: enable tmpfs support for user namespace

* [PATCH RESEND] userns: enable tmpfs support for user namespace
@ 2013-01-16 10:25 Gao feng
       [not found] ` <1358331945-4106-1-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Gao feng @ 2013-01-16 10:25 UTC (permalink / raw)
  To: ebiederm-aS9lmoZGLiVWk0Htik3J/w
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

From: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

The memory used by tmpfs is under the control of the
memory cgroup, and files in one tmpfs instance do not
leak into other tmpfs instances.

So mounting tmpfs in a user namespace does no harm to the
host, and we can allow tmpfs to be mounted in a user namespace.

Signed-off-by: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
 mm/shmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5dd56f6..8eff60a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2766,6 +2766,7 @@ static struct file_system_type shmem_fs_type = {
 	.name		= "tmpfs",
 	.mount		= shmem_mount,
 	.kill_sb	= kill_litter_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 int __init shmem_init(void)
-- 
1.7.11.7
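
For illustration, the effect of the one-line change above can be modelled
in userspace. This is a simplified sketch, not the kernel's mount path:
struct fs_type_model and may_mount_model() are hypothetical names, the flag
value is a constant chosen for the model, and the sketch assumes the caller
has already passed whatever capability check applies. It only shows the
idea that a mount requested from outside the initial user namespace is
refused unless the filesystem type carries FS_USERNS_MOUNT.

```c
#include <errno.h>
#include <stdbool.h>

/* Model constant; stands in for the kernel flag of the same name. */
#define FS_USERNS_MOUNT 8

/* Hypothetical, cut-down stand-in for struct file_system_type. */
struct fs_type_model {
	const char *name;
	int fs_flags;
};

/*
 * Returns 0 if the mount may proceed, -EPERM otherwise.
 * Mounts from the initial user namespace are not restricted by the flag;
 * mounts from any other user namespace require FS_USERNS_MOUNT.
 */
static int may_mount_model(const struct fs_type_model *type,
			   bool in_init_userns)
{
	if (in_init_userns)
		return 0;
	if (!(type->fs_flags & FS_USERNS_MOUNT))
		return -EPERM;
	return 0;
}
```

With fs_flags left at 0 (the state before the patch) the model refuses a
mount from a non-initial user namespace; with the flag set it allows it.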


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found] ` <1358331945-4106-1-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2013-01-16 14:35   ` Serge Hallyn
  2013-01-17  1:07     ` Gao feng
  0 siblings, 1 reply; 25+ messages in thread
From: Serge Hallyn @ 2013-01-16 14:35 UTC (permalink / raw)
  To: Gao feng
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Quoting Gao feng (gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
> From: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> 
> Since the memory used by tmpfs is under control of
> memory cgroup. and the files under the tmpfs will not
> be leak to other tmpfs.
> 
> So mounting tmpfs in user namespace does no harm to the
> host,we can allow tmpfs to be mounted in user namespace.
> 
> Signed-off-by: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

I've got the same patch in my kernel at
http://kernel.ubuntu.com/git?p=serge/quantal-userns.git;a=summary

except note that there are two definitions of shmem_fs_type.
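
A sketch of why the second definition matters, assuming it is the fallback
that is compiled when the full shmem implementation is configured out.
The names fs_type_model, MODEL_CONFIG_SHMEM and tmpfs_userns_mountable()
are hypothetical, and the mount_impl strings are only labels. Only one of
the two definitions is ever compiled, so the flag has to be set in both
for the behaviour to be the same in either configuration.

```c
#include <stddef.h>

/* Model constant; stands in for the kernel flag of the same name. */
#define FS_USERNS_MOUNT 8

/* Hypothetical, cut-down stand-in for struct file_system_type. */
struct fs_type_model {
	const char *name;
	const char *mount_impl;	/* label only, not a real callback */
	int fs_flags;
};

#ifdef MODEL_CONFIG_SHMEM
/* Definition used when the full implementation is built. */
static const struct fs_type_model shmem_fs_type_model = {
	.name		= "tmpfs",
	.mount_impl	= "shmem_mount",
	.fs_flags	= FS_USERNS_MOUNT,
};
#else
/* Fallback definition; it needs the same flag. */
static const struct fs_type_model shmem_fs_type_model = {
	.name		= "tmpfs",
	.mount_impl	= "ramfs_mount",
	.fs_flags	= FS_USERNS_MOUNT,
};
#endif

/* Returns 1 if the compiled-in definition carries the flag, else 0. */
static int tmpfs_userns_mountable(void)
{
	return (shmem_fs_type_model.fs_flags & FS_USERNS_MOUNT) != 0;
}
```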

> ---
>  mm/shmem.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5dd56f6..8eff60a 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2766,6 +2766,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= shmem_mount,
>  	.kill_sb	= kill_litter_super,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> -- 
> 1.7.11.7
> 


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
  2013-01-16 14:35   ` Serge Hallyn
@ 2013-01-17  1:07     ` Gao feng
       [not found]       ` <50F74EC6.60004-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Gao feng @ 2013-01-17  1:07 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On 2013/01/16 22:35, Serge Hallyn wrote:
> Quoting Gao feng (gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
>> From: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
>>
>> Since the memory used by tmpfs is under control of
>> memory cgroup. and the files under the tmpfs will not
>> be leak to other tmpfs.
>>
>> So mounting tmpfs in user namespace does no harm to the
>> host,we can allow tmpfs to be mounted in user namespace.
>>
>> Signed-off-by: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> 
> I've got the same patch in my kernel at
> http://kernel.ubuntu.com/git?p=serge/quantal-userns.git;a=summary
> 
> except note that there are two definitions of shmem_fs_type.
> 

Yes, I missed the other one. Do you plan to push this patch
into Linus's linux-2.6 or Eric's userns tree?

I'm trying to add userns support to libvirt, so I need tmpfs to
be mountable in a userns.

Thanks!

>> ---
>>  mm/shmem.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 5dd56f6..8eff60a 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2766,6 +2766,7 @@ static struct file_system_type shmem_fs_type = {
>>  	.name		= "tmpfs",
>>  	.mount		= shmem_mount,
>>  	.kill_sb	= kill_litter_super,
>> +	.fs_flags	= FS_USERNS_MOUNT,
>>  };
>>  
>>  int __init shmem_init(void)
>> -- 
>> 1.7.11.7
>>
> 


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]       ` <50F74EC6.60004-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2013-01-17 10:15         ` Eric W. Biederman
  2013-01-17 17:14         ` Serge Hallyn
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2013-01-17 10:15 UTC (permalink / raw)
  To: Gao feng; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> writes:

> On 2013/01/16 22:35, Serge Hallyn wrote:
>> Quoting Gao feng (gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
>>> From: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
>>>
>>> Since the memory used by tmpfs is under control of
>>> memory cgroup. and the files under the tmpfs will not
>>> be leak to other tmpfs.
>>>
>>> So mounting tmpfs in user namespace does no harm to the
>>> host,we can allow tmpfs to be mounted in user namespace.
>>>
>>> Signed-off-by: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
>> 
>> I've got the same patch in my kernel at
>> http://kernel.ubuntu.com/git?p=serge/quantal-userns.git;a=summary
>> 
>> except note that there are two definitions of shmem_fs_type.
>> 
>
> Yes, I miss the other one,Do you have plan to push this patch
> into linus's linux-2.6 or eric's userns tree?

Linus's linux-2.6.git is a symlink to Linus's linux.git.  Talking about
2.6 in this day and age is a bit confusing.

> I'm trying to add userns support for libvirt,so I need tmpfs to
> be allowed to mount in userns.

At a practical level I am happy to apply a complete patch to my tree
once it has been posted to fs-devel, and probably lkml, for review,
and I have had a chance to read the memory control group code and verify
with my own little eyes that the memory control group can in fact limit
tmpfs.

Eric


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]       ` <50F74EC6.60004-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  2013-01-17 10:15         ` Eric W. Biederman
@ 2013-01-17 17:14         ` Serge Hallyn
  2013-01-17 23:34           ` Eric W. Biederman
  1 sibling, 1 reply; 25+ messages in thread
From: Serge Hallyn @ 2013-01-17 17:14 UTC (permalink / raw)
  To: Gao feng
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Quoting Gao feng (gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
> On 2013/01/16 22:35, Serge Hallyn wrote:
> > Quoting Gao feng (gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org):
> >> From: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> >>
> >> Since the memory used by tmpfs is under control of
> >> memory cgroup. and the files under the tmpfs will not
> >> be leak to other tmpfs.
> >>
> >> So mounting tmpfs in user namespace does no harm to the
> >> host,we can allow tmpfs to be mounted in user namespace.
> >>
> >> Signed-off-by: gaofeng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> > 
> > I've got the same patch in my kernel at
> > http://kernel.ubuntu.com/git?p=serge/quantal-userns.git;a=summary
> > 
> > except note that there are two definitions of shmem_fs_type.
> > 
> 
> Yes, I miss the other one,Do you have plan to push this patch
> into linus's linux-2.6 or eric's userns tree?

I actually was waiting for Eric to do it, but I'll happily send it
to linux-fsdevel and lkml (in a bit).

> I'm trying to add userns support for libvirt,so I need tmpfs to
> be allowed to mount in userns.

-serge


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
  2013-01-17 17:14         ` Serge Hallyn
@ 2013-01-17 23:34           ` Eric W. Biederman
       [not found]             ` <87fw1zbd03.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2013-01-17 23:34 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:

> I actually was waiting for Eric to do it, but I'll happily send it
> to linux-fsdevel and lkml (in a bit).

I might just.

I will take a look at this in a week or so.  I want to get through the
core userspace bits first so I can just cross those off my list of
things that need to be done.

Eric


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]             ` <87fw1zbd03.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-18  4:24               ` Serge Hallyn
  2013-01-18  5:29                 ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Serge Hallyn @ 2013-01-18  4:24 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
> 
> > I actually was waiting for Eric to do it, but I'll happily send it
> > to linux-fsdevel and lkml (in a bit).
> 
> I might just.
> 
> I will take a look at this in a week or so.  I want to get through the
> core userspace bits first so I can just cross those off my list of
> things that need to be done.
> 
> Eric

Ok, I'll wait on sending it then - thanks.

-serge


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
  2013-01-18  4:24               ` Serge Hallyn
@ 2013-01-18  5:29                 ` Eric W. Biederman
       [not found]                   ` <87vcavys6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2013-01-18  5:29 UTC (permalink / raw)
  To: Serge Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:

> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>> 
>> > I actually was waiting for Eric to do it, but I'll happily send it
>> > to linux-fsdevel and lkml (in a bit).
>> 
>> I might just.
>> 
>> I will take a look at this in a week or so.  I want to get through the
>> core userspace bits first so I can just cross those off my list of
>> things that need to be done.
>> 
>> Eric
>
> Ok, I'll wait on sending it then - thanks.

Next up is my patch to shadow-utils and then taking a good hard stare at
what is left kernel side.

One of the questions I need to answer is:  Do cgroups actually work
for what needs to be limited?  Or does the focus of cgroups on
processes without other ownership in objects fundamentally limit what
can be expressed with cgroups in a problematic way?  In which case, would
some hierarchical limits based on user namespaces and rlimits be easier
to implement and make more sense?

I think the answer will be that cgroups are good enough but that
question certainly needs looking at.

Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.

Eric


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]                   ` <87vcavys6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-18  5:33                     ` Glauber Costa
       [not found]                       ` <50F8DEBF.1020701-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2013-01-20 19:24                     ` Serge E. Hallyn
  1 sibling, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2013-01-18  5:33 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 01/17/2013 09:29 PM, Eric W. Biederman wrote:
> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
> 
>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>
>>>> I actually was waiting for Eric to do it, but I'll happily send it
>>>> to linux-fsdevel and lkml (in a bit).
>>>
>>> I might just.
>>>
>>> I will take a look at this in a week or so.  I want to get through the
>>> core userspace bits first so I can just cross those off my list of
>>> things that need to be done.
>>>
>>> Eric
>>
>> Ok, I'll wait on sending it then - thanks.
> 
> Next up is my patch to shadow-utils and then taking a good hard stare at
> what is left kernel side.
> 
> One of the questions I need to answer is:  Do cgroups actually work
> for what needs to be limited?  Or does the the focus of cgroups on
> processes without other ownership in objects fundamentally limit what
> can be expressed with cgroups in a problematic way.  In which case would
> some hierarchical limits based on user namespaces and rlimits be easier
> to implement and make more sense.
> 
> I think the answer will be that cgroups are good enough but that
> question certainly needs looking at.
> 
> Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.
> 
First easy question:

cgroups are not necessarily configured.

IIUC, the aim of this patch is to allow unprivileged mounts of tmpfs
relying on the fact that cgroups will stop memory abuse (correct me if I
am wrong).

But what if the user is not using cgroups?


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]                       ` <50F8DEBF.1020701-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2013-01-18  6:04                         ` Eric W. Biederman
       [not found]                           ` <87ip6vyqkf.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-01-21  2:39                         ` [PATCH RESEND] userns: enable tmpfs support for user namespace Gao feng
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2013-01-18  6:04 UTC (permalink / raw)
  To: Glauber Costa; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 01/17/2013 09:29 PM, Eric W. Biederman wrote:
>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>> 
>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>>
>>>>> I actually was waiting for Eric to do it, but I'll happily send it
>>>>> to linux-fsdevel and lkml (in a bit).
>>>>
>>>> I might just.
>>>>
>>>> I will take a look at this in a week or so.  I want to get through the
>>>> core userspace bits first so I can just cross those off my list of
>>>> things that need to be done.
>>>>
>>>> Eric
>>>
>>> Ok, I'll wait on sending it then - thanks.
>> 
>> Next up is my patch to shadow-utils and then taking a good hard stare at
>> what is left kernel side.
>> 
>> One of the questions I need to answer is:  Do cgroups actually work
>> for what needs to be limited?  Or does the the focus of cgroups on
>> processes without other ownership in objects fundamentally limit what
>> can be expressed with cgroups in a problematic way.  In which case would
>> some hierarchical limits based on user namespaces and rlimits be easier
>> to implement and make more sense.
>> 
>> I think the answer will be that cgroups are good enough but that
>> question certainly needs looking at.
>> 
>> Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.
>> 
> First easy question:
>
> cgroups are not necessarily configured.
>
> IIUC, the aim of this patch is to allow unprivileged mounts of tmpfs
> relying on the fact that cgroups will stop memory abuse (correct me if I
> am wrong).
>
> But what if the user is not using cgroups?

The requirement for tmpfs to be safe is that there should be a control
that root can use to prevent DOS attacks.  If you don't choose to use
what is available then shrug.
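
As an illustration of the kind of control meant here, a minimal sketch of
setting a memory cgroup limit from C. It assumes the cgroup v1 memory
controller, whose per-group limit file is memory.limit_in_bytes, and it
assumes that tmpfs pages are charged to the group of the task that
allocates them, which is the very thing this thread says still needs
verifying. The helper names set_memcg_limit() and get_memcg_limit() are
hypothetical.

```c
#include <stdio.h>

/*
 * Write limit_bytes to <cgroup_dir>/memory.limit_in_bytes.
 * Returns 0 on success, -1 on any failure.
 */
static int set_memcg_limit(const char *cgroup_dir,
			   unsigned long long limit_bytes)
{
	char path[4096];
	FILE *f;
	int ok;
	int n = snprintf(path, sizeof(path), "%s/memory.limit_in_bytes",
			 cgroup_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%llu\n", limit_bytes) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Read the limit back from <cgroup_dir>/memory.limit_in_bytes.
 * Returns 0 on success, -1 on any failure.
 */
static int get_memcg_limit(const char *cgroup_dir,
			   unsigned long long *limit_bytes)
{
	char path[4096];
	FILE *f;
	int ok;
	int n = snprintf(path, sizeof(path), "%s/memory.limit_in_bytes",
			 cgroup_dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%llu", limit_bytes) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

On a host with the v1 memory controller mounted, root might call
set_memcg_limit("/sys/fs/cgroup/memory/container1", 256ULL << 20), where
the mount point and the group name container1 are placeholders. The limit
only applies to tasks that have been placed in that group.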

Eric


* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]                           ` <87ip6vyqkf.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-18  6:10                             ` Glauber Costa
       [not found]                               ` <50F8E73B.7000903-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2013-01-18  6:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 01/17/2013 10:04 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> 
>> On 01/17/2013 09:29 PM, Eric W. Biederman wrote:
>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>
>>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>>>
>>>>>> I actually was waiting for Eric to do it, but I'll happily send it
>>>>>> to linux-fsdevel and lkml (in a bit).
>>>>>
>>>>> I might just.
>>>>>
>>>>> I will take a look at this in a week or so.  I want to get through the
>>>>> core userspace bits first so I can just cross those off my list of
>>>>> things that need to be done.
>>>>>
>>>>> Eric
>>>>
>>>> Ok, I'll wait on sending it then - thanks.
>>>
>>> Next up is my patch to shadow-utils and then taking a good hard stare at
>>> what is left kernel side.
>>>
>>> One of the questions I need to answer is:  Do cgroups actually work
>>> for what needs to be limited?  Or does the the focus of cgroups on
>>> processes without other ownership in objects fundamentally limit what
>>> can be expressed with cgroups in a problematic way.  In which case would
>>> some hierarchical limits based on user namespaces and rlimits be easier
>>> to implement and make more sense.
>>>
>>> I think the answer will be that cgroups are good enough but that
>>> question certainly needs looking at.
>>>
>>> Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.
>>>
>> First easy question:
>>
>> cgroups are not necessarily configured.
>>
>> IIUC, the aim of this patch is to allow unprivileged mounts of tmpfs
>> relying on the fact that cgroups will stop memory abuse (correct me if I
>> am wrong).
>>
>> But what if the user is not using cgroups?
> 
> The requirement for tmpfs to be safe is that there should be a control
> that root can use to prevent DOS attacks.  If you don't choose to use
> what is available then shrug.
> 

Yes, but if you are an unprivileged user, the whole box would go down,
not just your namespace/container/group, etc.

So at first it seems to me very risky to allow an unprivileged mount of
something that may or may not be constrained. IOW: not depending on
cgroups and relying solely on namespaces to achieve this seems better at
first.


* Constraining the memory used by an unprivileged mount of tmpfs.
       [not found]                               ` <50F8E73B.7000903-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2013-01-18  7:01                                 ` Eric W. Biederman
       [not found]                                   ` <87ip6vug8p.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2013-01-18  7:01 UTC (permalink / raw)
  To: Glauber Costa; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 01/17/2013 10:04 PM, Eric W. Biederman wrote:
>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>> 
>>> On 01/17/2013 09:29 PM, Eric W. Biederman wrote:
>>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>>
>>>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>>>>
>>>>>>> I actually was waiting for Eric to do it, but I'll happily send it
>>>>>>> to linux-fsdevel and lkml (in a bit).
>>>>>>
>>>>>> I might just.
>>>>>>
>>>>>> I will take a look at this in a week or so.  I want to get through the
>>>>>> core userspace bits first so I can just cross those off my list of
>>>>>> things that need to be done.
>>>>>>
>>>>>> Eric
>>>>>
>>>>> Ok, I'll wait on sending it then - thanks.
>>>>
>>>> Next up is my patch to shadow-utils and then taking a good hard stare at
>>>> what is left kernel side.
>>>>
>>>> One of the questions I need to answer is:  Do cgroups actually work
>>>> for what needs to be limited?  Or does the the focus of cgroups on
>>>> processes without other ownership in objects fundamentally limit what
>>>> can be expressed with cgroups in a problematic way.  In which case would
>>>> some hierarchical limits based on user namespaces and rlimits be easier
>>>> to implement and make more sense.
>>>>
>>>> I think the answer will be that cgroups are good enough but that
>>>> question certainly needs looking at.
>>>>
>>>> Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.
>>>>
>>> First easy question:
>>>
>>> cgroups are not necessarily configured.
>>>
>>> IIUC, the aim of this patch is to allow unprivileged mounts of tmpfs
>>> relying on the fact that cgroups will stop memory abuse (correct me if I
>>> am wrong).
>>>
>>> But what if the user is not using cgroups?
>> 
>> The requirement for tmpfs to be safe is that there should be a control
>> that root can use to prevent DOS attacks.  If you don't choose to use
>> what is available then shrug.
>> 
>
> Yes, but if you are an unprivileged user, the whole box would go down,
> not just your namespace/container/group, etc.
>
> So at first it seems to me very risky to allow an unprivileged mount of
> something that may or may not be constrained. IOW: not depending on
> cgroups and relying solely on namespaces to achieve seems better at
> first.

Cgroups are the entity that is supposed to constrain these things.  That
is what they are there for.  If cgroups don't work for containers what
is the point?

That said, it seems we may be approaching the question I was asking
earlier.  Is there a semantic reason why we can express things better
in terms of user namespaces and rlimits than we can in terms of control
groups?

There may actually be in this case.  Memory accounting has long been a
tricky problem because it is hard to know who to charge the memory to.
I think it would be very reasonable to make the rule that you charge the
memory to the user namespace that created the object.

For a filesystem like tmpfs that would be the user namespace where the
tmpfs is first mounted.

At which point with a touch of care you can build hierarchical limits
for memory use of tmpfs and other consumers of memory based on user
namespaces.
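
The idea can be sketched in userspace as follows. This is only a model
of the proposal, not existing kernel code: struct userns_model and the
functions userns_charge() and userns_uncharge() are hypothetical. Each
namespace has a limit covering itself and its descendants, and memory is
charged to the namespace that created the object and to every ancestor,
so a child can never use more than any of its parents allow.

```c
#include <stddef.h>

/* Hypothetical per-namespace accounting state. */
struct userns_model {
	struct userns_model *parent;	/* NULL for the initial namespace */
	unsigned long limit;		/* max bytes for this subtree */
	unsigned long usage;		/* bytes charged to this subtree */
};

/*
 * Charge bytes to ns and to every ancestor, all or nothing.
 * Returns 0 on success, -1 if any level would exceed its limit.
 * Relies on usage <= limit holding at every level, which is true as
 * long as usage starts at 0 and only changes through these helpers.
 */
static int userns_charge(struct userns_model *ns, unsigned long bytes)
{
	struct userns_model *cur;

	/* First pass: check every level without modifying anything. */
	for (cur = ns; cur; cur = cur->parent) {
		if (bytes > cur->limit - cur->usage)
			return -1;
	}
	/* Second pass: commit the charge at every level. */
	for (cur = ns; cur; cur = cur->parent)
		cur->usage += bytes;
	return 0;
}

/* Undo a previous successful charge of the same size on the same ns. */
static void userns_uncharge(struct userns_model *ns, unsigned long bytes)
{
	struct userns_model *cur;

	for (cur = ns; cur; cur = cur->parent)
		cur->usage -= bytes;
}
```

The model is single-threaded; a real implementation would need locking or
atomic counters, and a policy for what a tmpfs write does when the charge
fails.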

(I still think memory control groups being able to limit tmpfs is enough
 to allow tmpfs mounts in user namespaces because that is only 2 lines
 of code and some verification that memory control groups can do the
 work.  But if there is a better way we can add that.)

What are the practical problems with control groups that make them
undesirable/hard to use with namespaces?

What would it take to fix the problems with control groups?

Eric


* Re: Constraining the memory used by an unprivileged mount of tmpfs.
       [not found]                                   ` <87ip6vug8p.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-18 18:42                                     ` Glauber Costa
       [not found]                                       ` <50F99787.3090708-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2013-01-18 18:42 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
> What are the practical problems with control groups that makes them
> undesirable/hard to use with namespaces?
> 
> What would it take to fix the problems with control groups?
There aren't any, from my PoV.
When I run containers, for instance, I basically join all namespaces,
configure all groups, and everything I can.

I do know, however, that not every use case is like that, and those
things tend to be very loosely coupled.

So what I am worried about is not a valid container usage where you
have your constraints configured. But if I log into a box as a normal
user, and that now allows me to create a userns and maliciously fire up
a big tmpfs from there, cgroups are not going to be there: it's not
a container box, it's just something I am trying to break.


* Re: Constraining the memory used by an unprivileged mount of tmpfs.
       [not found]                                       ` <50F99787.3090708-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2013-01-18 19:48                                         ` Serge Hallyn
  2013-01-18 19:52                                           ` Glauber Costa
  0 siblings, 1 reply; 25+ messages in thread
From: Serge Hallyn @ 2013-01-18 19:48 UTC (permalink / raw)
  To: Glauber Costa
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
> > What are the practical problems with control groups that makes them
> > undesirable/hard to use with namespaces?
> > 
> > What would it take to fix the problems with control groups?
> There aren't, from my PoV.
> When I run containers, for instance, I basically join all namespaces,
> configure all groups, and everything I can.
> 
> I do know, however, that not every use case is like that, and those
> things tends to be very loosely coupled.
> 
> So what I am worried about, is not a valid container usage where you
> have your constraints configured. But if I login into a box as a normal
> user, and that now allows me to create a userns, and maliciously fire a
> big tmpfs from there, cgroups will not gonna be there for me - it's not
> a container box, is just something I am trying to break.

Hm.  So basically we would, ideally, find a way to make it so that if
uid 500 creates a new userns and, therein, mounts a tmpfs, then that
tmpfs gets accounted and limited along with uid 500's RSS?

-serge


* Re: Constraining the memory used by an unprivileged mount of tmpfs.
  2013-01-18 19:48                                         ` Serge Hallyn
@ 2013-01-18 19:52                                           ` Glauber Costa
       [not found]                                             ` <50F9A7FD.6030507-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2013-01-18 19:52 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 01/18/2013 11:48 AM, Serge Hallyn wrote:
> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>> On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
>>> What are the practical problems with control groups that makes them
>>> undesirable/hard to use with namespaces?
>>>
>>> What would it take to fix the problems with control groups?
>> There aren't, from my PoV.
>> When I run containers, for instance, I basically join all namespaces,
>> configure all groups, and everything I can.
>>
>> I do know, however, that not every use case is like that, and those
>> things tends to be very loosely coupled.
>>
>> So what I am worried about, is not a valid container usage where you
>> have your constraints configured. But if I login into a box as a normal
>> user, and that now allows me to create a userns, and maliciously fire a
>> big tmpfs from there, cgroups will not gonna be there for me - it's not
>> a container box, is just something I am trying to break.
> 
> Hm.  So basically we would, ideally, find a way to make it so that if
> uid 500 creates a new userns and, therein, mounts a tmpfs, then that
> tmpfs gets accounted and limited along with uid 500's RSS?
> 

Dunno.

One option would be to start establishing stronger connections between
cgroups and namespaces in a sane way. And then, we only allow such
mounts when you are actually cgroup backed.

Again, I am not concerned with sane setups in here, but much more with
normal users in normal systems taking advantage of this.


* Re: Constraining the memory used by an unprivileged mount of tmpfs.
       [not found]                                             ` <50F9A7FD.6030507-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2013-01-18 20:06                                               ` Serge Hallyn
  2013-01-18 20:18                                               ` Eric W. Biederman
  2013-01-20 19:27                                               ` Serge E. Hallyn
  2 siblings, 0 replies; 25+ messages in thread
From: Serge Hallyn @ 2013-01-18 20:06 UTC (permalink / raw)
  To: Glauber Costa
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 01/18/2013 11:48 AM, Serge Hallyn wrote:
> > Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> >> On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
> >>> What are the practical problems with control groups that makes them
> >>> undesirable/hard to use with namespaces?
> >>>
> >>> What would it take to fix the problems with control groups?
> >> There aren't, from my PoV.
> >> When I run containers, for instance, I basically join all namespaces,
> >> configure all groups, and everything I can.
> >>
> >> I do know, however, that not every use case is like that, and those
> >> things tends to be very loosely coupled.
> >>
> >> So what I am worried about, is not a valid container usage where you
> >> have your constraints configured. But if I login into a box as a normal
> >> user, and that now allows me to create a userns, and maliciously fire a
> >> big tmpfs from there, cgroups will not gonna be there for me - it's not
> >> a container box, is just something I am trying to break.
> > 
> > Hm.  So basically we would, ideally, find a way to make it so that if
> > uid 500 creates a new userns and, therein, mounts a tmpfs, then that
> > tmpfs gets accounted and limited along with uid 500's RSS?
> > 
> 
> Dunno.
> 
> One option would be to start establishing stronger connections between
> cgroups and namespaces in a sane way. And then, we only allow such
> mounts when you are actually cgroup backed.
> 
> Again, I am not concerned with sane setups in here, but much more with
> normal users in normal systems taking advantage of this.

Right, and since a strong motivation for this is precisely to allow
unprivileged unshare of user_ns, and, from there, all others, we
can't talk about "setups", as the whole point is to not need a setup.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Constraining the memory used by an unprivilged mount of tmpfs.
       [not found]                                             ` <50F9A7FD.6030507-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2013-01-18 20:06                                               ` Serge Hallyn
@ 2013-01-18 20:18                                               ` Eric W. Biederman
       [not found]                                                 ` <87hament1w.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-01-20 19:27                                               ` Serge E. Hallyn
  2 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2013-01-18 20:18 UTC (permalink / raw)
  To: Glauber Costa; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 01/18/2013 11:48 AM, Serge Hallyn wrote:
>> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>>> On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
>>>> What are the practical problems with control groups that makes them
>>>> undesirable/hard to use with namespaces?
>>>>
>>>> What would it take to fix the problems with control groups?
>>> There aren't, from my PoV.
>>> When I run containers, for instance, I basically join all namespaces,
>>> configure all groups, and everything I can.
>>>
>>> I do know, however, that not every use case is like that, and those
>>> things tends to be very loosely coupled.
>>>
>>> So what I am worried about, is not a valid container usage where you
>>> have your constraints configured. But if I login into a box as a normal
>>> user, and that now allows me to create a userns, and maliciously fire a
>>> big tmpfs from there, cgroups will not gonna be there for me - it's not
>>> a container box, is just something I am trying to break.
>> 
>> Hm.  So basically we would, ideally, find a way to make it so that if
>> uid 500 creates a new userns and, therein, mounts a tmpfs, then that
>> tmpfs gets accounted and limited along with uid 500's RSS?
>> 
>
> Dunno.
>
> One option would be to start establishing stronger connections between
> cgroups and namespaces in a sane way. And then, we only allow such
> mounts when you are actually cgroup backed.
>
> Again, I am not concerned with sane setups in here, but much more with
> normal users in normal systems taking advantage of this.

For me this translates into: it would be good if we can get distros to
establish some good default limits for when they enable user namespaces.

At a practical level, I just looked and my current distribution does not
limit the size of processes I can create or the amount of memory those
processes can use.  So unless the distro I am looking at is strongly
atypical, any kind of memory limit is certainly worth providing but won't
help much.

Are memory control groups at this point palatable to general purpose
distributions?  If they are not, that does seem to be an argument that
we need something better.  Last I looked, memory control groups had some
ugly overheads and doubled the size of struct page, so there are
certainly reasons why memory control groups might be a problem.

Serge, does Ubuntu enable memory control groups?

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Constraining the memory used by an unprivilged mount of tmpfs.
       [not found]                                                 ` <87hament1w.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-18 20:32                                                   ` Serge Hallyn
  2013-01-18 22:38                                                   ` Glauber Costa
  1 sibling, 0 replies; 25+ messages in thread
From: Serge Hallyn @ 2013-01-18 20:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Serge, does Ubuntu enable memory control groups?

Yup, they're enabled, but not configured by default.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Constraining the memory used by an unprivilged mount of tmpfs.
       [not found]                                                 ` <87hament1w.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-01-18 20:32                                                   ` Serge Hallyn
@ 2013-01-18 22:38                                                   ` Glauber Costa
       [not found]                                                     ` <50F9CED4.2070109-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Glauber Costa @ 2013-01-18 22:38 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 01/18/2013 12:18 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
> 
>> On 01/18/2013 11:48 AM, Serge Hallyn wrote:
>>> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>>>> On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
>>>>> What are the practical problems with control groups that makes them
>>>>> undesirable/hard to use with namespaces?
>>>>>
>>>>> What would it take to fix the problems with control groups?
>>>> There aren't, from my PoV.
>>>> When I run containers, for instance, I basically join all namespaces,
>>>> configure all groups, and everything I can.
>>>>
>>>> I do know, however, that not every use case is like that, and those
>>>> things tends to be very loosely coupled.
>>>>
>>>> So what I am worried about, is not a valid container usage where you
>>>> have your constraints configured. But if I login into a box as a normal
>>>> user, and that now allows me to create a userns, and maliciously fire a
>>>> big tmpfs from there, cgroups will not gonna be there for me - it's not
>>>> a container box, is just something I am trying to break.
>>>
>>> Hm.  So basically we would, ideally, find a way to make it so that if
>>> uid 500 creates a new userns and, therein, mounts a tmpfs, then that
>>> tmpfs gets accounted and limited along with uid 500's RSS?
>>>
>>
>> Dunno.
>>
>> One option would be to start establishing stronger connections between
>> cgroups and namespaces in a sane way. And then, we only allow such
>> mounts when you are actually cgroup backed.
>>
>> Again, I am not concerned with sane setups in here, but much more with
>> normal users in normal systems taking advantage of this.
> 
> For me this translates into it would be good if we can get distros to
> establish some good default limits for when they enable user namespaces.
> 
> At a practical level I just looked and my current distribution does not
> limit the size of processes I can create or the amount of memory those
> processes can use.  So unless the distro I am looking at is strongly
> atypical any kind of memory limit is certainly worth providing but won't
> help much.
> 
> Are memory control groups at this point palatable to general purpose
> distributions?  If memory control groups are not that does seem to be an
> argument that we need something better.  Last I looked memory control
> groups had some ugly overheads and doubled the size of struct page so
> there are certainly reasons why memory control groups might be a problem.
> 
We are actively placing a lot of effort into reducing this overhead.

> Serge, does Ubuntu enable memory control groups?
> 
I believe at least systemd uses it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]                   ` <87vcavys6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2013-01-18  5:33                     ` Glauber Costa
@ 2013-01-20 19:24                     ` Serge E. Hallyn
  1 sibling, 0 replies; 25+ messages in thread
From: Serge E. Hallyn @ 2013-01-20 19:24 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
> 
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
> >> 
> >> > I actually was waiting for Eric to do it, but I'll happily send it
> >> > to linux-fsdevel and lkml (in a bit).
> >> 
> >> I might just.
> >> 
> >> I will take a look at this in a week or so.  I want to get through the
> >> core userspace bits first so I can just cross those off my list of
> >> things that need to be done.
> >> 
> >> Eric
> >
> > Ok, I'll wait on sending it then - thanks.
> 
> Next up is my patch to shadow-utils and then taking a good hard stare at
> what is left kernel side.
> 
> One of the questions I need to answer is:  Do cgroups actually work
> for what needs to be limited?  Or does the the focus of cgroups on
> processes without other ownership in objects fundamentally limit what

Note that with pam (and presumably through systemd) you can tie a user to
a cgroup at login.  You could chown the cgroup to the user, counting on
proper hierarchy enforcement to not let the user escape, while the user
could still descend in the hierarchy for flexibility (i.e. creating his
own containers).
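
A rough sketch of what such a login hook might do, in C.  Everything
named here is an assumption for illustration: setup_user_cgroup() is a
hypothetical helper, not part of pam or systemd; memcg_root stands for
wherever the memory controller is mounted; and moving the login process
into the group is left out.  The limit file is written before the
directory is handed over, so the limit file itself stays owned by the
administrator.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create <memcg_root>/<user>, set a memory limit,
 * then chown the directory to the user so they can create sub-groups.
 * Returns 0 on success, -1 on failure.
 */
static int setup_user_cgroup(const char *memcg_root, const char *user,
			     uid_t uid, unsigned long long limit_bytes)
{
	char dir[4096];
	char file[4096 + 32];
	FILE *f;
	int n;

	assert(memcg_root != NULL && user != NULL);

	n = snprintf(dir, sizeof(dir), "%s/%s", memcg_root, user);
	if (n < 0 || (size_t)n >= sizeof(dir))
		return -1;
	if (mkdir(dir, 0755) != 0 && errno != EEXIST)
		return -1;

	/* Write the limit while still privileged. */
	n = snprintf(file, sizeof(file), "%s/memory.limit_in_bytes", dir);
	if (n < 0 || (size_t)n >= sizeof(file))
		return -1;
	f = fopen(file, "w");
	if (f == NULL)
		return -1;
	if (fprintf(f, "%llu\n", limit_bytes) < 0) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)
		return -1;

	/* Hand only the directory to the user; group is left unchanged. */
	return chown(dir, uid, (gid_t)-1);
}
```

Whether the user can escape the limit then depends on the hierarchy
enforcement mentioned above, which this sketch does not configure.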

> can be expressed with cgroups in a problematic way.  In which case would
> some hierarchical limits based on user namespaces and rlimits be easier
> to implement and make more sense.

1. most distros enable cgroups, so the penalty is being paid anyway.
2. if there are real gains to be had by adding another set of limits
   as mentioned here, then I hope someone will look into it.  But that
   is separate from the question of whether the memory cgroup is
   enough to justify allowing tmpfs mounts in user namespaces.

   We could make the FS_USERNS_MOUNT flag in tmpfs conditional on
   the memory cgroup being on?  Though that doesn't guarantee that
   the cgroups will be properly configured.
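
A minimal userspace sketch of that compile-time gate, assuming it would
be done with #ifdef around the .fs_flags line from the patch.  The
struct, the FS_USERNS_MOUNT value and the CONFIG_MEMCG define below are
stand-ins so the fragment compiles outside the kernel, and
userns_mount_allowed() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel definitions, for illustration only. */
#define FS_USERNS_MOUNT 8

struct file_system_type {
	const char *name;
	int fs_flags;
};

/* Models a kernel built with the memory cgroup; remove to model one without. */
#define CONFIG_MEMCG 1

static struct file_system_type shmem_fs_type = {
	.name		= "tmpfs",
#ifdef CONFIG_MEMCG
	/* Only advertise userns mounts when memcg is compiled in. */
	.fs_flags	= FS_USERNS_MOUNT,
#endif
};

/* Hypothetical check: may this filesystem be mounted from a user namespace? */
static int userns_mount_allowed(const struct file_system_type *type)
{
	assert(type != NULL);
	return (type->fs_flags & FS_USERNS_MOUNT) != 0;
}
```

As noted above, this only tests whether the code is built in; it says
nothing about whether any limit has actually been configured.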

> I think the answer will be that cgroups are good enough but that
> question certainly needs looking at.
> 
> Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.

Sounds good - thanks.

Is there a git tree for the shadow-utils changes which people can start
looking at?

-serge

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Constraining the memory used by an unprivilged mount of tmpfs.
       [not found]                                             ` <50F9A7FD.6030507-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2013-01-18 20:06                                               ` Serge Hallyn
  2013-01-18 20:18                                               ` Eric W. Biederman
@ 2013-01-20 19:27                                               ` Serge E. Hallyn
  2 siblings, 0 replies; 25+ messages in thread
From: Serge E. Hallyn @ 2013-01-20 19:27 UTC (permalink / raw)
  To: Glauber Costa
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> On 01/18/2013 11:48 AM, Serge Hallyn wrote:
> > Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
> >> On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
> >>> What are the practical problems with control groups that makes them
> >>> undesirable/hard to use with namespaces?
> >>>
> >>> What would it take to fix the problems with control groups?
> >> There aren't, from my PoV.
> >> When I run containers, for instance, I basically join all namespaces,
> >> configure all groups, and everything I can.
> >>
> >> I do know, however, that not every use case is like that, and those
> >> things tends to be very loosely coupled.
> >>
> >> So what I am worried about, is not a valid container usage where you
> >> have your constraints configured. But if I login into a box as a normal
> >> user, and that now allows me to create a userns, and maliciously fire a
> >> big tmpfs from there, cgroups will not gonna be there for me - it's not
> >> a container box, is just something I am trying to break.
> > 
> > Hm.  So basically we would, ideally, find a way to make it so that if
> > uid 500 creates a new userns and, therein, mounts a tmpfs, then that
> > tmpfs gets accounted and limited along with uid 500's RSS?
> > 
> 
> Dunno.
> 
> One option would be to start establishing stronger connections between
> cgroups and namespaces in a sane way. And then, we only allow such
> mounts when you are actually cgroup backed.

The latter is probably not horrible - I'm all for encouraging distros
to start always setting up cgroups on login.

-serge

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]                       ` <50F8DEBF.1020701-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2013-01-18  6:04                         ` Eric W. Biederman
@ 2013-01-21  2:39                         ` Gao feng
       [not found]                           ` <50FCAA62.8070804-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Gao feng @ 2013-01-21  2:39 UTC (permalink / raw)
  To: Glauber Costa
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 2013/01/18 13:33, Glauber Costa wrote:
> On 01/17/2013 09:29 PM, Eric W. Biederman wrote:
>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>
>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>>
>>>>> I actually was waiting for Eric to do it, but I'll happily send it
>>>>> to linux-fsdevel and lkml (in a bit).
>>>>
>>>> I might just.
>>>>
>>>> I will take a look at this in a week or so.  I want to get through the
>>>> core userspace bits first so I can just cross those off my list of
>>>> things that need to be done.
>>>>
>>>> Eric
>>>
>>> Ok, I'll wait on sending it then - thanks.
>>
>> Next up is my patch to shadow-utils and then taking a good hard stare at
>> what is left kernel side.
>>
>> One of the questions I need to answer is:  Do cgroups actually work
>> for what needs to be limited?  Or does the the focus of cgroups on
>> processes without other ownership in objects fundamentally limit what
>> can be expressed with cgroups in a problematic way.  In which case would
>> some hierarchical limits based on user namespaces and rlimits be easier
>> to implement and make more sense.
>>
>> I think the answer will be that cgroups are good enough but that
>> question certainly needs looking at.
>>
>> Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.
>>
> First easy question:
> 
> cgroups are not necessarily configured.
> 
> IIUC, the aim of this patch is to allow unprivileged mounts of tmpfs
> relying on the fact that cgroups will stop memory abuse (correct me if I
> am wrong).
> 
> But what if the user is not using cgroups?
> 

I think maybe we can force CONFIG_MEMCG to be selected when userns is
enabled.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RESEND] userns: enable tmpfs support for user namespace
       [not found]                           ` <50FCAA62.8070804-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2013-01-21  5:08                             ` Glauber Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Glauber Costa @ 2013-01-21  5:08 UTC (permalink / raw)
  To: Gao feng
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 01/21/2013 06:39 AM, Gao feng wrote:
> On 2013/01/18 13:33, Glauber Costa wrote:
>> On 01/17/2013 09:29 PM, Eric W. Biederman wrote:
>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>
>>>> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>>>>> Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> writes:
>>>>>
>>>>>> I actually was waiting for Eric to do it, but I'll happily send it
>>>>>> to linux-fsdevel and lkml (in a bit).
>>>>>
>>>>> I might just.
>>>>>
>>>>> I will take a look at this in a week or so.  I want to get through the
>>>>> core userspace bits first so I can just cross those off my list of
>>>>> things that need to be done.
>>>>>
>>>>> Eric
>>>>
>>>> Ok, I'll wait on sending it then - thanks.
>>>
>>> Next up is my patch to shadow-utils and then taking a good hard stare at
>>> what is left kernel side.
>>>
>>> One of the questions I need to answer is:  Do cgroups actually work
>>> for what needs to be limited?  Or does the the focus of cgroups on
>>> processes without other ownership in objects fundamentally limit what
>>> can be expressed with cgroups in a problematic way.  In which case would
>>> some hierarchical limits based on user namespaces and rlimits be easier
>>> to implement and make more sense.
>>>
>>> I think the answer will be that cgroups are good enough but that
>>> question certainly needs looking at.
>>>
>>> Anyway.  shadow-utils, minimal tmpfs, minimal devpts, and then the rest.
>>>
>> First easy question:
>>
>> cgroups are not necessarily configured.
>>
>> IIUC, the aim of this patch is to allow unprivileged mounts of tmpfs
>> relying on the fact that cgroups will stop memory abuse (correct me if I
>> am wrong).
>>
>> But what if the user is not using cgroups?
>>
> 
> I think maybe we can force config MEMCG being selected when we decide to
> enable userns.
> 
Which is the same as nothing.

MEMCG being selected at compile time doesn't really mean anything.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Constraining the memory used by an unprivilged mount of tmpfs.
       [not found]                                                     ` <50F9CED4.2070109-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2013-01-25  8:12                                                       ` Eric W. Biederman
       [not found]                                                         ` <87zjzxllzz.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2013-01-25  8:12 UTC (permalink / raw)
  To: Glauber Costa; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 01/18/2013 12:18 PM, Eric W. Biederman wrote:
>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>> 
>>> On 01/18/2013 11:48 AM, Serge Hallyn wrote:
>>>> Quoting Glauber Costa (glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org):
>>>>> On 01/17/2013 11:01 PM, Eric W. Biederman wrote:
>>>>>> What are the practical problems with control groups that makes them
>>>>>> undesirable/hard to use with namespaces?
>>>>>>
>>>>>> What would it take to fix the problems with control groups?
>>>>> There aren't, from my PoV.
>>>>> When I run containers, for instance, I basically join all namespaces,
>>>>> configure all groups, and everything I can.
>>>>>
>>>>> I do know, however, that not every use case is like that, and those
>>>>> things tends to be very loosely coupled.
>>>>>
>>>>> So what I am worried about, is not a valid container usage where you
>>>>> have your constraints configured. But if I login into a box as a normal
>>>>> user, and that now allows me to create a userns, and maliciously fire a
>>>>> big tmpfs from there, cgroups will not gonna be there for me - it's not
>>>>> a container box, is just something I am trying to break.
>>>>
>>>> Hm.  So basically we would, ideally, find a way to make it so that if
>>>> uid 500 creates a new userns and, therein, mounts a tmpfs, then that
>>>> tmpfs gets accounted and limited along with uid 500's RSS?
>>>>
>>>
>>> Dunno.
>>>
>>> One option would be to start establishing stronger connections between
>>> cgroups and namespaces in a sane way. And then, we only allow such
>>> mounts when you are actually cgroup backed.
>>>
>>> Again, I am not concerned with sane setups in here, but much more with
>>> normal users in normal systems taking advantage of this.
>> 
>> For me this translates into it would be good if we can get distros to
>> establish some good default limits for when they enable user namespaces.
>> 
>> At a practical level I just looked and my current distribution does not
>> limit the size of processes I can create or the amount of memory those
>> processes can use.  So unless the distro I am looking at is strongly
>> atypical any kind of memory limit is certainly worth providing but won't
>> help much.
>> 
>> Are memory control groups at this point palatable to general purpose
>> distributions?  If memory control groups are not that does seem to be an
>> argument that we need something better.  Last I looked memory control
>> groups had some ugly overheads and doubled the size of struct page so
>> there are certainly reasons why memory control groups might be a problem.
>> 
> We are actively placing a lot of effort into reducing this overhead.
>
>> Serge, does Ubuntu enable memory control groups?
>> 
> I believe at least systemd uses it.

So I just finished my basic review of the current state of memory
control groups.  Memory control groups do successfully control all
kinds of memory (if properly configured), and the overhead has been
reduced to 2/7 the size of struct page on 64bit systems.

<tangent>
By my rough calculations the memory control group overhead is 4MiB per
gigabyte.  It looks like that overhead can be pretty easily cut in half
by simply embedding the flags into the low bits of the memory control
group pointer.  And I still don't understand why page_cgroup is not
a member of struct page.  But whatever; memory control groups have much
less per-page overhead than they used to.
</tangent>
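
The figures in the tangent can be checked with a little arithmetic.
This is a sketch under assumed sizes that are not stated in the thread:
4 KiB pages, a 56-byte struct page, and a 16-byte page_cgroup (one
pointer plus one flags word), which gives 16/56 = 2/7.  The tag_*
helpers are hypothetical and only illustrate the low-bits idea.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for a 64-bit system; not taken from the thread. */
#define ASSUMED_PAGE_SIZE	4096u
#define ASSUMED_PAGE_CGROUP_SZ	16u	/* pointer + separate flags word */
#define ASSUMED_PACKED_SZ	8u	/* flags folded into the pointer */

/* Per-page tracking overhead, in bytes, for mem_bytes of RAM. */
static uint64_t memcg_overhead_bytes(uint64_t mem_bytes, uint64_t per_page)
{
	assert(mem_bytes % ASSUMED_PAGE_SIZE == 0);
	return (mem_bytes / ASSUMED_PAGE_SIZE) * per_page;
}

/* An 8-byte-aligned pointer has three zero low bits free for flags. */
#define TAG_MASK ((uintptr_t)0x7)

static uintptr_t tag_pack(void *ptr, unsigned int flags)
{
	assert(((uintptr_t)ptr & TAG_MASK) == 0);	/* must be aligned */
	assert((flags & ~TAG_MASK) == 0);		/* must fit in 3 bits */
	return (uintptr_t)ptr | flags;
}

static void *tag_ptr(uintptr_t v)
{
	return (void *)(v & ~TAG_MASK);
}

static unsigned int tag_flags(uintptr_t v)
{
	return (unsigned int)(v & TAG_MASK);
}
```

With these assumptions, 1 GiB is 262144 pages, so 16 bytes per page
comes to 4 MiB and the packed 8-byte form comes to 2 MiB.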

So it looks like distros and everyone else who enables user namespaces
and allows multiple users to be logged in at the same time will need to
set up memory control groups to limit the trouble their users can get into.

I think I will have to add a patch to document that recommendation.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Constraining the memory used by an unprivilged mount of tmpfs.
       [not found]                                                         ` <87zjzxllzz.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-25  8:21                                                           ` Lord Glauber Costa of Sealand
  0 siblings, 0 replies; 25+ messages in thread
From: Lord Glauber Costa of Sealand @ 2013-01-25  8:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Michael Kerrisk


> I think I will have to add a patch to document that recommendation.
> 

Provided this happens, and is documented in all places users would
usually search for, I am fine with it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-01-25  8:21 UTC | newest]

Thread overview: 25+ messages
2013-01-16 10:25 [PATCH RESEND] userns: enable tmpfs support for user namespace Gao feng
     [not found] ` <1358331945-4106-1-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2013-01-16 14:35   ` Serge Hallyn
2013-01-17  1:07     ` Gao feng
     [not found]       ` <50F74EC6.60004-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2013-01-17 10:15         ` Eric W. Biederman
2013-01-17 17:14         ` Serge Hallyn
2013-01-17 23:34           ` Eric W. Biederman
     [not found]             ` <87fw1zbd03.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-18  4:24               ` Serge Hallyn
2013-01-18  5:29                 ` Eric W. Biederman
     [not found]                   ` <87vcavys6k.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-18  5:33                     ` Glauber Costa
     [not found]                       ` <50F8DEBF.1020701-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-01-18  6:04                         ` Eric W. Biederman
     [not found]                           ` <87ip6vyqkf.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-18  6:10                             ` Glauber Costa
     [not found]                               ` <50F8E73B.7000903-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-01-18  7:01                                 ` Constraining the memory used by an unprivilged mount of tmpfs Eric W. Biederman
     [not found]                                   ` <87ip6vug8p.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-18 18:42                                     ` Glauber Costa
     [not found]                                       ` <50F99787.3090708-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-01-18 19:48                                         ` Serge Hallyn
2013-01-18 19:52                                           ` Glauber Costa
     [not found]                                             ` <50F9A7FD.6030507-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-01-18 20:06                                               ` Serge Hallyn
2013-01-18 20:18                                               ` Eric W. Biederman
     [not found]                                                 ` <87hament1w.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-18 20:32                                                   ` Serge Hallyn
2013-01-18 22:38                                                   ` Glauber Costa
     [not found]                                                     ` <50F9CED4.2070109-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-01-25  8:12                                                       ` Eric W. Biederman
     [not found]                                                         ` <87zjzxllzz.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-25  8:21                                                           ` Lord Glauber Costa of Sealand
2013-01-20 19:27                                               ` Serge E. Hallyn
2013-01-21  2:39                         ` [PATCH RESEND] userns: enable tmpfs support for user namespace Gao feng
     [not found]                           ` <50FCAA62.8070804-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2013-01-21  5:08                             ` Glauber Costa
2013-01-20 19:24                     ` Serge E. Hallyn
