Linux Container Development
* Re: call_usermodehelper in containers
@ 2016-02-18  2:57                   ` Eric W. Biederman
From: Eric W. Biederman @ 2016-02-18  2:57 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A


Cc'ing the containers list because a related discussion is happening there
and somehow this thread has never made it there.

Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes:

> On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote:
>> On 11/15, Eric W. Biederman wrote:
>> > 
>> > I don't understand that one.  Having a preforked thread with the
>> > proper environment that can act like kthreadd in terms of spawning
>> > user mode helpers works and is simple.
>
> Forgive me replying to such an old thread but ...
>
> After realizing workqueues can't be used to pre-create threads to run
> usermode helpers I've returned to look at this.

If someone can wind up with a good implementation I will be happy.

>> Can't we ask ->child_reaper to create the non-daemonized kernel thread
>> with the "right" ->nsproxy, ->fs, etc?
>
> Eric, do you think this approach would be sufficient too?
>
> Probably wouldn't be quite right for user namespaces but should provide
> what's needed for other cases?
>
> It certainly has the advantage of not having to maintain a plague of
> processes waiting around to execute helpers.

That certainly sounds attractive.  Especially for the case of everyone
who wants to set a core pattern in a container.

I am fuzzy on all of the details right now, but what I do remember is
that in the kernel the user mode helper's attempts to scrub a process's
environment were quite error prone until we managed to get kthreadd
(pid 2) on the scene, which always had a clean environment.

If we are going to tie this kind of thing to the pid namespace, I
recommend simply denying it if you are in a user namespace without
an appropriate pid namespace.  AKA simply not allowing things to be set up
if current->pid_ns->user_ns != current->user_ns.

That still leaves things a little hand-wavy but I hope that helps
conceptually.
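
A minimal userspace model of the rule above, assuming stand-in structs rather than the kernel's real ones: struct task, the field layout and umh_setup_allowed() are hypothetical. In the kernel the two sides of the comparison would presumably be task_active_pid_ns(current)->user_ns and current_user_ns(); the sketch only shows which combinations the rule admits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel objects (illustrative only). */
struct user_namespace {
	struct user_namespace *parent;	/* NULL for the initial user namespace */
};

struct pid_namespace {
	struct user_namespace *user_ns;	/* user namespace that owns it */
};

struct task {
	struct pid_namespace *pid_ns;	/* models task_active_pid_ns(current) */
	struct user_namespace *user_ns;	/* models current_user_ns() */
};

/*
 * Hypothetical gate for setting up a per-container user mode helper:
 * refuse unless the caller's pid namespace is owned by the caller's
 * user namespace (current->pid_ns->user_ns == current->user_ns).
 */
static bool umh_setup_allowed(const struct task *t)
{
	assert(t != NULL && t->pid_ns != NULL);
	return t->pid_ns->user_ns == t->user_ns;
}
```

With this rule a host process passes, a container init that unshared both its user and pid namespaces passes, and a process that unshared only its user namespace while staying in the host pid namespace is refused.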

Eric


* Re: call_usermodehelper in containers
@ 2016-02-18  3:17                       ` Eric W. Biederman
From: Eric W. Biederman @ 2016-02-18  3:17 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers,
	skinsbursky-5HdwGun5lf+gSpxsJD1C4w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes:

> AFAICS kernel/kmod.c used to use create_singlethread_workqueue() and
>  queue_work() to perform umh calls, now it uses only queue_work() and
> the system_unbound_wq workqueue.
>
> Looking at the workqueue sub system there doesn't appear to be a way to
> create a workqueue with a thread runner thread, created within the
> process context at the time of workqueue creation, that then waits to
> run work. So there's no way to create a workqueue to run umh calls
> within a specific process context, such as that of a container, by using
> the workqueue subsystem as it is now.
>
> The problem being that the process context of the caller requesting umh
> isn't necessarily (and shouldn't be used because it could allow the
> caller to hijack the environment) the process context that needs to be
> used for the request.
>
> It looks like the reply to this thread from Oleg that demonstrates using
> child_reaper for the run context could be used though. Capturing the
> struct pid of child_reaper and then using that to locate the appropriate
> task context later (if it still exists) at request time could be used.
>
> That doesn't take care of working out when this should be captured or
> where to put it so it can be obtained at request time (which seems
> difficult in itself).

It would be really, really nice if the user namespace could be used to
answer the "where do we look" question, as every other namespace already
has a pointer to the user namespace, and fundamentally the user
namespace is the permission boundary (from a namespace perspective).

So for the equivalent of kthreadd in a user namespace we need a thread
that has a full set of namespaces owned by that user namespace.

On one side this is very easy to obtain if we look at the process that
sets core_pattern or mounts one of the nfs filesystems (such as the
filesystem that when mounted starts nfsd), and just fork a kernel thread
from it.

On another side, perhaps what we want is a syscall, call it start_umhd,
that says: repurpose the calling thread to handle future user mode
helper calls.  That we could tie to a user namespace quite easily.

This definitely does not play particularly nicely with queue_work() and
friends, but that is just infrastructure, and we can update the user mode
helper code to use something else reasonable as long as we have a solid
design.

Perhaps there is a combination of the two ideas that could work.
Instead of a syscall use the invocation of a service that needs a user
mode helper as a trigger to create such a launcher thread.
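
A rough model of the "create the launcher on first use" idea, assuming a simple registry keyed by user namespace. struct umh_launcher, umh_launcher_get() and umh_request() are hypothetical names, there is no locking, and actually starting the helper is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a user namespace; only its identity matters here. */
struct user_namespace {
	int id;
};

/* Hypothetical per-user-namespace launcher ("umhd" in the text). */
struct umh_launcher {
	const struct user_namespace *owner;
	unsigned long launches;		/* helpers started through it */
	struct umh_launcher *next;
};

static struct umh_launcher *launchers;	/* registry; no locking */

/*
 * Return the launcher for @ns, creating it on first use.  The first
 * service invocation that needs a helper acts as the trigger, rather
 * than a dedicated syscall.
 */
static struct umh_launcher *umh_launcher_get(const struct user_namespace *ns)
{
	struct umh_launcher *l;

	assert(ns != NULL);
	for (l = launchers; l != NULL; l = l->next)
		if (l->owner == ns)
			return l;

	l = calloc(1, sizeof(*l));
	if (l == NULL)
		return NULL;
	l->owner = ns;
	l->next = launchers;
	launchers = l;
	return l;
}

/* Route one helper request to the launcher of @ns. */
static int umh_request(const struct user_namespace *ns)
{
	struct umh_launcher *l = umh_launcher_get(ns);

	if (l == NULL)
		return -1;
	l->launches++;	/* a real launcher would start the helper here */
	return 0;
}
```

Keying on the user namespace follows the point that it is the permission boundary; whether creation should instead hang off the pid namespace or the nsproxy is the open question in this thread.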

Eric


* Re: call_usermodehelper in containers
@ 2016-02-18  3:43                       ` Kamezawa Hiroyuki
From: Kamezawa Hiroyuki @ 2016-02-18  3:43 UTC (permalink / raw)
  To: Eric W. Biederman, Ian Kent
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On 2016/02/18 11:57, Eric W. Biederman wrote:
> If we are going to tie this kind of thing to the pid namespace, I
> recommend simply denying it if you are in a user namespace without
> an appropriate pid namespace.  AKA simply not allowing things to be set up
> if current->pid_ns->user_ns != current->user_ns.
> 
Can't this be handled by a simple capability like CAP_SYS_USERMODEHELPER?

The user_ns check seems not to allow a core-dump catcher in the host; it will not work if user_ns is used.

Thanks,
-Kame


* Re: call_usermodehelper in containers
@ 2016-02-18  6:36                           ` Ian Kent
From: Ian Kent @ 2016-02-18  6:36 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Eric W. Biederman
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote:
> Can't this be handled by a simple capability like CAP_SYS_USERMODEHELPER?
> 
> The user_ns check seems not to allow a core-dump catcher in the host;
> it will not work if user_ns is used.

I don't think so but I'm not sure.

The approach I was talking about assumes the init process of the caller
(say within a container, corresponding to ->child_reaper) is an
appropriate template for umh thread execution.

But I don't think that covers the case where unshare has created
different namespaces, like a mount namespace for example.

The current workqueue subsystem can't be used to pre-create a thread to
be used for umh execution, so either it needs changes or yet another
mechanism needs to be implemented.

For uses other than core dumping, capturing a reference to the struct pid
of the environment's init process and using that as an execution template
should be sufficient.  With some extra checks it takes care of
environment existence problems, not to mention eliminating the need for
a potentially huge number of kernel threads to be created to provide
execution templates.
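
A sketch of the struct pid template idea using userspace stand-ins: a refcounted struct pid that can outlive its task, so a lookup at request time can tell that the template process has gone away (one of the "extra checks"). In the kernel this would presumably use get_pid()/put_pid() and pid_task(pid, PIDTYPE_PID) under RCU; the *_model helpers and struct umh_template are hypothetical, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct task;

/* Stand-in for struct pid: refcounted, and may outlive its task. */
struct pid {
	int count;
	struct task *task;	/* NULL once the task has exited */
};

struct task {
	struct pid *pid;
};

static struct pid *get_pid_model(struct pid *p)
{
	p->count++;
	return p;
}

static void put_pid_model(struct pid *p)
{
	if (p != NULL && --p->count == 0)
		free(p);
}

/* Create a task together with its pid (the task holds one reference). */
static struct task *task_create_model(void)
{
	struct task *t = malloc(sizeof(*t));
	struct pid *p = calloc(1, sizeof(*p));

	if (t == NULL || p == NULL) {
		free(t);
		free(p);
		return NULL;
	}
	p->count = 1;
	p->task = t;
	t->pid = p;
	return t;
}

/* Task exit: detach it from its pid and drop the task's reference. */
static void task_exit_model(struct task *t)
{
	t->pid->task = NULL;
	put_pid_model(t->pid);
	t->pid = NULL;
}

/* Hypothetical per-container execution template: init's struct pid. */
struct umh_template {
	struct pid *init_pid;
};

static void umh_template_capture(struct umh_template *tmpl, struct task *init)
{
	assert(init != NULL && init->pid != NULL);
	tmpl->init_pid = get_pid_model(init->pid);
}

/* At request time: the template task if it still exists, else NULL. */
static struct task *umh_template_lookup(const struct umh_template *tmpl)
{
	return tmpl->init_pid != NULL ? tmpl->init_pid->task : NULL;
}

static void umh_template_release(struct umh_template *tmpl)
{
	put_pid_model(tmpl->init_pid);
	tmpl->init_pid = NULL;
}
```

Holding only a pid reference keeps the cost to one small object per container instead of one waiting kernel thread; the trade-off is that the request path must handle a vanished init.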

Where to store this and how to access it when needed is another problem.

Not sure a usermode helper capability is the right thing either as I
thought one important use of user namespaces was to allow unprivileged
users to perform operations they otherwise can't.

Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible ....

Still an appropriate execution template would be needed and IIUC we
can't trust getting that from within a user created namespace
environment.



* Re: call_usermodehelper in containers
@ 2016-02-18  7:37                             ` Ian Kent
From: Ian Kent @ 2016-02-18  7:37 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Eric W. Biederman
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote:
> Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible ....
> 
> Still an appropriate execution template would be needed and IIUC we
> can't trust getting that from within a user created namespace
> environment.

Perhaps, if a struct cred could be captured at some appropriate time,
it could be used to cater for user namespaces.

Eric, do you think that would be possible to do without allowing users
to circumvent security?



* Re: call_usermodehelper in containers
@ 2016-02-18 20:45                               ` Eric W. Biederman
From: Eric W. Biederman @ 2016-02-18 20:45 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, devel-GEFAQzZX7r8dnm+yROfE0A,
	bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes:

> On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote:
>> On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote:
>> > Can't this be handled by a simple capability like CAP_SYS_USERMODEHELPER?

I wasn't talking about a capability; I was talking about how to identify
where the user mode helper lives.

>> > The user_ns check seems not to allow a core-dump catcher in the host;
>> > it will not work if user_ns is used.

The bottom line is that all of this approaches nonsense if user namespaces
are not used.  If you just have a pid namespace or a mount namespace (or
perhaps both) and you fire off a newfangled user mode helper, you get a
deep problem.  The user space process started to handle your core dump or
your nfs callback will have a full set of capabilities (because it is
still in the root user namespace).  With a full set of capabilities
and perhaps a little luck, there is no containment.

The imperfect solution that currently exists for the core dump helper
is to provide enough information to the user space application that
it can query and find out the context of the core dumping application
and keep everything in that application's sandbox if it so desires.
I expect something similar could be done for other user mode helper
style callbacks.
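
For reference, what that looks like from the user space side: a core_pattern pipe helper (for example |/path/to/helper %P, where %P is the dumping process's PID as seen in the initial PID namespace) can compare that process's namespaces with its own. The comparison follows namespaces(7): two /proc/[pid]/ns/* links with the same st_dev and st_ino refer to the same namespace. The helper path and function names are illustrative, and what the helper does next (for example setns()) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Compare namespace @ns ("user", "pid", "mnt", ...) of process @pid with
 * the caller's: same st_dev and st_ino means the same namespace.
 * Returns 1 if shared, 0 if not, -1 if either link can't be read.
 */
static int same_namespace(pid_t pid, const char *ns)
{
	char path[64];
	struct stat theirs, ours;

	assert(ns != NULL);
	snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, ns);
	if (stat(path, &theirs) != 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/self/ns/%s", ns);
	if (stat(path, &ours) != 0)
		return -1;
	return theirs.st_dev == ours.st_dev && theirs.st_ino == ours.st_ino;
}

/* 1 if @pid is in a different user, pid or mount namespace; -1 on error. */
static int dumped_from_container(pid_t pid)
{
	static const char *const kinds[] = { "user", "pid", "mnt" };
	size_t i;

	for (i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
		int r = same_namespace(pid, kinds[i]);

		if (r < 0)
			return -1;
		if (r == 0)
			return 1;
	}
	return 0;
}
```

Note that the helper itself still runs with full capabilities in the initial namespaces; staying inside the dumping process's sandbox is entirely its own choice, which is the "if it so desires" above.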

Starting the user space application other than how we do today
needs a good argument that you can allow a less privileged
process to set things up and that it cannot be exploited to gain privilege.

> Perhaps, if a struct cred could be captured at some appropriate time,
> it could be used to cater for user namespaces.
>
> Eric, do you think that would be possible to do without allowing users
> to circumvent security?

The general problem with capturing less than a full process is that
we always mess it up and forget to capture something important.

In a lot of ways this is a very similar problem to setting up an at job
or a cron job.  You build a script, you test it, then you tell at to run
it at a certain time and it fails, because your working environment did
not include something important that was in your actual environment.

Unfortunately in this case the failures we are talking about are
container escapes and privilege escalation, so we do need to tread
carefully.

We might be able to safely define the context as the context of the
currently running init process (which we can identify with a struct
pid).  Justifying that looks a little trickier but doable.

After a mechanism is picked it simply becomes a case of making certain
your permission checks for starting something are in sync with your
mechanism.

Personally I am a fan of the "don't be clever and capture a kernel thread"
approach, as it is very easy to see what, if any, exploitation
opportunities there are.  The justification for something more clever
is trickier.  Of course we do something that from this perspective would
be considered ``clever'' today with kthreadd and user mode helpers.

Eric


* Re: call_usermodehelper in containers
@ 2016-02-19  3:08                                   ` Kamezawa Hiroyuki
From: Kamezawa Hiroyuki @ 2016-02-19  3:08 UTC (permalink / raw)
  To: Eric W. Biederman, Ian Kent
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On 2016/02/19 5:45, Eric W. Biederman wrote: 
> Personally I am a fan of the "don't be clever and capture a kernel thread"
> approach, as it is very easy to see what, if any, exploitation
> opportunities there are.  The justification for something more clever
> is trickier.  Of course we do something that from this perspective would
> be considered ``clever'' today with kthreadd and user mode helpers.
> 

I read the old discussion... Let me try to clarify how to create a helper
kernel thread that runs usermodehelper, using kthreadd.

0) define a trigger to create an independent usermodehelper environment for a container.
   Option A) when some namespace is created (pid, uid, etc.)
   Option B) when a new nsproxy is created
   Option C) when a new system call or sysctl, make_private_usermode_helper() or similar, is invoked
  
  This is expected to be triggered by the init process of a container that holds some capability.
  The scope of the effect also needs to be defined: pid namespace? nsproxy? Or a new namespace?

1) create a helper thread.
   task = kthread_create(kthread_worker_fn, ?, ?, "usermodehelper")
   switch task's nsproxy to current (switch_task_namespaces())
   switch task's cgroups to current (cgroup_attach_task_all())
   switch task's cred to current.
   copy task's capabilities from current
   (and anything else?)
   wake_up_process()
   
   And create a link between kthread_wq and the container.

2) modify call_usermodehelper() to use kthread_worker
....

It seems the problem is which object a container-private user mode helper should be tied to.
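
Step 2 (routing helper requests through a dedicated worker) can be modelled in userspace with pthreads: one thread created up front, a FIFO of work items, and a thread body that runs each item on that thread. This is only an analogue of kthread_worker/kthread_work; struct worker, worker_start(), worker_queue(), worker_stop() and struct umh_request are hypothetical, and the context setup from step 1 (switch_task_namespaces(), cgroup_attach_task_all(), creds) is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace analogue of a kthread_work item. */
struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

/* Userspace analogue of a kthread_worker: one thread, one FIFO queue. */
struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work *head, *tail;
	bool stopping;
	pthread_t thread;
};

/* Thread body: run queued items in order until stopped and drained. */
static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (w->head == NULL && !w->stopping)
			pthread_cond_wait(&w->cond, &w->lock);
		if (w->head == NULL)
			break;			/* stopping and queue empty */
		struct work *item = w->head;
		w->head = item->next;
		if (w->head == NULL)
			w->tail = NULL;
		pthread_mutex_unlock(&w->lock);
		item->fn(item);			/* runs on the worker thread */
		pthread_mutex_lock(&w->lock);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int worker_start(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->head = w->tail = NULL;
	w->stopping = false;
	return pthread_create(&w->thread, NULL, worker_fn, w);
}

static void worker_queue(struct worker *w, struct work *item)
{
	assert(item != NULL && item->fn != NULL);
	pthread_mutex_lock(&w->lock);
	item->next = NULL;
	if (w->tail != NULL)
		w->tail->next = item;
	else
		w->head = item;
	w->tail = item;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Let the worker drain its queue, then join it and free the primitives. */
static void worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stopping = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->thread, NULL);
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->lock);
}

/* Hypothetical helper request; struct work must be the first member. */
struct umh_request {
	struct work work;
	const char *path;
	pthread_t ran_on;
	bool done;
};

static void umh_request_fn(struct work *w)
{
	struct umh_request *req = (struct umh_request *)w;

	req->ran_on = pthread_self();
	req->done = true;	/* a real helper would exec req->path here */
}
```

Every request runs on the one pre-created thread, so whatever context that thread was given at creation is what the helpers inherit; the open question of which object owns the worker is unchanged.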

Regards,
-Kame


* Re: call_usermodehelper in containers
@ 2016-02-19  5:14                                   ` Ian Kent
From: Ian Kent @ 2016-02-19  5:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, devel-GEFAQzZX7r8dnm+yROfE0A,
	bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Thu, 2016-02-18 at 14:45 -0600, Eric W. Biederman wrote:
> Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes:
> 
> > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote:
> > > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote:
> > > > On 2016/02/18 11:57, Eric W. Biederman wrote:
> > > > > 
> > > > > Ccing The containers list because a related discussion is
> > > > > happening
> > > > > there
> > > > > and somehow this thread has never made it there.
> > > > > 
> > > > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes:
> > > > > 
> > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote:
> > > > > > > On 11/15, Eric W. Biederman wrote:
> > > > > > > > 
> > > > > > > > I don't understand that one.  Having a preforked thread
> > > > > > > > with
> > > > > > > > the
> > > > > > > > proper
> > > > > > > > environment that can act like kthreadd in terms of
> > > > > > > > spawning
> > > > > > > > user
> > > > > > > > mode
> > > > > > > > helpers works and is simple.
> > > > > > 
> > > > > > Forgive me replying to such an old thread but ...
> > > > > > 
> > > > > > After realizing workqueues can't be used to pre-create
> > > > > > threads
> > > > > > to
> > > > > > run
> > > > > > usermode helpers I've returned to look at this.
> > > > > 
> > > > > If someone can wind up with a good implementation I will be
> > > > > happy.
> > > > > 
> > > > > > > Can't we ask ->child_reaper to create the non-daemonized
> > > > > > > kernel
> > > > > > > thread
> > > > > > > with the "right" ->nsproxy, ->fs, etc?
> > > > > > 
> > > > > > Eric, do you think this approach would be sufficient too?
> > > > > > 
> > > > > > Probably wouldn't be quite right for user namespaces but
> > > > > > should
> > > > > > provide
> > > > > > what's needed for other cases?
> > > > > > 
> > > > > > It certainly has the advantage of not having to maintain a
> > > > > > plague
> > > > > > of
> > > > > > processes waiting around to execute helpers.
> > > > > 
> > > > > That certainly sounds attractive.  Especially for the case of
> > > > > everyone
> > > > > who wants to set a core pattern in a container.
> > > > > 
> > > > > I am fuzzy on all of the details right now, but what I do remember
> > > > > is that in the kernel, the user mode helper approaches that
> > > > > attempted to scrub a process's environment were quite error prone
> > > > > until we managed to get kthreadd (pid 2) on the scene, which always
> > > > > had a clean environment.
> > > > > 
> > > > > If we are going to tie this kind of thing to the pid namespace I
> > > > > recommend simply denying it if you are in a user namespace without
> > > > > an appropriate pid namespace.  AKA simply not allowing things to be
> > > > > set up if current->pid_ns->user_ns != current->user_ns.
> > > > > 
> > > > Can't this be handled by a simple capability like
> > > > CAP_SYS_USERMODEHELPER?
> 
> I wasn't talking about a capability; I was talking about how to
> identify where the user mode helper lives.
> 
> > > > It seems the user_ns check would mean a core dump catcher in the
> > > > host will not work if user_ns is used.
> 
> The bottom line is all of this approaches nonsense if user namespaces
> are not used.  If you just have a pid namespace or a mount namespace
> (or perhaps both) and you fire off a newfangled user mode helper you
> get a deep problem.  The user space process started to handle your
> core dump or your nfs callback will have a full set of capabilities
> (because it is still in the root user namespace).  With a full set of
> capabilities and perhaps a little luck there is no containment.
> 
> The imperfect solution that currently exists for the core dump helper
> is to provide enough information to the user space application that
> it can query and find out the context of the core dumping application
> and keep everything in that application's sandbox if it so desires.
> I expect something similar could be done for other user mode helper
> style callbacks.
> 
> Starting the user space application in any way other than how we do
> today needs a good argument that you can allow a less privileged
> process to set things up and that it cannot be exploited to gain
> privilege.
> 
> > > I don't think so but I'm not sure.
> > > 
> > > The approach I was talking about assumes the init process of the
> > > caller (say within a container, corresponding to ->child_reaper) is an
> > > appropriate template for umh thread execution.
> > > 
> > > But I don't think that covers the case where unshare has created
> > > different namespaces, like a mount namespace for example.
> > > 
> > > The current workqueue subsystem can't be used to pre-create a thread
> > > to be used for umh execution so either it needs changes or yet
> > > another mechanism needs to be implemented.
> > > 
> > > For uses other than core dumping, capturing a reference to the struct
> > > pid of the environment init process and using that as an execution
> > > template should be sufficient and takes care of environment existence
> > > problems with some extra checks, not to mention eliminating the need
> > > for a potentially huge number of kernel threads needing to be created
> > > to provide execution templates.
> > > 
> > > Where to store this and how to access it when needed is another problem.
> > > 
> > > Not sure a usermode helper capability is the right thing either, as I
> > > thought one important use of user namespaces was to allow unprivileged
> > > users to perform operations they otherwise can't.
> > > 
> > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible ....
> > > 
> > > Still, an appropriate execution template would be needed and IIUC we
> > > can't trust getting that from within a user-created namespace
> > > environment.
> > 
> > Perhaps if a struct cred could be captured at some appropriate time,
> > that could be used to cater for user namespaces.
> > 
> > Eric, do you think that would be possible to do without allowing users
> > to circumvent security?
> 
> The general problem with capturing less than a full process is that
> we always mess it up and forget to capture something important.
> 
> In a lot of ways this is a very similar problem to setting up an at job
> or a cron job.  You build a script, you test it, then you tell at to run
> it at a certain time and it fails, because your working environment did
> not include something important that was in your actual environment.
> 
> Unfortunately in this case the failures we are talking about are
> container escapes and privilege escalation, so we do need to tread
> carefully.
> 
> We might be able to safely define the context as the context of the
> currently running init process (which we can identify with a struct
> pid).  Justifying that looks a little trickier but doable.

Right, that seems like a fairly straightforward thing to implement
based on Oleg's example patch.

I'll put together a series based on that approach.

Keep in mind that the patches in my previous posts for sub-system usage
are definitely wrong but I can use them (and they will be only an
initial example of how to use the mechanism) to verify that contained
execution happens. They will need to change.

I was thinking that also capturing a struct cred (although I need to
look more at the relationship between the process cred and the nsproxy
locations) at a particular time combined with a double fork and exec
could allow inclusion of user namespaces.

Perhaps at only one level deep, i.e. only allowing the first user
namespace created from init or from a container and not user namespaces
created from within a user namespace (if I can work out how to identify
that case).

Again when these are captured and how to get at them when needed is
going to be a challenge.
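The two setup restrictions discussed in this exchange can be folded into one predicate. The first is the quoted suggestion to refuse setup when current->pid_ns->user_ns != current->user_ns. The second is the idea of allowing only one level of user namespace. The following is a user-space model only. model_user_ns, model_pid_ns, model_task and umh_setup_allowed() are hypothetical names, and a kernel version would need the kernel's own task and namespace accessors instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for kernel objects; these are not the
 * kernel's struct layouts. */
struct model_user_ns {
    const struct model_user_ns *parent; /* NULL for the initial user ns */
    unsigned int level;                 /* 0 for the initial user ns */
};

struct model_pid_ns {
    const struct model_user_ns *user_ns; /* user ns that owns this pid ns */
};

struct model_task {
    const struct model_user_ns *user_ns;
    const struct model_pid_ns *pid_ns;
};

/* Hypothetical check: may 'caller' set up a per-container helper? */
static bool umh_setup_allowed(const struct model_task *caller)
{
    /* Rule 1: refuse if the caller's pid namespace is not owned by the
     * caller's user namespace. */
    if (caller->pid_ns->user_ns != caller->user_ns)
        return false;

    /* Rule 2: allow the initial user namespace or one created directly
     * from it, but not user namespaces nested inside another one. */
    return caller->user_ns->level <= 1;
}
```

Whether `level <= 1` is the right cut-off is the open question in the message above. The sketch only shows that both rules are cheap to evaluate at setup time.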

> 
> After a mechanism is picked it simply becomes a case of making certain
> your permission checks for starting something are in sync with your
> mechanism.

Hopefully yourself and others can help with that, ;)

> 
> Personally I am a fan of the "don't be clever and capture a kernel
> thread" approach as it is very easy to see what, if any, exploitation
> opportunities there are.  The justifications for something more clever
> are trickier.  Of course we do something that from this perspective
> would be considered ``clever'' today with kthreadd and user mode helpers.

Indeed, a good policy, but it seems the choice of the init process
context (of a given container) is fairly straightforward and much of
the tricky stuff and a good measure of checks may already be done in
thread creation and exec code.

As you have pointed out before, this is a very difficult problem to deal
with .....

Ian


* Re: call_usermodehelper in containers
       [not found]                                     ` <56C68714.2000900-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2016-02-19  5:37                                       ` Ian Kent
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Kent @ 2016-02-19  5:37 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Eric W. Biederman
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote:
> On 2016/02/19 5:45, Eric W. Biederman wrote: 
> > Personally I am a fan of the "don't be clever and capture a kernel
> > thread" approach as it is very easy to see what, if any, exploitation
> > opportunities there are.  The justifications for something more
> > clever are trickier.  Of course we do something that from this
> > perspective would be considered ``clever'' today with kthreadd and
> > user mode helpers.
> > 
> 
> I read the old discussion... let me clarify how to create a helper
> kernel thread to run usermodehelper using kthreadd.
> 
> 0) define a trigger to create an independent usermodehelper
> environment for a container.
>    Option A) when some namespace is created (pid, uid, etc...)
>    Option B) when a new nsproxy is created
>    Option C) when a new system call or some sysctl is called,
>              make_private_usermode_helper() or similar
>
>    It is expected that this should be triggered by the init process of
>    a container with some capability, and the scope of the effect should
>    be defined: pid namespace? nsproxy? or a new namespace?
> 
> 1) create a helper thread.
>    task = kthread_create(kthread_worker_fn, ?, ?, "usermodehelper")
>    switch task's nsproxy to current's (switch_task_namespaces())
>    switch task's cgroups to current's (cgroup_attach_task_all())
>    switch task's cred to current's.
>    copy task's capabilities from current
>    (and anything else?)
>    wake_up_process()
>
>    And create a link between kthread_wq and the container.

Not sure I quite understand this but I thought the difficulty with this
approach previously (even though the approach was very much incomplete)
was knowing that all the "moving parts" would not allow vulnerabilities.

And it looks like this would require a kernel thread for each instance.
So for a thousand containers that each mount an NFS mount that means, at
least, 1000 additional kernel threads. Might be able to sell that, if we
were lucky, but from a system administration POV it's horrible.

There's also the question of existence (aka. lifetime) to deal with
since the thread above needs to be created at a time other than the
usermode helper callback.

What happens for SIGKILL on a container?

> 2) modify call_usermodehelper() to use kthread_worker
> ....
> 
> It seems the problem is which object a container-private user mode
> helper should be tied to.
> 
> Regards,
> -Kame


* Re: call_usermodehelper in containers
       [not found]                                       ` <1455860260.3356.31.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
@ 2016-02-19  9:30                                         ` Kamezawa Hiroyuki
       [not found]                                           ` <56C6E0A8.3010806-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Kamezawa Hiroyuki @ 2016-02-19  9:30 UTC (permalink / raw)
  To: Ian Kent, Eric W. Biederman
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On 2016/02/19 14:37, Ian Kent wrote:
> On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote:
>> On 2016/02/19 5:45, Eric W. Biederman wrote:
>>> Personally I am a fan of the "don't be clever and capture a kernel
>>> thread" approach as it is very easy to see what, if any, exploitation
>>> opportunities there are.  The justifications for something more
>>> clever are trickier.  Of course we do something that from this
>>> perspective would be considered ``clever'' today with kthreadd and
>>> user mode helpers.
>>>
>>
>> I read the old discussion... let me clarify how to create a helper
>> kernel thread to run usermodehelper using kthreadd.
>>
>> 0) define a trigger to create an independent usermodehelper
>> environment for a container.
>>     Option A) when some namespace is created (pid, uid, etc...)
>>     Option B) when a new nsproxy is created
>>     Option C) when a new system call or some sysctl is called,
>>               make_private_usermode_helper() or similar
>>
>>    It is expected that this should be triggered by the init process of
>>    a container with some capability, and the scope of the effect should
>>    be defined: pid namespace? nsproxy? or a new namespace?
>>
>> 1) create a helper thread.
>>     task = kthread_create(kthread_worker_fn, ?, ?, "usermodehelper")
>>     switch task's nsproxy to current's (switch_task_namespaces())
>>     switch task's cgroups to current's (cgroup_attach_task_all())
>>     switch task's cred to current's.
>>     copy task's capabilities from current
>>     (and anything else?)
>>     wake_up_process()
>>
>>     And create a link between kthread_wq and the container.
>
> Not sure I quite understand this but I thought the difficulty with this
> approach previously (even though the approach was very much incomplete)
> was knowing that all the "moving parts" would not allow vulnerabilities.
>
Ok, that was discussed.

> And it looks like this would require a kernel thread for each instance.
> So for a thousand containers that each mount an NFS mount that means, at
> least, 1000 additional kernel threads. Might be able to sell that, if we
> were lucky, but from a system administration POV it's horrible.
>
I agree.

> There's also the question of existence (aka. lifetime) to deal with
> since the thread above needs to be created at a time other than the
> usermode helper callback.
>
> What happens for SIGKILL on a container?
>
It depends on how the helper kthread is tied to a container-related object.
If the kthread is linked with some namespace, we can kill it when that
namespace goes away.

So, in your opinion,
  - a helper thread should be spawned on demand
  - its lifetime should be clear. It would be good for it to have the same lifetime as the container.

I wonder whether there is any solution to the "moving parts" problem other than calling
do_fork() or copy_process() with the container's init process context, if we do it all in the kernel.
Is that possible?

Thanks,
-Kame
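Kame's suggestion combines two ideas. The helper is spawned on demand, and a helper linked to a namespace is torn down when that namespace goes away. Together these amount to a reference-counting rule. Kernel namespaces are already reference counted (for example with get_pid_ns() and put_pid_ns()). The following is only a user-space model of the rule. model_ns, helper_ctx, live_helpers and all of the functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
static int live_helpers; /* number of helper contexts currently alive */

struct helper_ctx {
    int placeholder; /* stands in for whatever a real helper would hold */
};

struct model_ns {
    int refcount;
    struct helper_ctx *helper; /* NULL until a helper is first needed */
};

static struct model_ns *model_ns_create(void)
{
    struct model_ns *ns = calloc(1, sizeof(*ns));
    if (ns)
        ns->refcount = 1;
    return ns;
}

static void model_ns_get(struct model_ns *ns)
{
    ns->refcount++;
}

/* Spawn the helper on demand, the first time a helper call needs it. */
static struct helper_ctx *model_ns_helper(struct model_ns *ns)
{
    if (!ns->helper) {
        ns->helper = calloc(1, sizeof(*ns->helper));
        if (ns->helper)
            live_helpers++;
    }
    return ns->helper;
}

static void helper_stop(struct helper_ctx *h)
{
    /* A kernel implementation would stop its thread here (for example with
     * kthread_stop()); the model only releases the context. */
    free(h);
    live_helpers--;
}

/* Dropping the last reference stops the helper before the namespace is
 * freed, so the helper cannot outlive it. Returns true if ns was freed. */
static bool model_ns_put(struct model_ns *ns)
{
    if (--ns->refcount > 0)
        return false;
    if (ns->helper)
        helper_stop(ns->helper);
    free(ns);
    return true;
}
```

In this model, teardown hangs off the final put rather than off a signal. That sidesteps the SIGKILL question for the model only. Whether that ordering is safe inside the kernel is still the open issue.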


* Re: call_usermodehelper in containers
       [not found]                                           ` <56C6E0A8.3010806-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
@ 2016-02-20  3:28                                             ` Ian Kent
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Kent @ 2016-02-20  3:28 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Eric W. Biederman
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	devel-GEFAQzZX7r8dnm+yROfE0A

On Fri, 2016-02-19 at 18:30 +0900, Kamezawa Hiroyuki wrote:
> On 2016/02/19 14:37, Ian Kent wrote:
> > On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote:
> > > On 2016/02/19 5:45, Eric W. Biederman wrote:
> > > > Personally I am a fan of the "don't be clever and capture a
> > > > kernel thread" approach as it is very easy to see what, if any,
> > > > exploitation opportunities there are.  The justifications for
> > > > something more clever are trickier.  Of course we do something
> > > > that from this perspective would be considered ``clever'' today
> > > > with kthreadd and user mode helpers.
> > > > 
> > > 
> > > I read the old discussion... let me clarify how to create a helper
> > > kernel thread to run usermodehelper using kthreadd.
> > > 
> > > 0) define a trigger to create an independent usermodehelper
> > > environment for a container.
> > >     Option A) when some namespace is created (pid, uid, etc...)
> > >     Option B) when a new nsproxy is created
> > >     Option C) when a new system call or some sysctl is called,
> > >               make_private_usermode_helper() or similar
> > > 
> > >    It is expected that this should be triggered by the init process
> > >    of a container with some capability, and the scope of the effect
> > >    should be defined: pid namespace? nsproxy? or a new namespace?
> > > 
> > > 1) create a helper thread.
> > >     task = kthread_create(kthread_worker_fn, ?, ?, "usermodehelper")
> > >     switch task's nsproxy to current's (switch_task_namespaces())
> > >     switch task's cgroups to current's (cgroup_attach_task_all())
> > >     switch task's cred to current's.
> > >     copy task's capabilities from current
> > >     (and anything else?)
> > >     wake_up_process()
> > > 
> > >     And create a link between kthread_wq and the container.
> > 
> > Not sure I quite understand this but I thought the difficulty with
> > this approach previously (even though the approach was very much
> > incomplete) was knowing that all the "moving parts" would not allow
> > vulnerabilities.
> > 
> Ok, that was discussed.
> 
> > And it looks like this would require a kernel thread for each
> > instance.  So for a thousand containers that each mount an NFS mount
> > that means, at least, 1000 additional kernel threads. Might be able to
> > sell that, if we were lucky, but from a system administration POV
> > it's horrible.
> > 
> I agree.
> 
> > There's also the question of existence (aka. lifetime) to deal with
> > since the thread above needs to be created at a time other than the
> > usermode helper callback.
> > 
> > What happens for SIGKILL on a container?
> > 

First understand that the fork and workqueue code is not something I've
needed to look at in the past so it's still quite new to me even now.

> It depends on how the helper kthread is tied to a container-related
> object.  If the kthread is linked with some namespace, we can kill it
> when that namespace goes away.

I don't know how to do that so without knowing any better I assume it
could be difficult and complicated but, of course, I don't know.

> 
> So, in your opinion,
>   - a helper thread should be spawned on demand
>   - its lifetime should be clear. It would be good for it to have the
>     same lifetime as the container.

This was always what I believed to be the best way to do it but ...

Not sure you've seen the other threads on this by me so let me provide
some history.

I started out posting a series (totally untested, an RFC only) in the
hope of finding a way to do this.

A few iterations led to the conclusion that a kernel thread would need
to be created to provide context for subsequent helper execution (for
every distinct context), much the same as we have here, and that the
init process of the required context would probably be sufficient for
this. That is required because the environment of the thread requesting
helper execution could itself be used to subvert execution.

I ended up accepting that, even if I could work out what needed to be
captured and what needed to be done to switch to the namespace(s) and
other bits, that would be high maintenance, as it would be fairly
complicated and subsystems may be added or changed over time.

Also, I had assumed a singlethread workqueue would create a single thread
for helper execution, which was wrong.

After realizing what I had was far from what's needed I went back and
started reviewing the previous threads.

That led me to follow a link Oleg had posted to this thread, where I
finally saw his suggestion about using ->child_reaper as the execution
template.

That really got my attention because of its simplicity and that's why I
want to give that a try now and see where it leads. However user
namespaces do sound like a problem even with this.
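Using ->child_reaper as the template relies on the existence check discussed earlier in the thread. The container init's struct pid is captured when the helper is set up, and at call time the code checks that a task is still attached to it. In the kernel, get_pid(), put_pid() and pid_task() fill this role. The model below only mimics that pattern in user space. model_pid, umh_template and the umh_template_*() functions are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a refcounted pid handle. It mimics the idea behind
 * get_pid()/put_pid()/pid_task() but is not the kernel API. */
struct model_pid {
    int refcount;
    bool has_task; /* false once the process behind this pid has exited */
};

struct umh_template {
    struct model_pid *init_pid; /* pinned reference to the container init */
};

static struct model_pid *model_get_pid(struct model_pid *p)
{
    if (p)
        p->refcount++;
    return p;
}

static void model_put_pid(struct model_pid *p)
{
    if (p)
        p->refcount--;
}

/* Setup time: pin the pid of the pid namespace's child reaper. */
static void umh_template_capture(struct umh_template *t,
                                 struct model_pid *child_reaper)
{
    t->init_pid = model_get_pid(child_reaper);
}

/* Call time: the pinned pid keeps the handle valid, but the process may be
 * gone, so check before using it as an execution template. */
static bool umh_template_usable(const struct umh_template *t)
{
    return t->init_pid != NULL && t->init_pid->has_task;
}

static void umh_template_release(struct umh_template *t)
{
    model_put_pid(t->init_pid);
    t->init_pid = NULL;
}
```

Holding a pid reference rather than a task reference means a dead init is detected at call time instead of pinning the exited process.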

Having finally got a simple test scenario, I see now that the places I
use to capture the information used to run the helper are also wrong, but
that's less important than getting an execution method that works, is
safe, and is as simple as it can be.

> 
> I wonder whether there is any solution to the "moving parts" problem
> other than calling do_fork() or copy_process() with the container's
> init process context, if we do it all in the kernel.

Not sure I understand this but I believe that ultimately there will be
the equivalent of a fork (perhaps two) and exec (we need to exec the
helper anyway) no matter how this is done.

For example, IIUC, a fork must be done to change pid namespace but a
template like the container init process would already have that pid
namespace in cases other than possibly user namespaces.
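The point that a fork is needed to change pid namespace can also be seen from user space. setns(2) with CLONE_NEWPID does not move the caller; it only changes the pid namespace in which the caller's later children are created. The sketch below shows the resulting "double fork and exec" shape. It assumes a launcher with CAP_SYS_ADMIN and a template process identified by its pid. run_in_pidns_of() is a hypothetical name. Only the pid namespace is joined, so the mount and user namespaces, cgroups and credentials (the "moving parts") are not handled.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical launcher (not a kernel or libc API). Runs 'path' with 'argv'
 * as a grandchild living in the pid namespace of 'template_pid'. Returns the
 * helper's exit status, 127 if the intermediate child could not join the
 * namespace or start the helper, or -1 if the namespace file could not be
 * opened or the first fork/wait failed. */
int run_in_pidns_of(pid_t template_pid, const char *path, char *const argv[])
{
    char ns_path[64];
    int status;

    snprintf(ns_path, sizeof(ns_path), "/proc/%d/ns/pid", (int)template_pid);
    int fd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    pid_t mid = fork();
    if (mid < 0) {
        close(fd);
        return -1;
    }
    if (mid == 0) {
        /* Intermediate child: setns() with CLONE_NEWPID leaves this process
         * in its old pid namespace and only affects children it creates. */
        if (setns(fd, CLONE_NEWPID) < 0)
            _exit(127);
        close(fd);

        pid_t helper = fork(); /* first process actually in the target ns */
        if (helper < 0)
            _exit(127);
        if (helper == 0) {
            execv(path, argv);
            _exit(127); /* only reached if execv() failed */
        }

        int hs;
        if (waitpid(helper, &hs, 0) < 0 || !WIFEXITED(hs))
            _exit(127);
        _exit(WEXITSTATUS(hs));
    }

    close(fd);
    if (waitpid(mid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The intermediate child keeps the caller's own setting for future children unchanged. With a template pid that does not exist, the function returns -1 because /proc/<pid>/ns/pid cannot be opened.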

I hope I understood what you were asking and haven't needlessly rambled
on,  ;)

Ian


* Re: call_usermodehelper in containers
       [not found]                                     ` <1455858850.3356.19.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
@ 2016-02-23  2:55                                       ` Ian Kent
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Kent @ 2016-02-23  2:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, devel-GEFAQzZX7r8dnm+yROfE0A,
	bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Fri, 2016-02-19 at 13:14 +0800, Ian Kent wrote:
> On Thu, 2016-02-18 at 14:45 -0600, Eric W. Biederman wrote:
> > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes:
> > 
> > > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote:
> > > > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote:
> > > > > On 2016/02/18 11:57, Eric W. Biederman wrote:
> > > > > > 
> > > > > > Ccing the containers list because a related discussion is
> > > > > > happening there and somehow this thread has never made it there.
> > > > > > 
> > > > > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes:
> > > > > > 
> > > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote:
> > > > > > > > On 11/15, Eric W. Biederman wrote:
> > > > > > > > > 
> > > > > > > > > I don't understand that one.  Having a preforked thread
> > > > > > > > > with the proper environment that can act like kthreadd in
> > > > > > > > > terms of spawning user mode helpers works and is simple.
> > > > > > > 
> > > > > > > Forgive me replying to such an old thread but ...
> > > > > > > 
> > > > > > > After realizing workqueues can't be used to pre-create threads
> > > > > > > to run usermode helpers I've returned to look at this.
> > > > > > 
> > > > > > If someone can wind up with a good implementation I will be
> > > > > > happy.
> > > > > > 
> > > > > > > > Can't we ask ->child_reaper to create the non-daemonized
> > > > > > > > kernel thread with the "right" ->nsproxy, ->fs, etc?
> > > > > > > 
> > > > > > > Eric, do you think this approach would be sufficient too?
> > > > > > > 
> > > > > > > Probably wouldn't be quite right for user namespaces but
> > > > > > > should provide what's needed for other cases?
> > > > > > > 
> > > > > > > It certainly has the advantage of not having to maintain a
> > > > > > > plague of processes waiting around to execute helpers.
> > > > > > 
> > > > > > That certainly sounds attractive.  Especially for the case of
> > > > > > everyone who wants to set a core pattern in a container.
> > > > > > 
> > > > > > I am fuzzy on all of the details right now, but what I do
> > > > > > remember is that in the kernel, the user mode helper approaches
> > > > > > that attempted to scrub a process's environment were quite error
> > > > > > prone until we managed to get kthreadd (pid 2) on the scene,
> > > > > > which always had a clean environment.
> > > > > > 
> > > > > > If we are going to tie this kind of thing to the pid namespace
> > > > > > I recommend simply denying it if you are in a user namespace
> > > > > > without an appropriate pid namespace.  AKA simply not allowing
> > > > > > things to be set up if current->pid_ns->user_ns !=
> > > > > > current->user_ns.
> > > > > > 
> > > > > Can't this be handled by a simple capability like
> > > > > CAP_SYS_USERMODEHELPER?
> > 
> > I wasn't talking about a capability; I was talking about how to
> > identify where the user mode helper lives.
> > 
> > > > > It seems the user_ns check would mean a core dump catcher in the
> > > > > host will not work if user_ns is used.
> > 
> > The bottom line is all of this approaches nonsense if user
> > namespaces are not used.  If you just have a pid namespace or a mount
> > namespace (or perhaps both) and you fire off a newfangled user mode
> > helper you get a deep problem.  The user space process started to
> > handle your core dump or your nfs callback will have a full set of
> > capabilities (because it is still in the root user namespace).  With
> > a full set of capabilities and perhaps a little luck there is no
> > containment.
> > 
> > The imperfect solution that currently exists for the core dump
> > helper is to provide enough information to the user space application
> > that it can query and find out the context of the core dumping
> > application and keep everything in that application's sandbox if it
> > so desires.  I expect something similar could be done for other user
> > mode helper style callbacks.
> > 
> > Starting the user space application in any way other than how we do
> > today needs a good argument that you can allow a less privileged
> > process to set things up and that it cannot be exploited to gain
> > privilege.
> > 
> > > > I don't think so but I'm not sure.
> > > > 
> > > > The approach I was talking about assumes the init process of the
> > > > caller (say within a container, corresponding to ->child_reaper)
> > > > is an appropriate template for umh thread execution.
> > > > 
> > > > But I don't think that covers the case where unshare has created
> > > > different namespaces, like a mount namespace for example.
> > > > 
> > > > The current workqueue subsystem can't be used to pre-create a
> > > > thread to be used for umh execution so either it needs changes or
> > > > yet another mechanism needs to be implemented.
> > > > 
> > > > For uses other than core dumping, capturing a reference to the
> > > > struct pid of the environment init process and using that as an
> > > > execution template should be sufficient and takes care of
> > > > environment existence problems with some extra checks, not to
> > > > mention eliminating the need for a potentially huge number of
> > > > kernel threads needing to be created to provide execution
> > > > templates.
> > > > 
> > > > Where to store this and how to access it when needed is another
> > > > problem.
> > > > 
> > > > Not sure a usermode helper capability is the right thing either,
> > > > as I thought one important use of user namespaces was to allow
> > > > unprivileged users to perform operations they otherwise can't.
> > > > 
> > > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible ....
> > > > 
> > > > Still, an appropriate execution template would be needed and IIUC
> > > > we can't trust getting that from within a user-created namespace
> > > > environment.
> > > 
> > > Perhaps if a struct cred could be captured at some appropriate time,
> > > that could be used to cater for user namespaces.
> > > 
> > > Eric, do you think that would be possible to do without allowing
> > > users to circumvent security?
> > 
> > The general problem with capturing less than a full process is that
> > we always mess it up and forget to capture something important.
> > 
> > In a lot of ways this is a very similar problem to setting up an at
> > job or a cron job.  You build a script, you test it, then you tell at
> > to run it at a certain time and it fails, because your working
> > environment did not include something important that was in your
> > actual environment.
> > 
> > Unfortunately in this case the failures we are talking about are
> > container escapes and privilege escalation, so we do need to tread
> > carefully.
> > 
> > We might be able to safely define the context as the context of the
> > currently running init process (which we can identify with a struct
> > pid).  Justifying that looks a little trickier but doable.
> 
> Right, that seems like a fairly straightforward thing to implement
> based on Oleg's example patch.
> 
> I'll put together a series based on that approach.
> 
> Keep in mind that the patches in my previous posts for sub-system usage
> are definitely wrong but I can use them (and they will be only an
> initial example of how to use the mechanism) to verify that contained
> execution happens. They will need to change.
> 
> I was thinking that also capturing a struct cred (although I need to
> look more at the relationship between the process cred and the nsproxy
> locations) at a particular time combined with a double fork and exec
> could allow inclusion of user namespaces.
> 
> Perhaps at only one level deep, i.e. only allowing the first user
> namespace created from init or from a container and not user namespaces
> created from within a user namespace (if I can work out how to identify
> that case).

You know, wrt. the mechanism Oleg suggested, I've been wondering if it's
even necessary to capture process template information for execution.

Isn't the main issue the execution of unknown arbitrary objects getting
access to a privileged context?

Then perhaps it is sufficient to require registration of an SHA hash (of
some sort) for these objects by a suitably privileged process and only
allow helper execution of valid objects.

If that is sufficient then helper execution from within a container or
user namespace could just use the caller's environment itself.

What else do we need to be wary of, any thoughts Eric?

Ian
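The registration idea can be modelled as an allowlist of digests that is consulted before a helper image is executed. This sketch uses FNV-1a only to stay self-contained. FNV-1a is not a cryptographic hash, and a real scheme would need SHA-256 or similar. helper_allowlist, allowlist_register() and allowlist_permits() are hypothetical names. The privilege check on registration is assumed rather than shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALLOWED 16

/* FNV-1a (64-bit) keeps the sketch dependency-free. It is NOT a
 * cryptographic hash; a real allowlist would need SHA-256 or similar. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical registry of digests of approved helper images. */
struct helper_allowlist {
    uint64_t digests[MAX_ALLOWED];
    size_t count;
};

/* Assumed to be callable only by a suitably privileged process; that
 * permission check is outside this model. */
static bool allowlist_register(struct helper_allowlist *al,
                               const unsigned char *image, size_t len)
{
    if (al->count == MAX_ALLOWED)
        return false;
    al->digests[al->count++] = fnv1a64(image, len);
    return true;
}

/* Consulted before executing a helper: only registered images may run. */
static bool allowlist_permits(const struct helper_allowlist *al,
                              const unsigned char *image, size_t len)
{
    uint64_t h = fnv1a64(image, len);
    for (size_t i = 0; i < al->count; i++)
        if (al->digests[i] == h)
            return true;
    return false;
}
```

As a model of the proposal, this checks only the helper image itself. It says nothing about the libraries, services or other environment the helper relies on.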


* Re: call_usermodehelper in containers
       [not found]                                       ` <1456196130.2911.10.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
@ 2016-02-23 14:36                                         ` J. Bruce Fields
  0 siblings, 0 replies; 14+ messages in thread
From: J. Bruce Fields @ 2016-02-23 14:36 UTC (permalink / raw)
  To: Ian Kent
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A,
	Eric W. Biederman, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Tue, Feb 23, 2016 at 10:55:30AM +0800, Ian Kent wrote:
> You know, wrt. the mechanism Oleg suggested, I've been wondering if it's
> even necessary to capture process template information for execution.
> 
> Isn't the main issue the execution of unknown arbitrary objects getting
> access to a privileged context?
> 
> Then perhaps it is sufficient to require registration of an SHA hash (of
> some sort) for these objects by a suitably privileged process and only
> allow helper execution of valid objects.

That executable probably also depends on libraries, services, and tons
of other miscellaneous stuff in its environment.  The NFSv4 client
idmapper, for example, may be doing ldap calls.  Unless the helper is
created with incredible care, I don't think that it's enough just to
verify that you're executing the correct helper.

--b.

> 
> If that is sufficient then helper execution from within a container or
> user namespace could just use the caller's environment itself.
> 
> What else do we need to be wary of, any thoughts Eric?
> 
> Ian


* Re: call_usermodehelper in containers
       [not found]                                         ` <20160223143627.GB31951-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2016-02-24  0:55                                           ` Ian Kent
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Kent @ 2016-02-24  0:55 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky,
	Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A,
	Eric W. Biederman, bharrosh-C4P08NqkoRlBDgjK7y7TUQ,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Tue, 2016-02-23 at 09:36 -0500, J. Bruce Fields wrote:
> On Tue, Feb 23, 2016 at 10:55:30AM +0800, Ian Kent wrote:
> > You know, wrt. the mechanism Oleg suggested, I've been wondering if
> > it's even necessary to capture process template information for
> > execution.
> > 
> > Isn't the main issue the execution of unknown arbitrary objects
> > getting access to a privileged context?
> > 
> > Then perhaps it is sufficient to require registration of an SHA hash
> > (of some sort) for these objects by a suitably privileged process and
> > only allow helper execution of valid objects.
> 
> That executable probably also depends on libraries, services, and tons
> of other miscellaneous stuff in its environment.  The NFSv4 client
> idmapper, for example, may be doing ldap calls.  Unless the helper is
> created with incredible care, I don't think that it's enough just to
> verify that you're executing the correct helper.

Yeah, I was thinking the logistics of keeping something like this up to
date would be hard, but calculating the hash on every call would be too
much overhead, I think.

> 
> --b.
> 
> > 
> > If that is sufficient then helper execution from within a container
> > or user namespace could just use the caller's environment itself.
> > 
> > What else do we need to be wary of, any thoughts Eric?
> > 
> > Ian



Thread overview: 14+ messages
2016-02-18  2:57                   ` call_usermodehelper in containers Eric W. Biederman
2016-02-18  3:43                       ` Kamezawa Hiroyuki
2016-02-18  6:36                           ` Ian Kent
2016-02-18  7:37                             ` Ian Kent
2016-02-18 20:45                               ` Eric W. Biederman
2016-02-19  3:08                                   ` Kamezawa Hiroyuki
2016-02-19  5:37                                       ` Ian Kent
2016-02-19  9:30                                         ` Kamezawa Hiroyuki
2016-02-20  3:28                                             ` Ian Kent
2016-02-19  5:14                                   ` Ian Kent
2016-02-23  2:55                                       ` Ian Kent
2016-02-23 14:36                                         ` J. Bruce Fields
2016-02-24  0:55                                           ` Ian Kent
2016-02-18  3:17                       ` Eric W. Biederman
