Re: [RFC][PATCH 0/9] Make containers kernel objects

linux-nfs.vger.kernel.org archive mirror

From: Djalal Harouni <tixxdz@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jeff Layton <jlayton@redhat.com>,
	David Howells <dhowells@redhat.com>,
	trondmy@primarydata.com, Miklos Szeredi <mszeredi@redhat.com>,
	linux-nfs@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects
Date: Tue, 23 May 2017 16:30:33 +0200
Message-ID: <CAEiveUcbmm5m4=11ZppxAWppeoFWUBFpLC7dAZRuBCTFHR548g@mail.gmail.com>
In-Reply-To: <874lwbraxh.fsf@xmission.com>

On Tue, May 23, 2017 at 2:54 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Jeff Layton <jlayton@redhat.com> writes:
>
>> On Mon, 2017-05-22 at 14:04 -0500, Eric W. Biederman wrote:
>>> David Howells <dhowells@redhat.com> writes:
>>>
>>> > Here are a set of patches to define a container object for the kernel and
>>> > to provide some methods to create and manipulate them.
>>> >
>>> > The reason I think this is necessary is that the kernel has no idea how to
>>> > direct upcalls to what userspace considers to be a container - current
>>> > Linux practice appears to make a "container" just an arbitrarily chosen
>>> > junction of namespaces, control groups and files, which may be changed
>>> > individually within the "container".
>>> >
>>>
>>> I think this might possibly be a useful abstraction for solving the
>>> keyring upcalls if it was something created implicitly.
>>>
>>> fork_into_container for use by keyring upcalls is currently a security
>>> vulnerability, as it allows escaping all of a container's cgroups.  But
>>> you have that on your list of things to fix.  However, seccomp and a
>>> few other things are not on that list.
>>>
>>> Before we had kthreadd in the kernel, upcalls always had issues because
>>> the code to reset all of the userspace bits and make the forked
>>> task suitable for running upcalls was always missing some detail.  It is
>>> a very bug-prone kind of idiom that you are talking about.  It is doubly
>>> bug-prone because the wrongness is visible to userspace and as such
>>> might become a frozen KABI guarantee.
>>>
>>> Let me suggest a concrete alternative:
>>>
>>> - At the time of mount, observe the mounter's user namespace.
>>> - Find the mounter's pid namespace.
>>> - If the mounter's pid namespace is owned by the mounter's user namespace,
>>>   walk up the pid namespace tree to the first pid namespace owned by
>>>   that user namespace.
>>> - If the mounter's pid namespace is not owned by the mounter's user
>>>   namespace, fail the mount if it is going to need to make upcalls, as
>>>   they will not be possible.
>>> - Hold a reference to the pid namespace that was found.
>>>
>>> Then, when an upcall needs to be made, fork a child of the init process
>>> of the specified pid namespace, or fail if the init process of that
>>> pid namespace has died.
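The mount-time rule and the upcall-time check described above can be modelled as a small userspace sketch. It is hypothetical and heavily simplified: `struct user_ns`, `struct pid_ns`, `select_upcall_pid_ns()` and `prepare_upcall()` are illustrative names, not kernel APIs, and the walk assumes that "the first pid namespace owned by that user namespace" means the outermost ancestor still owned by the mounter's user namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are NOT the kernel's structures. */
struct user_ns {
	int id;
};

struct pid_ns {
	struct pid_ns *parent;        /* NULL for the initial pid namespace */
	const struct user_ns *owner;  /* user namespace that owns this pid namespace */
	bool init_alive;              /* is this namespace's init (pid 1) still running? */
	int refcount;                 /* references held by mounts */
};

/*
 * Mount-time rule. Returns the pid namespace whose init will parent future
 * upcalls (with a reference taken), or NULL when the mounter's pid namespace
 * is not owned by the mounter's user namespace; in that case a mount that
 * needs upcalls should fail.
 */
static struct pid_ns *select_upcall_pid_ns(struct pid_ns *mnt_pidns,
					   const struct user_ns *mnt_userns)
{
	struct pid_ns *ns = mnt_pidns;

	assert(mnt_pidns != NULL && mnt_userns != NULL);

	if (ns->owner != mnt_userns)
		return NULL;

	/* Assumption: climb while the parent is owned by the same user namespace. */
	while (ns->parent != NULL && ns->parent->owner == mnt_userns)
		ns = ns->parent;

	ns->refcount++;	/* held for the lifetime of the mount */
	return ns;
}

/*
 * Upcall-time check. In the proposal the helper would be forked as a child
 * of this namespace's init; this sketch only reports whether that is still
 * possible: 0 if init is alive, -1 if it has died and the upcall must fail.
 */
static int prepare_upcall(const struct pid_ns *ns)
{
	return ns->init_alive ? 0 : -1;
}
```

For example, with a container user namespace owning two nested pid namespaces, a mount made from the inner one selects the outer one, because the initial pid namespace above it is owned by a different (host) user namespace.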
>>>
>>> That should always work, and it does not require keeping expensive state
>>> where we did not have it previously.  Further, because the semantics are
>>> "fork a child of a particular pid namespace's init", this code remains
>>> well defined as features get added to the kernel.
>>>
>>> For ordinary request-key upcalls we should be able to use the same rules
>>> and just not save/restore things in the kernel.
>>>
>>
>> OK, that does seem like a reasonable idea. Note that it's not just
>> request-key upcalls here that we're interested in, but anything that
>> we'd typically spawn from kthreadd otherwise.
>
> General user mode helper *Nod*.
>
>> That said, I worry a little about this. If the init process does a setns
>> at the wrong time, suddenly you're doing the upcall in different
>> namespaces than you intended.
>>
>> Might it be better to use the init process of the container as the
>> template like you suggest, but snapshot its "context" at a particular
>> point in time instead?
>>
>> knfsd could do this when it's started, for instance...
>
> The danger of a snapshot in time is that something important (like cgroup
> membership) might change.
>
> It might be necessary to have this be opt-in, perhaps even to the
> point of starting a dedicated kthreadd.
>
> Right now I think we need to figure out what it will take to solve this
> in the kernel, because I strongly suspect that solving this in userspace
> is a cop-out and that we really aren't providing enough information to
> userspace to run the helper in the proper context.  And I strongly
> suspect that providing enough information from the kernel will be
> roughly equivalent to solving this in the kernel.

Maybe it depends on the case; a general approach can be too difficult
to handle, especially from a security point of view. Maybe it is better
to identify which operations need which context, and then a userspace
service/proxy can act using kthreadd with the right context... At
least, the shift to this model has been done for years now in the
mobile industry.


-- 
tixxdz


Thread overview: 52+ messages
2017-05-22 16:22 [RFC][PATCH 0/9] Make containers kernel objects David Howells
2017-05-22 16:22 ` [PATCH 1/9] containers: Rename linux/container.h to linux/container_dev.h David Howells
2017-05-22 16:22 ` [PATCH 2/9] Implement containers as kernel objects David Howells
2017-08-14  5:47   ` Richard Guy Briggs
2017-08-16 22:21     ` Paul Moore
2017-08-18  8:03       ` Richard Guy Briggs
2017-09-06 14:03         ` Serge E. Hallyn
2017-09-14  5:47           ` Richard Guy Briggs
2017-09-08 20:02         ` Paul Moore
2017-05-22 16:22 ` [PATCH 3/9] Provide /proc/containers David Howells
2017-05-22 16:22 ` [PATCH 4/9] Allow processes to be forked and upcalled into a container David Howells
2017-05-22 16:23 ` [PATCH 5/9] Open a socket inside " David Howells
2017-05-22 16:23 ` [PATCH 6/9] Allow fs syscall dfd arguments to take a container fd David Howells
2017-05-22 16:23 ` [PATCH 7/9] Make fsopen() able to initiate mounting into a container David Howells
2017-05-22 16:23 ` [PATCH 8/9] Honour CONTAINER_NEW_EMPTY_FS_NS David Howells
2017-05-22 16:23 ` [PATCH 9/9] Sample program for driving container objects David Howells
2017-05-22 16:53 ` [RFC][PATCH 0/9] Make containers kernel objects James Bottomley
2017-05-22 17:14   ` Aleksa Sarai
2017-05-22 17:27   ` Jessica Frazelle
2017-05-22 18:34   ` Jeff Layton
2017-05-22 19:21     ` James Bottomley
2017-05-22 22:14       ` Jeff Layton
2017-05-23 10:35       ` Ian Kent
2017-05-23  9:38   ` Ian Kent
2017-05-23 14:53   ` David Howells
2017-05-23 14:56     ` Eric W. Biederman
2017-05-23 15:14     ` David Howells
2017-05-23 15:17       ` Eric W. Biederman
2017-05-23 15:44         ` James Bottomley
2017-05-23 16:36         ` David Howells
2017-05-24  8:26           ` Eric W. Biederman
2017-05-24  9:16             ` Ian Kent
2017-05-22 17:11 ` Jessica Frazelle
2017-05-22 19:04 ` Eric W. Biederman
2017-05-22 22:22   ` Jeff Layton
2017-05-23 12:54     ` Eric W. Biederman
2017-05-23 14:27       ` Jeff Layton
2017-05-23 14:30       ` Djalal Harouni [this message]
2017-05-23 14:54         ` Colin Walters
2017-05-23 15:31           ` Jeff Layton
2017-05-23 15:35             ` Colin Walters
2017-05-23 15:30         ` David Howells
2017-05-23 14:23     ` Djalal Harouni
2017-05-27 17:45   ` Trond Myklebust
2017-05-27 19:10     ` James Bottomley
2017-05-23 10:09 ` Ian Kent
2017-05-23 13:52 ` David Howells
2017-05-23 15:02   ` James Bottomley
2017-05-23 15:23   ` Eric W. Biederman
2017-05-23 15:12 ` David Howells
2017-05-23 15:33 ` Eric W. Biederman
2017-05-23 16:13 ` David Howells
