All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lennart Poettering <mzxreary@0pointer.de>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matt Helsley <matthltc@us.ibm.com>,
	Kay Sievers <kay.sievers@vrfy.org>,
	linux-kernel@vger.kernel.org, harald@redhat.com, david@fubar.dk,
	greg@kroah.com
Subject: Re: A Plumber’s Wish List for Linux
Date: Mon, 10 Oct 2011 18:31:41 +0200	[thread overview]
Message-ID: <20111010163140.GA22191@tango.0pointer.de> (raw)
In-Reply-To: <m17h4g2jqy.fsf@fess.ebiederm.org>

On Fri, 07.10.11 21:24, Eric W. Biederman (ebiederm@xmission.com) wrote:

> 
> Lennart Poettering <mzxreary@0pointer.de> writes:
> 
> > On Fri, 07.10.11 00:49, Matt Helsley (matthltc@us.ibm.com) wrote:
> >
> >> 
> >> On Fri, Oct 07, 2011 at 01:17:02AM +0200, Kay Sievers wrote:
> >> 
> >> <snip>
> >> 
> >> > * simple, reliable and future-proof way to detect whether a specific pid
> >> > is running in a CLONE_NEWPID container, i.e. not in the root PID
> >> > namespace. Currently, there are available a few ugly hacks to detect
> >> 
> >> Is that precisely what's needed or would it be sufficient to know
> >> that the pid is running in a child pid namespace of the current pid
> >> namespace? If so, I think this could eventually be done by comparing
> >> the inode numbers assigned to /proc/<pid>/ns/pid to those of
> >> /proc/1/ns/pid.
> >
> > I think the most interesting test would be to figure out for a process
> > if itself is running in a PID namespace. And for that comparing inodes
> > wouldn't work since the namespace process would never get access to the
> > inode of the outside init.
> 
> Strictly correct answer.  All processes are running in a pid namespace.
> I think we can implement that in a libc header.
> 
> static inline bool in_pid_namespace(void)
> {
>         return true;
> }
> 
> Why does it matter if you are running in something other than the
> initial pid namespace?  I expect what you are really after is something
> else entirely, and you are asking the wrong question.

Well, all other virtualization solutions are easily detectable via CPUID
leaf 0x1, bit 31, and via DMI and some other ways. However, for Linux
containers there is no nice way to detect them.

VMs are pretty good at providing a comprehensive emulation of real
machines, and distributions running in them usually do not need
information whether they are running in a VM or not. This is very
different though for containers: Quite a few kernel subsystems are
currently not virtualized, for example SELinux, VTs, most of sysfs, most
of /proc/sys, audit, udev or file systems (by which I mean that for a
container you probably don't want to fsck the root fs, and so on), and
containers tend to be much more lightweight than real systems.

To make a standard distribution run nicely in a Linux container you
usually have to make quite a number of modifications to it and disable
certain things from the boot process. Ideally however, one could simply
boot the same image on a real machine and in a container and would just
do the right thing, fully stateless. And for that you need to be able to
detect containers, and currently you can't.

Of course, in 10 years or so containers might be much more complete then
they are right now, and virtualize all subsystems I listed above and
maybe a ton more, but that's 10y for now, and for now to make things
work as cleanly as possible it would be immensly helpful if containers
could be detectable in a nice way.

Of course, in many case there are nicer ways to shortcut the init jobs
on a container. For example, instead of bypassing root fsck in a
container it makes a lot more sense to simply say: bypass root fsck if
the root fs is already writable. And there's more like that. But at the
end of the day you always want to be able to bind certain things to the
fact that you are running in a container, if you want things to "just
work". And I believe that must be the goal. 

I am pretty sure that having a way to detect execution in a container is
a minimum requirement to get general purpose distribution makers to
officially support and care for execution in container environments. As
you are a container guy I am sure that would be very much in your
interest.

And note that I am only interested in detecting CLONE_NEWPID, not the
other namespaces. CLONE_NEWPID is the core namespace technology that
turns a container into a container, so that's all that's needed.

And yes, CLONE_NEWPID can be useful for other purposes then just
containers as well. However, that doesn't really matter for my usecase
as mentioned above: becuase if you run an init system in CLONE_NEWPID
namespace, then that's what I call a container, and the init system
should have all rights to detect that.

The root PID namespace is different from all other namespaces btw,
already in the fact that the the kernel threads are part of it, but not
the other namespaces.

Finally, note that it prevously has been very easy to detect execution
in a container, simple by checking the "ns" cgroup hierarchy. (i.e. look
whether the path in /proc/self/cgroup for "ns" wasn't "/" and you knew
you were in a container). systemd made use of that and since very early
on we supported container boots. The removal of "ns" broke systemd in
that regard. Now, I don't want "ns" back, and I am not going to make the
big hubbub out of the fact that you guys broke userspace that way. But
what I do like to see made available again is a sane way to detect
execution in a container environment, i.e. a way for a process to detect
whether it is running in the root CLONE_NEWPID namespace.

Thanks,

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

  reply	other threads:[~2011-10-10 16:31 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-06 23:17 A Plumber’s Wish List for Linux Kay Sievers
2011-10-06 23:46 ` Andi Kleen
2011-10-07  0:13   ` Lennart Poettering
2011-10-07  1:57     ` Andi Kleen
2011-10-07 15:58       ` Lennart Poettering
2011-10-19 23:16     ` H. Peter Anvin
2011-10-07  7:49 ` Matt Helsley
2011-10-07 16:01   ` Lennart Poettering
2011-10-08  4:24     ` Eric W. Biederman
2011-10-10 16:31       ` Lennart Poettering [this message]
2011-10-10 20:59         ` Detecting if you are running in a container Eric W. Biederman
2011-10-10 21:41           ` Lennart Poettering
2011-10-11  5:40             ` Eric W. Biederman
2011-10-11  6:54             ` Eric W. Biederman
2011-10-12 16:59             ` Kay Sievers
2011-11-01 22:05               ` [lxc-devel] " Michael Tokarev
2011-11-01 23:51                 ` Eric W. Biederman
2011-11-02  8:08                   ` Michael Tokarev
2011-10-11  1:32           ` Ted Ts'o
2011-10-11  2:05             ` Matt Helsley
2011-10-11  3:25               ` Ted Ts'o
2011-10-11  6:42                 ` Eric W. Biederman
2011-10-11 12:53                   ` Theodore Tso
2011-10-11 21:16                     ` Eric W. Biederman
2011-10-11 22:30                       ` david
2011-10-12  4:26                         ` Eric W. Biederman
2011-10-12  5:10                           ` david
2011-10-12 15:08                             ` Serge E. Hallyn
2011-10-12 17:57                       ` J. Bruce Fields
2011-10-12 18:25                         ` Kyle Moffett
2011-10-12 19:04                           ` J. Bruce Fields
2011-10-12 19:12                             ` Kyle Moffett
2011-10-14 15:54                               ` Ted Ts'o
2011-10-14 18:04                                 ` Eric W. Biederman
2011-10-14 21:58                                   ` H. Peter Anvin
2011-10-16  9:42                                     ` Eric W. Biederman
2011-10-30 20:11                                       ` H. Peter Anvin
2011-11-01 13:38                                         ` Eric W. Biederman
2011-10-11 22:25               ` david
2011-10-07 10:12 ` A Plumber’s Wish List for Linux Alan Cox
2011-10-07 10:28   ` Kay Sievers
2011-10-07 10:38     ` Alan Cox
2011-10-07 12:46       ` Kay Sievers
2011-10-07 13:39         ` Theodore Tso
2011-10-07 15:21         ` Hugo Mills
2011-10-10 11:18           ` A Plumber???s " David Sterba
2011-10-10 11:18             ` David Sterba
2011-10-10 13:09             ` Theodore Tso
2011-10-13  0:28               ` Dave Chinner
2011-10-14 15:47                 ` Ted Ts'o
2011-10-11 13:14             ` Serge E. Hallyn
2011-10-11 15:49               ` Andrew G. Morgan
2011-10-12  2:31                 ` Serge E. Hallyn
2011-10-12 20:51                 ` Lennart Poettering
2011-10-08  9:53         ` A Plumber’s " Bastien ROUCARIES
2011-10-09  3:15           ` Alex Elsayed
2011-10-07 16:07       ` Valdis.Kletnieks
2011-10-07 12:35 ` Vivek Goyal
2011-10-07 18:59 ` Greg KH
2011-10-09 12:20   ` Kay Sievers
2011-10-09  8:45 ` Rusty Russell
2011-10-11 23:16 ` Andrew Morton
2011-10-12  0:53   ` Frederic Weisbecker
2011-10-12  0:59   ` Frederic Weisbecker
     [not found]     ` <20111012174014.GE6281@google.com>
2011-10-12 18:16       ` Cyrill Gorcunov
2011-10-14 15:38         ` Frederic Weisbecker
2011-10-14 16:01           ` Cyrill Gorcunov
2011-10-14 16:08             ` Cyrill Gorcunov
2011-10-14 16:19               ` Frederic Weisbecker
2011-10-19 21:19           ` Paul Menage
2011-10-19 21:12 ` Paul Menage
2011-10-19 23:03   ` Lennart Poettering
2011-10-19 23:09     ` Paul Menage
2011-10-19 23:31       ` Lennart Poettering
2011-10-22 10:21         ` Frederic Weisbecker
2011-10-22 15:28           ` Lennart Poettering
2011-10-25  5:40             ` Li Zefan
2011-10-30 17:18               ` Lennart Poettering
2011-11-01  1:27                 ` Li Zefan
     [not found] <CAE2SPAZci=u__d58phePCftVr_e+i+N2YU-JYjGDG_b3TmYTSQ@mail.gmail.com>
2011-10-07 13:40 ` Alan Cox
2011-10-07 14:57   ` Alexander E. Patrakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111010163140.GA22191@tango.0pointer.de \
    --to=mzxreary@0pointer.de \
    --cc=david@fubar.dk \
    --cc=ebiederm@xmission.com \
    --cc=greg@kroah.com \
    --cc=harald@redhat.com \
    --cc=kay.sievers@vrfy.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matthltc@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.