Re: [PATCH] pidns: Make pid_max per namespace

From: Andrew Morton <akpm@linux-foundation.org>
To: Pavel Emelyanov <xemul@parallels.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Tejun Heo <tj@kernel.org>, Oleg Nesterov <oleg@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] pidns: Make pid_max per namespace
Date: Thu, 10 Mar 2011 02:44:59 -0800
Message-ID: <20110310024459.e54fd99e.akpm@linux-foundation.org>
In-Reply-To: <4D78A2B8.1030605@parallels.com>

On Thu, 10 Mar 2011 13:06:48 +0300 Pavel Emelyanov <xemul@parallels.com> wrote:

> On 03/10/2011 12:50 PM, Andrew Morton wrote:
> > On Thu, 10 Mar 2011 12:35:32 +0300 Pavel Emelyanov <xemul@parallels.com> wrote:
> > 
> >> On 03/08/2011 02:58 AM, Andrew Morton wrote:
> >>> On Thu, 03 Mar 2011 11:39:17 +0300
> >>> Pavel Emelyanov <xemul@parallels.com> wrote:
> >>>
> >>>> Rationale:
> >>>>
> >>>> On x86_64 with big ram people running containers set pid_max on host to 
> >>>> large values to be able to launch more containers. At the same time 
> >>>> containers running 32-bit software experience problems with large pids - ps
> >>>> calls readdir/stat on proc entries and inode's i_ino happens to be too big 
> >>>> for the 32-bit API.
> >>>>
> >>>> Thus, the ability to limit the pid value inside container is required.
> >>>>
> >>>
> >>> This is a behavioural change, isn't it?  In current kernels a write to
> >>> /proc/sys/kernel/pid_max will change the max pid on all processes. 
> >>> After this change, that write will only affect processes in the current
> >>> namespace.  Anyone who was depending on the old behaviour might run
> >>> into problems?
> >>
> >> Hardly. If the behavior of some two apps depends on its synchronous change,
> >> these two might want to run in the same pid namespace.
> > 
> > I don't understand your answer.  What is this "synchronous change" of which
> > you speak?  Does your "might want to run" suggestion mean that userspace 
> > changes would be required for this operation to again work correctly?
> 
> Your concern was about "anyone who was depending on the old behaviour", where
> the old behavior meant "a write to sys.pid_max will change the max pid on all
> processes".
> 
> I wanted to say, that if someone changes pid_max and expects someone else to
> act differently after this, then these two should live in the same pid namespace.

So it's a non-back-compatible change to the userspace interface.  uh-oh.

> IOW, if X raises the pid_max, then all the processes X sees in its pid namespace
> *may* have pids up to this value. All the other processes, which are not visible
> in X's pid space will have other values, but X doesn't see them, so why should
> we care?

Current userspace has no *need* to be running in the same pidns to
alter the pid_max of some processes.  So the chances are good that
any current userspace takes advantage of this.

Silly example:

	if (fork() == 0) {
		/* child */
		create_new_pidns();
		start_doing_stuff();
	} else {
		/* parent */
		increase_pid_max();
	}

Another example would be logging into a system as root in the init_ns
and modifying /proc/sys/kernel/pid_max by hand.
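
For concreteness, the "by hand" case amounts to something like the
sketch below.  The helper names read_pid_max() and write_pid_max() are
made up for illustration and exist in no library.  The path is a
parameter so the code can be tried on a scratch file; pointing it at
/proc/sys/kernel/pid_max is the real case, and a write there normally
needs root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a sysctl-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
static long read_pid_max(const char *path)
{
	long value;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/*
 * Hypothetical helper: write a single integer to a sysctl-style file.
 * Returns 0 on success, -1 on failure.
 */
static int write_pid_max(const char *path, long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", value) > 0;
	/* Buffered write errors may only show up at close time. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller in the init_ns would do write_pid_max("/proc/sys/kernel/pid_max",
some_value) and, today, expect that to apply to every process on the
machine.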

I don't have a clue how much code is out there using pid namespaces,
nor how much of that code alters the default pid_max.  Hard.

The proposed interface is a bit weird and hacky anyway, isn't it?  We
have a single pseudo-file in a well-known location -
/proc/sys/kernel/pid_max.  One would expect alteration of that
system-wide file to have system-wide effects, only that isn't the case.
Instead a modification to the system-wide file has local-pidns-only
effects.  It would be much more logical to have a per-pidns pid_max
pseudo file.

And if we do that, we then need to work out what to do with writes to
/proc/sys/kernel/pid_max.  Remember the user expects those writes to
alter all processes on the machine!  I guess it would be acceptable to
permit that to continue to happen - a write to /proc/sys/kernel/pid_max
will overwrite all the per-pidns pid_max settings.
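
To make the suggested semantics concrete, here is a toy userspace
model.  It is not kernel code and not part of the patch: struct
toy_pidns, struct toy_system and both setter functions are names
invented for this sketch, and the fixed-size array stands in for
whatever list of namespaces the kernel would really walk.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NS 8

/* Toy stand-in for a pid namespace: only the limit is modelled. */
struct toy_pidns {
	int pid_max;
};

/* Toy stand-in for "all pid namespaces on the machine". */
struct toy_system {
	struct toy_pidns ns[TOY_MAX_NS];
	size_t nr_ns;
};

/* Write to the per-pidns file: only that one namespace changes. */
static void toy_set_ns_pid_max(struct toy_system *sys, size_t idx, int value)
{
	assert(sys != NULL && idx < sys->nr_ns);
	sys->ns[idx].pid_max = value;
}

/*
 * Write to the system-wide file: overwrite every per-pidns setting,
 * so the write keeps its machine-wide effect.
 */
static void toy_set_global_pid_max(struct toy_system *sys, int value)
{
	size_t i;

	assert(sys != NULL);
	for (i = 0; i < sys->nr_ns; i++)
		sys->ns[i].pid_max = value;
}
```

With three namespaces, a global write of 32768 followed by a
per-pidns write of 4096 to namespace 1 leaves the other two at 32768.
A later global write of 65536 puts all three at 65536 again, which is
the "overwrite all the per-pidns pid_max settings" behaviour described
above.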

Thread overview: 6+ messages
2011-03-03  8:39 [PATCH] pidns: Make pid_max per namespace Pavel Emelyanov
2011-03-07 23:58 ` Andrew Morton
2011-03-10  9:35   ` Pavel Emelyanov
2011-03-10  9:50     ` Andrew Morton
2011-03-10 10:06       ` Pavel Emelyanov
2011-03-10 10:44         ` Andrew Morton [this message]