From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sukadev Bhattiprolu
Subject: Re: [RFC][v8][PATCH 0/10] Implement clone3() system call
Date: Tue, 20 Oct 2009 11:33:29 -0700
Message-ID: <20091020183329.GB22646@us.ibm.com>
References: <20091013044925.GA28181@us.ibm.com> <4AD8C7E4.9000903@free.fr>
	<20091016194451.GA28706@us.ibm.com> <4ADCCD68.9030003@free.fr>
	<4ADCDE7F.4090501@librato.com>
	<20091020005125.GG27627@count0.beaverton.ibm.com>
	<20091020040315.GA26632@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: "Eric W. Biederman"
Cc: Matt Helsley, Oren Laadan, Daniel Lezcano,
	randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	arnd-r2nGTMty4D4@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Containers,
	Nathan Lynch, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org,
	kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org,
	hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org,
	mingo-X9Un+BFzKDI@public.gmane.org,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	Alexey Dobriyan, roland-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Pavel Emelyanov
List-Id: linux-api@vger.kernel.org

Eric W. Biederman [ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org] wrote:
| > Could you clarify? How is the call to alloc_pidmap() from clone3() different
| > from the call from clone() itself?
|
| I think it is totally inappropriate to assign pids in a pid namespace
| where there are user space processes already running.

Honestly, I don't understand why it is inappropriate, or how this differs
from a normal clone(), which also assigns pids in its own and ancestor pid
namespaces.

|
| > | How we handle a clone extension depends critically on if we want to
| > | create processes for restart in user space or kernel space.
| > |
| > | Could someone give me or point me at a strong case for creating the
| > | processes for restart in user space?
| >
| > There has been a lot of discussion on this with reference to the
| > Checkpoint/Restart patchset. See http://lkml.org/lkml/2009/4/13/401
| > for instance.
|
| Just read it. Thank you.

Sorry, I should have mentioned the reason here. As you mention below,
flexibility is the main reason.

| Now I am certain clone_with_pids() is not useful functionality to be
| exporting to userspace.
|
| The only real argument in favor of doing this in user space is greater
| flexibility. I can see checkpointing/restoring a single thread process
| without a pid namespace. Anything more and you are just asking for
| trouble.
|
| A design that weakens security. Increases maintenance costs. All for
| an unreliable result seems like a bad one to me.
|
| > | The pid assignment code is currently ugly. I asked that we just pass
| > | in the min and max pid values that already exist into the core pid
| > | assignment function, and a constrained min/max that only admits a
| > | single pid when we are allocating a struct pid for restart. That was
| > | not done and now we have a weird abortion with unnecessary special cases.
| >
| > I did post a version of the patch attempting to implement that. As
| > pointed out in:
| >
| > http://lkml.org/lkml/2009/8/17/445
| >
| > we would need more checks in alloc_pidmap() to cover cases like min or max
| > being invalid, min being greater than max, or max being greater than
| > pid_max, etc. Those checks also made the code ugly (imo).
|
| If you need more checks you are doing it wrong. The code already has min
| and max values, and even a start value. I was just strongly suggesting
| we generalize where we get the values from, and then we have no special
| cases.

Well, if alloc_pidmap(pid_ns, min, max) does not have to check the
parameters passed in (i.e. it assumes that callers pass them in correctly),
it might be simple.
But when the user specifies the pid, min == max == the user's target pid,
so we will need to check the values either here or in the callers.

Yes, the code already has min/max values and a start value. But these are
controlled by alloc_pidmap() itself and are not passed in from user space.
alloc_pidmap() needs to assign either the next available pid or a specific
target pid. Generalizing it to allocate a pid in a range seemed to be a bit
of overkill for the currently known usages.

I will post a version of the patch outside this patchset with min and max
parameters, and we can see if it can be optimized/beautified.

Sukadev