linux-api.vger.kernel.org archive mirror
From: Daniel Lezcano <daniel.lezcano-GANU6spQydw@public.gmane.org>
To: Sukadev Bhattiprolu
	<sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	arnd-r2nGTMty4D4@public.gmane.org,
	Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	Nathan Lynch <nathanl-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>,
	Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org,
	"Eric W. Biederman"
	<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
	kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org,
	hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org,
	mingo-X9Un+BFzKDI@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	Alexey Dobriyan
	<adobriyan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	roland-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Subject: Re: [RFC][v8][PATCH 0/10] Implement clone3() system call
Date: Mon, 19 Oct 2009 22:34:48 +0200
Message-ID: <4ADCCD68.9030003@free.fr>
In-Reply-To: <20091016194451.GA28706-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Sukadev Bhattiprolu wrote:
> Daniel Lezcano [daniel.lezcano-GANU6spQydw@public.gmane.org] wrote:
>   
>> Sukadev Bhattiprolu wrote:
>>     
>>> Subject: [RFC][v8][PATCH 0/10] Implement clone3() system call
>>>
>>> To support application checkpoint/restart, a task must have the same pid it
>>> had when it was checkpointed.  When containers are nested, the tasks within
>>> the containers exist in multiple pid namespaces and hence have multiple pids
>>> to specify during restart.
>>>
>>> This patchset implements a new system call, clone3() that lets a process
>>> specify the pids of the child process.
>>>
>>> Patches 1 through 7 are helper patches, needed for choosing a pid for the
>>> child process.
>>>
>>> PATCH 9 defines a prototype of the new system call. PATCH 10 adds some
>>> documentation on the new system call, some/all of which will eventually
>>> go into a man page.
>>>   
>>>       
>> Sorry for jumping so late in the discussion and for having maybe my
>> remarks pointless...
>>
>> If this syscall is only for checkpoint/restart, why shouldn't it be
>> used with a future generic sys_restart syscall?
>>     
>
> As I tried to explain in PATCH 0/9, the ability to choose a pid is only
> for C/R, but we are also trying to extend clone-flags so we won't need yet
> another variant of clone() fairly soon.
>
>   
>> Otherwise, wouldn't it be more convenient to have something usable for
>> everyone, let's say:
>>
>> cloneat(pid_t pid, pid_t desiredpid, ...);
>>
>> Where 'desiredpid' is a hint to the kernel about the pid to be
>> allocated (zero means the kernel will choose one for us) and the newly
>> allocated task is the child of 'pid'.
>>     
>
> Hmm, so P1 would call cloneat() to create a child P3 _on behalf_ of process
> P2? I did not know we had a requirement for that. Can you explain the
> use case more? IOW, why can't P2 create the child P3 by itself?
>   
I forgot to mention a constraint on the specified pid: P2 has to be a
child of P1.
In other words, you cannot pass cloneat() a pid that is not one of your
descendants (or yourself).
With this constraint, I think there are no security issues.
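
A minimal sketch of that ancestry check, assuming a toy process table in
which ppid_of[pid] holds each task's parent and init (pid 1) has parent 0.
NPROC_MODEL and is_self_or_descendant() are hypothetical names and this is
not kernel code; a real implementation would walk the kernel's own task
tree instead.

```c
#include <stdbool.h>

#define NPROC_MODEL 16 /* hypothetical size of a toy process table */

/* Toy model, not kernel code: ppid_of[pid] is the parent of pid and init
 * (pid 1) has parent 0. Returns true if target is caller itself or one of
 * its descendants, i.e. if caller may pass target to cloneat(). */
static bool is_self_or_descendant(const int ppid_of[NPROC_MODEL],
                                  int caller, int target)
{
    /* Walk from target up towards init; the hop bound guards against a
     * corrupted table containing a cycle. */
    for (int hops = 0, p = target;
         hops < NPROC_MODEL && p > 0 && p < NPROC_MODEL;
         hops++, p = ppid_of[p]) {
        if (p == caller)
            return true;
    }
    return false;
}
```

For a tree 1 -> 2 -> 3 where 4 is also a child of 1, caller 2 may target
2 or 3 but not 4, and caller 3 may not target its parent 2.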

As for forking on behalf of another process, we can consider that it is
up to the caller/programmer to know what they are doing. If a process in
the process hierarchy exec'ed a program and we cloneat() this process,
and the program then fails because of an "unexpected error", well, we
should not have done that. A similar case is IPC objects being removed
while they are still in use by other processes.

Here is an interesting use case:
 * if you created a pid namespace and, let's say, booted a system
   container whose init process is "init", then with this call you can
   enter the container at any time by doing a cloneat() followed by an
   exec of your command. I think that was a requirement when there were
   discussions around "sys_hijack".

Another point: it's another way to extend the exhausted clone flags,
since cloneat() can also be called in a compatibility mode, as
cloneat(getpid(), 0, ...).
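
To make the proposed 'desiredpid' semantics concrete (zero lets the
kernel pick, which is what the cloneat(getpid(), 0, ...) compatibility
call relies on), here is a userspace model of pid selection over a tiny
pid space. alloc_pid_hint() and PID_MAX_MODEL are hypothetical and are not
the kernel's alloc_pidmap(). The proposal calls desiredpid only a hint,
while restart needs the exact pid, so whether a taken pid falls back to
the kernel's choice or fails is an assumption, exposed here as the
`strict` parameter.

```c
#include <stdbool.h>

#define PID_MAX_MODEL 64 /* hypothetical, tiny pid space for illustration */

/* Userspace model, not the kernel's alloc_pidmap(): used[pid] marks pids in
 * use. desired == 0 lets the "kernel" pick the lowest free pid. A nonzero
 * desired pid is used if free; otherwise strict callers (restart) get -1 and
 * non-strict callers fall back to the lowest free pid. Returns -1 when no
 * pid can be allocated. */
static int alloc_pid_hint(bool used[PID_MAX_MODEL], int desired, bool strict)
{
    if (desired > 0 && desired < PID_MAX_MODEL && !used[desired]) {
        used[desired] = true;
        return desired;
    }
    if (desired != 0 && strict)
        return -1;
    for (int pid = 1; pid < PID_MAX_MODEL; pid++) {
        if (!used[pid]) {
            used[pid] = true;
            return pid;
        }
    }
    return -1;
}
```

With pid 1 already taken, a desired pid of 0 yields 2 (the plain clone
case), while a strict request for a pid that is already in use returns -1.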

> Note also that 'desiredpid' must be a list of pids (one for each pid
> namespaces that the child will belong to) and hence we need 'nr_pids'
> to specify the list. Given that we are limited to 6 parameters to the
> syscall, such parameters must be stuffed into 'struct clone_args'.
>
> So we should do something like:
>
> 	sys_clone3(u32 flags_low, pid_t pid, struct clone_args *carg,
> 		pid_t *desired_pids)
>
> or (to match the name and parameters, move 'pid' parameter into clone_args)
>   
Well, hiding multiple clones in a single clone call is... weird. AFAIR,
there was a debate about creating the process tree in the kernel versus
in userspace, but with this call it looks like it is done in the kernel.
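
The per-namespace pid list discussed above (nr_pids plus one pid per pid
namespace, alongside the clone arguments) could just as well travel
through a file descriptor, as in the sys_restart() variant mentioned
next. A minimal sketch of one such record, assuming a hypothetical
layout: struct restart_hdr, MAX_PID_NS_DEPTH and both helpers are
illustrative only and are not the patchset's struct clone_args.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_PID_NS_DEPTH 32 /* hypothetical bound on pid namespace nesting */

/* Hypothetical record layout behind the fd: a header followed by nr_pids
 * pid_t values, one per pid namespace the task belongs to. This is NOT the
 * struct clone_args from the patchset, just an illustration. */
struct restart_hdr {
    uint64_t clone_flags;  /* flags for the task to recreate */
    uint32_t nr_pids;      /* number of pids that follow */
    uint32_t reserved;     /* keeps the header 64-bit aligned */
};

/* Writes one record; returns 0 on success, -1 on error or short write. */
static int write_restart_record(int fd, uint64_t flags, uint32_t nr_pids,
                                const pid_t *pids)
{
    struct restart_hdr hdr = { .clone_flags = flags, .nr_pids = nr_pids };
    size_t len = nr_pids * sizeof(pid_t);

    if (nr_pids == 0 || nr_pids > MAX_PID_NS_DEPTH)
        return -1;
    if (write(fd, &hdr, sizeof(hdr)) != (ssize_t)sizeof(hdr))
        return -1;
    return write(fd, pids, len) == (ssize_t)len ? 0 : -1;
}

/* Reads one record; returns nr_pids on success, -1 on a bad count or a
 * short read (the sketch does not retry partial reads). */
static int read_restart_record(int fd, struct restart_hdr *hdr,
                               pid_t pids[MAX_PID_NS_DEPTH])
{
    if (read(fd, hdr, sizeof(*hdr)) != (ssize_t)sizeof(*hdr))
        return -1;
    if (hdr->nr_pids == 0 || hdr->nr_pids > MAX_PID_NS_DEPTH)
        return -1;
    size_t len = hdr->nr_pids * sizeof(pid_t);
    return read(fd, pids, len) == (ssize_t)len ? (int)hdr->nr_pids : -1;
}
```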

I don't really see a difference between that and sys_restart(pid_t pid,
int fd, long flags), where pid is the topmost process in the hierarchy,
fd is a file descriptor to a "pid_t * + struct clone_args *" structure,
and flags is "PROCTREE".

IMHO, it is nicer to recursively restore the process tree for the nested
pid namespaces; that would really be userspace process-tree creation,
and cloneat() will be your friend here :)
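
A minimal userspace sketch of recursive process-tree recreation, assuming
a hypothetical struct tree_node that describes the checkpointed
hierarchy. It uses plain fork(), so it cannot choose pids the way
cloneat() or clone3() would; restore_tree() and MAX_CHILDREN are
illustrative names. Each child rebuilds its own subtree and then exits
with the subtree size so the sketch can be checked; a real restart would
instead go on to restore that task's state.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16 /* hypothetical per-node bound for the sketch */

/* Hypothetical description of a checkpointed process tree. */
struct tree_node {
    int nr_children;
    const struct tree_node *children;
};

/* Recreates the subtree below n with plain fork() (no pid selection) and
 * returns the number of processes created, or -1 on error. */
static int restore_tree(const struct tree_node *n)
{
    pid_t pids[MAX_CHILDREN];
    int forked = 0, created = 0;
    int failed = n->nr_children > MAX_CHILDREN;

    /* Fork all children before reaping any of them. */
    for (int i = 0; !failed && i < n->nr_children; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
        } else if (pid == 0) {
            /* Child: rebuild its own subtree, then report its size via the
             * exit status (255 means error, so sizes above 254 are rejected). */
            int sub = restore_tree(&n->children[i]);
            _exit(sub < 0 || sub > 254 ? 255 : sub);
        } else {
            pids[forked++] = pid;
        }
    }
    /* Reap the children that were created and add up their subtrees. */
    for (int i = 0; i < forked; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) == 255)
            failed = 1;
        else
            created += 1 + WEXITSTATUS(status);
    }
    return failed ? -1 : created;
}
```

For a root with two children, one of which has a child of its own,
restore_tree() returns 3.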

>> That looks more consistent with the "<syscall>at" family, 'openat',
>> 'faccessat', 'readlinkat', etc ... and usable for something else than
>> the checkpoint / restart.
>>     
>
> The subtle difference though is that openat() does not open a file on
> behalf of another process, and so the 'at' suffix would not apply?
>   
Yes and no, depending on where you put the cursor. If you take the 'at'
suffix to mean a process context, then I agree with you: there is a
difference, because cloneat() would operate outside the current process
context. But if you take the 'at' suffix to mean a context in general,
where openat means "relative to a file descriptor" and cloneat means
"relative to a pid namespace", then the 'at' suffix may apply. But I
agree that we are so used to calling the POSIX "fork" that cloneat
sounds scary :)

Thanks
  -- Daniel
