From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org
Subject: Re: [RFC][PATCH 0/4] Object creation with a specified id
Date: Fri, 14 Mar 2008 12:45:31 -0400
Message-ID: <47DAABAB.7000706@cs.columbia.edu>
In-Reply-To: <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>



Nadia Derbey wrote:
> Oren Laadan wrote:
>>
>>
>> Nadia Derbey wrote:
>>
>>> Oren Laadan wrote:
>>>
>>>>
>>>>
>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>
>>>>> A couple of weeks ago, a discussion started after Pierre's 
>>>>> proposal for
>>>>> a new syscall to change an ipc id (see thread
>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>
>>>>>
>>>>> Oren's suggestion was to force an object's id during its creation, 
>>>>> rather
>>>>> than 1. create it, 2. change its id.
>>>>>
>>>>> So here is an implementation of what Oren has suggested.
>>>>>
>>>>> 2 new files are defined under /proc/self:
>>>>>   . next_ipcid --> next id to use for ipc object creation
>>>>>   . next_pids --> next upid nr(s) to use for next task to be forked
>>>>>                   (see patch #2 for more details).
>>>>
>>>>
>>>>
>>>> Generally looks good. One meta-comment, though:
>>>>
>>>> I wonder why you use separate files for separate resources, 
>>>
>>>
>>> That would be needed in a situation where we don't care about the 
>>> next, say, ipc id to be created but we need a predefined pid. But I 
>>> must admit I don't see any practical application for it.
>>
>>
>> exactly; why set the next-ipc value so far in advance ?  I think it's
>> better (and less confusing) if we require that setting the next-id value
>> be done right before the respective syscall.
> 
> Ok, but this "requirement" should be widely agreed upon ;-)

A discussion of the overall checkpoint/restart policy is certainly due
(as has been noted increasingly often of late).

> What I mean here is that the solution with 1 file per "object type" can 
> easily be extended imho:

I'm aiming for simplicity and a minimal (but not restrictive) API for
user space. I argue that we never really need more than one predetermined
value at a time (e.g. see below), and the cost of setting such a value is
so small that there is no real benefit in setting more than one at a time
(either via multiple files or via an array of values). If in fact you
wanted more than one type at a time, you could still make it happen with
a single file, without adding many user-visible files in /proc/<pid>.

So far, I can't think of any identifier that we'd like to pre-set that
does not fit into a "long" type, simply because the kernel does not use
such identifiers in the first place (pid, ipc, pty#, vc#, etc.). To be on
the safe side, we can require that the format be "long VAL", just in
case (and later you could have other formats).
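
To make the "long VAL" idea concrete, here is a minimal user-space sketch
of how such a line could be validated. The function name parse_next_id()
and its error convention are assumptions made for illustration only; they
are not taken from the posted patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for a "long VAL" line written to a next_id file.
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 * The name and the return convention are illustrative assumptions.
 */
int parse_next_id(const char *buf, long *out)
{
	static const char prefix[] = "long ";
	char *end;
	long val;

	assert(buf != NULL && out != NULL);

	/* The type tag must come first; other tags could be added later. */
	if (strncmp(buf, prefix, sizeof(prefix) - 1) != 0)
		return -EINVAL;
	buf += sizeof(prefix) - 1;

	/* Reject empty values and values that overflow a long. */
	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;

	/* Allow one trailing newline (as "echo" would add), nothing else. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```

A later format (say, a "tcp ..." type) would then be one more prefix
check in the same function rather than one more file.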

The only exception, perhaps, is if a TCP connection is rebuilt with,
say, a connect() syscall, and some information needs to be "predetermined",
so we'll need to extend the format. That can be done with another type,
e.g. "tcp .....", or a separate file (per your view), _then_, not now.
(As a side note, I don't suggest that this is how TCP will be restored).

In any event, the bottom line is that a single file, with a single
value at a time (possibly annotated with a type), is the simplest, and
isn't restrictive, for our purposes. Looking one step ahead, simplicity
and minimal commitment to user space are important in trying to push this
to the mainline kernel...

> I don't know how the restart is supposed to work, but we can imagine 
> feeding all these files with all the object ids just before restart and 

Building on my own experience with zap, I envision the restart operation
of a given task occurring in the context of that task. (I assume this is
how restart will work). Therefore, it makes sense that before every
syscall that requires a pre-determined resource identifier (e.g. clone,
ipc, pty allocation), the task will place the desired value in "next_id"
(which will only be meaningful during restart) and then invoke said
syscall. Voila.
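
As a rough user-space sketch of that sequence (write the value, then
immediately make the call), assuming the single "next_id" file and the
"long VAL" format discussed above. The file does not exist in any
mainline kernel, and the helper names set_next_id() and restart_fork()
are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write "long VAL" to the given next_id file. The path is a parameter
 * because the file name is only a proposal (e.g. /proc/self/next_id).
 * Returns 0 on success or a negative errno value.
 */
int set_next_id(const char *path, long val)
{
	char buf[32];
	ssize_t n;
	int len, fd, err;

	assert(path != NULL);

	len = snprintf(buf, sizeof(buf), "long %ld", val);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -EINVAL;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, (size_t)len);
	if (n == len)
		err = 0;
	else if (n < 0)
		err = -errno;
	else
		err = -EIO;	/* short write: treat as an error */

	close(fd);
	return err;
}

/*
 * Hypothetical restart step: request a pid, then fork right away so
 * that nothing else can consume the pending value in between.
 */
pid_t restart_fork(const char *path, long pid)
{
	int err = set_next_id(path, pid);

	if (err < 0) {
		errno = -err;
		return -1;
	}
	return fork();
}
```

The same two-step pattern would apply to the ipc case: set the value,
then call msgget()/semget()/shmget() instead of fork().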

Note that the restart will "rebuild" the container's state (and the task
state) as it reads in the data from some source. It is likely that not
all data will be available when the first said syscall is about to be
invoked, so you may not be able to feed everything ahead of time.


> let the process pick up the object ids as it needs them.
> Of course, this would require enhancing the file formats, as well as 
> the way things are stored in the task_struct.
> 
> Hope what I'm saying is not too stupid ;-) ?
> 
> Regards,
> Nadia
> 
>>
>>>
>>>> and why you'd
>>>> want to write multiple identifiers in one go;
>>>
>>>
>>> I used multiple identifiers only for the pid values: this is because 
>>> when a new pid value is allocated for a process that belongs to 
>>> nested namespaces, the lower level upid nr values are allocated in a 
>>> single shot. (see alloc_pid()).
>>>
>>>> it seems to complicate the
>>>> code and interface with minimal gain.
>>>> In practice, a process will only do either one or the other, so a 
>>>> single
>>>> file is enough (e.g. "next_id").
>>>> Also, writing a single value at a time followed by the syscall is 
>>>> enough;
>>>> it's definitely not a performance issue to have multiple calls.
>>>> We assume the user/caller knows what she's doing, so no need to 
>>>> classify
>>>> the identifier (that is, tell the kernel it's a pid, or an ipc id) 
>>>> ahead
>>>> of time. The caller simply writes a value and then calls the relevant
>>>> syscall, or otherwise the results may not be what she expected...
>>>> If such context is expected to be required (although I don't see any at
>>>> the moment),  we can require that the user write "TYPE VALUE" pair to
>>>> the "next_id" file.
>>>
>>>
>>> That's exactly what I wanted to avoid by creating 1 file per object.
>>> Now, it's true that in a restart context where I guess that things 
>>> will be done synchronously, we could have a single next_id file.
>>>
>>>>
>>>>>
>>>>> When one of these files (or both of them) is filled, a structure 
>>>>> pointed to
>>>>> by the calling task struct is filled with these ids.
>>>>>
>>>>> Then, when the object is created, the id(s) present in that 
>>>>> structure are
>>>>> used, instead of the default ones.
>>>>>
>>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>>
>>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the 
>>>>> new IPC
>>>>>             object (changes the ipc_addid() path).
>>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) 
>>>>> for a newly
>>>>>             allocated process (changes the 
>>>>> alloc_pid()/alloc_pidmap() paths).
>>>>>
>>>>> Any comment and/or suggestions are welcome.
>>>>>
>>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>>
>>>>> Regards,
>>>>> Nadia
>>>>>
>>>>> -- 
>>>>>
>>>>> -- 
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> Regards,
>>> Nadia
>>
>>
>>
> 
> 
