* [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-03-10 13:50 Nadia.Derbey-6ktuUTfB/bM
[not found] ` <20080310135054.312992000-6ktuUTfB/bM@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-03-10 13:50 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: xemul-GEFAQzZX7r8dnm+yROfE0A
A couple of weeks ago, a discussion has started after Pierre's proposal for
a new syscall to change an ipc id (see thread
http://lkml.org/lkml/2008/1/29/209).
Oren's suggestion was to force an object's id during its creation, rather
than 1. create it, 2. change its id.
So here is an implementation of what Oren has suggested.
2 new files are defined under /proc/self:
. next_ipcid --> next id to use for ipc object creation
. next_pids --> next upid nr(s) to use for next task to be forked
(see patch #2 for more details).
When one of these files (or both of them) is filled, a structure pointed to
by the calling task struct is filled with these ids.
Then, when the object is created, the id(s) present in that structure are
used, instead of the default ones.
The patches are against 2.6.25-rc3-mm1, in the following order:
[PATCH 1/4] adds the procfs facility for next ipc to be created.
[PATCH 2/4] adds the procfs facility for next task to be forked.
[PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
object (changes the ipc_addid() path).
[PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
allocated process (changes the alloc_pid()/alloc_pidmap() paths).
Any comment and/or suggestions are welcome.
Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
Regards,
Nadia
--
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080310135054.312992000-6ktuUTfB/bM@public.gmane.org>
@ 2008-03-13 23:16 ` Oren Laadan
[not found] ` <47D9B5B7.6060803-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-13 23:16 UTC (permalink / raw)
To: Nadia.Derbey-6ktuUTfB/bM
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
> A couple of weeks ago, a discussion has started after Pierre's proposal for
> a new syscall to change an ipc id (see thread
> http://lkml.org/lkml/2008/1/29/209).
>
>
> Oren's suggestion was to force an object's id during its creation, rather
> than 1. create it, 2. change its id.
>
> So here is an implementation of what Oren has suggested.
>
> 2 new files are defined under /proc/self:
> . next_ipcid --> next id to use for ipc object creation
> . next_pids --> next upid nr(s) to use for next task to be forked
> (see patch #2 for more details).
Generally looks good. One meta-comment, though:
I wonder why you use separate files for separate resources, and why you'd
want to write multiple identifiers in one go; it seems to complicate the
code and interface with minimal gain.
In practice, a process will only do either one or the other, so a single
file is enough (e.g. "next_id").
Also, writing a single value at a time followed by the syscall is enough;
it's definitely not a performance issue to have multiple calls.
We assume the user/caller knows what she's doing, so no need to classify
the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
of time. The caller simply writes a value and then calls the relevant
syscall, or otherwise the results may not be what she expected...
If such context is expected to be required (although I don't see any at
the moment), we can require that the user write "TYPE VALUE" pair to
the "next_id" file.
>
> When one of these files (or both of them) is filled, a structure pointed to
> by the calling task struct is filled with these ids.
>
> Then, when the object is created, the id(s) present in that structure are
> used, instead of the default ones.
>
> The patches are against 2.6.25-rc3-mm1, in the following order:
>
> [PATCH 1/4] adds the procfs facility for next ipc to be created.
> [PATCH 2/4] adds the procfs facility for next task to be forked.
> [PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
> object (changes the ipc_addid() path).
> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
> allocated process (changes the alloc_pid()/alloc_pidmap() paths).
>
> Any comment and/or suggestions are welcome.
>
> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>
> Regards,
> Nadia
>
> --
>
> --
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47D9B5B7.6060803-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-14 6:21 ` Nadia Derbey
[not found] ` <47DA195B.8070704-6ktuUTfB/bM@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Nadia Derbey @ 2008-03-14 6:21 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Oren Laadan wrote:
>
>
> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>
>> A couple of weeks ago, a discussion has started after Pierre's
>> proposal for
>> a new syscall to change an ipc id (see thread
>> http://lkml.org/lkml/2008/1/29/209).
>>
>>
>> Oren's suggestion was to force an object's id during its creation, rather
>> than 1. create it, 2. change its id.
>>
>> So here is an implementation of what Oren has suggested.
>>
>> 2 new files are defined under /proc/self:
>> . next_ipcid --> next id to use for ipc object creation
>> . next_pids --> next upid nr(s) to use for next task to be forked
>> (see patch #2 for more details).
>
>
> Generally looks good. One meta-comment, though:
>
> I wonder why you use separate files for separate resources,
That would be needed in a situation wheere we don't care about next,
say, ipc id to be created but we need a predefined pid. But I must admit
I don't see any pratical application to it.
> and why you'd
> want to write multiple identifiers in one go;
I used multiple identifiers only for the pid values: this is because
when a new pid value is allocated for a process that belongs to nested
namespaces, the lower level upid nr values are allocated in a single
shot. (see alloc_pid()).
> it seems to complicate the
> code and interface with minimal gain.
> In practice, a process will only do either one or the other, so a single
> file is enough (e.g. "next_id").
> Also, writing a single value at a time followed by the syscall is enough;
> it's definitely not a performance issue to have multiple calls.
> We assume the user/caller knows what she's doing, so no need to classify
> the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
> of time. The caller simply writes a value and then calls the relevant
> syscall, or otherwise the results may not be what she expected...
> If such context is expected to be required (although I don't see any at
> the moment), we can require that the user write "TYPE VALUE" pair to
> the "next_id" file.
That's exactly what I wanted to avoid by creating 1 file per object.
Now, it's true that in a restart context where I guess that things will
be done synchronously, we could have a single next_id file.
>
>>
>> When one of these files (or both of them) is filled, a structure
>> pointed to
>> by the calling task struct is filled with these ids.
>>
>> Then, when the object is created, the id(s) present in that structure are
>> used, instead of the default ones.
>>
>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>
>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>> [PATCH 3/4] makes use of the specified id (if any) to allocate the new
>> IPC
>> object (changes the ipc_addid() path).
>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>> for a newly
>> allocated process (changes the alloc_pid()/alloc_pidmap()
>> paths).
>>
>> Any comment and/or suggestions are welcome.
>>
>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>
>> Regards,
>> Nadia
>>
>> --
>>
>> --
>
>
>
Regards,
Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DA195B.8070704-6ktuUTfB/bM@public.gmane.org>
@ 2008-03-14 15:50 ` Oren Laadan
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-14 15:50 UTC (permalink / raw)
To: Nadia Derbey
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Nadia Derbey wrote:
> Oren Laadan wrote:
>>
>>
>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>
>>> A couple of weeks ago, a discussion has started after Pierre's
>>> proposal for
>>> a new syscall to change an ipc id (see thread
>>> http://lkml.org/lkml/2008/1/29/209).
>>>
>>>
>>> Oren's suggestion was to force an object's id during its creation,
>>> rather
>>> than 1. create it, 2. change its id.
>>>
>>> So here is an implementation of what Oren has suggested.
>>>
>>> 2 new files are defined under /proc/self:
>>> . next_ipcid --> next id to use for ipc object creation
>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>> (see patch #2 for more details).
>>
>>
>> Generally looks good. One meta-comment, though:
>>
>> I wonder why you use separate files for separate resources,
>
> That would be needed in a situation wheere we don't care about next,
> say, ipc id to be created but we need a predefined pid. But I must admit
> I don't see any pratical application to it.
exactly; why set the next-ipc value so far in advance ? I think it's
better (and less confusing) if we require that setting the next-id value
be done right before the respective syscall.
>
>> and why you'd
>> want to write multiple identifiers in one go;
>
> I used multiple identifiers only for the pid values: this is because
> when a new pid value is allocated for a process that belongs to nested
> namespaces, the lower level upid nr values are allocated in a single
> shot. (see alloc_pid()).
>
>> it seems to complicate the
>> code and interface with minimal gain.
>> In practice, a process will only do either one or the other, so a single
>> file is enough (e.g. "next_id").
>> Also, writing a single value at a time followed by the syscall is enough;
>> it's definitely not a performance issue to have multiple calls.
>> We assume the user/caller knows what she's doing, so no need to classify
>> the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
>> of time. The caller simply writes a value and then calls the relevant
>> syscall, or otherwise the results may not be what she expected...
>> If such context is expected to be required (although I don't see any at
>> the moment), we can require that the user write "TYPE VALUE" pair to
>> the "next_id" file.
>
> That's exactly what I wanted to avoid by creating 1 file per object.
> Now, it's true that in a restart context where I guess that things will
> be done synchronously, we could have a single next_id file.
>
>>
>>>
>>> When one of these files (or both of them) is filled, a structure
>>> pointed to
>>> by the calling task struct is filled with these ids.
>>>
>>> Then, when the object is created, the id(s) present in that structure
>>> are
>>> used, instead of the default ones.
>>>
>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>
>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
>>> new IPC
>>> object (changes the ipc_addid() path).
>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>>> for a newly
>>> allocated process (changes the alloc_pid()/alloc_pidmap()
>>> paths).
>>>
>>> Any comment and/or suggestions are welcome.
>>>
>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>
>>> Regards,
>>> Nadia
>>>
>>> --
>>>
>>> --
>>
>>
>>
>
>
> Regards,
> Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-14 15:56 ` Pavel Emelyanov
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-03-14 16:11 ` Nadia Derbey
1 sibling, 1 reply; 31+ messages in thread
From: Pavel Emelyanov @ 2008-03-14 15:56 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Oren Laadan wrote:
>
> Nadia Derbey wrote:
>> Oren Laadan wrote:
>>>
>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>
>>>> A couple of weeks ago, a discussion has started after Pierre's
>>>> proposal for
>>>> a new syscall to change an ipc id (see thread
>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>
>>>>
>>>> Oren's suggestion was to force an object's id during its creation,
>>>> rather
>>>> than 1. create it, 2. change its id.
>>>>
>>>> So here is an implementation of what Oren has suggested.
>>>>
>>>> 2 new files are defined under /proc/self:
>>>> . next_ipcid --> next id to use for ipc object creation
>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>> (see patch #2 for more details).
>>>
>>> Generally looks good. One meta-comment, though:
>>>
>>> I wonder why you use separate files for separate resources,
>> That would be needed in a situation wheere we don't care about next,
>> say, ipc id to be created but we need a predefined pid. But I must admit
>> I don't see any pratical application to it.
>
> exactly; why set the next-ipc value so far in advance ? I think it's
> better (and less confusing) if we require that setting the next-id value
> be done right before the respective syscall.
And race with some other syscall caller? This will only work if the next-ipc-id
and the next-pid are on a task_struct. Are they (at least supposed to be such)?
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2008-03-14 16:02 ` Oren Laadan
[not found] ` <47DAA1A6.6010509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 16:11 ` Nadia Derbey
1 sibling, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-14 16:02 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Pavel Emelyanov wrote:
> Oren Laadan wrote:
>> Nadia Derbey wrote:
>>> Oren Laadan wrote:
>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>
>>>>> A couple of weeks ago, a discussion has started after Pierre's
>>>>> proposal for
>>>>> a new syscall to change an ipc id (see thread
>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>
>>>>>
>>>>> Oren's suggestion was to force an object's id during its creation,
>>>>> rather
>>>>> than 1. create it, 2. change its id.
>>>>>
>>>>> So here is an implementation of what Oren has suggested.
>>>>>
>>>>> 2 new files are defined under /proc/self:
>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>> (see patch #2 for more details).
>>>> Generally looks good. One meta-comment, though:
>>>>
>>>> I wonder why you use separate files for separate resources,
>>> That would be needed in a situation wheere we don't care about next,
>>> say, ipc id to be created but we need a predefined pid. But I must admit
>>> I don't see any pratical application to it.
>> exactly; why set the next-ipc value so far in advance ? I think it's
>> better (and less confusing) if we require that setting the next-id value
>> be done right before the respective syscall.
>
> And race with some other syscall caller? This will only work if the next-ipc-id
> and the next-pid are on a task_struct. Are they (at least supposed to be such)?
yes. that's the first detail I looked for in the patch :)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA1A6.6010509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-14 16:08 ` Pavel Emelyanov
0 siblings, 0 replies; 31+ messages in thread
From: Pavel Emelyanov @ 2008-03-14 16:08 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Oren Laadan wrote:
>
> Pavel Emelyanov wrote:
>> Oren Laadan wrote:
>>> Nadia Derbey wrote:
>>>> Oren Laadan wrote:
>>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>>
>>>>>> A couple of weeks ago, a discussion has started after Pierre's
>>>>>> proposal for
>>>>>> a new syscall to change an ipc id (see thread
>>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>>
>>>>>>
>>>>>> Oren's suggestion was to force an object's id during its creation,
>>>>>> rather
>>>>>> than 1. create it, 2. change its id.
>>>>>>
>>>>>> So here is an implementation of what Oren has suggested.
>>>>>>
>>>>>> 2 new files are defined under /proc/self:
>>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>>> (see patch #2 for more details).
>>>>> Generally looks good. One meta-comment, though:
>>>>>
>>>>> I wonder why you use separate files for separate resources,
>>>> That would be needed in a situation wheere we don't care about next,
>>>> say, ipc id to be created but we need a predefined pid. But I must admit
>>>> I don't see any pratical application to it.
>>> exactly; why set the next-ipc value so far in advance ? I think it's
>>> better (and less confusing) if we require that setting the next-id value
>>> be done right before the respective syscall.
>> And race with some other syscall caller? This will only work if the next-ipc-id
>> and the next-pid are on a task_struct. Are they (at least supposed to be such)?
>
> yes. that's the first detail I looked for in the patch :)
OK :) I just remembered some talks about using last_pid for pid allocations
and just wanted to be sure.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 15:56 ` Pavel Emelyanov
@ 2008-03-14 16:11 ` Nadia Derbey
[not found] ` <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>
1 sibling, 1 reply; 31+ messages in thread
From: Nadia Derbey @ 2008-03-14 16:11 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Oren Laadan wrote:
>
>
> Nadia Derbey wrote:
>
>> Oren Laadan wrote:
>>
>>>
>>>
>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>
>>>> A couple of weeks ago, a discussion has started after Pierre's
>>>> proposal for
>>>> a new syscall to change an ipc id (see thread
>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>
>>>>
>>>> Oren's suggestion was to force an object's id during its creation,
>>>> rather
>>>> than 1. create it, 2. change its id.
>>>>
>>>> So here is an implementation of what Oren has suggested.
>>>>
>>>> 2 new files are defined under /proc/self:
>>>> . next_ipcid --> next id to use for ipc object creation
>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>> (see patch #2 for more details).
>>>
>>>
>>>
>>> Generally looks good. One meta-comment, though:
>>>
>>> I wonder why you use separate files for separate resources,
>>
>>
>> That would be needed in a situation wheere we don't care about next,
>> say, ipc id to be created but we need a predefined pid. But I must
>> admit I don't see any pratical application to it.
>
>
> exactly; why set the next-ipc value so far in advance ? I think it's
> better (and less confusing) if we require that setting the next-id value
> be done right before the respective syscall.
Ok, but this "requirement" should be widely agreed upon ;-)
What I mean here is that the solution with 1 file per "object type" can
easily be extended imho:
I don't know how the restart is supposed to work, but we can imagine
feeding all these files with all the object ids just before restart and
let the process pick up the objects ids as it needs them.
Of course, this would require to enhance the files formats, as well as
the way things are stored in the task_struct.
Hope what I'm saying is not too stupid ;-) ?
Regards,
Nadia
>
>>
>>> and why you'd
>>> want to write multiple identifiers in one go;
>>
>>
>> I used multiple identifiers only for the pid values: this is because
>> when a new pid value is allocated for a process that belongs to nested
>> namespaces, the lower level upid nr values are allocated in a single
>> shot. (see alloc_pid()).
>>
>>> it seems to complicate the
>>> code and interface with minimal gain.
>>> In practice, a process will only do either one or the other, so a single
>>> file is enough (e.g. "next_id").
>>> Also, writing a single value at a time followed by the syscall is
>>> enough;
>>> it's definitely not a performance issue to have multiple calls.
>>> We assume the user/caller knows what she's doing, so no need to classify
>>> the identifier (that is, tell the kernel it's a pid, or an ipc id) ahead
>>> of time. The caller simply writes a value and then calls the relevant
>>> syscall, or otherwise the results may not be what she expected...
>>> If such context is expected to be required (although I don't see any at
>>> the moment), we can require that the user write "TYPE VALUE" pair to
>>> the "next_id" file.
>>
>>
>> That's exactly what I wanted to avoid by creating 1 file per object.
>> Now, it's true that in a restart context where I guess that things
>> will be done synchronously, we could have a single next_id file.
>>
>>>
>>>>
>>>> When one of these files (or both of them) is filled, a structure
>>>> pointed to
>>>> by the calling task struct is filled with these ids.
>>>>
>>>> Then, when the object is created, the id(s) present in that
>>>> structure are
>>>> used, instead of the default ones.
>>>>
>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>
>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
>>>> new IPC
>>>> object (changes the ipc_addid() path).
>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>>>> for a newly
>>>> allocated process (changes the
>>>> alloc_pid()/alloc_pidmap() paths).
>>>>
>>>> Any comment and/or suggestions are welcome.
>>>>
>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>
>>>> Regards,
>>>> Nadia
>>>>
>>>> --
>>>>
>>>> --
>>>
>>>
>>>
>>>
>>
>>
>> Regards,
>> Nadia
>
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-03-14 16:02 ` Oren Laadan
@ 2008-03-14 16:11 ` Nadia Derbey
1 sibling, 0 replies; 31+ messages in thread
From: Nadia Derbey @ 2008-03-14 16:11 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Pavel Emelyanov wrote:
> Oren Laadan wrote:
>
>>Nadia Derbey wrote:
>>
>>>Oren Laadan wrote:
>>>
>>>>Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>
>>>>
>>>>>A couple of weeks ago, a discussion has started after Pierre's
>>>>>proposal for
>>>>>a new syscall to change an ipc id (see thread
>>>>>http://lkml.org/lkml/2008/1/29/209).
>>>>>
>>>>>
>>>>>Oren's suggestion was to force an object's id during its creation,
>>>>>rather
>>>>>than 1. create it, 2. change its id.
>>>>>
>>>>>So here is an implementation of what Oren has suggested.
>>>>>
>>>>>2 new files are defined under /proc/self:
>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>> (see patch #2 for more details).
>>>>
>>>>Generally looks good. One meta-comment, though:
>>>>
>>>>I wonder why you use separate files for separate resources,
>>>
>>>That would be needed in a situation wheere we don't care about next,
>>>say, ipc id to be created but we need a predefined pid. But I must admit
>>>I don't see any pratical application to it.
>>
>>exactly; why set the next-ipc value so far in advance ? I think it's
>>better (and less confusing) if we require that setting the next-id value
>>be done right before the respective syscall.
>
>
> And race with some other syscall caller? This will only work if the next-ipc-id
> and the next-pid are on a task_struct. Are they (at least supposed to be such)?
>
>
Yes they are.
Regards,
Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>
@ 2008-03-14 16:45 ` Oren Laadan
[not found] ` <47DAABAB.7000706-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-14 16:45 UTC (permalink / raw)
To: Nadia Derbey
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Nadia Derbey wrote:
> Oren Laadan wrote:
>>
>>
>> Nadia Derbey wrote:
>>
>>> Oren Laadan wrote:
>>>
>>>>
>>>>
>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>
>>>>> A couple of weeks ago, a discussion has started after Pierre's
>>>>> proposal for
>>>>> a new syscall to change an ipc id (see thread
>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>
>>>>>
>>>>> Oren's suggestion was to force an object's id during its creation,
>>>>> rather
>>>>> than 1. create it, 2. change its id.
>>>>>
>>>>> So here is an implementation of what Oren has suggested.
>>>>>
>>>>> 2 new files are defined under /proc/self:
>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>> (see patch #2 for more details).
>>>>
>>>>
>>>>
>>>> Generally looks good. One meta-comment, though:
>>>>
>>>> I wonder why you use separate files for separate resources,
>>>
>>>
>>> That would be needed in a situation wheere we don't care about next,
>>> say, ipc id to be created but we need a predefined pid. But I must
>>> admit I don't see any pratical application to it.
>>
>>
>> exactly; why set the next-ipc value so far in advance ? I think it's
>> better (and less confusing) if we require that setting the next-id value
>> be done right before the respective syscall.
>
> Ok, but this "requirement" should be widely agreed upon ;-)
A discussion on the overall checkpoint/restart policy is certainly due
(and increasingly noted recently).
> What I mean here is that the solution with 1 file per "object type" can
> easily be extended imho:
I'm aiming at simplicity and minimal (but not restrictive) API for user
space. I argue that we never really need more than one predetermined value
at a time (eg see below), and the cost of setting such value is so small
that there is no real benefit in setting more than one at a time (either
via multiple files or via an array of values). If in fact you wanted more
than one type at a time, you could still make it happen with a single
file without adding many user-visible files in /proc/<pid>.
So far, I can't think of any such identifier that we'd like to pre-set
that does not fit into a "long" type; simply because the kernel does not
use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To
be on the safe side, we can require that the format be "long VAL", just
in case (and later you could have other formats).
The only exception, perhaps, is if a TCP connection is rebuilt with a,
say, connect() syscall, and some information needs to be "predetermined"
so we'll need to extend the format. That can be done with another type
eg. "tcp ....." or a separate file (per your view), _then_, not now.
(As a side note, I don't suggest that this is how TCP will be restored).
In any event, the bottom line is that a single file, with a single
value at a time (possibly annotated with a type), is the simplest, and
isn't restrictive, for our purposes. Looking one step ahead, simplicity
and minimal commitment to user space is important in trying to push this
to the mainline kernel...
> I don't know how the restart is supposed to work, but we can imagine
> feeding all these files with all the object ids just before restart and
Building on my own experience with zap I envision the restart operation
of a given task occurring in the context of that task. (I assume this is
how restart will work). Therefore, it makes much sense that before every
syscall that requires a pre-determined resource identifier (eg. clone,
ipc, pty allocation), the task will place the desired value in "next_id"
(and that will only be meaningful during restart) and invoke the said
syscall. Voila.
Note that the restart will "rebuild" the container's state (and the task
state) as it reads in the data from some source. It is likely that not
all data will be available when the first said syscall is about to be
invoked, so you may not be able to feed everything ahead of time.
> let the process pick up the objects ids as it needs them.
> Of course, this would require to enhance the files formats, as well as
> the way things are stored in the task_struct.
>
> Hope what I'm saying is not too stupid ;-) ?
>
> Regards,
> Nadia
>
>>
>>>
>>>> and why you'd
>>>> want to write multiple identifiers in one go;
>>>
>>>
>>> I used multiple identifiers only for the pid values: this is because
>>> when a new pid value is allocated for a process that belongs to
>>> nested namespaces, the lower level upid nr values are allocated in a
>>> single shot. (see alloc_pid()).
>>>
>>>> it seems to complicate the
>>>> code and interface with minimal gain.
>>>> In practice, a process will only do either one or the other, so a
>>>> single
>>>> file is enough (e.g. "next_id").
>>>> Also, writing a single value at a time followed by the syscall is
>>>> enough;
>>>> it's definitely not a performance issue to have multiple calls.
>>>> We assume the user/caller knows what she's doing, so no need to
>>>> classify
>>>> the identifier (that is, tell the kernel it's a pid, or an ipc id)
>>>> ahead
>>>> of time. The caller simply writes a value and then calls the relevant
>>>> syscall, or otherwise the results may not be what she expected...
>>>> If such context is expected to be required (although I don't see any at
>>>> the moment), we can require that the user write "TYPE VALUE" pair to
>>>> the "next_id" file.
>>>
>>>
>>> That's exactly what I wanted to avoid by creating 1 file per object.
>>> Now, it's true that in a restart context where I guess that things
>>> will be done synchronously, we could have a single next_id file.
>>>
>>>>
>>>>>
>>>>> When one of these files (or both of them) is filled, a structure
>>>>> pointed to
>>>>> by the calling task struct is filled with these ids.
>>>>>
>>>>> Then, when the object is created, the id(s) present in that
>>>>> structure are
>>>>> used, instead of the default ones.
>>>>>
>>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>>
>>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
>>>>> new IPC
>>>>> object (changes the ipc_addid() path).
>>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>>>>> for a newly
>>>>> allocated process (changes the
>>>>> alloc_pid()/alloc_pidmap() paths).
>>>>>
>>>>> Any comment and/or suggestions are welcome.
>>>>>
>>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>>
>>>>> Regards,
>>>>> Nadia
>>>>>
>>>>> --
>>>>>
>>>>> --
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> Regards,
>>> Nadia
>>
>>
>>
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DAABAB.7000706-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-16 3:43 ` Serge E. Hallyn
[not found] ` <20080316034320.GA19793-6s5zFf/epYLPQpwDFJZrxFMas7LaWZ9n@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2008-03-16 3:43 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
>
> Nadia Derbey wrote:
> > Oren Laadan wrote:
> >>
> >>
> >> Nadia Derbey wrote:
> >>
> >>> Oren Laadan wrote:
> >>>
> >>>>
> >>>>
> >>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
> >>>>
> >>>>> A couple of weeks ago, a discussion has started after Pierre's
> >>>>> proposal for
> >>>>> a new syscall to change an ipc id (see thread
> >>>>> http://lkml.org/lkml/2008/1/29/209).
> >>>>>
> >>>>>
> >>>>> Oren's suggestion was to force an object's id during its creation,
> >>>>> rather
> >>>>> than 1. create it, 2. change its id.
> >>>>>
> >>>>> So here is an implementation of what Oren has suggested.
> >>>>>
> >>>>> 2 new files are defined under /proc/self:
> >>>>> . next_ipcid --> next id to use for ipc object creation
> >>>>> . next_pids --> next upid nr(s) to use for next task to be forked
> >>>>> (see patch #2 for more details).
> >>>>
> >>>>
> >>>>
> >>>> Generally looks good. One meta-comment, though:
> >>>>
> >>>> I wonder why you use separate files for separate resources,
> >>>
> >>>
> >>> That would be needed in a situation wheere we don't care about next,
> >>> say, ipc id to be created but we need a predefined pid. But I must
> >>> admit I don't see any pratical application to it.
> >>
> >>
> >> exactly; why set the next-ipc value so far in advance ? I think it's
> >> better (and less confusing) if we require that setting the next-id value
> >> be done right before the respective syscall.
> >
> > Ok, but this "requirement" should be widely agreed upon ;-)
>
> A discussion on the overall checkpoint/restart policy is certainly due
> (and increasingly noted recently).
>
> > What I mean here is that the solution with 1 file per "object type" can
> > easily be extended imho:
>
> I'm aiming at simplicity and minimal (but not restrictive) API for user
> space. I argue that we never really need more than one predetermined value
> at a time (eg see below), and the cost of setting such value is so small
> that there is no real benefit in setting more than one at a time (either
> via multiple files or via an array of values). If in fact you wanted more
> than one type at a time, you could still make it happen with a single
> file without adding many user-visible files in /proc/<pid>.
>
> So far, I can't think of any such identifier that we'd like to pre-set
> that does not fit into a "long" type;
As Nadia has mentioned, if we have checkpointed a container which has
another pid namespace underneath itself, then we will need to restart
some tasks with two predetermined pids. So we'll need two (or more)
longs for the tasks in deeper namespaces.
> simply because the kernel does not
> use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To
> be on the safe side, we can require that the format be "long VAL", just
> in case (and later you could have other formats).
>
> The only exception, perhaps, is if a TCP connection is rebuilt with a,
> say, connect() syscall, and some information needs to be "predetermined"
> so we'll need to extend the format. That can be done with another type
> eg. "tcp ....." or a separate file (per your view), _then_, not now.
> (As a side note, I don't suggest that this is how TCP will be restored).
>
> In any event, the bottom line is that a single file, with a single
> value at a time (possibly annotated with a type), is the simplest, and
> isn't restrictive, for our purposes. Looking one step ahead, simplicity
> and minimal commitment to user space is important in trying to push this
> to the mainline kernel...
>
> > I don't know how the restart is supposed to work, but we can imagine
> > feeding all these files with all the object ids just before restart and
>
> Building on my own experience with zap I envision the restart operation
> of a given task occurring in the context of that task.
Could be, but not necessarily the case. Eric has mentioned using elf
files for restart, and that's one way to go, but whether one central
restart task sets up all the children or the children set themselves up
is yet another design point we haven't decided. I would think that
with a centralized restart it would be easier to assure for instance
that shared anon pages would be properly set up and shared, but since
you advocate each-task-starts-itself I trust zap must handle that.
> (I assume this is
> how restart will work). Therefore, it makes much sense that before every
> syscall that requires a pre-determined resource identifier (eg. clone,
> ipc, pty allocation), the task will place the desired value in "next_id"
> (and that will only be meaningful during restart) and invoke the said
> syscall. Voila.
>
> Note that the restart will "rebuild" the container's state (and the task
> state) as it reads in the data from some source. It is likely that not
> all data will be available when the first said syscall is about to be
> invoked, so you may not be able to feed everything ahead of time.
>
>
> > let the process pick up the objects ids as it needs them.
> > Of course, this would require to enhance the files formats, as well as
> > the way things are stored in the task_struct.
> >
> > Hope what I'm saying is not too stupid ;-) ?
> >
> > Regards,
> > Nadia
> >
> >>
> >>>
> >>>> and why you'd
> >>>> want to write multiple identifiers in one go;
> >>>
> >>>
> >>> I used multiple identifiers only for the pid values: this is because
> >>> when a new pid value is allocated for a process that belongs to
> >>> nested namespaces, the lower level upid nr values are allocated in a
> >>> single shot. (see alloc_pid()).
> >>>
> >>>> it seems to complicate the
> >>>> code and interface with minimal gain.
> >>>> In practice, a process will only do either one or the other, so a
> >>>> single
> >>>> file is enough (e.g. "next_id").
> >>>> Also, writing a single value at a time followed by the syscall is
> >>>> enough;
> >>>> it's definitely not a performance issue to have multiple calls.
> >>>> We assume the user/caller knows what she's doing, so no need to
> >>>> classify
> >>>> the identifier (that is, tell the kernel it's a pid, or an ipc id)
> >>>> ahead
> >>>> of time. The caller simply writes a value and then calls the relevant
> >>>> syscall, or otherwise the results may not be what she expected...
> >>>> If such context is expected to be required (although I don't see any at
> >>>> the moment), we can require that the user write "TYPE VALUE" pair to
> >>>> the "next_id" file.
> >>>
> >>>
> >>> That's exactly what I wanted to avoid by creating 1 file per object.
> >>> Now, it's true that in a restart context where I guess that things
> >>> will be done synchronously, we could have a single next_id file.
> >>>
> >>>>
> >>>>>
> >>>>> When one of these files (or both of them) is filled, a structure
> >>>>> pointed to
> >>>>> by the calling task struct is filled with these ids.
> >>>>>
> >>>>> Then, when the object is created, the id(s) present in that
> >>>>> structure are
> >>>>> used, instead of the default ones.
> >>>>>
> >>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
> >>>>>
> >>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
> >>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
> >>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
> >>>>> new IPC
> >>>>> object (changes the ipc_addid() path).
> >>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
> >>>>> for a newly
> >>>>> allocated process (changes the
> >>>>> alloc_pid()/alloc_pidmap() paths).
> >>>>>
> >>>>> Any comment and/or suggestions are welcome.
> >>>>>
> >>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
> >>>>>
> >>>>> Regards,
> >>>>> Nadia
> >>>>>
> >>>>> --
> >>>>>
> >>>>> --
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> Regards,
> >>> Nadia
> >>
> >>
> >>
> >
> >
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080316034320.GA19793-6s5zFf/epYLPQpwDFJZrxFMas7LaWZ9n@public.gmane.org>
@ 2008-03-16 19:08 ` Oren Laadan
[not found] ` <47DD703C.4030809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 31+ messages in thread
From: Oren Laadan @ 2008-03-16 19:08 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>
>> Nadia Derbey wrote:
>>> Oren Laadan wrote:
>>>>
>>>> Nadia Derbey wrote:
>>>>
>>>>> Oren Laadan wrote:
>>>>>
>>>>>>
>>>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>>>
>>>>>>> A couple of weeks ago, a discussion has started after Pierre's
>>>>>>> proposal for
>>>>>>> a new syscall to change an ipc id (see thread
>>>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>>>
>>>>>>>
>>>>>>> Oren's suggestion was to force an object's id during its creation,
>>>>>>> rather
>>>>>>> than 1. create it, 2. change its id.
>>>>>>>
>>>>>>> So here is an implementation of what Oren has suggested.
>>>>>>>
>>>>>>> 2 new files are defined under /proc/self:
>>>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>>>> (see patch #2 for more details).
>>>>>>
>>>>>>
>>>>>> Generally looks good. One meta-comment, though:
>>>>>>
>>>>>> I wonder why you use separate files for separate resources,
>>>>>
>>>>> That would be needed in a situation wheere we don't care about next,
>>>>> say, ipc id to be created but we need a predefined pid. But I must
>>>>> admit I don't see any pratical application to it.
>>>>
>>>> exactly; why set the next-ipc value so far in advance ? I think it's
>>>> better (and less confusing) if we require that setting the next-id value
>>>> be done right before the respective syscall.
>>> Ok, but this "requirement" should be widely agreed upon ;-)
>> A discussion on the overall checkpoint/restart policy is certainly due
>> (and increasingly noted recently).
>>
>>> What I mean here is that the solution with 1 file per "object type" can
>>> easily be extended imho:
>> I'm aiming at simplicity and minimal (but not restrictive) API for user
>> space. I argue that we never really need more than one predetermined value
>> at a time (eg see below), and the cost of setting such value is so small
>> that there is no real benefit in setting more than one at a time (either
>> via multiple files or via an array of values). If in fact you wanted more
>> than one type at a time, you could still make it happen with a single
>> file without adding many user-visible files in /proc/<pid>.
>>
>> So far, I can't think of any such identifier that we'd like to pre-set
>> that does not fit into a "long" type;
>
> As Nadia has mentioned, if we have checkpointed a container which has
> another pid namespace underneath itself, then we will need to restart
> some tasks with two predetermined pids. So we'll need two (or more)
> longs for the tasks in deeper namespaces.
I see. So more than a single "long" type is probably needed. I'd still
prefer that the "scope" of a preset identifier through "next_id" should
be the subsequent syscall; so if you need multiple values for the next
syscall you use it, but you don't support leftovers for the next syscall
to use. The typing system can be something like "long VAL" and then for
array "long* VAL VAL VAL ...", for instance.
>
>> simply because the kernel does not
>> use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To
>> be on the safe side, we can require that the format be "long VAL", just
>> in case (and later you could have other formats).
>>
>> The only exception, perhaps, is if a TCP connection is rebuilt with a,
>> say, connect() syscall, and some information needs to be "predetermined"
>> so we'll need to extend the format. That can be done with another type
>> eg. "tcp ....." or a separate file (per your view), _then_, not now.
>> (As a side note, I don't suggest that this is how TCP will be restored).
>>
>> In any event, the bottom line is that a single file, with a single
>> value at a time (possibly annotated with a type), is the simplest, and
>> isn't restrictive, for our purposes. Looking one step ahead, simplicity
>> and minimal commitment to user space is important in trying to push this
>> to the mainline kernel...
>>
>>> I don't know how the restart is supposed to work, but we can imagine
>>> feeding all these files with all the object ids just before restart and
>> Building on my own experience with zap I envision the restart operation
>> of a given task occurring in the context of that task.
>
> Could be, but not necessarily the case. Eric has mentioned using elf
> files for restart, and that's one way to go, but whether one central
I'm not familiar with the details of this.
> restart task sets up all the children or the children set themselves up
> is yet another design point we haven't decided. I would think that
> with a centralized restart it would be easier to assure for instance
> that shared anon pages would be properly set up and shared, but since
> you advocate each-task-starts-itself I trust zap must handle that.
The main reason I think a task should setup itself, is because most of
the setup requires that new resources be allocated, and the kernel is
already centered around this approach that a task allocates for itself,
not for another task. For instance, if you need to restore a VMA, you
simply call mmap(), a new file, you call open() etc.
Shared anon pages are one example of shared resources that may be used
by multiple processes. Zap's approach is to have the "first" user (in
the sense of the first time the resource is seen during checkpoint) do
the actual restore, and place it in a global table, and then subsequent
tasks will find it in the table and "map" it into their view.
Decentralizing also allow multiple tasks to restart concurrently.
Are we ready to start concrete discussion on the architecture for the
checkpoint/restart ? (and if so .. time to change the subject line).
>
>> (I assume this is
>> how restart will work). Therefore, it makes much sense that before every
>> syscall that requires a pre-determined resource identifier (eg. clone,
>> ipc, pty allocation), the task will place the desired value in "next_id"
>> (and that will only be meaningful during restart) and invoke the said
>> syscall. Voila.
>>
>> Note that the restart will "rebuild" the container's state (and the task
>> state) as it reads in the data from some source. It is likely that not
>> all data will be available when the first said syscall is about to be
>> invoked, so you may not be able to feed everything ahead of time.
>>
>>
>>> let the process pick up the objects ids as it needs them.
>>> Of course, this would require to enhance the files formats, as well as
>>> the way things are stored in the task_struct.
>>>
>>> Hope what I'm saying is not too stupid ;-) ?
>>>
>>> Regards,
>>> Nadia
>>>
>>>>>> and why you'd
>>>>>> want to write multiple identifiers in one go;
>>>>>
>>>>> I used multiple identifiers only for the pid values: this is because
>>>>> when a new pid value is allocated for a process that belongs to
>>>>> nested namespaces, the lower level upid nr values are allocated in a
>>>>> single shot. (see alloc_pid()).
>>>>>
>>>>>> it seems to complicate the
>>>>>> code and interface with minimal gain.
>>>>>> In practice, a process will only do either one or the other, so a
>>>>>> single
>>>>>> file is enough (e.g. "next_id").
>>>>>> Also, writing a single value at a time followed by the syscall is
>>>>>> enough;
>>>>>> it's definitely not a performance issue to have multiple calls.
>>>>>> We assume the user/caller knows what she's doing, so no need to
>>>>>> classify
>>>>>> the identifier (that is, tell the kernel it's a pid, or an ipc id)
>>>>>> ahead
>>>>>> of time. The caller simply writes a value and then calls the relevant
>>>>>> syscall, or otherwise the results may not be what she expected...
>>>>>> If such context is expected to be required (although I don't see any at
>>>>>> the moment), we can require that the user write "TYPE VALUE" pair to
>>>>>> the "next_id" file.
>>>>>
>>>>> That's exactly what I wanted to avoid by creating 1 file per object.
>>>>> Now, it's true that in a restart context where I guess that things
>>>>> will be done synchronously, we could have a single next_id file.
>>>>>
>>>>>>> When one of these files (or both of them) is filled, a structure
>>>>>>> pointed to
>>>>>>> by the calling task struct is filled with these ids.
>>>>>>>
>>>>>>> Then, when the object is created, the id(s) present in that
>>>>>>> structure are
>>>>>>> used, instead of the default ones.
>>>>>>>
>>>>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>>>>
>>>>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
>>>>>>> new IPC
>>>>>>> object (changes the ipc_addid() path).
>>>>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>>>>>>> for a newly
>>>>>>> allocated process (changes the
>>>>>>> alloc_pid()/alloc_pidmap() paths).
>>>>>>>
>>>>>>> Any comment and/or suggestions are welcome.
>>>>>>>
>>>>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nadia
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Regards,
>>>>> Nadia
>>>>
>>>>
>>>
>> _______________________________________________
>> Containers mailing list
>> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <47DD703C.4030809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-03-17 14:44 ` Serge E. Hallyn
0 siblings, 0 replies; 31+ messages in thread
From: Serge E. Hallyn @ 2008-03-17 14:44 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
xemul-GEFAQzZX7r8dnm+yROfE0A
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
>
> Serge E. Hallyn wrote:
>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>
>>> Nadia Derbey wrote:
>>>> Oren Laadan wrote:
>>>>>
>>>>> Nadia Derbey wrote:
>>>>>
>>>>>> Oren Laadan wrote:
>>>>>>
>>>>>>>
>>>>>>> Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>>>>>>
>>>>>>>> A couple of weeks ago, a discussion has started after Pierre's
>>>>>>>> proposal for
>>>>>>>> a new syscall to change an ipc id (see thread
>>>>>>>> http://lkml.org/lkml/2008/1/29/209).
>>>>>>>>
>>>>>>>>
>>>>>>>> Oren's suggestion was to force an object's id during its creation,
>>>>>>>> rather
>>>>>>>> than 1. create it, 2. change its id.
>>>>>>>>
>>>>>>>> So here is an implementation of what Oren has suggested.
>>>>>>>>
>>>>>>>> 2 new files are defined under /proc/self:
>>>>>>>> . next_ipcid --> next id to use for ipc object creation
>>>>>>>> . next_pids --> next upid nr(s) to use for next task to be forked
>>>>>>>> (see patch #2 for more details).
>>>>>>>
>>>>>>>
>>>>>>> Generally looks good. One meta-comment, though:
>>>>>>>
>>>>>>> I wonder why you use separate files for separate resources,
>>>>>>
>>>>>> That would be needed in a situation wheere we don't care about next,
>>>>>> say, ipc id to be created but we need a predefined pid. But I must
>>>>>> admit I don't see any pratical application to it.
>>>>>
>>>>> exactly; why set the next-ipc value so far in advance ? I think it's
>>>>> better (and less confusing) if we require that setting the next-id
>>>>> value
>>>>> be done right before the respective syscall.
>>>> Ok, but this "requirement" should be widely agreed upon ;-)
>>> A discussion on the overall checkpoint/restart policy is certainly due
>>> (and increasingly noted recently).
>>>
>>>> What I mean here is that the solution with 1 file per "object type" can
>>>> easily be extended imho:
>>> I'm aiming at simplicity and minimal (but not restrictive) API for user
>>> space. I argue that we never really need more than one predetermined
>>> value
>>> at a time (eg see below), and the cost of setting such value is so small
>>> that there is no real benefit in setting more than one at a time (either
>>> via multiple files or via an array of values). If in fact you wanted more
>>> than one type at a time, you could still make it happen with a single
>>> file without adding many user-visible files in /proc/<pid>.
>>>
>>> So far, I can't think of any such identifier that we'd like to pre-set
>>> that does not fit into a "long" type;
>> As Nadia has mentioned, if we have checkpointed a container which has
>> another pid namespace underneath itself, then we will need to restart
>> some tasks with two predetermined pids. So we'll need two (or more)
>> longs for the tasks in deeper namespaces.
>
> I see. So more than a single "long" type is probably needed. I'd still
> prefer that the "scope" of a preset identifier through "next_id" should
> be the subsequent syscall;
> so if you need multiple values for the next
> syscall you use it, but you don't support leftovers for the next syscall
> to use.
Agreed.
> The typing system can be something like "long VAL" and then for
> array "long* VAL VAL VAL ...", for instance.
>
>>> simply because the kernel does not
>>> use such identifiers in the first place (pid, ipc, pty#, vc# .. etc). To
>>> be on the safe side, we can require that the format be "long VAL", just
>>> in case (and later you could have other formats).
>>>
>>> The only exception, perhaps, is if a TCP connection is rebuilt with a,
>>> say, connect() syscall, and some information needs to be "predetermined"
>>> so we'll need to extend the format. That can be done with another type
>>> eg. "tcp ....." or a separate file (per your view), _then_, not now.
>>> (As a side note, I don't suggest that this is how TCP will be restored).
>>>
>>> In any event, the bottom line is that a single file, with a single
>>> value at a time (possibly annotated with a type), is the simplest, and
>>> isn't restrictive, for our purposes. Looking one step ahead, simplicity
>>> and minimal commitment to user space is important in trying to push this
>>> to the mainline kernel...
>>>
>>>> I don't know how the restart is supposed to work, but we can imagine
>>>> feeding all these files with all the object ids just before restart and
>>> Building on my own experience with zap I envision the restart operation
>>> of a given task occurring in the context of that task.
>> Could be, but not necessarily the case. Eric has mentioned using elf
>> files for restart, and that's one way to go, but whether one central
>
> I'm not familiar with the details of this.
Well he wasn't specific and I'm not sure what his details were, I just
pictured it the way crack and other userspace c/r systems have worked,
where the checkpoint creates and ELF which you execute to restart the
task(set).
>> restart task sets up all the children or the children set themselves up
>> is yet another design point we haven't decided. I would think that
>> with a centralized restart it would be easier to assure for instance
>> that shared anon pages would be properly set up and shared, but since
>> you advocate each-task-starts-itself I trust zap must handle that.
>
> The main reason I think a task should setup itself, is because most of
> the setup requires that new resources be allocated, and the kernel is
> already centered around this approach that a task allocates for itself,
> not for another task. For instance, if you need to restore a VMA, you
> simply call mmap(), a new file, you call open() etc.
Agreed, it does seem cleaner, and if we go with the "sys_create_id()"
approach then clearly that's where we're aiming.
> Shared anon pages are one example of shared resources that may be used
> by multiple processes. Zap's approach is to have the "first" user (in
> the sense of the first time the resource is seen during checkpoint) do
> the actual restore, and place it in a global table, and then subsequent
> tasks will find it in the table and "map" it into their view.
Makes sense.
> Decentralizing also allow multiple tasks to restart concurrently.
Yes, but we lose that if we force create_with_pid() to be implemented
by setting /proc/sys/whatever/pid_min and max :)
> Are we ready to start concrete discussion on the architecture for the
> checkpoint/restart ? (and if so .. time to change the subject line).
Good news on this topic - unofficial word is that the containers
mini-summit at OLS has been approved. They don't yet know whether
it will be monday or tuesday, but hopefully this is enough information
early enough for anyone needing to make/change travel plans.
thanks,
-serge
>>> (I assume this is
>>> how restart will work). Therefore, it makes much sense that before every
>>> syscall that requires a pre-determined resource identifier (eg. clone,
>>> ipc, pty allocation), the task will place the desired value in "next_id"
>>> (and that will only be meaningful during restart) and invoke the said
>>> syscall. Voila.
>>>
>>> Note that the restart will "rebuild" the container's state (and the task
>>> state) as it reads in the data from some source. It is likely that not
>>> all data will be available when the first said syscall is about to be
>>> invoked, so you may not be able to feed everything ahead of time.
>>>
>>>
>>>> let the process pick up the objects ids as it needs them.
>>>> Of course, this would require to enhance the files formats, as well as
>>>> the way things are stored in the task_struct.
>>>>
>>>> Hope what I'm saying is not too stupid ;-) ?
>>>>
>>>> Regards,
>>>> Nadia
>>>>
>>>>>>> and why you'd
>>>>>>> want to write multiple identifiers in one go;
>>>>>>
>>>>>> I used multiple identifiers only for the pid values: this is because
>>>>>> when a new pid value is allocated for a process that belongs to nested
>>>>>> namespaces, the lower level upid nr values are allocated in a single
>>>>>> shot. (see alloc_pid()).
>>>>>>
>>>>>>> it seems to complicate the
>>>>>>> code and interface with minimal gain.
>>>>>>> In practice, a process will only do either one or the other, so a
>>>>>>> single
>>>>>>> file is enough (e.g. "next_id").
>>>>>>> Also, writing a single value at a time followed by the syscall is
>>>>>>> enough;
>>>>>>> it's definitely not a performance issue to have multiple calls.
>>>>>>> We assume the user/caller knows what she's doing, so no need to
>>>>>>> classify
>>>>>>> the identifier (that is, tell the kernel it's a pid, or an ipc id)
>>>>>>> ahead
>>>>>>> of time. The caller simply writes a value and then calls the relevant
>>>>>>> syscall, or otherwise the results may not be what she expected...
>>>>>>> If such context is expected to be required (although I don't see any
>>>>>>> at
>>>>>>> the moment), we can require that the user write "TYPE VALUE" pair to
>>>>>>> the "next_id" file.
>>>>>>
>>>>>> That's exactly what I wanted to avoid by creating 1 file per object.
>>>>>> Now, it's true that in a restart context where I guess that things
>>>>>> will be done synchronously, we could have a single next_id file.
>>>>>>
>>>>>>>> When one of these files (or both of them) is filled, a structure
>>>>>>>> pointed to
>>>>>>>> by the calling task struct is filled with these ids.
>>>>>>>>
>>>>>>>> Then, when the object is created, the id(s) present in that
>>>>>>>> structure are
>>>>>>>> used, instead of the default ones.
>>>>>>>>
>>>>>>>> The patches are against 2.6.25-rc3-mm1, in the following order:
>>>>>>>>
>>>>>>>> [PATCH 1/4] adds the procfs facility for next ipc to be created.
>>>>>>>> [PATCH 2/4] adds the procfs facility for next task to be forked.
>>>>>>>> [PATCH 3/4] makes use of the specified id (if any) to allocate the
>>>>>>>> new IPC
>>>>>>>> object (changes the ipc_addid() path).
>>>>>>>> [PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s)
>>>>>>>> for a newly
>>>>>>>> allocated process (changes the
>>>>>>>> alloc_pid()/alloc_pidmap() paths).
>>>>>>>>
>>>>>>>> Any comment and/or suggestions are welcome.
>>>>>>>>
>>>>>>>> Cc-ing Pavel and Sukadev, since they are the pid namespace authors.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nadia
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Nadia
>>>>>
>>>>>
>>>>
>>> _______________________________________________
>>> Containers mailing list
>>> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>>> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-04-04 14:51 Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 1/4] Provide a new procfs interface to set next id Nadia.Derbey-6ktuUTfB/bM
` (9 more replies)
0 siblings, 10 replies; 31+ messages in thread
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl
Hi,
When restarting a process that has been previously checkpointed, that process
should keep on using some of its ids (such as its process id, or sysV ipc ids).
This patch provides a feature that can help ensuring this saved state reuse:
it makes it possible to create an object with a pre-defined id.
A first implementation had been proposed 2 months ago. It consisted in
changing an object's id after it had been created.
Here is a second implementation based on Oren Ladaan's idea: Oren's suggestion
was to force an object's id during its creation, rather than 1. create it,
2. change its id.
A new file is created in procfs: /proc/self/next_id.
When this file is filled with and id value, a structure pointed to by the
calling task struct is filled with that id.
Then, when an object supporting this feature is created, the id present in
that new structure is used, instead of the default one.
The syntax is one of:
. echo "LONG XX" > /proc/self/next_id
next object to be created will have an id set to XX
. echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
next object to be created will have its ids set to XX0, ... X<n-1>
This is particularly useful for processes that may have several ids if
they belong to nested namespaces.
The objects covered here are ipc objects and processes.
Today, the ids are specified as long, but having a type string specified in
the next_id file makes it possible to cover more types in the future, if
needed.
The patches are against 2.6.25-rc3-mm1, in the following order:
[PATCH 1/4] adds the procfs facility for next object to be created, this
object being associated to a single id.
[PATCH 2/4] enhances the procfs facility for objects associated to multiple
ids (like processes).
[PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
object (changes the ipc_addid() path).
[PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
allocated process (changes the alloc_pid()/alloc_pidmap() paths).
Any comment and/or suggestions are welcome.
Regards,
Nadia
--
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-04-04 14:51 Nadia.Derbey-6ktuUTfB/bM
0 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Hi,
When restarting a process that has been previously checkpointed, that process
should keep on using some of its ids (such as its process id, or sysV ipc ids).
This patch provides a feature that can help ensuring this saved state reuse:
it makes it possible to create an object with a pre-defined id.
A first implementation had been proposed 2 months ago. It consisted in
changing an object's id after it had been created.
Here is a second implementation based on Oren Ladaan's idea: Oren's suggestion
was to force an object's id during its creation, rather than 1. create it,
2. change its id.
A new file is created in procfs: /proc/self/next_id.
When this file is filled with and id value, a structure pointed to by the
calling task struct is filled with that id.
Then, when an object supporting this feature is created, the id present in
that new structure is used, instead of the default one.
The syntax is one of:
. echo "LONG XX" > /proc/self/next_id
next object to be created will have an id set to XX
. echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
next object to be created will have its ids set to XX0, ... X<n-1>
This is particularly useful for processes that may have several ids if
they belong to nested namespaces.
The objects covered here are ipc objects and processes.
Today, the ids are specified as long, but having a type string specified in
the next_id file makes it possible to cover more types in the future, if
needed.
The patches are against 2.6.25-rc3-mm1, in the following order:
[PATCH 1/4] adds the procfs facility for next object to be created, this
object being associated to a single id.
[PATCH 2/4] enhances the procfs facility for objects associated to multiple
ids (like processes).
[PATCH 3/4] makes use of the specified id (if any) to allocate the new IPC
object (changes the ipc_addid() path).
[PATCH 4/4] uses the specified id(s) (if any) to set the upid nr(s) for a newly
allocated process (changes the alloc_pid()/alloc_pidmap() paths).
Any comment and/or suggestions are welcome.
Regards,
Nadia
--
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 1/4] Provide a new procfs interface to set next id
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
@ 2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` Nadia.Derbey
` (8 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Nadia Derbey
[-- Attachment #1: proc_set_next_id.patch --]
[-- Type: text/plain, Size: 8855 bytes --]
[PATCH 01/04]
This patch proposes the procfs facilities needed to feed the id for the
next object to be allocated.
if an
echo "LONG XX" > /proc/self/next_id
is issued, next object to be created will have XX as its id.
This applies to objects that need a single id, such as ipc objects.
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
---
fs/exec.c | 3 +
fs/proc/base.c | 73 +++++++++++++++++++++++++++++++++++++++++
include/linux/sched.h | 3 +
include/linux/sysids.h | 24 +++++++++++++
kernel/Makefile | 2 -
kernel/exit.c | 4 ++
kernel/fork.c | 2 +
kernel/nextid.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
8 files changed, 196 insertions(+), 1 deletion(-)
Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h 2008-04-04 13:53:04.000000000 +0200
@@ -0,0 +1,24 @@
+/*
+ * include/linux/sysids.h
+ *
+ * Definitions to support object creation with predefined id.
+ *
+ */
+
+#ifndef _LINUX_SYSIDS_H
+#define _LINUX_SYSIDS_H
+
+struct sys_id {
+ long id;
+};
+
+extern ssize_t get_nextid(struct task_struct *, char *, size_t);
+extern int set_nextid(struct task_struct *, char *);
+extern int reset_nextid(struct task_struct *);
+
+static inline void exit_nextid(struct task_struct *tsk)
+{
+ reset_nextid(tsk);
+}
+
+#endif /* _LINUX_SYSIDS_H */
Index: linux-2.6.25-rc8-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sched.h 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sched.h 2008-04-04 13:55:10.000000000 +0200
@@ -88,6 +88,7 @@ struct sched_param {
#include <linux/task_io_accounting.h>
#include <linux/kobject.h>
#include <linux/latencytop.h>
+#include <linux/sysids.h>
#include <asm/processor.h>
@@ -1278,6 +1279,8 @@ struct task_struct {
int latency_record_count;
struct latency_record latency_record[LT_SAVECOUNT];
#endif
+ /* Id to assign to the next resource to be created */
+ struct sys_id *next_id;
};
/*
Index: linux-2.6.25-rc8-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/fs/proc/base.c 2008-04-04 13:11:35.000000000 +0200
+++ linux-2.6.25-rc8-mm1/fs/proc/base.c 2008-04-04 13:57:18.000000000 +0200
@@ -1138,6 +1138,77 @@ static const struct file_operations proc
#endif
+static ssize_t next_id_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task;
+ char *page;
+ ssize_t length;
+
+ task = get_proc_task(file->f_path.dentry->d_inode);
+ if (!task)
+ return -ESRCH;
+
+ if (count >= PAGE_SIZE)
+ count = PAGE_SIZE - 1;
+
+ length = -ENOMEM;
+ page = (char *) __get_free_page(GFP_TEMPORARY);
+ if (!page)
+ goto out;
+
+ length = get_nextid(task, (char *) page, count);
+ if (length >= 0)
+ length = simple_read_from_buffer(buf, count, ppos,
+ (char *)page, length);
+ free_page((unsigned long) page);
+
+out:
+ put_task_struct(task);
+ return length;
+}
+
+static ssize_t next_id_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ char *page;
+ ssize_t length;
+
+ if (pid_task(proc_pid(inode), PIDTYPE_PID) != current)
+ return -EPERM;
+
+ if (count >= PAGE_SIZE)
+ count = PAGE_SIZE - 1;
+
+ if (*ppos != 0) {
+ /* No partial writes. */
+ return -EINVAL;
+ }
+ page = (char *)__get_free_page(GFP_TEMPORARY);
+ if (!page)
+ return -ENOMEM;
+ length = -EFAULT;
+ if (copy_from_user(page, buf, count))
+ goto out_free_page;
+
+ page[count] = '\0';
+
+ length = set_nextid(current, page);
+ if (!length)
+ length = count;
+
+out_free_page:
+ free_page((unsigned long) page);
+ return length;
+}
+
+static const struct file_operations proc_next_id_operations = {
+ .read = next_id_read,
+ .write = next_id_write,
+};
+
+
#ifdef CONFIG_SCHED_DEBUG
/*
* Print out various scheduling related per-task fields:
@@ -2453,6 +2524,7 @@ static const struct pid_entry tgid_base_
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, pid_io_accounting),
#endif
+ REG("next_id", S_IRUGO|S_IWUSR, next_id),
};
static int proc_tgid_base_readdir(struct file * filp,
@@ -2779,6 +2851,7 @@ static const struct pid_entry tid_base_s
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
#endif
+ REG("next_id", S_IRUGO|S_IWUSR, next_id),
};
static int proc_tid_base_readdir(struct file * filp,
Index: linux-2.6.25-rc8-mm1/kernel/Makefile
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/Makefile 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/Makefile 2008-04-04 13:58:22.000000000 +0200
@@ -9,7 +9,7 @@ obj-y = sched.o fork.o exec_domain.o
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
- notifier.o ksysfs.o pm_qos_params.o
+ notifier.o ksysfs.o pm_qos_params.o nextid.o
obj-$(CONFIG_SYSCTL_SYSCALL_CHECK) += sysctl_check.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c 2008-04-04 13:59:59.000000000 +0200
@@ -0,0 +1,86 @@
+/*
+ * linux/kernel/nextid.c
+ *
+ *
+ * Provide the get_nextid() / set_nextid() routines
+ * (called from fs/proc/base.c).
+ * They allow to specify the id for the next resource to be allocated,
+ * instead of letting the allocator set it for us.
+ */
+
+#include <linux/sched.h>
+#include <linux/ctype.h>
+
+
+
+ssize_t get_nextid(struct task_struct *task, char *buffer, size_t size)
+{
+ struct sys_id *sid;
+
+ sid = task->next_id;
+ if (!sid)
+ return snprintf(buffer, size, "UNSET\n");
+
+ return snprintf(buffer, size, "LONG %ld\n", sid->id);
+}
+
+static int set_single_id(struct task_struct *task, char *buffer)
+{
+ struct sys_id *sid;
+ long next_id;
+ char *end;
+
+ next_id = simple_strtol(buffer, &end, 0);
+ if (end == buffer || (end && !isspace(*end)))
+ return -EINVAL;
+
+ sid = task->next_id;
+ if (!sid) {
+ sid = kzalloc(sizeof(*sid), GFP_KERNEL);
+ if (!sid)
+ return -ENOMEM;
+ task->next_id = sid;
+ }
+
+ sid->id = next_id;
+
+ return 0;
+}
+
+int reset_nextid(struct task_struct *task)
+{
+ struct sys_id *sid;
+
+ sid = task->next_id;
+ if (!sid)
+ return 0;
+
+ task->next_id = NULL;
+ kfree(sid);
+ return 0;
+}
+
+#define LONG_STR "LONG"
+#define RESET_STR "RESET"
+
+/*
+ * Parses a line written to /proc/self/next_id.
+ * this line has the following format:
+ * LONG id --> a single id is specified
+ */
+int set_nextid(struct task_struct *task, char *buffer)
+{
+ char *token, *out = buffer;
+
+ if (!out)
+ return -EINVAL;
+
+ token = strsep(&out, " ");
+
+ if (!strcmp(token, LONG_STR))
+ return set_single_id(task, out);
+ else if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
+ return reset_nextid(task);
+ else
+ return -EINVAL;
+}
Index: linux-2.6.25-rc8-mm1/kernel/fork.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/fork.c 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/fork.c 2008-04-04 14:00:35.000000000 +0200
@@ -1167,6 +1167,8 @@ static struct task_struct *copy_process(
p->blocked_on = NULL; /* not blocked yet */
#endif
+ p->next_id = NULL;
+
/* Perform scheduler related setup. Assign this task to a CPU. */
sched_fork(p, clone_flags);
Index: linux-2.6.25-rc8-mm1/kernel/exit.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/exit.c 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/exit.c 2008-04-04 14:01:22.000000000 +0200
@@ -987,6 +987,10 @@ NORET_TYPE void do_exit(long code)
proc_exit_connector(tsk);
exit_notify(tsk, group_dead);
+
+ if (unlikely(tsk->next_id))
+ exit_nextid(tsk);
+
#ifdef CONFIG_NUMA
mpol_free(tsk->mempolicy);
tsk->mempolicy = NULL;
Index: linux-2.6.25-rc8-mm1/fs/exec.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/fs/exec.c 2008-04-04 13:11:34.000000000 +0200
+++ linux-2.6.25-rc8-mm1/fs/exec.c 2008-04-04 14:02:09.000000000 +0200
@@ -1024,6 +1024,9 @@ int flush_old_exec(struct linux_binprm *
flush_signal_handlers(current, 0);
flush_old_files(current->files);
+ if (unlikely(current->next_id))
+ reset_nextid(current);
+
return 0;
mmap_failed:
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 1/4] Provide a new procfs interface to set next id
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 1/4] Provide a new procfs interface to set next id Nadia.Derbey-6ktuUTfB/bM
@ 2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s) Nadia.Derbey
` (7 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl, Nadia Derbey
[-- Attachment #1: proc_set_next_id.patch --]
[-- Type: text/plain, Size: 8835 bytes --]
[PATCH 01/04]
This patch proposes the procfs facilities needed to feed the id for the
next object to be allocated.
if an
echo "LONG XX" > /proc/self/next_id
is issued, next object to be created will have XX as its id.
This applies to objects that need a single id, such as ipc objects.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
---
fs/exec.c | 3 +
fs/proc/base.c | 73 +++++++++++++++++++++++++++++++++++++++++
include/linux/sched.h | 3 +
include/linux/sysids.h | 24 +++++++++++++
kernel/Makefile | 2 -
kernel/exit.c | 4 ++
kernel/fork.c | 2 +
kernel/nextid.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
8 files changed, 196 insertions(+), 1 deletion(-)
Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h 2008-04-04 13:53:04.000000000 +0200
@@ -0,0 +1,24 @@
+/*
+ * include/linux/sysids.h
+ *
+ * Definitions to support object creation with predefined id.
+ *
+ */
+
+#ifndef _LINUX_SYSIDS_H
+#define _LINUX_SYSIDS_H
+
+struct sys_id {
+ long id;
+};
+
+extern ssize_t get_nextid(struct task_struct *, char *, size_t);
+extern int set_nextid(struct task_struct *, char *);
+extern int reset_nextid(struct task_struct *);
+
+static inline void exit_nextid(struct task_struct *tsk)
+{
+ reset_nextid(tsk);
+}
+
+#endif /* _LINUX_SYSIDS_H */
Index: linux-2.6.25-rc8-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sched.h 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sched.h 2008-04-04 13:55:10.000000000 +0200
@@ -88,6 +88,7 @@ struct sched_param {
#include <linux/task_io_accounting.h>
#include <linux/kobject.h>
#include <linux/latencytop.h>
+#include <linux/sysids.h>
#include <asm/processor.h>
@@ -1278,6 +1279,8 @@ struct task_struct {
int latency_record_count;
struct latency_record latency_record[LT_SAVECOUNT];
#endif
+ /* Id to assign to the next resource to be created */
+ struct sys_id *next_id;
};
/*
Index: linux-2.6.25-rc8-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/fs/proc/base.c 2008-04-04 13:11:35.000000000 +0200
+++ linux-2.6.25-rc8-mm1/fs/proc/base.c 2008-04-04 13:57:18.000000000 +0200
@@ -1138,6 +1138,77 @@ static const struct file_operations proc
#endif
+static ssize_t next_id_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task;
+ char *page;
+ ssize_t length;
+
+ task = get_proc_task(file->f_path.dentry->d_inode);
+ if (!task)
+ return -ESRCH;
+
+ if (count >= PAGE_SIZE)
+ count = PAGE_SIZE - 1;
+
+ length = -ENOMEM;
+ page = (char *) __get_free_page(GFP_TEMPORARY);
+ if (!page)
+ goto out;
+
+ length = get_nextid(task, (char *) page, count);
+ if (length >= 0)
+ length = simple_read_from_buffer(buf, count, ppos,
+ (char *)page, length);
+ free_page((unsigned long) page);
+
+out:
+ put_task_struct(task);
+ return length;
+}
+
+static ssize_t next_id_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ char *page;
+ ssize_t length;
+
+ if (pid_task(proc_pid(inode), PIDTYPE_PID) != current)
+ return -EPERM;
+
+ if (count >= PAGE_SIZE)
+ count = PAGE_SIZE - 1;
+
+ if (*ppos != 0) {
+ /* No partial writes. */
+ return -EINVAL;
+ }
+ page = (char *)__get_free_page(GFP_TEMPORARY);
+ if (!page)
+ return -ENOMEM;
+ length = -EFAULT;
+ if (copy_from_user(page, buf, count))
+ goto out_free_page;
+
+ page[count] = '\0';
+
+ length = set_nextid(current, page);
+ if (!length)
+ length = count;
+
+out_free_page:
+ free_page((unsigned long) page);
+ return length;
+}
+
+static const struct file_operations proc_next_id_operations = {
+ .read = next_id_read,
+ .write = next_id_write,
+};
+
+
#ifdef CONFIG_SCHED_DEBUG
/*
* Print out various scheduling related per-task fields:
@@ -2453,6 +2524,7 @@ static const struct pid_entry tgid_base_
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, pid_io_accounting),
#endif
+ REG("next_id", S_IRUGO|S_IWUSR, next_id),
};
static int proc_tgid_base_readdir(struct file * filp,
@@ -2779,6 +2851,7 @@ static const struct pid_entry tid_base_s
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
#endif
+ REG("next_id", S_IRUGO|S_IWUSR, next_id),
};
static int proc_tid_base_readdir(struct file * filp,
Index: linux-2.6.25-rc8-mm1/kernel/Makefile
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/Makefile 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/Makefile 2008-04-04 13:58:22.000000000 +0200
@@ -9,7 +9,7 @@ obj-y = sched.o fork.o exec_domain.o
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
- notifier.o ksysfs.o pm_qos_params.o
+ notifier.o ksysfs.o pm_qos_params.o nextid.o
obj-$(CONFIG_SYSCTL_SYSCALL_CHECK) += sysctl_check.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c 2008-04-04 13:59:59.000000000 +0200
@@ -0,0 +1,86 @@
+/*
+ * linux/kernel/nextid.c
+ *
+ *
+ * Provide the get_nextid() / set_nextid() routines
+ * (called from fs/proc/base.c).
+ * They allow to specify the id for the next resource to be allocated,
+ * instead of letting the allocator set it for us.
+ */
+
+#include <linux/sched.h>
+#include <linux/ctype.h>
+
+
+
+ssize_t get_nextid(struct task_struct *task, char *buffer, size_t size)
+{
+ struct sys_id *sid;
+
+ sid = task->next_id;
+ if (!sid)
+ return snprintf(buffer, size, "UNSET\n");
+
+ return snprintf(buffer, size, "LONG %ld\n", sid->id);
+}
+
+static int set_single_id(struct task_struct *task, char *buffer)
+{
+ struct sys_id *sid;
+ long next_id;
+ char *end;
+
+ next_id = simple_strtol(buffer, &end, 0);
+ if (end == buffer || (end && !isspace(*end)))
+ return -EINVAL;
+
+ sid = task->next_id;
+ if (!sid) {
+ sid = kzalloc(sizeof(*sid), GFP_KERNEL);
+ if (!sid)
+ return -ENOMEM;
+ task->next_id = sid;
+ }
+
+ sid->id = next_id;
+
+ return 0;
+}
+
+int reset_nextid(struct task_struct *task)
+{
+ struct sys_id *sid;
+
+ sid = task->next_id;
+ if (!sid)
+ return 0;
+
+ task->next_id = NULL;
+ kfree(sid);
+ return 0;
+}
+
+#define LONG_STR "LONG"
+#define RESET_STR "RESET"
+
+/*
+ * Parses a line written to /proc/self/next_id.
+ * this line has the following format:
+ * LONG id --> a single id is specified
+ */
+int set_nextid(struct task_struct *task, char *buffer)
+{
+ char *token, *out = buffer;
+
+ if (!out)
+ return -EINVAL;
+
+ token = strsep(&out, " ");
+
+ if (!strcmp(token, LONG_STR))
+ return set_single_id(task, out);
+ else if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
+ return reset_nextid(task);
+ else
+ return -EINVAL;
+}
Index: linux-2.6.25-rc8-mm1/kernel/fork.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/fork.c 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/fork.c 2008-04-04 14:00:35.000000000 +0200
@@ -1167,6 +1167,8 @@ static struct task_struct *copy_process(
p->blocked_on = NULL; /* not blocked yet */
#endif
+ p->next_id = NULL;
+
/* Perform scheduler related setup. Assign this task to a CPU. */
sched_fork(p, clone_flags);
Index: linux-2.6.25-rc8-mm1/kernel/exit.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/exit.c 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/exit.c 2008-04-04 14:01:22.000000000 +0200
@@ -987,6 +987,10 @@ NORET_TYPE void do_exit(long code)
proc_exit_connector(tsk);
exit_notify(tsk, group_dead);
+
+ if (unlikely(tsk->next_id))
+ exit_nextid(tsk);
+
#ifdef CONFIG_NUMA
mpol_free(tsk->mempolicy);
tsk->mempolicy = NULL;
Index: linux-2.6.25-rc8-mm1/fs/exec.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/fs/exec.c 2008-04-04 13:11:34.000000000 +0200
+++ linux-2.6.25-rc8-mm1/fs/exec.c 2008-04-04 14:02:09.000000000 +0200
@@ -1024,6 +1024,9 @@ int flush_old_exec(struct linux_binprm *
flush_signal_handlers(current, 0);
flush_old_files(current->files);
+ if (unlikely(current->next_id))
+ reset_nextid(current);
+
return 0;
mmap_failed:
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s)
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (2 preceding siblings ...)
2008-04-04 14:51 ` [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s) Nadia.Derbey
@ 2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` [RFC][PATCH 3/4] IPC: use the target ID specified in procfs Nadia.Derbey
` (5 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Nadia Derbey
[-- Attachment #1: proc_set_next_ids.patch --]
[-- Type: text/plain, Size: 6815 bytes --]
[PATCH 02/04]
This patch proposes the procfs facilities needed to feed the id(s) for the
next task to be forked.
say n is the number of pids to be provided through procfs:
if an
echo "LONG<n> X0 X1 ... X<n-1>" > /proc/self/next_pids
is issued, the next task to be forked will have its upid nrs set as follows
(say it is forked in a pid ns of level L):
level upid nr
L ----------> X0
..
L - i ------> Xi
..
L - n + 1 --> X<n-1>
Then, for levels L-n down to level 0, the pids will be left to the kernel
choice.
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
---
include/linux/sysids.h | 27 ++++++++
kernel/nextid.c | 150 ++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 155 insertions(+), 22 deletions(-)
Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sysids.h 2008-04-04 13:53:04.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h 2008-04-04 14:18:04.000000000 +0200
@@ -8,8 +8,33 @@
#ifndef _LINUX_SYSIDS_H
#define _LINUX_SYSIDS_H
+
+#define NIDS_SMALL 32
+#define NIDS_PER_BLOCK ((unsigned int)(PAGE_SIZE / sizeof(long)))
+
+/* access the ids "array" with this macro */
+#define ID_AT(pi, i) \
+ ((pi)->blocks[(i) / NIDS_PER_BLOCK][(i) % NIDS_PER_BLOCK])
+
+
+/*
+ * List of ids for the next object to be created. This presently applies to
+ * next process to be created.
+ * The next process to be created is associated to a set of upid nrs: one for
+ * each pid namespace level that process belongs to.
+ * upid nrs from level 0 up to level <npids - 1> will be automatically
+ * allocated.
+ * upid nr for level nids will be set to blocks[0][0]
+ * upid nr for level <nids + i> will be set to ID_AT(ids, i);
+ *
+ * If a single id is needed, nids is set to 1 and small_block[0] is set to
+ * that id.
+ */
struct sys_id {
- long id;
+ int nids;
+ long small_block[NIDS_SMALL];
+ int nblocks;
+ long *blocks[0];
};
extern ssize_t get_nextid(struct task_struct *, char *, size_t);
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/nextid.c 2008-04-04 13:59:59.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c 2008-04-04 14:28:13.000000000 +0200
@@ -13,38 +13,138 @@
+static struct sys_id *id_blocks_alloc(int nids)
+{
+ struct sys_id *ids;
+ int nblocks;
+ int i;
+
+ nblocks = (nids + NIDS_PER_BLOCK - 1) / NIDS_PER_BLOCK;
+ BUG_ON(nblocks < 1);
+
+ ids = kmalloc(sizeof(*ids) + nblocks * sizeof(long *), GFP_KERNEL);
+ if (!ids)
+ return NULL;
+ ids->nids = nids;
+ ids->nblocks = nblocks;
+
+ if (nids <= NIDS_SMALL)
+ ids->blocks[0] = ids->small_block;
+ else {
+ for (i = 0; i < nblocks; i++) {
+ long *b;
+ b = (void *)__get_free_page(GFP_KERNEL);
+ if (!b)
+ goto out_undo_partial_alloc;
+ ids->blocks[i] = b;
+ }
+ }
+ return ids;
+
+out_undo_partial_alloc:
+ while (--i >= 0)
+ free_page((unsigned long)ids->blocks[i]);
+
+ kfree(ids);
+ return NULL;
+}
+
+static void id_blocks_free(struct sys_id *ids)
+{
+ if (ids == NULL)
+ return;
+
+ if (ids->blocks[0] != ids->small_block) {
+ int i;
+ for (i = 0; i < ids->nblocks; i++)
+ free_page((unsigned long)ids->blocks[i]);
+ }
+ kfree(ids);
+ return;
+}
+
ssize_t get_nextid(struct task_struct *task, char *buffer, size_t size)
{
+ ssize_t count = 0;
struct sys_id *sid;
+ char *bufptr = buffer;
+ int i;
sid = task->next_id;
- if (!sid)
+ if (!sid || !sid->nids)
return snprintf(buffer, size, "UNSET\n");
- return snprintf(buffer, size, "LONG %ld\n", sid->id);
+ count = sprintf(bufptr, "LONGS (%d) ", sid->nids);
+
+ for (i = 0; i < sid->nids - 1; i++)
+ count += sprintf(&bufptr[count], "%ld ", ID_AT(sid, i));
+
+ count += sprintf(&bufptr[count], "%ld\n", ID_AT(sid, i));
+
+ return count;
}
-static int set_single_id(struct task_struct *task, char *buffer)
+static int fill_nextid_list(struct task_struct *task, int nids, char *buffer)
{
- struct sys_id *sid;
- long next_id;
+ char *token, *buff = buffer;
char *end;
+ struct sys_id *sid;
+ struct sys_id *old_list = task->next_id;
+ int i;
- next_id = simple_strtol(buffer, &end, 0);
- if (end == buffer || (end && !isspace(*end)))
- return -EINVAL;
+ sid = id_blocks_alloc(nids);
+ if (!sid)
+ return -ENOMEM;
- sid = task->next_id;
- if (!sid) {
- sid = kzalloc(sizeof(*sid), GFP_KERNEL);
- if (!sid)
- return -ENOMEM;
- task->next_id = sid;
+ i = 0;
+ while ((token = strsep(&buff, " ")) != NULL && i < nids) {
+ long id;
+
+ if (!*token)
+ goto out_free;
+ id = simple_strtol(token, &end, 0);
+ if (end == token || (*end && !isspace(*end)))
+ goto out_free;
+ ID_AT(sid, i) = id;
+ i++;
}
- sid->id = next_id;
+ if (i != nids)
+ /* Not enough pids compared to npids */
+ goto out_free;
+
+ if (old_list)
+ id_blocks_free(old_list);
+ task->next_id = sid;
return 0;
+
+out_free:
+ id_blocks_free(sid);
+ return -EINVAL;
+}
+
+/*
+ * Parses a line with the following format:
+ * <x> <id0> ... <idx-1>
+ * and sets <id0> to <idx-1> as the sequence of ids to be used for the next
+ * object to be created by the task.
+ * This applies to processes that need 1 id per namespace level.
+ * Any trailing character on the line is skipped.
+ */
+static int set_multiple_ids(struct task_struct *task, char *nb, char *buffer)
+{
+ int nids;
+ char *end;
+
+ nids = simple_strtol(nb, &end, 0);
+ if (*end)
+ return -EINVAL;
+
+ if (nids <= 0)
+ return -EINVAL;
+
+ return fill_nextid_list(task, nids, buffer);
}
int reset_nextid(struct task_struct *task)
@@ -55,8 +155,8 @@ int reset_nextid(struct task_struct *tas
if (!sid)
return 0;
+ id_blocks_free(sid);
task->next_id = NULL;
- kfree(sid);
return 0;
}
@@ -65,12 +165,14 @@ int reset_nextid(struct task_struct *tas
/*
* Parses a line written to /proc/self/next_id.
- * this line has the following format:
+ * this line has one of the following formats:
* LONG id --> a single id is specified
+ * LONG<x> id0 ... id<x-1> --> a sequence of ids is specified
*/
int set_nextid(struct task_struct *task, char *buffer)
{
char *token, *out = buffer;
+ size_t sz;
if (!out)
return -EINVAL;
@@ -78,9 +180,15 @@ int set_nextid(struct task_struct *task,
token = strsep(&out, " ");
if (!strcmp(token, LONG_STR))
- return set_single_id(task, out);
- else if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
+ return fill_nextid_list(task, 1, out);
+
+ sz = strlen(LONG_STR);
+
+ if (!strncmp(token, LONG_STR, sz))
+ return set_multiple_ids(task, token + sz, out);
+
+ if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
return reset_nextid(task);
- else
- return -EINVAL;
+
+ return -EINVAL;
}
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s)
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 1/4] Provide a new procfs interface to set next id Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` Nadia.Derbey
@ 2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
` (6 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl, Nadia Derbey
[-- Attachment #1: proc_set_next_ids.patch --]
[-- Type: text/plain, Size: 6795 bytes --]
[PATCH 02/04]
This patch proposes the procfs facilities needed to feed the id(s) for the
next task to be forked.
say n is the number of pids to be provided through procfs:
if an
echo "LONG<n> X0 X1 ... X<n-1>" > /proc/self/next_pids
is issued, the next task to be forked will have its upid nrs set as follows
(say it is forked in a pid ns of level L):
level upid nr
L ----------> X0
..
L - i ------> Xi
..
L - n + 1 --> X<n-1>
Then, for levels L-n down to level 0, the pids will be left to the kernel
choice.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
---
include/linux/sysids.h | 27 ++++++++
kernel/nextid.c | 150 ++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 155 insertions(+), 22 deletions(-)
Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sysids.h 2008-04-04 13:53:04.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h 2008-04-04 14:18:04.000000000 +0200
@@ -8,8 +8,33 @@
#ifndef _LINUX_SYSIDS_H
#define _LINUX_SYSIDS_H
+
+#define NIDS_SMALL 32
+#define NIDS_PER_BLOCK ((unsigned int)(PAGE_SIZE / sizeof(long)))
+
+/* access the ids "array" with this macro */
+#define ID_AT(pi, i) \
+ ((pi)->blocks[(i) / NIDS_PER_BLOCK][(i) % NIDS_PER_BLOCK])
+
+
+/*
+ * List of ids for the next object to be created. This presently applies to
+ * next process to be created.
+ * The next process to be created is associated to a set of upid nrs: one for
+ * each pid namespace level that process belongs to.
+ * upid nrs from level 0 up to level <npids - 1> will be automatically
+ * allocated.
+ * upid nr for level nids will be set to blocks[0][0]
+ * upid nr for level <nids + i> will be set to ID_AT(ids, i);
+ *
+ * If a single id is needed, nids is set to 1 and small_block[0] is set to
+ * that id.
+ */
struct sys_id {
- long id;
+ int nids;
+ long small_block[NIDS_SMALL];
+ int nblocks;
+ long *blocks[0];
};
extern ssize_t get_nextid(struct task_struct *, char *, size_t);
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/nextid.c 2008-04-04 13:59:59.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c 2008-04-04 14:28:13.000000000 +0200
@@ -13,38 +13,138 @@
+static struct sys_id *id_blocks_alloc(int nids)
+{
+ struct sys_id *ids;
+ int nblocks;
+ int i;
+
+ nblocks = (nids + NIDS_PER_BLOCK - 1) / NIDS_PER_BLOCK;
+ BUG_ON(nblocks < 1);
+
+ ids = kmalloc(sizeof(*ids) + nblocks * sizeof(long *), GFP_KERNEL);
+ if (!ids)
+ return NULL;
+ ids->nids = nids;
+ ids->nblocks = nblocks;
+
+ if (nids <= NIDS_SMALL)
+ ids->blocks[0] = ids->small_block;
+ else {
+ for (i = 0; i < nblocks; i++) {
+ long *b;
+ b = (void *)__get_free_page(GFP_KERNEL);
+ if (!b)
+ goto out_undo_partial_alloc;
+ ids->blocks[i] = b;
+ }
+ }
+ return ids;
+
+out_undo_partial_alloc:
+ while (--i >= 0)
+ free_page((unsigned long)ids->blocks[i]);
+
+ kfree(ids);
+ return NULL;
+}
+
+static void id_blocks_free(struct sys_id *ids)
+{
+ if (ids == NULL)
+ return;
+
+ if (ids->blocks[0] != ids->small_block) {
+ int i;
+ for (i = 0; i < ids->nblocks; i++)
+ free_page((unsigned long)ids->blocks[i]);
+ }
+ kfree(ids);
+ return;
+}
+
ssize_t get_nextid(struct task_struct *task, char *buffer, size_t size)
{
+ ssize_t count = 0;
struct sys_id *sid;
+ char *bufptr = buffer;
+ int i;
sid = task->next_id;
- if (!sid)
+ if (!sid || !sid->nids)
return snprintf(buffer, size, "UNSET\n");
- return snprintf(buffer, size, "LONG %ld\n", sid->id);
+ count = sprintf(bufptr, "LONGS (%d) ", sid->nids);
+
+ for (i = 0; i < sid->nids - 1; i++)
+ count += sprintf(&bufptr[count], "%ld ", ID_AT(sid, i));
+
+ count += sprintf(&bufptr[count], "%ld\n", ID_AT(sid, i));
+
+ return count;
}
-static int set_single_id(struct task_struct *task, char *buffer)
+static int fill_nextid_list(struct task_struct *task, int nids, char *buffer)
{
- struct sys_id *sid;
- long next_id;
+ char *token, *buff = buffer;
char *end;
+ struct sys_id *sid;
+ struct sys_id *old_list = task->next_id;
+ int i;
- next_id = simple_strtol(buffer, &end, 0);
- if (end == buffer || (end && !isspace(*end)))
- return -EINVAL;
+ sid = id_blocks_alloc(nids);
+ if (!sid)
+ return -ENOMEM;
- sid = task->next_id;
- if (!sid) {
- sid = kzalloc(sizeof(*sid), GFP_KERNEL);
- if (!sid)
- return -ENOMEM;
- task->next_id = sid;
+ i = 0;
+ while ((token = strsep(&buff, " ")) != NULL && i < nids) {
+ long id;
+
+ if (!*token)
+ goto out_free;
+ id = simple_strtol(token, &end, 0);
+ if (end == token || (*end && !isspace(*end)))
+ goto out_free;
+ ID_AT(sid, i) = id;
+ i++;
}
- sid->id = next_id;
+ if (i != nids)
+ /* Not enough pids compared to npids */
+ goto out_free;
+
+ if (old_list)
+ id_blocks_free(old_list);
+ task->next_id = sid;
return 0;
+
+out_free:
+ id_blocks_free(sid);
+ return -EINVAL;
+}
+
+/*
+ * Parses a line with the following format:
+ * <x> <id0> ... <idx-1>
+ * and sets <id0> to <idx-1> as the sequence of ids to be used for the next
+ * object to be created by the task.
+ * This applies to processes that need 1 id per namespace level.
+ * Any trailing character on the line is skipped.
+ */
+static int set_multiple_ids(struct task_struct *task, char *nb, char *buffer)
+{
+ int nids;
+ char *end;
+
+ nids = simple_strtol(nb, &end, 0);
+ if (*end)
+ return -EINVAL;
+
+ if (nids <= 0)
+ return -EINVAL;
+
+ return fill_nextid_list(task, nids, buffer);
}
int reset_nextid(struct task_struct *task)
@@ -55,8 +155,8 @@ int reset_nextid(struct task_struct *tas
if (!sid)
return 0;
+ id_blocks_free(sid);
task->next_id = NULL;
- kfree(sid);
return 0;
}
@@ -65,12 +165,14 @@ int reset_nextid(struct task_struct *tas
/*
* Parses a line written to /proc/self/next_id.
- * this line has the following format:
+ * this line has one of the following formats:
* LONG id --> a single id is specified
+ * LONG<x> id0 ... id<x-1> --> a sequence of ids is specified
*/
int set_nextid(struct task_struct *task, char *buffer)
{
char *token, *out = buffer;
+ size_t sz;
if (!out)
return -EINVAL;
@@ -78,9 +180,15 @@ int set_nextid(struct task_struct *task,
token = strsep(&out, " ");
if (!strcmp(token, LONG_STR))
- return set_single_id(task, out);
- else if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
+ return fill_nextid_list(task, 1, out);
+
+ sz = strlen(LONG_STR);
+
+ if (!strncmp(token, LONG_STR, sz))
+ return set_multiple_ids(task, token + sz, out);
+
+ if (!strncmp(token, RESET_STR, strlen(RESET_STR)))
return reset_nextid(task);
- else
- return -EINVAL;
+
+ return -EINVAL;
}
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 3/4] IPC: use the target ID specified in procfs
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (4 preceding siblings ...)
2008-04-04 14:51 ` [RFC][PATCH 3/4] IPC: use the target ID specified in procfs Nadia.Derbey
@ 2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` [RFC][PATCH 4/4] PID: " Nadia.Derbey-6ktuUTfB/bM
` (3 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Nadia Derbey
[-- Attachment #1: ipc_use_next_id.patch --]
[-- Type: text/plain, Size: 3367 bytes --]
[PATCH 03/04]
This patch makes use of the target id specified by a previous write into
/proc/self/next_id as the id to use to allocate the next IPC object.
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
---
include/linux/sysids.h | 7 +++++++
ipc/util.c | 40 ++++++++++++++++++++++++++++++++--------
kernel/nextid.c | 2 +-
3 files changed, 40 insertions(+), 9 deletions(-)
Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sysids.h 2008-04-04 14:18:04.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h 2008-04-04 14:37:45.000000000 +0200
@@ -37,9 +37,16 @@ struct sys_id {
long *blocks[0];
};
+#define next_ipcid(tsk) ((tsk)->next_id \
+ ? ((tsk)->next_id->nids \
+ ? ID_AT((tsk)->next_id, 0) \
+ : -1) \
+ : -1)
+
extern ssize_t get_nextid(struct task_struct *, char *, size_t);
extern int set_nextid(struct task_struct *, char *);
extern int reset_nextid(struct task_struct *);
+extern void id_blocks_free(struct sys_id *);
static inline void exit_nextid(struct task_struct *tsk)
{
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/nextid.c 2008-04-04 14:28:13.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c 2008-04-04 14:38:38.000000000 +0200
@@ -49,7 +49,7 @@ out_undo_partial_alloc:
return NULL;
}
-static void id_blocks_free(struct sys_id *ids)
+void id_blocks_free(struct sys_id *ids)
{
if (ids == NULL)
return;
Index: linux-2.6.25-rc8-mm1/ipc/util.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/ipc/util.c 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/ipc/util.c 2008-04-04 14:41:53.000000000 +0200
@@ -260,6 +260,7 @@ int ipc_get_maxid(struct ipc_ids *ids)
int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
{
int id, err;
+ int next_id;
if (size > IPCMNI)
size = IPCMNI;
@@ -267,20 +268,43 @@ int ipc_addid(struct ipc_ids* ids, struc
if (ids->in_use >= size)
return -ENOSPC;
- err = idr_get_new(&ids->ipcs_idr, new, &id);
- if (err)
- return err;
+ next_id = next_ipcid(current);
+ if (next_id >= 0) {
+ /* There is a target id specified, try to use it */
+ int new_lid = next_id % SEQ_MULTIPLIER;
+
+ if (next_id !=
+ (new_lid + (next_id / SEQ_MULTIPLIER) * SEQ_MULTIPLIER))
+ return -EINVAL;
+
+ err = idr_get_new_above(&ids->ipcs_idr, new, new_lid, &id);
+ if (err)
+ return err;
+ if (id != new_lid) {
+ idr_remove(&ids->ipcs_idr, id);
+ return -EBUSY;
+ }
+
+ new->id = next_id;
+ new->seq = next_id / SEQ_MULTIPLIER;
+ id_blocks_free(current->next_id);
+ current->next_id = NULL;
+ } else {
+ err = idr_get_new(&ids->ipcs_idr, new, &id);
+ if (err)
+ return err;
+
+ new->seq = ids->seq++;
+ if (ids->seq > ids->seq_max)
+ ids->seq = 0;
+ new->id = ipc_buildid(id, new->seq);
+ }
ids->in_use++;
new->cuid = new->uid = current->euid;
new->gid = new->cgid = current->egid;
- new->seq = ids->seq++;
- if(ids->seq > ids->seq_max)
- ids->seq = 0;
-
- new->id = ipc_buildid(id, new->seq);
spin_lock_init(&new->lock);
new->deleted = 0;
rcu_read_lock();
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 3/4] IPC: use the target ID specified in procfs
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (3 preceding siblings ...)
2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
@ 2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
` (4 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl, Nadia Derbey
[-- Attachment #1: ipc_use_next_id.patch --]
[-- Type: text/plain, Size: 3347 bytes --]
[PATCH 03/04]
This patch makes use of the target id specified by a previous write into
/proc/self/next_id as the id to use to allocate the next IPC object.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
---
include/linux/sysids.h | 7 +++++++
ipc/util.c | 40 ++++++++++++++++++++++++++++++++--------
kernel/nextid.c | 2 +-
3 files changed, 40 insertions(+), 9 deletions(-)
Index: linux-2.6.25-rc8-mm1/include/linux/sysids.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/sysids.h 2008-04-04 14:18:04.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/sysids.h 2008-04-04 14:37:45.000000000 +0200
@@ -37,9 +37,16 @@ struct sys_id {
long *blocks[0];
};
+#define next_ipcid(tsk) ((tsk)->next_id \
+ ? ((tsk)->next_id->nids \
+ ? ID_AT((tsk)->next_id, 0) \
+ : -1) \
+ : -1)
+
extern ssize_t get_nextid(struct task_struct *, char *, size_t);
extern int set_nextid(struct task_struct *, char *);
extern int reset_nextid(struct task_struct *);
+extern void id_blocks_free(struct sys_id *);
static inline void exit_nextid(struct task_struct *tsk)
{
Index: linux-2.6.25-rc8-mm1/kernel/nextid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/nextid.c 2008-04-04 14:28:13.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/nextid.c 2008-04-04 14:38:38.000000000 +0200
@@ -49,7 +49,7 @@ out_undo_partial_alloc:
return NULL;
}
-static void id_blocks_free(struct sys_id *ids)
+void id_blocks_free(struct sys_id *ids)
{
if (ids == NULL)
return;
Index: linux-2.6.25-rc8-mm1/ipc/util.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/ipc/util.c 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/ipc/util.c 2008-04-04 14:41:53.000000000 +0200
@@ -260,6 +260,7 @@ int ipc_get_maxid(struct ipc_ids *ids)
int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
{
int id, err;
+ int next_id;
if (size > IPCMNI)
size = IPCMNI;
@@ -267,20 +268,43 @@ int ipc_addid(struct ipc_ids* ids, struc
if (ids->in_use >= size)
return -ENOSPC;
- err = idr_get_new(&ids->ipcs_idr, new, &id);
- if (err)
- return err;
+ next_id = next_ipcid(current);
+ if (next_id >= 0) {
+ /* There is a target id specified, try to use it */
+ int new_lid = next_id % SEQ_MULTIPLIER;
+
+ if (next_id !=
+ (new_lid + (next_id / SEQ_MULTIPLIER) * SEQ_MULTIPLIER))
+ return -EINVAL;
+
+ err = idr_get_new_above(&ids->ipcs_idr, new, new_lid, &id);
+ if (err)
+ return err;
+ if (id != new_lid) {
+ idr_remove(&ids->ipcs_idr, id);
+ return -EBUSY;
+ }
+
+ new->id = next_id;
+ new->seq = next_id / SEQ_MULTIPLIER;
+ id_blocks_free(current->next_id);
+ current->next_id = NULL;
+ } else {
+ err = idr_get_new(&ids->ipcs_idr, new, &id);
+ if (err)
+ return err;
+
+ new->seq = ids->seq++;
+ if (ids->seq > ids->seq_max)
+ ids->seq = 0;
+ new->id = ipc_buildid(id, new->seq);
+ }
ids->in_use++;
new->cuid = new->uid = current->euid;
new->gid = new->cgid = current->egid;
- new->seq = ids->seq++;
- if(ids->seq > ids->seq_max)
- ids->seq = 0;
-
- new->id = ipc_buildid(id, new->seq);
spin_lock_init(&new->lock);
new->deleted = 0;
rcu_read_lock();
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 4/4] PID: use the target ID specified in procfs
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (5 preceding siblings ...)
2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
@ 2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` Nadia.Derbey
` (2 subsequent siblings)
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Nadia Derbey
[-- Attachment #1: upidnr_use_next_id.patch --]
[-- Type: text/plain, Size: 6406 bytes --]
[PATCH 04/04]
This patch makes use of the target ids specified by a previous write to
/proc/self/next_id as the ids to use to allocate the next upid nrs.
Upper levels upid nrs that are not specified in next_pids file are left to the
kernel choice.
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
---
include/linux/pid.h | 2
kernel/fork.c | 3 -
kernel/pid.c | 141 +++++++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 126 insertions(+), 20 deletions(-)
Index: linux-2.6.25-rc8-mm1/include/linux/pid.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/pid.h 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/pid.h 2008-04-04 14:54:09.000000000 +0200
@@ -121,7 +121,7 @@ extern struct pid *find_get_pid(int nr);
extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
int next_pidmap(struct pid_namespace *pid_ns, int last);
-extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern struct pid *alloc_pid(struct pid_namespace *ns, int *retval);
extern void free_pid(struct pid *pid);
/*
Index: linux-2.6.25-rc8-mm1/kernel/fork.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/fork.c 2008-04-04 14:00:35.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/fork.c 2008-04-04 14:54:43.000000000 +0200
@@ -1200,8 +1200,7 @@ static struct task_struct *copy_process(
goto bad_fork_cleanup_io;
if (pid != &init_struct_pid) {
- retval = -ENOMEM;
- pid = alloc_pid(task_active_pid_ns(p));
+ pid = alloc_pid(task_active_pid_ns(p), &retval);
if (!pid)
goto bad_fork_cleanup_io;
Index: linux-2.6.25-rc8-mm1/kernel/pid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/pid.c 2008-04-04 13:11:39.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/pid.c 2008-04-04 14:59:24.000000000 +0200
@@ -122,6 +122,26 @@ static void free_pidmap(struct upid *upi
atomic_inc(&map->nr_free);
}
+static inline int alloc_pidmap_page(struct pidmap *map)
+{
+ if (unlikely(!map->page)) {
+ void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ /*
+ * Free the page if someone raced with us
+ * installing it:
+ */
+ spin_lock_irq(&pidmap_lock);
+ if (map->page)
+ kfree(page);
+ else
+ map->page = page;
+ spin_unlock_irq(&pidmap_lock);
+ if (unlikely(!map->page))
+ return -1;
+ }
+ return 0;
+}
+
static int alloc_pidmap(struct pid_namespace *pid_ns)
{
int i, offset, max_scan, pid, last = pid_ns->last_pid;
@@ -134,21 +154,8 @@ static int alloc_pidmap(struct pid_names
map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
for (i = 0; i <= max_scan; ++i) {
- if (unlikely(!map->page)) {
- void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
- /*
- * Free the page if someone raced with us
- * installing it:
- */
- spin_lock_irq(&pidmap_lock);
- if (map->page)
- kfree(page);
- else
- map->page = page;
- spin_unlock_irq(&pidmap_lock);
- if (unlikely(!map->page))
- break;
- }
+ if (unlikely(alloc_pidmap_page(map)))
+ break;
if (likely(atomic_read(&map->nr_free))) {
do {
if (!test_and_set_bit(offset, map->page)) {
@@ -182,6 +189,35 @@ static int alloc_pidmap(struct pid_names
return -1;
}
+/*
+ * Return a predefined pid value if successful (ID_AT(pid_l, level)),
+ * -errno else
+ */
+static int alloc_fixed_pidmap(struct pid_namespace *pid_ns,
+ struct sys_id *pid_l, int level)
+{
+ int offset, pid;
+ struct pidmap *map;
+
+ pid = ID_AT(pid_l, level);
+ if (pid < RESERVED_PIDS || pid >= pid_max)
+ return -EINVAL;
+
+ map = &pid_ns->pidmap[pid / BITS_PER_PAGE];
+
+ if (unlikely(alloc_pidmap_page(map)))
+ return -ENOMEM;
+
+ offset = pid & BITS_PER_PAGE_MASK;
+ if (test_and_set_bit(offset, map->page))
+ return -EBUSY;
+
+ atomic_dec(&map->nr_free);
+ pid_ns->last_pid = max(pid_ns->last_pid, pid);
+
+ return pid;
+}
+
int next_pidmap(struct pid_namespace *pid_ns, int last)
{
int offset;
@@ -243,20 +279,91 @@ void free_pid(struct pid *pid)
call_rcu(&pid->rcu, delayed_put_pid);
}
-struct pid *alloc_pid(struct pid_namespace *ns)
+/*
+ * Called by alloc_pid() to use a list of predefined ids for the calling
+ * process' upper ns levels.
+ * Returns next pid ns to visit if successful (may be NULL if walked through
+ * the entire pid ns hierarchy).
+ * i is filled with next level to be visited (useful for the error cases).
+ */
+static struct pid_namespace *set_predefined_pids(struct pid_namespace *ns,
+ struct pid *pid,
+ struct sys_id *pid_l,
+ int *next_level)
+{
+ struct pid_namespace *tmp;
+ int rel_level, i, nr;
+
+ rel_level = pid_l->nids - 1;
+ if (rel_level > ns->level)
+ return ERR_PTR(-EINVAL);
+
+ tmp = ns;
+
+ /*
+ * Use the predefined upid nrs for levels ns->level down to
+ * ns->level - rel_level
+ */
+ for (i = ns->level ; rel_level >= 0; i--, rel_level--) {
+ nr = alloc_fixed_pidmap(tmp, pid_l, rel_level);
+ if (nr < 0) {
+ tmp = ERR_PTR(nr);
+ goto out;
+ }
+
+ pid->numbers[i].nr = nr;
+ pid->numbers[i].ns = tmp;
+ tmp = tmp->parent;
+ }
+
+ id_blocks_free(pid_l);
+out:
+ *next_level = i;
+ return tmp;
+}
+
+struct pid *alloc_pid(struct pid_namespace *ns, int *retval)
{
struct pid *pid;
enum pid_type type;
int i, nr;
struct pid_namespace *tmp;
struct upid *upid;
+ struct sys_id *pid_l;
+ *retval = -ENOMEM;
pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
if (!pid)
goto out;
tmp = ns;
- for (i = ns->level; i >= 0; i--) {
+ i = ns->level;
+
+ /*
+ * If there is a list of upid nrs specified, use it instead of letting
+ * the kernel chose the upid nrs for us.
+ */
+ pid_l = current->next_id;
+ if (pid_l && pid_l->nids) {
+ /*
+ * returns the next ns to be visited in the following loop
+ * (or NULL if we are done).
+ * i is filled in with the next level to be visited. We need
+ * it to undo things in the error cases.
+ */
+ tmp = set_predefined_pids(ns, pid, pid_l, &i);
+ if (IS_ERR(tmp)) {
+ *retval = PTR_ERR(tmp);
+ goto out_free;
+ }
+ current->next_id = NULL;
+ }
+
+ *retval = -ENOMEM;
+ /*
+ * Let the lower levels upid nrs be automatically allocated
+ */
+ for ( ; i >= 0; i--) {
nr = alloc_pidmap(tmp);
if (nr < 0)
goto out_free;
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC][PATCH 4/4] PID: use the target ID specified in procfs
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (6 preceding siblings ...)
2008-04-04 14:51 ` [RFC][PATCH 4/4] PID: " Nadia.Derbey-6ktuUTfB/bM
@ 2008-04-04 14:51 ` Nadia.Derbey
2008-04-15 3:06 ` [RFC][PATCH 0/4] Object creation with a specified id Nick Andrew
[not found] ` <20080404145129.637145000-6ktuUTfB/bM@public.gmane.org>
9 siblings, 0 replies; 31+ messages in thread
From: Nadia.Derbey @ 2008-04-04 14:51 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, orenl, Nadia Derbey
[-- Attachment #1: upidnr_use_next_id.patch --]
[-- Type: text/plain, Size: 6386 bytes --]
[PATCH 04/04]
This patch makes use of the target ids specified by a previous write to
/proc/self/next_id as the ids to use to allocate the next upid nrs.
Upper levels upid nrs that are not specified in next_pids file are left to the
kernel choice.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
---
include/linux/pid.h | 2
kernel/fork.c | 3 -
kernel/pid.c | 141 +++++++++++++++++++++++++++++++++++++++++++++-------
3 files changed, 126 insertions(+), 20 deletions(-)
Index: linux-2.6.25-rc8-mm1/include/linux/pid.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/pid.h 2008-04-04 13:11:37.000000000 +0200
+++ linux-2.6.25-rc8-mm1/include/linux/pid.h 2008-04-04 14:54:09.000000000 +0200
@@ -121,7 +121,7 @@ extern struct pid *find_get_pid(int nr);
extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
int next_pidmap(struct pid_namespace *pid_ns, int last);
-extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern struct pid *alloc_pid(struct pid_namespace *ns, int *retval);
extern void free_pid(struct pid *pid);
/*
Index: linux-2.6.25-rc8-mm1/kernel/fork.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/fork.c 2008-04-04 14:00:35.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/fork.c 2008-04-04 14:54:43.000000000 +0200
@@ -1200,8 +1200,7 @@ static struct task_struct *copy_process(
goto bad_fork_cleanup_io;
if (pid != &init_struct_pid) {
- retval = -ENOMEM;
- pid = alloc_pid(task_active_pid_ns(p));
+ pid = alloc_pid(task_active_pid_ns(p), &retval);
if (!pid)
goto bad_fork_cleanup_io;
Index: linux-2.6.25-rc8-mm1/kernel/pid.c
===================================================================
--- linux-2.6.25-rc8-mm1.orig/kernel/pid.c 2008-04-04 13:11:39.000000000 +0200
+++ linux-2.6.25-rc8-mm1/kernel/pid.c 2008-04-04 14:59:24.000000000 +0200
@@ -122,6 +122,26 @@ static void free_pidmap(struct upid *upi
atomic_inc(&map->nr_free);
}
+static inline int alloc_pidmap_page(struct pidmap *map)
+{
+ if (unlikely(!map->page)) {
+ void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ /*
+ * Free the page if someone raced with us
+ * installing it:
+ */
+ spin_lock_irq(&pidmap_lock);
+ if (map->page)
+ kfree(page);
+ else
+ map->page = page;
+ spin_unlock_irq(&pidmap_lock);
+ if (unlikely(!map->page))
+ return -1;
+ }
+ return 0;
+}
+
static int alloc_pidmap(struct pid_namespace *pid_ns)
{
int i, offset, max_scan, pid, last = pid_ns->last_pid;
@@ -134,21 +154,8 @@ static int alloc_pidmap(struct pid_names
map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
for (i = 0; i <= max_scan; ++i) {
- if (unlikely(!map->page)) {
- void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
- /*
- * Free the page if someone raced with us
- * installing it:
- */
- spin_lock_irq(&pidmap_lock);
- if (map->page)
- kfree(page);
- else
- map->page = page;
- spin_unlock_irq(&pidmap_lock);
- if (unlikely(!map->page))
- break;
- }
+ if (unlikely(alloc_pidmap_page(map)))
+ break;
if (likely(atomic_read(&map->nr_free))) {
do {
if (!test_and_set_bit(offset, map->page)) {
@@ -182,6 +189,35 @@ static int alloc_pidmap(struct pid_names
return -1;
}
+/*
+ * Return a predefined pid value if successful (ID_AT(pid_l, level)),
+ * -errno else
+ */
+static int alloc_fixed_pidmap(struct pid_namespace *pid_ns,
+ struct sys_id *pid_l, int level)
+{
+ int offset, pid;
+ struct pidmap *map;
+
+ pid = ID_AT(pid_l, level);
+ if (pid < RESERVED_PIDS || pid >= pid_max)
+ return -EINVAL;
+
+ map = &pid_ns->pidmap[pid / BITS_PER_PAGE];
+
+ if (unlikely(alloc_pidmap_page(map)))
+ return -ENOMEM;
+
+ offset = pid & BITS_PER_PAGE_MASK;
+ if (test_and_set_bit(offset, map->page))
+ return -EBUSY;
+
+ atomic_dec(&map->nr_free);
+ pid_ns->last_pid = max(pid_ns->last_pid, pid);
+
+ return pid;
+}
+
int next_pidmap(struct pid_namespace *pid_ns, int last)
{
int offset;
@@ -243,20 +279,91 @@ void free_pid(struct pid *pid)
call_rcu(&pid->rcu, delayed_put_pid);
}
-struct pid *alloc_pid(struct pid_namespace *ns)
+/*
+ * Called by alloc_pid() to use a list of predefined ids for the calling
+ * process' upper ns levels.
+ * Returns next pid ns to visit if successful (may be NULL if walked through
+ * the entire pid ns hierarchy).
+ * i is filled with next level to be visited (useful for the error cases).
+ */
+static struct pid_namespace *set_predefined_pids(struct pid_namespace *ns,
+ struct pid *pid,
+ struct sys_id *pid_l,
+ int *next_level)
+{
+ struct pid_namespace *tmp;
+ int rel_level, i, nr;
+
+ rel_level = pid_l->nids - 1;
+ if (rel_level > ns->level)
+ return ERR_PTR(-EINVAL);
+
+ tmp = ns;
+
+ /*
+ * Use the predefined upid nrs for levels ns->level down to
+ * ns->level - rel_level
+ */
+ for (i = ns->level ; rel_level >= 0; i--, rel_level--) {
+ nr = alloc_fixed_pidmap(tmp, pid_l, rel_level);
+ if (nr < 0) {
+ tmp = ERR_PTR(nr);
+ goto out;
+ }
+
+ pid->numbers[i].nr = nr;
+ pid->numbers[i].ns = tmp;
+ tmp = tmp->parent;
+ }
+
+ id_blocks_free(pid_l);
+out:
+ *next_level = i;
+ return tmp;
+}
+
+struct pid *alloc_pid(struct pid_namespace *ns, int *retval)
{
struct pid *pid;
enum pid_type type;
int i, nr;
struct pid_namespace *tmp;
struct upid *upid;
+ struct sys_id *pid_l;
+ *retval = -ENOMEM;
pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
if (!pid)
goto out;
tmp = ns;
- for (i = ns->level; i >= 0; i--) {
+ i = ns->level;
+
+ /*
+ * If there is a list of upid nrs specified, use it instead of letting
+ * the kernel chose the upid nrs for us.
+ */
+ pid_l = current->next_id;
+ if (pid_l && pid_l->nids) {
+ /*
+ * returns the next ns to be visited in the following loop
+ * (or NULL if we are done).
+ * i is filled in with the next level to be visited. We need
+ * it to undo things in the error cases.
+ */
+ tmp = set_predefined_pids(ns, pid, pid_l, &i);
+ if (IS_ERR(tmp)) {
+ *retval = PTR_ERR(tmp);
+ goto out_free;
+ }
+ current->next_id = NULL;
+ }
+
+ *retval = -ENOMEM;
+ /*
+ * Let the lower levels upid nrs be automatically allocated
+ */
+ for ( ; i >= 0; i--) {
nr = alloc_pidmap(tmp);
if (nr < 0)
goto out_free;
--
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080404145129.637145000-6ktuUTfB/bM@public.gmane.org>
@ 2008-04-15 3:06 ` Nick Andrew
0 siblings, 0 replies; 31+ messages in thread
From: Nick Andrew @ 2008-04-15 3:06 UTC (permalink / raw)
To: Nadia.Derbey-6ktuUTfB/bM
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
> . echo "LONG XX" > /proc/self/next_id
> next object to be created will have an id set to XX
> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
> next object to be created will have its ids set to XX0, ... X<n-1>
> This is particularly useful for processes that may have several ids if
> they belong to nested namespaces.
How do you handle race conditions, i.e. you specify the ID for the
next object to be created, and then some other thread goes and creates
an object before your thread creates one?
Nick.
--
PGP Key ID = 0x418487E7 http://www.nick-andrew.net/
PGP Key fingerprint = B3ED 6894 8E49 1770 C24A 67E3 6266 6EB9 4184 87E7
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
` (7 preceding siblings ...)
2008-04-04 14:51 ` Nadia.Derbey
@ 2008-04-15 3:06 ` Nick Andrew
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
` (2 more replies)
[not found] ` <20080404145129.637145000-6ktuUTfB/bM@public.gmane.org>
9 siblings, 3 replies; 31+ messages in thread
From: Nick Andrew @ 2008-04-15 3:06 UTC (permalink / raw)
To: Nadia.Derbey; +Cc: linux-kernel, containers, orenl
On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey@bull.net wrote:
> . echo "LONG XX" > /proc/self/next_id
> next object to be created will have an id set to XX
> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
> next object to be created will have its ids set to XX0, ... X<n-1>
> This is particularly useful for processes that may have several ids if
> they belong to nested namespaces.
How do you handle race conditions, i.e. you specify the ID for the
next object to be created, and then some other thread goes and creates
an object before your thread creates one?
Nick.
--
PGP Key ID = 0x418487E7 http://www.nick-andrew.net/
PGP Key fingerprint = B3ED 6894 8E49 1770 C24A 67E3 6266 6EB9 4184 87E7
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
@ 2008-04-15 10:30 ` Nadia Derbey
2008-04-18 5:46 ` Nadia Derbey
1 sibling, 0 replies; 31+ messages in thread
From: Nadia Derbey @ 2008-04-15 10:30 UTC (permalink / raw)
To: Nick Andrew
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Nick Andrew wrote:
> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>
>> . echo "LONG XX" > /proc/self/next_id
>> next object to be created will have an id set to XX
>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>> next object to be created will have its ids set to XX0, ... X<n-1>
>> This is particularly useful for processes that may have several ids if
>> they belong to nested namespaces.
>
>
> How do you handle race conditions, i.e. you specify the ID for the
> next object to be created, and then some other thread goes and creates
> an object before your thread creates one?
>
> Nick.
Sorry for not answering earlier, I just saw your e-mail!
It's true that the way I've done things, the "create_with_id" doesn't
take into account multi-threaded apps, since "self" is related to the
thread group leader.
May be using something like /proc/self/task/<my_tid>/next_id would be
better, but I have to think more about it...
Regards,
Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
2008-04-15 3:06 ` [RFC][PATCH 0/4] Object creation with a specified id Nick Andrew
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
@ 2008-04-15 10:30 ` Nadia Derbey
[not found] ` <480483C2.3030509-6ktuUTfB/bM@public.gmane.org>
2008-04-18 5:46 ` Nadia Derbey
2 siblings, 1 reply; 31+ messages in thread
From: Nadia Derbey @ 2008-04-15 10:30 UTC (permalink / raw)
To: Nick Andrew; +Cc: linux-kernel, containers, orenl
Nick Andrew wrote:
> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey@bull.net wrote:
>
>> . echo "LONG XX" > /proc/self/next_id
>> next object to be created will have an id set to XX
>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>> next object to be created will have its ids set to XX0, ... X<n-1>
>> This is particularly useful for processes that may have several ids if
>> they belong to nested namespaces.
>
>
> How do you handle race conditions, i.e. you specify the ID for the
> next object to be created, and then some other thread goes and creates
> an object before your thread creates one?
>
> Nick.
Sorry for not answering earlier, I just saw your e-mail!
It's true that the way I've done things, the "create_with_id" doesn't
take into account multi-threaded apps, since "self" is related to the
thread group leader.
May be using something like /proc/self/task/<my_tid>/next_id would be
better, but I have to think more about it...
Regards,
Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
2008-04-15 10:30 ` Nadia Derbey
@ 2008-04-15 18:52 ` Oren Laadan
0 siblings, 0 replies; 31+ messages in thread
From: Oren Laadan @ 2008-04-15 18:52 UTC (permalink / raw)
To: Nadia Derbey
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Nick Andrew, linux-kernel-u79uwXL29TY76Z2rM5mHXA
Nadia Derbey wrote:
> Nick Andrew wrote:
>> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>>
>>> . echo "LONG XX" > /proc/self/next_id
>>> next object to be created will have an id set to XX
>>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>>> next object to be created will have its ids set to XX0, ... X<n-1>
>>> This is particularly useful for processes that may have several
>>> ids if
>>> they belong to nested namespaces.
>>
>>
>> How do you handle race conditions, i.e. you specify the ID for the
>> next object to be created, and then some other thread goes and creates
>> an object before your thread creates one?
>>
>> Nick.
>
>
> Sorry for not answering earlier, I just saw your e-mail!
[I too managed to miss that message].
>
> It's true that the way I've done things, the "create_with_id" doesn't
> take into account multi-threaded apps, since "self" is related to the
> thread group leader.
>
> May be using something like /proc/self/task/<my_tid>/next_id would be
> better, but I have to think more about it...
That /proc/self links to /proc/TGID slipped my mind. Definitely must
be done on a per-thread basis (and /proc/<TGID>/task/<PID>/next_id
will do the trick).
Oren.
>
> Regards,
> Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
@ 2008-04-15 18:52 ` Oren Laadan
0 siblings, 0 replies; 31+ messages in thread
From: Oren Laadan @ 2008-04-15 18:52 UTC (permalink / raw)
To: Nadia Derbey; +Cc: Nick Andrew, linux-kernel, containers
Nadia Derbey wrote:
> Nick Andrew wrote:
>> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey@bull.net wrote:
>>
>>> . echo "LONG XX" > /proc/self/next_id
>>> next object to be created will have an id set to XX
>>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>>> next object to be created will have its ids set to XX0, ... X<n-1>
>>> This is particularly useful for processes that may have several
>>> ids if
>>> they belong to nested namespaces.
>>
>>
>> How do you handle race conditions, i.e. you specify the ID for the
>> next object to be created, and then some other thread goes and creates
>> an object before your thread creates one?
>>
>> Nick.
>
>
> Sorry for not answering earlier, I just saw your e-mail!
[I too managed to miss that message].
>
> It's true that the way I've done things, the "create_with_id" doesn't
> take into account multi-threaded apps, since "self" is related to the
> thread group leader.
>
> May be using something like /proc/self/task/<my_tid>/next_id would be
> better, but I have to think more about it...
That /proc/self links to /proc/TGID slipped my mind. Definitely must
be done on a per-thread basis (and /proc/<TGID>/task/<PID>/next_id
will do the trick).
Oren.
>
> Regards,
> Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
2008-04-15 10:30 ` Nadia Derbey
@ 2008-04-18 5:46 ` Nadia Derbey
1 sibling, 0 replies; 31+ messages in thread
From: Nadia Derbey @ 2008-04-18 5:46 UTC (permalink / raw)
To: Nick Andrew
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Nick Andrew wrote:
> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org wrote:
>
>> . echo "LONG XX" > /proc/self/next_id
>> next object to be created will have an id set to XX
>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>> next object to be created will have its ids set to XX0, ... X<n-1>
>> This is particularly useful for processes that may have several ids if
>> they belong to nested namespaces.
>
>
> How do you handle race conditions, i.e. you specify the ID for the
> next object to be created, and then some other thread goes and creates
> an object before your thread creates one?
>
> Nick.
OK, race problem between threads is fixed. Thanks for finding the issue!
The new patch series is coming next.
Regards,
Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC][PATCH 0/4] Object creation with a specified id
2008-04-15 3:06 ` [RFC][PATCH 0/4] Object creation with a specified id Nick Andrew
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
2008-04-15 10:30 ` Nadia Derbey
@ 2008-04-18 5:46 ` Nadia Derbey
2 siblings, 0 replies; 31+ messages in thread
From: Nadia Derbey @ 2008-04-18 5:46 UTC (permalink / raw)
To: Nick Andrew; +Cc: linux-kernel, containers, orenl
Nick Andrew wrote:
> On Fri, Apr 04, 2008 at 04:51:29PM +0200, Nadia.Derbey@bull.net wrote:
>
>> . echo "LONG XX" > /proc/self/next_id
>> next object to be created will have an id set to XX
>> . echo "LONG<n> X0 ... X<n-1>" > /proc/self/next_id
>> next object to be created will have its ids set to XX0, ... X<n-1>
>> This is particularly useful for processes that may have several ids if
>> they belong to nested namespaces.
>
>
> How do you handle race conditions, i.e. you specify the ID for the
> next object to be created, and then some other thread goes and creates
> an object before your thread creates one?
>
> Nick.
OK, race problem between threads is fixed. Thanks for finding the issue!
The new patch series is coming next.
Regards,
Nadia
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2008-04-18 5:46 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-04 14:51 [RFC][PATCH 0/4] Object creation with a specified id Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 1/4] Provide a new procfs interface to set next id Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` Nadia.Derbey
2008-04-04 14:51 ` [RFC][PATCH 2/4] Provide a new procfs interface to set next upid nr(s) Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` [RFC][PATCH 3/4] IPC: use the target ID specified in procfs Nadia.Derbey
2008-04-04 14:51 ` Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` [RFC][PATCH 4/4] PID: " Nadia.Derbey-6ktuUTfB/bM
2008-04-04 14:51 ` Nadia.Derbey
2008-04-15 3:06 ` [RFC][PATCH 0/4] Object creation with a specified id Nick Andrew
[not found] ` <20080415030623.GA8171-ZRFfYzONFVA@public.gmane.org>
2008-04-15 10:30 ` Nadia Derbey
2008-04-18 5:46 ` Nadia Derbey
2008-04-15 10:30 ` Nadia Derbey
[not found] ` <480483C2.3030509-6ktuUTfB/bM@public.gmane.org>
2008-04-15 18:52 ` Oren Laadan
2008-04-15 18:52 ` Oren Laadan
2008-04-18 5:46 ` Nadia Derbey
[not found] ` <20080404145129.637145000-6ktuUTfB/bM@public.gmane.org>
2008-04-15 3:06 ` Nick Andrew
-- strict thread matches above, loose matches on Subject: below --
2008-04-04 14:51 Nadia.Derbey-6ktuUTfB/bM
2008-03-10 13:50 Nadia.Derbey-6ktuUTfB/bM
[not found] ` <20080310135054.312992000-6ktuUTfB/bM@public.gmane.org>
2008-03-13 23:16 ` Oren Laadan
[not found] ` <47D9B5B7.6060803-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 6:21 ` Nadia Derbey
[not found] ` <47DA195B.8070704-6ktuUTfB/bM@public.gmane.org>
2008-03-14 15:50 ` Oren Laadan
[not found] ` <47DA9EB5.8040704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 15:56 ` Pavel Emelyanov
[not found] ` <47DAA041.9090009-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-03-14 16:02 ` Oren Laadan
[not found] ` <47DAA1A6.6010509-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-14 16:08 ` Pavel Emelyanov
2008-03-14 16:11 ` Nadia Derbey
2008-03-14 16:11 ` Nadia Derbey
[not found] ` <47DAA3AA.4050906-6ktuUTfB/bM@public.gmane.org>
2008-03-14 16:45 ` Oren Laadan
[not found] ` <47DAABAB.7000706-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-16 3:43 ` Serge E. Hallyn
[not found] ` <20080316034320.GA19793-6s5zFf/epYLPQpwDFJZrxFMas7LaWZ9n@public.gmane.org>
2008-03-16 19:08 ` Oren Laadan
[not found] ` <47DD703C.4030809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-03-17 14:44 ` Serge E. Hallyn
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.