* Re: [PATCH 0/16] Pid namespaces
[not found] ` <20070712031937.GB11489-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-07-16 8:47 ` Pavel Emelianov
[not found] ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Pavel Emelianov @ 2007-07-16 8:47 UTC (permalink / raw)
To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Linux Containers
> My x86_64 system boots fine but crashes as below, when I run my
> 'pidns_exec' test with a simple program that prints getpid(), getppid()
> etc of the process in the child pid ns.
>
> Pls see
>
> http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
>
> for the patches I currently have applied and let me know if I need more
> on top.
>
> And see
>
> http://www.geocities.com/sukadevb/Pidspace/Test1/
>
> for the test programs. You may need to run the 'mypid-loop.x' script
> to repro the crash. The pidns_exec.c program calls clone() with CLONE_NEWPID
> and execs the given program (it was included in Patch 0 of the patchset I
> posted to Containers).
>
> Suka
>
> login: Unable to handle kernel NULL pointer dereference at 00000000000002fc RIP:
> [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
> PGD 104d53067 PUD 104d4d067 PMD 0
> Oops: 0002 [1] SMP
> CPU 2
> Modules linked in:
> Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10
> RIP: 0010:[<ffffffff802b9e5e>] [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
> RSP: 0018:ffff8101029d7d28 EFLAGS: 00010202
> RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878
> RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238
> RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000
> R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0
> FS: 00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task ffff81010269e7f0)
> Stack: ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920
> ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164
> 00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000
> Call Trace:
> [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b
> [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e
> [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164
> [<ffffffff8024af34>] create_new_namespaces+0xde/0x192
> [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85
> [<ffffffff802347e2>] copy_process+0xcb4/0x1439
> [<ffffffff8020bbee>] system_call+0x7e/0x83
> [<ffffffff8023556a>] do_fork+0x6c/0x1e7
> [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0
>
Here's the patch fixing the problem. So, Suka, I propose that you review
my patches, point out things that you don't like and would like to see
your code instead. After all I will re-split the set with your fixes, mark
some patches with your From: and send them to Andrew. What do you think?
---
--- ./fs/proc/root.c.procpidnsfix 2007-07-16 10:32:00.000000000 +0400
+++ ./fs/proc/root.c 2007-07-16 12:34:35.000000000 +0400
@@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste
if (!ei->pid)
ei->pid = find_get_pid(1);
sb->s_flags |= MS_ACTIVE;
-
- mntput(ns->proc_mnt);
ns->proc_mnt = mnt;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/16] Pid namespaces
[not found] ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-07-17 4:23 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
0 siblings, 0 replies; 6+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-07-17 4:23 UTC (permalink / raw)
To: Pavel Emelianov; +Cc: Linux Containers
Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
| >My x86_64 system boots fine but crashes as below, when I run my
| >'pidns_exec' test with a simple program that prints getpid(), getppid()
| >etc of the process in the child pid ns.
| >
| >Pls see
| >
| >http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
| >
| >for the patches I currently have applied and let me know if I need more
| >on top.
| >
| >And see
| >
| >http://www.geocities.com/sukadevb/Pidspace/Test1/
| >
| >for the test programs. You may need to run the 'mypid-loop.x' script
| >to repro the crash. The pidns_exec.c program calls clone() with
| >CLONE_NEWPID
| >and execs the given program (it was included in Patch 0 of the patchset I
| >posted to Containers).
| >
| >Suka
| >
| >login: Unable to handle kernel NULL pointer dereference at
| >00000000000002fc RIP:
| > [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
| >PGD 104d53067 PUD 104d4d067 PMD 0
| >Oops: 0002 [1] SMP
| >CPU 2
| >Modules linked in:
| >Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10
| >RIP: 0010:[<ffffffff802b9e5e>] [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
| >RSP: 0018:ffff8101029d7d28 EFLAGS: 00010202
| >RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878
| >RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238
| >RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000
| >R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000
| >R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0
| >FS: 00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000)
| >knlGS:0000000000000000
| >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
| >CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0
| >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| >DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
| >Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task
| >ffff81010269e7f0)
| >Stack: ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920
| > ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164
| > 00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000
| >Call Trace:
| > [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b
| > [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e
| > [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164
| > [<ffffffff8024af34>] create_new_namespaces+0xde/0x192
| > [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85
| > [<ffffffff802347e2>] copy_process+0xcb4/0x1439
| > [<ffffffff8020bbee>] system_call+0x7e/0x83
| > [<ffffffff8023556a>] do_fork+0x6c/0x1e7
| > [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0
| >
|
| Here's the patch fixing the problem.
Yes, I think it fixes the problem I was seeing before.
However, my next test failed. This time I run:
$ cat ./lxc-wrap.sh
#!/bin/sh
if [ $$ -eq 1 ]; then
echo "$$ (`basename $0`): Executing in nested pid ns..."
fi
port_num=$1;
if [ $# -lt 1 ]; then
port_num=2709
fi
mount -t proc lxcproc /proc
/usr/sbin/sshd -D -p $port_num
umount /proc
$ ./pidns_exec ./lxc-wrap.sh
This creates a new pid ns and an sshd in that ns. From another window
I ssh to the system with port number 2709 and I am in the new pid ns.
(a simple prog to print getpid(), getppid() etc seems to be fine).
But "ps -e" fails:
$ ps -e
Error: /proc must be mounted
To mount /proc at boot you need an /etc/fstab line like:
/proc /proc proc defaults
In the meantime, mount /proc /proc -t proc
$ mount -t proc none /proc
mount: /proc is busy
If I comment out the "mount" command in the lxc-wrap.sh script above,
this mount seems to work, but "ps -e" still reports the same error.
| So, Suka, I propose that you review my patches, point out things that you
| don't like and would like to see your code instead.
I am reviewing them and I thought we wanted to start with my patchset. I
started porting my patches to rc6-mm1 plus patches we already sent to -mm.
Well, I can hold off on that and review/test your patches.
| After all I will re-split the set with your fixes, mark
| some patches with your From: and send them to Andrew. What do you think?
I don't mind re-splitting the patches, but would that be more work for us
and would Andrew and other reviewers prefer patches split logically. For
instance if you and I contributed to a patch, can we just put both our
Signed-off on one patch rather than splitting it ?
|
| ---
|
| --- ./fs/proc/root.c.procpidnsfix 2007-07-16 10:32:00.000000000 +0400
| +++ ./fs/proc/root.c 2007-07-16 12:34:35.000000000 +0400
| @@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste
| if (!ei->pid)
| ei->pid = find_get_pid(1);
| sb->s_flags |= MS_ACTIVE;
| -
| - mntput(ns->proc_mnt);
| ns->proc_mnt = mnt;
| }
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
[not found] ` <468DF907.2060809@openvz.org>
@ 2007-07-25 0:36 ` sukadev
2007-07-25 10:07 ` Pavel Emelyanov
0 siblings, 1 reply; 6+ messages in thread
From: sukadev @ 2007-07-25 0:36 UTC (permalink / raw)
To: Pavel Emelianov
Cc: Andrew Morton, Serge Hallyn, Eric W. Biederman, Linux Containers,
Linux Kernel Mailing List, Kirill Korotaev
Pavel Emelianov [xemul@openvz.org] wrote:
| Make alloc_pid() initialize pid_numbers and hash them
| into the hashtable, not the struct pid itself.
|
| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
|
| ---
|
| pid.c | 47 +++++++++++++++++++++++++++++++++--------------
| 1 files changed, 33 insertions(+), 14 deletions(-)
|
| --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400
| +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400
| @@ -28,8 +28,10 @@
| #include <linux/hash.h>
| #include <linux/pid_namespace.h>
| #include <linux/init_task.h>
| +#include <linux/proc_fs.h>
|
| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
| +#define pid_hashfn(nr, ns) \
| + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
| static struct hlist_head *pid_hash;
| static int pidhash_shift;
| struct pid init_struct_pid = INIT_STRUCT_PID;
| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
| if (!pid)
| return;
|
| - ns = pid->numbers[0].ns;
| + ns = pid->numbers[pid->level].ns;
| if ((atomic_read(&pid->count) == 1) ||
| atomic_dec_and_test(&pid->count))
| kmem_cache_free(ns->pid_cachep, pid);
| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
| fastcall void free_pid(struct pid *pid)
| {
| /* We can be called with write_lock_irq(&tasklist_lock) held */
| + int i;
| unsigned long flags;
|
| spin_lock_irqsave(&pidmap_lock, flags);
| - hlist_del_rcu(&pid->pid_chain);
| + for (i = 0; i <= pid->level; i++)
| + hlist_del_rcu(&pid->numbers[i].pid_chain);
| spin_unlock_irqrestore(&pidmap_lock, flags);
|
| - free_pidmap(&init_pid_ns, pid->nr);
| + for (i = 0; i <= pid->level; i++)
| + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
| +
| call_rcu(&pid->rcu, delayed_put_pid);
| }
|
| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
| {
| struct pid *pid;
| enum pid_type type;
| - int nr = -1;
| + struct pid_namespace *ns;
| + int i, nr;
|
| - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
| + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
| if (!pid)
| goto out;
|
| - nr = alloc_pidmap(current->nsproxy->pid_ns);
| - if (nr < 0)
| - goto out_free;
| + ns = pid_ns;
| + for (i = pid_ns->level; i >= 0; i--) {
| + nr = alloc_pidmap(ns);
| + if (nr < 0)
| + goto out_free;
If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
and fails when i=2, we would try to free_pidmap() even from
pid->pid_number[2].pid_ns. This would incorrectly a)
drop reference count on that pid namespace, and incorrectly
increment pidmap->nr_free.
Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns
before calling free_pidmap() below ?
|
| + pid->numbers[i].nr = nr;
| + pid->numbers[i].ns = ns;
| + ns = ns->parent;
| + }
| +
| + pid->level = pid_ns->level;
| atomic_set(&pid->count, 1);
| - pid->nr = nr;
| for (type = 0; type < PIDTYPE_MAX; ++type)
| INIT_HLIST_HEAD(&pid->tasks[type]);
|
| spin_lock_irq(&pidmap_lock);
| - hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]);
| + for (i = pid->level; i >= 0; i--)
| + hlist_add_head_rcu(&pid->numbers[i].pid_chain,
| + &pid_hash[pid_hashfn(pid->numbers[i].nr,
| + pid->numbers[i].ns)]);
| spin_unlock_irq(&pidmap_lock);
| -
| out:
| return pid;
|
| out_free:
| - kmem_cache_free(init_pid_ns.pid_cachep, pid);
| + for (i++; i <= pid->level; i++)
| + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
i.e all pid->numbers[] may not be initialized here right ?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
2007-07-25 0:36 ` [PATCH 13/16] Switch to operating with pid_numbers instead of pids sukadev
@ 2007-07-25 10:07 ` Pavel Emelyanov
[not found] ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Pavel Emelyanov @ 2007-07-25 10:07 UTC (permalink / raw)
To: sukadev; +Cc: Serge Hallyn, Linux Containers, Linux Kernel Mailing List
sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | Make alloc_pid() initialize pid_numbers and hash them
> | into the hashtable, not the struct pid itself.
> |
> | Signed-off-by: Pavel Emelianov <xemul@openvz.org>
> |
> | ---
> |
> | pid.c | 47 +++++++++++++++++++++++++++++++++--------------
> | 1 files changed, 33 insertions(+), 14 deletions(-)
> |
> | --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400
> | +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400
> | @@ -28,8 +28,10 @@
> | #include <linux/hash.h>
> | #include <linux/pid_namespace.h>
> | #include <linux/init_task.h>
> | +#include <linux/proc_fs.h>
> |
> | -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
> | +#define pid_hashfn(nr, ns) \
> | + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
> | static struct hlist_head *pid_hash;
> | static int pidhash_shift;
> | struct pid init_struct_pid = INIT_STRUCT_PID;
> | @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
> | if (!pid)
> | return;
> |
> | - ns = pid->numbers[0].ns;
> | + ns = pid->numbers[pid->level].ns;
> | if ((atomic_read(&pid->count) == 1) ||
> | atomic_dec_and_test(&pid->count))
> | kmem_cache_free(ns->pid_cachep, pid);
> | @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
> | fastcall void free_pid(struct pid *pid)
> | {
> | /* We can be called with write_lock_irq(&tasklist_lock) held */
> | + int i;
> | unsigned long flags;
> |
> | spin_lock_irqsave(&pidmap_lock, flags);
> | - hlist_del_rcu(&pid->pid_chain);
> | + for (i = 0; i <= pid->level; i++)
> | + hlist_del_rcu(&pid->numbers[i].pid_chain);
> | spin_unlock_irqrestore(&pidmap_lock, flags);
> |
> | - free_pidmap(&init_pid_ns, pid->nr);
> | + for (i = 0; i <= pid->level; i++)
> | + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> | +
> | call_rcu(&pid->rcu, delayed_put_pid);
> | }
> |
> | @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
> | {
> | struct pid *pid;
> | enum pid_type type;
> | - int nr = -1;
> | + struct pid_namespace *ns;
> | + int i, nr;
> |
> | - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
> | + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
> | if (!pid)
> | goto out;
> |
> | - nr = alloc_pidmap(current->nsproxy->pid_ns);
> | - if (nr < 0)
> | - goto out_free;
> | + ns = pid_ns;
> | + for (i = pid_ns->level; i >= 0; i--) {
> | + nr = alloc_pidmap(ns);
> | + if (nr < 0)
> | + goto out_free;
>
> If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
The loop is descending, not ascending...
> and fails when i=2, we would try to free_pidmap() even from
> pid->pid_number[2].pid_ns. This would incorrectly a)
> drop reference count on that pid namespace, and incorrectly
> increment pidmap->nr_free.
>
> Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns
> before calling free_pidmap() below ?
>
> |
> | + pid->numbers[i].nr = nr;
> | + pid->numbers[i].ns = ns;
> | + ns = ns->parent;
> | + }
> | +
> | + pid->level = pid_ns->level;
> | atomic_set(&pid->count, 1);
> | - pid->nr = nr;
> | for (type = 0; type < PIDTYPE_MAX; ++type)
> | INIT_HLIST_HEAD(&pid->tasks[type]);
> |
> | spin_lock_irq(&pidmap_lock);
> | - hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]);
> | + for (i = pid->level; i >= 0; i--)
> | + hlist_add_head_rcu(&pid->numbers[i].pid_chain,
> | + &pid_hash[pid_hashfn(pid->numbers[i].nr,
> | + pid->numbers[i].ns)]);
> | spin_unlock_irq(&pidmap_lock);
> | -
> | out:
> | return pid;
> |
> | out_free:
> | - kmem_cache_free(init_pid_ns.pid_cachep, pid);
> | + for (i++; i <= pid->level; i++)
> | + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
>
> i.e all pid->numbers[] may not be initialized here right ?
The numbers from i up to pid->level are initialized, so this
loop looks correct.
Thanks,
Pavel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
[not found] ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-07-25 19:13 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2007-07-26 6:42 ` Pavel Emelyanov
0 siblings, 1 reply; 6+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-07-25 19:13 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: Linux Containers, Linux Kernel Mailing List
Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
| sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
| >Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
| >| Make alloc_pid() initialize pid_numbers and hash them
| >| into the hashtable, not the struct pid itself.
| >|
| >| Signed-off-by: Pavel Emelianov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
| >|
| >| ---
| >|
| >| pid.c | 47 +++++++++++++++++++++++++++++++++--------------
| >| 1 files changed, 33 insertions(+), 14 deletions(-)
| >|
| >| --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400
| >| +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400
| >| @@ -28,8 +28,10 @@
| >| #include <linux/hash.h>
| >| #include <linux/pid_namespace.h>
| >| #include <linux/init_task.h>
| >| +#include <linux/proc_fs.h>
| >|
| >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
| >| +#define pid_hashfn(nr, ns) \
| >| + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
| >| static struct hlist_head *pid_hash;
| >| static int pidhash_shift;
| >| struct pid init_struct_pid = INIT_STRUCT_PID;
| >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
| >| if (!pid)
| >| return;
| >|
| >| - ns = pid->numbers[0].ns;
| >| + ns = pid->numbers[pid->level].ns;
| >| if ((atomic_read(&pid->count) == 1) ||
| >| atomic_dec_and_test(&pid->count))
| >| kmem_cache_free(ns->pid_cachep, pid);
| >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
| >| fastcall void free_pid(struct pid *pid)
| >| {
| >| /* We can be called with write_lock_irq(&tasklist_lock) held */
| >| + int i;
| >| unsigned long flags;
| >|
| >| spin_lock_irqsave(&pidmap_lock, flags);
| >| - hlist_del_rcu(&pid->pid_chain);
| >| + for (i = 0; i <= pid->level; i++)
| >| + hlist_del_rcu(&pid->numbers[i].pid_chain);
| >| spin_unlock_irqrestore(&pidmap_lock, flags);
| >|
| >| - free_pidmap(&init_pid_ns, pid->nr);
| >| + for (i = 0; i <= pid->level; i++)
| >| + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
| >| +
| >| call_rcu(&pid->rcu, delayed_put_pid);
| >| }
| >|
| >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
| >| {
| >| struct pid *pid;
| >| enum pid_type type;
| >| - int nr = -1;
| >| + struct pid_namespace *ns;
| >| + int i, nr;
| >|
| >| - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
| >| + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
| >| if (!pid)
| >| goto out;
| >|
| >| - nr = alloc_pidmap(current->nsproxy->pid_ns);
| >| - if (nr < 0)
| >| - goto out_free;
| >| + ns = pid_ns;
| >| + for (i = pid_ns->level; i >= 0; i--) {
| >| + nr = alloc_pidmap(ns);
| >| + if (nr < 0)
| >| + goto out_free;
| >
| >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
|
| It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
| The loop is descending, not ascending...
Aah descending - thats right. But I still think there is a problem.
Here is your code that I am referring to:
pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
if (!pid)
goto out;
ns = pid_ns;
for (i = pid_ns->level; i >= 0; i--) {
nr = alloc_pidmap(ns);
if (nr < 0)
goto out_free;
pid->numbers[i].nr = nr;
pid->numbers[i].ns = ns;
ns = ns->parent;
}
pid->level = pid_ns->level;
<snip>
out:
return pid;
out_free:
for (i++; i <= pid->level; i++)
free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
kmem_cache_free(pid_ns->pid_cachep, pid);
pid = NULL;
goto out;
<end code>
Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for
i=3 and i=2 but fails for i=1 and we execute "goto out_free".
But pid->level is uninitialized at this point right ?
Even if it were set to zero (using kmem_cache_zalloc()), we may not
free the two pidmap entries we allocated for i=3 and i=2.
Suka
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
2007-07-25 19:13 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
@ 2007-07-26 6:42 ` Pavel Emelyanov
0 siblings, 0 replies; 6+ messages in thread
From: Pavel Emelyanov @ 2007-07-26 6:42 UTC (permalink / raw)
To: sukadev; +Cc: Serge Hallyn, Linux Containers, Linux Kernel Mailing List
sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | sukadev@us.ibm.com wrote:
> | >Pavel Emelianov [xemul@openvz.org] wrote:
> | >| Make alloc_pid() initialize pid_numbers and hash them
> | >| into the hashtable, not the struct pid itself.
> | >|
> | >| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
> | >|
> | >| ---
> | >|
> | >| pid.c | 47 +++++++++++++++++++++++++++++++++--------------
> | >| 1 files changed, 33 insertions(+), 14 deletions(-)
> | >|
> | >| --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400
> | >| +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400
> | >| @@ -28,8 +28,10 @@
> | >| #include <linux/hash.h>
> | >| #include <linux/pid_namespace.h>
> | >| #include <linux/init_task.h>
> | >| +#include <linux/proc_fs.h>
> | >|
> | >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
> | >| +#define pid_hashfn(nr, ns) \
> | >| + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
> | >| static struct hlist_head *pid_hash;
> | >| static int pidhash_shift;
> | >| struct pid init_struct_pid = INIT_STRUCT_PID;
> | >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
> | >| if (!pid)
> | >| return;
> | >|
> | >| - ns = pid->numbers[0].ns;
> | >| + ns = pid->numbers[pid->level].ns;
> | >| if ((atomic_read(&pid->count) == 1) ||
> | >| atomic_dec_and_test(&pid->count))
> | >| kmem_cache_free(ns->pid_cachep, pid);
> | >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
> | >| fastcall void free_pid(struct pid *pid)
> | >| {
> | >| /* We can be called with write_lock_irq(&tasklist_lock) held */
> | >| + int i;
> | >| unsigned long flags;
> | >|
> | >| spin_lock_irqsave(&pidmap_lock, flags);
> | >| - hlist_del_rcu(&pid->pid_chain);
> | >| + for (i = 0; i <= pid->level; i++)
> | >| + hlist_del_rcu(&pid->numbers[i].pid_chain);
> | >| spin_unlock_irqrestore(&pidmap_lock, flags);
> | >|
> | >| - free_pidmap(&init_pid_ns, pid->nr);
> | >| + for (i = 0; i <= pid->level; i++)
> | >| + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> | >| +
> | >| call_rcu(&pid->rcu, delayed_put_pid);
> | >| }
> | >|
> | >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
> | >| {
> | >| struct pid *pid;
> | >| enum pid_type type;
> | >| - int nr = -1;
> | >| + struct pid_namespace *ns;
> | >| + int i, nr;
> | >|
> | >| - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
> | >| + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
> | >| if (!pid)
> | >| goto out;
> | >|
> | >| - nr = alloc_pidmap(current->nsproxy->pid_ns);
> | >| - if (nr < 0)
> | >| - goto out_free;
> | >| + ns = pid_ns;
> | >| + for (i = pid_ns->level; i >= 0; i--) {
> | >| + nr = alloc_pidmap(ns);
> | >| + if (nr < 0)
> | >| + goto out_free;
> | >
> | >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
> |
> | It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
> | The loop is descending, not ascending...
>
> Aah descending - thats right. But I still think there is a problem.
>
> Here is your code that I am referring to:
>
> pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
> if (!pid)
> goto out;
>
> ns = pid_ns;
> for (i = pid_ns->level; i >= 0; i--) {
> nr = alloc_pidmap(ns);
> if (nr < 0)
> goto out_free;
>
> pid->numbers[i].nr = nr;
> pid->numbers[i].ns = ns;
> ns = ns->parent;
> }
>
> pid->level = pid_ns->level;
>
> <snip>
>
> out:
> return pid;
> out_free:
> for (i++; i <= pid->level; i++)
> free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
>
> kmem_cache_free(pid_ns->pid_cachep, pid);
> pid = NULL;
> goto out;
>
> <end code>
>
> Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for
> i=3 and i=2 but fails for i=1 and we execute "goto out_free".
>
> But pid->level is uninitialized at this point right ?
>
> Even if it were set to zero (using kmem_cache_zalloc()), we may not
> free the two pidmap entries we allocated for i=3 and i=2.
Yes. I found this after detailed look at the code and (hope) fixed.
> Suka
>
Thanks,
Pavel
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-07-26 6:42 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <468DF6F7.1010906@openvz.org>
[not found] ` <20070710002929.GA11549@us.ibm.com>
[not found] ` <469384DA.70805@openvz.org>
[not found] ` <20070712031937.GB11489@us.ibm.com>
[not found] ` <20070712031937.GB11489-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-07-16 8:47 ` [PATCH 0/16] Pid namespaces Pavel Emelianov
[not found] ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-07-17 4:23 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <468DF907.2060809@openvz.org>
2007-07-25 0:36 ` [PATCH 13/16] Switch to operating with pid_numbers instead of pids sukadev
2007-07-25 10:07 ` Pavel Emelyanov
[not found] ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-07-25 19:13 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2007-07-26 6:42 ` Pavel Emelyanov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox