* Re: [PATCH 0/16] Pid namespaces [not found] ` <20070712031937.GB11489-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2007-07-16 8:47 ` Pavel Emelianov [not found] ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 0 siblings, 1 reply; 6+ messages in thread From: Pavel Emelianov @ 2007-07-16 8:47 UTC (permalink / raw) To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Linux Containers > My x86_64 system boots fine but crashes as below, when I run my > 'pidns_exec' test with a simple program that prints getpid(), getppid() > etc of the process in the child pid ns. > > Pls see > > http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz > > for the patches I currently have applied and let me know if I need more > on top. > > And see > > http://www.geocities.com/sukadevb/Pidspace/Test1/ > > for the test programs. You may need to run the 'mypid-loop.x' script > to repro the crash. The pidns_exec.c program calls clone() with CLONE_NEWPID > and execs the given program (it was included in Patch 0 of the patchset I > posted to Containers). > > Suka > > login: Unable to handle kernel NULL pointer dereference at 00000000000002fc RIP: > [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138 > PGD 104d53067 PUD 104d4d067 PMD 0 > Oops: 0002 [1] SMP > CPU 2 > Modules linked in: > Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10 > RIP: 0010:[<ffffffff802b9e5e>] [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138 > RSP: 0018:ffff8101029d7d28 EFLAGS: 00010202 > RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878 > RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238 > RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000 > R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000 > R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0 > FS: 00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task ffff81010269e7f0) > Stack: ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920 > ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164 > 00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000 > Call Trace: > [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b > [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e > [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164 > [<ffffffff8024af34>] create_new_namespaces+0xde/0x192 > [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85 > [<ffffffff802347e2>] copy_process+0xcb4/0x1439 > [<ffffffff8020bbee>] system_call+0x7e/0x83 > [<ffffffff8023556a>] do_fork+0x6c/0x1e7 > [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0 > Here's the patch fixing the problem. So, Suka, I propose that you review my patches, point out things that you don't like and would like to see your code instead. After all I will re-split the set with your fixes, mark some patches with your From: and send them to Andrew. What do you think? --- --- ./fs/proc/root.c.procpidnsfix 2007-07-16 10:32:00.000000000 +0400 +++ ./fs/proc/root.c 2007-07-16 12:34:35.000000000 +0400 @@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste if (!ei->pid) ei->pid = find_get_pid(1); sb->s_flags |= MS_ACTIVE; - - mntput(ns->proc_mnt); ns->proc_mnt = mnt; } ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 0/16] Pid namespaces [not found] ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2007-07-17 4:23 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 0 siblings, 0 replies; 6+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-07-17 4:23 UTC (permalink / raw) To: Pavel Emelianov; +Cc: Linux Containers Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote: | >My x86_64 system boots fine but crashes as below, when I run my | >'pidns_exec' test with a simple program that prints getpid(), getppid() | >etc of the process in the child pid ns. | > | >Pls see | > | >http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz | > | >for the patches I currently have applied and let me know if I need more | >on top. | > | >And see | > | >http://www.geocities.com/sukadevb/Pidspace/Test1/ | > | >for the test programs. You may need to run the 'mypid-loop.x' script | >to repro the crash. The pidns_exec.c program calls clone() with | >CLONE_NEWPID | >and execs the given program (it was included in Patch 0 of the patchset I | >posted to Containers). | > | >Suka | > | >login: Unable to handle kernel NULL pointer dereference at | >00000000000002fc RIP: | > [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138 | >PGD 104d53067 PUD 104d4d067 PMD 0 | >Oops: 0002 [1] SMP | >CPU 2 | >Modules linked in: | >Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10 | >RIP: 0010:[<ffffffff802b9e5e>] [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138 | >RSP: 0018:ffff8101029d7d28 EFLAGS: 00010202 | >RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878 | >RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238 | >RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000 | >R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000 | >R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0 | >FS: 00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) | >knlGS:0000000000000000 | >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b | >CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0 | >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 | >DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 | >Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task | >ffff81010269e7f0) | >Stack: ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920 | > ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164 | > 00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000 | >Call Trace: | > [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b | > [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e | > [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164 | > [<ffffffff8024af34>] create_new_namespaces+0xde/0x192 | > [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85 | > [<ffffffff802347e2>] copy_process+0xcb4/0x1439 | > [<ffffffff8020bbee>] system_call+0x7e/0x83 | > [<ffffffff8023556a>] do_fork+0x6c/0x1e7 | > [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0 | > | | Here's the patch fixing the problem. Yes, I think it fixes the problem I was seeing before. However, my next test failed. This time I run: $ cat ./lxc-wrap.sh #!/bin/sh if [ $$ -eq 1 ]; then echo "$$ (`basename $0`): Executing in nested pid ns..." fi port_num=$1; if [ $# -lt 1 ]; then port_num=2709 fi mount -t proc lxcproc /proc /usr/sbin/sshd -D -p $port_num umount /proc $ ./pidns_exec ./lxc-wrap.sh This creates a new pid ns and an sshd in that ns. From another window I ssh to the system with port number 2709 and I am in the new pid ns. (a simple prog to print getpid(), getppid() etc seems to be fine). But "ps -e" fails: $ ps -e Error: /proc must be mounted To mount /proc at boot you need an /etc/fstab line like: /proc /proc proc defaults In the meantime, mount /proc /proc -t proc $ mount -t proc none /proc mount: /proc is busy If I comment out the "mount" command in the lxc-wrap.sh script above, this mount seems to work, but "ps -e" still reports the same error. | So, Suka, I propose that you review my patches, point out things that you | don't like and would like to see your code instead. I am reviewing them and I thought we wanted to start with my patchset. I started porting my patches to rc6-mm1 plus patches we already sent to -mm. Well, I can hold off on that and review/test your patches. | After all I will re-split the set with your fixes, mark | some patches with your From: and send them to Andrew. What do you think? I don't mind re-splitting the patches, but would that be more work for us and would Andrew and other reviewers prefer patches split logically. For instance if you and I contributed to a patch, can we just put both our Signed-off on one patch rather than splitting it ? | | --- | | --- ./fs/proc/root.c.procpidnsfix 2007-07-16 10:32:00.000000000 +0400 | +++ ./fs/proc/root.c 2007-07-16 12:34:35.000000000 +0400 | @@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste | if (!ei->pid) | ei->pid = find_get_pid(1); | sb->s_flags |= MS_ACTIVE; | - | - mntput(ns->proc_mnt); | ns->proc_mnt = mnt; | } ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <468DF907.2060809@openvz.org>]
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids [not found] ` <468DF907.2060809@openvz.org> @ 2007-07-25 0:36 ` sukadev 2007-07-25 10:07 ` Pavel Emelyanov 0 siblings, 1 reply; 6+ messages in thread From: sukadev @ 2007-07-25 0:36 UTC (permalink / raw) To: Pavel Emelianov Cc: Andrew Morton, Serge Hallyn, Eric W. Biederman, Linux Containers, Linux Kernel Mailing List, Kirill Korotaev Pavel Emelianov [xemul@openvz.org] wrote: | Make alloc_pid() initialize pid_numbers and hash them | into the hashtable, not the struct pid itself. | | Signed-off-by: Pavel Emelianov <xemul@openvz.org> | | --- | | pid.c | 47 +++++++++++++++++++++++++++++++++-------------- | 1 files changed, 33 insertions(+), 14 deletions(-) | | --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400 | +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400 | @@ -28,8 +28,10 @@ | #include <linux/hash.h> | #include <linux/pid_namespace.h> | #include <linux/init_task.h> | +#include <linux/proc_fs.h> | | -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift) | +#define pid_hashfn(nr, ns) \ | + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift) | static struct hlist_head *pid_hash; | static int pidhash_shift; | struct pid init_struct_pid = INIT_STRUCT_PID; | @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid) | if (!pid) | return; | | - ns = pid->numbers[0].ns; | + ns = pid->numbers[pid->level].ns; | if ((atomic_read(&pid->count) == 1) || | atomic_dec_and_test(&pid->count)) | kmem_cache_free(ns->pid_cachep, pid); | @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h | fastcall void free_pid(struct pid *pid) | { | /* We can be called with write_lock_irq(&tasklist_lock) held */ | + int i; | unsigned long flags; | | spin_lock_irqsave(&pidmap_lock, flags); | - hlist_del_rcu(&pid->pid_chain); | + for (i = 0; i <= pid->level; i++) | + hlist_del_rcu(&pid->numbers[i].pid_chain); | spin_unlock_irqrestore(&pidmap_lock, flags); | | - free_pidmap(&init_pid_ns, pid->nr); | + for (i = 0; i <= pid->level; i++) | + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); | + | call_rcu(&pid->rcu, delayed_put_pid); | } | | @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa | { | struct pid *pid; | enum pid_type type; | - int nr = -1; | + struct pid_namespace *ns; | + int i, nr; | | - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL); | + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL); | if (!pid) | goto out; | | - nr = alloc_pidmap(current->nsproxy->pid_ns); | - if (nr < 0) | - goto out_free; | + ns = pid_ns; | + for (i = pid_ns->level; i >= 0; i--) { | + nr = alloc_pidmap(ns); | + if (nr < 0) | + goto out_free; If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1 and fails when i=2, we would try to free_pidmap() even from pid->pid_number[2].pid_ns. This would incorrectly a) drop reference count on that pid namespace, and incorrectly increment pidmap->nr_free. Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns before calling free_pidmap() below ? | | + pid->numbers[i].nr = nr; | + pid->numbers[i].ns = ns; | + ns = ns->parent; | + } | + | + pid->level = pid_ns->level; | atomic_set(&pid->count, 1); | - pid->nr = nr; | for (type = 0; type < PIDTYPE_MAX; ++type) | INIT_HLIST_HEAD(&pid->tasks[type]); | | spin_lock_irq(&pidmap_lock); | - hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]); | + for (i = pid->level; i >= 0; i--) | + hlist_add_head_rcu(&pid->numbers[i].pid_chain, | + &pid_hash[pid_hashfn(pid->numbers[i].nr, | + pid->numbers[i].ns)]); | spin_unlock_irq(&pidmap_lock); | - | out: | return pid; | | out_free: | - kmem_cache_free(init_pid_ns.pid_cachep, pid); | + for (i++; i <= pid->level; i++) | + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); i.e all pid->numbers[] may not be initialized here right ? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids 2007-07-25 0:36 ` [PATCH 13/16] Switch to operating with pid_numbers instead of pids sukadev @ 2007-07-25 10:07 ` Pavel Emelyanov [not found] ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 0 siblings, 1 reply; 6+ messages in thread From: Pavel Emelyanov @ 2007-07-25 10:07 UTC (permalink / raw) To: sukadev; +Cc: Serge Hallyn, Linux Containers, Linux Kernel Mailing List sukadev@us.ibm.com wrote: > Pavel Emelianov [xemul@openvz.org] wrote: > | Make alloc_pid() initialize pid_numbers and hash them > | into the hashtable, not the struct pid itself. > | > | Signed-off-by: Pavel Emelianov <xemul@openvz.org> > | > | --- > | > | pid.c | 47 +++++++++++++++++++++++++++++++++-------------- > | 1 files changed, 33 insertions(+), 14 deletions(-) > | > | --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400 > | +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400 > | @@ -28,8 +28,10 @@ > | #include <linux/hash.h> > | #include <linux/pid_namespace.h> > | #include <linux/init_task.h> > | +#include <linux/proc_fs.h> > | > | -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift) > | +#define pid_hashfn(nr, ns) \ > | + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift) > | static struct hlist_head *pid_hash; > | static int pidhash_shift; > | struct pid init_struct_pid = INIT_STRUCT_PID; > | @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid) > | if (!pid) > | return; > | > | - ns = pid->numbers[0].ns; > | + ns = pid->numbers[pid->level].ns; > | if ((atomic_read(&pid->count) == 1) || > | atomic_dec_and_test(&pid->count)) > | kmem_cache_free(ns->pid_cachep, pid); > | @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h > | fastcall void free_pid(struct pid *pid) > | { > | /* We can be called with write_lock_irq(&tasklist_lock) held */ > | + int i; > | unsigned long flags; > | > | spin_lock_irqsave(&pidmap_lock, flags); > | - hlist_del_rcu(&pid->pid_chain); > | + for (i = 0; i <= pid->level; i++) > | + hlist_del_rcu(&pid->numbers[i].pid_chain); > | spin_unlock_irqrestore(&pidmap_lock, flags); > | > | - free_pidmap(&init_pid_ns, pid->nr); > | + for (i = 0; i <= pid->level; i++) > | + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); > | + > | call_rcu(&pid->rcu, delayed_put_pid); > | } > | > | @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa > | { > | struct pid *pid; > | enum pid_type type; > | - int nr = -1; > | + struct pid_namespace *ns; > | + int i, nr; > | > | - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL); > | + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL); > | if (!pid) > | goto out; > | > | - nr = alloc_pidmap(current->nsproxy->pid_ns); > | - if (nr < 0) > | - goto out_free; > | + ns = pid_ns; > | + for (i = pid_ns->level; i >= 0; i--) { > | + nr = alloc_pidmap(ns); > | + if (nr < 0) > | + goto out_free; > > If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1 It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence. The loop is descending, not ascending... > and fails when i=2, we would try to free_pidmap() even from > pid->pid_number[2].pid_ns. This would incorrectly a) > drop reference count on that pid namespace, and incorrectly > increment pidmap->nr_free. > > Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns > before calling free_pidmap() below ? > > | > | + pid->numbers[i].nr = nr; > | + pid->numbers[i].ns = ns; > | + ns = ns->parent; > | + } > | + > | + pid->level = pid_ns->level; > | atomic_set(&pid->count, 1); > | - pid->nr = nr; > | for (type = 0; type < PIDTYPE_MAX; ++type) > | INIT_HLIST_HEAD(&pid->tasks[type]); > | > | spin_lock_irq(&pidmap_lock); > | - hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]); > | + for (i = pid->level; i >= 0; i--) > | + hlist_add_head_rcu(&pid->numbers[i].pid_chain, > | + &pid_hash[pid_hashfn(pid->numbers[i].nr, > | + pid->numbers[i].ns)]); > | spin_unlock_irq(&pidmap_lock); > | - > | out: > | return pid; > | > | out_free: > | - kmem_cache_free(init_pid_ns.pid_cachep, pid); > | + for (i++; i <= pid->level; i++) > | + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); > > i.e all pid->numbers[] may not be initialized here right ? The numbers from i up to pid->level are initialized, so this loop looks correct. Thanks, Pavel ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids [not found] ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2007-07-25 19:13 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA 2007-07-26 6:42 ` Pavel Emelyanov 0 siblings, 1 reply; 6+ messages in thread From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-07-25 19:13 UTC (permalink / raw) To: Pavel Emelyanov; +Cc: Linux Containers, Linux Kernel Mailing List Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote: | sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote: | >Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote: | >| Make alloc_pid() initialize pid_numbers and hash them | >| into the hashtable, not the struct pid itself. | >| | >| Signed-off-by: Pavel Emelianov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> | >| | >| --- | >| | >| pid.c | 47 +++++++++++++++++++++++++++++++++-------------- | >| 1 files changed, 33 insertions(+), 14 deletions(-) | >| | >| --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400 | >| +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400 | >| @@ -28,8 +28,10 @@ | >| #include <linux/hash.h> | >| #include <linux/pid_namespace.h> | >| #include <linux/init_task.h> | >| +#include <linux/proc_fs.h> | >| | >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift) | >| +#define pid_hashfn(nr, ns) \ | >| + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift) | >| static struct hlist_head *pid_hash; | >| static int pidhash_shift; | >| struct pid init_struct_pid = INIT_STRUCT_PID; | >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid) | >| if (!pid) | >| return; | >| | >| - ns = pid->numbers[0].ns; | >| + ns = pid->numbers[pid->level].ns; | >| if ((atomic_read(&pid->count) == 1) || | >| atomic_dec_and_test(&pid->count)) | >| kmem_cache_free(ns->pid_cachep, pid); | >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h | >| fastcall void free_pid(struct pid *pid) | >| { | >| /* We can be called with write_lock_irq(&tasklist_lock) held */ | >| + int i; | >| unsigned long flags; | >| | >| spin_lock_irqsave(&pidmap_lock, flags); | >| - hlist_del_rcu(&pid->pid_chain); | >| + for (i = 0; i <= pid->level; i++) | >| + hlist_del_rcu(&pid->numbers[i].pid_chain); | >| spin_unlock_irqrestore(&pidmap_lock, flags); | >| | >| - free_pidmap(&init_pid_ns, pid->nr); | >| + for (i = 0; i <= pid->level; i++) | >| + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); | >| + | >| call_rcu(&pid->rcu, delayed_put_pid); | >| } | >| | >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa | >| { | >| struct pid *pid; | >| enum pid_type type; | >| - int nr = -1; | >| + struct pid_namespace *ns; | >| + int i, nr; | >| | >| - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL); | >| + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL); | >| if (!pid) | >| goto out; | >| | >| - nr = alloc_pidmap(current->nsproxy->pid_ns); | >| - if (nr < 0) | >| - goto out_free; | >| + ns = pid_ns; | >| + for (i = pid_ns->level; i >= 0; i--) { | >| + nr = alloc_pidmap(ns); | >| + if (nr < 0) | >| + goto out_free; | > | >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1 | | It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence. | The loop is descending, not ascending... Aah descending - thats right. But I still think there is a problem. Here is your code that I am referring to: pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL); if (!pid) goto out; ns = pid_ns; for (i = pid_ns->level; i >= 0; i--) { nr = alloc_pidmap(ns); if (nr < 0) goto out_free; pid->numbers[i].nr = nr; pid->numbers[i].ns = ns; ns = ns->parent; } pid->level = pid_ns->level; <snip> out: return pid; out_free: for (i++; i <= pid->level; i++) free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); kmem_cache_free(pid_ns->pid_cachep, pid); pid = NULL; goto out; <end code> Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for i=3 and i=2 but fails for i=1 and we execute "goto out_free". But pid->level is uninitialized at this point right ? Even if it were set to zero (using kmem_cache_zalloc()), we may not free the two pidmap entries we allocated for i=3 and i=2. Suka ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids 2007-07-25 19:13 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-07-26 6:42 ` Pavel Emelyanov 0 siblings, 0 replies; 6+ messages in thread From: Pavel Emelyanov @ 2007-07-26 6:42 UTC (permalink / raw) To: sukadev; +Cc: Serge Hallyn, Linux Containers, Linux Kernel Mailing List sukadev@us.ibm.com wrote: > Pavel Emelianov [xemul@openvz.org] wrote: > | sukadev@us.ibm.com wrote: > | >Pavel Emelianov [xemul@openvz.org] wrote: > | >| Make alloc_pid() initialize pid_numbers and hash them > | >| into the hashtable, not the struct pid itself. > | >| > | >| Signed-off-by: Pavel Emelianov <xemul@openvz.org> > | >| > | >| --- > | >| > | >| pid.c | 47 +++++++++++++++++++++++++++++++++-------------- > | >| 1 files changed, 33 insertions(+), 14 deletions(-) > | >| > | >| --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.000000000 +0400 > | >| +++ ./kernel/pid.c 2007-07-05 11:08:23.000000000 +0400 > | >| @@ -28,8 +28,10 @@ > | >| #include <linux/hash.h> > | >| #include <linux/pid_namespace.h> > | >| #include <linux/init_task.h> > | >| +#include <linux/proc_fs.h> > | >| > | >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift) > | >| +#define pid_hashfn(nr, ns) \ > | >| + hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift) > | >| static struct hlist_head *pid_hash; > | >| static int pidhash_shift; > | >| struct pid init_struct_pid = INIT_STRUCT_PID; > | >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid) > | >| if (!pid) > | >| return; > | >| > | >| - ns = pid->numbers[0].ns; > | >| + ns = pid->numbers[pid->level].ns; > | >| if ((atomic_read(&pid->count) == 1) || > | >| atomic_dec_and_test(&pid->count)) > | >| kmem_cache_free(ns->pid_cachep, pid); > | >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h > | >| fastcall void free_pid(struct pid *pid) > | >| { > | >| /* We can be called with write_lock_irq(&tasklist_lock) held */ > | >| + int i; > | >| unsigned long flags; > | >| > | >| spin_lock_irqsave(&pidmap_lock, flags); > | >| - hlist_del_rcu(&pid->pid_chain); > | >| + for (i = 0; i <= pid->level; i++) > | >| + hlist_del_rcu(&pid->numbers[i].pid_chain); > | >| spin_unlock_irqrestore(&pidmap_lock, flags); > | >| > | >| - free_pidmap(&init_pid_ns, pid->nr); > | >| + for (i = 0; i <= pid->level; i++) > | >| + free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); > | >| + > | >| call_rcu(&pid->rcu, delayed_put_pid); > | >| } > | >| > | >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa > | >| { > | >| struct pid *pid; > | >| enum pid_type type; > | >| - int nr = -1; > | >| + struct pid_namespace *ns; > | >| + int i, nr; > | >| > | >| - pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL); > | >| + pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL); > | >| if (!pid) > | >| goto out; > | >| > | >| - nr = alloc_pidmap(current->nsproxy->pid_ns); > | >| - if (nr < 0) > | >| - goto out_free; > | >| + ns = pid_ns; > | >| + for (i = pid_ns->level; i >= 0; i--) { > | >| + nr = alloc_pidmap(ns); > | >| + if (nr < 0) > | >| + goto out_free; > | > > | >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1 > | > | It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence. > | The loop is descending, not ascending... > > Aah descending - thats right. But I still think there is a problem. > > Here is your code that I am referring to: > > pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL); > if (!pid) > goto out; > > ns = pid_ns; > for (i = pid_ns->level; i >= 0; i--) { > nr = alloc_pidmap(ns); > if (nr < 0) > goto out_free; > > pid->numbers[i].nr = nr; > pid->numbers[i].ns = ns; > ns = ns->parent; > } > > pid->level = pid_ns->level; > > <snip> > > out: > return pid; > out_free: > for (i++; i <= pid->level; i++) > free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr); > > kmem_cache_free(pid_ns->pid_cachep, pid); > pid = NULL; > goto out; > > <end code> > > Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for > i=3 and i=2 but fails for i=1 and we execute "goto out_free". > > But pid->level is uninitialized at this point right ? > > Even if it were set to zero (using kmem_cache_zalloc()), we may not > free the two pidmap entries we allocated for i=3 and i=2. Yes. I found this after detailed look at the code and (hope) fixed. > Suka > Thanks, Pavel ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-07-26 6:42 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <468DF6F7.1010906@openvz.org>
[not found] ` <20070710002929.GA11549@us.ibm.com>
[not found] ` <469384DA.70805@openvz.org>
[not found] ` <20070712031937.GB11489@us.ibm.com>
[not found] ` <20070712031937.GB11489-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-07-16 8:47 ` [PATCH 0/16] Pid namespaces Pavel Emelianov
[not found] ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-07-17 4:23 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <468DF907.2060809@openvz.org>
2007-07-25 0:36 ` [PATCH 13/16] Switch to operating with pid_numbers instead of pids sukadev
2007-07-25 10:07 ` Pavel Emelyanov
[not found] ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-07-25 19:13 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2007-07-26 6:42 ` Pavel Emelyanov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox