Linux Container Development
 help / color / mirror / Atom feed
* Re: [PATCH 0/16] Pid namespaces
       [not found]       ` <20070712031937.GB11489-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-07-16  8:47         ` Pavel Emelianov
       [not found]           ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Emelianov @ 2007-07-16  8:47 UTC (permalink / raw)
  To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Linux Containers

> My x86_64 system boots fine but crashes as below, when I run my
> 'pidns_exec' test with a simple program that prints getpid(), getppid()
> etc of the process in the child pid ns.
> 
> Pls see 
> 
> http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
> 
> for the patches I currently have applied and let me know if I need more
> on top.
> 
> And see
> 
> http://www.geocities.com/sukadevb/Pidspace/Test1/
> 
> for the test programs. You may need to run the 'mypid-loop.x' script
> to repro the crash. The pidns_exec.c program calls clone() with CLONE_NEWPID
> and execs the given program (it was included in Patch 0 of the patchset I
> posted to Containers).
> 
> Suka
> 
> login: Unable to handle kernel NULL pointer dereference at 00000000000002fc RIP:
>  [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
> PGD 104d53067 PUD 104d4d067 PMD 0
> Oops: 0002 [1] SMP
> CPU 2
> Modules linked in:
> Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10
> RIP: 0010:[<ffffffff802b9e5e>]  [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
> RSP: 0018:ffff8101029d7d28  EFLAGS: 00010202
> RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878
> RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238
> RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000
> R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0
> FS:  00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task ffff81010269e7f0)
> Stack:  ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920
>  ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164
>  00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000
> Call Trace:
>  [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b
>  [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e
>  [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164
>  [<ffffffff8024af34>] create_new_namespaces+0xde/0x192
>  [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85
>  [<ffffffff802347e2>] copy_process+0xcb4/0x1439
>  [<ffffffff8020bbee>] system_call+0x7e/0x83
>  [<ffffffff8023556a>] do_fork+0x6c/0x1e7
>  [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0
> 

Here's the patch fixing the problem. So, Suka, I propose that you review
my patches, point out things that you don't like and would like to see
your code instead. After all I will re-split the set with your fixes, mark
some patches with your From: and send them to Andrew. What do you think?

---

--- ./fs/proc/root.c.procpidnsfix	2007-07-16 10:32:00.000000000 +0400
+++ ./fs/proc/root.c	2007-07-16 12:34:35.000000000 +0400
@@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste
 		if (!ei->pid)
 			ei->pid = find_get_pid(1);
 		sb->s_flags |= MS_ACTIVE;
-
-		mntput(ns->proc_mnt);
 		ns->proc_mnt = mnt;
 	}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/16] Pid namespaces
       [not found]           ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-07-17  4:23             ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
  0 siblings, 0 replies; 6+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-07-17  4:23 UTC (permalink / raw)
  To: Pavel Emelianov; +Cc: Linux Containers

Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
| >My x86_64 system boots fine but crashes as below, when I run my
| >'pidns_exec' test with a simple program that prints getpid(), getppid()
| >etc of the process in the child pid ns.
| >
| >Pls see 
| >
| >http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
| >
| >for the patches I currently have applied and let me know if I need more
| >on top.
| >
| >And see
| >
| >http://www.geocities.com/sukadevb/Pidspace/Test1/
| >
| >for the test programs. You may need to run the 'mypid-loop.x' script
| >to repro the crash. The pidns_exec.c program calls clone() with 
| >CLONE_NEWPID
| >and execs the given program (it was included in Patch 0 of the patchset I
| >posted to Containers).
| >
| >Suka
| >
| >login: Unable to handle kernel NULL pointer dereference at 
| >00000000000002fc RIP:
| > [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
| >PGD 104d53067 PUD 104d4d067 PMD 0
| >Oops: 0002 [1] SMP
| >CPU 2
| >Modules linked in:
| >Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10
| >RIP: 0010:[<ffffffff802b9e5e>]  [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
| >RSP: 0018:ffff8101029d7d28  EFLAGS: 00010202
| >RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878
| >RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238
| >RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000
| >R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000
| >R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0
| >FS:  00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) 
| >knlGS:0000000000000000
| >CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
| >CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0
| >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| >DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
| >Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task 
| >ffff81010269e7f0)
| >Stack:  ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920
| > ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164
| > 00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000
| >Call Trace:
| > [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b
| > [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e
| > [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164
| > [<ffffffff8024af34>] create_new_namespaces+0xde/0x192
| > [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85
| > [<ffffffff802347e2>] copy_process+0xcb4/0x1439
| > [<ffffffff8020bbee>] system_call+0x7e/0x83
| > [<ffffffff8023556a>] do_fork+0x6c/0x1e7
| > [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0
| >
| 
| Here's the patch fixing the problem.

Yes, I think it fixes the problem I was seeing before.
However, my next test failed. This time I run:

	$ cat ./lxc-wrap.sh
	#!/bin/sh

	if [ $$ -eq 1 ]; then
		echo "$$ (`basename $0`): Executing in nested pid ns..."
	fi

	port_num=$1;
	if [ $# -lt 1 ]; then
	    port_num=2709
	fi

	mount -t proc lxcproc /proc
	/usr/sbin/sshd -D -p $port_num
	umount /proc


	$ ./pidns_exec ./lxc-wrap.sh

This creates a new pid ns and an sshd in that ns. From another window
I ssh to the system with port number 2709 and I am in the new pid ns.
(a simple prog to print getpid(), getppid() etc seems to be fine).

But "ps -e" fails:
	
	$ ps -e
	Error: /proc must be mounted
	  To mount /proc at boot you need an /etc/fstab line like:
	      /proc   /proc   proc    defaults
	  In the meantime, mount /proc /proc -t proc

	$ mount -t proc none /proc
	mount: /proc is busy

If I comment out the "mount" command in the lxc-wrap.sh script above,
this mount seems to work, but "ps -e" still reports the same error. 

| So, Suka, I propose that you review my patches, point out things that you
| don't like and would like to see your code instead.

I am reviewing them and I thought we wanted to start with my patchset. I
started porting my patches to rc6-mm1 plus patches we already sent to -mm.
Well, I can hold off on that and review/test your patches.

| After all I will re-split the set with your fixes, mark
| some patches with your From: and send them to Andrew. What do you think?

I don't mind re-splitting the patches, but would that be more work for us
and would Andrew and other reviewers prefer patches split logically.  For
instance if you and I contributed to a patch, can we just put both our
Signed-off on one patch rather than splitting it ?

| 
| ---
| 
| --- ./fs/proc/root.c.procpidnsfix	2007-07-16 10:32:00.000000000 +0400
| +++ ./fs/proc/root.c	2007-07-16 12:34:35.000000000 +0400
| @@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste
| 		if (!ei->pid)
| 			ei->pid = find_get_pid(1);
| 		sb->s_flags |= MS_ACTIVE;
| -
| -		mntput(ns->proc_mnt);
| 		ns->proc_mnt = mnt;
| 	}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
       [not found] ` <468DF907.2060809@openvz.org>
@ 2007-07-25  0:36   ` sukadev
  2007-07-25 10:07     ` Pavel Emelyanov
  0 siblings, 1 reply; 6+ messages in thread
From: sukadev @ 2007-07-25  0:36 UTC (permalink / raw)
  To: Pavel Emelianov
  Cc: Andrew Morton, Serge Hallyn, Eric W. Biederman, Linux Containers,
	Linux Kernel Mailing List, Kirill Korotaev

Pavel Emelianov [xemul@openvz.org] wrote:
| Make alloc_pid() initialize pid_numbers and hash them
| into the hashtable, not the struct pid itself.
| 
| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
| 
| ---
| 
|  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
|  1 files changed, 33 insertions(+), 14 deletions(-)
| 
| --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
| +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
| @@ -28,8 +28,10 @@
|  #include <linux/hash.h>
|  #include <linux/pid_namespace.h>
|  #include <linux/init_task.h>
| +#include <linux/proc_fs.h>
| 
| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
| +#define pid_hashfn(nr, ns)	\
| +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
|  static struct hlist_head *pid_hash;
|  static int pidhash_shift;
|  struct pid init_struct_pid = INIT_STRUCT_PID;
| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
|  	if (!pid)
|  		return;
| 
| -	ns = pid->numbers[0].ns;
| +	ns = pid->numbers[pid->level].ns;
|  	if ((atomic_read(&pid->count) == 1) ||
|  	     atomic_dec_and_test(&pid->count))
|  		kmem_cache_free(ns->pid_cachep, pid);
| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
|  fastcall void free_pid(struct pid *pid)
|  {
|  	/* We can be called with write_lock_irq(&tasklist_lock) held */
| +	int i;
|  	unsigned long flags;
| 
|  	spin_lock_irqsave(&pidmap_lock, flags);
| -	hlist_del_rcu(&pid->pid_chain);
| +	for (i = 0; i <= pid->level; i++)
| +		hlist_del_rcu(&pid->numbers[i].pid_chain);
|  	spin_unlock_irqrestore(&pidmap_lock, flags);
| 
| -	free_pidmap(&init_pid_ns, pid->nr);
| +	for (i = 0; i <= pid->level; i++)
| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
| +
|  	call_rcu(&pid->rcu, delayed_put_pid);
|  }
| 
| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
|  {
|  	struct pid *pid;
|  	enum pid_type type;
| -	int nr = -1;
| +	struct pid_namespace *ns;
| +	int i, nr;
| 
| -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
| +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
|  	if (!pid)
|  		goto out;
| 
| -	nr = alloc_pidmap(current->nsproxy->pid_ns);
| -	if (nr < 0)
| -		goto out_free;
| +	ns = pid_ns;
| +	for (i = pid_ns->level; i >= 0; i--) {
| +		nr = alloc_pidmap(ns);
| +		if (nr < 0)
| +			goto out_free;

If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
and fails when i=2, we would try to free_pidmap() even from 
pid->pid_number[2].pid_ns. This would incorrectly a)
drop reference count on that pid namespace, and incorrectly
increment pidmap->nr_free.

Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns
before calling free_pidmap() below ?

| 
| +		pid->numbers[i].nr = nr;
| +		pid->numbers[i].ns = ns;
| +		ns = ns->parent;
| +	}
| +
| +	pid->level = pid_ns->level;
|  	atomic_set(&pid->count, 1);
| -	pid->nr = nr;
|  	for (type = 0; type < PIDTYPE_MAX; ++type)
|  		INIT_HLIST_HEAD(&pid->tasks[type]);
| 
|  	spin_lock_irq(&pidmap_lock);
| -	hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]);
| +	for (i = pid->level; i >= 0; i--)
| +		hlist_add_head_rcu(&pid->numbers[i].pid_chain,
| +				&pid_hash[pid_hashfn(pid->numbers[i].nr,
| +					pid->numbers[i].ns)]);
|  	spin_unlock_irq(&pidmap_lock);
| -
|  out:
|  	return pid;
| 
|  out_free:
| -	kmem_cache_free(init_pid_ns.pid_cachep, pid);
| +	for (i++; i <= pid->level; i++)
| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);

i.e all pid->numbers[] may not be initialized here right ?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
  2007-07-25  0:36   ` [PATCH 13/16] Switch to operating with pid_numbers instead of pids sukadev
@ 2007-07-25 10:07     ` Pavel Emelyanov
       [not found]       ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Emelyanov @ 2007-07-25 10:07 UTC (permalink / raw)
  To: sukadev; +Cc: Serge Hallyn, Linux Containers, Linux Kernel Mailing List

sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | Make alloc_pid() initialize pid_numbers and hash them
> | into the hashtable, not the struct pid itself.
> | 
> | Signed-off-by: Pavel Emelianov <xemul@openvz.org>
> | 
> | ---
> | 
> |  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
> |  1 files changed, 33 insertions(+), 14 deletions(-)
> | 
> | --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
> | +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
> | @@ -28,8 +28,10 @@
> |  #include <linux/hash.h>
> |  #include <linux/pid_namespace.h>
> |  #include <linux/init_task.h>
> | +#include <linux/proc_fs.h>
> | 
> | -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
> | +#define pid_hashfn(nr, ns)	\
> | +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
> |  static struct hlist_head *pid_hash;
> |  static int pidhash_shift;
> |  struct pid init_struct_pid = INIT_STRUCT_PID;
> | @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
> |  	if (!pid)
> |  		return;
> | 
> | -	ns = pid->numbers[0].ns;
> | +	ns = pid->numbers[pid->level].ns;
> |  	if ((atomic_read(&pid->count) == 1) ||
> |  	     atomic_dec_and_test(&pid->count))
> |  		kmem_cache_free(ns->pid_cachep, pid);
> | @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
> |  fastcall void free_pid(struct pid *pid)
> |  {
> |  	/* We can be called with write_lock_irq(&tasklist_lock) held */
> | +	int i;
> |  	unsigned long flags;
> | 
> |  	spin_lock_irqsave(&pidmap_lock, flags);
> | -	hlist_del_rcu(&pid->pid_chain);
> | +	for (i = 0; i <= pid->level; i++)
> | +		hlist_del_rcu(&pid->numbers[i].pid_chain);
> |  	spin_unlock_irqrestore(&pidmap_lock, flags);
> | 
> | -	free_pidmap(&init_pid_ns, pid->nr);
> | +	for (i = 0; i <= pid->level; i++)
> | +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> | +
> |  	call_rcu(&pid->rcu, delayed_put_pid);
> |  }
> | 
> | @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
> |  {
> |  	struct pid *pid;
> |  	enum pid_type type;
> | -	int nr = -1;
> | +	struct pid_namespace *ns;
> | +	int i, nr;
> | 
> | -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
> | +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
> |  	if (!pid)
> |  		goto out;
> | 
> | -	nr = alloc_pidmap(current->nsproxy->pid_ns);
> | -	if (nr < 0)
> | -		goto out_free;
> | +	ns = pid_ns;
> | +	for (i = pid_ns->level; i >= 0; i--) {
> | +		nr = alloc_pidmap(ns);
> | +		if (nr < 0)
> | +			goto out_free;
> 
> If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1

It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
The loop is descending, not ascending...

> and fails when i=2, we would try to free_pidmap() even from 
> pid->pid_number[2].pid_ns. This would incorrectly a)
> drop reference count on that pid namespace, and incorrectly
> increment pidmap->nr_free.
> 
> Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns
> before calling free_pidmap() below ?
> 
> | 
> | +		pid->numbers[i].nr = nr;
> | +		pid->numbers[i].ns = ns;
> | +		ns = ns->parent;
> | +	}
> | +
> | +	pid->level = pid_ns->level;
> |  	atomic_set(&pid->count, 1);
> | -	pid->nr = nr;
> |  	for (type = 0; type < PIDTYPE_MAX; ++type)
> |  		INIT_HLIST_HEAD(&pid->tasks[type]);
> | 
> |  	spin_lock_irq(&pidmap_lock);
> | -	hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]);
> | +	for (i = pid->level; i >= 0; i--)
> | +		hlist_add_head_rcu(&pid->numbers[i].pid_chain,
> | +				&pid_hash[pid_hashfn(pid->numbers[i].nr,
> | +					pid->numbers[i].ns)]);
> |  	spin_unlock_irq(&pidmap_lock);
> | -
> |  out:
> |  	return pid;
> | 
> |  out_free:
> | -	kmem_cache_free(init_pid_ns.pid_cachep, pid);
> | +	for (i++; i <= pid->level; i++)
> | +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> 
> i.e all pid->numbers[] may not be initialized here right ?

The numbers from i up to pid->level are initialized, so this
loop looks correct.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
       [not found]       ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-07-25 19:13         ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
  2007-07-26  6:42           ` Pavel Emelyanov
  0 siblings, 1 reply; 6+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-07-25 19:13 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Linux Containers, Linux Kernel Mailing List

Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
| sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
| >Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
| >| Make alloc_pid() initialize pid_numbers and hash them
| >| into the hashtable, not the struct pid itself.
| >| 
| >| Signed-off-by: Pavel Emelianov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
| >| 
| >| ---
| >| 
| >|  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
| >|  1 files changed, 33 insertions(+), 14 deletions(-)
| >| 
| >| --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
| >| +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
| >| @@ -28,8 +28,10 @@
| >|  #include <linux/hash.h>
| >|  #include <linux/pid_namespace.h>
| >|  #include <linux/init_task.h>
| >| +#include <linux/proc_fs.h>
| >| 
| >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
| >| +#define pid_hashfn(nr, ns)	\
| >| +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
| >|  static struct hlist_head *pid_hash;
| >|  static int pidhash_shift;
| >|  struct pid init_struct_pid = INIT_STRUCT_PID;
| >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
| >|  	if (!pid)
| >|  		return;
| >| 
| >| -	ns = pid->numbers[0].ns;
| >| +	ns = pid->numbers[pid->level].ns;
| >|  	if ((atomic_read(&pid->count) == 1) ||
| >|  	     atomic_dec_and_test(&pid->count))
| >|  		kmem_cache_free(ns->pid_cachep, pid);
| >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
| >|  fastcall void free_pid(struct pid *pid)
| >|  {
| >|  	/* We can be called with write_lock_irq(&tasklist_lock) held */
| >| +	int i;
| >|  	unsigned long flags;
| >| 
| >|  	spin_lock_irqsave(&pidmap_lock, flags);
| >| -	hlist_del_rcu(&pid->pid_chain);
| >| +	for (i = 0; i <= pid->level; i++)
| >| +		hlist_del_rcu(&pid->numbers[i].pid_chain);
| >|  	spin_unlock_irqrestore(&pidmap_lock, flags);
| >| 
| >| -	free_pidmap(&init_pid_ns, pid->nr);
| >| +	for (i = 0; i <= pid->level; i++)
| >| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
| >| +
| >|  	call_rcu(&pid->rcu, delayed_put_pid);
| >|  }
| >| 
| >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
| >|  {
| >|  	struct pid *pid;
| >|  	enum pid_type type;
| >| -	int nr = -1;
| >| +	struct pid_namespace *ns;
| >| +	int i, nr;
| >| 
| >| -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
| >| +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
| >|  	if (!pid)
| >|  		goto out;
| >| 
| >| -	nr = alloc_pidmap(current->nsproxy->pid_ns);
| >| -	if (nr < 0)
| >| -		goto out_free;
| >| +	ns = pid_ns;
| >| +	for (i = pid_ns->level; i >= 0; i--) {
| >| +		nr = alloc_pidmap(ns);
| >| +		if (nr < 0)
| >| +			goto out_free;
| >
| >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
| 
| It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
| The loop is descending, not ascending...

Aah descending - thats right. But I still think there is a problem.

Here is your code that I am referring to:

        pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
        if (!pid)
                goto out;

        ns = pid_ns;
        for (i = pid_ns->level; i >= 0; i--) {
                nr = alloc_pidmap(ns);
                if (nr < 0)
                        goto out_free;

                pid->numbers[i].nr = nr;
                pid->numbers[i].ns = ns;
                ns = ns->parent;
        }

	pid->level = pid_ns->level;

	<snip>

out:
	return pid;
out_free:
        for (i++; i <= pid->level; i++)
                free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);

        kmem_cache_free(pid_ns->pid_cachep, pid);
        pid = NULL;
	goto out;

<end code>

Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for
i=3 and i=2 but fails for i=1 and we execute "goto out_free".

But pid->level is uninitialized at this point right ?

Even if it were set to zero (using kmem_cache_zalloc()), we may not
free the two pidmap entries we allocated for i=3 and i=2.

Suka

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
  2007-07-25 19:13         ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
@ 2007-07-26  6:42           ` Pavel Emelyanov
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Emelyanov @ 2007-07-26  6:42 UTC (permalink / raw)
  To: sukadev; +Cc: Serge Hallyn, Linux Containers, Linux Kernel Mailing List

sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | sukadev@us.ibm.com wrote:
> | >Pavel Emelianov [xemul@openvz.org] wrote:
> | >| Make alloc_pid() initialize pid_numbers and hash them
> | >| into the hashtable, not the struct pid itself.
> | >| 
> | >| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
> | >| 
> | >| ---
> | >| 
> | >|  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
> | >|  1 files changed, 33 insertions(+), 14 deletions(-)
> | >| 
> | >| --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
> | >| +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
> | >| @@ -28,8 +28,10 @@
> | >|  #include <linux/hash.h>
> | >|  #include <linux/pid_namespace.h>
> | >|  #include <linux/init_task.h>
> | >| +#include <linux/proc_fs.h>
> | >| 
> | >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
> | >| +#define pid_hashfn(nr, ns)	\
> | >| +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
> | >|  static struct hlist_head *pid_hash;
> | >|  static int pidhash_shift;
> | >|  struct pid init_struct_pid = INIT_STRUCT_PID;
> | >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
> | >|  	if (!pid)
> | >|  		return;
> | >| 
> | >| -	ns = pid->numbers[0].ns;
> | >| +	ns = pid->numbers[pid->level].ns;
> | >|  	if ((atomic_read(&pid->count) == 1) ||
> | >|  	     atomic_dec_and_test(&pid->count))
> | >|  		kmem_cache_free(ns->pid_cachep, pid);
> | >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
> | >|  fastcall void free_pid(struct pid *pid)
> | >|  {
> | >|  	/* We can be called with write_lock_irq(&tasklist_lock) held */
> | >| +	int i;
> | >|  	unsigned long flags;
> | >| 
> | >|  	spin_lock_irqsave(&pidmap_lock, flags);
> | >| -	hlist_del_rcu(&pid->pid_chain);
> | >| +	for (i = 0; i <= pid->level; i++)
> | >| +		hlist_del_rcu(&pid->numbers[i].pid_chain);
> | >|  	spin_unlock_irqrestore(&pidmap_lock, flags);
> | >| 
> | >| -	free_pidmap(&init_pid_ns, pid->nr);
> | >| +	for (i = 0; i <= pid->level; i++)
> | >| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> | >| +
> | >|  	call_rcu(&pid->rcu, delayed_put_pid);
> | >|  }
> | >| 
> | >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
> | >|  {
> | >|  	struct pid *pid;
> | >|  	enum pid_type type;
> | >| -	int nr = -1;
> | >| +	struct pid_namespace *ns;
> | >| +	int i, nr;
> | >| 
> | >| -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
> | >| +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
> | >|  	if (!pid)
> | >|  		goto out;
> | >| 
> | >| -	nr = alloc_pidmap(current->nsproxy->pid_ns);
> | >| -	if (nr < 0)
> | >| -		goto out_free;
> | >| +	ns = pid_ns;
> | >| +	for (i = pid_ns->level; i >= 0; i--) {
> | >| +		nr = alloc_pidmap(ns);
> | >| +		if (nr < 0)
> | >| +			goto out_free;
> | >
> | >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
> | 
> | It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
> | The loop is descending, not ascending...
> 
> Aah descending - thats right. But I still think there is a problem.
> 
> Here is your code that I am referring to:
> 
>         pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
>         if (!pid)
>                 goto out;
> 
>         ns = pid_ns;
>         for (i = pid_ns->level; i >= 0; i--) {
>                 nr = alloc_pidmap(ns);
>                 if (nr < 0)
>                         goto out_free;
> 
>                 pid->numbers[i].nr = nr;
>                 pid->numbers[i].ns = ns;
>                 ns = ns->parent;
>         }
> 
> 	pid->level = pid_ns->level;
> 
> 	<snip>
> 
> out:
> 	return pid;
> out_free:
>         for (i++; i <= pid->level; i++)
>                 free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> 
>         kmem_cache_free(pid_ns->pid_cachep, pid);
>         pid = NULL;
> 	goto out;
> 
> <end code>
> 
> Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for
> i=3 and i=2 but fails for i=1 and we execute "goto out_free".
> 
> But pid->level is uninitialized at this point right ?
> 
> Even if it were set to zero (using kmem_cache_zalloc()), we may not
> free the two pidmap entries we allocated for i=3 and i=2.

Yes. I found this after detailed look at the code and (hope) fixed.

> Suka
> 

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-07-26  6:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <468DF6F7.1010906@openvz.org>
     [not found] ` <20070710002929.GA11549@us.ibm.com>
     [not found]   ` <469384DA.70805@openvz.org>
     [not found]     ` <20070712031937.GB11489@us.ibm.com>
     [not found]       ` <20070712031937.GB11489-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-07-16  8:47         ` [PATCH 0/16] Pid namespaces Pavel Emelianov
     [not found]           ` <469B308E.2080705-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-07-17  4:23             ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
     [not found] ` <468DF907.2060809@openvz.org>
2007-07-25  0:36   ` [PATCH 13/16] Switch to operating with pid_numbers instead of pids sukadev
2007-07-25 10:07     ` Pavel Emelyanov
     [not found]       ` <46A720C8.5070803-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-07-25 19:13         ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
2007-07-26  6:42           ` Pavel Emelyanov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox