[RFC][PATCH] ps command race fix take 4 [4/4] proc root open/release/llseek

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC][PATCH] ps command race fix take 4 [4/4] proc root open/release/llseek
@ 2006-08-25  9:29 KAMEZAWA Hiroyuki
  2006-09-04 23:13 ` [PATCH] proc: readdir race fix Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-08-25  9:29 UTC (permalink / raw)
  To: LKML; +Cc: ebiederm, kamezawa.hiroyu, Andrew Morton, saito.tadashi, ak

implements open/release/llseek ops are needed by new proc_pid_readdir()

llseek()'s offset is specified by 'bytes' but /proc root doesn't handle
file->f_pos as bytes. So I disabled llseek for /proc for now.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 fs/proc/root.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+)

Index: linux-2.6.18-rc4/fs/proc/root.c
===================================================================
--- linux-2.6.18-rc4.orig/fs/proc/root.c
+++ linux-2.6.18-rc4/fs/proc/root.c
@@ -16,6 +16,7 @@
 #include <linux/module.h>
 #include <linux/bitops.h>
 #include <linux/smp_lock.h>
+#include <linux/adaptive_pointer.h>
 
 #include "internal.h"
 
@@ -118,14 +119,47 @@ static int proc_root_readdir(struct file
 	return ret;
 }
 
+static int proc_root_open(struct inode *inode, struct file *filp)
+{
+	struct adaptive_pointer *ap;
+	ap = kzalloc(sizeof(*ap), GFP_KERNEL);
+	init_adaptive_pointer(ap);
+	if (ap) {
+		filp->private_data = ap;
+		return 0;
+	}
+	return -ENOMEM;
+}
+
+static int proc_root_release(struct inode *inode, struct file *filp)
+{
+	struct adaptive_pointer *ap;
+	ap = filp->private_data;
+	rcu_read_lock();
+	__ap_get_remove_pointer(ap);
+	rcu_read_unlock();
+	kfree(ap);
+	filp->private_data = NULL;
+	return 0;
+}
+
+static loff_t proc_root_llseek(struct file *file, loff_t off, int pos)
+{
+	/* pos is bytes...but we don't use fp->f_pos as bytes... */
+	return -ENOTSUPP;
+}
+
 /*
  * The root /proc directory is special, as it has the
  * <pid> directories. Thus we don't use the generic
  * directory handling functions for that..
  */
 static struct file_operations proc_root_operations = {
+	.open		 = proc_root_open,
 	.read		 = generic_read_dir,
 	.readdir	 = proc_root_readdir,
+	.release	= proc_root_release,
+	.llseek		= proc_root_llseek,
 };
 
 /*


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] proc: readdir race fix.
  2006-08-25  9:29 [RFC][PATCH] ps command race fix take 4 [4/4] proc root open/release/llseek KAMEZAWA Hiroyuki
@ 2006-09-04 23:13 ` Eric W. Biederman
  2006-09-05  1:30   ` KAMEZAWA Hiroyuki
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-04 23:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, saito.tadashi, ak, KAMEZAWA Hiroyuki,
	Oleg Nesterov, Jean Delvare


The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of
system calls.  For this race to trigger the process that proc_pid_readdir
stops at must exit before readdir is called again.

This can cause ps to fail to report processes, and  it is in violation
of posix guarantees and normal application expectations with respect
to readdir.

Currently there is no way to work around this problem in user space
short of providing a gargantuan buffer to user space so the directory
read all happens in on system call.

This patch implements the normal directory semantics for proc,
that guarantee that a directory entry that is neither created nor
destroyed while reading the directory entry will be returned.  For
directory that are either created or destroyed during the readdir you
may or may not see them.  Furthermore you may seek to a directory
offset you have previously seen.

These are the guarantee that ext[23] provides and that posix requires,
and more importantly that user space expects. Plus it is a simple
semantic to implement reliable service.  It is just a matter of
calling readdir a second time if you are wondering if something new
has show up.

These better semantics are implemented by scanning through the
pids in numerical order and by making the file offset a pid
plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm.  Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
are only 40 cache lines for the entire 32K pid space.  A typical system
will have 100 pids or more so this is actually fewer cache lines we have
to look at to scan a linked list, and the worst case of having to scan
the entire pid bitmap is pretty reasonable.

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less
code than what we are doing now.

This patch is against 2.6.18-rc6 and it should be relatively straight
forward to backport to older kernels as well.

Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/proc/base.c      |   93 +++++++++++++--------------------------------------
 include/linux/pid.h |    1 +
 kernel/pid.c        |   37 ++++++++++++++++++++
 3 files changed, 62 insertions(+), 69 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index fe8d55f..7b93454 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2141,72 +2141,32 @@ out_no_task:
 }
 
 /*
- * Find the first tgid to return to user space.
+ * Find the first task with tgid >= tgid
  *
- * Usually this is just whatever follows &init_task, but if the users
- * buffer was too small to hold the full list or there was a seek into
- * the middle of the directory we have more work to do.
- *
- * In the case of a short read we start with find_task_by_pid.
- *
- * In the case of a seek we start with &init_task and walk nr
- * threads past it.
  */
-static struct task_struct *first_tgid(int tgid, unsigned int nr)
+static struct task_struct *next_tgid(unsigned int tgid)
 {
-	struct task_struct *pos;
-	rcu_read_lock();
-	if (tgid && nr) {
-		pos = find_task_by_pid(tgid);
-		if (pos && thread_group_leader(pos))
-			goto found;
-	}
-	/* If nr exceeds the number of processes get out quickly */
-	pos = NULL;
-	if (nr && nr >= nr_processes())
-		goto done;
-
-	/* If we haven't found our starting place yet start with
-	 * the init_task and walk nr tasks forward.
-	 */
-	for (pos = next_task(&init_task); nr > 0; --nr) {
-		pos = next_task(pos);
-		if (pos == &init_task) {
-			pos = NULL;
-			goto done;
-		}
-	}
-found:
-	get_task_struct(pos);
-done:
-	rcu_read_unlock();
-	return pos;
-}
+	struct task_struct *task;
+	struct pid *pid;
 
-/*
- * Find the next task in the task list.
- * Return NULL if we loop or there is any error.
- *
- * The reference to the input task_struct is released.
- */
-static struct task_struct *next_tgid(struct task_struct *start)
-{
-	struct task_struct *pos;
+	task = NULL;
 	rcu_read_lock();
-	pos = start;
-	if (pid_alive(start))
-		pos = next_task(start);
-	if (pid_alive(pos) && (pos != &init_task)) {
-		get_task_struct(pos);
-		goto done;
+retry:
+	pid = find_next_pid(tgid);
+	if (pid) {
+		tgid = pid->nr + 1;
+		task = pid_task(pid, PIDTYPE_PID);
+		if (!task || !thread_group_leader(task))
+			goto retry;
+		get_task_struct(task);
 	}
-	pos = NULL;
-done:
 	rcu_read_unlock();
-	put_task_struct(start);
-	return pos;
+	return task;
+	
 }
 
+#define TGID_OFFSET ((FIRST_PROCESS_ENTRY + 1) - 1)
+
 /* for the /proc/ directory itself, after non-process stuff has been done */
 int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
@@ -2222,29 +2182,24 @@ int proc_pid_readdir(struct file * filp,
 		filp->f_pos++;
 		nr++;
 	}
-	nr -= 1;
 
-	/* f_version caches the tgid value that the last readdir call couldn't
-	 * return. lseek aka telldir automagically resets f_version to 0.
-	 */
-	tgid = filp->f_version;
-	filp->f_version = 0;
-	for (task = first_tgid(tgid, nr);
+	tgid = filp->f_pos - TGID_OFFSET;
+	for (task = next_tgid(tgid);
 	     task;
-	     task = next_tgid(task), filp->f_pos++) {
+	     task = next_tgid(tgid + 1)) {
 		int len;
 		ino_t ino;
 		tgid = task->pid;
+		filp->f_pos = tgid + TGID_OFFSET;
 		len = snprintf(buf, sizeof(buf), "%d", tgid);
 		ino = fake_ino(tgid, PROC_TGID_INO);
 		if (filldir(dirent, buf, len, filp->f_pos, ino, DT_DIR) < 0) {
-			/* returning this tgid failed, save it as the first
-			 * pid for the next readir call */
-			filp->f_version = tgid;
 			put_task_struct(task);
-			break;
+			goto out;
 		}
 	}
+	filp->f_pos = PID_MAX_LIMIT + TGID_OFFSET;
+out:
 	return 0;
 }
 
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 29960b0..a24db52 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -87,6 +87,7 @@ extern struct pid *FASTCALL(find_pid(int
  * Lookup a PID in the hash table, and return with it's count elevated.
  */
 extern struct pid *find_get_pid(int nr);
+extern struct pid *find_next_pid(int nr);
 
 extern struct pid *alloc_pid(void);
 extern void FASTCALL(free_pid(struct pid *pid));
diff --git a/kernel/pid.c b/kernel/pid.c
index 93e212f..53d4159 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -145,6 +145,23 @@ static int alloc_pidmap(void)
 	return -1;
 }
 
+static int next_pidmap(int last)
+{
+	int offset;
+	pidmap_t *map;
+
+	offset = (last + 1) & BITS_PER_PAGE_MASK;
+	map = &pidmap_array[(last + 1)/BITS_PER_PAGE];
+	for (; map < &pidmap_array[PIDMAP_ENTRIES]; map++, offset = 0) {
+		if (unlikely(!map->page))
+			continue;
+		offset = find_next_bit((map)->page, BITS_PER_PAGE, offset);
+		if (offset < BITS_PER_PAGE) 
+			return mk_pid(map, offset);
+	}
+	return -1;
+}
+
 fastcall void put_pid(struct pid *pid)
 {
 	if (!pid)
@@ -297,6 +314,26 @@ struct pid *find_get_pid(pid_t nr)
 }
 
 /*
+ * Used by proc to find the pid with the first
+ * pid that is greater than or equal to number.
+ *
+ * If there is a pid at nr this function is exactly the same as find_pid.
+ */
+struct pid *find_next_pid(int nr)
+{
+	struct pid *next;
+
+	next = find_pid(nr);
+	while (!next) {
+		nr = next_pidmap(nr);
+		if (nr <= 0)
+			break;
+		next = find_pid(nr);
+	}
+	return next;
+}
+
+/*
  * The pid hash table is scaled according to the amount of memory in the
  * machine.  From a minimum of 16 slots up to 4096 slots at one gigabyte or
  * more.
-- 



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix.
  2006-09-04 23:13 ` [PATCH] proc: readdir race fix Eric W. Biederman
@ 2006-09-05  1:30   ` KAMEZAWA Hiroyuki
  2006-09-05  2:30     ` Eric W. Biederman
  2006-09-05  2:41     ` KAMEZAWA Hiroyuki
  2006-09-05  2:26   ` KAMEZAWA Hiroyuki
  2006-09-05  5:26   ` [PATCH] proc: readdir race fix KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-09-05  1:30 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, akpm, saito.tadashi, ak, oleg, jdelvare

Hi,

On Mon, 04 Sep 2006 17:13:10 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:
> These better semantics are implemented by scanning through the
> pids in numerical order and by making the file offset a pid
> plus a fixed offset.
I think this is very sane/solid approach.
Maybe this is the way to go. I'll test and ack later, thank you.

> The pid scan happens on the pid bitmap, which when you look at it is
> remarkably efficient for a brute force algorithm.  Given that a typical
> cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
> are only 40 cache lines for the entire 32K pid space.  A typical system
> will have 100 pids or more so this is actually fewer cache lines we have
> to look at to scan a linked list, and the worst case of having to scan
> the entire pid bitmap is pretty reasonable.

I agree with you but..
Becasue this approach has to access *all* task structs in a system,
and have to scan pidhash many times. I'm not sure that this scan & lookup
is good for future implementation. But this patch is obviously better than
current implementation.



>  /*
> + * Used by proc to find the pid with the first
> + * pid that is greater than or equal to number.
> + *
> + * If there is a pid at nr this function is exactly the same as find_pid.
> + */
> +struct pid *find_next_pid(int nr)
> +{

How about find_first_used_pid(int nr) ?

-Kame


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix.
  2006-09-05  1:30   ` KAMEZAWA Hiroyuki
@ 2006-09-05  2:30     ` Eric W. Biederman
  2006-09-05  2:41     ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-05  2:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, saito.tadashi, ak, oleg, jdelvare

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> Hi,
>
> On Mon, 04 Sep 2006 17:13:10 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>> These better semantics are implemented by scanning through the
>> pids in numerical order and by making the file offset a pid
>> plus a fixed offset.
> I think this is very sane/solid approach.
> Maybe this is the way to go. I'll test and ack later, thank you.
>
>> The pid scan happens on the pid bitmap, which when you look at it is
>> remarkably efficient for a brute force algorithm.  Given that a typical
>> cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
>> are only 40 cache lines for the entire 32K pid space.  A typical system
>> will have 100 pids or more so this is actually fewer cache lines we have
>> to look at to scan a linked list, and the worst case of having to scan
>> the entire pid bitmap is pretty reasonable.
>
> I agree with you but..
> Becasue this approach has to access *all* task structs in a system,
> and have to scan pidhash many times. I'm not sure that this scan & lookup
> is good for future implementation. But this patch is obviously better than
> current implementation.

Quite possibly.  But where this approach falls down in not in the basics
but in the data structures.  And it isn't that hard to fix the data
structures.  Just something I don't want to do the first pass.

>>  /*
>> + * Used by proc to find the pid with the first
>> + * pid that is greater than or equal to number.
>> + *
>> + * If there is a pid at nr this function is exactly the same as find_pid.
>> + */
>> +struct pid *find_next_pid(int nr)
>> +{
>
> How about find_first_used_pid(int nr) ?

Maybe.  The best name I can think of is:
find_greater_or_equal_pid(int nr) and that is a little 
clumsy.   On the other hand it isn't confusing what the function
does.

Any other suggestions?

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix.
  2006-09-05  1:30   ` KAMEZAWA Hiroyuki
  2006-09-05  2:30     ` Eric W. Biederman
@ 2006-09-05  2:41     ` KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-09-05  2:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: ebiederm, linux-kernel, akpm, saito.tadashi, ak, oleg, jdelvare

On Tue, 5 Sep 2006 10:30:10 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Hi,
> 
> On Mon, 04 Sep 2006 17:13:10 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
> > These better semantics are implemented by scanning through the
> > pids in numerical order and by making the file offset a pid
> > plus a fixed offset.
> I think this is very sane/solid approach.
> Maybe this is the way to go. I'll test and ack later, thank you.
> 
Just a memo:

One interesting aspect of this patch is..

- default result of ps with current(old) kernel is naturally sorted by  starttime.
  then, ps command itself comes at the bottom of the result, as the newest process.

- default result of ps with this patch is sorted by pid, regardless of starttime.

==
kamezawa  3770  0.0  0.0  5156  996 tty1     S+   11:17   0:00 /bin/bash ./testp
kawamura  3989  0.0  0.0  2168  832 pts/3    S+   11:32   0:00 sh -c (cd /usr/sh
kawamura  3990  0.0  0.0  2168  408 pts/3    S+   11:32   0:00 sh -c (cd /usr/sh
kawamura  4002  0.0  0.0  1924  680 pts/3    S+   11:32   0:00 /usr/bin/less -iR
root     10772  0.0  0.0  6912 2188 ?        Ss   11:31   0:00 sshd: kawamura [p
kamezawa 12341  0.6  0.0  5336 1476 tty2     Ss   11:17   0:06 -bash
kamezawa 15308  0.0  0.0  4788  544 tty1     S+   11:32   0:00 sleep 1
kamezawa 15315  0.0  0.0  2380  756 pts/0    R+   11:32   0:00 ps aux
kamezawa 17322  0.0  0.0  5332 1472 tty3     Ss+  11:18   0:00 -bash
root     22194  0.0  0.0  1760  744 ?        Ss   11:20   0:00 in.telnetd: awork
root     22198  0.0  0.0  2992 1476 ?        Ss   11:20   0:00 login -- kamezawa
kawamura 24526  0.0  0.0  6912 1544 ?        S    11:31   0:00 sshd: kawamura@pt
kawamura 24529  0.0  0.0  5336 1456 pts/3    Ss   11:31   0:00 -bash
satoshi  26239  2.2  0.3 40292 13696 ?       Sl   11:30   0:02 /usr/bin/gnome-te
==

But 'ps --sort start_time' is available.

-Kame


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix.
  2006-09-04 23:13 ` [PATCH] proc: readdir race fix Eric W. Biederman
  2006-09-05  1:30   ` KAMEZAWA Hiroyuki
@ 2006-09-05  2:26   ` KAMEZAWA Hiroyuki
  2006-09-05  2:54     ` Eric W. Biederman
  2006-09-05  3:07     ` Eric W. Biederman
  2006-09-05  5:26   ` [PATCH] proc: readdir race fix KAMEZAWA Hiroyuki
  2 siblings, 2 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-09-05  2:26 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, akpm, saito.tadashi, ak, oleg, jdelvare

On Mon, 04 Sep 2006 17:13:10 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:
Hi, Hit OOM-Killer, because of memory leak of task struct.

patch is attached.
-Kame

task struct should be put always.

 fs/proc/base.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.18-psfix/fs/proc/base.c
===================================================================
--- linux-2.6.18-psfix.orig/fs/proc/base.c	2006-09-05 10:42:40.000000000 +0900
+++ linux-2.6.18-psfix/fs/proc/base.c	2006-09-05 11:11:42.000000000 +0900
@@ -2193,8 +2193,8 @@
 		filp->f_pos = tgid + TGID_OFFSET;
 		len = snprintf(buf, sizeof(buf), "%d", tgid);
 		ino = fake_ino(tgid, PROC_TGID_INO);
+		put_task_struct(task);
 		if (filldir(dirent, buf, len, filp->f_pos, ino, DT_DIR) < 0) {
-			put_task_struct(task);
 			goto out;
 		}
 	}


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix.
  2006-09-05  2:26   ` KAMEZAWA Hiroyuki
@ 2006-09-05  2:54     ` Eric W. Biederman
  2006-09-05  3:07     ` Eric W. Biederman
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-05  2:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, akpm, saito.tadashi, ak, oleg, jdelvare

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> On Mon, 04 Sep 2006 17:13:10 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
> Hi, Hit OOM-Killer, because of memory leak of task struct.
>
> patch is attached.
> -Kame
>
> task struct should be put always.

Good catch. I actually have a pending cleanup that works better if the
task struct is held across the filldir so the incremental patch should look like:

I also noticed a benign typo in TGID_OFFSET (1 subtract one that I shouldn't)

Complete patch in just a moment.

Eric

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b7650b9..03f6680 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2186,7 +2186,7 @@ int proc_pid_readdir(struct file * filp,
        tgid = filp->f_pos - TGID_OFFSET;
        for (task = next_tgid(tgid);
             task;
-            task = next_tgid(tgid + 1)) {
+            put_task_struct(task), task = next_tgid(tgid + 1)) {
                int len;
                ino_t ino;
                tgid = task->pid;


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH] proc: readdir race fix
  2006-09-05  2:26   ` KAMEZAWA Hiroyuki
  2006-09-05  2:54     ` Eric W. Biederman
@ 2006-09-05  3:07     ` Eric W. Biederman
  2006-09-05  5:39       ` KAMEZAWA Hiroyuki
  2006-09-05 10:10       ` Oleg Nesterov
  1 sibling, 2 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-05  3:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-kernel, akpm, ak, oleg, jdelvare, Albert Cahalan,
	Paul Jackson


The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of
system calls.  For this race to trigger the process that proc_pid_readdir
stops at must exit before readdir is called again.

This can cause ps to fail to report processes, and  it is in violation
of posix guarantees and normal application expectations with respect
to readdir.

Currently there is no way to work around this problem in user space
short of providing a gargantuan buffer to user space so the directory
read all happens in on system call.

This patch implements the normal directory semantics for proc,
that guarantee that a directory entry that is neither created nor
destroyed while reading the directory entry will be returned.  For
directory that are either created or destroyed during the readdir you
may or may not see them.  Furthermore you may seek to a directory
offset you have previously seen.

These are the guarantee that ext[23] provides and that posix requires,
and more importantly that user space expects. Plus it is a simple
semantic to implement reliable service.  It is just a matter of
calling readdir a second time if you are wondering if something new
has show up.

These better semantics are implemented by scanning through the
pids in numerical order and by making the file offset a pid
plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm.  Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
are only 40 cache lines for the entire 32K pid space.  A typical system
will have 100 pids or more so this is actually fewer cache lines we have
to look at to scan a linked list, and the worst case of having to scan
the entire pid bitmap is pretty reasonable.

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less
code than what we are doing now.

This patch is against 2.6.18-rc6 and it should be relatively straight
forward to backport to older kernels as well.

Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/proc/base.c      |   93 +++++++++++++--------------------------------------
 include/linux/pid.h |    1 +
 kernel/pid.c        |   37 ++++++++++++++++++++
 3 files changed, 62 insertions(+), 69 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index fe8d55f..03f6680 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2141,72 +2141,32 @@ out_no_task:
 }
 
 /*
- * Find the first tgid to return to user space.
+ * Find the first task with tgid >= tgid
  *
- * Usually this is just whatever follows &init_task, but if the users
- * buffer was too small to hold the full list or there was a seek into
- * the middle of the directory we have more work to do.
- *
- * In the case of a short read we start with find_task_by_pid.
- *
- * In the case of a seek we start with &init_task and walk nr
- * threads past it.
  */
-static struct task_struct *first_tgid(int tgid, unsigned int nr)
+static struct task_struct *next_tgid(unsigned int tgid)
 {
-	struct task_struct *pos;
-	rcu_read_lock();
-	if (tgid && nr) {
-		pos = find_task_by_pid(tgid);
-		if (pos && thread_group_leader(pos))
-			goto found;
-	}
-	/* If nr exceeds the number of processes get out quickly */
-	pos = NULL;
-	if (nr && nr >= nr_processes())
-		goto done;
-
-	/* If we haven't found our starting place yet start with
-	 * the init_task and walk nr tasks forward.
-	 */
-	for (pos = next_task(&init_task); nr > 0; --nr) {
-		pos = next_task(pos);
-		if (pos == &init_task) {
-			pos = NULL;
-			goto done;
-		}
-	}
-found:
-	get_task_struct(pos);
-done:
-	rcu_read_unlock();
-	return pos;
-}
+	struct task_struct *task;
+	struct pid *pid;
 
-/*
- * Find the next task in the task list.
- * Return NULL if we loop or there is any error.
- *
- * The reference to the input task_struct is released.
- */
-static struct task_struct *next_tgid(struct task_struct *start)
-{
-	struct task_struct *pos;
+	task = NULL;
 	rcu_read_lock();
-	pos = start;
-	if (pid_alive(start))
-		pos = next_task(start);
-	if (pid_alive(pos) && (pos != &init_task)) {
-		get_task_struct(pos);
-		goto done;
+retry:
+	pid = find_next_pid(tgid);
+	if (pid) {
+		tgid = pid->nr + 1;
+		task = pid_task(pid, PIDTYPE_PID);
+		if (!task || !thread_group_leader(task))
+			goto retry;
+		get_task_struct(task);
 	}
-	pos = NULL;
-done:
 	rcu_read_unlock();
-	put_task_struct(start);
-	return pos;
+	return task;
+	
 }
 
+#define TGID_OFFSET (FIRST_PROCESS_ENTRY + (1 /* /proc/self */))
+
 /* for the /proc/ directory itself, after non-process stuff has been done */
 int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
@@ -2222,29 +2182,24 @@ int proc_pid_readdir(struct file * filp,
 		filp->f_pos++;
 		nr++;
 	}
-	nr -= 1;
 
-	/* f_version caches the tgid value that the last readdir call couldn't
-	 * return. lseek aka telldir automagically resets f_version to 0.
-	 */
-	tgid = filp->f_version;
-	filp->f_version = 0;
-	for (task = first_tgid(tgid, nr);
+	tgid = filp->f_pos - TGID_OFFSET;
+	for (task = next_tgid(tgid);
 	     task;
-	     task = next_tgid(task), filp->f_pos++) {
+	     put_task_struct(task), task = next_tgid(tgid + 1)) {
 		int len;
 		ino_t ino;
 		tgid = task->pid;
+		filp->f_pos = tgid + TGID_OFFSET;
 		len = snprintf(buf, sizeof(buf), "%d", tgid);
 		ino = fake_ino(tgid, PROC_TGID_INO);
 		if (filldir(dirent, buf, len, filp->f_pos, ino, DT_DIR) < 0) {
-			/* returning this tgid failed, save it as the first
-			 * pid for the next readir call */
-			filp->f_version = tgid;
 			put_task_struct(task);
-			break;
+			goto out;
 		}
 	}
+	filp->f_pos = PID_MAX_LIMIT + TGID_OFFSET;
+out:
 	return 0;
 }
 
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 29960b0..a24db52 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -87,6 +87,7 @@ extern struct pid *FASTCALL(find_pid(int
  * Lookup a PID in the hash table, and return with it's count elevated.
  */
 extern struct pid *find_get_pid(int nr);
+extern struct pid *find_next_pid(int nr);
 
 extern struct pid *alloc_pid(void);
 extern void FASTCALL(free_pid(struct pid *pid));
diff --git a/kernel/pid.c b/kernel/pid.c
index 93e212f..53d4159 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -145,6 +145,23 @@ static int alloc_pidmap(void)
 	return -1;
 }
 
+static int next_pidmap(int last)
+{
+	int offset;
+	pidmap_t *map;
+
+	offset = (last + 1) & BITS_PER_PAGE_MASK;
+	map = &pidmap_array[(last + 1)/BITS_PER_PAGE];
+	for (; map < &pidmap_array[PIDMAP_ENTRIES]; map++, offset = 0) {
+		if (unlikely(!map->page))
+			continue;
+		offset = find_next_bit((map)->page, BITS_PER_PAGE, offset);
+		if (offset < BITS_PER_PAGE) 
+			return mk_pid(map, offset);
+	}
+	return -1;
+}
+
 fastcall void put_pid(struct pid *pid)
 {
 	if (!pid)
@@ -297,6 +314,26 @@ struct pid *find_get_pid(pid_t nr)
 }
 
 /*
+ * Used by proc to find the pid with the first
+ * pid that is greater than or equal to number.
+ *
+ * If there is a pid at nr this function is exactly the same as find_pid.
+ */
+struct pid *find_next_pid(int nr)
+{
+	struct pid *next;
+
+	next = find_pid(nr);
+	while (!next) {
+		nr = next_pidmap(nr);
+		if (nr <= 0)
+			break;
+		next = find_pid(nr);
+	}
+	return next;
+}
+
+/*
  * The pid hash table is scaled according to the amount of memory in the
  * machine.  From a minimum of 16 slots up to 4096 slots at one gigabyte or
  * more.
-- 



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix
  2006-09-05  3:07     ` Eric W. Biederman
@ 2006-09-05  5:39       ` KAMEZAWA Hiroyuki
  2006-09-05 10:10       ` Oleg Nesterov
  1 sibling, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-09-05  5:39 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, akpm, ak, oleg, jdelvare, acahalan, pj

On Mon, 04 Sep 2006 21:07:29 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
> providing the first fix, pointing this out and working on it.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

worked well with my (short) test set. and looks good.
Thank you.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix
  2006-09-05  3:07     ` Eric W. Biederman
  2006-09-05  5:39       ` KAMEZAWA Hiroyuki
@ 2006-09-05 10:10       ` Oleg Nesterov
  2006-09-05 11:36         ` Eric W. Biederman
  2006-09-05 14:52         ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
  1 sibling, 2 replies; 25+ messages in thread
From: Oleg Nesterov @ 2006-09-05 10:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: KAMEZAWA Hiroyuki, linux-kernel, akpm, ak, jdelvare,
	Albert Cahalan, Paul Jackson

On 09/04, Eric W. Biederman wrote:
> 
> -static struct task_struct *next_tgid(struct task_struct *start)
> -{
> -	struct task_struct *pos;
> +	task = NULL;
>  	rcu_read_lock();
> -	pos = start;
> -	if (pid_alive(start))
> -		pos = next_task(start);
> -	if (pid_alive(pos) && (pos != &init_task)) {
> -		get_task_struct(pos);
> -		goto done;
> +retry:
> +	pid = find_next_pid(tgid);
> +	if (pid) {
> +		tgid = pid->nr + 1;
> +		task = pid_task(pid, PIDTYPE_PID);
> +		if (!task || !thread_group_leader(task))
                              ^^^^^^^^^^^^^^^^^^^
There is a window while de_thread() switches leadership, so next_tgid()
may skip a task doing exec. What do you think about

                             // needs a comment
		if (!task || task->pid != task->tgid)
			goto retry;

instead? Currently first_tgid() has the same (very minor) problem.

> +			goto retry;
> +		get_task_struct(task);
>  	}
> -	pos = NULL;
> -done:
>  	rcu_read_unlock();
> -	put_task_struct(start);
> -	return pos;
> +	return task;
> +
>  }

Emply line before '}'

> +struct pid *find_next_pid(int nr)
> +{
> +	struct pid *next;
> +
> +	next = find_pid(nr);
> +	while (!next) {
> +		nr = next_pidmap(nr);
> +		if (nr <= 0)
> +			break;
> +		next = find_pid(nr);
> +	}
> +	return next;
> +}

This is strange that we are doing find_pid() before and at the end of loop,
I'd suggest this code:

	struct pid *find_next_pid(int nr)
	{
		struct pid *pid;

		do {
			pid = find_pid(nr);
			if (pid != NULL)
				break;
			nr = next_pidmap(nr);
		} while (nr > 0);

		return pid;
	}

Imho, a bit easier to read.

Oleg.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix
  2006-09-05 10:10       ` Oleg Nesterov
@ 2006-09-05 11:36         ` Eric W. Biederman
  2006-09-05 14:52         ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-05 11:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: KAMEZAWA Hiroyuki, linux-kernel, akpm, ak, jdelvare,
	Albert Cahalan, Paul Jackson

Oleg Nesterov <oleg@tv-sign.ru> writes:

> On 09/04, Eric W. Biederman wrote:
>> 
>> -static struct task_struct *next_tgid(struct task_struct *start)
>> -{
>> -	struct task_struct *pos;
>> +	task = NULL;
>>  	rcu_read_lock();
>> -	pos = start;
>> -	if (pid_alive(start))
>> -		pos = next_task(start);
>> -	if (pid_alive(pos) && (pos != &init_task)) {
>> -		get_task_struct(pos);
>> -		goto done;
>> +retry:
>> +	pid = find_next_pid(tgid);
>> +	if (pid) {
>> +		tgid = pid->nr + 1;
>> +		task = pid_task(pid, PIDTYPE_PID);
>> +		if (!task || !thread_group_leader(task))
>                               ^^^^^^^^^^^^^^^^^^^
> There is a window while de_thread() switches leadership, so next_tgid()
> may skip a task doing exec. What do you think about
>
>                              // needs a comment
> 		if (!task || task->pid != task->tgid)
> 			goto retry;
>
> instead? Currently first_tgid() has the same (very minor) problem.

I see the problem, and your test will certainly alleviate the symptom.
You are making the test has this process ever been a thread group leader.

I guess alleviating the symptom is all that is necessary there.

Grumble.  I hate that entire pid transfer case, too bad glibc cares.

If I could in the fix for this I would like to add a clean concept
that we are testing for wrapped in a helper function.  Otherwise
even with a big fat comment this will be easy to break next time
someone refactors the code.


>> +			goto retry;
>> +		get_task_struct(task);
>>  	}
>> -	pos = NULL;
>> -done:
>>  	rcu_read_unlock();
>> -	put_task_struct(start);
>> -	return pos;
>> +	return task;
>> +
>>  }
>
> Emply line before '}'
>
>> +struct pid *find_next_pid(int nr)
>> +{
>> +	struct pid *next;
>> +
>> +	next = find_pid(nr);
>> +	while (!next) {
>> +		nr = next_pidmap(nr);
>> +		if (nr <= 0)
>> +			break;
>> +		next = find_pid(nr);
>> +	}
>> +	return next;
>> +}
>
> This is strange that we are doing find_pid() before and at the end of loop,
> I'd suggest this code:
>
> 	struct pid *find_next_pid(int nr)
> 	{
> 		struct pid *pid;
>
> 		do {
> 			pid = find_pid(nr);
> 			if (pid != NULL)
> 				break;
> 			nr = next_pidmap(nr);
> 		} while (nr > 0);
>
> 		return pid;
> 	}
>
> Imho, a bit easier to read.
It is at least not worse, so it is probably worth doing.

Eric


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] proc: readdir race fix (take 3)
  2006-09-05 10:10       ` Oleg Nesterov
  2006-09-05 11:36         ` Eric W. Biederman
@ 2006-09-05 14:52         ` Eric W. Biederman
  2006-09-06  9:01           ` Jean Delvare
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-05 14:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak, jdelvare,
	Albert Cahalan, Paul Jackson


The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of
system calls.  For this race to trigger the process that proc_pid_readdir
stops at must exit before readdir is called again.

This can cause ps to fail to report processes, and  it is in violation
of posix guarantees and normal application expectations with respect
to readdir.

Currently there is no way to work around this problem in user space
short of providing a gargantuan buffer to user space so the directory
read all happens in on system call.

This patch implements the normal directory semantics for proc,
that guarantee that a directory entry that is neither created nor
destroyed while reading the directory entry will be returned.  For
directory that are either created or destroyed during the readdir you
may or may not see them.  Furthermore you may seek to a directory
offset you have previously seen.

These are the guarantee that ext[23] provides and that posix requires,
and more importantly that user space expects. Plus it is a simple
semantic to implement reliable service.  It is just a matter of
calling readdir a second time if you are wondering if something new
has show up.

These better semantics are implemented by scanning through the
pids in numerical order and by making the file offset a pid
plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm.  Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
are only 40 cache lines for the entire 32K pid space.  A typical system
will have 100 pids or more so this is actually fewer cache lines we have
to look at to scan a linked list, and the worst case of having to scan
the entire pid bitmap is pretty reasonable.

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less
code than what we are doing now.

Also another very subtle bug in this area has been fixed.  It is
possible to catch a task in the middle of de_thread where a thread is
assuming the thread of it's thread group leader.  This patch carefully
handles that case so if we hit it we don't fail to return the pid, that
is undergoing the de_thread dance.

This patch is against 2.6.18-rc6 and it should be relatively straight
forward to backport to older kernels as well.

Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/proc/base.c        |  104 ++++++++++++++++---------------------------------
 include/linux/pid.h   |    1 
 include/linux/sched.h |   11 +++++
 kernel/pid.c          |   36 +++++++++++++++++
 4 files changed, 83 insertions(+), 69 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index fe8d55f..28e56c3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2141,72 +2141,43 @@ out_no_task:
 }
 
 /*
- * Find the first tgid to return to user space.
+ * Find the first task with tgid >= tgid
  *
- * Usually this is just whatever follows &init_task, but if the users
- * buffer was too small to hold the full list or there was a seek into
- * the middle of the directory we have more work to do.
- *
- * In the case of a short read we start with find_task_by_pid.
- *
- * In the case of a seek we start with &init_task and walk nr
- * threads past it.
  */
-static struct task_struct *first_tgid(int tgid, unsigned int nr)
+static struct task_struct *next_tgid(unsigned int tgid)
 {
-	struct task_struct *pos;
-	rcu_read_lock();
-	if (tgid && nr) {
-		pos = find_task_by_pid(tgid);
-		if (pos && thread_group_leader(pos))
-			goto found;
-	}
-	/* If nr exceeds the number of processes get out quickly */
-	pos = NULL;
-	if (nr && nr >= nr_processes())
-		goto done;
-
-	/* If we haven't found our starting place yet start with
-	 * the init_task and walk nr tasks forward.
-	 */
-	for (pos = next_task(&init_task); nr > 0; --nr) {
-		pos = next_task(pos);
-		if (pos == &init_task) {
-			pos = NULL;
-			goto done;
-		}
-	}
-found:
-	get_task_struct(pos);
-done:
-	rcu_read_unlock();
-	return pos;
-}
+	struct task_struct *task;
+	struct pid *pid;
 
-/*
- * Find the next task in the task list.
- * Return NULL if we loop or there is any error.
- *
- * The reference to the input task_struct is released.
- */
-static struct task_struct *next_tgid(struct task_struct *start)
-{
-	struct task_struct *pos;
+	task = NULL;
 	rcu_read_lock();
-	pos = start;
-	if (pid_alive(start))
-		pos = next_task(start);
-	if (pid_alive(pos) && (pos != &init_task)) {
-		get_task_struct(pos);
-		goto done;
+retry:
+	pid = find_ge_pid(tgid);
+	if (pid) {
+		tgid = pid->nr + 1;
+		task = pid_task(pid, PIDTYPE_PID);
+		/* What we to know is if the pid we have find is the
+		 * pid of a thread_group_leader.  Testing for task
+		 * being a thread_group_leader is the obvious thing
+		 * todo but there is a window when it fails, due to
+		 * the pid transfer logic in de_thread.
+		 *
+		 * So we perform the straight forward test of seeing
+		 * if the pid we have found is the pid of a thread
+		 * group leader, and don't worry if the task we have
+		 * found doesn't happen to be a thread group leader.
+		 * As we don't care in the case of readdir.
+		 */
+		if (!task || !has_group_leader_pid(task))
+			goto retry;
+		get_task_struct(task);
 	}
-	pos = NULL;
-done:
 	rcu_read_unlock();
-	put_task_struct(start);
-	return pos;
+	return task;
 }
 
+#define TGID_OFFSET (FIRST_PROCESS_ENTRY + (1 /* /proc/self */))
+
 /* for the /proc/ directory itself, after non-process stuff has been done */
 int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
@@ -2222,29 +2193,24 @@ int proc_pid_readdir(struct file * filp,
 		filp->f_pos++;
 		nr++;
 	}
-	nr -= 1;
 
-	/* f_version caches the tgid value that the last readdir call couldn't
-	 * return. lseek aka telldir automagically resets f_version to 0.
-	 */
-	tgid = filp->f_version;
-	filp->f_version = 0;
-	for (task = first_tgid(tgid, nr);
+	tgid = filp->f_pos - TGID_OFFSET;
+	for (task = next_tgid(tgid);
 	     task;
-	     task = next_tgid(task), filp->f_pos++) {
+	     put_task_struct(task), task = next_tgid(tgid + 1)) {
 		int len;
 		ino_t ino;
 		tgid = task->pid;
+		filp->f_pos = tgid + TGID_OFFSET;
 		len = snprintf(buf, sizeof(buf), "%d", tgid);
 		ino = fake_ino(tgid, PROC_TGID_INO);
 		if (filldir(dirent, buf, len, filp->f_pos, ino, DT_DIR) < 0) {
-			/* returning this tgid failed, save it as the first
-			 * pid for the next readir call */
-			filp->f_version = tgid;
 			put_task_struct(task);
-			break;
+			goto out;
 		}
 	}
+	filp->f_pos = PID_MAX_LIMIT + TGID_OFFSET;
+out:
 	return 0;
 }
 
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 29960b0..525af8b 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -87,6 +87,7 @@ extern struct pid *FASTCALL(find_pid(int
  * Lookup a PID in the hash table, and return with it's count elevated.
  */
 extern struct pid *find_get_pid(int nr);
+extern struct pid *find_ge_pid(int nr);
 
 extern struct pid *alloc_pid(void);
 extern void FASTCALL(free_pid(struct pid *pid));
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 34ed0d9..a5dea85 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1332,6 +1332,17 @@ #define while_each_thread(g, t) \
 /* de_thread depends on thread_group_leader not being a pid based check */
 #define thread_group_leader(p)	(p == p->group_leader)
 
+/* Do to the insanities of de_thread it is possible for a process
+ * to have the pid of the thread group leader without actually being
+ * the thread group leader.  For iteration through the pids in proc
+ * all we care about is that we have a task with the appropriate 
+ * pid, we don't actually care if we have the right task.
+ */
+static inline int has_group_leader_pid(struct task_struct *p)
+{
+	return p->pid == p->tgid;
+}
+
 static inline struct task_struct *next_thread(const struct task_struct *p)
 {
 	return list_entry(rcu_dereference(p->thread_group.next),
diff --git a/kernel/pid.c b/kernel/pid.c
index 93e212f..f396796 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -145,6 +145,23 @@ static int alloc_pidmap(void)
 	return -1;
 }
 
+static int next_pidmap(int last)
+{
+	int offset;
+	pidmap_t *map;
+
+	offset = (last + 1) & BITS_PER_PAGE_MASK;
+	map = &pidmap_array[(last + 1)/BITS_PER_PAGE];
+	for (; map < &pidmap_array[PIDMAP_ENTRIES]; map++, offset = 0) {
+		if (unlikely(!map->page))
+			continue;
+		offset = find_next_bit((map)->page, BITS_PER_PAGE, offset);
+		if (offset < BITS_PER_PAGE) 
+			return mk_pid(map, offset);
+	}
+	return -1;
+}
+
 fastcall void put_pid(struct pid *pid)
 {
 	if (!pid)
@@ -297,6 +314,25 @@ struct pid *find_get_pid(pid_t nr)
 }
 
 /*
+ * Used by proc to find the first pid that is greater then or equal to nr.
+ *
+ * If there is a pid at nr this function is exactly the same as find_pid.
+ */
+struct pid *find_ge_pid(int nr)
+{
+	struct pid *pid;
+
+	do {
+		pid = find_pid(nr);
+		if (pid)
+			break;
+		nr = next_pidmap(nr);
+	} while (nr > 0);
+
+	return pid;
+}
+
+/*
  * The pid hash table is scaled according to the amount of memory in the
  * machine.  From a minimum of 16 slots up to 4096 slots at one gigabyte or
  * more.
-- 



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-05 14:52         ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
@ 2006-09-06  9:01           ` Jean Delvare
  2006-09-06 21:12             ` Jean Delvare
  0 siblings, 1 reply; 25+ messages in thread
From: Jean Delvare @ 2006-09-06  9:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak,
	Albert Cahalan, Paul Jackson

On Tuesday 5 September 2006 16:52, Eric W. Biederman wrote:
> The problem: An opendir, readdir, closedir sequence can fail to report
> process ids that are continually in use throughout the sequence of
> system calls.  For this race to trigger the process that
> proc_pid_readdir stops at must exit before readdir is called again.
>
> This can cause ps to fail to report processes, and  it is in violation
> of posix guarantees and normal application expectations with respect
> to readdir.
>
> Currently there is no way to work around this problem in user space
> short of providing a gargantuan buffer to user space so the directory
> read all happens in on system call.
>
> This patch implements the normal directory semantics for proc,
> that guarantee that a directory entry that is neither created nor
> destroyed while reading the directory entry will be returned.  For
> directory that are either created or destroyed during the readdir you
> may or may not see them.  Furthermore you may seek to a directory
> offset you have previously seen.
>
> These are the guarantee that ext[23] provides and that posix requires,
> and more importantly that user space expects. Plus it is a simple
> semantic to implement reliable service.  It is just a matter of
> calling readdir a second time if you are wondering if something new
> has show up.
>
> These better semantics are implemented by scanning through the
> pids in numerical order and by making the file offset a pid
> plus a fixed offset.
>
> The pid scan happens on the pid bitmap, which when you look at it is
> remarkably efficient for a brute force algorithm.  Given that a typical
> cache line is 64 bytes and thus covers space for 64*8 == 200 pids. 
> There are only 40 cache lines for the entire 32K pid space.  A typical
> system will have 100 pids or more so this is actually fewer cache lines
> we have to look at to scan a linked list, and the worst case of having
> to scan the entire pid bitmap is pretty reasonable.
>
> If we need something more efficient we can go to a more efficient data
> structure for indexing the pids, but for now what we have should be
> sufficient.
>
> In addition this takes no additional locks and is actually less
> code than what we are doing now.
>
> Also another very subtle bug in this area has been fixed.  It is
> possible to catch a task in the middle of de_thread where a thread is
> assuming the thread of it's thread group leader.  This patch carefully
> handles that case so if we hit it we don't fail to return the pid, that
> is undergoing the de_thread dance.
>
> This patch is against 2.6.18-rc6 and it should be relatively straight
> forward to backport to older kernels as well.
>
> Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
> providing the first fix, pointing this out and working on it.

Eric, Kame, thanks a lot for working on this. I'll be giving some good 
testing to this patch today, and will return back to you when I'm done.

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-06  9:01           ` Jean Delvare
@ 2006-09-06 21:12             ` Jean Delvare
  2006-09-06 22:25               ` Oleg Nesterov
  2006-09-06 22:43               ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
  0 siblings, 2 replies; 25+ messages in thread
From: Jean Delvare @ 2006-09-06 21:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak

On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
> Eric, Kame, thanks a lot for working on this. I'll be giving some good
> testing to this patch today, and will return back to you when I'm done.

The original issue is indeed fixed, but there's a problem with the patch. 
When stressing /proc (to verify the bug was fixed), my test machine ended 
up crashing. Here are the 2 traces I found in the logs:

Sep  6 12:06:00 arrakis kernel: BUG: warning at 
kernel/fork.c:113/__put_task_struct()
Sep  6 12:06:00 arrakis kernel:  [<c0115f93>] __put_task_struct+0xf3/0x100
Sep  6 12:06:00 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
Sep  6 12:06:00 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:00 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:00 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
Sep  6 12:06:05 arrakis kernel: BUG: warning at 
kernel/fork.c:113/__put_task_struct()
Sep  6 12:06:05 arrakis kernel:  [<c0115f93>] __put_task_struct+0xf3/0x100
Sep  6 12:06:05 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
Sep  6 12:06:05 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:05 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
Sep  6 12:06:05 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:05 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
Sep  6 12:06:05 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:05 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
Sep  6 12:06:05 arrakis kernel: BUG: unable to handle kernel NULL pointer 
dereference at virtual address 00000004
Sep  6 12:06:05 arrakis kernel:  printing eip:
Sep  6 12:06:05 arrakis kernel: c0115ede
Sep  6 12:06:05 arrakis kernel: *pde = 00000000
Sep  6 12:06:05 arrakis kernel: Oops: 0002 [#1]
Sep  6 12:06:05 arrakis kernel: PREEMPT 
Sep  6 12:06:05 arrakis kernel: Modules linked in: button battery ac 
thermal processor cpufreq_powersave speedstep_ich speedstep_lib 
freq_table snd_pcm_oss snd_mixer_oss smbfs sg sd_mod nls_iso8859_1 
nls_cp437 vfat fat radeon drm usb_storage scsi_mod eth1394 usbhid 
snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer ide_cd 
uhci_hcd snd intel_rng psmouse cdrom usbcore evdev soundcore ohci1394 
intel_agp i2c_i801 snd_page_alloc ieee1394 e100 mii unix
Sep  6 12:06:05 arrakis kernel: CPU:    0
Sep  6 12:06:05 arrakis kernel: EIP:    0060:[<c0115ede>]    Not tainted 
VLI
Sep  6 12:06:05 arrakis kernel: EFLAGS: 00010282   (2.6.18-rc6 #113) 
Sep  6 12:06:05 arrakis kernel: EIP is at __put_task_struct+0x3e/0x100
Sep  6 12:06:05 arrakis kernel: eax: 00000000   ebx: c7ba7580   ecx: 
c0316bd4   edx: 00000000
Sep  6 12:06:05 arrakis kernel: esi: 00000000   edi: cc4b2d40   ebp: 
c953ef48   esp: c953ef14
Sep  6 12:06:05 arrakis kernel: ds: 007b   es: 007b   ss: 0068
Sep  6 12:06:05 arrakis kernel: Process bug (pid: 28719, ti=c953e000 
task=c953da90 task.ti=c953e000)
Sep  6 12:06:05 arrakis kernel: Stack: 00000000 c02ddbf5 00000071 c02c8401 
c7ba7580 c019666a c7ba7580 c953ef48 
Sep  6 12:06:05 arrakis kernel:        00000001 00000101 00000000 00000002 
00000004 39350030 cc4b0038 c953ef98 
Sep  6 12:06:05 arrakis kernel:        c0174750 cc4b2d40 c137bdf0 fffffffe 
c137be60 c01745f0 cc4b2d40 c953ef98 
Sep  6 12:06:05 arrakis kernel: Call Trace:
Sep  6 12:06:05 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
Sep  6 12:06:05 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:05 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
Sep  6 12:06:05 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:05 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
Sep  6 12:06:05 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 12:06:05 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
Sep  6 12:06:05 arrakis kernel: Code: 84 af 00 00 00 8b 43 08 85 c0 75 77 
89 e0 25 00 f0 ff ff 3b 18 74 3e 8b 83 84 01 00 00 89 04 24 e8 68 e0 00 
00 8b 83 70 01 00 00 <ff> 48 04 0f 94 c2 84 d2 75 10 89 5c 24 18 8b 5c 24 
10 83 c4 14 
Sep  6 12:06:05 arrakis kernel: EIP: [<c0115ede>] 
__put_task_struct+0x3e/0x100 SS:ESP 0068:c953ef14


Sep  6 19:38:23 arrakis kernel: BUG: warning at 
kernel/fork.c:113/__put_task_struct()
Sep  6 19:38:23 arrakis kernel:  [<c0115f93>] __put_task_struct+0xf3/0x100
Sep  6 19:38:23 arrakis kernel:  [<c019667e>] proc_pid_readdir+0x14e/0x150
Sep  6 19:38:23 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:23 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
Sep  6 19:38:23 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:23 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
Sep  6 19:38:23 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:23 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
Sep  6 19:38:26 arrakis kernel: BUG: unable to handle kernel NULL pointer 
dereference at virtual address 0000000c
Sep  6 19:38:26 arrakis kernel:  printing eip:
Sep  6 19:38:26 arrakis kernel: c01d2a69
Sep  6 19:38:26 arrakis kernel: *pde = 00000000
Sep  6 19:38:26 arrakis kernel: Oops: 0000 [#1]
Sep  6 19:38:26 arrakis kernel: PREEMPT 
Sep  6 19:38:26 arrakis kernel: Modules linked in: button battery ac 
thermal processor cpufreq_powersave speedstep_ich speedstep_lib 
freq_table snd_pcm_oss snd_mixer_oss smbfs nls_iso8859_1 nls_cp437 vfat 
fat radeon drm sg sd_mod usb_storage scsi_mod eth1394 usbhid snd_intel8x0 
snd_ac97_codec snd_ac97_bus snd_pcm snd_timer intel_rng uhci_hcd snd 
ide_cd intel_agp e100 usbcore i2c_i801 cdrom ohci1394 psmouse soundcore 
evdev mii ieee1394 snd_page_alloc unix
Sep  6 19:38:26 arrakis kernel: CPU:    0
Sep  6 19:38:26 arrakis kernel: EIP:    0060:[<c01d2a69>]    Not tainted 
VLI
Sep  6 19:38:26 arrakis kernel: EFLAGS: 00010017   (2.6.18-rc6 #115) 
Sep  6 19:38:26 arrakis kernel: EIP is at rwsem_wake+0x99/0x140
Sep  6 19:38:26 arrakis kernel: eax: 00000000   ebx: d6e483b4   ecx: 
cd522f64   edx: 00000001
Sep  6 19:38:26 arrakis kernel: esi: cd522f64   edi: b393a000   ebp: 
d6e48380   esp: d34dff58
Sep  6 19:38:26 arrakis kernel: ds: 007b   es: 007b   ss: 0068
Sep  6 19:38:26 arrakis kernel: Process firefox-bin (pid: 2400, 
ti=d34df000 task=d6e6c580 task.ti=d34df000)
Sep  6 19:38:26 arrakis kernel: Stack: cf2d94e8 d6e483b8 00000292 cd784df4 
d6e483b4 b393a000 d6e48380 c013219e 
Sep  6 19:38:26 arrakis kernel:        00000025 c0112b99 d6e483b4 cd784df4 
b393a000 00000000 00030002 00000000 
Sep  6 19:38:26 arrakis kernel:        b393a000 d6e6c580 00000004 d34dffbc 
b77d4b94 09473ed0 c0112810 bfa3ea00 
Sep  6 19:38:26 arrakis kernel: Call Trace:
Sep  6 19:38:26 arrakis kernel:  [<c013219e>] .text.lock.rwsem+0x21/0x43
Sep  6 19:38:26 arrakis kernel:  [<c0112b99>] do_page_fault+0x389/0x580
Sep  6 19:38:26 arrakis kernel:  [<c0112810>] do_page_fault+0x0/0x580
Sep  6 19:38:26 arrakis kernel:  [<c01039f1>] error_code+0x39/0x40
Sep  6 19:38:26 arrakis kernel: Code: c3 e8 1c 0b 0f 00 eb ef 8b 4b 04 89 
ce f6 41 0c 02 75 7c 31 d2 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 01 
42 3b 44 24 04 74 08 <f6> 40 0c 01 89 c1 75 ef 89 d0 89 d5 c1 e0 10 8d 54 
02 ff 01 13 
Sep  6 19:38:26 arrakis kernel: EIP: [<c01d2a69>] rwsem_wake+0x99/0x140 
SS:ESP 0068:d34dff58
Sep  6 19:38:26 arrakis kernel:  <6>note: firefox-bin[2400] exited with 
preempt_count 1
Sep  6 19:38:26 arrakis kernel: BUG: scheduling while atomic: 
firefox-bin/0x00000001/2400
Sep  6 19:38:26 arrakis kernel:  [<c02c3552>] schedule+0x662/0x670
Sep  6 19:38:26 arrakis kernel:  [<c02c4ebd>] 
rwsem_down_read_failed+0xbd/0x1a0
Sep  6 19:38:26 arrakis kernel:  [<c0237435>] vt_console_print+0x85/0x2a0
Sep  6 19:38:26 arrakis kernel:  [<c0132184>] .text.lock.rwsem+0x7/0x43
Sep  6 19:38:26 arrakis kernel:  [<c0133585>] futex_wake+0x25/0xf0
Sep  6 19:38:26 arrakis kernel:  [<c0135210>] do_futex+0x70/0x110
Sep  6 19:38:26 arrakis kernel:  [<c0119a34>] 
release_console_sem+0x104/0x110
Sep  6 19:38:26 arrakis kernel:  [<c013530c>] sys_futex+0x5c/0x130
Sep  6 19:38:26 arrakis kernel:  [<c011633b>] mm_release+0x8b/0x90
Sep  6 19:38:26 arrakis kernel:  [<c011b0e5>] exit_mm+0x25/0x110
Sep  6 19:38:26 arrakis kernel:  [<c011b999>] do_exit+0xf9/0x530
Sep  6 19:38:26 arrakis kernel:  [<c01042cc>] die+0x20c/0x210
Sep  6 19:38:26 arrakis kernel:  [<c0112ab4>] do_page_fault+0x2a4/0x580
Sep  6 19:38:26 arrakis kernel:  [<c0112810>] do_page_fault+0x0/0x580
Sep  6 19:38:26 arrakis kernel:  [<c01039f1>] error_code+0x39/0x40
Sep  6 19:38:26 arrakis kernel:  [<c01d2a69>] rwsem_wake+0x99/0x140
Sep  6 19:38:26 arrakis kernel:  [<c013219e>] .text.lock.rwsem+0x21/0x43
Sep  6 19:38:26 arrakis kernel:  [<c0112b99>] do_page_fault+0x389/0x580
Sep  6 19:38:26 arrakis kernel:  [<c0112810>] do_page_fault+0x0/0x580
Sep  6 19:38:26 arrakis kernel:  [<c01039f1>] error_code+0x39/0x40
Sep  6 19:38:27 arrakis kernel: BUG: warning at 
kernel/fork.c:113/__put_task_struct()
Sep  6 19:38:27 arrakis kernel:  [<c0115f93>] __put_task_struct+0xf3/0x100
Sep  6 19:38:27 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
Sep  6 19:38:27 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:27 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
Sep  6 19:38:27 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:27 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
Sep  6 19:38:27 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:27 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
Sep  6 19:38:27 arrakis kernel: BUG: unable to handle kernel paging 
request at virtual address 61672d78
Sep  6 19:38:27 arrakis kernel:  printing eip:
Sep  6 19:38:27 arrakis kernel: c01cf5ff
Sep  6 19:38:27 arrakis kernel: *pde = 00000000
Sep  6 19:38:27 arrakis kernel: Oops: 0002 [#2]
Sep  6 19:38:27 arrakis kernel: PREEMPT 
Sep  6 19:38:27 arrakis kernel: Modules linked in: button battery ac 
thermal processor cpufreq_powersave speedstep_ich speedstep_lib 
freq_table snd_pcm_oss snd_mixer_oss smbfs nls_iso8859_1 nls_cp437 vfat 
fat radeon drm sg sd_mod usb_storage scsi_mod eth1394 usbhid snd_intel8x0 
snd_ac97_codec snd_ac97_bus snd_pcm snd_timer intel_rng uhci_hcd snd 
ide_cd intel_agp e100 usbcore i2c_i801 cdrom ohci1394 psmouse soundcore 
evdev mii ieee1394 snd_page_alloc unix
Sep  6 19:38:27 arrakis kernel: CPU:    0
Sep  6 19:38:27 arrakis kernel: EIP:    0060:[<c01cf5ff>]    Not tainted 
VLI
Sep  6 19:38:27 arrakis kernel: EFLAGS: 00010013   (2.6.18-rc6 #115) 
Sep  6 19:38:27 arrakis kernel: EIP is at _atomic_dec_and_lock+0xf/0x50
Sep  6 19:38:27 arrakis kernel: eax: cd772000   ebx: 61672d78   ecx: 
00000000   edx: 00000001
Sep  6 19:38:27 arrakis kernel: esi: 61672d78   edi: cd1f84e0   ebp: 
cd772f48   esp: cd772ef8
Sep  6 19:38:27 arrakis kernel: ds: 007b   es: 007b   ss: 0068
Sep  6 19:38:27 arrakis kernel: Process bug (pid: 10459, ti=cd772000 
task=cd75a030 task.ti=cd772000)
Sep  6 19:38:27 arrakis kernel: Stack: 00000206 c0123f67 61672d78 c03af980 
cd521ab0 00000000 c0115ed8 61672d78 
Sep  6 19:38:27 arrakis kernel:        c02ddbd5 00000071 c02c8401 cd521ab0 
c019666a cd521ab0 cd772f48 00000001 
Sep  6 19:38:27 arrakis kernel:        00000101 00000000 00000002 00000004 
35310030 cd1f8400 cd772f98 c0174750 
Sep  6 19:38:27 arrakis kernel: Call Trace:
Sep  6 19:38:27 arrakis kernel:  [<c0123f67>] free_uid+0x27/0xa0
Sep  6 19:38:27 arrakis kernel:  [<c0115ed8>] __put_task_struct+0x38/0x100
Sep  6 19:38:27 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
Sep  6 19:38:27 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:27 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
Sep  6 19:38:27 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:27 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
Sep  6 19:38:27 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
Sep  6 19:38:27 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
Sep  6 19:38:27 arrakis kernel: Code: 01 c1 e2 0a 89 06 8b 5c 24 0c 89 d0 
89 ca 8b 74 24 10 83 c4 14 c3 90 90 90 90 90 90 53 b8 01 00 00 00 8b 5c 
24 08 e8 21 52 f4 ff <ff> 0b 0f 94 c0 84 c0 ba 01 00 00 00 74 04 5b 89 d0 
c3 b8 01 00 
Sep  6 19:38:27 arrakis kernel: EIP: [<c01cf5ff>] 
_atomic_dec_and_lock+0xf/0x50 SS:ESP 0068:cd772ef8
Sep  6 19:38:27 arrakis kernel:  <6>note: bug[10459] exited with 
preempt_count 1

Sometimes the machine just hung, with nothing in the logs. The machine is 
a Sony laptop (i686).

I have been testing the patch on another machine (x86_64) and had no 
problem at all, so the reproduceability of the bug might depend on the 
arch or some config option. I'll help nailing down this issue if I can, 
just tell me what to do.

Thanks,
-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-06 21:12             ` Jean Delvare
@ 2006-09-06 22:25               ` Oleg Nesterov
  2006-09-06 22:38                 ` [PATCH] proc-readdir-race-fix-take-3-fix-3 Oleg Nesterov
  2006-09-06 22:43               ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
  1 sibling, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2006-09-06 22:25 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Eric W. Biederman, Andrew Morton, KAMEZAWA Hiroyuki, linux-kernel,
	ak

On 09/06, Jean Delvare wrote:
>
> On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
> > Eric, Kame, thanks a lot for working on this. I'll be giving some good
> > testing to this patch today, and will return back to you when I'm done.
> 
> The original issue is indeed fixed, but there's a problem with the patch. 
> When stressing /proc (to verify the bug was fixed), my test machine ended 
> up crashing. Here are the 2 traces I found in the logs:
> 
> Sep  6 12:06:00 arrakis kernel: BUG: warning at 
> kernel/fork.c:113/__put_task_struct()
> Sep  6 12:06:00 arrakis kernel:  [<c0115f93>] __put_task_struct+0xf3/0x100
> Sep  6 12:06:00 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
> Sep  6 12:06:00 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
> Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
> Sep  6 12:06:00 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
> Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
> Sep  6 12:06:00 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb

I think there is a bug in next_tgid(),

> -static struct task_struct *next_tgid(struct task_struct *start)
> -{
> -	struct task_struct *pos;
> +	task = NULL;
>  	rcu_read_lock();
> -	pos = start;
> -	if (pid_alive(start))
> -		pos = next_task(start);
> -	if (pid_alive(pos) &amp;&amp; (pos != &amp;init_task)) {
> -		get_task_struct(pos);
> -		goto done;
> +retry:
> +	pid = find_ge_pid(tgid);
> +	if (pid) {
> +		tgid = pid->nr + 1;
> +		task = pid_task(pid, PIDTYPE_PID);
> +		/* What we to know is if the pid we have find is the
> +		 * pid of a thread_group_leader.  Testing for task
> +		 * being a thread_group_leader is the obvious thing
> +		 * todo but there is a window when it fails, due to
> +		 * the pid transfer logic in de_thread.
> +		 *
> +		 * So we perform the straight forward test of seeing
> +		 * if the pid we have found is the pid of a thread
> +		 * group leader, and don't worry if the task we have
> +		 * found doesn't happen to be a thread group leader.
> +		 * As we don't care in the case of readdir.
> +		 */
> +		if (!task || !has_group_leader_pid(task))
> +			goto retry;
> +		get_task_struct(task);
>  	}
> -	pos = NULL;
> -done:
>  	rcu_read_unlock();
> -	put_task_struct(start);
> -	return pos;
> +	return task;
>  }

If the task found is not a group leader, we go to retry, but
the task != NULL.

Now, if find_ge_pid(tgid) returns NULL, we return that wrong
task, and it was not get_task_struct()'ed.

Oleg.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] proc-readdir-race-fix-take-3-fix-3
  2006-09-06 22:25               ` Oleg Nesterov
@ 2006-09-06 22:38                 ` Oleg Nesterov
  2006-09-06 22:59                   ` Eric W. Biederman
  2006-09-08  6:38                   ` Eric W. Biederman
  0 siblings, 2 replies; 25+ messages in thread
From: Oleg Nesterov @ 2006-09-06 22:38 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Eric W. Biederman, Andrew Morton, KAMEZAWA Hiroyuki, linux-kernel,
	ak

On 09/07, Oleg Nesterov wrote:
>
> On 09/06, Jean Delvare wrote:
> >
> > On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
> > > Eric, Kame, thanks a lot for working on this. I'll be giving some good
> > > testing to this patch today, and will return back to you when I'm done.
> > 
> > The original issue is indeed fixed, but there's a problem with the patch. 
> > When stressing /proc (to verify the bug was fixed), my test machine ended 
> > up crashing. Here are the 2 traces I found in the logs:
> > 
> > Sep  6 12:06:00 arrakis kernel: BUG: warning at 
> > kernel/fork.c:113/__put_task_struct()
> > Sep  6 12:06:00 arrakis kernel:  [<c0115f93>] __put_task_struct+0xf3/0x100
> > Sep  6 12:06:00 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
> > Sep  6 12:06:00 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
> > Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
> > Sep  6 12:06:00 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
> > Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
> > Sep  6 12:06:00 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
> 
> If the task found is not a group leader, we go to retry, but
> the task != NULL.
> 
> Now, if find_ge_pid(tgid) returns NULL, we return that wrong
> task, and it was not get_task_struct()'ed.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- t/fs/proc/base.c~	2006-09-07 02:33:26.000000000 +0400
+++ t/fs/proc/base.c	2006-09-07 02:34:19.000000000 +0400
@@ -2149,9 +2149,9 @@ static struct task_struct *next_tgid(uns
 	struct task_struct *task;
 	struct pid *pid;
 
-	task = NULL;
 	rcu_read_lock();
 retry:
+	task = NULL;
 	pid = find_ge_pid(tgid);
 	if (pid) {
 		tgid = pid->nr + 1;


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc-readdir-race-fix-take-3-fix-3
  2006-09-06 22:38                 ` [PATCH] proc-readdir-race-fix-take-3-fix-3 Oleg Nesterov
@ 2006-09-06 22:59                   ` Eric W. Biederman
  2006-09-08  6:38                   ` Eric W. Biederman
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-06 22:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jean Delvare, Andrew Morton, KAMEZAWA Hiroyuki, linux-kernel, ak

Oleg Nesterov <oleg@tv-sign.ru> writes:

> On 09/07, Oleg Nesterov wrote:
>>
>> On 09/06, Jean Delvare wrote:
>> >
>> > On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
>> > > Eric, Kame, thanks a lot for working on this. I'll be giving some good
>> > > testing to this patch today, and will return back to you when I'm done.
>> > 
>> > The original issue is indeed fixed, but there's a problem with the patch. 
>> > When stressing /proc (to verify the bug was fixed), my test machine ended 
>> > up crashing. Here are the 2 traces I found in the logs:
>> > 
>> > Sep  6 12:06:00 arrakis kernel: BUG: warning at 
>> > kernel/fork.c:113/__put_task_struct()
>> > Sep  6 12:06:00 arrakis kernel:  [<c0115f93>] __put_task_struct+0xf3/0x100
>> > Sep  6 12:06:00 arrakis kernel:  [<c019666a>] proc_pid_readdir+0x13a/0x150
>> > Sep  6 12:06:00 arrakis kernel:  [<c01745f0>] vfs_readdir+0x80/0xa0
>> > Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
>> > Sep  6 12:06:00 arrakis kernel:  [<c017488c>] sys_getdents+0x6c/0xb0
>> > Sep  6 12:06:00 arrakis kernel:  [<c0174750>] filldir+0x0/0xd0
>> > Sep  6 12:06:00 arrakis kernel:  [<c0102fb7>] syscall_call+0x7/0xb
>> 
>> If the task found is not a group leader, we go to retry, but
>> the task != NULL.
>> 
>> Now, if find_ge_pid(tgid) returns NULL, we return that wrong
>> task, and it was not get_task_struct()'ed.

Yep.  That would do it.  And of course having written the
code it was very hard for me to step back far enough to see that.

The other two failure modes still don't make sense to me but
they may have been a side effect of this.

And it would have taken a thread being the highest pid in the
system that exits while we are calling filldir to trigger this.
Wow.  That was a good stress test.


Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
>
> --- t/fs/proc/base.c~	2006-09-07 02:33:26.000000000 +0400
> +++ t/fs/proc/base.c	2006-09-07 02:34:19.000000000 +0400
> @@ -2149,9 +2149,9 @@ static struct task_struct *next_tgid(uns
>  	struct task_struct *task;
>  	struct pid *pid;
>  
> -	task = NULL;
>  	rcu_read_lock();
>  retry:
> +	task = NULL;
>  	pid = find_ge_pid(tgid);
>  	if (pid) {
>  		tgid = pid->nr + 1;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc-readdir-race-fix-take-3-fix-3
  2006-09-06 22:38                 ` [PATCH] proc-readdir-race-fix-take-3-fix-3 Oleg Nesterov
  2006-09-06 22:59                   ` Eric W. Biederman
@ 2006-09-08  6:38                   ` Eric W. Biederman
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-08  6:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jean Delvare, Andrew Morton, KAMEZAWA Hiroyuki, linux-kernel, ak

Ok.  While we are on the topic of races in readdir in /proc.
I just wanted to note that I believe thread groups have the same
potential problem.

I don't think it is severe enough to cause any real world problems.

But it makes sense to note it.

Something to think about.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-06 21:12             ` Jean Delvare
  2006-09-06 22:25               ` Oleg Nesterov
@ 2006-09-06 22:43               ` Eric W. Biederman
  2006-09-07  8:31                 ` Jean Delvare
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-06 22:43 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Andrew Morton, Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak

Jean Delvare <jdelvare@suse.de> writes:

> On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
>> Eric, Kame, thanks a lot for working on this. I'll be giving some good
>> testing to this patch today, and will return back to you when I'm done.
>
> The original issue is indeed fixed, but there's a problem with the patch. 
> When stressing /proc (to verify the bug was fixed), my test machine ended 
> up crashing. Here are the 2 traces I found in the logs:

Ugh.  

So the death in __put_task_struct() is from:
WARN_ON(!(tsk->exit_state & (EXIT_DEAD | EXIT_ZOMBIE)));
So it appears we have something that is decrementing but not
incrementing the count on the task struct.

Now what is interesting is that there are a couple of other failure modes
present here.
free_uid called from __put_task_struct is failing

And you seem to have a recursive page fault going on somewhere.

I suspect the triggering of this bug is the result of an earlier oops,
that left some process half cleaned up.

Have you tested 2.6.18-rc6 without my patch?
If not can you please test the same 2.6.18-rc6 configuration with my patch?

> Sometimes the machine just hung, with nothing in the logs. The machine is 
> a Sony laptop (i686).
>
> I have been testing the patch on another machine (x86_64) and had no 
> problem at all, so the reproduceability of the bug might depend on the 
> arch or some config option. I'll help nailing down this issue if I can, 
> just tell me what to do.

So I don't know what is going on with your laptop.  It feels nasty.

I think my patch is just tripping on the problem, rather than causing
it.  The previous version of fs/proc/base.c should have tripped over
this problem as well if it happened to have hit the same process.

I'm staring at the patch and I can not think of anything that would
explain your problem.  The reference counting is simple and the only
bug I had in a posted version was a failure to decrement the count
on the task_struct.

I guess the practical question is what was your test methodology to
reproduce this problem?  A couple of more people running the same
test on a few more machines might at least give us confidence in what
is going on.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-06 22:43               ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
@ 2006-09-07  8:31                 ` Jean Delvare
  2006-09-07 13:57                   ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Jean Delvare @ 2006-09-07  8:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak

[-- Attachment #1: Type: text/plain, Size: 1041 bytes --]

On Thursday 7 September 2006 00:43, Eric W. Biederman wrote:
> Have you tested 2.6.18-rc6 without my patch?

Yes I did, it didn't crash after a couple hours. Of course it doesn't 
prove anything as the crash appears to be the result of a race.

I'll now apply Oleg's fix and see if things get better.

> I guess the practical question is what was your test methodology to
> reproduce this problem?  A couple of more people running the same
> test on a few more machines might at least give us confidence in what
> is going on.

"My" test program forks 1000 children who sleep for 1 second then look for 
themselves in /proc, warn if they can't find themselves, and exit. So 
basically the idea is that the process list will shrink very rapidly at 
the same moment every child does readdir(/proc).

I attached the test program, I take no credit (nor shame) for it, it was 
provided to me by IBM (possibly on behalf of one of their own customers) 
as a way to demonstrate and reproduce the original readdir(/proc) race 
bug.

-- 
Jean Delvare

[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 1151 bytes --]

#include <stdlib.h>
#include <stdio.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/param.h>
#include <utmp.h>
#include <pwd.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <syslog.h>
#include <errno.h>
#include <stdarg.h>
#include <ctype.h>

#define NUM_CHILDREN 1000

findme(i)
int i;
{
	DIR * dir = NULL;
	struct dirent *d;
	int pid;
	int mypid;

	mypid = getpid();


	if ((dir = opendir("/proc")) == (DIR *)0)
	{
		perror("failed to open /proc\n");
		exit(1);
	}

	while((d = readdir(dir)) != (struct dirent *)0) {
        	if ((pid = (pid_t)atoi(d->d_name)) == 0) continue;
		if (pid==mypid) return(1);
	}
	printf("\nfailed to find myself: pid %d, iteration %d\n",mypid,i);
	return(0);
}

fork_child(i)
int i;
{
    int pid;

    switch ((pid = fork())) {
    case 0:   /* child */
	sleep(1);
	findme(i);
	exit(0);
	;;
    case -1:  /* error */
	perror("failed to fork\n");
	exit(1);
	;;
    default:  /* parent */
	;;
   }
	
}


main()
{
	int i;
        (void)signal(SIGCHLD, SIG_IGN);


	for (i=0; i<NUM_CHILDREN; i++)
	{
		fork_child(i);
	}

}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-07  8:31                 ` Jean Delvare
@ 2006-09-07 13:57                   ` Eric W. Biederman
  2006-09-07 18:07                     ` Jean Delvare
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-07 13:57 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Andrew Morton, Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak

Jean Delvare <jdelvare@suse.de> writes:

> On Thursday 7 September 2006 00:43, Eric W. Biederman wrote:
>> Have you tested 2.6.18-rc6 without my patch?
>
> Yes I did, it didn't crash after a couple hours. Of course it doesn't 
> prove anything as the crash appears to be the result of a race.
>
> I'll now apply Oleg's fix and see if things get better.
>
>> I guess the practical question is what was your test methodology to
>> reproduce this problem?  A couple of more people running the same
>> test on a few more machines might at least give us confidence in what
>> is going on.
>
> "My" test program forks 1000 children who sleep for 1 second then look for 
> themselves in /proc, warn if they can't find themselves, and exit. So 
> basically the idea is that the process list will shrink very rapidly at 
> the same moment every child does readdir(/proc).
>
> I attached the test program, I take no credit (nor shame) for it, it was 
> provided to me by IBM (possibly on behalf of one of their own customers) 
> as a way to demonstrate and reproduce the original readdir(/proc) race 
> bug.

Ok.  So whatever is creating lots of child threads that tripped you
up is probably peculiar to the environment on your laptop.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-07 13:57                   ` Eric W. Biederman
@ 2006-09-07 18:07                     ` Jean Delvare
  2006-09-07 18:40                       ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Jean Delvare @ 2006-09-07 18:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak

On Thursday 7 September 2006 15:57, Eric W. Biederman wrote:
> Jean Delvare <jdelvare@suse.de> writes:
> > I'll now apply Oleg's fix and see if things get better.

After 8 hours of stress testing on two machines, no crash and no freeze. 
So Oleg's fix seems to do the trick. Thanks Oleg :)

I'll keep the patches applied on both machines, even without stress tests 
it is still better to make sure nothing bad happens in the long run.

> > "My" test program forks 1000 children who sleep for 1 second then
> > look for themselves in /proc, warn if they can't find themselves, and
> > exit. So basically the idea is that the process list will shrink very
> > rapidly at the same moment every child does readdir(/proc).
> >
> > I attached the test program, I take no credit (nor shame) for it, it
> > was provided to me by IBM (possibly on behalf of one of their own
> > customers) as a way to demonstrate and reproduce the original
> > readdir(/proc) race bug.
>
> Ok.  So whatever is creating lots of child threads that tripped you
> up is probably peculiar to the environment on your laptop.

There's nothing really special running on this laptop. Slackware 10.2 with 
xterm, firefox, sylpheed, xchat, and that's about it. At least one of the 
crashes I had yesterday happened when I was actively using firefox, I 
can't tell for the other one.

The difference with the system where no problem was observed may be that 
the laptop has a preemptive kernel, and the desktop hasn't.

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix (take 3)
  2006-09-07 18:07                     ` Jean Delvare
@ 2006-09-07 18:40                       ` Eric W. Biederman
  0 siblings, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2006-09-07 18:40 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Andrew Morton, Oleg Nesterov, KAMEZAWA Hiroyuki, linux-kernel, ak

Jean Delvare <jdelvare@suse.de> writes:

> On Thursday 7 September 2006 15:57, Eric W. Biederman wrote:
>> Jean Delvare <jdelvare@suse.de> writes:
>> > I'll now apply Oleg's fix and see if things get better.
>
> After 8 hours of stress testing on two machines, no crash and no freeze. 
> So Oleg's fix seems to do the trick. Thanks Oleg :)
>
> I'll keep the patches applied on both machines, even without stress tests 
> it is still better to make sure nothing bad happens in the long run.
>
>> > "My" test program forks 1000 children who sleep for 1 second then
>> > look for themselves in /proc, warn if they can't find themselves, and
>> > exit. So basically the idea is that the process list will shrink very
>> > rapidly at the same moment every child does readdir(/proc).
>> >
>> > I attached the test program, I take no credit (nor shame) for it, it
>> > was provided to me by IBM (possibly on behalf of one of their own
>> > customers) as a way to demonstrate and reproduce the original
>> > readdir(/proc) race bug.
>>
>> Ok.  So whatever is creating lots of child threads that tripped you
>> up is probably peculiar to the environment on your laptop.
>
> There's nothing really special running on this laptop. Slackware 10.2 with 
> xterm, firefox, sylpheed, xchat, and that's about it. At least one of the 
> crashes I had yesterday happened when I was actively using firefox, I 
> can't tell for the other one.

Well firefox is threaded so that may be enough.  It takes a threaded
program to be able to trigger it.

> The difference with the system where no problem was observed may be that 
> the laptop has a preemptive kernel, and the desktop hasn't.

I suspect that just has the potential to make the window bigger.

Eric


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix.
  2006-09-04 23:13 ` [PATCH] proc: readdir race fix Eric W. Biederman
  2006-09-05  1:30   ` KAMEZAWA Hiroyuki
  2006-09-05  2:26   ` KAMEZAWA Hiroyuki
@ 2006-09-05  5:26   ` KAMEZAWA Hiroyuki
  2006-09-05  5:41     ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-09-05  5:26 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, akpm, ak, oleg, jdelvare

On Mon, 04 Sep 2006 17:13:10 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

worked well with my (short) test set. and looks good.
Thank you.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] proc: readdir race fix.
  2006-09-05  5:26   ` [PATCH] proc: readdir race fix KAMEZAWA Hiroyuki
@ 2006-09-05  5:41     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 25+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-09-05  5:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: ebiederm, linux-kernel, akpm, ak, oleg, jdelvare

On Tue, 5 Sep 2006 14:26:38 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Mon, 04 Sep 2006 17:13:10 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
> 
> > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> 
> worked well with my (short) test set. and looks good.
> Thank you.
> 
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
Sorry (>_<. I should ack to another e-mail. ignore this.
Hmm, please change subject of e-mail if patch is changed.

-Kame


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2006-09-08  6:39 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-25  9:29 [RFC][PATCH] ps command race fix take 4 [4/4] proc root open/release/llseek KAMEZAWA Hiroyuki
2006-09-04 23:13 ` [PATCH] proc: readdir race fix Eric W. Biederman
2006-09-05  1:30   ` KAMEZAWA Hiroyuki
2006-09-05  2:30     ` Eric W. Biederman
2006-09-05  2:41     ` KAMEZAWA Hiroyuki
2006-09-05  2:26   ` KAMEZAWA Hiroyuki
2006-09-05  2:54     ` Eric W. Biederman
2006-09-05  3:07     ` Eric W. Biederman
2006-09-05  5:39       ` KAMEZAWA Hiroyuki
2006-09-05 10:10       ` Oleg Nesterov
2006-09-05 11:36         ` Eric W. Biederman
2006-09-05 14:52         ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
2006-09-06  9:01           ` Jean Delvare
2006-09-06 21:12             ` Jean Delvare
2006-09-06 22:25               ` Oleg Nesterov
2006-09-06 22:38                 ` [PATCH] proc-readdir-race-fix-take-3-fix-3 Oleg Nesterov
2006-09-06 22:59                   ` Eric W. Biederman
2006-09-08  6:38                   ` Eric W. Biederman
2006-09-06 22:43               ` [PATCH] proc: readdir race fix (take 3) Eric W. Biederman
2006-09-07  8:31                 ` Jean Delvare
2006-09-07 13:57                   ` Eric W. Biederman
2006-09-07 18:07                     ` Jean Delvare
2006-09-07 18:40                       ` Eric W. Biederman
2006-09-05  5:26   ` [PATCH] proc: readdir race fix KAMEZAWA Hiroyuki
2006-09-05  5:41     ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox