[patch] procfs/procps threading performance speedup, 2.5.62

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [patch] procfs/procps threading performance speedup, 2.5.62
@ 2003-02-20 16:33 Ingo Molnar
  2003-02-20 16:47 ` Miquel van Smoorenburg
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Ingo Molnar @ 2003-02-20 16:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Alex Larsson, procps-list

Lots of people have requested that threads should show up in /proc again,
to be able to look at per-thread CPU usage, activity, and generally, to
ease the debugging of threaded apps.

the main problem with threads in /proc is that there's a big slowdown when
using lots of threads. Here are some runtime numbers in seconds, under
2.5.62, on a 525MHz PIII box:

   # of threads (*):  1000    2000    4000    8000    16000
   --------------------------------------------------------
   ps                 0.77    1.52    3.13    6.44    13.81
   ps -axm            0.93    1.85    3.78    7.73    18.37
   top -d 0 -n 0      0.75    1.53    3.23    7.60    22.12

[ (*): the system is completely idle, all threads are sleeping. Overhead
is combined system and userspace overhead measured via 'time'.]

ie. the overhead is really massive, eg. with 16K threads running, procps
is basically unusable for any administration or debugging work. And there
are users that want 50K or more threads. So clearly, this state of procfs
and procps is unacceptable - who would use 'top' to check a system's state
if a single screen-refresh takes 22 seconds?!

in the above timings, only 'ps -axm' is actually displaying every thread,
all other commands produce only a few lines of output. The reason of the
overhead is two-fold:

1) there's significant kernel overhead in reading large /proc directories,
   the overhead of many readdir()'s is O(N^2). The main overhead is in
   get_pid_list(), which has to loop over an increasing number of threads
   to find the next intended batch of PIDs.

to fix this overhead i've introduced a 'lookup cursor' cookie, which is
cached in filp->private_data, across readdir() [getdents64()] calls. If
the cursor matches then we skip all the overhead of skipping threads. If
the cursor is not available then we fall back to the old-style skipping
algorithm.

2) procps is forced to parse every thread in /proc to build up accurate
   'process CPU usage' counters. The parsing and accessing of every
   /proc/PID/stat file is necessary because CPU statistics are scattered
   across all threads.

the fix for this is two-fold. First, it must be possible for procps to
separate 'threads' from 'processes' without having to go into 16 thousand
directories. I solved this by prefixing 'threads' (ie. non-group-leader
threads) with a dot ('.') character in the /proc listing:

 $ ls -a /proc
 .      16994   .17078  412  7          execdomains  locks       stat
 ..     16995   .17079  460  8          filesystems  meminfo     swaps
 1      17031   .17080  469  9          fs           misc        sys
 16864  17033   .17081  5    92         ide          mounts      sysvipc
 16866  17034   .17082  515  buddyinfo  interrupts   mtrr        tty
 16867  17072   17113   516  bus        iomem        net         uptime
 16946  .17073  2       517  cmdline    ioports      partitions  version
 16948  .17074  3       518  cpuinfo    irq          profile     vmstat
 16949  .17075  390     519  devices    kcore        scsi
 16989  .17076  4       520  dma        kmsg         self
 16992  .17077  400     6    driver     loadavg      slabinfo

the .17073 ... .17082 entries belong to the thread-group 17072.

The key here is for procps to be able to parse threads without having to
call into the kernel 16K times. The dot-approach also has the added
benefit of 'hiding' threads in the default 'ls /proc' listing.

the other change needed was the ability to read comulative CPU usage
statistics from the thread group leader. I've introduced 4 new fields in
/proc/PID/stat for that purpose, the kernel keeps those uptodate across
fork/exit and in the timer interrupt - it's very low-overhead.

the attached patch, against 2.5.62-BK, implements these kernel features.

Alex Larsson has modified procps for these new kernel capabilities, the
new procps package (or the patch against upstream procps) can be
downloaded from:

	http://people.redhat.com/alexl/procps/

here are the performance measurements (with stock procps+procfs numbers in
paranthesis)

   # of threads:       1000     2000     4000      8000      16000
   ----------------------------------------------------------------
   ps                  0.02     0.03     0.03      0.03       0.04
                      (0.77)   (1.52)   (3.13)    (6.44)    (13.81)

   ps -axm             0.89     1.72     3.40      6.87      15.57
                      (0.93)   (1.85)   (3.78)    (7.73)    (18.37)

   top -d 0 -n 0       0.11     0.12     0.12      0.13       0.16
                      (0.75)   (1.53)   (3.23)    (7.60)    (22.12)

eg. with 16K threads running in the background, a single 'top'
screen-refresh got more than 130 times faster. A simple 'ps' got more than
340 times faster ... Even the 'ps -axm' (which displays all threads)  
command got faster, due to the pid-cursor. But even with just 1000 threads
running a simple 'ps' is 30 times faster, and top refresh is 6 times
faster.

another advantage of this approach is that old procps is fully compatible
with the new kernel, and new procps is fully compatible with old kernels.  
Plus everything is still encoded in the ASCII namespace, no binary
interfaces are used.

the patch works just fine on my boxes. (the patch is also included in the
threading backport, in the rawhide 2.4 kernel, and has been in use for a
couple of weeks already.)

	Ingo

--- linux/include/linux/sched.h.orig	
+++ linux/include/linux/sched.h	
@@ -359,6 +359,7 @@ struct task_struct {
 	unsigned long it_real_incr, it_prof_incr, it_virt_incr;
 	struct timer_list real_timer;
 	unsigned long utime, stime, cutime, cstime;
+	unsigned long group_utime, group_stime, group_cutime, group_cstime;
 	u64 start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
 	unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
@@ -640,6 +641,9 @@ extern void kick_if_running(task_t * p);
  * Careful: do_each_thread/while_each_thread is a double loop so
  *          'break' will not work as expected - use goto instead.
  */
+#define __do_each_thread(g, t) \
+	for ( ; (g = t = next_task(g)) != &init_task ; ) do
+
 #define do_each_thread(g, t) \
 	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do

--- linux/fs/proc/array.c.orig	
+++ linux/fs/proc/array.c	
@@ -336,7 +336,7 @@ int proc_pid_stat(struct task_struct *ta
 	read_unlock(&tasklist_lock);
 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %lu %lu %lu %lu %lu %lu\n",
 		task->pid,
 		task->comm,
 		state,
@@ -381,8 +381,12 @@ int proc_pid_stat(struct task_struct *ta
 		task->exit_signal,
 		task_cpu(task),
 		task->rt_priority,
-		task->policy);
-	if(mm)
+		task->policy,
+		task->group_utime,
+		task->group_stime,
+		task->group_cutime,
+		task->group_cstime);
+	if (mm)
 		mmput(mm);
 	return res;
 }
--- linux/fs/proc/base.c.orig	
+++ linux/fs/proc/base.c	
@@ -876,6 +876,12 @@ static unsigned name_to_int(struct dentr

 	if (len > 1 && *name == '0')
 		goto out;
+	/*
+	 * Deal with dot-aliases for threads.
+	 */
+	if (name[0] == '.')
+		name++, len--;
+
 	while (len-- > 0) {
 		unsigned c = *name++ - '0';
 		if (c > 9)
@@ -1155,31 +1161,50 @@ out:
  * tasklist lock while doing this, and we must release it before
  * we actually do the filldir itself, so we use a temp buffer..
  */
-static int get_pid_list(int index, unsigned int *pids)
+static int get_pid_list(int index, int *pids, struct file *filp)
 {
-	struct task_struct *p;
-	int nr_pids = 0;
+	int nr_pids = 0, pid = 0, pid_cursor = (int)filp->private_data;
+	struct task_struct *g = NULL, *p = NULL;

-	index--;
 	read_lock(&tasklist_lock);
-	for_each_process(p) {
-		int pid = p->pid;
-		if (!pid)
-			continue;
+	if (pid_cursor) {
+		p = find_task_by_pid(pid_cursor);
+		g = p->group_leader;
+	}
+	if (!p) {
+		g = p = &init_task;
+		index--;
+	} else
+		index = 0;
+
+	goto inside;
+
+	__do_each_thread(g, p) {
 		if (--index >= 0)
 			continue;
-		pids[nr_pids] = pid;
+		pid = p->pid;
+		if (!pid)
+			BUG();
+		if (p->tgid != p->pid)
+			pids[nr_pids] = -pid;
+		else
+			pids[nr_pids] = pid;
 		nr_pids++;
 		if (nr_pids >= PROC_MAXPIDS)
-			break;
-	}
+			goto out;
+inside:
+		;
+	} while_each_thread(g, p);
+out:
+	filp->private_data = (void *)pid;
 	read_unlock(&tasklist_lock);
+
 	return nr_pids;
 }

 int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
-	unsigned int pid_array[PROC_MAXPIDS];
+	int pid_array[PROC_MAXPIDS];
 	char buf[PROC_NUMBUF];
 	unsigned int nr = filp->f_pos - FIRST_PROCESS_ENTRY;
 	unsigned int nr_pids, i;
@@ -1192,14 +1217,16 @@ int proc_pid_readdir(struct file * filp,
 		nr++;
 	}

-	nr_pids = get_pid_list(nr, pid_array);
+	nr_pids = get_pid_list(nr, pid_array, filp);

 	for (i = 0; i < nr_pids; i++) {
-		int pid = pid_array[i];
+		int pid = abs(pid_array[i]);
 		ino_t ino = fake_ino(pid,PROC_PID_INO);
 		unsigned long j = PROC_NUMBUF;

 		do buf[--j] = '0' + (pid % 10); while (pid/=10);
+		if (pid_array[i] < 0)
+			buf[--j] = '.';

 		if (filldir(dirent, buf+j, PROC_NUMBUF-j, filp->f_pos, ino, DT_DIR) < 0)
 			break;
--- linux/kernel/timer.c.orig	
+++ linux/kernel/timer.c	
@@ -657,7 +657,10 @@ static inline void do_process_times(stru
 	unsigned long psecs;

 	psecs = (p->utime += user);
+	p->group_leader->group_utime += user;
 	psecs += (p->stime += system);
+	p->group_leader->group_stime += system;
+
 	if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
 		/* Send SIGXCPU every second.. */
 		if (!(psecs % HZ))
--- linux/kernel/exit.c.orig	
+++ linux/kernel/exit.c	
@@ -92,6 +92,8 @@ void release_task(struct task_struct * p

 	p->parent->cutime += p->utime + p->cutime;
 	p->parent->cstime += p->stime + p->cstime;
+	p->parent->group_leader->group_cutime += p->utime + p->cutime;
+	p->parent->group_leader->group_cstime += p->stime + p->cstime;
 	p->parent->cmin_flt += p->min_flt + p->cmin_flt;
 	p->parent->cmaj_flt += p->maj_flt + p->cmaj_flt;
 	p->parent->cnswap += p->nswap + p->cnswap;
--- linux/kernel/fork.c.orig	
+++ linux/kernel/fork.c	
@@ -830,6 +830,9 @@ static struct task_struct *copy_process(
 	p->tty_old_pgrp = 0;
 	p->utime = p->stime = 0;
 	p->cutime = p->cstime = 0;
+	p->group_utime = p->group_stime = 0;
+	p->group_cutime = p->group_cstime = 0;
+
 	p->array = NULL;
 	p->lock_depth = -1;		/* -1 = no lock */
 	p->start_time = get_jiffies_64();

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 16:33 [patch] procfs/procps threading performance speedup, 2.5.62 Ingo Molnar
@ 2003-02-20 16:47 ` Miquel van Smoorenburg
  2003-02-20 16:55   ` Daniel Jacobowitz
  2003-02-20 17:00 ` Nikita Danilov
  2003-02-20 17:06 ` Linus Torvalds
  2 siblings, 1 reply; 28+ messages in thread
From: Miquel van Smoorenburg @ 2003-02-20 16:47 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.44.0302201656030.30000-100000@localhost.localdomain>,
Ingo Molnar  <mingo@elte.hu> wrote:
>the fix for this is two-fold. First, it must be possible for procps to
>separate 'threads' from 'processes' without having to go into 16 thousand
>directories. I solved this by prefixing 'threads' (ie. non-group-leader
>threads) with a dot ('.') character in the /proc listing:

Why not put threads belonging to a thread group into /proc/17072/threads ?

> $ ls -a /proc

I'm seeing 17072 as a group-leader and the 'threads' as .17073 etc.
When you put all 'threads' into /proc/17072/threads, you'd get
/proc/17072/threads/17072, /proc/17072/threads/17073, etc.
/proc/17072 would show stats for the whole threads group, while
/proc/17072/threads/17072 would show stats for just that thread.

>the .17073 ... .17082 entries belong to the thread-group 17072.

Yuck ;(

>The key here is for procps to be able to parse threads without having to
>call into the kernel 16K times. The dot-approach also has the added
>benefit of 'hiding' threads in the default 'ls /proc' listing.

What is against /proc/<pid>/threads ?

>the other change needed was the ability to read comulative CPU usage
>statistics from the thread group leader. I've introduced 4 new fields in
>/proc/PID/stat for that purpose, the kernel keeps those uptodate across
>fork/exit and in the timer interrupt - it's very low-overhead.

That would also be solved with /proc/<pid> and /proc/<pid>/threads/<thread>

>another advantage of this approach is that old procps is fully compatible
>with the new kernel, and new procps is fully compatible with old kernels.  

That would also be the case with /proc/<pid>/threads

Mike.
-- 
Anyone who is capable of getting themselves made President should
on no account be allowed to do the job -- Douglas Adams.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 16:47 ` Miquel van Smoorenburg
@ 2003-02-20 16:55   ` Daniel Jacobowitz
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Jacobowitz @ 2003-02-20 16:55 UTC (permalink / raw)
  To: linux-kernel

On Thu, Feb 20, 2003 at 04:47:15PM +0000, Miquel van Smoorenburg wrote:
> In article <Pine.LNX.4.44.0302201656030.30000-100000@localhost.localdomain>,
> Ingo Molnar  <mingo@elte.hu> wrote:
> >the fix for this is two-fold. First, it must be possible for procps to
> >separate 'threads' from 'processes' without having to go into 16 thousand
> >directories. I solved this by prefixing 'threads' (ie. non-group-leader
> >threads) with a dot ('.') character in the /proc listing:
> 
> Why not put threads belonging to a thread group into /proc/17072/threads ?

I'm also inclined to think this is a good idea.  We could even choose
to use /proc/17072/lwp, to match Solaris, but it's probably not worth
it since the semantics are different.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 16:33 [patch] procfs/procps threading performance speedup, 2.5.62 Ingo Molnar
  2003-02-20 16:47 ` Miquel van Smoorenburg
@ 2003-02-20 17:00 ` Nikita Danilov
  2003-02-20 17:06 ` Linus Torvalds
  2 siblings, 0 replies; 28+ messages in thread
From: Nikita Danilov @ 2003-02-20 17:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Linus Torvalds, Alex Larsson, procps-list

Ingo Molnar writes:
 > 

[...]

 > 
 > to fix this overhead i've introduced a 'lookup cursor' cookie, which is
 > cached in filp->private_data, across readdir() [getdents64()] calls. If
 > the cursor matches then we skip all the overhead of skipping threads. If
 > the cursor is not available then we fall back to the old-style skipping
 > algorithm.

Shouldn't filp->private_data be cleared on lseek? It looks like lookup
cursor is never cleared once set and so readdir will always go forward
independently of ->f_pos updates. Note that glibc implementation of
readdir() (on the top of getdents64()) does call lseek on the
directory. So does seekdir(3).

 > 
 > 2) procps is forced to parse every thread in /proc to build up accurate
 >    'process CPU usage' counters. The parsing and accessing of every

Nikita.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 16:33 [patch] procfs/procps threading performance speedup, 2.5.62 Ingo Molnar
  2003-02-20 16:47 ` Miquel van Smoorenburg
  2003-02-20 17:00 ` Nikita Danilov
@ 2003-02-20 17:06 ` Linus Torvalds
  2003-02-20 17:14   ` Ingo Molnar
  2003-02-20 17:20   ` Linus Torvalds
  2 siblings, 2 replies; 28+ messages in thread
From: Linus Torvalds @ 2003-02-20 17:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Alex Larsson, procps-list

On Thu, 20 Feb 2003, Ingo Molnar wrote:
>
> the main problem with threads in /proc is that there's a big slowdown when
> using lots of threads.

Well, part of the problem (I think) is that you added all the threads to 
the same main directory.

Putting a "." in front of the name doesn't fix the /proc level directory
scalability issues, it only means that you can avoid some of the user- 
level scalability ones.

So to offset that bad design, you then add other cruft, like the lookup
cursor and the "." marker. Which is not a bad idea in itself, but I claim
that if you'd made the directory structure saner you wouldn't have needed
it in the first place.

It would just be _so_ much nicer if the threads would show up as 
subdirectories ie /proc/<tgid>/<tid>/xxx. More scalable, more readable, 
and just generally more sane.

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:06 ` Linus Torvalds
@ 2003-02-20 17:14   ` Ingo Molnar
  2003-02-20 17:22     ` Linus Torvalds
                       ` (2 more replies)
  2003-02-20 17:20   ` Linus Torvalds
  1 sibling, 3 replies; 28+ messages in thread
From: Ingo Molnar @ 2003-02-20 17:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Alex Larsson, procps-list, Alexander Viro


On Thu, 20 Feb 2003, Linus Torvalds wrote:

> > the main problem with threads in /proc is that there's a big slowdown when
> > using lots of threads.
> 
> Well, part of the problem (I think) is that you added all the threads to
> the same main directory.
> 
> Putting a "." in front of the name doesn't fix the /proc level directory
> scalability issues, it only means that you can avoid some of the user-
> level scalability ones.
> 
> So to offset that bad design, you then add other cruft, like the lookup
> cursor and the "." marker. Which is not a bad idea in itself, but I
> claim that if you'd made the directory structure saner you wouldn't have
> needed it in the first place.

with the dot-marker it's not scaling badly at all - you can see the
numbers in the mail, all of the command variants (even a simple 'ps')  
still reads the _full_ /proc directory. (just they dont go further than
that, because the readdir() output is enough to filter stuff.)

> It would just be _so_ much nicer if the threads would show up as
> subdirectories ie /proc/<tgid>/<tid>/xxx. More scalable, more readable,
> and just generally more sane.

Al says that this cannot be done sanely, and is fraught with security
problems. I'd vote for it if it were possible. Al?

but, if you worry about the scalability of large /proc directories - it's
not bad at all.

	Ingo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:14   ` Ingo Molnar
@ 2003-02-20 17:22     ` Linus Torvalds
  2003-02-21  7:38       ` Oliver Xymoron
  2003-02-20 17:23     ` Ingo Molnar
  2003-02-20 17:28     ` Jeff Garzik
  2 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2003-02-20 17:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Alex Larsson, procps-list, Alexander Viro

On Thu, 20 Feb 2003, Ingo Molnar wrote:
> 
> Al says that this cannot be done sanely, and is fraught with security
> problems. I'd vote for it if it were possible. Al?

I seriously doubt it. It's all exactly the same as the _current_ 
/proc/<pid> stuff, it just shows up in a different place.

> but, if you worry about the scalability of large /proc directories - it's
> not bad at all.

I worry about the _sanity_ of it, and it basically makes no sense to
iterate over every single thread, when you should always be able to just
iterate over processes (and then within the process - only when you want
to - iterate over the threads of _that_ process).

		Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:22     ` Linus Torvalds
@ 2003-02-21  7:38       ` Oliver Xymoron
  0 siblings, 0 replies; 28+ messages in thread
From: Oliver Xymoron @ 2003-02-21  7:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, linux-kernel

On Thu, Feb 20, 2003 at 09:22:42AM -0800, Linus Torvalds wrote:
> 
> On Thu, 20 Feb 2003, Ingo Molnar wrote:
> > 
> > Al says that this cannot be done sanely, and is fraught with security
> > problems. I'd vote for it if it were possible. Al?
> 
> I seriously doubt it. It's all exactly the same as the _current_ 
> /proc/<pid> stuff, it just shows up in a different place.

Well perhaps that was just Al's way of saying the current stuff is
broken too. I think we could get into trouble reparenting or exec'ing
really quickly though.

-- 
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:14   ` Ingo Molnar
  2003-02-20 17:22     ` Linus Torvalds
@ 2003-02-20 17:23     ` Ingo Molnar
  2003-02-22 20:52       ` Albert Cahalan
  2003-02-20 17:28     ` Jeff Garzik
  2 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2003-02-20 17:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Alex Larsson, procps-list, Alexander Viro

eg. with 16,000 threads in /proc, "ls /proc" is still fast:

     real:0m0.032s   user:0m0.007s   sys:0m0.024s

without those threads, it's:

     real:0m0.014s user:0m0.004s sys:0m0.010s

15 msecs difference. Even simple 'ls' reads the full /proc directory (all
16K+ entries). So performance-wise there's not a big difference.

architecture-wise there is a difference, and i'd be the last one arguing
against a tree-based approach to thread groups. It's much easier to find
threads belonging to a single 'process' via /proc this way - although no
functionality in procps has or requires such a feature currently.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:23     ` Ingo Molnar
@ 2003-02-22 20:52       ` Albert Cahalan
  2003-02-22 21:55         ` Andrew Morton
  2003-02-23 18:49         ` Ingo Molnar
  0 siblings, 2 replies; 28+ messages in thread
From: Albert Cahalan @ 2003-02-22 20:52 UTC (permalink / raw)
  To: procps-list, mingo
  Cc: Linus Torvalds, linux-kernel, Alex Larsson, Alexander Viro

On Thu, 2003-02-20 at 12:23, Ingo Molnar wrote:

> architecture-wise there is a difference, and i'd be
> the last one arguing against a tree-based approach to
> thread groups. It's much easier to find threads belonging
> to a single 'process' via /proc this way - although no
> functionality in procps has or requires such a feature currently.

Nope, the /proc/123/threads/246/stat approach is required.
Without this, procps is forced to read _all_ tasks to
group threads together. This is slow, prone to race conditions,
more vulnerable to kernel bugs, and a memory hog.

FYI, thread grouping is required even if whole-process
information is available. Many "ps" output formats need
grouping, and it is desirable for many more.

I might as well mention that whole-process information
includes the four fault counters and some indication
that wchan data is multi-valued (a '*' must be displayed
in that case). There may be more I haven't spotted yet.

Note that the recent /proc/*/wchan addition was botched.
Caching is prevented due to race conditions. This could
be fixed by changing the file format to contain:
    number, function name
with optional:
    function address, file name, module name
(next time, discuss such changes with an experienced
procps developer first)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-22 20:52       ` Albert Cahalan
@ 2003-02-22 21:55         ` Andrew Morton
  2003-02-22 23:04           ` Albert Cahalan
  2003-02-23 18:49         ` Ingo Molnar
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2003-02-22 21:55 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: procps-list, linux-kernel

Albert Cahalan <albert@users.sourceforge.net> wrote:
>
> Note that the recent /proc/*/wchan addition was botched.
> Caching is prevented due to race conditions. This could
> be fixed by changing the file format to contain:
>     number, function name

There is not enough detail here for it to be fixed.

What are the race conditions?

What is "number"?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-22 21:55         ` Andrew Morton
@ 2003-02-22 23:04           ` Albert Cahalan
  0 siblings, 0 replies; 28+ messages in thread
From: Albert Cahalan @ 2003-02-22 23:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Albert Cahalan, procps-list, linux-kernel

On Sat, 2003-02-22 at 16:55, Andrew Morton wrote:
> Albert Cahalan <albert@users.sourceforge.net> wrote:

>> Note that the recent /proc/*/wchan addition was botched.
>> Caching is prevented due to race conditions. This could
>> be fixed by changing the file format to contain:
>>     number, function name
>
> There is not enough detail here for it to be fixed.
>
> What are the race conditions?
>
> What is "number"?

By "number" I mean the numeric wchan. ("ps -o nwchan")
This value is provided in the /proc/*/stat files.

Most tools are stuck reading /proc/*/stat most of
the time. So the nwchan value is being read already.
You can't cache a mapping from nwchan to wchan,
because the values may change from the time you
read /proc/*/stat to the time you read /proc/*/wchan.
This forces reading /proc/*/wchan, even though top
has already seen the same value before and already
has the numeric version of it.

One more thing is needed. There should be a counter,
incremented every time a module is loaded or unloaded.
This allows the cache to be invalidated as needed.
I suppose /proc/stat is a decent place for the counter,
either at the end (safest) or right before intr.

Caching makes a major difference for wchan. Look how
few values are seen: ps -eonwchan= | sort -u | wc -l

Since the current /proc/*/wchan format isn't yet in
a stable kernel release, breaking the format wouldn't
be too bad. I like "%08x %s \n" and "%016lx %s \n"
(note the space on the end) for a nice fast parser and
ease of adding more fields. If you have something against
that space, then "%s %08x\n" and "%s %016lx\n" would do.

Any thoughts on the other stuff? Different modules
and files could have functions with the same name.
Maybe some info about this should be available, at
least if it doesn't slow things down much.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-22 20:52       ` Albert Cahalan
  2003-02-22 21:55         ` Andrew Morton
@ 2003-02-23 18:49         ` Ingo Molnar
  2003-02-23 21:51           ` Albert D. Cahalan
  1 sibling, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2003-02-23 18:49 UTC (permalink / raw)
  To: procps-list; +Cc: Linus Torvalds, linux-kernel, Alex Larsson, Alexander Viro


On 22 Feb 2003, Albert Cahalan wrote:

> > architecture-wise there is a difference, and i'd be
> > the last one arguing against a tree-based approach to
> > thread groups. It's much easier to find threads belonging
> > to a single 'process' via /proc this way - although no
> > functionality in procps has or requires such a feature currently.
> 
> Nope, the /proc/123/threads/246/stat approach is required. Without this,
> procps is forced to read _all_ tasks to group threads together. This is
> slow, prone to race conditions, more vulnerable to kernel bugs, and a
> memory hog.

actually, what you mention does not happen in practice. We coded it up and
it works, and with tons of threads around it performs a few orders of
magnitudes faster than any other stuff available so far. So the question
here is 'only' interface/approach cleanliness, not actual performance
difference. Sure, we could shave off another millisec or two, but the
performance problems are off the radar already.

> Note that the recent /proc/*/wchan addition was botched.

(fyi, i have nothing to do with that change, so spare your insults for
someone else.)

> (next time, discuss such changes with an experienced procps developer
> first)

(given that this whole area was left alone in a state like this for years
i'm not quite sure how you can still sit so high on your horse.)

	Ingo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-23 18:49         ` Ingo Molnar
@ 2003-02-23 21:51           ` Albert D. Cahalan
  2003-02-24  9:28             ` Ingo Molnar
  2003-02-24  9:38             ` Ingo Molnar
  0 siblings, 2 replies; 28+ messages in thread
From: Albert D. Cahalan @ 2003-02-23 21:51 UTC (permalink / raw)
  To: procps-list; +Cc: torvalds, linux-kernel, alexl, viro, mingo

Ingo Molnar writes:
> On 22 Feb 2003, Albert Cahalan wrote:

>>> architecture-wise there is a difference, and i'd be
>>> the last one arguing against a tree-based approach to
>>> thread groups. It's much easier to find threads belonging
>>> to a single 'process' via /proc this way - although no
>>> functionality in procps has or requires such a feature currently.
>>
>> Nope, the /proc/123/threads/246/stat approach is required.
>> Without this, procps is forced to read _all_ tasks to group
>> threads together. This is slow, prone to race conditions,
>> more vulnerable to kernel bugs, and a memory hog.
>
> actually, what you mention does not happen in practice.
> We coded it up and it works,

Surely you realize that I have seen this code?

Proper "ps m" behavior groups threads in the output.
The hacked-up procps being used by Red Hat fails to
do this. You chould change that... at the cost of reading
all processes and sorting them. There goes all of your
performance improvement.

I hope you will eliminate the Red Hat bias in future patches.
Using the subdirectory approach certainly won't hurt you.
In time, you will realize that it is needed, or you will
switch over to using procps 3. Several distributions have
recently switched; I doubt you can hold out forever.

(I'd still like to know why Red Hat broke "sort -n" once.
What is it with you guys mucking up the most basic tools?)

BTW, proper thread support in ps includes the "-T" and
"-L" options. There's more too. I know this; nobody
doing procps-2 work has a clue. You're living in the
same Linux-only world that made procps-1 so hated for
incompatibility with the rest of the world. Sys admins
like to write scripts that run on everything they use.

>> (next time, discuss such changes with an experienced
>> procps developer first)
>
> (given that this whole area was left alone in a state
> like this for years i'm not quite sure how you can still
> sit so high on your horse.)

What area, in what state? If you mean procps in general:

I've been maintaining upstream procps for Debian for years.
The project was kept quiet out of politeness to the former
maintainer, who didn't want to let go of the project. Now
that the former maintainer is clearly not coming back, I've
put my tree at http://procps.sf.net/ for all to use. I am
after all the original author of ps itself, so I'm one to
know what is required for correct and efficient operation.

If you mean the threading support specifically:

The kernel does not provide proper support for grouping
threads by process. Hacks exist to group them anyway,
but such hacks will falsely group similar tasks and will
fail to group tasks due to race conditions. The hacks are
also slow. As none of this is acceptable in a critical
system tool, task grouping is not currently available. 

There's a need for responsibility here. One shouldn't just
throw unreliable hacks into a critical system tool.

The FAQ covers some of this and more.
http://procps.sourceforge.net/faq.html

Linux-kernel readers should note that only procps-3 can
handle the very latest kernels. No procps-2 release is
able to do so, in more than one way.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-23 21:51           ` Albert D. Cahalan
@ 2003-02-24  9:28             ` Ingo Molnar
  2003-02-24 11:12               ` Albert D. Cahalan
  2003-02-24  9:38             ` Ingo Molnar
  1 sibling, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2003-02-24  9:28 UTC (permalink / raw)
  To: procps-list; +Cc: Linus Torvalds, linux-kernel, alexl, Alexander Viro


On Sun, 23 Feb 2003, Albert D. Cahalan wrote:

> Surely you realize that I have seen this code?

have you actually tried it?

> Proper "ps m" behavior groups threads in the output. The hacked-up
> procps being used by Red Hat fails to do this. You chould change that...
> at the cost of reading all processes and sorting them. There goes all of
> your performance improvement.

Albert, the new code properly reads all threads and sorts them, in the
"ps m" case. Had you truly read my emails you'd notice where the overhead
lies, and what steps were taken to get rid of it.

	Ingo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-24  9:28             ` Ingo Molnar
@ 2003-02-24 11:12               ` Albert D. Cahalan
  2003-02-24 11:26                 ` Ingo Molnar
  0 siblings, 1 reply; 28+ messages in thread
From: Albert D. Cahalan @ 2003-02-24 11:12 UTC (permalink / raw)
  To: procps-list; +Cc: Linus Torvalds, linux-kernel, alexl, viro, mingo

Ingo Molnar writes:
> On Sun, 23 Feb 2003, Albert D. Cahalan wrote:

>> Surely you realize that I have seen this code?
>
> have you actually tried it?
>
>> Proper "ps m" behavior groups threads in the output. The hacked-up
>> procps being used by Red Hat fails to do this. You chould change that...
>> at the cost of reading all processes and sorting them. There goes all of
>> your performance improvement.
>
> Albert, the new code properly reads all threads and sorts them, in the
> "ps m" case. Had you truly read my emails you'd notice where the overhead
> lies, and what steps were taken to get rid of it.

I see that for default output, unpatched procps-2.x.xx is
forced to read and sort all tasks. Ignoring fault counts
and wchan for the moment, you made this work unnecessary by
adding some whole-process values. You also sped up /proc
directory operations in general, so not "all" I guess.

In the "ps m" and "ps -m" cases, you revert to the
"loose tasks" behavior. Neither "ps -L" nor "ps -T"
have been implemented. In some of these cases, the
threads should be grouped. Without subdirectories,
user code must do something bloated and slow to get
the grouping. (unless you care to guarantee the order
of tasks in a /proc directory listing, and relying on
such an ordering would prevent some other optimizations)

By grouping I mean something roughly like this:

PID TID
123 123
123 222
444 444
444 456

Not this:

PID TID
123 123
444 444
123 222
444 456

Header names and meanings may vary, etc. I'm not about to
dig out my notes regarding correct behavior for this email.

Anyway, to quote Linus:

----- begin -----
Well, part of the problem (I think) is that you added all the threads to 
the same main directory.

Putting a "." in front of the name doesn't fix the /proc level directory
scalability issues, it only means that you can avoid some of the user- 
level scalability ones.

So to offset that bad design, you then add other cruft, like the lookup
cursor and the "." marker. Which is not a bad idea in itself, but I claim
that if you'd made the directory structure saner you wouldn't have needed
it in the first place.

It would just be _so_ much nicer if the threads would show up as 
subdirectories ie /proc/<tgid>/<tid>/xxx. More scalable, more readable, 
and just generally more sane.
----- end -----

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-24 11:12               ` Albert D. Cahalan
@ 2003-02-24 11:26                 ` Ingo Molnar
  2003-02-24 12:29                   ` Albert D. Cahalan
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2003-02-24 11:26 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: procps-list, Linus Torvalds, linux-kernel, alexl, viro

On Mon, 24 Feb 2003, Albert D. Cahalan wrote:

> By grouping I mean something roughly like this:
> 
> PID TID
> 123 123
> 123 222
> 444 444
> 444 456

Albert, do you realize the simple fact that the procps enhancements we did
change absolutely nothing for the 'ps m' case? All thread PIDs are still
scanned and sorted.

And mind you, thread-directories do not change much in this area - the
PIDs within the thread-directory will still be largely unsorted, and it
will not make the reading & sorting of 20K threads any faster.

> Anyway, to quote Linus:

and to quote myself:

> [...] i'd be the last one arguing against a tree-based approach to
> thread groups. It's much easier to find threads belonging to a single
> 'process' via /proc this way  [...]

so i'm in full agreement with Linus. And here's an email of mine from more 
than 7 months ago:

> and yet another issue, what do you think about not listing every
> CLONE_THREAD_GROUP thread in /proc, but to list them in the group
> leader's directory - something like /proc/12345/threads/567/... This
> will remove much of the runtime overhead of massively threaded
> applications, where there are lots of native threads.

so in fact _i_ came up with this whole issue 7 months ago. I just dont
share many of _your_ largely bogus arguments that seem to miss the point.  
Can we finally stop this storm in a teapot?

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-24 11:26                 ` Ingo Molnar
@ 2003-02-24 12:29                   ` Albert D. Cahalan
  2003-02-24 12:41                     ` Ingo Molnar
                                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Albert D. Cahalan @ 2003-02-24 12:29 UTC (permalink / raw)
  To: mingo
  Cc: Albert D. Cahalan, procps-list, Linus Torvalds, linux-kernel,
	alexl, viro

> Albert, do you realize the simple fact that the procps
> enhancements we did change absolutely nothing for the 'ps m'
> case? All thread PIDs are still scanned and sorted.

That is a contradiction. There is no sorting with "ps m".
Any ordering is from the /proc directory itself.

Sorting is not default because of the memory requirements
and because there have been many kernel bugs that cause
ps to hang when it hits a particular process. Sorting may
mean that ps hangs or is killed before producing anything.

> And mind you, thread-directories do not change much in
> this area - the PIDs within the thread-directory will
> still be largely unsorted, and it will not make the
> reading & sorting of 20K threads any faster.

That's OK. I need in-kernel grouping, not in-kernel sorting.
It's fine to mix up thread order, but bad to interleave the
threads of unrelated processes.

> so in fact _i_ came up with this whole issue 7 months ago. I just dont
> share many of _your_ largely bogus arguments that seem to miss the point.  
> Can we finally stop this storm in a teapot?

Glad to, assuming you understand the importance of grouping.
I hope I have now done a better job of explaining it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-24 12:29                   ` Albert D. Cahalan
@ 2003-02-24 12:41                     ` Ingo Molnar
  2003-02-24 12:45                     ` Ingo Molnar
  2003-02-24 12:52                     ` Ingo Molnar
  2 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2003-02-24 12:41 UTC (permalink / raw)
  To: procps-list
  Cc: Albert D. Cahalan, Linus Torvalds, linux-kernel, alexl,
	Alexander Viro

On Mon, 24 Feb 2003, Albert D. Cahalan wrote:

> It's fine to mix up thread order, but bad to interleave the threads of
> unrelated processes.

this is very simple to do, and does not necessiate thread-directories.  
There's a PID and TGid field in /proc/PID/status. Just link the task to
the TGid-task, and you have instant access to all threads per TGid. In the
'groupped output' case you have to scan & access all threads anyway. Ok?

(but this is way offtopic. The changes we posted address the normal case
of process-listing. (no -m option.) If there are tons of threads around
then any discussed variant of 'ps -m' will be slow.)

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-24 12:29                   ` Albert D. Cahalan
  2003-02-24 12:41                     ` Ingo Molnar
@ 2003-02-24 12:45                     ` Ingo Molnar
  2003-02-24 12:52                     ` Ingo Molnar
  2 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2003-02-24 12:45 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: procps-list, Linus Torvalds, linux-kernel, alexl, viro

On Mon, 24 Feb 2003, Albert D. Cahalan wrote:

> Sorting is not default because of the memory requirements and because
> there have been many kernel bugs that cause ps to hang when it hits a
> particular process. Sorting may mean that ps hangs or is killed before
> producing anything.

in fact there was an unconditional qsort() done after scanning all tasks,
until i pointed it out to Alex. It never caused any problems, and sorting
never showed up as a source of overhead in the profiler, so i'm not sure
why you insist on sorting so much.

if procps hangs then that's a bug in procps. I'm not quite sure what you
mean by 'it hits a particular process'.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-24 12:29                   ` Albert D. Cahalan
  2003-02-24 12:41                     ` Ingo Molnar
  2003-02-24 12:45                     ` Ingo Molnar
@ 2003-02-24 12:52                     ` Ingo Molnar
  2 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2003-02-24 12:52 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: procps-list, linux-kernel, alexl

Albert, you, as a contributor to procps, have you ever run a high number
of (eg. more than 10K) threads and tested procps for that workload? Just
try it - start up eg. 10,000 threads (which just sleep) under 2.5.62, and
time all the various procps utilities. Then you'll see it first-hand,
where all the overhead lies.

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-23 21:51           ` Albert D. Cahalan
  2003-02-24  9:28             ` Ingo Molnar
@ 2003-02-24  9:38             ` Ingo Molnar
  2003-02-24 12:13               ` Albert D. Cahalan
  1 sibling, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2003-02-24  9:38 UTC (permalink / raw)
  To: procps-list; +Cc: Linus Torvalds, linux-kernel, alexl, Alexander Viro, mingo


On Sun, 23 Feb 2003, Albert D. Cahalan wrote:

> The kernel does not provide proper support for grouping threads by
> process. Hacks exist to group them anyway, but such hacks will falsely
> group similar tasks and will fail to group tasks due to race conditions.
> The hacks are also slow. As none of this is acceptable in a critical
> system tool, task grouping is not currently available.

Albert, here you make the mistake of connecting 'thread directories' to
'raceless task state displaying'. There's no such connection, and there's
no way to get rid of said 'race conditions'. There's just no way, and in
fact no desire to get a completely race-free task state from /proc. It's
simply not possible. readdir() and /proc is just fundamentally non-atomic
- i challenge you to show me a single mainstream OS in existence that can
display a completely race-free and accurate snapshot of task state.

here's an offer: propose a sane /proc based mechanism that is totally
race-free, and i'll implement it. If you think that thread-directories
solve the races then you are dead wrong. (Btw., i too would like to see
thread directories, and if you look at the first NPTL patches you'll see
that i've proposed them as the superior solution.) This is a hard problem,
and if you think it's "just" the matter of kernel support then you vastly
underestimate the complexity involved.

	Ingo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-24  9:38             ` Ingo Molnar
@ 2003-02-24 12:13               ` Albert D. Cahalan
  0 siblings, 0 replies; 28+ messages in thread
From: Albert D. Cahalan @ 2003-02-24 12:13 UTC (permalink / raw)
  To: procps-list; +Cc: Linus Torvalds, linux-kernel, alexl, Alexander Viro, mingo

Ingo Molnar writes:
> On Sun, 23 Feb 2003, Albert D. Cahalan wrote:

>> The kernel does not provide proper support for grouping threads by
>> process. Hacks exist to group them anyway, but such hacks will falsely
>> group similar tasks and will fail to group tasks due to race conditions.
>> The hacks are also slow. As none of this is acceptable in a critical
>> system tool, task grouping is not currently available.
>
> Albert, here you make the mistake of connecting 'thread directories'
> to 'raceless task state displaying'. There's no such connection,
> and there's no way to get rid of said 'race conditions'.

There are races with serious consequences, and ones without.
That we have race conditions is no excuse to be adding more!
I never did claim that /proc would provide an atomic snapshot.
I don't even claim that the race conditions are because of
your code, but you were passing up an opportunity to fix a few.

As long as there is a way to determine all tasks belonging
to a POSIX process (TIDs in a TGID, or whatever) without
reading damn near everything, then things can be sane.

> here's an offer: propose a sane /proc based mechanism that
> is totally race-free, and i'll implement it. If you think
> that thread-directories solve the races then you are dead wrong.

All or nothing? I would value a few improvements.

Thread directories are one. Alternately, add per-task lists
of tasks with the same VM or a mapping file.

I just went looking for that syscall which was supposed to
allow opening a file relative to a directory file descriptor.
Hmmm, not there. Was that one shot down for some security
concern related to file descriptor passing? It would help
with the process death and PID-wrap-around races at least.

Per-task UUID values could help. Reset one on exec, and one
for every operation that would set an existing ID.

> (Btw., i too would like to see thread directories, and if
> you look at the first NPTL patches you'll see that i've
> proposed them as the superior solution.) This is a hard
> problem, and if you think it's "just" the matter of kernel
> support then you vastly underestimate the complexity involved.

There are many problems, some of which are hard.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:14   ` Ingo Molnar
  2003-02-20 17:22     ` Linus Torvalds
  2003-02-20 17:23     ` Ingo Molnar
@ 2003-02-20 17:28     ` Jeff Garzik
  2003-02-20 17:30       ` Ingo Molnar
  2003-02-20 17:48       ` Linus Torvalds
  2 siblings, 2 replies; 28+ messages in thread
From: Jeff Garzik @ 2003-02-20 17:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Alex Larsson, procps-list,
	Alexander Viro

On Thu, Feb 20, 2003 at 06:14:38PM +0100, Ingo Molnar wrote:
> On Thu, 20 Feb 2003, Linus Torvalds wrote:
> > It would just be _so_ much nicer if the threads would show up as
> > subdirectories ie /proc/<tgid>/<tid>/xxx. More scalable, more readable,
> > and just generally more sane.
> 
> Al says that this cannot be done sanely, and is fraught with security
> problems. I'd vote for it if it were possible. Al?

Having the kernel automatically manage creation/destruction of
directories is the sticking point, AFAIK.

Why not use the "squid method"?  Create directories 00..FF, and sort the
pids/tids into buckets that way.  Then you are not creating and
destroying directories all the time.

	Jeff




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:28     ` Jeff Garzik
@ 2003-02-20 17:30       ` Ingo Molnar
  2003-02-20 17:48       ` Linus Torvalds
  1 sibling, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2003-02-20 17:30 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linus Torvalds, linux-kernel, Alex Larsson, procps-list,
	Alexander Viro


On Thu, 20 Feb 2003, Jeff Garzik wrote:

> > > It would just be _so_ much nicer if the threads would show up as
> > > subdirectories ie /proc/<tgid>/<tid>/xxx. More scalable, more readable,
> > > and just generally more sane.
> > 
> > Al says that this cannot be done sanely, and is fraught with security
> > problems. I'd vote for it if it were possible. Al?
> 
> Having the kernel automatically manage creation/destruction of
> directories is the sticking point, AFAIK.

it already has to create/destroy the main PID directory, so i cannot see a 
big difference.

> Why not use the "squid method"?  Create directories 00..FF, and sort the
> pids/tids into buckets that way.  Then you are not creating and
> destroying directories all the time.

yuck ...

	Ingo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:28     ` Jeff Garzik
  2003-02-20 17:30       ` Ingo Molnar
@ 2003-02-20 17:48       ` Linus Torvalds
  1 sibling, 0 replies; 28+ messages in thread
From: Linus Torvalds @ 2003-02-20 17:48 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Ingo Molnar, linux-kernel, Alex Larsson, procps-list,
	Alexander Viro


On Thu, 20 Feb 2003, Jeff Garzik wrote:
> 
> Having the kernel automatically manage creation/destruction of
> directories is the sticking point, AFAIK.

That _used_ to be an issue, since we didn't actually create the dentry at 
fork time. But we do all that maintenance anyway these days, I think.

		Linus


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:06 ` Linus Torvalds
  2003-02-20 17:14   ` Ingo Molnar
@ 2003-02-20 17:20   ` Linus Torvalds
  2003-02-20 19:13     ` Daniel Jacobowitz
  1 sibling, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2003-02-20 17:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Alex Larsson, procps-list


On Thu, 20 Feb 2003, Linus Torvalds wrote:
> 
> It would just be _so_ much nicer if the threads would show up as 
> subdirectories ie /proc/<tgid>/<tid>/xxx. More scalable, more readable, 
> and just generally more sane.

It shouldn't even be all that much harder. You only really need to add the 
"lookup()" and "readdir()" logic to the pid-fd's, and they both should be 
fairly straightforward, ie something like the appended should do the 
lookup() part.

(UNTESTED! NOT COMPILED! PROBABLY HORRIBLY BUGGY! CAVEAT USER! CONCEPTUAL 
CODE ONLY! YOU GET THE IDEA! I'M GETTING HOARSE FROM ALL THE SHOUTING!)

		Linus

---
===== base.c 1.39 vs edited =====
--- 1.39/fs/proc/base.c	Sat Feb 15 19:30:17 2003
+++ edited/base.c	Thu Feb 20 09:18:15 2003
@@ -964,10 +964,15 @@
 	struct task_struct *task = proc_task(dir);
 	struct pid_entry *p;
 	struct proc_inode *ei;
+	unsigned long tid;
 
 	error = -ENOENT;
 	inode = NULL;
 
+	tid = name_to_int(dentry);
+	if (tid != ~0U)
+		goto thread_lookup;
+                                
 	for (p = base_stuff; p->name; p++) {
 		if (p->len != dentry->d_name.len)
 			continue;
@@ -1052,6 +1057,30 @@
 	if (!proc_task(dentry->d_inode)->pid)
 		d_drop(dentry);
 	return NULL;
+
+thread_lookup: {
+	struct task_struct *thread;
+	read_lock(&tasklist_lock);
+	thread = task;
+	while ((thread = next_thread(thread)) != task) {
+		if (thread->pid == tid)
+			goto found_thread;
+	}
+	read_unlock(&tasklist_lock);
+	return ERR_PTR(-ENOENT);
+
+found_thread:
+	get_task_struct(thread);
+	read_unlock(&tasklist_lock);
+
+	inode = proc_pid_make_inode(sb, thread, 0x800000);
+	put_task_struct(thread);
+	if (!inode)
+		ERR_PTR(-ENOENT);
+	dentry->d_op = dentry->d_parent->d_op;
+	d_add(dentry, inode);
+	return NULL;
+}
 
 out:
 	return ERR_PTR(error);


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [patch] procfs/procps threading performance speedup, 2.5.62
  2003-02-20 17:20   ` Linus Torvalds
@ 2003-02-20 19:13     ` Daniel Jacobowitz
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Jacobowitz @ 2003-02-20 19:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, linux-kernel, Alex Larsson, procps-list

On Thu, Feb 20, 2003 at 09:20:38AM -0800, Linus Torvalds wrote:
> 
> On Thu, 20 Feb 2003, Linus Torvalds wrote:
> > 
> > It would just be _so_ much nicer if the threads would show up as 
> > subdirectories ie /proc/<tgid>/<tid>/xxx. More scalable, more readable, 
> > and just generally more sane.
> 
> It shouldn't even be all that much harder. You only really need to add the 
> "lookup()" and "readdir()" logic to the pid-fd's, and they both should be 
> fairly straightforward, ie something like the appended should do the 
> lookup() part.
> 
> (UNTESTED! NOT COMPILED! PROBABLY HORRIBLY BUGGY! CAVEAT USER! CONCEPTUAL 
> CODE ONLY! YOU GET THE IDEA! I'M GETTING HOARSE FROM ALL THE SHOUTING!)

It'd be a little (very little) more complex, but can I once again
suggest /proc/<tgid>/threads/<tid> instead of /proc/<tgid>/<tid>/xxx?


-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2003-02-24 12:42 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-02-20 16:33 [patch] procfs/procps threading performance speedup, 2.5.62 Ingo Molnar
2003-02-20 16:47 ` Miquel van Smoorenburg
2003-02-20 16:55   ` Daniel Jacobowitz
2003-02-20 17:00 ` Nikita Danilov
2003-02-20 17:06 ` Linus Torvalds
2003-02-20 17:14   ` Ingo Molnar
2003-02-20 17:22     ` Linus Torvalds
2003-02-21  7:38       ` Oliver Xymoron
2003-02-20 17:23     ` Ingo Molnar
2003-02-22 20:52       ` Albert Cahalan
2003-02-22 21:55         ` Andrew Morton
2003-02-22 23:04           ` Albert Cahalan
2003-02-23 18:49         ` Ingo Molnar
2003-02-23 21:51           ` Albert D. Cahalan
2003-02-24  9:28             ` Ingo Molnar
2003-02-24 11:12               ` Albert D. Cahalan
2003-02-24 11:26                 ` Ingo Molnar
2003-02-24 12:29                   ` Albert D. Cahalan
2003-02-24 12:41                     ` Ingo Molnar
2003-02-24 12:45                     ` Ingo Molnar
2003-02-24 12:52                     ` Ingo Molnar
2003-02-24  9:38             ` Ingo Molnar
2003-02-24 12:13               ` Albert D. Cahalan
2003-02-20 17:28     ` Jeff Garzik
2003-02-20 17:30       ` Ingo Molnar
2003-02-20 17:48       ` Linus Torvalds
2003-02-20 17:20   ` Linus Torvalds
2003-02-20 19:13     ` Daniel Jacobowitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox