[PATCH] get_pid() performance fix

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Hubertus Franke <frankeh@watson.ibm.com>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	davej@suse.de, marcelo@cnectiva.com.br
Cc: "Rajan Ravindran" <rajancr@us.ibm.com>,
	linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net
Subject: [PATCH] get_pid() performance fix
Date: Fri, 22 Mar 2002 17:14:03 -0500	[thread overview]
Message-ID: <20020322221318.5F6083FE06@smtp.linux.ibm.com> (raw)
In-Reply-To: <OF810580E6.8672B341-ON85256B73.005AF9B8@pok.ibm.com> <20020315183610.212993FE06@smtp.linux.ibm.com> <87zo19jdu7.fsf@devron.myhome.or.jp>

I implemented an alternative version of getpid, that for large thread counts
( > 220000), provides "significantly" better performance as shown in attached
performance analysis. This is particulary viable for PID_MAX=32768.
Note under the current algorithm, allocating the last pid will take 32 
seconds (!!!) on a 500MHZ P-III.
I addressed the correctnes issue that Hirofumi brought up with the last 
submission (now test case "-c 4")
Attached patch is against 2.5.7.

-- Hubertus Franke <frankeh@watson.ibm.com>

------------------------------------------------------------------------------

Currently the getpid algorithm works as follows:
At any given time an interval of [ last_pid .. next_safe ) is known to hold  
unused pids. Initially the interval is set to [0 .. 32767]

Hence to allocate a new pid the following is sufficient:

       if (++last_pid < next_safe) return last_pid;

However, if we move out of the safe interval, the next safe interval needs 
to be established first.
This is currently done by a repetive search 

repeat:
     foralltasks(p) {
        if (p uses lastpid) { last_pid++; goto repeat; }
        /* narrow [ last_pid .. next_safe ) */
        if (p->pids in [ last_pid .. next_safe ) ) next_safe = p->pid
     }

Particulary for large number of tasks, this can lead to frequent exercise of
the repeat resulting in a O(N^2) algorithm. We call this : <algo-0>.

Instead I have provided an alternative mechanism that at the time
of determining the next interval marks a bitmask by walking the tasklist
once [ O(N) ] and then finding the proper bit offsets to mark the next free
interval starting from last_pid. The bitmap requires 4096 bytes.
This is <algo-1>.

An optimization to this to keep the last bitmap instead of clearing it
with every search. Only if we fail to obtain a free range, then we have
to go back and clear the bitmap and redo the search one more time.
This is <algo-2>. Here I omit the results, as they have only shown 
improvements for the very last few pids.

I dragged the various algorithms into a userlevel test program to figure
out where the cut off points are with PID_MAX=32768. In this testprogram
I maintain <TH> tasks, and for 10 rounds (delete <DEL> random tasks and 
reallocate <DEL> tasks again) resulting in T=10*D total measured allocations.

Si states how many interval searches where needed for algo-i.
Gi states the average overhead per get_pid for algo-i in usecs.

Based on that one should use the current algorithm until ~ 22-25K tasks and
beyond that use algo-2. Only the last 15 tasks are a bit faster under algo-1.
We can safely ignore that case.

Based on that providing an adaptive implementation seems the right choice.
The patch for 2.5.7 is attached.


executed program example: getpid -c 2 -s 10 -e 100 -D 10 -t <0,1> 
        0 is old 1 is new algo 2.

TH:  average thread count in system
DEL: number of randomly deleted threads and then reallocated
TAL: total number of getpid invocation
C-NEW: number of times search was invoked 
T-NEW: per getpid() overhead
C-OLD: number of times search was invoked 
T-OLD: per getpid() overhead


   TH   DEL   TAL |   C-NEW       T-NEW |   C-OLD       T-OLD
   10    10    80 |      1        0.79  |      1        0.41
   20    10   100 |      1        0.56  |      1        0.36
   30    10   100 |      1        0.56  |      1        0.38
   40    10   100 |      1        0.58  |      1        0.42
   50    10   100 |      1        0.59  |      1        0.43
   60    10   100 |      2        0.84  |      2        0.52
   70    10   100 |      1        0.72  |      1        0.43
   80    10   100 |      1        0.72  |      1        0.48
  100    50   500 |      2        0.38  |      2        0.27
  150    50   500 |      4        0.56  |      3        0.32
  200    50   500 |      5        0.79  |      4        0.40
  250    50   500 |      6        1.03  |      2        0.35
  300    50   500 |      9        1.68  |      2        0.38
  350    50   500 |      6        1.47  |      4        0.58
  400    50   500 |     10        2.54  |      8        1.05
  450    50   500 |      7        2.10  |      7        1.02
  500    50   500 |      7        2.32  |      4        0.75
  550    50   500 |      9        3.13  |      5        0.95
  600    50   500 |     14        5.14  |      6        1.18
  650    50   500 |     13        5.23  |      9        1.79
  700    50   500 |     10        4.35  |      9        1.87
  750    50   500 |     15        6.91  |      8        2.02
  800    50   500 |     12        5.98  |      8        1.95
  850    50   500 |     13        6.85  |      9        2.29
  900    50   500 |     27       14.55  |     18        4.46
 1000  1000  9980 |      1        0.21  |      1        0.21
 2000  1000 10000 |    322       14.36  |     76        2.03
 3000  1000 10000 |    606       46.10  |    161        6.63
 4000  1000 10000 |    901       97.58  |    359       18.87
 5000  1000 10000 |   1165      164.19  |    539       38.75
 6000  1000 10000 |   1498      266.58  |    724       66.96
 7000  1000 10000 |   1835      396.91  |   1026      122.35
 8000  1000 10000 |   2084      531.83  |   1228      179.70
 9000  1000 10000 |   2489      746.99  |   1516      264.16
10000  1000 10000 |   2763      946.99  |   1818      375.22
11000  1000 10000 |   3095     1199.73  |   2093      502.47
12000  1000 10000 |   3277     1422.02  |   2305      648.17
13000  1000 10000 |   3711     1776.28  |   2680      854.89
14000  1000 10000 |   3878     2033.30  |   2959     1061.61
15000  1000 10000 |   4301     2452.35  |   3167     1266.78
16000  1000 10000 |   4485     2757.00  |   3466     1554.22
17000  1000 10000 |   4844     3210.19  |   3743     1857.74
18000  1000 10000 |   5218     3681.90  |   4069     2247.13
19000  1000 10000 |   5501     4118.69  |   4293     2605.90
20000  1000 10000 |   5770     4594.99  |   4616     3095.39
21000  1000 10000 |   6161     5172.44  |   4974     3686.87
22000  1000 10000 |   6389     5637.80  |   5177     4258.81
23000  1000 10000 |   6665     6191.73  |   5483     4949.61
24000  1000 10000 |   6982     6777.02  |   5838     5831.25
25000  1000 10000 |   7209     7318.15  |   6183     6905.34
----------------> Break even range <------------------------
26000  1000 10000 |   7533     7954.33  |   6413     7955.66
27000  1000 10000 |   7959     8749.97  |   6748     9444.79
28000  1000 10000 |   8140     9302.36  |   7139    11617.75
29000  1000 10000 |   8501    10085.77  |   7638    14960.53
30000  1000 10000 |   8911    10946.80  |   8178    19584.24
32000    50   500 |    487    12265.89  |    486    94498.36
32050    50   500 |    486    12314.17  |    486    95832.03
32100    50   500 |    486    12389.38  |    488   108895.28
32150    50   500 |    492    12599.58  |    484   111742.39
32200    50   500 |    497    12792.36  |    491   124440.45
32250    50   500 |    490    12659.33  |    490   134499.09
32300    50   500 |    495    12843.78  |    489   145873.72
32350    50   500 |    495    12915.18  |    493   158940.66
32400    50   500 |    496    12988.64  |    494   183092.45
32450    50   500 |    492    12924.86  |    490   196135.28
32500    50   500 |    495    13043.85  |    488   223226.98
32550    50   500 |    498    13164.09  |    495   265431.10
32600    50   500 |    498    13222.56  |    495   326363.36
32650    50   500 |    498    13279.90  |    497   441002.09
32700    50   500 |    499    13368.66  |    499   656269.52
32751     1    10 |     10    12836.40  |     10  4031660.40
32752     1    10 |     10    12831.70  |     10  4620214.70
32753     1    10 |     10    12832.60  |     10  4188605.70
32754     1    10 |     10    12837.60  |     10  2972975.40
32755     1    10 |     10    12835.90  |     10  4434635.70
32756     1    10 |     10    12833.80  |     10  3892058.30
32757     1    10 |     10    12839.10  |     10  5002365.30
32758     1    10 |     10    12840.20  |     10  6332786.20
32759     1    10 |     10    12840.20  |     10  5269462.90
32760     1    10 |     10    14127.80  |     10  8234780.40
32761     1    10 |     10    14134.90  |     10  7558043.20
32762     1    10 |     10    14129.70  |     10  9117119.40
32763     1    10 |     10    14134.70  |     10 13895498.10
32764     1    10 |     10    14140.10  |     10 13608972.90
32765     1    10 |     10    15427.30  |     10 12930469.20
32766     1    10 |     10    16708.30  |     10 23576610.90
32767     1    10 |     10    17997.90  |     10 32603396.10

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

diff -urbN linux-2.5.7/kernel/fork.c linux-2.5.7-pid/kernel/fork.c
--- linux-2.5.7/kernel/fork.c	Mon Mar 18 15:37:05 2002
+++ linux-2.5.7-pid/kernel/fork.c	Fri Mar 22 16:38:29 2002
@@ -125,24 +125,111 @@
 /* Protects next_safe and last_pid. */
 spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED;
 
+/* this should be provided in every architecture */
+#ifndef SHIFT_PER_LONG
+#if BITS_PER_LONG == 64
+#   define SHIFT_PER_LONG 6
+#elif BITS_PER_LONG == 32
+#   define SHIFT_PER_LONG 5
+#else
+#error "SHIFT_PER_LONG"
+#endif
+#endif
+
+#define RESERVED_PIDS       (300)
+#define GETPID_THRESHOLD    (26000)  /* when to switch to a mapped algo */
+#define PID_MAP_SIZE        (PID_MAX >> SHIFT_PER_LONG)
+static unsigned long pid_map[PID_MAP_SIZE];
+static int next_safe = PID_MAX;
+
+static inline void mark_pid(int pid)
+{
+	__set_bit(pid,pid_map);
+}
+
+static int get_pid_by_map(int lastpid) 
+{
+	static int mark_and_sweep = 0;
+
+	int round = 0;
+        struct task_struct *p;
+        int i;
+	unsigned long mask;
+	int fpos;
+
+
+	if (mark_and_sweep) {
+repeat:
+		mark_and_sweep = 0;
+		memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
+		lastpid = RESERVED_PIDS;
+	}
+	for_each_task(p) {
+		mark_pid(p->pid);
+		mark_pid(p->pgrp);
+		mark_pid(p->tgid);
+		mark_pid(p->session);
+	}
+
+	/* find next free pid */
+	i = (lastpid >> SHIFT_PER_LONG);
+	mask = pid_map[i] | (((lastpid & (BITS_PER_LONG-1)) << 1) - 1);
+	
+	while ((mask == ~0) && (++i < PID_MAP_SIZE)) 
+		mask = pid_map[i];
+	
+	if (i == PID_MAP_SIZE) { 
+		if (round == 0) {
+			round = 1;
+			goto repeat;
+		}
+		next_safe = RESERVED_PIDS;
+		mark_and_sweep = 1;  /* mark next time */
+		return 0; 
+	}
+
+	fpos = ffz(mask);
+	i &= (PID_MAX-1);
+	lastpid = (i << SHIFT_PER_LONG) + fpos;
+
+	/* find next save pid */
+	mask &= ~((1 << fpos) - 1);
+
+	while ((mask == 0) && (++i < PID_MAP_SIZE)) 
+		mask = pid_map[i];
+
+	if (i==PID_MAP_SIZE) 
+		next_safe = PID_MAX;
+	else 
+		next_safe = (i << SHIFT_PER_LONG) + ffs(mask) - 1;
+	return lastpid;
+}
+
 static int get_pid(unsigned long flags)
 {
-	static int next_safe = PID_MAX;
 	struct task_struct *p;
-	int pid;
+	int pid,beginpid;
 
 	if (flags & CLONE_PID)
 		return current->pid;
 
 	spin_lock(&lastpid_lock);
+	beginpid = last_pid;
 	if((++last_pid) & 0xffff8000) {
-		last_pid = 300;		/* Skip daemons etc. */
+		last_pid = RESERVED_PIDS;		/* Skip daemons etc. */
 		goto inside;
 	}
 	if(last_pid >= next_safe) {
 inside:
 		next_safe = PID_MAX;
 		read_lock(&tasklist_lock);
+		if (nr_threads > GETPID_THRESHOLD) {
+			last_pid = get_pid_by_map(last_pid);
+			if (unlikely(last_pid == 0)) {
+				last_pid = PID_MAX;
+				goto nomorepids; 
+			}
+		} else {
 	repeat:
 		for_each_task(p) {
 			if(p->pid == last_pid	||
@@ -151,9 +238,11 @@
 			   p->session == last_pid) {
 				if(++last_pid >= next_safe) {
 					if(last_pid & 0xffff8000)
-						last_pid = 300;
+							last_pid = RESERVED_PIDS;
 					next_safe = PID_MAX;
 				}
+					if(unlikely(last_pid == beginpid))
+						goto nomorepids;
 				goto repeat;
 			}
 			if(p->pid > last_pid && next_safe > p->pid)
@@ -162,6 +251,9 @@
 				next_safe = p->pgrp;
 			if(p->session > last_pid && next_safe > p->session)
 				next_safe = p->session;
+				if(p->tgid > last_pid && next_safe > p->tgid)
+					next_safe = p->tgid;
+			}
 		}
 		read_unlock(&tasklist_lock);
 	}
@@ -169,6 +261,10 @@
 	spin_unlock(&lastpid_lock);
 
 	return pid;
+nomorepids:
+	read_unlock(&tasklist_lock);
+	spin_unlock(&lastpid_lock);
+	return 0;
 }
 
 static inline int dup_mmap(struct mm_struct * mm)

next prev parent reply	other threads:[~2002-03-22 22:13 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-03-05 16:43 Fwd: [Lse-tech] get_pid() performance fix Rajan Ravindran
2002-03-05 17:57 ` OGAWA Hirofumi
2002-03-05 19:53   ` Hubertus Franke
2002-03-05 20:10     ` OGAWA Hirofumi
2002-03-05 21:59       ` Hubertus Franke
2002-03-05 22:48         ` OGAWA Hirofumi
2002-03-05 23:40           ` Hubertus Franke
2002-03-14 23:18 ` [PATCH] " Hubertus Franke
2002-03-15 14:57   ` OGAWA Hirofumi
2002-03-15 15:16   ` OGAWA Hirofumi
2002-03-15 18:37     ` Hubertus Franke
2002-03-16  5:12       ` OGAWA Hirofumi
2002-03-18 21:44         ` Hubertus Franke
2002-03-22 22:14         ` Hubertus Franke [this message]
2002-03-22 22:28           ` Davide Libenzi
2002-03-22 22:26             ` Hubertus Franke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020322221318.5F6083FE06@smtp.linux.ibm.com \
    --to=frankeh@watson.ibm.com \
    --cc=davej@suse.de \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lse-tech@lists.sourceforge.net \
    --cc=marcelo@cnectiva.com.br \
    --cc=rajancr@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.