public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] Scalability requirements for sysv ipc - v2
@ 2008-04-29 14:33 Nadia.Derbey
  2008-04-29 14:33 ` [PATCH 01/10] Fix idr_remove() Nadia.Derbey
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm

After scalability problems had been detected when using the sysV ipcs, I
have proposed to use an RCU based implementation of the IDR api instead
(see thread http://lkml.org/lkml/2008/4/11/212).

Resending the patch series after
 . integrating Paul's remarks
 . fixing a bug I had introduced
 . porting it to 2.6.25-mm1.

Reviewers are still welcome!

Patch 1 can be taken alone: it fixes a problem in the existing IDR API.

Patches should be applied on linux-2.6.25-mm1, in the following order:

[ PATCH 01/10 ] : idr_minor_fix.patch
[ PATCH 02/10 ] : ridr_structure.patch
[ PATCH 03/10 ] : ridr_pre_get.patch
[ PATCH 04/10 ] : ridr_init.patch
[ PATCH 05/10 ] : ridr_get_new_above.patch
[ PATCH 06/10 ] : ridr_get_new.patch
[ PATCH 07/10 ] : ridr_find.patch
[ PATCH 08/10 ] : ridr_remove.patch
[ PATCH 09/10 ] : ipc_use_ridr.patch
[ PATCH 10/10 ] : remove_ipc_lock_down.patch

Regards,
Nadia


--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/10] Fix idr_remove()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-04-29 18:44   ` Andrew Morton
  2008-04-29 14:33 ` [PATCH 02/10] Introduce the ridr structure Nadia.Derbey
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_minor_fix.patch --]
[-- Type: text/plain, Size: 709 bytes --]

[PATCH 01/10]

This patch fixes idr_remove(): the return inside the loop makes us free only
a single layer.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 lib/idr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-04-25 15:29:00.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-04-25 15:48:34.000000000 +0200
@@ -385,8 +385,8 @@ void idr_remove(struct idr *idp, int id)
 	while (idp->id_free_cnt >= IDR_FREE_MAX) {
 		p = alloc_layer(idp);
 		kmem_cache_free(idr_layer_cache, p);
-		return;
 	}
+	return;
 }
 EXPORT_SYMBOL(idr_remove);
 

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 02/10] Introduce the ridr structure
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
  2008-04-29 14:33 ` [PATCH 01/10] Fix idr_remove() Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:30   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 03/10] Introduce ridr_pre_get() Nadia.Derbey
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ridr_structure.patch --]
[-- Type: text/plain, Size: 4985 bytes --]

[PATCH 02/10]

This patch introduces the ridr structure as an RCU safe idr structure: a layer
is actually made of the existing idr layer, followed by the rcu head.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/idr.h  |    6 +++---
 include/linux/ridr.h |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 init/main.c          |    2 ++
 lib/Makefile         |    2 +-
 lib/idr.c            |    2 +-
 lib/ridr.c           |   26 ++++++++++++++++++++++++++
 6 files changed, 83 insertions(+), 5 deletions(-)

Index: linux-2.6.25-mm1/include/linux/ridr.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 13:08:00.000000000 +0200
@@ -0,0 +1,50 @@
+/*
+ * include/linux/ridr.h
+ *
+ * Small id to pointer translation service avoiding fixed sized
+ * tables. RCU-based implmentation of IDRs.
+ */
+
+#ifndef _RIDR_H_
+#define _RIDR_H_
+
+#include <linux/idr.h>
+#include <linux/rcupdate.h>
+
+/*
+ * The idr_layer part should stay at the beginning of the structure
+ */
+struct ridr_layer {
+	struct idr_layer idr;
+	struct rcu_head	 rcu_head;
+};
+
+struct ridr {
+	struct ridr_layer *top;
+	struct ridr_layer *id_free;
+	int		  layers;
+	int		  id_free_cnt;
+	spinlock_t	  lock;
+};
+
+#define RIDR_INIT(name)						\
+{								\
+	.top		= NULL,					\
+	.id_free	= NULL,					\
+	.layers 	= 0,					\
+	.id_free_cnt	= 0,					\
+	.lock		= __SPIN_LOCK_UNLOCKED(name.lock),	\
+}
+#define DEFINE_RIDR(name)	struct ridr name = RIDR_INIT(name)
+
+#define idr_to_ridr(p) container_of((p), struct ridr_layer, idr)
+#define ridr_to_idr(p) (&((p)->idr))
+
+/*
+ * This is what we export.
+ */
+
+
+void __init ridr_init_cache(void);
+
+#endif /* _RIDR_H_ */
Index: linux-2.6.25-mm1/lib/ridr.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 13:08:05.000000000 +0200
@@ -0,0 +1,26 @@
+/*
+ * RCU-based idr API
+ */
+
+
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/ridr.h>
+
+static struct kmem_cache *ridr_layer_cache;
+
+
+static void ridr_cache_ctor(struct kmem_cache *ridr_layer_cache,
+			void *ridr_layer)
+{
+	memset(ridr_layer, 0, sizeof(struct ridr_layer));
+}
+
+void __init ridr_init_cache(void)
+{
+	ridr_layer_cache = kmem_cache_create("ridr_layer_cache",
+				sizeof(struct ridr_layer), 0, SLAB_PANIC,
+				ridr_cache_ctor);
+}
+
Index: linux-2.6.25-mm1/lib/Makefile
===================================================================
--- linux-2.6.25-mm1.orig/lib/Makefile	2008-04-25 15:29:00.000000000 +0200
+++ linux-2.6.25-mm1/lib/Makefile	2008-04-25 15:57:39.000000000 +0200
@@ -6,7 +6,7 @@ lib-y := ctype.o string.o vsprintf.o cmd
 	 rbtree.o radix-tree.o dump_stack.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
 	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
-	 proportions.o prio_heap.o ratelimit.o
+	 proportions.o prio_heap.o ratelimit.o ridr.o
 
 ifdef CONFIG_FTRACE
 # Do not profile string.o, since it may be used in early boot or vdso
Index: linux-2.6.25-mm1/init/main.c
===================================================================
--- linux-2.6.25-mm1.orig/init/main.c	2008-04-25 15:29:13.000000000 +0200
+++ linux-2.6.25-mm1/init/main.c	2008-04-25 15:58:15.000000000 +0200
@@ -61,6 +61,7 @@
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/idr.h>
+#include <linux/ridr.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -639,6 +640,7 @@ asmlinkage void __init start_kernel(void
 	kmem_cache_init();
 	debug_objects_mem_init();
 	idr_init_cache();
+	ridr_init_cache();
 	setup_per_cpu_pageset();
 	numa_policy_init();
 	if (late_time_init)
Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-04-25 17:22:43.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-04-29 13:08:05.000000000 +0200
@@ -342,7 +342,7 @@ static void sub_remove(struct idr *idp, 
 	while ((shift > 0) && p) {
 		n = (id >> shift) & IDR_MASK;
 		__clear_bit(n, &p->bitmap);
-		*++paa = &p->ary[n];
+		*++paa = (struct idr_layer **) &p->ary[n];
 		p = p->ary[n];
 		shift -= IDR_BITS;
 	}
Index: linux-2.6.25-mm1/include/linux/idr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-04-25 17:22:43.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/idr.h	2008-04-29 13:08:00.000000000 +0200
@@ -48,9 +48,9 @@
 #define IDR_FREE_MAX MAX_LEVEL + MAX_LEVEL
 
 struct idr_layer {
-	unsigned long		 bitmap; /* A zero bit means "space here" */
-	struct idr_layer	*ary[1<<IDR_BITS];
-	int			 count;	 /* When zero, we can release it */
+	unsigned long	 bitmap; /* A zero bit means "space here" */
+	void		*ary[1<<IDR_BITS];
+	int		 count;	 /* When zero, we can release it */
 };
 
 struct idr {

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 03/10] Introduce ridr_pre_get()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
  2008-04-29 14:33 ` [PATCH 01/10] Fix idr_remove() Nadia.Derbey
  2008-04-29 14:33 ` [PATCH 02/10] Introduce the ridr structure Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:31   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 04/10] Introduce ridr_init() Nadia.Derbey
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ridr_pre_get.patch --]
[-- Type: text/plain, Size: 2312 bytes --]

[PATCH 03/10]

This patch introduces the ridr_pre_get() routine.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/ridr.h |    1 +
 lib/ridr.c           |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

Index: linux-2.6.25-mm1/include/linux/ridr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/ridr.h	2008-04-29 13:08:00.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 13:14:56.000000000 +0200
@@ -43,6 +43,7 @@ struct ridr {
 /*
  * This is what we export.
  */
+int ridr_pre_get(struct ridr *, gfp_t);
 
 
 void __init ridr_init_cache(void);
Index: linux-2.6.25-mm1/lib/ridr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 13:08:05.000000000 +0200
+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 13:17:59.000000000 +0200
@@ -11,6 +11,52 @@
 static struct kmem_cache *ridr_layer_cache;
 
 
+/* only called when idp->lock is held */
+/* moves an ridr_layer to the free list */
+static void __move_to_free_list(struct ridr *idp, struct ridr_layer *p)
+{
+	p->idr.ary[0] = idp->id_free;
+	idp->id_free = p;
+	idp->id_free_cnt++;
+}
+
+static void move_to_free_list(struct ridr *idp, struct ridr_layer *p)
+{
+	unsigned long flags;
+
+	/*
+	 * Depends on the return element being zeroed.
+	 */
+	spin_lock_irqsave(&idp->lock, flags);
+	__move_to_free_list(idp, p);
+	spin_unlock_irqrestore(&idp->lock, flags);
+}
+
+/**
+ * ridr_pre_get - reserver resources for ridr allocation
+ * @idp:	ridr handle
+ * @gfp_mask:	memory allocation flags
+ *
+ * This function should be called prior to locking and calling the
+ * ridr_get_new* functions. It preallocates enough memory to satisfy
+ * the worst possible allocation.
+ *
+ * If the system is REALLY out of memory this function returns 0,
+ * otherwise 1.
+ */
+int ridr_pre_get(struct ridr *idp, gfp_t gfp_mask)
+{
+	while (idp->id_free_cnt < IDR_FREE_MAX) {
+		struct ridr_layer *new;
+		new = kmem_cache_alloc(ridr_layer_cache, gfp_mask);
+		if (new == NULL)
+			return (0);
+		move_to_free_list(idp, new);
+	}
+	return 1;
+}
+EXPORT_SYMBOL(ridr_pre_get);
+
 static void ridr_cache_ctor(struct kmem_cache *ridr_layer_cache,
 			void *ridr_layer)
 {

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 04/10] Introduce ridr_init()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (2 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 03/10] Introduce ridr_pre_get() Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:31   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 05/10] Introduce ridr_get_new_above() Nadia.Derbey
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ridr_init.patch --]
[-- Type: text/plain, Size: 1332 bytes --]

[PATCH 04/10]

This patch introduces the ridr_init() routine.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/ridr.h |    1 +
 lib/ridr.c           |   14 ++++++++++++++
 2 files changed, 15 insertions(+)

Index: linux-2.6.25-mm1/include/linux/ridr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/ridr.h	2008-04-29 13:14:56.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 13:21:56.000000000 +0200
@@ -44,6 +44,7 @@ struct ridr {
  * This is what we export.
  */
 int ridr_pre_get(struct ridr *, gfp_t);
+void ridr_init(struct ridr *);
 
 
 void __init ridr_init_cache(void);
Index: linux-2.6.25-mm1/lib/ridr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 13:17:59.000000000 +0200
+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 13:23:17.000000000 +0200
@@ -70,3 +70,17 @@ void __init ridr_init_cache(void)
 				ridr_cache_ctor);
 }
 
+/**
+ * ridr_init - initialize ridr handle
+ * @idp:	ridr handle
+ *
+ * This function is used to set up the handle (@idp) that you will pass
+ * to the rest of the functions.
+ */
+void ridr_init(struct ridr *idp)
+{
+	memset(idp, 0, sizeof(struct ridr));
+	spin_lock_init(&idp->lock);
+}
+EXPORT_SYMBOL(ridr_init);
+

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 05/10] Introduce ridr_get_new_above()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (3 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 04/10] Introduce ridr_init() Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:32   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 06/10] Introduce ridr_get_new() Nadia.Derbey
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ridr_get_new_above.patch --]
[-- Type: text/plain, Size: 12016 bytes --]

[PATCH 05/10]

This patch introduces the ridr_get_new_above() routine, and some common code
between the idr an ridr API's.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/idr.h  |   21 ++++++
 include/linux/ridr.h |    1 
 lib/idr.c            |  151 ++++++++++++++++++++++++++------------------
 lib/ridr.c           |  175 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 287 insertions(+), 61 deletions(-)

Index: linux-2.6.25-mm1/include/linux/ridr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/ridr.h	2008-04-29 13:21:56.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 14:03:35.000000000 +0200
@@ -44,6 +44,7 @@ struct ridr {
  * This is what we export.
  */
 int ridr_pre_get(struct ridr *, gfp_t);
+int ridr_get_new_above(struct ridr *, void *, int, int *);
 void ridr_init(struct ridr *);
 
 
Index: linux-2.6.25-mm1/include/linux/idr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-04-29 13:08:00.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/idr.h	2008-04-29 14:08:47.000000000 +0200
@@ -71,6 +71,27 @@ struct idr {
 }
 #define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
 
+/* Actions to be taken after a call to _idr_sub_alloc */
+#define IDR_DONE         -1
+#define IDR_NEED_TO_GROW -2
+#define IDR_NOMORE_SPACE -3
+#define IDR_GO_TOP       -4
+#define IDR_GO_UP        -5
+
+#define _idr_rc_to_errno(rc) ((rc) == -1 ? -EAGAIN : -ENOSPC)
+
+static inline void _idr_set_new_slot(struct idr_layer *new,
+						struct idr_layer *p)
+{
+	new->ary[0] = p;
+	new->count = 1;
+	if (p->bitmap == IDR_FULL)
+		__set_bit(0, &new->bitmap);
+}
+
+void _idr_mark_full(struct idr_layer **, int);
+int _idr_sub_alloc(int *, int *, struct idr_layer **, struct idr_layer **);
+
 /*
  * This is what we export.
  */
Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-04-29 13:08:05.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-04-29 14:06:20.000000000 +0200
@@ -70,7 +70,7 @@ static void free_layer(struct idr *idp, 
 	spin_unlock_irqrestore(&idp->lock, flags);
 }
 
-static void idr_mark_full(struct idr_layer **pa, int id)
+void _idr_mark_full(struct idr_layer **pa, int id)
 {
 	struct idr_layer *p = pa[0];
 	int l = 0;
@@ -115,12 +115,71 @@ int idr_pre_get(struct idr *idp, gfp_t g
 }
 EXPORT_SYMBOL(idr_pre_get);
 
-static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
+int _idr_sub_alloc(int *cur_id, int *layer, struct idr_layer **cur_ptr,
+	struct idr_layer **pa)
 {
 	int n, m, sh;
-	struct idr_layer *p, *new;
-	int l, id, oid;
+	int l = *layer;
+	int id = *cur_id;
 	unsigned long bm;
+	struct idr_layer *p = *cur_ptr;
+	int rc;
+
+	n = (id >> (IDR_BITS * l)) & IDR_MASK;
+	bm = ~p->bitmap;
+	m = find_next_bit(&bm, IDR_SIZE, n);
+	if (m == IDR_SIZE) {
+		int oid;
+
+		/* no space available go back to previous layer. */
+		l++;
+		oid = id;
+		id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
+
+		/* if already at the top layer, we need to grow */
+		p = pa[l];
+		if (!p) {
+			*cur_id = id;
+			return IDR_NEED_TO_GROW;
+		}
+
+		/* If we need to go up one layer, continue the
+		 * loop; otherwise, restart from the top.
+		 */
+		*cur_id = id;
+		*layer = l;
+		*cur_ptr = p;
+		sh = IDR_BITS * (l + 1);
+		if (oid >> sh == id >> sh)
+			rc = IDR_GO_UP;
+		else
+			rc = IDR_GO_TOP;
+		goto end_sub_alloc;
+	}
+	if (m != n) {
+		sh = IDR_BITS * l;
+		id = ((id >> sh) ^ n ^ m) << sh;
+	}
+	if ((id >= MAX_ID_BIT) || (id < 0))
+		return IDR_NOMORE_SPACE;
+
+	if (l == 0)
+		rc = IDR_DONE;
+	else
+		rc = m;
+end_sub_alloc:
+	*cur_id = id;
+	*layer = l;
+	*cur_ptr = p;
+
+	return rc;
+}
+
+static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
+{
+	int m;
+	struct idr_layer *p, *new;
+	int l, id;
 
 	id = *starting_id;
  restart:
@@ -131,38 +190,23 @@ static int sub_alloc(struct idr *idp, in
 		/*
 		 * We run around this while until we reach the leaf node...
 		 */
-		n = (id >> (IDR_BITS*l)) & IDR_MASK;
-		bm = ~p->bitmap;
-		m = find_next_bit(&bm, IDR_SIZE, n);
-		if (m == IDR_SIZE) {
-			/* no space available go back to previous layer. */
-			l++;
-			oid = id;
-			id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
-
-			/* if already at the top layer, we need to grow */
-			if (!(p = pa[l])) {
-				*starting_id = id;
-				return -2;
-			}
-
-			/* If we need to go up one layer, continue the
-			 * loop; otherwise, restart from the top.
-			 */
-			sh = IDR_BITS * (l + 1);
-			if (oid >> sh == id >> sh)
-				continue;
-			else
-				goto restart;
-		}
-		if (m != n) {
-			sh = IDR_BITS*l;
-			id = ((id >> sh) ^ n ^ m) << sh;
-		}
-		if ((id >= MAX_ID_BIT) || (id < 0))
-			return -3;
-		if (l == 0)
+		int action = _idr_sub_alloc(&id, &l, &p, pa);
+		switch (action) {
+		case IDR_NEED_TO_GROW:
+			*starting_id = id;
+		case IDR_NOMORE_SPACE:
+			return action;
+		case IDR_DONE:
+			goto end_loop;
+		case IDR_GO_UP:
+			continue;
+		case IDR_GO_TOP:
+			goto restart;
+		default:
+			m = action;
 			break;
+		}
+		BUG_ON(m < 0);
 		/*
 		 * Create the layer below if it is missing.
 		 */
@@ -175,7 +219,7 @@ static int sub_alloc(struct idr *idp, in
 		pa[l--] = p;
 		p = p->ary[m];
 	}
-
+end_loop:
 	pa[l] = p;
 	return id;
 }
@@ -219,16 +263,13 @@ build_up:
 			spin_unlock_irqrestore(&idp->lock, flags);
 			return -1;
 		}
-		new->ary[0] = p;
-		new->count = 1;
-		if (p->bitmap == IDR_FULL)
-			__set_bit(0, &new->bitmap);
+		_idr_set_new_slot(new, p);
 		p = new;
 	}
 	idp->top = p;
 	idp->layers = layers;
 	v = sub_alloc(idp, &id, pa);
-	if (v == -2)
+	if (v == IDR_NEED_TO_GROW)
 		goto build_up;
 	return(v);
 }
@@ -246,7 +287,7 @@ static int idr_get_new_above_int(struct 
 		 */
 		pa[0]->ary[id & IDR_MASK] = (struct idr_layer *)ptr;
 		pa[0]->count++;
-		idr_mark_full(pa, id);
+		_idr_mark_full(pa, id);
 	}
 
 	return id;
@@ -277,12 +318,8 @@ int idr_get_new_above(struct idr *idp, v
 	 * This is a cheap hack until the IDR code can be fixed to
 	 * return proper error values.
 	 */
-	if (rv < 0) {
-		if (rv == -1)
-			return -EAGAIN;
-		else /* Will be -3 */
-			return -ENOSPC;
-	}
+	if (rv < 0)
+		return _idr_rc_to_errno(rv);
 	*id = rv;
 	return 0;
 }
@@ -312,12 +349,8 @@ int idr_get_new(struct idr *idp, void *p
 	 * This is a cheap hack until the IDR code can be fixed to
 	 * return proper error values.
 	 */
-	if (rv < 0) {
-		if (rv == -1)
-			return -EAGAIN;
-		else /* Will be -3 */
-			return -ENOSPC;
-	}
+	if (rv < 0)
+		return _idr_rc_to_errno(rv);
 	*id = rv;
 	return 0;
 }
@@ -694,12 +727,8 @@ int ida_get_new_above(struct ida *ida, i
  restart:
 	/* get vacant slot */
 	t = idr_get_empty_slot(&ida->idr, idr_id, pa);
-	if (t < 0) {
-		if (t == -1)
-			return -EAGAIN;
-		else /* will be -3 */
-			return -ENOSPC;
-	}
+	if (t < 0)
+		return _idr_rc_to_errno(t);
 
 	if (t * IDA_BITMAP_BITS >= MAX_ID_BIT)
 		return -ENOSPC;
@@ -739,7 +768,7 @@ int ida_get_new_above(struct ida *ida, i
 
 	__set_bit(t, bitmap->bitmap);
 	if (++bitmap->nr_busy == IDA_BITMAP_BITS)
-		idr_mark_full(pa, idr_id);
+		_idr_mark_full(pa, idr_id);
 
 	*p_id = id;
 
Index: linux-2.6.25-mm1/lib/ridr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 13:23:17.000000000 +0200
+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 14:03:35.000000000 +0200
@@ -11,6 +11,23 @@
 static struct kmem_cache *ridr_layer_cache;
 
 
+static struct ridr_layer *get_from_free_list(struct ridr *idp)
+{
+	struct ridr_layer *q;
+	struct idr_layer *p;
+	unsigned long flags;
+
+	spin_lock_irqsave(&idp->lock, flags);
+	if ((q = idp->id_free)) {
+		p = ridr_to_idr(q);
+		idp->id_free = p->ary[0];
+		idp->id_free_cnt--;
+		p->ary[0] = NULL;
+	}
+	spin_unlock_irqrestore(&idp->lock, flags);
+	return(q);
+}
+
 /* only called when idp->lock is held */
 /* moves an ridr_layer to the free list */
 static void __move_to_free_list(struct ridr *idp, struct ridr_layer *p)
@@ -57,6 +74,164 @@ int ridr_pre_get(struct ridr *idp, gfp_t
 }
 EXPORT_SYMBOL(ridr_pre_get);
 
+static int sub_alloc(struct ridr *idp, int *starting_id,
+			struct ridr_layer **rpa, struct idr_layer **pa)
+{
+	int m;
+	struct ridr_layer *q, *new;
+	struct idr_layer *p;
+	int l, id;
+
+	id = *starting_id;
+ restart:
+	q = idp->top;
+	p = ridr_to_idr(q);
+	l = idp->layers;
+	pa[l] = NULL;
+	rpa[l--] = NULL;
+	while (1) {
+		/*
+		 * We run around this while until we reach the leaf node...
+		 */
+		int action = _idr_sub_alloc(&id, &l, &p, pa);
+		switch (action) {
+		case IDR_NEED_TO_GROW:
+			*starting_id = id;
+		case IDR_NOMORE_SPACE:
+			return action;
+		case IDR_DONE:
+			goto end_loop;
+		case IDR_GO_UP:
+			continue;
+		case IDR_GO_TOP:
+			goto restart;
+		default:
+			m = action;
+			break;
+		}
+		BUG_ON(m < 0);
+		/*
+		 * Create the layer below if it is missing.
+		 */
+		if (!p->ary[m]) {
+			new = get_from_free_list(idp);
+			if (!new)
+				return -1;
+			rcu_assign_pointer(p->ary[m], new);
+			p->count++;
+		}
+		pa[l] = p;
+		rpa[l--] = idr_to_ridr(p);
+		p = p->ary[m];
+	}
+
+end_loop:
+	pa[l] = p;
+	rpa[l] = idr_to_ridr(p);
+	return id;
+}
+
+static int ridr_get_empty_slot(struct ridr *idp, int starting_id,
+			      struct ridr_layer **rpa, struct idr_layer **pa)
+{
+	struct ridr_layer *p, *rnew;
+	int layers, v, id;
+	unsigned long flags;
+
+	id = starting_id;
+build_up:
+	p = idp->top;
+	layers = idp->layers;
+	if (unlikely(!p)) {
+		p = get_from_free_list(idp);
+		if (!p)
+			return -1;
+		layers = 1;
+	}
+	/*
+	 * Add a new layer to the top of the tree if the requested
+	 * id is larger than the currently allocated space.
+	 */
+	while (layers < MAX_LEVEL - 1 && id >= (1 << (layers * IDR_BITS))) {
+		layers++;
+		if (!p->idr.count)
+			continue;
+		rnew = get_from_free_list(idp);
+		if (!rnew) {
+			/*
+			 * The allocation failed.  If we built part of
+			 * the structure tear it down.
+			 */
+			spin_lock_irqsave(&idp->lock, flags);
+			for (rnew = p; p && p != idp->top; rnew = p) {
+				p = p->idr.ary[0];
+				rnew->idr.ary[0] = NULL;
+				rnew->idr.bitmap = rnew->idr.count = 0;
+				__move_to_free_list(idp, rnew);
+			}
+			spin_unlock_irqrestore(&idp->lock, flags);
+			return -1;
+		}
+		_idr_set_new_slot(ridr_to_idr(rnew), ridr_to_idr(p));
+		p = rnew;
+	}
+	rcu_assign_pointer(idp->top, p);
+	idp->layers = layers;
+	v = sub_alloc(idp, &id, rpa, pa);
+	if (v == IDR_NEED_TO_GROW)
+		goto build_up;
+	return(v);
+}
+
+static int ridr_get_new_above_int(struct ridr *idp, void *ptr, int starting_id)
+{
+	struct ridr_layer *rpa[MAX_LEVEL];
+	struct idr_layer *pa[MAX_LEVEL];
+	int id;
+
+	id = ridr_get_empty_slot(idp, starting_id, rpa, pa);
+	if (id >= 0) {
+		/*
+		 * Successfully found an empty slot.  Install the user
+		 * pointer and mark the slot full.
+		 */
+		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
+				(struct ridr_layer *)ptr);
+		pa[0]->count++;
+		_idr_mark_full(pa, id);
+	}
+
+	return id;
+}
+
+/**
+ * ridr_get_new_above - allocate new ridr entry above or equal to a start id
+ * @idp: ridr handle
+ * @ptr: pointer you want associated with the ide
+ * @start_id: id to start search at
+ * @id: pointer to the allocated handle
+ *
+ * This is the allocate id function.  It should be called with any
+ * required locks.
+ *
+ * If memory is required, it will return -EAGAIN, you should unlock
+ * and go back to the ridr_pre_get() call.  If the ridr is full, it will
+ * return -ENOSPC.
+ *
+ * @id returns a value in the range 0 ... 0x7fffffff
+ */
+int ridr_get_new_above(struct ridr *idp, void *ptr, int starting_id, int *id)
+{
+	int rv;
+
+	rv = ridr_get_new_above_int(idp, ptr, starting_id);
+	if (rv < 0)
+		return _idr_rc_to_errno(rv);
+	*id = rv;
+	return 0;
+}
+EXPORT_SYMBOL(ridr_get_new_above);
+
 static void ridr_cache_ctor(struct kmem_cache *ridr_layer_cache,
 			void *ridr_layer)
 {

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 06/10] Introduce ridr_get_new()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (4 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 05/10] Introduce ridr_get_new_above() Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:32   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 07/10] Introduce ridr_find() Nadia.Derbey
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ridr_get_new.patch --]
[-- Type: text/plain, Size: 1916 bytes --]

[PATCH 06/10]

This patch introduces the ridr_get_new() routine.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/ridr.h |    1 +
 lib/ridr.c           |   27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

Index: linux-2.6.25-mm1/include/linux/ridr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/ridr.h	2008-04-29 14:03:35.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 14:16:09.000000000 +0200
@@ -44,6 +44,7 @@ struct ridr {
  * This is what we export.
  */
 int ridr_pre_get(struct ridr *, gfp_t);
+int ridr_get_new(struct ridr *, void *, int *);
 int ridr_get_new_above(struct ridr *, void *, int, int *);
 void ridr_init(struct ridr *);
 
Index: linux-2.6.25-mm1/lib/ridr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 14:03:35.000000000 +0200
+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 14:17:05.000000000 +0200
@@ -232,6 +232,33 @@ int ridr_get_new_above(struct ridr *idp,
 }
 EXPORT_SYMBOL(ridr_get_new_above);
 
+/**
+ * ridr_get_new - allocate new ridr entry
+ * @idp: ridr handle
+ * @ptr: pointer you want associated with the ide
+ * @id: pointer to the allocated handle
+ *
+ * This is the allocate id function.  It should be called with any
+ * required locks.
+ *
+ * If memory is required, it will return -EAGAIN, you should unlock
+ * and go back to the ridr_pre_get() call.  If the ridr is full, it will
+ * return -ENOSPC.
+ *
+ * @id returns a value in the range 0 ... 0x7fffffff
+ */
+int ridr_get_new(struct ridr *idp, void *ptr, int *id)
+{
+	int rv;
+
+	rv = ridr_get_new_above_int(idp, ptr, 0);
+	if (rv < 0)
+		return _idr_rc_to_errno(rv);
+	*id = rv;
+	return 0;
+}
+EXPORT_SYMBOL(ridr_get_new);
+
 static void ridr_cache_ctor(struct kmem_cache *ridr_layer_cache,
 			void *ridr_layer)
 {

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 07/10] Introduce ridr_find()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (5 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 06/10] Introduce ridr_get_new() Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:32   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 08/10] Introduce ridr_remove() Nadia.Derbey
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ridr_find.patch --]
[-- Type: text/plain, Size: 2848 bytes --]

[PATCH 07/10]

This patch introduces the ridr_find() routine.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/ridr.h |   17 +++++++++++++++++
 lib/ridr.c           |   34 ++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

Index: linux-2.6.25-mm1/include/linux/ridr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/ridr.h	2008-04-29 14:16:09.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 14:31:43.000000000 +0200
@@ -40,9 +40,26 @@ struct ridr {
 #define idr_to_ridr(p) container_of((p), struct ridr_layer, idr)
 #define ridr_to_idr(p) (&((p)->idr))
 
+/**
+ * Ridr synchronization (see radix-tree.h)
+ *
+ * ridr_find() is able to be called locklessly, using RCU. The caller must
+ * ensure calls to this function are made within rcu_read_lock() regions.
+ * Other readers (lock-free or otherwise) and modifications may be running
+ * concurrently.
+ *
+ * It is still required that the caller manage the synchronization and
+ * lifetimes of the items. So if RCU lock-free lookups are used, typically
+ * this would mean that the items have their own locks, or are amenable to
+ * lock-free access; and that the items are freed by RCU (or only freed after
+ * having been deleted from the ridr tree *and* a synchronize_rcu() grace
+ * period).
+ */
+
 /*
  * This is what we export.
  */
+void *ridr_find(struct ridr *, int);
 int ridr_pre_get(struct ridr *, gfp_t);
 int ridr_get_new(struct ridr *, void *, int *);
 int ridr_get_new_above(struct ridr *, void *, int, int *);
Index: linux-2.6.25-mm1/lib/ridr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 14:17:05.000000000 +0200
+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 14:32:26.000000000 +0200
@@ -259,6 +259,40 @@ int ridr_get_new(struct ridr *idp, void 
 }
 EXPORT_SYMBOL(ridr_get_new);
 
+/**
+ * ridr_find - return pointer for given id
+ * @idp: ridr handle
+ * @id: lookup key
+ *
+ * Return the pointer given the id it has been registered with.  A %NULL
+ * return indicates that @id is not valid or you passed %NULL in
+ * ridr_get_new().
+ *
+ * This function can be called under rcu_read_lock(), given that the leaf
+ * pointers lifetimes are correctly managed.
+ */
+void *ridr_find(struct ridr *idp, int id)
+{
+	int n;
+	struct ridr_layer *p;
+
+	n = idp->layers * IDR_BITS;
+	p = rcu_dereference(idp->top);
+
+	/* Mask off upper bits we don't use for the search. */
+	id &= MAX_ID_MASK;
+
+	if (id >= (1 << n))
+		return NULL;
+
+	while (n > 0 && p) {
+		n -= IDR_BITS;
+		p = rcu_dereference(p->idr.ary[(id >> n) & IDR_MASK]);
+	}
+	return((void *)p);
+}
+EXPORT_SYMBOL(ridr_find);
+
 static void ridr_cache_ctor(struct kmem_cache *ridr_layer_cache,
 			void *ridr_layer)
 {

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 08/10] Introduce ridr_remove()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (6 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 07/10] Introduce ridr_find() Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:32   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 09/10] Integrate the ridr code into IPC code Nadia.Derbey
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ridr_remove.patch --]
[-- Type: text/plain, Size: 4993 bytes --]

[PATCH 08/10]

This patch introduces the ridr_remove() routine.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/idr.h  |    1 
 include/linux/ridr.h |    1 
 lib/idr.c            |    7 ++--
 lib/ridr.c           |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 89 insertions(+), 3 deletions(-)

Index: linux-2.6.25-mm1/include/linux/ridr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/ridr.h	2008-04-29 14:31:43.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 14:34:53.000000000 +0200
@@ -63,6 +63,7 @@ void *ridr_find(struct ridr *, int);
 int ridr_pre_get(struct ridr *, gfp_t);
 int ridr_get_new(struct ridr *, void *, int *);
 int ridr_get_new_above(struct ridr *, void *, int, int *);
+void ridr_remove(struct ridr *, int);
 void ridr_init(struct ridr *);
 
 
Index: linux-2.6.25-mm1/include/linux/idr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-04-29 14:08:47.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/idr.h	2008-04-29 14:35:37.000000000 +0200
@@ -91,6 +91,7 @@ static inline void _idr_set_new_slot(str
 
 void _idr_mark_full(struct idr_layer **, int);
 int _idr_sub_alloc(int *, int *, struct idr_layer **, struct idr_layer **);
+void _idr_remove_warning(const char *, int);
 
 /*
  * This is what we export.
Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-04-29 14:06:20.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-04-29 14:40:12.000000000 +0200
@@ -356,9 +356,10 @@ int idr_get_new(struct idr *idp, void *p
 }
 EXPORT_SYMBOL(idr_get_new);
 
-static void idr_remove_warning(int id)
+void _idr_remove_warning(const char *caller, int id)
 {
-	printk("idr_remove called for id=%d which is not allocated.\n", id);
+	printk(KERN_INFO "%s called for id=%d which is not allocated.\n",
+		caller, id);
 	dump_stack();
 }
 
@@ -390,7 +391,7 @@ static void sub_remove(struct idr *idp, 
 		if (!*paa)
 			idp->layers = 0;
 	} else
-		idr_remove_warning(id);
+		_idr_remove_warning("idr_remove", id);
 }
 
 /**
Index: linux-2.6.25-mm1/lib/ridr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 14:32:26.000000000 +0200
+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 14:41:57.000000000 +0200
@@ -28,6 +28,19 @@ static struct ridr_layer *get_from_free_
 	return(q);
 }
 
+static void ridr_layer_rcu_free(struct rcu_head *head)
+{
+	struct ridr_layer *layer;
+
+	layer = container_of(head, struct ridr_layer, rcu_head);
+	kmem_cache_free(ridr_layer_cache, layer);
+}
+
+static inline void free_layer(struct ridr_layer *p)
+{
+	call_rcu(&p->rcu_head, ridr_layer_rcu_free);
+}
+
 /* only called when idp->lock is held */
 /* moves an ridr_layer to the free list */
 static void __move_to_free_list(struct ridr *idp, struct ridr_layer *p)
@@ -259,6 +272,76 @@ int ridr_get_new(struct ridr *idp, void 
 }
 EXPORT_SYMBOL(ridr_get_new);
 
+static void sub_remove(struct ridr *idp, int shift, int id)
+{
+	struct ridr_layer *p = idp->top;
+	struct ridr_layer **pa[MAX_LEVEL];
+	struct ridr_layer ***paa = &pa[0];
+	struct ridr_layer *to_free;
+	int n;
+
+	*paa = NULL;
+	*++paa = &idp->top;
+
+	while ((shift > 0) && p) {
+		n = (id >> shift) & IDR_MASK;
+		__clear_bit(n, &p->idr.bitmap);
+		*++paa = (struct ridr_layer **) &(p->idr.ary[n]);
+		p = p->idr.ary[n];
+		shift -= IDR_BITS;
+	}
+	n = id & IDR_MASK;
+	if (likely(p != NULL && test_bit(n, &p->idr.bitmap))) {
+		__clear_bit(n, &p->idr.bitmap);
+		rcu_assign_pointer(p->idr.ary[n], NULL);
+		to_free = NULL;
+		while (*paa && !--((**paa)->idr.count)) {
+			if (to_free)
+				free_layer(to_free);
+			to_free = **paa;
+			**paa-- = NULL;
+		}
+		if (!*paa)
+			idp->layers = 0;
+		if (to_free)
+			free_layer(to_free);
+	} else
+		_idr_remove_warning("ridr_remove", id);
+}
+/**
+ * ridr_remove - remove the given id and free it's slot
+ * @idp: ridr handle
+ * @id: unique key
+ */
+void ridr_remove(struct ridr *idp, int id)
+{
+	struct ridr_layer *p;
+	struct ridr_layer *to_free;
+
+	/* Mask off upper bits we don't use for the search. */
+	id &= MAX_ID_MASK;
+
+	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
+	if (idp->top && idp->top->idr.count == 1 && (idp->layers > 1) &&
+	    idp->top->idr.ary[0]) {
+		/*
+		 * Single child at leftmost slot: we can shrink the tree.
+		 * This level is not needed anymore since when layers are
+		 * inserted, they are inserted at the top of the existing
+		 * tree.
+		 */
+		to_free = idp->top;
+		p = idp->top->idr.ary[0];
+		rcu_assign_pointer(idp->top, p);
+		--idp->layers;
+		to_free->idr.bitmap = to_free->idr.count = 0;
+		to_free->idr.ary[0] = NULL;
+		free_layer(to_free);
+	}
+	return;
+}
+EXPORT_SYMBOL(ridr_remove);
+
 /**
  * ridr_find - return pointer for given id
  * @idp: ridr handle

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 09/10] Integrate the ridr code into IPC code
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (7 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 08/10] Introduce ridr_remove() Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:32   ` Tim Pepper
  2008-04-29 14:33 ` [PATCH 10/10] Get rid of ipc_lock_down() Nadia.Derbey
  2008-04-29 18:54 ` [PATCH 00/10] Scalability requirements for sysv ipc - v2 Andrew Morton
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ipc_use_ridr.patch --]
[-- Type: text/plain, Size: 7179 bytes --]

[PATCH 09/10]

This patche makes the ipc ode use the ridr API.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/ipc_namespace.h |    4 ++--
 ipc/namespace.c               |    2 +-
 ipc/shm.c                     |    2 +-
 ipc/util.c                    |   41 ++++++++++++++++-------------------------
 4 files changed, 20 insertions(+), 29 deletions(-)

Index: linux-2.6.25-mm1/include/linux/ipc_namespace.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/ipc_namespace.h	2008-04-25 15:29:00.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/ipc_namespace.h	2008-04-29 14:46:34.000000000 +0200
@@ -2,7 +2,7 @@
 #define __IPC_NAMESPACE_H__
 
 #include <linux/err.h>
-#include <linux/idr.h>
+#include <linux/ridr.h>
 #include <linux/rwsem.h>
 #include <linux/notifier.h>
 
@@ -21,7 +21,7 @@ struct ipc_ids {
 	unsigned short seq;
 	unsigned short seq_max;
 	struct rw_semaphore rw_mutex;
-	struct idr ipcs_idr;
+	struct ridr ipcs_ridr;
 };
 
 struct ipc_namespace {
Index: linux-2.6.25-mm1/ipc/namespace.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/namespace.c	2008-04-25 15:29:13.000000000 +0200
+++ linux-2.6.25-mm1/ipc/namespace.c	2008-04-29 14:47:25.000000000 +0200
@@ -74,7 +74,7 @@ void free_ipcs(struct ipc_namespace *ns,
 	in_use = ids->in_use;
 
 	for (total = 0, next_id = 0; total < in_use; next_id++) {
-		perm = idr_find(&ids->ipcs_idr, next_id);
+		perm = ridr_find(&ids->ipcs_ridr, next_id);
 		if (perm == NULL)
 			continue;
 		ipc_lock_by_ptr(perm);
Index: linux-2.6.25-mm1/ipc/shm.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/shm.c	2008-04-25 15:29:13.000000000 +0200
+++ linux-2.6.25-mm1/ipc/shm.c	2008-04-29 14:48:26.000000000 +0200
@@ -569,7 +569,7 @@ static void shm_get_stat(struct ipc_name
 		struct shmid_kernel *shp;
 		struct inode *inode;
 
-		shp = idr_find(&shm_ids(ns).ipcs_idr, next_id);
+		shp = ridr_find(&shm_ids(ns).ipcs_ridr, next_id);
 		if (shp == NULL)
 			continue;
 
Index: linux-2.6.25-mm1/ipc/util.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/util.c	2008-04-25 15:29:13.000000000 +0200
+++ linux-2.6.25-mm1/ipc/util.c	2008-04-29 14:53:14.000000000 +0200
@@ -122,7 +122,7 @@ __initcall(ipc_init);
  *	@ids: Identifier set
  *
  *	Set up the sequence range to use for the ipc identifier range (limited
- *	below IPCMNI) then initialise the ids idr.
+ *	below IPCMNI) then initialise the ids ridr.
  */
  
 void ipc_init_ids(struct ipc_ids *ids)
@@ -139,7 +139,7 @@ void ipc_init_ids(struct ipc_ids *ids)
 		 	ids->seq_max = seq_limit;
 	}
 
-	idr_init(&ids->ipcs_idr);
+	ridr_init(&ids->ipcs_ridr);
 }
 
 #ifdef CONFIG_PROC_FS
@@ -194,7 +194,7 @@ static struct kern_ipc_perm *ipc_findkey
 	int total;
 
 	for (total = 0, next_id = 0; total < ids->in_use; next_id++) {
-		ipc = idr_find(&ids->ipcs_idr, next_id);
+		ipc = ridr_find(&ids->ipcs_ridr, next_id);
 
 		if (ipc == NULL)
 			continue;
@@ -233,7 +233,7 @@ int ipc_get_maxid(struct ipc_ids *ids)
 	/* Look for the last assigned id */
 	total = 0;
 	for (id = 0; id < IPCMNI && total < ids->in_use; id++) {
-		ipc = idr_find(&ids->ipcs_idr, id);
+		ipc = ridr_find(&ids->ipcs_ridr, id);
 		if (ipc != NULL) {
 			max_id = id;
 			total++;
@@ -248,7 +248,7 @@ int ipc_get_maxid(struct ipc_ids *ids)
  *	@new: new IPC permission set
  *	@size: limit for the number of used ids
  *
- *	Add an entry 'new' to the IPC ids idr. The permissions object is
+ *	Add an entry 'new' to the IPC ids ridr. The permissions object is
  *	initialised and the first free entry is set up and the id assigned
  *	is returned. The 'new' entry is returned in a locked state on success.
  *	On failure the entry is not locked and a negative err-code is returned.
@@ -266,7 +266,7 @@ int ipc_addid(struct ipc_ids* ids, struc
 	if (ids->in_use >= size)
 		return -ENOSPC;
 
-	err = idr_get_new(&ids->ipcs_idr, new, &id);
+	err = ridr_get_new(&ids->ipcs_ridr, new, &id);
 	if (err)
 		return err;
 
@@ -302,7 +302,7 @@ static int ipcget_new(struct ipc_namespa
 {
 	int err;
 retry:
-	err = idr_pre_get(&ids->ipcs_idr, GFP_KERNEL);
+	err = ridr_pre_get(&ids->ipcs_ridr, GFP_KERNEL);
 
 	if (!err)
 		return -ENOMEM;
@@ -368,7 +368,7 @@ static int ipcget_public(struct ipc_name
 	int flg = params->flg;
 	int err;
 retry:
-	err = idr_pre_get(&ids->ipcs_idr, GFP_KERNEL);
+	err = ridr_pre_get(&ids->ipcs_ridr, GFP_KERNEL);
 
 	/*
 	 * Take the lock as a writer since we are potentially going to add
@@ -424,7 +424,7 @@ void ipc_rmid(struct ipc_ids *ids, struc
 {
 	int lid = ipcid_to_idx(ipcp->id);
 
-	idr_remove(&ids->ipcs_idr, lid);
+	ridr_remove(&ids->ipcs_ridr, lid);
 
 	ids->in_use--;
 
@@ -685,13 +685,9 @@ void ipc64_perm_to_ipc_perm (struct ipc6
  * @ids: IPC identifier set
  * @id: ipc id to look for
  *
- * Look for an id in the ipc ids idr and lock the associated ipc object.
+ * Look for an id in the ipc ids ridr and lock the associated ipc object.
  *
  * The ipc object is locked on exit.
- *
- * This is the routine that should be called when the rw_mutex is not already
- * held, i.e. idr tree not protected: it protects the idr tree in read mode
- * during the idr_find().
  */
 
 struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
@@ -699,18 +695,13 @@ struct kern_ipc_perm *ipc_lock(struct ip
 	struct kern_ipc_perm *out;
 	int lid = ipcid_to_idx(id);
 
-	down_read(&ids->rw_mutex);
-
 	rcu_read_lock();
-	out = idr_find(&ids->ipcs_idr, lid);
+	out = ridr_find(&ids->ipcs_ridr, lid);
 	if (out == NULL) {
 		rcu_read_unlock();
-		up_read(&ids->rw_mutex);
 		return ERR_PTR(-EINVAL);
 	}
 
-	up_read(&ids->rw_mutex);
-
 	spin_lock(&out->lock);
 	
 	/* ipc_rmid() may have already freed the ID while ipc_lock
@@ -730,12 +721,12 @@ struct kern_ipc_perm *ipc_lock(struct ip
  * @ids: IPC identifier set
  * @id: ipc id to look for
  *
- * Look for an id in the ipc ids idr and lock the associated ipc object.
+ * Look for an id in the ipc ids ridr and lock the associated ipc object.
  *
  * The ipc object is locked on exit.
  *
  * This is the routine that should be called when the rw_mutex is already
- * held, i.e. idr tree protected.
+ * held, i.e. ridr tree protected.
  */
 
 struct kern_ipc_perm *ipc_lock_down(struct ipc_ids *ids, int id)
@@ -744,7 +735,7 @@ struct kern_ipc_perm *ipc_lock_down(stru
 	int lid = ipcid_to_idx(id);
 
 	rcu_read_lock();
-	out = idr_find(&ids->ipcs_idr, lid);
+	out = ridr_find(&ids->ipcs_ridr, lid);
 	if (out == NULL) {
 		rcu_read_unlock();
 		return ERR_PTR(-EINVAL);
@@ -915,7 +906,7 @@ static struct kern_ipc_perm *sysvipc_fin
 
 	total = 0;
 	for (id = 0; id < pos && total < ids->in_use; id++) {
-		ipc = idr_find(&ids->ipcs_idr, id);
+		ipc = ridr_find(&ids->ipcs_ridr, id);
 		if (ipc != NULL)
 			total++;
 	}
@@ -924,7 +915,7 @@ static struct kern_ipc_perm *sysvipc_fin
 		return NULL;
 
 	for ( ; pos < IPCMNI; pos++) {
-		ipc = idr_find(&ids->ipcs_idr, pos);
+		ipc = ridr_find(&ids->ipcs_ridr, pos);
 		if (ipc != NULL) {
 			*new_pos = pos + 1;
 			ipc_lock_by_ptr(ipc);

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 10/10] Get rid of ipc_lock_down()
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (8 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 09/10] Integrate the ridr code into IPC code Nadia.Derbey
@ 2008-04-29 14:33 ` Nadia.Derbey
  2008-05-01  4:33   ` Tim Pepper
  2008-04-29 18:54 ` [PATCH 00/10] Scalability requirements for sysv ipc - v2 Andrew Morton
  10 siblings, 1 reply; 26+ messages in thread
From: Nadia.Derbey @ 2008-04-29 14:33 UTC (permalink / raw)
  To: manfred, paulmck; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: remove_ipc_lock_down.patch --]
[-- Type: text/plain, Size: 4569 bytes --]

[PATCH 10/10]

With the previous patch ipc_lock() becomes lockless.
So there is no need anymore for ipc_lock_down()-like routines.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 ipc/shm.c  |   21 +++------------------
 ipc/util.c |   52 +---------------------------------------------------
 ipc/util.h |    6 ------
 3 files changed, 4 insertions(+), 75 deletions(-)

Index: linux-2.6.25-mm1/ipc/util.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/util.c	2008-04-29 14:53:14.000000000 +0200
+++ linux-2.6.25-mm1/ipc/util.c	2008-04-29 14:59:06.000000000 +0200
@@ -716,56 +716,6 @@ struct kern_ipc_perm *ipc_lock(struct ip
 	return out;
 }
 
-/**
- * ipc_lock_down - Lock an ipc structure with rw_sem held
- * @ids: IPC identifier set
- * @id: ipc id to look for
- *
- * Look for an id in the ipc ids ridr and lock the associated ipc object.
- *
- * The ipc object is locked on exit.
- *
- * This is the routine that should be called when the rw_mutex is already
- * held, i.e. ridr tree protected.
- */
-
-struct kern_ipc_perm *ipc_lock_down(struct ipc_ids *ids, int id)
-{
-	struct kern_ipc_perm *out;
-	int lid = ipcid_to_idx(id);
-
-	rcu_read_lock();
-	out = ridr_find(&ids->ipcs_ridr, lid);
-	if (out == NULL) {
-		rcu_read_unlock();
-		return ERR_PTR(-EINVAL);
-	}
-
-	spin_lock(&out->lock);
-
-	/*
-	 * No need to verify that the structure is still valid since the
-	 * rw_mutex is held.
-	 */
-	return out;
-}
-
-struct kern_ipc_perm *ipc_lock_check_down(struct ipc_ids *ids, int id)
-{
-	struct kern_ipc_perm *out;
-
-	out = ipc_lock_down(ids, id);
-	if (IS_ERR(out))
-		return out;
-
-	if (ipc_checkid(out, id)) {
-		ipc_unlock(out);
-		return ERR_PTR(-EIDRM);
-	}
-
-	return out;
-}
-
 struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id)
 {
 	struct kern_ipc_perm *out;
@@ -837,7 +787,7 @@ struct kern_ipc_perm *ipcctl_pre_down(st
 	int err;
 
 	down_write(&ids->rw_mutex);
-	ipcp = ipc_lock_check_down(ids, id);
+	ipcp = ipc_lock_check(ids, id);
 	if (IS_ERR(ipcp)) {
 		err = PTR_ERR(ipcp);
 		goto out_up;
Index: linux-2.6.25-mm1/ipc/util.h
===================================================================
--- linux-2.6.25-mm1.orig/ipc/util.h	2008-04-25 15:29:13.000000000 +0200
+++ linux-2.6.25-mm1/ipc/util.h	2008-04-29 14:59:49.000000000 +0200
@@ -102,11 +102,6 @@ void* ipc_rcu_alloc(int size);
 void ipc_rcu_getref(void *ptr);
 void ipc_rcu_putref(void *ptr);
 
-/*
- * ipc_lock_down: called with rw_mutex held
- * ipc_lock: called without that lock held
- */
-struct kern_ipc_perm *ipc_lock_down(struct ipc_ids *, int);
 struct kern_ipc_perm *ipc_lock(struct ipc_ids *, int);
 
 void kernel_to_ipc64_perm(struct kern_ipc_perm *in, struct ipc64_perm *out);
@@ -155,7 +150,6 @@ static inline void ipc_unlock(struct ker
 	rcu_read_unlock();
 }
 
-struct kern_ipc_perm *ipc_lock_check_down(struct ipc_ids *ids, int id);
 struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id);
 int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
 			struct ipc_ops *ops, struct ipc_params *params);
Index: linux-2.6.25-mm1/ipc/shm.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/shm.c	2008-04-29 14:48:26.000000000 +0200
+++ linux-2.6.25-mm1/ipc/shm.c	2008-04-29 15:01:24.000000000 +0200
@@ -112,23 +112,8 @@ void __init shm_init (void)
 }
 
 /*
- * shm_lock_(check_)down routines are called in the paths where the rw_mutex
- * is held to protect access to the idr tree.
- */
-static inline struct shmid_kernel *shm_lock_down(struct ipc_namespace *ns,
-						int id)
-{
-	struct kern_ipc_perm *ipcp = ipc_lock_down(&shm_ids(ns), id);
-
-	if (IS_ERR(ipcp))
-		return (struct shmid_kernel *)ipcp;
-
-	return container_of(ipcp, struct shmid_kernel, shm_perm);
-}
-
-/*
  * shm_lock_(check_) routines are called in the paths where the rw_mutex
- * is not held.
+ * is not necessarily held.
  */
 static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
 {
@@ -211,7 +196,7 @@ static void shm_close(struct vm_area_str
 
 	down_write(&shm_ids(ns).rw_mutex);
 	/* remove from the list of attaches of the shm segment */
-	shp = shm_lock_down(ns, sfd->id);
+	shp = shm_lock(ns, sfd->id);
 	BUG_ON(IS_ERR(shp));
 	shp->shm_lprid = task_tgid_vnr(current);
 	shp->shm_dtim = get_seconds();
@@ -933,7 +918,7 @@ invalid:
 
 out_nattch:
 	down_write(&shm_ids(ns).rw_mutex);
-	shp = shm_lock_down(ns, shmid);
+	shp = shm_lock(ns, shmid);
 	BUG_ON(IS_ERR(shp));
 	shp->shm_nattch--;
 	if(shp->shm_nattch == 0 &&

--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/10] Fix idr_remove()
  2008-04-29 14:33 ` [PATCH 01/10] Fix idr_remove() Nadia.Derbey
@ 2008-04-29 18:44   ` Andrew Morton
  2008-05-05  9:26     ` Nadia Derbey
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2008-04-29 18:44 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, Nadia.Derbey

On Tue, 29 Apr 2008 16:33:05 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 01/10]
> 
> This patch fixes idr_remove(): the return inside the loop makes us free only
> a single layer.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  lib/idr.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6.25-mm1/lib/idr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-04-25 15:29:00.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-04-25 15:48:34.000000000 +0200
> @@ -385,8 +385,8 @@ void idr_remove(struct idr *idp, int id)
>  	while (idp->id_free_cnt >= IDR_FREE_MAX) {
>  		p = alloc_layer(idp);
>  		kmem_cache_free(idr_layer_cache, p);
> -		return;
>  	}
> +	return;
>  }
>  EXPORT_SYMBOL(idr_remove);

erk, ancient bug.

I _think_ the implications of this are that an idr tree will grow fatter
than it needs to be, but there is no permanent leak: idr_destroy() will
still free everything, yes?

And a consequence of the fix is that idr manipulations will now result in
more allocs and frees, but the amount of memory which a tree uses will be
less?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10] Scalability requirements for sysv ipc - v2
  2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
                   ` (9 preceding siblings ...)
  2008-04-29 14:33 ` [PATCH 10/10] Get rid of ipc_lock_down() Nadia.Derbey
@ 2008-04-29 18:54 ` Andrew Morton
  2008-05-05 16:52   ` Nadia Derbey
  10 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2008-04-29 18:54 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault

On Tue, 29 Apr 2008 16:33:04 +0200
Nadia.Derbey@bull.net wrote:

> After scalability problems had been detected when using the sysV ipcs, I
> have proposed to use an RCU based implementation of the IDR api instead
> (see thread http://lkml.org/lkml/2008/4/11/212).
> 
> Resending the patch series after
>  . integrating Paul's remarks
>  . fixing a bug I had introduced
>  . porting it to 2.6.25-mm1.
> 
> Reviewers are still welcome!
> 
> Patch 1 can be taken alone: it fixes a problem in the existing IDR API.
> 
> Patches should be applied on linux-2.6.25-mm1, in the following order:
> 
> [ PATCH 01/10 ] : idr_minor_fix.patch
> [ PATCH 02/10 ] : ridr_structure.patch
> [ PATCH 03/10 ] : ridr_pre_get.patch
> [ PATCH 04/10 ] : ridr_init.patch
> [ PATCH 05/10 ] : ridr_get_new_above.patch
> [ PATCH 06/10 ] : ridr_get_new.patch
> [ PATCH 07/10 ] : ridr_find.patch
> [ PATCH 08/10 ] : ridr_remove.patch
> [ PATCH 09/10 ] : ipc_use_ridr.patch
> [ PATCH 10/10 ] : remove_ipc_lock_down.patch
> 

hm, you've been busy.

As all this work is entirely for performance reasons, the changelog should
have (lots of) performance testing results!  Please.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 02/10] Introduce the ridr structure
  2008-04-29 14:33 ` [PATCH 02/10] Introduce the ridr structure Nadia.Derbey
@ 2008-05-01  4:30   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:30 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:06 +0200 Nadia.Derbey@bull.net said:

> Index: linux-2.6.25-mm1/include/linux/ridr.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.25-mm1/include/linux/ridr.h	2008-04-29 13:08:00.000000000 +0200
> @@ -0,0 +1,50 @@
> +/*
> + * include/linux/ridr.h
> + *
> + * Small id to pointer translation service avoiding fixed sized
> + * tables. RCU-based implmentation of IDRs.
> + */
                        ^^^^^^^
s/implmentation/reimplementation

Additional comment here might be nice here since this amounts to a mostly
copy of idr.h and its backing implementation, with all the associated
maintenance headaches if both are in tree.

> +/*
> + * The idr_layer part should stay at the beginning of the structure
> + */
> +struct ridr_layer {
> +	struct idr_layer idr;
> +	struct rcu_head	 rcu_head;
> +};

Why does it have to be at the beginning?  So an ridr_layer can be used as
an idr_layer?  This doesn't help review.  As previous commenters noted,
review would be a lot easier if this was incremental on idr.

> Index: linux-2.6.25-mm1/lib/Makefile
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/Makefile	2008-04-25 15:29:00.000000000 +0200
> +++ linux-2.6.25-mm1/lib/Makefile	2008-04-25 15:57:39.000000000 +0200
> @@ -6,7 +6,7 @@ lib-y := ctype.o string.o vsprintf.o cmd
>  	 rbtree.o radix-tree.o dump_stack.o \
>  	 idr.o int_sqrt.o extable.o prio_tree.o \
>  	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
> -	 proportions.o prio_heap.o ratelimit.o
> +	 proportions.o prio_heap.o ratelimit.o ridr.o

I vaguely recall past discussion of preferred insertion point, but
lacking something in CodingStyle and especially given the redundancy
here it could be nice to add ridr.o next to idr.o.

> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-04-25 17:22:43.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-04-29 13:08:05.000000000 +0200
> @@ -342,7 +342,7 @@ static void sub_remove(struct idr *idp, 
>  	while ((shift > 0) && p) {
>  		n = (id >> shift) & IDR_MASK;
>  		__clear_bit(n, &p->bitmap);
> -		*++paa = &p->ary[n];
> +		*++paa = (struct idr_layer **) &p->ary[n];
>  		p = p->ary[n];
>  		shift -= IDR_BITS;
>  	}

For shifty use of an ridr_layer as an idr_layer?

> Index: linux-2.6.25-mm1/include/linux/idr.h
> ===================================================================
> --- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-04-25 17:22:43.000000000 +0200
> +++ linux-2.6.25-mm1/include/linux/idr.h	2008-04-29 13:08:00.000000000 +0200
> @@ -48,9 +48,9 @@
>  #define IDR_FREE_MAX MAX_LEVEL + MAX_LEVEL
> 
>  struct idr_layer {
> -	unsigned long		 bitmap; /* A zero bit means "space here" */
> -	struct idr_layer	*ary[1<<IDR_BITS];
> -	int			 count;	 /* When zero, we can release it */
> +	unsigned long	 bitmap; /* A zero bit means "space here" */
> +	void		*ary[1<<IDR_BITS];
> +	int		 count;	 /* When zero, we can release it */
>  };
> 
>  struct idr {
> 

This hunk is white space only...

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 03/10] Introduce ridr_pre_get()
  2008-04-29 14:33 ` [PATCH 03/10] Introduce ridr_pre_get() Nadia.Derbey
@ 2008-05-01  4:31   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:31 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:07 +0200 Nadia.Derbey@bull.net said:
> [PATCH 03/10]
> 
> This patch introduces the ridr_pre_get() routine.

There's almost nothing here that isn't 1:1 duplication of idr.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/10] Introduce ridr_init()
  2008-04-29 14:33 ` [PATCH 04/10] Introduce ridr_init() Nadia.Derbey
@ 2008-05-01  4:31   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:31 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:08 +0200 Nadia.Derbey@bull.net said:
> [PATCH 04/10]
> 
> This patch introduces the ridr_init() routine.

Again nothing added.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 05/10] Introduce ridr_get_new_above()
  2008-04-29 14:33 ` [PATCH 05/10] Introduce ridr_get_new_above() Nadia.Derbey
@ 2008-05-01  4:32   ` Tim Pepper
  2008-05-05 10:33     ` Nadia Derbey
  0 siblings, 1 reply; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:32 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:09 +0200 Nadia.Derbey@bull.net said:
> [PATCH 05/10]
> 
> This patch introduces the ridr_get_new_above() routine, and some common code
> between the idr an ridr API's.

The ridr_get_new_above() is the first place we see something really
different compared to idr (an RCU addition).  This is a lot of patching
so far for what would be a small incremental change otherwise.

> Index: linux-2.6.25-mm1/include/linux/idr.h
> ===================================================================
> --- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-04-29 13:08:00.000000000 +0200
> +++ linux-2.6.25-mm1/include/linux/idr.h	2008-04-29 14:08:47.000000000 +0200
> @@ -71,6 +71,27 @@ struct idr {
>  }
>  #define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
> 
> +/* Actions to be taken after a call to _idr_sub_alloc */
> +#define IDR_DONE         -1
> +#define IDR_NEED_TO_GROW -2
> +#define IDR_NOMORE_SPACE -3
> +#define IDR_GO_TOP       -4
> +#define IDR_GO_UP        -5

This stuff is useful on its own in as much as it improves code
readability.  A lot of this patch (the "some common code between the
idr and ridr API's" part) could be a standalone patch distinct from the
ridr series and then be at the head of your stack.

> +			return action;
> +		case IDR_DONE:
> +			goto end_loop;
> +		case IDR_GO_UP:
> +			continue;
> +		case IDR_GO_TOP:
> +			goto restart;
> +		default:
> +			m = action;
>  			break;
> +		}
> +		BUG_ON(m < 0);

Why the added BUG_ON()?  These couple hunks are a bit muddled, so maybe I
missed something.  But I don't see anything different in how m or bm are
being manipulated such that m<0 is anymore likely after this patch.

> Index: linux-2.6.25-mm1/lib/ridr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 13:23:17.000000000 +0200
> +++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 14:03:35.000000000 +0200
> @@ -11,6 +11,23 @@
>  static struct kmem_cache *ridr_layer_cache;
> 
> 
> +static struct ridr_layer *get_from_free_list(struct ridr *idp)
> +{
> +	struct ridr_layer *q;
> +	struct idr_layer *p;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&idp->lock, flags);
> +	if ((q = idp->id_free)) {
> +		p = ridr_to_idr(q);
> +		idp->id_free = p->ary[0];
> +		idp->id_free_cnt--;
> +		p->ary[0] = NULL;
> +	}
> +	spin_unlock_irqrestore(&idp->lock, flags);
> +	return(q);
> +}
> +

idr's alloc_layer() in disguise?

> +static int sub_alloc(struct ridr *idp, int *starting_id,
> +			struct ridr_layer **rpa, struct idr_layer **pa)

...
More or less duplication
...

> +		 * Create the layer below if it is missing.
> +		 */
> +		if (!p->ary[m]) {
> +			new = get_from_free_list(idp);
> +			if (!new)
> +				return -1;
> +			rcu_assign_pointer(p->ary[m], new);
> +			p->count++;
> +		}
> +		pa[l] = p;
> +		rpa[l--] = idr_to_ridr(p);
> +		p = p->ary[m];
> +	}
> +
> +end_loop:
> +	pa[l] = p;
> +	rpa[l] = idr_to_ridr(p);
> +	return id;
> +}

Oh but wait..there's some RCU-ness tucked in there.

> +
> +static int ridr_get_empty_slot(struct ridr *idp, int starting_id,
> +			      struct ridr_layer **rpa, struct idr_layer **pa)
> +{
> +	struct ridr_layer *p, *rnew;
> +	int layers, v, id;
> +	unsigned long flags;
> +
> +	id = starting_id;
> +build_up:
> +	p = idp->top;
> +	layers = idp->layers;
> +	if (unlikely(!p)) {
> +		p = get_from_free_list(idp);
> +		if (!p)
> +			return -1;
> +		layers = 1;
> +	}
> +	/*
> +	 * Add a new layer to the top of the tree if the requested
> +	 * id is larger than the currently allocated space.
> +	 */
> +	while (layers < MAX_LEVEL - 1 && id >= (1 << (layers * IDR_BITS))) {
               ^^                   ^^
Dropped some parens.  Otherwise more duplication...

> +				rnew->idr.ary[0] = NULL;
> +				rnew->idr.bitmap = rnew->idr.count = 0;
> +				__move_to_free_list(idp, rnew);
> +			}
> +			spin_unlock_irqrestore(&idp->lock, flags);
> +			return -1;
> +		}
> +		_idr_set_new_slot(ridr_to_idr(rnew), ridr_to_idr(p));
> +		p = rnew;
> +	}
> +	rcu_assign_pointer(idp->top, p);
> +	idp->layers = layers;
> +	v = sub_alloc(idp, &id, rpa, pa);
> +	if (v == IDR_NEED_TO_GROW)
> +		goto build_up;
> +	return(v);
> +}

Some more RCU.

> +static int ridr_get_new_above_int(struct ridr *idp, void *ptr, int starting_id)
> +{
> +	struct ridr_layer *rpa[MAX_LEVEL];
> +	struct idr_layer *pa[MAX_LEVEL];
> +	int id;
> +
> +	id = ridr_get_empty_slot(idp, starting_id, rpa, pa);
> +	if (id >= 0) {
> +		/*
> +		 * Successfully found an empty slot.  Install the user
> +		 * pointer and mark the slot full.
> +		 */
> +		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
> +				(struct ridr_layer *)ptr);
> +		pa[0]->count++;
> +		_idr_mark_full(pa, id);
> +	}
> +
> +	return id;
> +}

And other line of RCU.

OK.  So at this point in patch 5/10 we've got 3 lines of new code and
hundreds of lines of duplicated code?

A while more looking through the rest of the patches for the rest of the
context and I might be able to actually think about the implications of
these three lines being where they are.

Locking changes are complicated enough without all this obfuscation!
I understand the desire to not break IDR, but...

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 06/10] Introduce ridr_get_new()
  2008-04-29 14:33 ` [PATCH 06/10] Introduce ridr_get_new() Nadia.Derbey
@ 2008-05-01  4:32   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:32 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:10 +0200 Nadia.Derbey@bull.net said:
> [PATCH 06/10]
> 
> This patch introduces the ridr_get_new() routine.

Duplication + small useful cleanup that could be standalone.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 07/10] Introduce ridr_find()
  2008-04-29 14:33 ` [PATCH 07/10] Introduce ridr_find() Nadia.Derbey
@ 2008-05-01  4:32   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:32 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:11 +0200 Nadia.Derbey@bull.net said:
> [PATCH 07/10]
> 
> This patch introduces the ridr_find() routine.

Finally this one has some meat.  But, besides the comments, this could be
around two lines added to idr instead of many more duplicated.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/10] Introduce ridr_remove()
  2008-04-29 14:33 ` [PATCH 08/10] Introduce ridr_remove() Nadia.Derbey
@ 2008-05-01  4:32   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:32 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:12 +0200 Nadia.Derbey@bull.net said:
> [PATCH 08/10]
> 
> This patch introduces the ridr_remove() routine.

A bit of cleanup and roughly 2 lines RCU.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/10] Integrate the ridr code into IPC code
  2008-04-29 14:33 ` [PATCH 09/10] Integrate the ridr code into IPC code Nadia.Derbey
@ 2008-05-01  4:32   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:32 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:13 +0200 Nadia.Derbey@bull.net said:
> [PATCH 09/10]
> 
> This patche makes the ipc ode use the ridr API.

Superfluous except...

> @@ -699,18 +695,13 @@ struct kern_ipc_perm *ipc_lock(struct ip
>  	struct kern_ipc_perm *out;
>  	int lid = ipcid_to_idx(id);
> 
> -	down_read(&ids->rw_mutex);
> -
>  	rcu_read_lock();
> -	out = idr_find(&ids->ipcs_idr, lid);
> +	out = ridr_find(&ids->ipcs_ridr, lid);
>  	if (out == NULL) {
>  		rcu_read_unlock();
> -		up_read(&ids->rw_mutex);
>  		return ERR_PTR(-EINVAL);
>  	}
> 
> -	up_read(&ids->rw_mutex);
> -

...the removal of the rw_mutex down/up_read().


-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/10] Get rid of ipc_lock_down()
  2008-04-29 14:33 ` [PATCH 10/10] Get rid of ipc_lock_down() Nadia.Derbey
@ 2008-05-01  4:33   ` Tim Pepper
  0 siblings, 0 replies; 26+ messages in thread
From: Tim Pepper @ 2008-05-01  4:33 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, linux-kernel, efault, akpm

On Tue 29 Apr at 16:33:14 +0200 Nadia.Derbey@bull.net said:
> [PATCH 10/10]
> 
> With the previous patch ipc_lock() becomes lockless.
> So there is no need anymore for ipc_lock_down()-like routines.

...cleanup post conversion to RCU.

Could we get a version of the series without all the duplication and
without the cleanups (separate into standalone pre- and post- patches)?
If you're too busy, now that I've sifted through these I could make a
simpler series that does this.

It will be a looooot easier to discuss the locking when all we're looking
at is the locking changes.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/10] Fix idr_remove()
  2008-04-29 18:44   ` Andrew Morton
@ 2008-05-05  9:26     ` Nadia Derbey
  0 siblings, 0 replies; 26+ messages in thread
From: Nadia Derbey @ 2008-05-05  9:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: manfred, paulmck, linux-kernel, efault

Andrew Morton wrote:
> On Tue, 29 Apr 2008 16:33:05 +0200
> Nadia.Derbey@bull.net wrote:
> 
> 
>>[PATCH 01/10]
>>
>>This patch fixes idr_remove(): the return inside the loop makes us free only
>>a single layer.
>>
>>Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
>>
>>---
>> lib/idr.c |    2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>Index: linux-2.6.25-mm1/lib/idr.c
>>===================================================================
>>--- linux-2.6.25-mm1.orig/lib/idr.c	2008-04-25 15:29:00.000000000 +0200
>>+++ linux-2.6.25-mm1/lib/idr.c	2008-04-25 15:48:34.000000000 +0200
>>@@ -385,8 +385,8 @@ void idr_remove(struct idr *idp, int id)
>> 	while (idp->id_free_cnt >= IDR_FREE_MAX) {
>> 		p = alloc_layer(idp);
>> 		kmem_cache_free(idr_layer_cache, p);
>>-		return;
>> 	}
>>+	return;
>> }
>> EXPORT_SYMBOL(idr_remove);
> 
> 
> erk, ancient bug.
> 
> I _think_ the implications of this are that an idr tree will grow fatter
> than it needs to be, but there is no permanent leak: idr_destroy() will
> still free everything, yes?

Yes, exactly. Actually, I've not checked whether all the kernel 
components call idr_destroy() when needed.

> 
> And a consequence of the fix is that idr manipulations will now result in
> more allocs and frees,

Not necessarily more allocs: this loop keeps IDR_FREE_MAX layers in the 
free list. So idr_pre_get() should be a noop.

> but the amount of memory which a tree uses will be
> less?
> 
> 
> 

Regards,
Nadia



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 05/10] Introduce ridr_get_new_above()
  2008-05-01  4:32   ` Tim Pepper
@ 2008-05-05 10:33     ` Nadia Derbey
  0 siblings, 0 replies; 26+ messages in thread
From: Nadia Derbey @ 2008-05-05 10:33 UTC (permalink / raw)
  To: Tim Pepper; +Cc: manfred, paulmck, linux-kernel, efault, akpm

Tim Pepper wrote:
> On Tue 29 Apr at 16:33:09 +0200 Nadia.Derbey@bull.net said:
> 
>>[PATCH 05/10]
>>
>>This patch introduces the ridr_get_new_above() routine, and some common code
>>between the idr an ridr API's.
> 
> 
> The ridr_get_new_above() is the first place we see something really
> different compared to idr (an RCU addition).  This is a lot of patching
> so far for what would be a small incremental change otherwise.
> 
> 
>>Index: linux-2.6.25-mm1/include/linux/idr.h
>>===================================================================
>>--- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-04-29 13:08:00.000000000 +0200
>>+++ linux-2.6.25-mm1/include/linux/idr.h	2008-04-29 14:08:47.000000000 +0200
>>@@ -71,6 +71,27 @@ struct idr {
>> }
>> #define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
>>
>>+/* Actions to be taken after a call to _idr_sub_alloc */
>>+#define IDR_DONE         -1
>>+#define IDR_NEED_TO_GROW -2
>>+#define IDR_NOMORE_SPACE -3
>>+#define IDR_GO_TOP       -4
>>+#define IDR_GO_UP        -5
> 
> 
> This stuff is useful on its own in as much as it improves code
> readability.  A lot of this patch (the "some common code between the
> idr and ridr API's" part) could be a standalone patch distinct from the
> ridr series and then be at the head of your stack.
> 
> 
>>+			return action;
>>+		case IDR_DONE:
>>+			goto end_loop;
>>+		case IDR_GO_UP:
>>+			continue;
>>+		case IDR_GO_TOP:
>>+			goto restart;
>>+		default:
>>+			m = action;
>> 			break;
>>+		}
>>+		BUG_ON(m < 0);
> 
> 
> Why the added BUG_ON()?  These couple hunks are a bit muddled, so maybe I
> missed something.  But I don't see anything different in how m or bm are
> being manipulated such that m<0 is anymore likely after this patch.
> 
> 
>>Index: linux-2.6.25-mm1/lib/ridr.c
>>===================================================================
>>--- linux-2.6.25-mm1.orig/lib/ridr.c	2008-04-29 13:23:17.000000000 +0200
>>+++ linux-2.6.25-mm1/lib/ridr.c	2008-04-29 14:03:35.000000000 +0200
>>@@ -11,6 +11,23 @@
>> static struct kmem_cache *ridr_layer_cache;
>>
>>
>>+static struct ridr_layer *get_from_free_list(struct ridr *idp)
>>+{
>>+	struct ridr_layer *q;
>>+	struct idr_layer *p;
>>+	unsigned long flags;
>>+
>>+	spin_lock_irqsave(&idp->lock, flags);
>>+	if ((q = idp->id_free)) {
>>+		p = ridr_to_idr(q);
>>+		idp->id_free = p->ary[0];
>>+		idp->id_free_cnt--;
>>+		p->ary[0] = NULL;
>>+	}
>>+	spin_unlock_irqrestore(&idp->lock, flags);
>>+	return(q);
>>+}
>>+
> 
> 
> idr's alloc_layer() in disguise?
> 
> 
>>+static int sub_alloc(struct ridr *idp, int *starting_id,
>>+			struct ridr_layer **rpa, struct idr_layer **pa)
> 
> 
> ...
> More or less duplication
> ...
> 
> 
>>+		 * Create the layer below if it is missing.
>>+		 */
>>+		if (!p->ary[m]) {
>>+			new = get_from_free_list(idp);
>>+			if (!new)
>>+				return -1;
>>+			rcu_assign_pointer(p->ary[m], new);
>>+			p->count++;
>>+		}
>>+		pa[l] = p;
>>+		rpa[l--] = idr_to_ridr(p);
>>+		p = p->ary[m];
>>+	}
>>+
>>+end_loop:
>>+	pa[l] = p;
>>+	rpa[l] = idr_to_ridr(p);
>>+	return id;
>>+}
> 
> 
> Oh but wait..there's some RCU-ness tucked in there.
> 
> 
>>+
>>+static int ridr_get_empty_slot(struct ridr *idp, int starting_id,
>>+			      struct ridr_layer **rpa, struct idr_layer **pa)
>>+{
>>+	struct ridr_layer *p, *rnew;
>>+	int layers, v, id;
>>+	unsigned long flags;
>>+
>>+	id = starting_id;
>>+build_up:
>>+	p = idp->top;
>>+	layers = idp->layers;
>>+	if (unlikely(!p)) {
>>+		p = get_from_free_list(idp);
>>+		if (!p)
>>+			return -1;
>>+		layers = 1;
>>+	}
>>+	/*
>>+	 * Add a new layer to the top of the tree if the requested
>>+	 * id is larger than the currently allocated space.
>>+	 */
>>+	while (layers < MAX_LEVEL - 1 && id >= (1 << (layers * IDR_BITS))) {
> 
>                ^^                   ^^
> Dropped some parens.  Otherwise more duplication...
> 
> 
>>+				rnew->idr.ary[0] = NULL;
>>+				rnew->idr.bitmap = rnew->idr.count = 0;
>>+				__move_to_free_list(idp, rnew);
>>+			}
>>+			spin_unlock_irqrestore(&idp->lock, flags);
>>+			return -1;
>>+		}
>>+		_idr_set_new_slot(ridr_to_idr(rnew), ridr_to_idr(p));
>>+		p = rnew;
>>+	}
>>+	rcu_assign_pointer(idp->top, p);
>>+	idp->layers = layers;
>>+	v = sub_alloc(idp, &id, rpa, pa);
>>+	if (v == IDR_NEED_TO_GROW)
>>+		goto build_up;
>>+	return(v);
>>+}
> 
> 
> Some more RCU.
> 
> 
>>+static int ridr_get_new_above_int(struct ridr *idp, void *ptr, int starting_id)
>>+{
>>+	struct ridr_layer *rpa[MAX_LEVEL];
>>+	struct idr_layer *pa[MAX_LEVEL];
>>+	int id;
>>+
>>+	id = ridr_get_empty_slot(idp, starting_id, rpa, pa);
>>+	if (id >= 0) {
>>+		/*
>>+		 * Successfully found an empty slot.  Install the user
>>+		 * pointer and mark the slot full.
>>+		 */
>>+		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
>>+				(struct ridr_layer *)ptr);
>>+		pa[0]->count++;
>>+		_idr_mark_full(pa, id);
>>+	}
>>+
>>+	return id;
>>+}
> 
> 
> And other line of RCU.
> 
> OK.  So at this point in patch 5/10 we've got 3 lines of new code and
> hundreds of lines of duplicated code?
> 
> A while more looking through the rest of the patches for the rest of the
> context and I might be able to actually think about the implications of
> these three lines being where they are.
> 
> Locking changes are complicated enough without all this obfuscation!
> I understand the desire to not break IDR, but...
> 

OK, you convinced me. I'll re-send a new patchset that is incremental on 
top of idr. Sorry (and thanks a lot) for the painful review!

Regards,
Nadia

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10] Scalability requirements for sysv ipc - v2
  2008-04-29 18:54 ` [PATCH 00/10] Scalability requirements for sysv ipc - v2 Andrew Morton
@ 2008-05-05 16:52   ` Nadia Derbey
  2008-05-05 17:07     ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Nadia Derbey @ 2008-05-05 16:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: manfred, paulmck, linux-kernel, efault

[-- Attachment #1: Type: text/plain, Size: 1879 bytes --]

Andrew Morton wrote:
> On Tue, 29 Apr 2008 16:33:04 +0200
> Nadia.Derbey@bull.net wrote:
> 
> 
>>After scalability problems had been detected when using the sysV ipcs, I
>>have proposed to use an RCU based implementation of the IDR api instead
>>(see thread http://lkml.org/lkml/2008/4/11/212).
>>
>>Resending the patch series after
>> . integrating Paul's remarks
>> . fixing a bug I had introduced
>> . porting it to 2.6.25-mm1.
>>
>>Reviewers are still welcome!
>>
>>Patch 1 can be taken alone: it fixes a problem in the existing IDR API.
>>
>>Patches should be applied on linux-2.6.25-mm1, in the following order:
>>
>>[ PATCH 01/10 ] : idr_minor_fix.patch
>>[ PATCH 02/10 ] : ridr_structure.patch
>>[ PATCH 03/10 ] : ridr_pre_get.patch
>>[ PATCH 04/10 ] : ridr_init.patch
>>[ PATCH 05/10 ] : ridr_get_new_above.patch
>>[ PATCH 06/10 ] : ridr_get_new.patch
>>[ PATCH 07/10 ] : ridr_find.patch
>>[ PATCH 08/10 ] : ridr_remove.patch
>>[ PATCH 09/10 ] : ipc_use_ridr.patch
>>[ PATCH 10/10 ] : remove_ipc_lock_down.patch
>>
> 
> 
> hm, you've been busy.
> 
> As all this work is entirely for performance reasons, the changelog should
> have (lots of) performance testing results!  Please.
> 
> 

Here are the results I've got  for the pmsg test sent by Manfred:

for i in 1 2 3 4 5 6 7 8;do ./pmsg $i 5;done > output.25_mm1.ref.8
for i in 1 2 3 4 5 6 7 8;do ./pmsg $i 5;done > output.25_mm1.patch.8


                 2.6.25-mm1 (ref)   2.6.25-mm1(patched)

Total (1 cpu):   876000             985496
Total (2 cpus): 1549592            1740591
Total (3 cpus): 1694370            2327468
Total (4 cpus):  404553            2465787
Total (5 cpus):  391283            3215676
Total (6 cpus):  263249            3849121
Total (7 cpus):  191742            4209866
Total (8 cpus):  141722            4255585



Full output files attached.

Will send the rest tomorrow.

Regards
Nadia


[-- Attachment #2: output.25_mm1.ref.8 --]
[-- Type: text/x-troff-man, Size: 5846 bytes --]

pmsg [nr queues] [timeout]
Using 1 queues/cpus (2 threads) for 5 seconds.
thread 1: sysvmsg        0 type 0 bound to 0001h
thread 0: sysvmsg        0 type 1 bound to 0001h
Result matrix:
  Thread   0:   438000       1:   438000
Total: 876000
pmsg [nr queues] [timeout]
Using 2 queues/cpus (4 threads) for 5 seconds.
thread 0: sysvmsg    32768 type 1 bound to 0001h
thread 1: sysvmsg    65537 type 1 bound to 0002h
thread 3: sysvmsg    65537 type 0 bound to 0002h
thread 2: sysvmsg    32768 type 0 bound to 0001h
Result matrix:
  Thread   0:   385675       2:   385675
  Thread   1:   389121       3:   389121
Total: 1549592
pmsg [nr queues] [timeout]
Using 3 queues/cpus (6 threads) for 5 seconds.
thread 0: sysvmsg    98304 type 1 bound to 0001h
thread 3: sysvmsg    98304 type 0 bound to 0001h
thread 1: sysvmsg   131073 type 1 bound to 0002h
thread 2: sysvmsg   163842 type 1 bound to 0004h
thread 5: sysvmsg   163842 type 0 bound to 0004h
thread 4: sysvmsg   131073 type 0 bound to 0002h
Result matrix:
  Thread   0:   279690       3:   279690
  Thread   1:   283228       4:   283229
  Thread   2:   284266       5:   284267
Total: 1694370
pmsg [nr queues] [timeout]
Using 4 queues/cpus (8 threads) for 5 seconds.
thread 0: sysvmsg   196608 type 1 bound to 0001h
thread 1: sysvmsg   229377 type 1 bound to 0002h
thread 2: sysvmsg   262146 type 1 bound to 0004h
thread 3: sysvmsg   294915 type 1 bound to 0008h
thread 4: sysvmsg   196608 type 0 bound to 0001h
thread 7: sysvmsg   294915 type 0 bound to 0008h
thread 6: sysvmsg   262146 type 0 bound to 0004h
thread 5: sysvmsg   229377 type 0 bound to 0002h
Result matrix:
  Thread   0:    50305       4:    50305
  Thread   1:    50168       5:    50168
  Thread   2:    50922       6:    50922
  Thread   3:    50881       7:    50882
Total: 404553
pmsg [nr queues] [timeout]
Using 5 queues/cpus (10 threads) for 5 seconds.
thread 0: sysvmsg   327680 type 1 bound to 0001h
thread 1: sysvmsg   360449 type 1 bound to 0002h
thread 2: sysvmsg   393218 type 1 bound to 0004h
thread 3: sysvmsg   425987 type 1 bound to 0008h
thread 4: sysvmsg   458756 type 1 bound to 0010h
thread 8: sysvmsg   425987 type 0 bound to 0008h
thread 7: sysvmsg   393218 type 0 bound to 0004h
thread 6: sysvmsg   360449 type 0 bound to 0002h
thread 5: sysvmsg   327680 type 0 bound to 0001h
thread 9: sysvmsg   458756 type 0 bound to 0010h
Result matrix:
  Thread   0:    29684       5:    29684
  Thread   1:    30236       6:    30237
  Thread   2:    30902       7:    30902
  Thread   3:    30699       8:    30700
  Thread   4:    74119       9:    74120
Total: 391283
pmsg [nr queues] [timeout]
Using 6 queues/cpus (12 threads) for 5 seconds.
thread 0: sysvmsg   491520 type 1 bound to 0001h
thread 2: sysvmsg   557058 type 1 bound to 0004h
thread 7: sysvmsg   524289 type 0 bound to 0002h
thread 1: sysvmsg   524289 type 1 bound to 0002h
thread 8: sysvmsg   557058 type 0 bound to 0004h
thread 3: sysvmsg   589827 type 1 bound to 0008h
thread 6: sysvmsg   491520 type 0 bound to 0001h
thread 4: sysvmsg   622596 type 1 bound to 0010h
thread 11: sysvmsg   655365 type 0 bound to 0020h
thread 5: sysvmsg   655365 type 1 bound to 0020h
thread 10: sysvmsg   622596 type 0 bound to 0010h
thread 9: sysvmsg   589827 type 0 bound to 0008h
Result matrix:
  Thread   0:    17229       6:    17230
  Thread   1:    17442       7:    17442
  Thread   2:    17337       8:    17337
  Thread   3:    17420       9:    17421
  Thread   4:    31020      10:    31020
  Thread   5:    31175      11:    31176
Total: 263249
pmsg [nr queues] [timeout]
Using 7 queues/cpus (14 threads) for 5 seconds.
thread 0: sysvmsg   688128 type 1 bound to 0001h
thread 1: sysvmsg   720897 type 1 bound to 0002h
thread 2: sysvmsg   753666 type 1 bound to 0004h
thread 3: sysvmsg   786435 type 1 bound to 0008h
thread 4: sysvmsg   819204 type 1 bound to 0010h
thread 5: sysvmsg   851973 type 1 bound to 0020h
thread 6: sysvmsg   884742 type 1 bound to 0040h
thread 8: sysvmsg   720897 type 0 bound to 0002h
thread 10: sysvmsg   786435 type 0 bound to 0008h
thread 9: sysvmsg   753666 type 0 bound to 0004h
thread 11: sysvmsg   819204 type 0 bound to 0010h
thread 7: sysvmsg   688128 type 0 bound to 0001h
thread 13: sysvmsg   884742 type 0 bound to 0040h
thread 12: sysvmsg   851973 type 0 bound to 0020h
Result matrix:
  Thread   0:    12058       7:    12058
  Thread   1:    12054       8:    12055
  Thread   2:    12023       9:    12023
  Thread   3:    12062      10:    12062
  Thread   4:    15940      11:    15940
  Thread   5:    15725      12:    15726
  Thread   6:    16008      13:    16008
Total: 191742
pmsg [nr queues] [timeout]
Using 8 queues/cpus (16 threads) for 5 seconds.
thread 1: sysvmsg   950273 type 1 bound to 0002h
thread 2: sysvmsg   983042 type 1 bound to 0004h
thread 3: sysvmsg  1015811 type 1 bound to 0008h
thread 4: sysvmsg  1048580 type 1 bound to 0010h
thread 0: sysvmsg   917504 type 1 bound to 0001h
thread 8: sysvmsg   917504 type 0 bound to 0001h
thread 10: sysvmsg   983042 type 0 bound to 0004h
thread 9: sysvmsg   950273 type 0 bound to 0002h
thread 12: sysvmsg  1048580 type 0 bound to 0010h
thread 11: sysvmsg  1015811 type 0 bound to 0008h
thread 5: sysvmsg  1081349 type 1 bound to 0020h
thread 6: sysvmsg  1114118 type 1 bound to 0040h
thread 13: sysvmsg  1081349 type 0 bound to 0020h
thread 7: sysvmsg  1146887 type 1 bound to 0080h
thread 14: sysvmsg  1114118 type 0 bound to 0040h
thread 15: sysvmsg  1146887 type 0 bound to 0080h
Result matrix:
  Thread   0:     8757       8:     8757
  Thread   1:     8819       9:     8819
  Thread   2:     8848      10:     8849
  Thread   3:     8832      11:     8832
  Thread   4:     8930      12:     8931
  Thread   5:     8893      13:     8894
  Thread   6:     8894      14:     8894
  Thread   7:     8886      15:     8887
Total: 141722

[-- Attachment #3: output.25_mm1.patch.8 --]
[-- Type: text/x-troff-man, Size: 5851 bytes --]

pmsg [nr queues] [timeout]
Using 1 queues/cpus (2 threads) for 5 seconds.
thread 0: sysvmsg  1507328 type 1 bound to 0001h
thread 1: sysvmsg  1507328 type 0 bound to 0001h
Result matrix:
  Thread   0:   492748       1:   492748
Total: 985496
pmsg [nr queues] [timeout]
Using 2 queues/cpus (4 threads) for 5 seconds.
thread 0: sysvmsg  1540096 type 1 bound to 0001h
thread 1: sysvmsg  1572865 type 1 bound to 0002h
thread 3: sysvmsg  1572865 type 0 bound to 0002h
thread 2: sysvmsg  1540096 type 0 bound to 0001h
Result matrix:
  Thread   0:   436429       2:   436430
  Thread   1:   433866       3:   433866
Total: 1740591
pmsg [nr queues] [timeout]
Using 3 queues/cpus (6 threads) for 5 seconds.
thread 0: sysvmsg  1605632 type 1 bound to 0001h
thread 1: sysvmsg  1638401 type 1 bound to 0002h
thread 2: sysvmsg  1671170 type 1 bound to 0004h
thread 5: sysvmsg  1671170 type 0 bound to 0004h
thread 4: sysvmsg  1638401 type 0 bound to 0002h
thread 3: sysvmsg  1605632 type 0 bound to 0001h
Result matrix:
  Thread   0:   388457       3:   388457
  Thread   1:   384674       4:   384674
  Thread   2:   390603       5:   390603
Total: 2327468
pmsg [nr queues] [timeout]
Using 4 queues/cpus (8 threads) for 5 seconds.
thread 0: sysvmsg  1703936 type 1 bound to 0001h
thread 1: sysvmsg  1736705 type 1 bound to 0002h
thread 2: sysvmsg  1769474 type 1 bound to 0004h
thread 3: sysvmsg  1802243 type 1 bound to 0008h
thread 6: sysvmsg  1769474 type 0 bound to 0004h
thread 5: sysvmsg  1736705 type 0 bound to 0002h
thread 7: sysvmsg  1802243 type 0 bound to 0008h
thread 4: sysvmsg  1703936 type 0 bound to 0001h
Result matrix:
  Thread   0:   307122       4:   307122
  Thread   1:   308710       5:   308711
  Thread   2:   307589       6:   307589
  Thread   3:   309472       7:   309472
Total: 2465787
pmsg [nr queues] [timeout]
Using 5 queues/cpus (10 threads) for 5 seconds.
thread 0: sysvmsg  1835008 type 1 bound to 0001h
thread 1: sysvmsg  1867777 type 1 bound to 0002h
thread 2: sysvmsg  1900546 type 1 bound to 0004h
thread 3: sysvmsg  1933315 type 1 bound to 0008h
thread 4: sysvmsg  1966084 type 1 bound to 0010h
thread 7: sysvmsg  1900546 type 0 bound to 0004h
thread 9: sysvmsg  1966084 type 0 bound to 0010h
thread 8: sysvmsg  1933315 type 0 bound to 0008h
thread 6: sysvmsg  1867777 type 0 bound to 0002h
thread 5: sysvmsg  1835008 type 0 bound to 0001h
Result matrix:
  Thread   0:   338483       5:   338483
  Thread   1:   266569       6:   266569
  Thread   2:   285255       7:   285256
  Thread   3:   335819       8:   335819
  Thread   4:   381711       9:   381712
Total: 3215676
pmsg [nr queues] [timeout]
Using 6 queues/cpus (12 threads) for 5 seconds.
thread 0: sysvmsg  1998848 type 1 bound to 0001h
thread 2: sysvmsg  2064386 type 1 bound to 0004h
thread 7: sysvmsg  2031617 type 0 bound to 0002h
thread 3: sysvmsg  2097155 type 1 bound to 0008h
thread 4: sysvmsg  2129924 type 1 bound to 0010h
thread 5: sysvmsg  2162693 type 1 bound to 0020h
thread 8: sysvmsg  2064386 type 0 bound to 0004h
thread 1: sysvmsg  2031617 type 1 bound to 0002h
thread 6: sysvmsg  1998848 type 0 bound to 0001h
thread 11: sysvmsg  2162693 type 0 bound to 0020h
thread 10: sysvmsg  2129924 type 0 bound to 0010h
thread 9: sysvmsg  2097155 type 0 bound to 0008h
Result matrix:
  Thread   0:   318850       6:   318850
  Thread   1:   307664       7:   307664
  Thread   2:   316157       8:   316157
  Thread   3:   274740       9:   274741
  Thread   4:   352108      10:   352109
  Thread   5:   355040      11:   355041
Total: 3849121
pmsg [nr queues] [timeout]
Using 7 queues/cpus (14 threads) for 5 seconds.
thread 1: sysvmsg  2228225 type 1 bound to 0002h
thread 7: sysvmsg  2195456 type 0 bound to 0001h
thread 2: sysvmsg  2260994 type 1 bound to 0004h
thread 3: sysvmsg  2293763 type 1 bound to 0008h
thread 4: sysvmsg  2326532 type 1 bound to 0010h
thread 5: sysvmsg  2359301 type 1 bound to 0020h
thread 13: sysvmsg  2392070 type 0 bound to 0040h
thread 9: sysvmsg  2260994 type 0 bound to 0004h
thread 8: sysvmsg  2228225 type 0 bound to 0002h
thread 0: sysvmsg  2195456 type 1 bound to 0001h
thread 12: sysvmsg  2359301 type 0 bound to 0020h
thread 6: sysvmsg  2392070 type 1 bound to 0040h
thread 11: sysvmsg  2326532 type 0 bound to 0010h
thread 10: sysvmsg  2293763 type 0 bound to 0008h
Result matrix:
  Thread   0:   311401       7:   311402
  Thread   1:   255123       8:   255124
  Thread   2:   311868       9:   311868
  Thread   3:   299011      10:   299011
  Thread   4:   280114      11:   280115
  Thread   5:   314069      12:   314070
  Thread   6:   333345      13:   333345
Total: 4209866
pmsg [nr queues] [timeout]
Using 8 queues/cpus (16 threads) for 5 seconds.
thread 0: sysvmsg  2424832 type 1 bound to 0001h
thread 1: sysvmsg  2457601 type 1 bound to 0002h
thread 2: sysvmsg  2490370 type 1 bound to 0004h
thread 3: sysvmsg  2523139 type 1 bound to 0008h
thread 4: sysvmsg  2555908 type 1 bound to 0010h
thread 5: sysvmsg  2588677 type 1 bound to 0020h
thread 6: sysvmsg  2621446 type 1 bound to 0040h
thread 7: sysvmsg  2654215 type 1 bound to 0080h
thread 10: sysvmsg  2490370 type 0 bound to 0004h
thread 13: sysvmsg  2588677 type 0 bound to 0020h
thread 9: sysvmsg  2457601 type 0 bound to 0002h
thread 12: sysvmsg  2555908 type 0 bound to 0010h
thread 11: sysvmsg  2523139 type 0 bound to 0008h
thread 8: sysvmsg  2424832 type 0 bound to 0001h
thread 14: sysvmsg  2621446 type 0 bound to 0040h
thread 15: sysvmsg  2654215 type 0 bound to 0080h
Result matrix:
  Thread   0:   234942       8:   234942
  Thread   1:   262977       9:   262978
  Thread   2:   294207      10:   294208
  Thread   3:   288878      11:   288879
  Thread   4:   273532      12:   273532
  Thread   5:   248904      13:   248904
  Thread   6:   260121      14:   260122
  Thread   7:   264229      15:   264230
Total: 4255585

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10] Scalability requirements for sysv ipc - v2
  2008-05-05 16:52   ` Nadia Derbey
@ 2008-05-05 17:07     ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2008-05-05 17:07 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: manfred, paulmck, linux-kernel, efault

On Mon, 05 May 2008 18:52:50 +0200 Nadia Derbey <Nadia.Derbey@bull.net> wrote:

>                  2.6.25-mm1 (ref)   2.6.25-mm1(patched)
> 
> Total (1 cpu):   876000             985496
> Total (2 cpus): 1549592            1740591
> Total (3 cpus): 1694370            2327468
> Total (4 cpus):  404553            2465787
> Total (5 cpus):  391283            3215676
> Total (6 cpus):  263249            3849121
> Total (7 cpus):  191742            4209866
> Total (8 cpus):  141722            4255585
> 

Looks encouraging.  A comparison with 2.6.22 (or whichever kernel it was
before we broke it) would be interesting too, please.

> 
> Full output files attached.
>
> Will send the rest tomorrow.

Nobody reads the raw data, sorry.  And it is inefficient to have everyone
independently pore over the raw data extracting the (same) actual
information.

Please condense the result of your performance testing into a
preferably-less-than-one-page summary and include (and maintain) that within
the patch changelog.


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2008-05-05 17:08 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-29 14:33 [PATCH 00/10] Scalability requirements for sysv ipc - v2 Nadia.Derbey
2008-04-29 14:33 ` [PATCH 01/10] Fix idr_remove() Nadia.Derbey
2008-04-29 18:44   ` Andrew Morton
2008-05-05  9:26     ` Nadia Derbey
2008-04-29 14:33 ` [PATCH 02/10] Introduce the ridr structure Nadia.Derbey
2008-05-01  4:30   ` Tim Pepper
2008-04-29 14:33 ` [PATCH 03/10] Introduce ridr_pre_get() Nadia.Derbey
2008-05-01  4:31   ` Tim Pepper
2008-04-29 14:33 ` [PATCH 04/10] Introduce ridr_init() Nadia.Derbey
2008-05-01  4:31   ` Tim Pepper
2008-04-29 14:33 ` [PATCH 05/10] Introduce ridr_get_new_above() Nadia.Derbey
2008-05-01  4:32   ` Tim Pepper
2008-05-05 10:33     ` Nadia Derbey
2008-04-29 14:33 ` [PATCH 06/10] Introduce ridr_get_new() Nadia.Derbey
2008-05-01  4:32   ` Tim Pepper
2008-04-29 14:33 ` [PATCH 07/10] Introduce ridr_find() Nadia.Derbey
2008-05-01  4:32   ` Tim Pepper
2008-04-29 14:33 ` [PATCH 08/10] Introduce ridr_remove() Nadia.Derbey
2008-05-01  4:32   ` Tim Pepper
2008-04-29 14:33 ` [PATCH 09/10] Integrate the ridr code into IPC code Nadia.Derbey
2008-05-01  4:32   ` Tim Pepper
2008-04-29 14:33 ` [PATCH 10/10] Get rid of ipc_lock_down() Nadia.Derbey
2008-05-01  4:33   ` Tim Pepper
2008-04-29 18:54 ` [PATCH 00/10] Scalability requirements for sysv ipc - v2 Andrew Morton
2008-05-05 16:52   ` Nadia Derbey
2008-05-05 17:07     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox