public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/2] Add for_each_module iterator function
@ 2010-09-22 10:08 Andi Kleen
  2010-09-22 10:08 ` [PATCH 2/2] Rewrite jump_label.c to use binary search Andi Kleen
  2010-09-22 12:25 ` [PATCH 1/2] Add for_each_module iterator function Thomas Gleixner
  0 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2010-09-22 10:08 UTC (permalink / raw)
  To: jbaron
  Cc: rostedt, linux-kernel, mingo, mathieu.desnoyers, hpa, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

This adds a generic function to iterate over all modules,
to be used by the next patch.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/module.h |    1 +
 kernel/module.c        |   10 ++++++++++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 403ac26..809b6db 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -410,6 +410,7 @@ struct module *__module_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
 bool is_module_percpu_address(unsigned long addr);
 bool is_module_text_address(unsigned long addr);
+void for_each_module(void (*op)(struct module *, void *arg), void *arg);
 
 static inline int within_module_core(unsigned long addr, struct module *mod)
 {
diff --git a/kernel/module.c b/kernel/module.c
index eba1341..b8fb3e6 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -384,6 +384,16 @@ struct module *find_module(const char *name)
 }
 EXPORT_SYMBOL_GPL(find_module);
 
+void for_each_module(void (*op)(struct module *, void *arg), void *arg)
+{
+	struct module *mod;
+
+	preempt_disable();
+	list_for_each_entry_rcu(mod, &modules, list)
+		op(mod, arg);
+	preempt_enable();
+}
+
 #ifdef CONFIG_SMP
 
 static inline void __percpu *mod_percpu(struct module *mod)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 10:08 [PATCH 1/2] Add for_each_module iterator function Andi Kleen
@ 2010-09-22 10:08 ` Andi Kleen
  2010-09-22 11:31   ` Mathieu Desnoyers
  2010-09-22 16:12   ` H. Peter Anvin
  2010-09-22 12:25 ` [PATCH 1/2] Add for_each_module iterator function Thomas Gleixner
  1 sibling, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2010-09-22 10:08 UTC (permalink / raw)
  To: jbaron
  Cc: rostedt, linux-kernel, mingo, mathieu.desnoyers, hpa, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

After the discussion on linux-kernel I decided to rewrite the
hash table based jump_label.c to use binary search as proposed.

This shrank the code dramatically, by more than 50%:

hash:
 2763      80     544    3387     d3b kernel/jump_label.o
binsearch:
 1072      40       0    1112     458 kernel/jump_label.o

This is only code size; at runtime there are additional savings
because the new code does not allocate any extra memory.

Also the code is a lot simpler now:

jump_label.c |  391 +++++++++++------------------------------------------------
 1 files changed, 74 insertions(+), 317 deletions(-)

(+ a few lines for a new generic utility function in module.c)

I believe it's also much cleaner than before.

I did some performance tests comparing the hash implementation
with the binary search. This was with a moderate config x86-64
kernel with 68 modules loaded and about 480 trace points active
in several modules and 1340 dynamic debug statements.

I measured arming/unarming all trace points and enabling all
dynamic debug points. The differences in cycles were always
within 30%; in fact, in the dynamic debug case the binary search
implementation was even faster.

In all cases the runtimes were less than a single millisecond, so
there was no user-visible difference at all.

So this is a vastly simpler implementation with, in practice,
equivalent performance and less memory overhead (no data structures
outside the sections).

The patch is on top of Jason's patchkit.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/jump_label.c |  391 ++++++++++-----------------------------------------
 1 files changed, 74 insertions(+), 317 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index f82878b..478625e 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -2,133 +2,78 @@
  * jump label support
  *
  * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
- *
+ * Rewritten in 2010 by Andi Kleen
  */
 #include <linux/jump_label.h>
 #include <linux/memory.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
-#include <linux/list.h>
-#include <linux/jhash.h>
-#include <linux/slab.h>
 #include <linux/sort.h>
 #include <linux/err.h>
 
 #ifdef HAVE_JUMP_LABEL
 
-#define JUMP_LABEL_HASH_BITS 6
-#define JUMP_LABEL_TABLE_SIZE (1 << JUMP_LABEL_HASH_BITS)
-static struct hlist_head jump_label_table[JUMP_LABEL_TABLE_SIZE];
+#define ENTRIES(start, stop) (((stop) - (start)) / sizeof(*(start)))
 
 /* mutex to protect coming/going of the the jump_label table */
 static DEFINE_MUTEX(jump_label_mutex);
 
-struct jump_label_entry {
-	struct hlist_node hlist;
-	struct jump_entry *table;
-	int nr_entries;
-	/* hang modules off here */
-	struct hlist_head modules;
-	unsigned long key;
-};
-
-struct jump_label_module_entry {
-	struct hlist_node hlist;
-	struct jump_entry *table;
-	int nr_entries;
-	struct module *mod;
-};
-
 static int jump_label_cmp(const void *a, const void *b)
 {
 	const struct jump_entry *jea = a;
 	const struct jump_entry *jeb = b;
 
-	if (jea->key < jeb->key)
-		return -1;
-
-	if (jea->key > jeb->key)
-		return 1;
-
-	return 0;
+	return jea->key - jeb->key;
 }
 
 static void sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
 {
-	unsigned long size;
-
-	size = (((unsigned long)stop - (unsigned long)start)
-					/ sizeof(struct jump_entry));
-	sort(start, size, sizeof(struct jump_entry), jump_label_cmp, NULL);
+	sort(start, ENTRIES(start, stop), sizeof(struct jump_entry), jump_label_cmp, 
+	     NULL);
 }
 
-static struct jump_label_entry *get_jump_label_entry(jump_label_t key)
+static struct jump_entry *
+search_jump_table(struct jump_entry *first, struct jump_entry *last, 
+		  unsigned long key)
 {
-	struct hlist_head *head;
-	struct hlist_node *node;
-	struct jump_label_entry *e;
-	u32 hash = jhash((void *)&key, sizeof(jump_label_t), 0);
-
-	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
-	hlist_for_each_entry(e, node, head, hlist) {
-		if (key == e->key)
-			return e;
+	while (first <= last) { 
+		struct jump_entry *mid = (last - first)/2 + first;
+		
+		if (mid->key < key)
+			first = mid + 1;
+		else if (mid->key > key)
+			last = mid - 1;
+		else
+			return mid;
 	}
 	return NULL;
 }
 
-static struct jump_label_entry *add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
+void patch_jump_table(unsigned long key, enum jump_label_type type,
+	 	      struct jump_entry *start, struct jump_entry *stop)
 {
-	struct hlist_head *head;
-	struct jump_label_entry *e;
-	u32 hash;
-
-	e = get_jump_label_entry(key);
-	if (e)
-		return ERR_PTR(-EEXIST);
-
-	e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
-	if (!e)
-		return ERR_PTR(-ENOMEM);
-
-	hash = jhash((void *)&key, sizeof(jump_label_t), 0);
-	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
-	e->key = key;
-	e->table = table;
-	e->nr_entries = nr_entries;
-	INIT_HLIST_HEAD(&(e->modules));
-	hlist_add_head(&e->hlist, head);
-	return e;
+	struct jump_entry *entry = search_jump_table(start, stop, key);
+	if (!entry)
+		return;
+	while (entry > start && entry[-1].key == key)
+		entry--;
+	for (; entry < stop && entry->key == key; entry++)
+		if (kernel_text_address(entry->code))
+			arch_jump_label_transform(entry, type);
 }
 
-static int build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
+struct args {
+	unsigned long key;
+	enum jump_label_type type;
+};
+
+static void module_patch_jump_table(struct module *mod, void *args)
 {
-	struct jump_entry *iter, *iter_begin;
-	struct jump_label_entry *entry;
-	int count;
+	struct args *a = args;
 
-	sort_jump_label_entries(start, stop);
-	iter = start;
-	while (iter < stop) {
-		entry = get_jump_label_entry(iter->key);
-		if (!entry) {
-			iter_begin = iter;
-			count = 0;
-			while ((iter < stop) &&
-				(iter->key == iter_begin->key)) {
-				iter++;
-				count++;
-			}
-			entry = add_jump_label_entry(iter_begin->key,
-							count, iter_begin);
-			if (IS_ERR(entry))
-				return PTR_ERR(entry);
-		 } else {
-			WARN_ONCE(1, KERN_ERR "build_jump_hashtable: unexpected entry!\n");
-			return -1;
-		}
-	}
-	return 0;
+	if (mod->num_jump_entries)
+		patch_jump_table(a->key, a->type, mod->jump_entries,
+				 mod->jump_entries + mod->num_jump_entries);
 }
 
 /***
@@ -143,82 +88,42 @@ static int build_jump_label_hashtable(struct jump_entry *start, struct jump_entr
 
 void jump_label_update(unsigned long key, enum jump_label_type type)
 {
-	struct jump_entry *iter;
-	struct jump_label_entry *entry;
-	struct hlist_node *module_node;
-	struct jump_label_module_entry *e_module;
-	int count;
+	struct args args = { .key = key, .type = type };
 
 	mutex_lock(&jump_label_mutex);
-	entry = get_jump_label_entry((jump_label_t)key);
-	if (entry) {
-		count = entry->nr_entries;
-		iter = entry->table;
-		while (count--) {
-			if (kernel_text_address(iter->code))
-				arch_jump_label_transform(iter, type);
-			iter++;
-		}
-		/* eanble/disable jump labels in modules */
-		hlist_for_each_entry(e_module, module_node, &(entry->modules),
-							hlist) {
-			count = e_module->nr_entries;
-			iter = e_module->table;
-			while (count--) {
-				if (kernel_text_address(iter->code))
-					arch_jump_label_transform(iter, type);
-				iter++;
-			}
-		}
-	}
+	patch_jump_table(key, type, __start___jump_table, __stop___jump_table);
+	for_each_module(module_patch_jump_table, &args);
 	mutex_unlock(&jump_label_mutex);
 }
 
-static int addr_conflict(struct jump_entry *entry, void *start, void *end)
+static int check_conflict(struct jump_entry *jstart, struct jump_entry *jstop,
+			   void *start, void *stop)
 {
-	if (entry->code <= (unsigned long)end &&
-		entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start)
-		return 1;
+	struct jump_entry *entry;
 
+	for (entry = jstart; entry < jstop; entry++)
+		if (entry->code <= (unsigned long)start &&
+		entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start)
+			return 1;
 	return 0;
 }
 
-#ifdef CONFIG_MODULES
+struct conflict_args {
+	void *start, *end;
+	int conflict;
+};
 
-static int module_conflict(void *start, void *end)
+static void module_check_conflict(struct module *mod, void *args)
 {
-	struct hlist_head *head;
-	struct hlist_node *node, *node_next, *module_node, *module_node_next;
-	struct jump_label_entry *e;
-	struct jump_label_module_entry *e_module;
-	struct jump_entry *iter;
-	int i, count;
-	int conflict = 0;
+	struct conflict_args *a = args;
 
-	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
-		head = &jump_label_table[i];
-		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
-			hlist_for_each_entry_safe(e_module, module_node,
-							module_node_next,
-							&(e->modules), hlist) {
-				count = e_module->nr_entries;
-				iter = e_module->table;
-				while (count--) {
-					if (addr_conflict(iter, start, end)) {
-						conflict = 1;
-						goto out;
-					}
-					iter++;
-				}
-			}
-		}
-	}
-out:
-	return conflict;
+	if (within_module_core((unsigned long)a->start, mod) || 
+	    within_module_init((unsigned long)a->start, mod))
+		a->conflict = check_conflict(mod->jump_entries, 
+				     mod->jump_entries + mod->num_jump_entries, 
+				     a->start, a->end);
 }
 
-#endif
-
 /***
  * jump_label_text_reserved - check if addr range is reserved
  * @start: start text addr
@@ -233,189 +138,41 @@ out:
  */
 int jump_label_text_reserved(void *start, void *end)
 {
-	struct jump_entry *iter;
-	struct jump_entry *iter_start = __start___jump_table;
-	struct jump_entry *iter_stop = __start___jump_table;
-	int conflict = 0;
+	struct conflict_args args = { .start = start, .end = end };
 
 	mutex_lock(&jump_label_mutex);
-	iter = iter_start;
-	while (iter < iter_stop) {
-		if (addr_conflict(iter, start, end)) {
-			conflict = 1;
-			goto out;
-		}
-		iter++;
-	}
-
+	args.conflict = check_conflict(__start___jump_table, 
+				       __stop___jump_table, start, end);
 	/* now check modules */
 #ifdef CONFIG_MODULES
-	conflict = module_conflict(start, end);
+	if (!args.conflict)
+		for_each_module(module_check_conflict, &args);
 #endif
-out:
 	mutex_unlock(&jump_label_mutex);
-	return conflict;
+	return args.conflict;
 }
 
-static __init int init_jump_label(void)
+static void setup_jump_label(struct jump_entry *start, struct jump_entry *stop)
 {
-	int ret;
-	struct jump_entry *iter_start = __start___jump_table;
-	struct jump_entry *iter_stop = __stop___jump_table;
-	struct jump_entry *iter;
+	struct jump_entry *entry;
 
 	mutex_lock(&jump_label_mutex);
-	ret = build_jump_label_hashtable(__start___jump_table,
-					 __stop___jump_table);
-	iter = iter_start;
-	while (iter < iter_stop) {
-		arch_jump_label_text_poke_early(iter->code);
-		iter++;
-	}
+	sort_jump_label_entries(start, stop);
+	for (entry = start; entry < stop; entry++)
+		arch_jump_label_text_poke_early(entry->code);
 	mutex_unlock(&jump_label_mutex);
-	return ret;
-}
-early_initcall(init_jump_label);
-
-#ifdef CONFIG_MODULES
-
-static struct jump_label_module_entry *add_jump_label_module_entry(struct jump_label_entry *entry, struct jump_entry *iter_begin, int count, struct module *mod)
-{
-	struct jump_label_module_entry *e;
-
-	e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
-	if (!e)
-		return ERR_PTR(-ENOMEM);
-	e->mod = mod;
-	e->nr_entries = count;
-	e->table = iter_begin;
-	hlist_add_head(&e->hlist, &entry->modules);
-	return e;
 }
 
-static int add_jump_label_module(struct module *mod)
+static int init_jump_label(void)
 {
-	struct jump_entry *iter, *iter_begin;
-	struct jump_label_entry *entry;
-	struct jump_label_module_entry *module_entry;
-	int count;
-
-	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
-		return 0;
-
-	sort_jump_label_entries(mod->jump_entries,
-				mod->jump_entries + mod->num_jump_entries);
-	iter = mod->jump_entries;
-	while (iter < mod->jump_entries + mod->num_jump_entries) {
-		entry = get_jump_label_entry(iter->key);
-		iter_begin = iter;
-		count = 0;
-		while ((iter < mod->jump_entries + mod->num_jump_entries) &&
-			(iter->key == iter_begin->key)) {
-				iter++;
-				count++;
-		}
-		if (!entry) {
-			entry = add_jump_label_entry(iter_begin->key, 0, NULL);
-			if (IS_ERR(entry))
-				return PTR_ERR(entry);
-		}
-		module_entry = add_jump_label_module_entry(entry, iter_begin,
-							   count, mod);
-		if (IS_ERR(module_entry))
-			return PTR_ERR(module_entry);
-	}
+	setup_jump_label(__start___jump_table, __stop___jump_table);
 	return 0;
 }
+early_initcall(init_jump_label);
 
-static void remove_jump_label_module(struct module *mod)
-{
-	struct hlist_head *head;
-	struct hlist_node *node, *node_next, *module_node, *module_node_next;
-	struct jump_label_entry *e;
-	struct jump_label_module_entry *e_module;
-	int i;
-
-	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
-		return;
-
-	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
-		head = &jump_label_table[i];
-		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
-			hlist_for_each_entry_safe(e_module, module_node,
-						  module_node_next,
-						  &(e->modules), hlist) {
-				if (e_module->mod == mod) {
-					hlist_del(&e_module->hlist);
-					kfree(e_module);
-				}
-			}
-			if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
-				hlist_del(&e->hlist);
-				kfree(e);
-			}
-		}
-	}
-}
-
-static int jump_label_module_notify(struct notifier_block *self, unsigned long val, void *data)
-{
-	struct module *mod = data;
-	int ret = 0;
-
-	switch (val) {
-	case MODULE_STATE_COMING:
-		mutex_lock(&jump_label_mutex);
-		ret = add_jump_label_module(mod);
-		if (ret)
-			remove_jump_label_module(mod);
-		mutex_unlock(&jump_label_mutex);
-		break;
-	case MODULE_STATE_GOING:
-		mutex_lock(&jump_label_mutex);
-		remove_jump_label_module(mod);
-		mutex_unlock(&jump_label_mutex);
-		break;
-	}
-	return ret;
-}
-
-/***
- * apply_jump_label_nops - patch module jump labels with arch_get_jump_label_nop()
- * @mod: module to patch
- *
- * Allow for run-time selection of the optimal nops. Before the module
- * loads patch these with arch_get_jump_label_nop(), which is specified by
- * the arch specific jump label code.
- */
 void jump_label_apply_nops(struct module *mod)
 {
-	struct jump_entry *iter;
-
-	/* if the module doesn't have jump label entries, just return */
-	if (!mod->num_jump_entries)
-		return;
-
-	iter = mod->jump_entries;
-	while (iter < mod->jump_entries + mod->num_jump_entries) {
-		arch_jump_label_text_poke_early(iter->code);
-		iter++;
-	}
-}
-
-struct notifier_block jump_label_module_nb = {
-	.notifier_call = jump_label_module_notify,
-	.priority = 0,
-};
-
-static __init int init_jump_label_module(void)
-{
-	return register_module_notifier(&jump_label_module_nb);
+	setup_jump_label(mod->jump_entries, 
+			 mod->jump_entries + mod->num_jump_entries);
 }
-early_initcall(init_jump_label_module);
-
-#endif /* CONFIG_MODULES */
-
 #endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 10:08 ` [PATCH 2/2] Rewrite jump_label.c to use binary search Andi Kleen
@ 2010-09-22 11:31   ` Mathieu Desnoyers
  2010-09-22 11:56     ` Andi Kleen
  2010-09-22 11:57     ` Frederic Weisbecker
  2010-09-22 16:12   ` H. Peter Anvin
  1 sibling, 2 replies; 21+ messages in thread
From: Mathieu Desnoyers @ 2010-09-22 11:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbaron, rostedt, linux-kernel, mingo, hpa, tglx, roland, rth,
	mhiramat, fweisbec, avi, davem, vgoyal, sam, tony, Andi Kleen

* Andi Kleen (andi@firstfloor.org) wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> After the discussion on linux-kernel I decided to rewrite the
> hash table based jump_label.c to use binary search as proposed.
> 
> This shrank the code dramatically, by more than 50%:
> 
> hash:
>  2763      80     544    3387     d3b kernel/jump_label.o
> binsearch:
>  1072      40       0    1112     458 kernel/jump_label.o
> 
> This is only code size; at runtime there are additional savings
> because the new code does not allocate any extra memory.
> 
> Also the code is a lot simpler now:
> 
> jump_label.c |  391 +++++++++++------------------------------------------------
>  1 files changed, 74 insertions(+), 317 deletions(-)
> 
> (+ a few lines for a new generic utility function in module.c)
> 
> I believe it's also much cleaner than before.
> 
> I did some performance tests comparing the hash implementation
> with the binary search. This was with a moderate config x86-64
> kernel with 68 modules loaded and about 480 trace points active
> in several modules and 1340 dynamic debug statements.
> 
> I measured arming/unarming all trace points and enabling all
> dynamic debug points. The differences in cycles were always
> within 30%; in fact, in the dynamic debug case the binary search
> implementation was even faster.
> 
> In all cases the runtimes were less than a single millisecond, so
> there was no user-visible difference at all.
> 
> So this is a vastly simpler implementation with, in practice,
> equivalent performance and less memory overhead (no data structures
> outside the sections).

If both code size and performance show no significant degradation over
Jason's hash table approach, then it's definitely worth looking into it.
Thanks for proposing this patch; some comments follow. Due to the bugs
I identified below, the benchmarks should be re-run to compare your
approach fairly against Jason's. Your current code does not seem to
enable module static jumps at all.

> 
> The patch is on top of Jason's patchkit.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  kernel/jump_label.c |  391 ++++++++++-----------------------------------------
>  1 files changed, 74 insertions(+), 317 deletions(-)
> 
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index f82878b..478625e 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -2,133 +2,78 @@
>   * jump label support
>   *
>   * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
> - *
> + * Rewritten in 2010 by Andi Kleen
>   */
>  #include <linux/jump_label.h>
>  #include <linux/memory.h>
>  #include <linux/uaccess.h>
>  #include <linux/module.h>
> -#include <linux/list.h>
> -#include <linux/jhash.h>
> -#include <linux/slab.h>
>  #include <linux/sort.h>
>  #include <linux/err.h>
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> -#define JUMP_LABEL_HASH_BITS 6
> -#define JUMP_LABEL_TABLE_SIZE (1 << JUMP_LABEL_HASH_BITS)
> -static struct hlist_head jump_label_table[JUMP_LABEL_TABLE_SIZE];
> +#define ENTRIES(start, stop) (((stop) - (start)) / sizeof(*(start)))
>  
>  /* mutex to protect coming/going of the the jump_label table */
>  static DEFINE_MUTEX(jump_label_mutex);
>  
> -struct jump_label_entry {
> -	struct hlist_node hlist;
> -	struct jump_entry *table;
> -	int nr_entries;
> -	/* hang modules off here */
> -	struct hlist_head modules;
> -	unsigned long key;
> -};
> -
> -struct jump_label_module_entry {
> -	struct hlist_node hlist;
> -	struct jump_entry *table;
> -	int nr_entries;
> -	struct module *mod;
> -};
> -
>  static int jump_label_cmp(const void *a, const void *b)
>  {
>  	const struct jump_entry *jea = a;
>  	const struct jump_entry *jeb = b;
>  
> -	if (jea->key < jeb->key)
> -		return -1;
> -
> -	if (jea->key > jeb->key)
> -		return 1;
> -
> -	return 0;
> +	return jea->key - jeb->key;
>  }
>  
>  static void sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
>  {
> -	unsigned long size;
> -
> -	size = (((unsigned long)stop - (unsigned long)start)
> -					/ sizeof(struct jump_entry));
> -	sort(start, size, sizeof(struct jump_entry), jump_label_cmp, NULL);
> +	sort(start, ENTRIES(start, stop), sizeof(struct jump_entry), jump_label_cmp, 
> +	     NULL);

If you really want to sort this table, then you should move it from
RODATA to DATA. I just looked at the bug table: it does not seem to be
sorted, so for that one it looks fine to leave it in RODATA.

Ideally we'd like to do this sorting at a post-compilation stage, so we
can leave the section RODATA.

>  }
>  
> -static struct jump_label_entry *get_jump_label_entry(jump_label_t key)
> +static struct jump_entry *
> +search_jump_table(struct jump_entry *first, struct jump_entry *last, 
> +		  unsigned long key)
>  {
> -	struct hlist_head *head;
> -	struct hlist_node *node;
> -	struct jump_label_entry *e;
> -	u32 hash = jhash((void *)&key, sizeof(jump_label_t), 0);
> -
> -	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
> -	hlist_for_each_entry(e, node, head, hlist) {
> -		if (key == e->key)
> -			return e;
> +	while (first <= last) { 

There seems to be a mismatch between the caller and this function. The
module patch_jump_table() caller passes first and first + len, while
this expects first and last. This looks a little off by one to me.

> +		struct jump_entry *mid = (last - first)/2 + first;
> +		
> +		if (mid->key < key)
> +			first = mid + 1;
> +		else if (mid->key > key)
> +			last = mid - 1;
> +		else
> +			return mid;
>  	}
>  	return NULL;
>  }
>  
> -static struct jump_label_entry *add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
> +void patch_jump_table(unsigned long key, enum jump_label_type type,
> +	 	      struct jump_entry *start, struct jump_entry *stop)
>  {
> -	struct hlist_head *head;
> -	struct jump_label_entry *e;
> -	u32 hash;
> -
> -	e = get_jump_label_entry(key);
> -	if (e)
> -		return ERR_PTR(-EEXIST);
> -
> -	e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
> -	if (!e)
> -		return ERR_PTR(-ENOMEM);
> -
> -	hash = jhash((void *)&key, sizeof(jump_label_t), 0);
> -	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
> -	e->key = key;
> -	e->table = table;
> -	e->nr_entries = nr_entries;
> -	INIT_HLIST_HEAD(&(e->modules));
> -	hlist_add_head(&e->hlist, head);
> -	return e;
> +	struct jump_entry *entry = search_jump_table(start, stop, key);
> +	if (!entry)
> +		return;
> +	while (entry > start && entry[-1].key == key)
> +		entry--;

OK, I like the way it handles patching of multiple entries with the same
key at once. Sorting really makes sense here.

> +	for (; entry < stop && entry->key == key; entry++)
> +		if (kernel_text_address(entry->code))

This does not work for modules I'm afraid, only for the core kernel. You
should test for __module_text_address() somewhere.

Dumb question: why do you need this test at all?

> +			arch_jump_label_transform(entry, type);
>  }
>  
> -static int build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
> +struct args {
> +	unsigned long key;
> +	enum jump_label_type type;
> +};
> +
> +static void module_patch_jump_table(struct module *mod, void *args)
>  {
> -	struct jump_entry *iter, *iter_begin;
> -	struct jump_label_entry *entry;
> -	int count;
> +	struct args *a = args;
>  
> -	sort_jump_label_entries(start, stop);
> -	iter = start;
> -	while (iter < stop) {
> -		entry = get_jump_label_entry(iter->key);
> -		if (!entry) {
> -			iter_begin = iter;
> -			count = 0;
> -			while ((iter < stop) &&
> -				(iter->key == iter_begin->key)) {
> -				iter++;
> -				count++;
> -			}
> -			entry = add_jump_label_entry(iter_begin->key,
> -							count, iter_begin);
> -			if (IS_ERR(entry))
> -				return PTR_ERR(entry);
> -		 } else {
> -			WARN_ONCE(1, KERN_ERR "build_jump_hashtable: unexpected entry!\n");
> -			return -1;
> -		}
> -	}
> -	return 0;
> +	if (mod->num_jump_entries)
> +		patch_jump_table(a->key, a->type, mod->jump_entries,
> +				 mod->jump_entries + mod->num_jump_entries);
>  }
>  
>  /***
> @@ -143,82 +88,42 @@ static int build_jump_label_hashtable(struct jump_entry *start, struct jump_entr
>  
>  void jump_label_update(unsigned long key, enum jump_label_type type)
>  {
> -	struct jump_entry *iter;
> -	struct jump_label_entry *entry;
> -	struct hlist_node *module_node;
> -	struct jump_label_module_entry *e_module;
> -	int count;
> +	struct args args = { .key = key, .type = type };
>  
>  	mutex_lock(&jump_label_mutex);
> -	entry = get_jump_label_entry((jump_label_t)key);
> -	if (entry) {
> -		count = entry->nr_entries;
> -		iter = entry->table;
> -		while (count--) {
> -			if (kernel_text_address(iter->code))
> -				arch_jump_label_transform(iter, type);
> -			iter++;
> -		}
> -		/* eanble/disable jump labels in modules */
> -		hlist_for_each_entry(e_module, module_node, &(entry->modules),
> -							hlist) {
> -			count = e_module->nr_entries;
> -			iter = e_module->table;
> -			while (count--) {
> -				if (kernel_text_address(iter->code))
> -					arch_jump_label_transform(iter, type);
> -				iter++;
> -			}
> -		}
> -	}
> +	patch_jump_table(key, type, __start___jump_table, __stop___jump_table);
> +	for_each_module(module_patch_jump_table, &args);

Can we have a for_each_module that keeps preemption on? Calling
arch_jump_label_transform() with preemption off is wrong, as it takes
text_mutex and gets/puts online CPUs, not to mention that
text_poke_smp() is probably still based on stop_machine().

Maybe we could stick a synchronize_rcu() in module delete and use a
proper RCU read-side critical section in for_each_module?
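[Editor's note: two hypothetical shapes for such an iterator are sketched below. Neither is in the patch under review. The RCU variant keeps preemption enabled (with preemptible RCU) but still forbids sleeping in op(), which arch_jump_label_transform() does; a sleepable variant would have to hold module_mutex instead, assuming it is made reachable from this code.]

```c
/* Hypothetical sketch only, not from the posted patch. */

/* RCU read-side iteration: preemption stays enabled, but op()
 * must not sleep; module unload would need a synchronize_rcu(). */
void for_each_module_rcu(void (*op)(struct module *, void *), void *arg)
{
	struct module *mod;

	rcu_read_lock();
	list_for_each_entry_rcu(mod, &modules, list)
		op(mod, arg);
	rcu_read_unlock();
}

/* Mutex-based iteration: op() may sleep (e.g. take text_mutex),
 * at the cost of serializing against module load/unload. */
void for_each_module_sleepable(void (*op)(struct module *, void *),
			       void *arg)
{
	struct module *mod;

	mutex_lock(&module_mutex);
	list_for_each_entry(mod, &modules, list)
		op(mod, arg);
	mutex_unlock(&module_mutex);
}
```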

>  	mutex_unlock(&jump_label_mutex);
>  }
>  
> -static int addr_conflict(struct jump_entry *entry, void *start, void *end)
> +static int check_conflict(struct jump_entry *jstart, struct jump_entry *jstop,
> +			   void *start, void *stop)
>  {
> -	if (entry->code <= (unsigned long)end &&
> -		entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start)
> -		return 1;
> +	struct jump_entry *entry;
>  
> +	for (entry = jstart; entry < jstop; entry++)
> +		if (entry->code <= (unsigned long)start &&

start -> end?

> +		entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start)

Maybe it's my mail client, but indentation looks wrong here.

> +			return 1;
>  	return 0;
>  }
>  
> -#ifdef CONFIG_MODULES

Why make it build when modules are configured out?

> +struct conflict_args {
> +	void *start, *end;
> +	int conflict;
> +};
>  
> -static int module_conflict(void *start, void *end)
> +static void module_check_conflict(struct module *mod, void *args)
>  {
> -	struct hlist_head *head;
> -	struct hlist_node *node, *node_next, *module_node, *module_node_next;
> -	struct jump_label_entry *e;
> -	struct jump_label_module_entry *e_module;
> -	struct jump_entry *iter;
> -	int i, count;
> -	int conflict = 0;
> +	struct conflict_args *a = args;
>  
> -	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> -		head = &jump_label_table[i];
> -		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> -			hlist_for_each_entry_safe(e_module, module_node,
> -							module_node_next,
> -							&(e->modules), hlist) {
> -				count = e_module->nr_entries;
> -				iter = e_module->table;
> -				while (count--) {
> -					if (addr_conflict(iter, start, end)) {
> -						conflict = 1;
> -						goto out;
> -					}
> -					iter++;
> -				}
> -			}
> -		}
> -	}
> -out:
> -	return conflict;
> +	if (within_module_core((unsigned long)a->start, mod) || 
> +	    within_module_init((unsigned long)a->start, mod))
> +		a->conflict = check_conflict(mod->jump_entries, 
> +				     mod->jump_entries + mod->num_jump_entries, 
> +				     a->start, a->end);
>  }
>  
> -#endif
> -
>  /***
>   * jump_label_text_reserved - check if addr range is reserved
>   * @start: start text addr
> @@ -233,189 +138,41 @@ out:
>   */
>  int jump_label_text_reserved(void *start, void *end)
>  {
> -	struct jump_entry *iter;
> -	struct jump_entry *iter_start = __start___jump_table;
> -	struct jump_entry *iter_stop = __start___jump_table;
> -	int conflict = 0;
> +	struct conflict_args args = { .start = start, .end = end };
>  
>  	mutex_lock(&jump_label_mutex);
> -	iter = iter_start;
> -	while (iter < iter_stop) {
> -		if (addr_conflict(iter, start, end)) {
> -			conflict = 1;
> -			goto out;
> -		}
> -		iter++;
> -	}
> -
> +	args.conflict = check_conflict(__start___jump_table, 
> +				       __stop___jump_table, start, end);
>  	/* now check modules */
>  #ifdef CONFIG_MODULES

The #ifdef could be removed from this function body by wrapping
module_check_conflict in #ifdef CONFIG_MODULES / #else / #endif so
that an empty static inline is defined when modules are disabled.
for_each_module could likewise be defined as an empty static inline
when modules are disabled.
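[Editor's note: the stub pattern being suggested looks like this; placement in a header such as include/linux/module.h is hypothetical.]

```c
/* Sketch: provide a no-op stub when CONFIG_MODULES is off, so
 * callers need no #ifdef at the call site. */
#ifdef CONFIG_MODULES
void for_each_module(void (*op)(struct module *, void *arg), void *arg);
#else
static inline void for_each_module(void (*op)(struct module *, void *arg),
				   void *arg)
{
}
#endif
```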

Thanks!

Mathieu

> -	conflict = module_conflict(start, end);
> +	if (!args.conflict)
> +		for_each_module(module_check_conflict, &args);
>  #endif
> -out:
>  	mutex_unlock(&jump_label_mutex);
> -	return conflict;
> +	return args.conflict;
>  }
>  
> -static __init int init_jump_label(void)
> +static void setup_jump_label(struct jump_entry *start, struct jump_entry *stop)
>  {
> -	int ret;
> -	struct jump_entry *iter_start = __start___jump_table;
> -	struct jump_entry *iter_stop = __stop___jump_table;
> -	struct jump_entry *iter;
> +	struct jump_entry *entry;
>  
>  	mutex_lock(&jump_label_mutex);
> -	ret = build_jump_label_hashtable(__start___jump_table,
> -					 __stop___jump_table);
> -	iter = iter_start;
> -	while (iter < iter_stop) {
> -		arch_jump_label_text_poke_early(iter->code);
> -		iter++;
> -	}
> +	sort_jump_label_entries(start, stop);
> +	for (entry = start; entry < stop; entry++)
> +		arch_jump_label_text_poke_early(entry->code);
>  	mutex_unlock(&jump_label_mutex);
> -	return ret;
> -}
> -early_initcall(init_jump_label);
> -
> -#ifdef CONFIG_MODULES
> -
> -static struct jump_label_module_entry *add_jump_label_module_entry(struct jump_label_entry *entry, struct jump_entry *iter_begin, int count, struct module *mod)
> -{
> -	struct jump_label_module_entry *e;
> -
> -	e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
> -	if (!e)
> -		return ERR_PTR(-ENOMEM);
> -	e->mod = mod;
> -	e->nr_entries = count;
> -	e->table = iter_begin;
> -	hlist_add_head(&e->hlist, &entry->modules);
> -	return e;
>  }
>  
> -static int add_jump_label_module(struct module *mod)
> +static int init_jump_label(void)
>  {
> -	struct jump_entry *iter, *iter_begin;
> -	struct jump_label_entry *entry;
> -	struct jump_label_module_entry *module_entry;
> -	int count;
> -
> -	/* if the module doesn't have jump label entries, just return */
> -	if (!mod->num_jump_entries)
> -		return 0;
> -
> -	sort_jump_label_entries(mod->jump_entries,
> -				mod->jump_entries + mod->num_jump_entries);
> -	iter = mod->jump_entries;
> -	while (iter < mod->jump_entries + mod->num_jump_entries) {
> -		entry = get_jump_label_entry(iter->key);
> -		iter_begin = iter;
> -		count = 0;
> -		while ((iter < mod->jump_entries + mod->num_jump_entries) &&
> -			(iter->key == iter_begin->key)) {
> -				iter++;
> -				count++;
> -		}
> -		if (!entry) {
> -			entry = add_jump_label_entry(iter_begin->key, 0, NULL);
> -			if (IS_ERR(entry))
> -				return PTR_ERR(entry);
> -		}
> -		module_entry = add_jump_label_module_entry(entry, iter_begin,
> -							   count, mod);
> -		if (IS_ERR(module_entry))
> -			return PTR_ERR(module_entry);
> -	}
> +	setup_jump_label(__start___jump_table, __stop___jump_table);
>  	return 0;
>  }
> +early_initcall(init_jump_label);
>  
> -static void remove_jump_label_module(struct module *mod)
> -{
> -	struct hlist_head *head;
> -	struct hlist_node *node, *node_next, *module_node, *module_node_next;
> -	struct jump_label_entry *e;
> -	struct jump_label_module_entry *e_module;
> -	int i;
> -
> -	/* if the module doesn't have jump label entries, just return */
> -	if (!mod->num_jump_entries)
> -		return;
> -
> -	for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> -		head = &jump_label_table[i];
> -		hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> -			hlist_for_each_entry_safe(e_module, module_node,
> -						  module_node_next,
> -						  &(e->modules), hlist) {
> -				if (e_module->mod == mod) {
> -					hlist_del(&e_module->hlist);
> -					kfree(e_module);
> -				}
> -			}
> -			if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
> -				hlist_del(&e->hlist);
> -				kfree(e);
> -			}
> -		}
> -	}
> -}
> -
> -static int jump_label_module_notify(struct notifier_block *self, unsigned long val, void *data)
> -{
> -	struct module *mod = data;
> -	int ret = 0;
> -
> -	switch (val) {
> -	case MODULE_STATE_COMING:
> -		mutex_lock(&jump_label_mutex);
> -		ret = add_jump_label_module(mod);
> -		if (ret)
> -			remove_jump_label_module(mod);
> -		mutex_unlock(&jump_label_mutex);
> -		break;
> -	case MODULE_STATE_GOING:
> -		mutex_lock(&jump_label_mutex);
> -		remove_jump_label_module(mod);
> -		mutex_unlock(&jump_label_mutex);
> -		break;
> -	}
> -	return ret;
> -}
> -
> -/***
> - * apply_jump_label_nops - patch module jump labels with arch_get_jump_label_nop()
> - * @mod: module to patch
> - *
> - * Allow for run-time selection of the optimal nops. Before the module
> - * loads patch these with arch_get_jump_label_nop(), which is specified by
> - * the arch specific jump label code.
> - */
>  void jump_label_apply_nops(struct module *mod)
>  {
> -	struct jump_entry *iter;
> -
> -	/* if the module doesn't have jump label entries, just return */
> -	if (!mod->num_jump_entries)
> -		return;
> -
> -	iter = mod->jump_entries;
> -	while (iter < mod->jump_entries + mod->num_jump_entries) {
> -		arch_jump_label_text_poke_early(iter->code);
> -		iter++;
> -	}
> -}
> -
> -struct notifier_block jump_label_module_nb = {
> -	.notifier_call = jump_label_module_notify,
> -	.priority = 0,
> -};
> -
> -static __init int init_jump_label_module(void)
> -{
> -	return register_module_notifier(&jump_label_module_nb);
> +	setup_jump_label(mod->jump_entries, 
> +			 mod->jump_entries + mod->num_jump_entries);
>  }
> -early_initcall(init_jump_label_module);
> -
> -#endif /* CONFIG_MODULES */
> -
>  #endif
> -- 
> 1.7.1
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 11:31   ` Mathieu Desnoyers
@ 2010-09-22 11:56     ` Andi Kleen
  2010-09-22 12:04       ` Andi Kleen
                         ` (2 more replies)
  2010-09-22 11:57     ` Frederic Weisbecker
  1 sibling, 3 replies; 21+ messages in thread
From: Andi Kleen @ 2010-09-22 11:56 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, jbaron, rusty, rostedt, linux-kernel, mingo, hpa,
	tglx, roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam,
	tony, Andi Kleen


Thanks for the review.

> If you really want to sort this table, then you should move it from
> RODATA to DATA. I just looked at the bug table: it does not seem to be
> sorted, so for that one it looks fine to leave it into RODATA.

FWIW the table was already sorted before; I didn't actually
change that code except for writing it a bit more cleanly.

But you're right: if it's rodata it could fault. Normally
it shouldn't be, though, because the loader has to relocate it
for modules. I didn't test with rodata enabled, but will see
what happens. Maybe this happens at the right place so that
it doesn't matter.

I suspect the original version had the same problem
if this one does.


> Ideally we'd like to do this sorting at a post-compilation stage, so we
> can leave the section RODATA.

That's complicated because you would need to fix relocations too.
I guess doing that would need to put significant parts of a linker
into modpost. Since the goal of this was to reduce complexity ...

This is patterned after the exception tables which always were
sorted at runtime, which is much easier.

Besides, I don't see a lot of value in having this be rodata.

>
>>  }
>>
>> -static struct jump_label_entry *get_jump_label_entry(jump_label_t key)
>> +static struct jump_entry *
>> +search_jump_table(struct jump_entry *first, struct jump_entry *last,
>> +		  unsigned long key)
>>  {
>> -	struct hlist_head *head;
>> -	struct hlist_node *node;
>> -	struct jump_label_entry *e;
>> -	u32 hash = jhash((void *)&key, sizeof(jump_label_t), 0);
>> -
>> -	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
>> -	hlist_for_each_entry(e, node, head, hlist) {
>> -		if (key == e->key)
>> -			return e;
>> +	while (first <= last) {
>
> There seems to be a mismatch between the caller and this function. The
> module patch_jump_table() caller passes first and first + len, while
> this expects first and last. This looks a little off by one to me.

True. Will fix that.


>> +	for (; entry < stop && entry->key == key; entry++)
>> +		if (kernel_text_address(entry->code))
>
> This does not work for modules I'm afraid, only for the core kernel. You
> should test for __module_text_address() somewhere.

I thought it was shared now, but ok.

> Dumb question: why do you need this test at all ?

I just copied that from the original code. In fact I was wondering
too why it's needed. Jason?


>> -	}
>> +	patch_jump_table(key, type, __start___jump_table,
>> __stop___jump_table);
>> +	for_each_module(module_patch_jump_table, &args);
>
> Can we have a for_each_module that keeps preemption on ? Calling
> arch_jump_label_transform() with preemption off is wrong, as it locks
> the text_mutex and get/put online cpus, not to mention that
> text_poke_smp() is probably still based on stop_machine().
>
> Maybe we could stick a synchronize_rcu() in module delete and use a
> proper RCU read-side C.S. in for_each_module ?

Hmpf. Ok I didn't read that code.

I guess I could just take the module mutex (or whatever it's called)
during the walk.

Copying Rusty, maybe he has ideas how to handle this.


>> -		return 1;
>> +	struct jump_entry *entry;
>>
>> +	for (entry = jstart; entry < jstop; entry++)
>> +		if (entry->code <= (unsigned long)start &&
>
> start -> end ?

I copied this from the original code as well; I remember stumbling
over it. Yes it was probably wrong in the original too.

>
>> +		entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start)
>
> Maybe it's my mail client, but indentation looks wrong here.

All the horrors you do for the holy 80 columns.


>>
>> -#ifdef CONFIG_MODULES
>
> Why make it build when modules are configured out ?

Because it's shared code for the module and non module case.


>> +	args.conflict = check_conflict(__start___jump_table,
>> +				       __stop___jump_table, start, end);
>>  	/* now check modules */
>>  #ifdef CONFIG_MODULES
>
> The ifdef could be removed from within this function body by replacing it
> with a #ifdef CONFIG_MODULES / #else / #endif that defines an empty static
> inline when disabled. Also, for_each_module could be defined as an empty
> static inline when modules are disabled.

It could yes, but really a single clean #ifdef is imho fine
and much simpler.

-Andi



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 11:31   ` Mathieu Desnoyers
  2010-09-22 11:56     ` Andi Kleen
@ 2010-09-22 11:57     ` Frederic Weisbecker
  1 sibling, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2010-09-22 11:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, jbaron, rostedt, linux-kernel, mingo, hpa, tglx,
	roland, rth, mhiramat, avi, davem, vgoyal, sam, tony, Andi Kleen

On Wed, Sep 22, 2010 at 07:31:14AM -0400, Mathieu Desnoyers wrote:
> > -static struct jump_label_entry *add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
> > +void patch_jump_table(unsigned long key, enum jump_label_type type,
> > +	 	      struct jump_entry *start, struct jump_entry *stop)
> >  {
> > -	struct hlist_head *head;
> > -	struct jump_label_entry *e;
> > -	u32 hash;
> > -
> > -	e = get_jump_label_entry(key);
> > -	if (e)
> > -		return ERR_PTR(-EEXIST);
> > -
> > -	e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
> > -	if (!e)
> > -		return ERR_PTR(-ENOMEM);
> > -
> > -	hash = jhash((void *)&key, sizeof(jump_label_t), 0);
> > -	head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
> > -	e->key = key;
> > -	e->table = table;
> > -	e->nr_entries = nr_entries;
> > -	INIT_HLIST_HEAD(&(e->modules));
> > -	hlist_add_head(&e->hlist, head);
> > -	return e;
> > +	struct jump_entry *entry = search_jump_table(start, stop, key);
> > +	if (!entry)
> > +		return;
> > +	while (entry > start && entry[-1].key == key)
> > +		entry--;
> 
> OK, I like the way it handles patching of multiple entries with the same
> key at once. Sorting really makes sense here.
> 
> > +	for (; entry < stop && entry->key == key; entry++)
> > +		if (kernel_text_address(entry->code))
> 
> This does not work for modules I'm afraid, only for the core kernel. You
> should test for __module_text_address() somewhere.



No, look:

int kernel_text_address(unsigned long addr)
{
	if (core_kernel_text(addr))
		return 1;
	return is_module_text_address(addr);
}


 
> Dumb question: why do you need this test at all ?


I wonder about that too.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 11:56     ` Andi Kleen
@ 2010-09-22 12:04       ` Andi Kleen
  2010-09-22 15:02         ` Mathieu Desnoyers
  2010-09-22 13:46       ` Jason Baron
  2010-09-22 18:14       ` Jason Baron
  2 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2010-09-22 12:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mathieu Desnoyers, Andi Kleen, jbaron, rusty, rostedt,
	linux-kernel, mingo, hpa, tglx, roland, rth, mhiramat, fweisbec,
	avi, davem, vgoyal, sam, tony, Andi Kleen


>
>>> +	for (; entry < stop && entry->key == key; entry++)
>>> +		if (kernel_text_address(entry->code))
>>
>> This does not work for modules I'm afraid, only for the core kernel. You
>> should test for __module_text_address() somewhere.
>
> I thought it was shared now, but ok.

Double checked. This is ok because kernel_text_address()
already checks for modules. You were probably thinking
of __kernel_text_address().

-Andi


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Add for_each_module iterator function
  2010-09-22 10:08 [PATCH 1/2] Add for_each_module iterator function Andi Kleen
  2010-09-22 10:08 ` [PATCH 2/2] Rewrite jump_label.c to use binary search Andi Kleen
@ 2010-09-22 12:25 ` Thomas Gleixner
  2010-09-22 12:52   ` Andi Kleen
  2010-09-22 14:03   ` Mathieu Desnoyers
  1 sibling, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2010-09-22 12:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbaron, rostedt, linux-kernel, mingo, mathieu.desnoyers, hpa,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen



On Wed, 22 Sep 2010, Andi Kleen wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> This is a generic function to iterate over all modules.
> To be used in the next patch.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  include/linux/module.h |    1 +
>  kernel/module.c        |   10 ++++++++++
>  2 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 403ac26..809b6db 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -410,6 +410,7 @@ struct module *__module_address(unsigned long addr);
>  bool is_module_address(unsigned long addr);
>  bool is_module_percpu_address(unsigned long addr);
>  bool is_module_text_address(unsigned long addr);
> +void for_each_module(void (*op)(struct module *, void *arg), void *arg);
>  
>  static inline int within_module_core(unsigned long addr, struct module *mod)
>  {
> diff --git a/kernel/module.c b/kernel/module.c
> index eba1341..b8fb3e6 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -384,6 +384,16 @@ struct module *find_module(const char *name)
>  }
>  EXPORT_SYMBOL_GPL(find_module);
>  
> +void for_each_module(void (*op)(struct module *, void *arg), void *arg)
> +{
> +	struct module *mod;
> +
> +	preempt_disable();

  That wants rcu_read_lock()

> +	list_for_each_entry_rcu(mod, &modules, list)
> +		op(mod, arg);
> +	preempt_enable();

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Add for_each_module iterator function
  2010-09-22 12:25 ` [PATCH 1/2] Add for_each_module iterator function Thomas Gleixner
@ 2010-09-22 12:52   ` Andi Kleen
  2010-09-22 14:03   ` Mathieu Desnoyers
  1 sibling, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2010-09-22 12:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, jbaron, rostedt, linux-kernel, mingo,
	mathieu.desnoyers, hpa, roland, rth, mhiramat, fweisbec, avi,
	davem, vgoyal, sam, tony, Andi Kleen

> > +void for_each_module(void (*op)(struct module *, void *arg), void *arg)
> > +{
> > +	struct module *mod;
> > +
> > +	preempt_disable();
> 
>   That wants rcu_read_lock()

Thanks. It was wrong anyway for the intended use case;
I've switched to module_mutex now.

BTW I think i copied this from somewhere else in module.c
Probably should be fixed there too.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 11:56     ` Andi Kleen
  2010-09-22 12:04       ` Andi Kleen
@ 2010-09-22 13:46       ` Jason Baron
  2010-09-22 18:14       ` Jason Baron
  2 siblings, 0 replies; 21+ messages in thread
From: Jason Baron @ 2010-09-22 13:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mathieu Desnoyers, rusty, rostedt, linux-kernel, mingo, hpa, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

On Wed, Sep 22, 2010 at 01:56:48PM +0200, Andi Kleen wrote:
> 
> >> +	for (; entry < stop && entry->key == key; entry++)
> >> +		if (kernel_text_address(entry->code))
> >
> > This does not work for modules I'm afraid, only for the core kernel. You
> > should test for __module_text_address() somewhere.
> 
> I thought it was shared now, but ok.
> 
> > Dumb question: why do you need this test at all ?
> 
> I just copied that from the original code. In fact I was wondering
> too why it's needed. Jason?
> 
> 

It was put there for jump labels that are in __init sections. Without 
this check the code will corrupt itself when we toggle a jump label 
after the __init sections have been freed.

thanks,

-Jason


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] Add for_each_module iterator function
  2010-09-22 12:25 ` [PATCH 1/2] Add for_each_module iterator function Thomas Gleixner
  2010-09-22 12:52   ` Andi Kleen
@ 2010-09-22 14:03   ` Mathieu Desnoyers
  1 sibling, 0 replies; 21+ messages in thread
From: Mathieu Desnoyers @ 2010-09-22 14:03 UTC (permalink / raw)
  To: Thomas Gleixner, Paul E. McKenney
  Cc: Andi Kleen, jbaron, rostedt, linux-kernel, mingo, hpa, roland,
	rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

* Thomas Gleixner (tglx@linutronix.de) wrote:
> 
> 
> On Wed, 22 Sep 2010, Andi Kleen wrote:
> 
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > This is a generic function to iterate over all modules.
> > To be used in the next patch.
> > 
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> > ---
> >  include/linux/module.h |    1 +
> >  kernel/module.c        |   10 ++++++++++
> >  2 files changed, 11 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index 403ac26..809b6db 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -410,6 +410,7 @@ struct module *__module_address(unsigned long addr);
> >  bool is_module_address(unsigned long addr);
> >  bool is_module_percpu_address(unsigned long addr);
> >  bool is_module_text_address(unsigned long addr);
> > +void for_each_module(void (*op)(struct module *, void *arg), void *arg);
> >  
> >  static inline int within_module_core(unsigned long addr, struct module *mod)
> >  {
> > diff --git a/kernel/module.c b/kernel/module.c
> > index eba1341..b8fb3e6 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -384,6 +384,16 @@ struct module *find_module(const char *name)
> >  }
> >  EXPORT_SYMBOL_GPL(find_module);
> >  
> > +void for_each_module(void (*op)(struct module *, void *arg), void *arg)
> > +{
> > +	struct module *mod;
> > +
> > +	preempt_disable();
> 
>   That wants rcu_read_lock()

Then we might need to add a synchronize_rcu() call when modules are
unloaded. For now, there is only synchronize_sched().

Mathieu

> 
> > +	list_for_each_entry_rcu(mod, &modules, list)
> > +		op(mod, arg);
> > +	preempt_enable();
> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 12:04       ` Andi Kleen
@ 2010-09-22 15:02         ` Mathieu Desnoyers
  2010-09-22 15:07           ` Mathieu Desnoyers
  2010-09-22 15:28           ` Jason Baron
  0 siblings, 2 replies; 21+ messages in thread
From: Mathieu Desnoyers @ 2010-09-22 15:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbaron, rusty, rostedt, linux-kernel, mingo, hpa, tglx, roland,
	rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

* Andi Kleen (andi@firstfloor.org) wrote:
> 
> >
> >>> +	for (; entry < stop && entry->key == key; entry++)
> >>> +		if (kernel_text_address(entry->code))
> >>
> >> This does not work for modules I'm afraid, only for the core kernel. You
> >> should test for __module_text_address() somewhere.
> >
> > I thought it was shared now, but ok.
> 
> Double checked. This is ok because kernel_text_address()
> already checks for modules. You were probably thinking
> of __kernel_text_address()

Ah right,

Although we have another problem:

__module_text_address() includes module init text, which defeats the
purpose of the check put in there by Jason.

So the check works for the core kernel, but not for modules.

Mathieu

> 
> -Andi
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 15:02         ` Mathieu Desnoyers
@ 2010-09-22 15:07           ` Mathieu Desnoyers
  2010-09-22 15:43             ` Jason Baron
  2010-09-22 15:28           ` Jason Baron
  1 sibling, 1 reply; 21+ messages in thread
From: Mathieu Desnoyers @ 2010-09-22 15:07 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbaron, rusty, rostedt, linux-kernel, mingo, hpa, tglx, roland,
	rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Andi Kleen (andi@firstfloor.org) wrote:
> > 
> > >
> > >>> +	for (; entry < stop && entry->key == key; entry++)
> > >>> +		if (kernel_text_address(entry->code))
> > >>
> > >> This does not work for modules I'm afraid, only for the core kernel. You
> > >> should test for __module_text_address() somewhere.
> > >
> > > I thought it was shared now, but ok.
> > 
> > Double checked. This is ok because kernel_text_address()
> > already checks for modules. You were probably thinking
> > of __kernel_text_address()
> 
> Ah right,
> 
> Although we have another problem:
> 
> __module_text_address() includes module init text, which defeats the
> purpose of the check put in there by Jason.
> 
> So the check works for the core kernel, but not for modules.

So, we have this issue, but I also have a question for Jason: what
happens if someone puts a static jump in a function declared in the __init
section of the core kernel or a module? Can we enable them at early boot?

Thanks,

Mathieu

> 
> Mathieu
> 
> > 
> > -Andi
> > 
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 15:02         ` Mathieu Desnoyers
  2010-09-22 15:07           ` Mathieu Desnoyers
@ 2010-09-22 15:28           ` Jason Baron
  2010-09-22 19:19             ` Mathieu Desnoyers
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Baron @ 2010-09-22 15:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, rusty, rostedt, linux-kernel, mingo, hpa, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

On Wed, Sep 22, 2010 at 11:02:50AM -0400, Mathieu Desnoyers wrote:
> * Andi Kleen (andi@firstfloor.org) wrote:
> > 
> > >
> > >>> +	for (; entry < stop && entry->key == key; entry++)
> > >>> +		if (kernel_text_address(entry->code))
> > >>
> > >> This does not work for modules I'm afraid, only for the core kernel. You
> > >> should test for __module_text_address() somewhere.
> > >
> > > I thought it was shared now, but ok.
> > 
> > Double checked. This is ok because kernel_text_address()
> > already checks for modules. You were probably thinking
> > of __kernel_text_address()
> 
> Ah right,
> 
> Although we have another problem:
> 
> __module_text_address() includes module init text, which defeats the
> purpose of the check put in there by Jason.
> 
> So the check works for the core kernel, but not for modules.
> 
> Mathieu
> 

It works for modules too... it does:

struct module *__module_text_address(unsigned long addr)
{
        struct module *mod = __module_address(addr);
        if (mod) {
                /* Make sure it's within the text section. */
                if (!within(addr, mod->module_init, mod->init_text_size)
                    && !within(addr, mod->module_core,
mod->core_text_size))
                        mod = NULL;
        }
        return mod;
}

and then in kernel/module.c we have :


        module_free(mod, mod->module_init);
        mod->module_init = NULL;


So, I was relying on the fact that module_init gets set to NULL after the
free happens. However, there is a small race there, in that the vfree()
happens before module_init is set to NULL. That is probably most
easily fixed by wrapping those two lines with the jump_label_mutex.

thanks,

-Jason



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 15:07           ` Mathieu Desnoyers
@ 2010-09-22 15:43             ` Jason Baron
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Baron @ 2010-09-22 15:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, rusty, rostedt, linux-kernel, mingo, hpa, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

On Wed, Sep 22, 2010 at 11:07:43AM -0400, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> > * Andi Kleen (andi@firstfloor.org) wrote:
> > > 
> > > >
> > > >>> +	for (; entry < stop && entry->key == key; entry++)
> > > >>> +		if (kernel_text_address(entry->code))
> > > >>
> > > >> This does not work for modules I'm afraid, only for the core kernel. You
> > > >> should test for __module_text_address() somewhere.
> > > >
> > > > I thought it was shared now, but ok.
> > > 
> > > Double checked. This is ok because kernel_text_address()
> > > already checks for modules. You were probably thinking
> > > of __kernel_text_address()
> > 
> > Ah right,
> > 
> > Although we have another problem:
> > 
> > __module_text_address() includes module init text, which defeats the
> > purpose of the check put in there by Jason.
> > 
> > So the check works for the core kernel, but not for modules.
> 
> So, we have this issue, but I also have a question for Jason: what
> happens if someone puts a static jump in a function declared in the __init
> section of the core kernel or a module? Can we enable them at early boot?
> 
> Thanks,
> 
> Mathieu
> 

Hi Mathieu,

Yes, we should be able to enable these. Look at the definition for
core_kernel_text():

int core_kernel_text(unsigned long addr)
{
        if (addr >= (unsigned long)_stext &&
            addr <= (unsigned long)_etext)
                return 1;

        if (system_state == SYSTEM_BOOTING &&
            init_kernel_text(addr))
                return 1;
        return 0;
}

If the system is in the SYSTEM_BOOTING state we allow the text to be
updated; same for modules, the check will allow module __init text
modifications. At very early init there could be some issue with
text_poke_smp() working on x86, I'm not sure. In any case, we could
always fall back to text_poke_early() if needed. So I don't see any
fundamental issues with this case.

thanks,

-Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 10:08 ` [PATCH 2/2] Rewrite jump_label.c to use binary search Andi Kleen
  2010-09-22 11:31   ` Mathieu Desnoyers
@ 2010-09-22 16:12   ` H. Peter Anvin
  2010-09-22 19:43     ` Mathieu Desnoyers
  1 sibling, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2010-09-22 16:12 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbaron, rostedt, linux-kernel, mingo, mathieu.desnoyers, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

On 09/22/2010 03:08 AM, Andi Kleen wrote:
>
> I believe it's also much cleaner than before.
>
> I did some performance tests comparing the hash implementation
> with the binary search. This was with a moderate config x86-64
> kernel with 68 modules loaded and about 480 trace points active
> in several modules and 1340 dynamic debug statements.
>

Now, the ideal data structure for this stuff is something called a 
minimal perfect hash.  This is something that would have to be generated 
(for each module and for the monolithic kernel) at compile time -- 
presumably in modpost -- because the static generation of them is fairly 
complex.

That would provide extremely simple handling in the kernel (just do some 
shifts and a table lookup and you have your answer) and if combined with 
an AVL or red-black tree holding the module addresses it would give 
extremely fast address-to-metadata lookup, not just for this but also 
for things like exception handling.

I have used MPH's in other projects and pitched them to Linus at one 
point for exception handling.  What is clear, though, is that we should 
do the same thing for exceptions, trace points, and any other similar 
things that depend on exact PC.

	-hpa

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 11:56     ` Andi Kleen
  2010-09-22 12:04       ` Andi Kleen
  2010-09-22 13:46       ` Jason Baron
@ 2010-09-22 18:14       ` Jason Baron
  2 siblings, 0 replies; 21+ messages in thread
From: Jason Baron @ 2010-09-22 18:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mathieu Desnoyers, rusty, rostedt, linux-kernel, mingo, hpa, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

On Wed, Sep 22, 2010 at 01:56:48PM +0200, Andi Kleen wrote:
> >> +	struct jump_entry *entry;
> >>
> >> +	for (entry = jstart; entry < jstop; entry++)
> >> +		if (entry->code <= (unsigned long)start &&
> >
> > start -> end ?
> 
> I copied this from the original code too. I remember stumbling
> over it too. Yes it was probably wrong in the original too.
> 

No...the original version had it correct, check the archive.

This bug was introduced in your version.

thanks,

-Jason


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 15:28           ` Jason Baron
@ 2010-09-22 19:19             ` Mathieu Desnoyers
  0 siblings, 0 replies; 21+ messages in thread
From: Mathieu Desnoyers @ 2010-09-22 19:19 UTC (permalink / raw)
  To: Jason Baron
  Cc: Andi Kleen, rusty, rostedt, linux-kernel, mingo, hpa, tglx,
	roland, rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

* Jason Baron (jbaron@redhat.com) wrote:
> On Wed, Sep 22, 2010 at 11:02:50AM -0400, Mathieu Desnoyers wrote:
> > * Andi Kleen (andi@firstfloor.org) wrote:
> > > 
> > > >
> > > >>> +	for (; entry < stop && entry->key == key; entry++)
> > > >>> +		if (kernel_text_address(entry->code))
> > > >>
> > > >> This does not work for modules I'm afraid, only for the core kernel. You
> > > >> should test for __module_text_address() somewhere.
> > > >
> > > > I thought it was shared now, but ok.
> > > 
> > > Double checked. This is ok because kernel_text_address()
> > > already checks for modules. You were probably thinking
> > > of __kernel_text_address()
> > 
> > Ah right,
> > 
> > Although we have another problem:
> > 
> > __module_text_address() includes module init text, which defeats the
> > purpose of the check put in there by Jason.
> > 
> > So the check works for the core kernel, but not for modules.
> > 
> > Mathieu
> > 
> 
> it works for modules too...it does:
> 
> struct module *__module_text_address(unsigned long addr)
> {
>         struct module *mod = __module_address(addr);
>         if (mod) {
>                 /* Make sure it's within the text section. */
>                 if (!within(addr, mod->module_init, mod->init_text_size)
>                     && !within(addr, mod->module_core,
> mod->core_text_size))
>                         mod = NULL;
>         }
>         return mod;
> }
> 
> and then in kernel/module.c we have :
> 
> 
>         module_free(mod, mod->module_init);
>         mod->module_init = NULL;
> 
> 
> So, I was relying on the fact that module_init gets set to NULL after
> the free happens. However, there is a small race there in that the
> vfree() happens before mod->module_init is set to NULL. So that is
> probably most easily fixed by wrapping those two lines with the
> jump_label_mutex.

It's both module_init = NULL _and_ init_text_size = 0 that make sure
the test "within(addr, mod->module_init, mod->init_text_size)" is valid.
Setting just "module_init = NULL" can cause problems with addresses in
the low range of kernel addresses: with a long enough module init
section, the offset from NULL can end up (temporarily) in the kernel
address range.

But this is all wrong: having __module_text_address() rely on
preempt_disable() to ensure coherency of this test is just racy, as you
point out above. So we either do the RCU synchronization properly, or
hold the module_mutex around the module text address test _and_ actual
access to the module init section.

Thanks,

Mathieu

> 
> thanks,
> 
> -Jason
> 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 16:12   ` H. Peter Anvin
@ 2010-09-22 19:43     ` Mathieu Desnoyers
  2010-09-22 20:06       ` H. Peter Anvin
  0 siblings, 1 reply; 21+ messages in thread
From: Mathieu Desnoyers @ 2010-09-22 19:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, jbaron, rostedt, linux-kernel, mingo, tglx, roland,
	rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 09/22/2010 03:08 AM, Andi Kleen wrote:
>>
>> I believe it's also much cleaner than before.
>>
>> I did some performance tests comparing the hash implementation
>> with the binary search. This was with a moderate config x86-64
>> kernel with 68 modules loaded and about 480 trace points active
>> in several modules and 1340 dynamic debug statements.
>>
>
> Now, the ideal data structure for this stuff is something called a  
> minimal perfect hash.  This is something that would have to be generated  
> (for each module and for the monolithic kernel) at compile time --  
> presumably in modpost -- because the static generation of them is fairly  
> complex.
>
> That would provide extremely simple handling in the kernel (just do some  
> shifts and a table lookup and you have your answer) and if combined with  
> an AVL or red-black tree holding the module addresses it would give  
> extremely fast address-to-metadata lookup, not just for this but also  
> for things like exception handling.
>
> I have used MPH's in other projects and pitched them to Linus at one  
> point for exception handling.  What is clear, though, is that we should  
> do the same thing for exceptions, trace points, and any other similar  
> things that depend on exact PC.

That's a very interesting idea, which applies very well to exception
handlers, but tracepoints and static jumps suffer from the problem that
there are many possible instances of the same key.

Tracepoints use a lookup by tracepoint name. Static jumps use a lookup
by associated variable address (but this variable can be associated with
many instances, e.g. in the case of static inline functions, or just
when the same variable is used to control many instances of static
jumps).

But maybe we could find a way to do an initial sort phase, so the
perfect hash could point to the first entry corresponding to the looked
up key ?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 19:43     ` Mathieu Desnoyers
@ 2010-09-22 20:06       ` H. Peter Anvin
  2010-09-22 20:41         ` Mathieu Desnoyers
  0 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2010-09-22 20:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, jbaron, rostedt, linux-kernel, mingo, tglx, roland,
	rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

On 09/22/2010 12:43 PM, Mathieu Desnoyers wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:
>> On 09/22/2010 03:08 AM, Andi Kleen wrote:
> 
> That's a very interesting idea, which applies very well to exception
> handlers, but tracepoints and static jumps suffer from the problem that
> there are many possible instances of the same key.
> 
> Tracepoints use a lookup by tracepoint name. Static jumps use a lookup
> by associated variable address (but this variable can be associated with
> many instances, e.g. in the case of static inline functions, or just
> when the same variable is used to control many instances of static
> jumps).
> 
> But maybe we could find a way to do an initial sort phase, so the
> perfect hash could point to the first entry corresponding to the looked
> up key ?
> 

In the case of multiple instances of the same key you want the perfect
hash to point to the cluster of solutions -- a list.  Since this is by
necessity something that needs to be done at compile time, that list can
simply be an array.

	-hpa

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 20:06       ` H. Peter Anvin
@ 2010-09-22 20:41         ` Mathieu Desnoyers
  2010-09-22 20:54           ` H. Peter Anvin
  0 siblings, 1 reply; 21+ messages in thread
From: Mathieu Desnoyers @ 2010-09-22 20:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, jbaron, rostedt, linux-kernel, mingo, tglx, roland,
	rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

* H. Peter Anvin (hpa@zytor.com) wrote:
> On 09/22/2010 12:43 PM, Mathieu Desnoyers wrote:
> > * H. Peter Anvin (hpa@zytor.com) wrote:
> >> On 09/22/2010 03:08 AM, Andi Kleen wrote:
> > 
> > That's a very interesting idea, which applies very well to exception
> > handlers, but tracepoints and static jumps suffer from the problem that
> > there are many possible instances of the same key.
> > 
> > Tracepoints use a lookup by tracepoint name. Static jumps use a lookup
> > by associated variable address (but this variable can be associated with
> > many instances, e.g. in the case of static inline functions, or just
> > when the same variable is used to control many instances of static
> > jumps).
> > 
> > But maybe we could find a way to do an initial sort phase, so the
> > perfect hash could point to the first entry corresponding to the looked
> > up key ?
> > 
> 
> In the case of multiple instances of the same key you want the perfect
> hash to point to the cluster of solutions -- a list.  Since this is by
> necessity something that needs to be done at compile time, that list can
> simply be an array.

Yep, and sorting the section seems like a very natural way to create
these arrays. So to summarize:

- We add a post-linking step to core image and module build in
  modpost.c.
- This step accesses exception tables, tracepoint and static jump
  sections.
  - Both tracepoint and static jump need to be sorted.
  - For each of the 3 sections, a perfect hash is computed (the
    creation algorithm must be guaranteed to succeed). The perfect hash
    creation should only take into account the first entry of duplicate
    keys.
  - Each of these perfect hashes would translate into C code that would
    need to be compiled in a post-link phase.
  - Then we can link the perfect hash objects with the rest of the code,
    filling in one symbol per considered section (function pointer to
    the perfect hash function) and setting function pointers in struct
    module for modules.

I'm mostly writing this down as food for thoughts, since my own
implementation time is currently focused on other things.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] Rewrite jump_label.c to use binary search
  2010-09-22 20:41         ` Mathieu Desnoyers
@ 2010-09-22 20:54           ` H. Peter Anvin
  0 siblings, 0 replies; 21+ messages in thread
From: H. Peter Anvin @ 2010-09-22 20:54 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, jbaron, rostedt, linux-kernel, mingo, tglx, roland,
	rth, mhiramat, fweisbec, avi, davem, vgoyal, sam, tony,
	Andi Kleen

On 09/22/2010 01:41 PM, Mathieu Desnoyers wrote:
>>
>> In the case of multiple instances of the same key you want the perfect
>> hash to point to the cluster of solutions -- a list.  Since this is by
>> simply be an array.
> 
> Yep, and sorting the section seems like a very natural way to create
> these arrays. So to summarize:
> 
> - We add a post-linking step to core image and module build in
>   modpost.c.
> - This step accesses exception tables, tracepoint and static jump
>   sections.
>   - Both tracepoint and static jump need to be sorted.
>   - For each of the 3 sections, a perfect hash is computed (the
>     creation algorithm must be guaranteed to succeed). The perfect hash
>     creation should only take into account the first entry of duplicate
>     keys.
>   - Each of these perfect hashes would translate into C code that would
>     need to be compiled in a post-link phase.
>   - Then we can link the perfect hash objects with the rest of the code,
>     filling in one symbol per considered section (function pointer to
>     the perfect hash function) and setting function pointers in struct
>     module for modules.
> 
> I'm mostly writing this down as food for thoughts, since my own
> implementation time is currently focused on other things.
> 

For what it's worth, here is a working (verified and in use) perfect
hash generator written in Perl:

http://repo.or.cz/w/nasm.git/tree/HEAD:/perllib

Like most other perfect hash generators it needs a prehash: the prehash
should be parameterizable (seedable) and produce 2*ceil(log2 n) bits of
hash material and cannot have collisions.  The actual phash algorithm
compresses it down to a perfect hash.  The prehash is typically
generated via a pseudorandom algorithm: the particular implementation
pointed to uses one based on CRC64 because it's fast to compute but has
a finite probability of not existing; a universal prehash is guaranteed
to exist but is much more expensive.  In practice a very simple prehash
is usually sufficient, and one goes for speed.

For binary numbers being input, an even simpler prehash based on
multiplies or rotates is generally more than sufficient.

	-hpa

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-09-22 20:55 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-22 10:08 [PATCH 1/2] Add for_each_module iterator function Andi Kleen
2010-09-22 10:08 ` [PATCH 2/2] Rewrite jump_label.c to use binary search Andi Kleen
2010-09-22 11:31   ` Mathieu Desnoyers
2010-09-22 11:56     ` Andi Kleen
2010-09-22 12:04       ` Andi Kleen
2010-09-22 15:02         ` Mathieu Desnoyers
2010-09-22 15:07           ` Mathieu Desnoyers
2010-09-22 15:43             ` Jason Baron
2010-09-22 15:28           ` Jason Baron
2010-09-22 19:19             ` Mathieu Desnoyers
2010-09-22 13:46       ` Jason Baron
2010-09-22 18:14       ` Jason Baron
2010-09-22 11:57     ` Frederic Weisbecker
2010-09-22 16:12   ` H. Peter Anvin
2010-09-22 19:43     ` Mathieu Desnoyers
2010-09-22 20:06       ` H. Peter Anvin
2010-09-22 20:41         ` Mathieu Desnoyers
2010-09-22 20:54           ` H. Peter Anvin
2010-09-22 12:25 ` [PATCH 1/2] Add for_each_module iterator function Thomas Gleixner
2010-09-22 12:52   ` Andi Kleen
2010-09-22 14:03   ` Mathieu Desnoyers
