public inbox for linux-kernel@vger.kernel.org
* [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1
@ 2008-04-09 15:08 Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 01/17] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
                   ` (18 more replies)
  0 siblings, 19 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: Andi Kleen, Rusty Russell

Hi Andrew,

Here is the complete patchset required to get basic Immediate Values support
ported to 2.6.25-rc8-mm1. It provides the "simple", non-NMI-safe version of
immediate values, which means that immediate values must not be used in code
paths reachable by NMI or MCE handlers. This version also uses
stop_machine_run() for immediate value updates, which is a very heavyweight
form of locking. To keep the changes incremental and easy to review, I think
we could start with this "simple" version as a first step before switching to
the NMI-safe version later.
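
To illustrate what an immediate value optimizes, here is a minimal user-space
sketch (the names are hypothetical, not this patchset's API): a rarely-changing
flag is normally re-loaded from memory on every hot-path check, whereas an
immediate value embeds the constant into the instruction stream and rewrites it
by code patching on the rare update.

```c
#include <assert.h>

/* Hypothetical user-space analogue of an immediate value.  The hot path
 * below performs a memory load of a global flag; with immediate values,
 * that load would be replaced by a constant embedded directly in the
 * instruction, and profiling_set() would be the expensive slow path that
 * patches the code (via stop_machine_run() in this version). */
static int prof_on;	/* stands in for the scheduler profiling flag */

/* Hot path: called on every scheduler tick in the real user. */
static int profiling_enabled(void)
{
	return prof_on;
}

/* Slow path: rare update, where the kernel would rewrite the immediate
 * operand in every reader instead of storing to memory. */
static void profiling_set(int on)
{
	prof_on = on;
}
```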

It applies at the end of your series file, in the following order:

# The following patches are required for any kind of immediate values
# implementation
kprobes-use-mutex-for-insn-pages.patch
kprobes-dont-use-kprobes-mutex-in-arch-code.patch
kprobes-declare-kprobes-mutex-static.patch

x86-enhance-debug-rodata-support-alternatives.patch
fix-text-poke-for-vmalloced-pages.patch
x86-enhance-debug-rodata-support-for-hotplug-and-kprobes.patch

text-edit-lock-architecture-independent-code.patch
text-edit-lock-kprobes-architecture-independent-support.patch

# The following patches provide non nmi-safe immediate values
add-all-cpus-option-to-stop-machine-run.patch
immediate-values-architecture-independent-code.patch
implement-immediate-update-via-stop-machine-run.patch
immediate-values-kconfig-menu-in-embedded.patch
immediate-values-x86-optimization.patch
add-text-poke-and-sync-core-to-powerpc.patch
immediate-values-powerpc-optimization.patch
immediate-values-documentation.patch

# These are the immediate values users
scheduler-profiling-use-immediate-values.patch

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 01/17] Kprobes - use a mutex to protect the instruction pages list.
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 20:08   ` Masami Hiramatsu
  2008-04-09 15:08 ` [patch 02/17] Kprobes - do not use kprobes mutex in arch code Mathieu Desnoyers
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers,
	Ananth N Mavinakayanahalli, hch, anil.s.keshavamurthy, davem

[-- Attachment #1: kprobes-use-mutex-for-insn-pages.patch --]
[-- Type: text/plain, Size: 3612 bytes --]

Protect the instruction pages list with a dedicated insn pages mutex, taken in
get_insn_slot() and free_insn_slot(). This makes sure that architectures that
do not need to call arch_remove_kprobe() do not take an unneeded kprobe mutex.
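
As a user-space sketch of the locking pattern this patch introduces (a pthread
mutex standing in for the kernel mutex; slot counts and names are illustrative,
not the kernel's):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Sketch of the pattern from get_insn_slot()/free_insn_slot(): take a
 * dedicated mutex around the slot list, and funnel every return path of
 * the allocator through a single unlock site via "goto end". */
#define SLOTS_PER_PAGE 8

static pthread_mutex_t insn_mutex = PTHREAD_MUTEX_INITIALIZER;
static char slot_used[SLOTS_PER_PAGE];
static long insns[SLOTS_PER_PAGE];

static long *get_insn_slot(void)
{
	long *ret = NULL;
	int i;

	pthread_mutex_lock(&insn_mutex);
	for (i = 0; i < SLOTS_PER_PAGE; i++) {
		if (!slot_used[i]) {
			slot_used[i] = 1;
			ret = &insns[i];
			goto end;	/* single unlock point, as in the patch */
		}
	}
end:
	pthread_mutex_unlock(&insn_mutex);
	return ret;
}

static void free_insn_slot(long *slot)
{
	pthread_mutex_lock(&insn_mutex);
	slot_used[slot - insns] = 0;
	pthread_mutex_unlock(&insn_mutex);
}
```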

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: hch@infradead.org
CC: anil.s.keshavamurthy@intel.com
CC: davem@davemloft.net
---
 kernel/kprobes.c |   27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

Index: linux-2.6-lttng.mm/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.mm.orig/kernel/kprobes.c	2008-04-09 10:28:02.000000000 -0400
+++ linux-2.6-lttng.mm/kernel/kprobes.c	2008-04-09 10:32:34.000000000 -0400
@@ -107,6 +107,10 @@ enum kprobe_slot_state {
 	SLOT_USED = 2,
 };
 
+/*
+ * Protects the kprobe_insn_pages list. Can nest into kprobe_mutex.
+ */
+static DEFINE_MUTEX(kprobe_insn_mutex);
 static struct hlist_head kprobe_insn_pages;
 static int kprobe_garbage_slots;
 static int collect_garbage_slots(void);
@@ -143,7 +147,9 @@ kprobe_opcode_t __kprobes *get_insn_slot
 {
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;
+	kprobe_opcode_t *ret;
 
+	mutex_lock(&kprobe_insn_mutex);
  retry:
 	hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
 		if (kip->nused < INSNS_PER_PAGE) {
@@ -152,7 +158,8 @@ kprobe_opcode_t __kprobes *get_insn_slot
 				if (kip->slot_used[i] == SLOT_CLEAN) {
 					kip->slot_used[i] = SLOT_USED;
 					kip->nused++;
-					return kip->insns + (i * MAX_INSN_SIZE);
+					ret = kip->insns + (i * MAX_INSN_SIZE);
+					goto end;
 				}
 			}
 			/* Surprise!  No unused slots.  Fix kip->nused. */
@@ -166,8 +173,10 @@ kprobe_opcode_t __kprobes *get_insn_slot
 	}
 	/* All out of space.  Need to allocate a new page. Use slot 0. */
 	kip = kmalloc(sizeof(struct kprobe_insn_page), GFP_KERNEL);
-	if (!kip)
-		return NULL;
+	if (!kip) {
+		ret = NULL;
+		goto end;
+	}
 
 	/*
 	 * Use module_alloc so this page is within +/- 2GB of where the
@@ -177,7 +186,8 @@ kprobe_opcode_t __kprobes *get_insn_slot
 	kip->insns = module_alloc(PAGE_SIZE);
 	if (!kip->insns) {
 		kfree(kip);
-		return NULL;
+		ret = NULL;
+		goto end;
 	}
 	INIT_HLIST_NODE(&kip->hlist);
 	hlist_add_head(&kip->hlist, &kprobe_insn_pages);
@@ -185,7 +195,10 @@ kprobe_opcode_t __kprobes *get_insn_slot
 	kip->slot_used[0] = SLOT_USED;
 	kip->nused = 1;
 	kip->ngarbage = 0;
-	return kip->insns;
+	ret = kip->insns;
+end:
+	mutex_unlock(&kprobe_insn_mutex);
+	return ret;
 }
 
 /* Return 1 if all garbages are collected, otherwise 0. */
@@ -219,7 +232,7 @@ static int __kprobes collect_garbage_slo
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos, *next;
 
-	/* Ensure no-one is preepmted on the garbages */
+	/* Ensure no-one is preempted on the garbages */
 	if (check_safety() != 0)
 		return -EAGAIN;
 
@@ -243,6 +256,7 @@ void __kprobes free_insn_slot(kprobe_opc
 	struct kprobe_insn_page *kip;
 	struct hlist_node *pos;
 
+	mutex_lock(&kprobe_insn_mutex);
 	hlist_for_each_entry(kip, pos, &kprobe_insn_pages, hlist) {
 		if (kip->insns <= slot &&
 		    slot < kip->insns + (INSNS_PER_PAGE * MAX_INSN_SIZE)) {
@@ -259,6 +273,7 @@ void __kprobes free_insn_slot(kprobe_opc
 
 	if (dirty && ++kprobe_garbage_slots > INSNS_PER_PAGE)
 		collect_garbage_slots();
+	mutex_unlock(&kprobe_insn_mutex);
 }
 #endif
 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 02/17] Kprobes - do not use kprobes mutex in arch code
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 01/17] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 20:08   ` Masami Hiramatsu
  2008-04-09 15:08 ` [patch 03/17] Kprobes - declare kprobe_mutex static Mathieu Desnoyers
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers,
	Ananth N Mavinakayanahalli, anil.s.keshavamurthy, davem

[-- Attachment #1: kprobes-dont-use-kprobes-mutex-in-arch-code.patch --]
[-- Type: text/plain, Size: 4186 bytes --]

Remove the kprobes mutex from kprobes.h, since it does not belong there. Also
remove all use of this mutex in the architecture-specific code, replacing it
with a proper mutex lock/unlock in the architecture-agnostic code.

Changelog:
- remove the unnecessary kprobe_mutex around arch_remove_kprobe()

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: anil.s.keshavamurthy@intel.com
CC: davem@davemloft.net
---
 arch/ia64/kernel/kprobes.c    |    2 --
 arch/powerpc/kernel/kprobes.c |    2 --
 arch/s390/kernel/kprobes.c    |    2 --
 arch/x86/kernel/kprobes.c     |    2 --
 include/linux/kprobes.h       |    2 --
 kernel/kprobes.c              |    2 ++
 6 files changed, 2 insertions(+), 10 deletions(-)

Index: linux-2.6-lttng.mm/include/linux/kprobes.h
===================================================================
--- linux-2.6-lttng.mm.orig/include/linux/kprobes.h	2008-04-09 10:43:10.000000000 -0400
+++ linux-2.6-lttng.mm/include/linux/kprobes.h	2008-04-09 10:43:38.000000000 -0400
@@ -35,7 +35,6 @@
 #include <linux/percpu.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
-#include <linux/mutex.h>
 
 #ifdef CONFIG_KPROBES
 #include <asm/kprobes.h>
@@ -202,7 +201,6 @@ static inline int init_test_probes(void)
 #endif /* CONFIG_KPROBES_SANITY_TEST */
 
 extern spinlock_t kretprobe_lock;
-extern struct mutex kprobe_mutex;
 extern int arch_prepare_kprobe(struct kprobe *p);
 extern void arch_arm_kprobe(struct kprobe *p);
 extern void arch_disarm_kprobe(struct kprobe *p);
Index: linux-2.6-lttng.mm/arch/x86/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/x86/kernel/kprobes.c	2008-04-09 10:43:10.000000000 -0400
+++ linux-2.6-lttng.mm/arch/x86/kernel/kprobes.c	2008-04-09 10:43:38.000000000 -0400
@@ -376,9 +376,7 @@ void __kprobes arch_disarm_kprobe(struct
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
 {
-	mutex_lock(&kprobe_mutex);
 	free_insn_slot(p->ainsn.insn, (p->ainsn.boostable == 1));
-	mutex_unlock(&kprobe_mutex);
 }
 
 static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
Index: linux-2.6-lttng.mm/arch/ia64/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/ia64/kernel/kprobes.c	2008-04-09 10:43:10.000000000 -0400
+++ linux-2.6-lttng.mm/arch/ia64/kernel/kprobes.c	2008-04-09 10:43:38.000000000 -0400
@@ -672,9 +672,7 @@ void __kprobes arch_disarm_kprobe(struct
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
 {
-	mutex_lock(&kprobe_mutex);
 	free_insn_slot(p->ainsn.insn, p->ainsn.inst_flag & INST_FLAG_BOOSTABLE);
-	mutex_unlock(&kprobe_mutex);
 }
 /*
  * We are resuming execution after a single step fault, so the pt_regs
Index: linux-2.6-lttng.mm/arch/powerpc/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/powerpc/kernel/kprobes.c	2008-04-09 10:43:10.000000000 -0400
+++ linux-2.6-lttng.mm/arch/powerpc/kernel/kprobes.c	2008-04-09 10:43:38.000000000 -0400
@@ -88,9 +88,7 @@ void __kprobes arch_disarm_kprobe(struct
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
 {
-	mutex_lock(&kprobe_mutex);
 	free_insn_slot(p->ainsn.insn, 0);
-	mutex_unlock(&kprobe_mutex);
 }
 
 static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
Index: linux-2.6-lttng.mm/arch/s390/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/s390/kernel/kprobes.c	2008-04-09 10:43:10.000000000 -0400
+++ linux-2.6-lttng.mm/arch/s390/kernel/kprobes.c	2008-04-09 10:43:38.000000000 -0400
@@ -220,9 +220,7 @@ void __kprobes arch_disarm_kprobe(struct
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
 {
-	mutex_lock(&kprobe_mutex);
 	free_insn_slot(p->ainsn.insn, 0);
-	mutex_unlock(&kprobe_mutex);
 }
 
 static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs)

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 03/17] Kprobes - declare kprobe_mutex static
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 01/17] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 02/17] Kprobes - do not use kprobes mutex in arch code Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 20:08   ` Masami Hiramatsu
  2008-04-09 15:08 ` [patch 04/17] x86 - Enhance DEBUG_RODATA support - alternatives Mathieu Desnoyers
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers,
	Ananth N Mavinakayanahalli, hch, anil.s.keshavamurthy, davem

[-- Attachment #1: kprobes-declare-kprobes-mutex-static.patch --]
[-- Type: text/plain, Size: 1205 bytes --]

Since kprobe_mutex is not used by any other kernel object, it makes sense to
declare it static.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: hch@infradead.org
CC: anil.s.keshavamurthy@intel.com
CC: davem@davemloft.net
---
 kernel/kprobes.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-lttng/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.orig/kernel/kprobes.c	2007-08-19 09:09:15.000000000 -0400
+++ linux-2.6-lttng/kernel/kprobes.c	2007-08-19 17:18:07.000000000 -0400
@@ -68,7 +68,7 @@ static struct hlist_head kretprobe_inst_
 /* NOTE: change this value only with kprobe_mutex held */
 static bool kprobe_enabled;
 
-DEFINE_MUTEX(kprobe_mutex);		/* Protects kprobe_table */
+static DEFINE_MUTEX(kprobe_mutex);	/* Protects kprobe_table */
 DEFINE_SPINLOCK(kretprobe_lock);	/* Protects kretprobe_inst_table */
 static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 04/17] x86 - Enhance DEBUG_RODATA support - alternatives
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2008-04-09 15:08 ` [patch 03/17] Kprobes - declare kprobe_mutex static Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 05/17] x86 Fix text_poke for vmalloced pages Mathieu Desnoyers
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, pageexec,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge

[-- Attachment #1: x86-enhance-debug-rodata-support-alternatives.patch --]
[-- Type: text/plain, Size: 9750 bytes --]

Fix a memcpy that should be a text_poke (in apply_alternatives).

Use kernel_wp_save/kernel_wp_restore in text_poke to support DEBUG_RODATA
correctly, so that the CPU HOTPLUG special case can be removed.

Add text_poke_early, for alternatives and paravirt boot-time and module
load time patching.

Changelog:

- Fix the text_set and text_poke alignment check (it mixed up bitwise AND
  and OR).
- Remove text_set.
- Export add_nops, so it can be used by others.
- Document text_poke_early.
- Remove clflush, since it breaks some VIA CPUs and is not strictly
  necessary.
- Add kerneldoc to text_poke and text_poke_early.
- Create a second vmap instead of using the WP bit, to support Xen and VMI.
- Move the local_irq disabling inside text_poke and text_poke_early so that
  these functions can sleep.
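
The shadow-vmap trick can be sketched in user space (a temporary file stands
in for the physical page backing the kernel text; names are illustrative):
instead of flipping write protection on the read-only view, the same
underlying page is mapped a second time with write permission and the patch
is written through the alias.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* User-space analogue of text_poke()'s second mapping: create one
 * read-only view ("kernel text") and one writable alias of the same
 * page, write through the alias, and observe the change through the
 * read-only view.  Returns 1 on success, -1 on setup failure. */
static int poke_demo(void)
{
	long page = sysconf(_SC_PAGE_SIZE);
	char path[] = "/tmp/pokeXXXXXX";
	int fd = mkstemp(path);
	char *ro, *rw;
	int ok;

	if (fd < 0 || ftruncate(fd, page) < 0)
		return -1;
	unlink(path);

	/* "kernel text": a read-only mapping of the page */
	ro = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
	/* shadow mapping: the same page, writable, at another address */
	rw = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ro == MAP_FAILED || rw == MAP_FAILED)
		return -1;

	rw[16] = (char)0x90;		/* "patch" one byte (an x86 NOP) */
	ok = (ro[16] == (char)0x90);	/* visible through the RO view */

	munmap(rw, page);
	munmap(ro, page);
	close(fd);
	return ok;
}
```

The kernel version additionally disables interrupts around the memcpy and
issues sync_core() afterwards, which has no user-space equivalent here.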

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/kernel/alternative.c |   88 +++++++++++++++++++++++++++++++-----------
 include/asm-x86/alternative.h |   23 ++++++++++
 2 files changed, 87 insertions(+), 24 deletions(-)

Index: linux-2.6-lttng.mm/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/x86/kernel/alternative.c	2008-04-09 11:02:43.000000000 -0400
+++ linux-2.6-lttng.mm/arch/x86/kernel/alternative.c	2008-04-09 11:04:09.000000000 -0400
@@ -5,12 +5,14 @@
 #include <linux/kprobes.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
+#include <linux/io.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
 #include <asm/pgtable.h>
 #include <asm/mce.h>
 #include <asm/nmi.h>
 #include <asm/vsyscall.h>
+#include <asm/cacheflush.h>
 
 #define MAX_PATCH_LEN (255-1)
 
@@ -173,7 +175,7 @@ static const unsigned char*const * find_
 #endif /* CONFIG_X86_64 */
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void add_nops(void *insns, unsigned int len)
+void add_nops(void *insns, unsigned int len)
 {
 	const unsigned char *const *noptable = find_nop_table();
 
@@ -186,6 +188,7 @@ static void add_nops(void *insns, unsign
 		len -= noplen;
 	}
 }
+EXPORT_SYMBOL_GPL(add_nops);
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern u8 *__smp_locks[], *__smp_locks_end[];
@@ -219,7 +222,7 @@ void apply_alternatives(struct alt_instr
 		memcpy(insnbuf, a->replacement, a->replacementlen);
 		add_nops(insnbuf + a->replacementlen,
 			 a->instrlen - a->replacementlen);
-		text_poke(instr, insnbuf, a->instrlen);
+		text_poke_early(instr, insnbuf, a->instrlen);
 	}
 }
 
@@ -280,7 +283,6 @@ void alternatives_smp_module_add(struct 
 				 void *text,  void *text_end)
 {
 	struct smp_alt_module *smp;
-	unsigned long flags;
 
 	if (noreplace_smp)
 		return;
@@ -306,39 +308,37 @@ void alternatives_smp_module_add(struct 
 		__FUNCTION__, smp->locks, smp->locks_end,
 		smp->text, smp->text_end, smp->name);
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 	list_add_tail(&smp->next, &smp_alt_modules);
 	if (boot_cpu_has(X86_FEATURE_UP))
 		alternatives_smp_unlock(smp->locks, smp->locks_end,
 					smp->text, smp->text_end);
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 void alternatives_smp_module_del(struct module *mod)
 {
 	struct smp_alt_module *item;
-	unsigned long flags;
 
 	if (smp_alt_once || noreplace_smp)
 		return;
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 	list_for_each_entry(item, &smp_alt_modules, next) {
 		if (mod != item->mod)
 			continue;
 		list_del(&item->next);
-		spin_unlock_irqrestore(&smp_alt, flags);
+		spin_unlock(&smp_alt);
 		DPRINTK("%s: %s\n", __FUNCTION__, item->name);
 		kfree(item);
 		return;
 	}
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 void alternatives_smp_switch(int smp)
 {
 	struct smp_alt_module *mod;
-	unsigned long flags;
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -355,7 +355,7 @@ void alternatives_smp_switch(int smp)
 		return;
 	BUG_ON(!smp && (num_online_cpus() > 1));
 
-	spin_lock_irqsave(&smp_alt, flags);
+	spin_lock(&smp_alt);
 
 	/*
 	 * Avoid unnecessary switches because it forces JIT based VMs to
@@ -379,7 +379,7 @@ void alternatives_smp_switch(int smp)
 						mod->text, mod->text_end);
 	}
 	smp_mode = smp;
-	spin_unlock_irqrestore(&smp_alt, flags);
+	spin_unlock(&smp_alt);
 }
 
 #endif
@@ -407,7 +407,7 @@ void apply_paravirt(struct paravirt_patc
 
 		/* Pad the rest with nops */
 		add_nops(insnbuf + used, p->len - used);
-		text_poke(p->instr, insnbuf, p->len);
+		text_poke_early(p->instr, insnbuf, p->len);
 	}
 }
 extern struct paravirt_patch_site __start_parainstructions[],
@@ -416,8 +416,6 @@ extern struct paravirt_patch_site __star
 
 void __init alternative_instructions(void)
 {
-	unsigned long flags;
-
 	/* The patching is not fully atomic, so try to avoid local interruptions
 	   that might execute the to be patched code.
 	   Other CPUs are not running. */
@@ -426,7 +424,6 @@ void __init alternative_instructions(voi
 	stop_mce();
 #endif
 
-	local_irq_save(flags);
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 	/* switch to patch-once-at-boottime-only mode and free the
@@ -458,7 +455,6 @@ void __init alternative_instructions(voi
 	}
 #endif
  	apply_paravirt(__parainstructions, __parainstructions_end);
-	local_irq_restore(flags);
 
 	if (smp_alt_once)
 		free_init_pages("SMP alternatives",
@@ -471,18 +467,64 @@ void __init alternative_instructions(voi
 #endif
 }
 
-/*
- * Warning:
+/**
+ * text_poke_early - Update instructions on a live kernel at boot time
+ * @addr: address to modify
+ * @opcode: source of the copy
+ * @len: length to copy
+ *
  * When you use this code to patch more than one byte of an instruction
  * you need to make sure that other CPUs cannot execute this code in parallel.
- * Also no thread must be currently preempted in the middle of these instructions.
- * And on the local CPU you need to be protected again NMI or MCE handlers
- * seeing an inconsistent instruction while you patch.
+ * Also no thread must be currently preempted in the middle of these
+ * instructions. And on the local CPU you need to be protected against NMI or MCE
+ * handlers seeing an inconsistent instruction while you patch.
  */
-void __kprobes text_poke(void *addr, unsigned char *opcode, int len)
+void *text_poke_early(void *addr, const void *opcode, size_t len)
 {
+	unsigned long flags;
+	local_irq_save(flags);
 	memcpy(addr, opcode, len);
+	local_irq_restore(flags);
+	sync_core();
+	/* Could also do a CLFLUSH here to speed up CPU recovery; but
+	   that causes hangs on some VIA CPUs. */
+	return addr;
+}
+
+/**
+ * text_poke - Update instructions on a live kernel
+ * @addr: address to modify
+ * @opcode: source of the copy
+ * @len: length to copy
+ *
+ * Only atomic text poke/set should be allowed when not doing early patching.
+ * It means the size must be writable atomically and the address must be aligned
+ * in a way that permits an atomic write. It also makes sure we fit on a single
+ * page.
+ */
+void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
+{
+	unsigned long flags;
+	char *vaddr;
+	int nr_pages = 2;
+
+	BUG_ON(len > sizeof(long));
+	BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
+		- ((long)addr & ~(sizeof(long) - 1)));
+	{
+		struct page *pages[2] = { virt_to_page(addr),
+			virt_to_page(addr + PAGE_SIZE) };
+		if (!pages[1])
+			nr_pages = 1;
+		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+		WARN_ON(!vaddr);
+		local_irq_save(flags);
+		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
+		local_irq_restore(flags);
+		vunmap(vaddr);
+	}
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
+	return addr;
 }
Index: linux-2.6-lttng.mm/include/asm-x86/alternative.h
===================================================================
--- linux-2.6-lttng.mm.orig/include/asm-x86/alternative.h	2008-04-09 11:02:43.000000000 -0400
+++ linux-2.6-lttng.mm/include/asm-x86/alternative.h	2008-04-09 11:03:04.000000000 -0400
@@ -156,6 +156,27 @@ apply_paravirt(struct paravirt_patch_sit
 #define __parainstructions_end	NULL
 #endif
 
-extern void text_poke(void *addr, unsigned char *opcode, int len);
+extern void add_nops(void *insns, unsigned int len);
+
+/*
+ * Clear and restore the kernel write-protection flag on the local CPU.
+ * Allows the kernel to edit read-only pages.
+ * Side-effect: any interrupt handler running between save and restore will have
+ * the ability to write to read-only pages.
+ *
+ * Warning:
+ * Code patching in the UP case is safe if NMIs and MCE handlers are stopped and
+ * no thread can be preempted in the instructions being modified (no iret to an
+ * invalid instruction possible) or if the instructions are changed from a
+ * consistent state to another consistent state atomically.
+ * More care must be taken when modifying code in the SMP case because of
+ * Intel's errata.
+ * On the local CPU you need to be protected against NMI or MCE handlers seeing an
+ * inconsistent instruction while you patch.
+ * The _early version expects the memory to already be RW.
+ */
+
+extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern void *text_poke_early(void *addr, const void *opcode, size_t len);
 
 #endif /* _ASM_X86_ALTERNATIVE_H */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 05/17] x86 Fix text_poke for vmalloced pages
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2008-04-09 15:08 ` [patch 04/17] x86 - Enhance DEBUG_RODATA support - alternatives Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 06/17] x86 - Enhance DEBUG_RODATA support for hotplug and kprobes Mathieu Desnoyers
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers

[-- Attachment #1: fix-text-poke-for-vmalloced-pages.patch --]
[-- Type: text/plain, Size: 2105 bytes --]

The shadow vmap used for DEBUG_RODATA kernel text modification gets the pages
with virt_to_page for core kernel addresses and with vmalloc_to_page for
vmalloc'ed (module) addresses.

Changelog:
- Deal with read-only module text.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/alternative.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

Index: linux-2.6-lttng/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/alternative.c	2008-03-24 10:00:30.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/alternative.c	2008-03-24 10:18:19.000000000 -0400
@@ -507,22 +507,27 @@ void *__kprobes text_poke(void *addr, co
 	unsigned long flags;
 	char *vaddr;
 	int nr_pages = 2;
+	struct page *pages[2];
 
 	BUG_ON(len > sizeof(long));
 	BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
 		- ((long)addr & ~(sizeof(long) - 1)));
-	{
-		struct page *pages[2] = { virt_to_page(addr),
-			virt_to_page(addr + PAGE_SIZE) };
-		if (!pages[1])
-			nr_pages = 1;
-		vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
-		WARN_ON(!vaddr);
-		local_irq_save(flags);
-		memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-		local_irq_restore(flags);
-		vunmap(vaddr);
+	if (is_vmalloc_addr(addr)) {
+		pages[0] = vmalloc_to_page(addr);
+		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
+	} else {
+		pages[0] = virt_to_page(addr);
+		pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
+	BUG_ON(!pages[0]);
+	if (!pages[1])
+		nr_pages = 1;
+	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+	BUG_ON(!vaddr);
+	local_irq_save(flags);
+	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
+	local_irq_restore(flags);
+	vunmap(vaddr);
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 06/17] x86 - Enhance DEBUG_RODATA support for hotplug and kprobes
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2008-04-09 15:08 ` [patch 05/17] x86 Fix text_poke for vmalloced pages Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 07/17] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, pageexec,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge

[-- Attachment #1: x86-enhance-debug-rodata-support-for-hotplug-and-kprobes.patch --]
[-- Type: text/plain, Size: 3432 bytes --]

Standardize DEBUG_RODATA, removing special cases for hotplug and kprobes.

Changelog:
- Use is_rodata_addr to check which pages must be set RW/RO under DEBUG_RODATA.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: pageexec@freemail.hu
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
---
 arch/x86/mm/init_32.c |   24 ++++++++----------------
 arch/x86/mm/init_64.c |   20 +++-----------------
 2 files changed, 11 insertions(+), 33 deletions(-)

Index: linux-2.6-lttng.mm/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/x86/mm/init_32.c	2008-04-09 10:24:48.000000000 -0400
+++ linux-2.6-lttng.mm/arch/x86/mm/init_32.c	2008-04-09 10:34:26.000000000 -0400
@@ -723,25 +723,17 @@ void mark_rodata_ro(void)
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long size = PFN_ALIGN(_etext) - start;
 
-#ifndef CONFIG_KPROBES
-#ifdef CONFIG_HOTPLUG_CPU
-	/* It must still be possible to apply SMP alternatives. */
-	if (num_possible_cpus() <= 1)
-#endif
-	{
-		set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
-		printk(KERN_INFO "Write protecting the kernel text: %luk\n",
-			size >> 10);
+	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
+	printk(KERN_INFO "Write protecting the kernel text: %luk\n",
+		size >> 10);
 
 #ifdef CONFIG_CPA_DEBUG
-		printk(KERN_INFO "Testing CPA: Reverting %lx-%lx\n",
-			start, start+size);
-		set_pages_rw(virt_to_page(start), size>>PAGE_SHIFT);
+	printk(KERN_INFO "Testing CPA: Reverting %lx-%lx\n",
+		start, start+size);
+	set_pages_rw(virt_to_page(start), size>>PAGE_SHIFT);
 
-		printk(KERN_INFO "Testing CPA: write protecting again\n");
-		set_pages_ro(virt_to_page(start), size>>PAGE_SHIFT);
-#endif
-	}
+	printk(KERN_INFO "Testing CPA: write protecting again\n");
+	set_pages_ro(virt_to_page(start), size>>PAGE_SHIFT);
 #endif
 	start += size;
 	size = (unsigned long)__end_rodata - start;
Index: linux-2.6-lttng.mm/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/x86/mm/init_64.c	2008-04-09 10:25:48.000000000 -0400
+++ linux-2.6-lttng.mm/arch/x86/mm/init_64.c	2008-04-09 10:34:26.000000000 -0400
@@ -632,23 +632,8 @@ EXPORT_SYMBOL_GPL(rodata_test_data);
 
 void mark_rodata_ro(void)
 {
-	unsigned long start = (unsigned long)_stext, end;
-
-#ifdef CONFIG_HOTPLUG_CPU
-	/* It must still be possible to apply SMP alternatives. */
-	if (num_possible_cpus() > 1)
-		start = (unsigned long)_etext;
-#endif
-
-#ifdef CONFIG_KPROBES
-	start = (unsigned long)__start_rodata;
-#endif
-
-	end = (unsigned long)__end_rodata;
-	start = (start + PAGE_SIZE - 1) & PAGE_MASK;
-	end &= PAGE_MASK;
-	if (end <= start)
-		return;
+	unsigned long start = PFN_ALIGN(_stext),
+			end = PFN_ALIGN(__end_rodata);
 
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
@@ -679,6 +664,7 @@ void free_initrd_mem(unsigned long start
 {
 	free_init_pages("initrd memory", start, end);
 }
+
 #endif
 
 void __init reserve_bootmem_generic(unsigned long phys, unsigned len)

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 07/17] Text Edit Lock - Architecture Independent Code
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2008-04-09 15:08 ` [patch 06/17] x86 - Enhance DEBUG_RODATA support for hotplug and kprobes Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 08/17] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers

[-- Attachment #1: text-edit-lock-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 3188 bytes --]

This provides architecture-independent synchronization of kernel text
modifications through the use of a global mutex.

A mutex has been chosen so that kprobes, the main user of this, can sleep during
memory allocation between the memory read of the instructions it must replace
and the memory write of the breakpoint.

Another user of this interface: immediate values.

Paravirt and alternatives patching are always done while SMP is inactive, so
they do not need this lock.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: Ingo Molnar <mingo@elte.hu>
---
 include/linux/memory.h |    7 +++++++
 mm/memory.c            |   34 ++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

Index: linux-2.6-lttng.mm/include/linux/memory.h
===================================================================
--- linux-2.6-lttng.mm.orig/include/linux/memory.h	2008-04-09 10:28:15.000000000 -0400
+++ linux-2.6-lttng.mm/include/linux/memory.h	2008-04-09 10:34:30.000000000 -0400
@@ -99,4 +99,11 @@ extern int memory_notify(unsigned long v
 #define hotplug_memory_notifier(fn, pri) do { } while (0)
 #endif
 
+/*
+ * Take and release the kernel text modification lock, used for code patching.
+ * Users of this lock can sleep.
+ */
+extern void kernel_text_lock(void);
+extern void kernel_text_unlock(void);
+
 #endif /* _LINUX_MEMORY_H_ */
Index: linux-2.6-lttng.mm/mm/memory.c
===================================================================
--- linux-2.6-lttng.mm.orig/mm/memory.c	2008-04-09 10:27:34.000000000 -0400
+++ linux-2.6-lttng.mm/mm/memory.c	2008-04-09 10:34:30.000000000 -0400
@@ -51,6 +51,8 @@
 #include <linux/init.h>
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
+#include <linux/kprobes.h>
+#include <linux/mutex.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -96,6 +98,12 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
+/*
+ * mutex protecting text section modification (dynamic code patching).
+ * some users need to sleep (allocating memory...) while they hold this lock.
+ */
+static DEFINE_MUTEX(text_mutex);
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
@@ -2807,3 +2815,29 @@ void print_vma_addr(char *prefix, unsign
 	}
 	up_read(&current->mm->mmap_sem);
 }
+
+/**
+ * kernel_text_lock     -   Take the kernel text modification lock
+ *
+ * Ensures mutually exclusive modification of live kernel and module text.
+ * Should be used for code patching.
+ * Users of this lock can sleep.
+ */
+void __kprobes kernel_text_lock(void)
+{
+	mutex_lock(&text_mutex);
+}
+EXPORT_SYMBOL_GPL(kernel_text_lock);
+
+/**
+ * kernel_text_unlock   -   Release the kernel text modification lock
+ *
+ * Ensures mutually exclusive modification of live kernel and module text.
+ * Should be used for code patching.
+ * Users of this lock can sleep.
+ */
+void __kprobes kernel_text_unlock(void)
+{
+	mutex_unlock(&text_mutex);
+}
+EXPORT_SYMBOL_GPL(kernel_text_unlock);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 08/17] Text Edit Lock - kprobes architecture independent support
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2008-04-09 15:08 ` [patch 07/17] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 09/17] Add all cpus option to stop machine run Mathieu Desnoyers
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers,
	Ananth N Mavinakayanahalli, anil.s.keshavamurthy, davem,
	Roel Kluin

[-- Attachment #1: text-edit-lock-kprobes-architecture-independent-support.patch --]
[-- Type: text/plain, Size: 3099 bytes --]

Use the mutual exclusion provided by the text edit lock in the kprobes code.
This allows other subsystems to manipulate the kernel code coherently.

Changelog:

Move the kernel_text_lock/unlock out of the for loops.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: ananth@in.ibm.com
CC: anil.s.keshavamurthy@intel.com
CC: davem@davemloft.net
CC: Roel Kluin <12o3l@tiscali.nl>
---
 kernel/kprobes.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

Index: linux-2.6-lttng.mm/kernel/kprobes.c
===================================================================
--- linux-2.6-lttng.mm.orig/kernel/kprobes.c	2008-04-09 10:52:33.000000000 -0400
+++ linux-2.6-lttng.mm/kernel/kprobes.c	2008-04-09 10:52:43.000000000 -0400
@@ -43,6 +43,7 @@
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
 #include <linux/kdebug.h>
+#include <linux/memory.h>
 
 #include <asm-generic/sections.h>
 #include <asm/cacheflush.h>
@@ -618,9 +619,10 @@ static int __kprobes __register_kprobe(s
 		goto out;
 	}
 
+	kernel_text_lock();
 	ret = arch_prepare_kprobe(p);
 	if (ret)
-		goto out;
+		goto out_unlock_text;
 
 	INIT_HLIST_NODE(&p->hlist);
 	hlist_add_head_rcu(&p->hlist,
@@ -628,7 +630,8 @@ static int __kprobes __register_kprobe(s
 
 	if (kprobe_enabled)
 		arch_arm_kprobe(p);
-
+out_unlock_text:
+	kernel_text_unlock();
 out:
 	mutex_unlock(&kprobe_mutex);
 
@@ -664,8 +667,11 @@ valid_p:
 		 * enabled - otherwise, the breakpoint would already have
 		 * been removed. We save on flushing icache.
 		 */
-		if (kprobe_enabled)
+		if (kprobe_enabled) {
+			kernel_text_lock();
 			arch_disarm_kprobe(p);
+			kernel_text_unlock();
+		}
 		hlist_del_rcu(&old_p->hlist);
 	} else {
 		if (p->break_handler)
@@ -862,7 +868,6 @@ static int __kprobes pre_handler_kretpro
 		}
 
 		arch_prepare_kretprobe(ri, regs);
-
 		/* XXX(hch): why is there no hlist_move_head? */
 		hlist_del(&ri->uflist);
 		hlist_add_head(&ri->uflist, &ri->rp->used_instances);
@@ -1162,11 +1167,13 @@ static void __kprobes enable_all_kprobes
 	if (kprobe_enabled)
 		goto already_enabled;
 
+	kernel_text_lock();
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist)
 			arch_arm_kprobe(p);
 	}
+	kernel_text_unlock();
 
 	kprobe_enabled = true;
 	printk(KERN_INFO "Kprobes globally enabled\n");
@@ -1191,6 +1198,7 @@ static void __kprobes disable_all_kprobe
 
 	kprobe_enabled = false;
 	printk(KERN_INFO "Kprobes globally disabled\n");
+	kernel_text_lock();
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist) {
@@ -1198,6 +1206,7 @@ static void __kprobes disable_all_kprobe
 				arch_disarm_kprobe(p);
 		}
 	}
+	kernel_text_unlock();
 
 	mutex_unlock(&kprobe_mutex);
 	/* Allow all currently running kprobes to complete */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 09/17] Add all cpus option to stop machine run
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2008-04-09 15:08 ` [patch 08/17] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 18:10   ` Alexey Dobriyan
  2008-04-09 15:08 ` [patch 10/17] Immediate Values - Architecture Independent Code Mathieu Desnoyers
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Jason Baron, Mathieu Desnoyers,
	Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, akpm

[-- Attachment #1: add-all-cpus-option-to-stop-machine-run.patch --]
[-- Type: text/plain, Size: 4369 bytes --]

- Allow stop_machine_run() to call a function on all cpus. Calling
  stop_machine_run() with 'ALL_CPUS' invokes this new behavior.
  stop_machine_run() proceeds as normal until the calling cpu has
  invoked 'fn'. Then, we tell all the other cpus to call 'fn'.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
---

 include/linux/stop_machine.h |    8 +++++++-
 kernel/stop_machine.c        |   31 ++++++++++++++++++++++++-------
 2 files changed, 31 insertions(+), 8 deletions(-)


Index: linux-2.6-lttng/include/linux/stop_machine.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/stop_machine.h	2008-02-27 13:20:14.000000000 -0500
+++ linux-2.6-lttng/include/linux/stop_machine.h	2008-02-29 08:32:32.000000000 -0500
@@ -8,11 +8,17 @@
 #include <asm/system.h>
 
 #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+
+#define ALL_CPUS ~0U
+
 /**
  * stop_machine_run: freeze the machine on all CPUs and run this function
  * @fn: the function to run
  * @data: the data ptr for the @fn()
- * @cpu: the cpu to run @fn() on (or any, if @cpu == NR_CPUS.
+ * @cpu: if @cpu == n, run @fn() on cpu n
+ *       if @cpu == NR_CPUS, run @fn() on any cpu
+ *       if @cpu == ALL_CPUS, run @fn() first on the calling cpu, and then
+ *       concurrently on all the other cpus
  *
  * Description: This causes a thread to be scheduled on every other cpu,
  * each of which disables interrupts, and finally interrupts are disabled
Index: linux-2.6-lttng/kernel/stop_machine.c
===================================================================
--- linux-2.6-lttng.orig/kernel/stop_machine.c	2008-02-27 13:20:14.000000000 -0500
+++ linux-2.6-lttng/kernel/stop_machine.c	2008-02-29 08:34:25.000000000 -0500
@@ -23,9 +23,17 @@ enum stopmachine_state {
 	STOPMACHINE_WAIT,
 	STOPMACHINE_PREPARE,
 	STOPMACHINE_DISABLE_IRQ,
+	STOPMACHINE_RUN,
 	STOPMACHINE_EXIT,
 };
 
+struct stop_machine_data {
+	int (*fn)(void *);
+	void *data;
+	struct completion done;
+	int run_all;
+} smdata;
+
 static enum stopmachine_state stopmachine_state;
 static unsigned int stopmachine_num_threads;
 static atomic_t stopmachine_thread_ack;
@@ -34,6 +42,7 @@ static int stopmachine(void *cpu)
 {
 	int irqs_disabled = 0;
 	int prepared = 0;
+	int ran = 0;
 
 	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
 
@@ -58,6 +67,11 @@ static int stopmachine(void *cpu)
 			prepared = 1;
 			smp_mb(); /* Must read state first. */
 			atomic_inc(&stopmachine_thread_ack);
+		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
+			smdata.fn(smdata.data);
+			ran = 1;
+			smp_mb(); /* Must read state first. */
+			atomic_inc(&stopmachine_thread_ack);
 		}
 		/* Yield in first stage: migration threads need to
 		 * help our sisters onto their CPUs. */
@@ -135,12 +149,10 @@ static void restart_machine(void)
 	preempt_enable_no_resched();
 }
 
-struct stop_machine_data
+static void run_other_cpus(void)
 {
-	int (*fn)(void *);
-	void *data;
-	struct completion done;
-};
+	stopmachine_set_state(STOPMACHINE_RUN);
+}
 
 static int do_stop(void *_smdata)
 {
@@ -150,6 +162,8 @@ static int do_stop(void *_smdata)
 	ret = stop_machine();
 	if (ret == 0) {
 		ret = smdata->fn(smdata->data);
+		if (smdata->run_all)
+			run_other_cpus();
 		restart_machine();
 	}
 
@@ -173,14 +187,17 @@ struct task_struct *__stop_machine_run(i
 	struct stop_machine_data smdata;
 	struct task_struct *p;
 
+	mutex_lock(&stopmachine_mutex);
+
 	smdata.fn = fn;
 	smdata.data = data;
+	smdata.run_all = (cpu == ALL_CPUS) ? 1 : 0;
 	init_completion(&smdata.done);
 
-	mutex_lock(&stopmachine_mutex);
+	smp_wmb(); /* make sure other cpus see smdata updates */
 
 	/* If they don't care which CPU fn runs on, bind to any online one. */
-	if (cpu == NR_CPUS)
+	if (cpu == NR_CPUS || cpu == ALL_CPUS)
 		cpu = raw_smp_processor_id();
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 10/17] Immediate Values - Architecture Independent Code
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  2008-04-09 15:08 ` [patch 09/17] Add all cpus option to stop machine run Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 11/17] Implement immediate update via stop_machine_run Mathieu Desnoyers
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 18814 bytes --]

Immediate values are read-mostly variables that are rarely updated. They use
code patching to modify the values inscribed in the instruction stream, which
saves precious cache lines that would otherwise have to be used by these
variables.

There is a generic _imv_read() version, which uses standard global
variables, and optimized per-architecture imv_read() implementations,
which use a load-immediate instruction to avoid the data cache access. When the
immediate values functionality is disabled in the kernel, it falls back to
global variables.

It adds a new rodata section "__imv" to place the pointers to the enable
value. The immediate value activation functions sit in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the state of the associated immediate values and must be
accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. This is the case if the variable it refers to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performance on heavy memory I/O workloads.

An interesting result shows the potential of this infrastructure: the slowdown
a simple system call such as getppid() suffers when it is used under heavy
user-space cache thrashing:

Random walk L1 and L2 thrashing surrounding a getppid() call:
(note: in this test, do_syscall_trace was executed at each system call; see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore see a slowdown of roughly 10x (15589 / 1573 cycles) just to fetch
the kernel variables from memory. Another test on the same architecture (Intel
P4) measured the memory latency to be 559 cycles. Therefore, each cache line
removed from the hot path would improve the syscall time by about 3.5%
(559 / 15589 cycles) under these conditions.

Changelog:

- section __imv is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.
- Does not update tainted module's immediate values.
- remove imv_*_t types, add DECLARE_IMV() and DEFINE_IMV().
  - imv_read(&var) becomes imv_read(var) because of this.
- Adding a new EXPORT_IMV_SYMBOL(_GPL).
- remove imv_if(). Should use if (unlikely(imv_read(var))) instead.
  - Wait until we have gcc support before we add the imv_if macro, since
    its form may have to change.
- Don't declare the __imv section in vmlinux.lds.h, just put the content
  in the rodata section.
- Simplify interface: remove imv_set_early, keep track of kernel boot
  status internally.
- Remove the ALIGN(8) before the __imv section. It is packed now.
- Uses an IPI busy-loop on each CPU with interrupts disabled as a simple,
  architecture agnostic, update mechanism.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
---
 include/asm-generic/vmlinux.lds.h |    3 
 include/linux/immediate.h         |   94 +++++++++++++++++++
 include/linux/module.h            |   16 +++
 init/main.c                       |    8 +
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  187 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   50 +++++++++-
 7 files changed, 358 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng.mm/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.mm/include/linux/immediate.h	2008-04-09 10:53:20.000000000 -0400
@@ -0,0 +1,94 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+struct __imv {
+	unsigned long var;	/* Pointer to the identifier variable of the
+				 * immediate value
+				 */
+	unsigned long imv;	/*
+				 * Pointer to the memory location of the
+				 * immediate value within the instruction.
+				 */
+	unsigned char size;	/* Type size. */
+} __attribute__ ((packed));
+
+#include <asm/immediate.h>
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)						\
+	do {								\
+		name##__imv = (i);					\
+		core_imv_update();					\
+		module_imv_update();					\
+	} while (0)
+
+/*
+ * Internal update functions.
+ */
+extern void core_imv_update(void);
+extern void imv_update_range(const struct __imv *begin,
+	const struct __imv *end);
+
+#else
+
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ */
+#define imv_read(name)			_imv_read(name)
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)		(name##__imv = (i))
+
+static inline void core_imv_update(void) { }
+static inline void module_imv_update(void) { }
+
+#endif
+
+#define DECLARE_IMV(type, name) extern __typeof__(type) name##__imv
+#define DEFINE_IMV(type, name)  __typeof__(type) name##__imv
+
+#define EXPORT_IMV_SYMBOL(name) EXPORT_SYMBOL(name##__imv)
+#define EXPORT_IMV_SYMBOL_GPL(name) EXPORT_SYMBOL_GPL(name##__imv)
+
+/**
+ * _imv_read - Read immediate value with standard memory load.
+ * @name: immediate value name
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _imv_read(name)		(name##__imv)
+
+#endif
Index: linux-2.6-lttng.mm/include/linux/module.h
===================================================================
--- linux-2.6-lttng.mm.orig/include/linux/module.h	2008-04-09 10:27:30.000000000 -0400
+++ linux-2.6-lttng.mm/include/linux/module.h	2008-04-09 10:53:20.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <linux/marker.h>
 #include <asm/local.h>
 
@@ -338,6 +339,10 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+#ifdef CONFIG_IMMEDIATE
+	const struct __imv *immediate;
+	unsigned int num_immediate;
+#endif
 #ifdef CONFIG_MARKERS
 	struct marker *markers;
 	unsigned int num_markers;
@@ -450,6 +455,9 @@ extern void print_modules(void);
 
 extern void module_update_markers(void);
 
+extern void _module_imv_update(void);
+extern void module_imv_update(void);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -554,6 +562,14 @@ static inline void module_update_markers
 {
 }
 
+static inline void _module_imv_update(void)
+{
+}
+
+static inline void module_imv_update(void)
+{
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
Index: linux-2.6-lttng.mm/kernel/module.c
===================================================================
--- linux-2.6-lttng.mm.orig/kernel/module.c	2008-04-09 10:28:23.000000000 -0400
+++ linux-2.6-lttng.mm/kernel/module.c	2008-04-09 10:54:41.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1727,6 +1728,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex;
 	unsigned int markersindex;
 	unsigned int markersstringsindex;
 	struct module *mod;
@@ -1825,6 +1827,7 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__imv");
 
 	/* Don't keep modinfo and version sections. */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1984,6 +1987,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2051,11 +2059,16 @@ static struct module *load_module(void _
 
 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
 
+	if (!mod->taints) {
 #ifdef CONFIG_MARKERS
-	if (!mod->taints)
 		marker_update_probe_range(mod->markers,
 			mod->markers + mod->num_markers);
 #endif
+#ifdef CONFIG_IMMEDIATE
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+#endif
+	}
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
@@ -2601,3 +2614,38 @@ void module_update_markers(void)
 	mutex_unlock(&module_mutex);
 }
 #endif
+
+#ifdef CONFIG_IMMEDIATE
+/**
+ * _module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * The module_mutex must be held by the caller.
+ */
+void _module_imv_update(void)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+}
+EXPORT_SYMBOL_GPL(_module_imv_update);
+
+/**
+ * module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Takes module_mutex.
+ */
+void module_imv_update(void)
+{
+	mutex_lock(&module_mutex);
+	_module_imv_update();
+	mutex_unlock(&module_mutex);
+}
+EXPORT_SYMBOL_GPL(module_imv_update);
+#endif
Index: linux-2.6-lttng.mm/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.mm/kernel/immediate.c	2008-04-09 10:53:20.000000000 -0400
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+#include <linux/cpu.h>
+
+#include <asm/cacheflush.h>
+
+/*
+ * Set once the kernel is ready to execute SMP updates, which may rely on traps and IPIs.
+ */
+static int imv_early_boot_complete;
+
+extern const struct __imv __start___imv[];
+extern const struct __imv __stop___imv[];
+
+/*
+ * imv_mutex nests inside module_mutex. imv_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(imv_mutex);
+
+static atomic_t wait_sync;
+
+struct ipi_loop_data {
+	long value;
+	const struct __imv *imv;
+} loop_data;
+
+static void ipi_busy_loop(void *arg)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > loop_data.value);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > 0);
+	/*
+	 * Issuing a synchronizing instruction must be done on each CPU before
+	 * reenabling interrupts after modifying an instruction. Required by
+	 * Intel's errata.
+	 */
+	sync_core();
+	flush_icache_range(loop_data.imv->imv,
+		loop_data.imv->imv + loop_data.imv->size);
+	local_irq_restore(flags);
+}
+
+/**
+ * apply_imv_update - update one immediate value
+ * @imv: pointer of type const struct __imv to update
+ *
+ * Update one immediate value. Must be called with imv_mutex held.
+ * It makes sure all CPUs are not executing the modified code by having them
+ * busy looping with interrupts disabled.
+ * It does _not_ protect against NMI and MCE (could be a problem with Intel's
+ * errata if we use immediate values in their code path).
+ */
+static int apply_imv_update(const struct __imv *imv)
+{
+	unsigned long flags;
+	long online_cpus;
+
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (imv->size) {
+	case 1:	if (*(uint8_t *)imv->imv
+				== *(uint8_t *)imv->var)
+			return 0;
+		break;
+	case 2:	if (*(uint16_t *)imv->imv
+				== *(uint16_t *)imv->var)
+			return 0;
+		break;
+	case 4:	if (*(uint32_t *)imv->imv
+				== *(uint32_t *)imv->var)
+			return 0;
+		break;
+	case 8:	if (*(uint64_t *)imv->imv
+				== *(uint64_t *)imv->var)
+			return 0;
+		break;
+	default:return -EINVAL;
+	}
+
+	if (imv_early_boot_complete) {
+		kernel_text_lock();
+		get_online_cpus();
+		online_cpus = num_online_cpus();
+		atomic_set(&wait_sync, 2 * online_cpus);
+		loop_data.value = online_cpus;
+		loop_data.imv = imv;
+		smp_call_function(ipi_busy_loop, NULL, 1, 0);
+		local_irq_save(flags);
+		atomic_dec(&wait_sync);
+		do {
+			/* Make sure the wait_sync gets re-read */
+			smp_mb();
+		} while (atomic_read(&wait_sync) > online_cpus);
+		text_poke((void *)imv->imv, (void *)imv->var,
+				imv->size);
+		/*
+		 * Make sure the modified instruction is seen by all CPUs before
+		 * we continue (visible to other CPUs and local interrupts).
+		 */
+		wmb();
+		atomic_dec(&wait_sync);
+		flush_icache_range(imv->imv,
+				imv->imv + imv->size);
+		local_irq_restore(flags);
+		put_online_cpus();
+		kernel_text_unlock();
+	} else
+		text_poke_early((void *)imv->imv, (void *)imv->var,
+				imv->size);
+	return 0;
+}
+
+/**
+ * imv_update_range - Update immediate values in a range
+ * @begin: pointer to the beginning of the range
+ * @end: pointer to the end of the range
+ *
+ * Updates a range of immediates.
+ */
+void imv_update_range(const struct __imv *begin,
+		const struct __imv *end)
+{
+	const struct __imv *iter;
+	int ret;
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&imv_mutex);
+		ret = apply_imv_update(iter);
+		if (imv_early_boot_complete && ret)
+			printk(KERN_WARNING
+				"Invalid immediate value. "
+				"Variable at %p, "
+				"instruction at %p, size %hu\n",
+				(void *)iter->imv,
+				(void *)iter->var, iter->size);
+		mutex_unlock(&imv_mutex);
+	}
+}
+EXPORT_SYMBOL_GPL(imv_update_range);
+
+/**
+ * core_imv_update - update the core kernel's immediate values
+ *
+ * Iterate on the core kernel to update the immediate values.
+ */
+void core_imv_update(void)
+{
+	/* Core kernel imvs */
+	imv_update_range(__start___imv, __stop___imv);
+}
+EXPORT_SYMBOL_GPL(core_imv_update);
+
+void __init imv_init_complete(void)
+{
+	imv_early_boot_complete = 1;
+}
Index: linux-2.6-lttng.mm/init/main.c
===================================================================
--- linux-2.6-lttng.mm.orig/init/main.c	2008-04-09 10:28:29.000000000 -0400
+++ linux-2.6-lttng.mm/init/main.c	2008-04-09 10:55:06.000000000 -0400
@@ -60,6 +60,7 @@
 #include <linux/sched.h>
 #include <linux/signal.h>
 #include <linux/idr.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -103,6 +104,11 @@ static inline void mark_rodata_ro(void) 
 #ifdef CONFIG_TC
 extern void tc_init(void);
 #endif
+#ifdef CONFIG_IMMEDIATE
+extern void imv_init_complete(void);
+#else
+static inline void imv_init_complete(void) { }
+#endif
 
 enum system_states system_state;
 EXPORT_SYMBOL(system_state);
@@ -521,6 +527,7 @@ asmlinkage void __init start_kernel(void
 	lockdep_init();
 	debug_objects_early_init();
 	cgroup_init_early();
+	core_imv_update();
 
 	local_irq_disable();
 	early_boot_irqs_off();
@@ -646,6 +653,7 @@ asmlinkage void __init start_kernel(void
 	cpuset_init();
 	taskstats_init_early();
 	delayacct_init();
+	imv_init_complete();
 
 	check_bugs();
 
Index: linux-2.6-lttng.mm/kernel/Makefile
===================================================================
--- linux-2.6-lttng.mm.orig/kernel/Makefile	2008-04-09 10:28:21.000000000 -0400
+++ linux-2.6-lttng.mm/kernel/Makefile	2008-04-09 10:53:20.000000000 -0400
@@ -66,6 +66,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 obj-$(CONFIG_MARKERS) += marker.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 
Index: linux-2.6-lttng.mm/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.mm.orig/include/asm-generic/vmlinux.lds.h	2008-04-09 10:24:48.000000000 -0400
+++ linux-2.6-lttng.mm/include/asm-generic/vmlinux.lds.h	2008-04-09 10:53:20.000000000 -0400
@@ -61,6 +61,9 @@
 		*(.rodata) *(.rodata.*)					\
 		*(__vermagic)		/* Kernel version magic */	\
 		*(__markers_strings)	/* Markers: strings */		\
+		VMLINUX_SYMBOL(__start___imv) = .;			\
+		*(__imv)		/* Immediate values: pointers */ \
+		VMLINUX_SYMBOL(__stop___imv) = .;			\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
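The imv_update_range() walk added in kernel/immediate.c above can be sketched in userspace as follows. This is a simplified analogue, not the kernel code: the _sketch-suffixed names are hypothetical, memcpy into a writable buffer stands in for text_poke(), and the per-entry imv_mutex locking is omitted.

```c
#include <string.h>

/* Mirrors the shape of struct __imv: where the immediate is encoded,
 * where the backing variable lives, and the operand size. */
struct __imv_sketch {
	unsigned char *imv;   /* address of the encoded immediate operand */
	unsigned char *var;   /* address of the backing variable */
	unsigned char size;   /* operand size in bytes */
};

static void imv_update_range_sketch(const struct __imv_sketch *begin,
				    const struct __imv_sketch *end)
{
	const struct __imv_sketch *iter;

	/* The kernel takes imv_mutex around each update; omitted here. */
	for (iter = begin; iter < end; iter++)
		memcpy(iter->imv, iter->var, iter->size);
}
```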

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 11/17] Implement immediate update via stop_machine_run
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (9 preceding siblings ...)
  2008-04-09 15:08 ` [patch 10/17] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-10  8:04   ` KOSAKI Motohiro
  2008-04-09 15:08 ` [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Jason Baron, Mathieu Desnoyers,
	Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, akpm

[-- Attachment #1: implement-immediate-update-via-stop-machine-run.patch --]
[-- Type: text/plain, Size: 4771 bytes --]

Updating immediate values cannot rely on smp_call_function(), because
synchronizing CPUs with IPIs can deadlock: process A holds a read lock on
tasklist_lock, then process B calls apply_imv_update(). Process A receives
the IPI and starts executing ipi_busy_loop(). Process C then takes a
write_lock_irq() on tasklist_lock before it receives the IPI. Process A
thus holds up process C, and C can never acknowledge the IPI because its
interrupts are disabled. Solve this by adding an 'ALL_CPUS' parameter to
stop_machine_run(), which runs a function on all CPUs once they are all
busy-looping with interrupts disabled. Since this runs in a separate
process context, we do not have to worry about interrupted spinlocks. It
is also fewer lines of code. Has survived 24+ hours of testing.
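The single-writer pattern used by the stop_machine callback in the patch below can be sketched in userspace. This is a sequential mock under stated assumptions: in the kernel every CPU runs the callback concurrently inside stop_machine_run() with interrupts off, and smp_wmb() publishes the wrote_text flag; here plain sequential calls stand in for the per-CPU invocations, and write_count is a sketch-only counter.

```c
#include <string.h>

static unsigned char text[4];  /* stands in for the live instruction bytes */
static const unsigned char newval[4] = { 1, 2, 3, 4 };
static int wrote_text;         /* reset to 0 before each update */
static int write_count;        /* sketch-only: counts actual pokes */

/* Called once per "CPU": only the first caller pokes the text; the
 * others would only execute the sync_core() leg in the kernel. */
static int stop_machine_imv_update_sketch(void)
{
	if (!wrote_text) {
		memcpy(text, newval, sizeof(text)); /* text_poke() stand-in */
		wrote_text = 1;                     /* kernel: smp_wmb() here */
		write_count++;
	}
	/* else: sync_core() equivalent goes here */
	return 0;
}
```

Whichever CPU is scheduled first performs the poke exactly once; the rest only resynchronize their instruction caches.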

Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
---

 kernel/immediate.c |   80 +++++++++++++----------------------------------------
 1 file changed, 21 insertions(+), 59 deletions(-)


Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- linux-2.6-lttng.orig/kernel/immediate.c	2008-03-03 10:22:06.000000000 -0500
+++ linux-2.6-lttng/kernel/immediate.c	2008-03-03 10:42:38.000000000 -0500
@@ -20,6 +20,7 @@
 #include <linux/immediate.h>
 #include <linux/memory.h>
 #include <linux/cpu.h>
+#include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
 
@@ -27,48 +28,33 @@
  * Kernel ready to execute the SMP update that may depend on trap and ipi.
  */
 static int imv_early_boot_complete;
+static int wrote_text;
 
 extern const struct __imv __start___imv[];
 extern const struct __imv __stop___imv[];
 
+static int stop_machine_imv_update(void *imv_ptr)
+{
+	struct __imv *imv = imv_ptr;
+
+	if (!wrote_text) {
+		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
+		wrote_text = 1;
+		smp_wmb(); /* make sure other cpus see that this has run */
+	} else
+		sync_core();
+
+	flush_icache_range(imv->imv, imv->imv + imv->size);
+
+	return 0;
+}
+
 /*
  * imv_mutex nests inside module_mutex. imv_mutex protects builtin
  * immediates and module immediates.
  */
 static DEFINE_MUTEX(imv_mutex);
 
-static atomic_t wait_sync;
-
-struct ipi_loop_data {
-	long value;
-	const struct __imv *imv;
-} loop_data;
-
-static void ipi_busy_loop(void *arg)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > loop_data.value);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > 0);
-	/*
-	 * Issuing a synchronizing instruction must be done on each CPU before
-	 * reenabling interrupts after modifying an instruction. Required by
-	 * Intel's errata.
-	 */
-	sync_core();
-	flush_icache_range(loop_data.imv->imv,
-		loop_data.imv->imv + loop_data.imv->size);
-	local_irq_restore(flags);
-}
 
 /**
  * apply_imv_update - update one immediate value
@@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
  */
 static int apply_imv_update(const struct __imv *imv)
 {
-	unsigned long flags;
-	long online_cpus;
-
 	/*
 	 * If the variable and the instruction have the same value, there is
 	 * nothing to do.
@@ -111,30 +94,9 @@ static int apply_imv_update(const struct
 
 	if (imv_early_boot_complete) {
 		kernel_text_lock();
-		get_online_cpus();
-		online_cpus = num_online_cpus();
-		atomic_set(&wait_sync, 2 * online_cpus);
-		loop_data.value = online_cpus;
-		loop_data.imv = imv;
-		smp_call_function(ipi_busy_loop, NULL, 1, 0);
-		local_irq_save(flags);
-		atomic_dec(&wait_sync);
-		do {
-			/* Make sure the wait_sync gets re-read */
-			smp_mb();
-		} while (atomic_read(&wait_sync) > online_cpus);
-		text_poke((void *)imv->imv, (void *)imv->var,
-				imv->size);
-		/*
-		 * Make sure the modified instruction is seen by all CPUs before
-		 * we continue (visible to other CPUs and local interrupts).
-		 */
-		wmb();
-		atomic_dec(&wait_sync);
-		flush_icache_range(imv->imv,
-				imv->imv + imv->size);
-		local_irq_restore(flags);
-		put_online_cpus();
+		wrote_text = 0;
+		stop_machine_run(stop_machine_imv_update, (void *)imv,
+					ALL_CPUS);
 		kernel_text_unlock();
 	} else
 		text_poke_early((void *)imv->imv, (void *)imv->var,

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (10 preceding siblings ...)
  2008-04-09 15:08 ` [patch 11/17] Implement immediate update via stop_machine_run Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-10  3:23   ` Rusty Russell
  2008-04-09 15:08 ` [patch 13/17] Immediate Values - x86 Optimization Mathieu Desnoyers
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

[-- Attachment #1: immediate-values-kconfig-menu-in-embedded.patch --]
[-- Type: text/plain, Size: 2830 bytes --]

Immediate values use dynamic code patching to update variables embedded in
the instruction stream. This saves the cache lines normally consumed by
static read-mostly variables. Enable the feature by default, but let users
disable it through the EMBEDDED menu with the "Disable immediate values"
submenu entry.

Note: since embedded systems developers using read-only memory really should
have the option to disable immediate values, I chose to leave this option in
the EMBEDDED menu. Having a "CONFIG_IMMEDIATE" also makes sense because we
want to compile out all the immediate-values code when we decide not to use
optimized immediate values at all (it removes otherwise unused code).

Changelog:
- Rename ARCH_SUPPORTS_IMMEDIATE to ARCH_HAS_IMMEDIATE

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
---
 init/Kconfig |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Index: linux-2.6-lttng.mm/init/Kconfig
===================================================================
--- linux-2.6-lttng.mm.orig/init/Kconfig	2008-04-09 10:28:21.000000000 -0400
+++ linux-2.6-lttng.mm/init/Kconfig	2008-04-09 10:55:11.000000000 -0400
@@ -509,6 +509,20 @@ config CC_OPTIMIZE_FOR_SIZE
 config SYSCTL
 	bool
 
+config IMMEDIATE
+	default y if !DISABLE_IMMEDIATE
+	depends on HAVE_IMMEDIATE
+	bool
+	help
+	  Immediate values are used as read-mostly variables that are rarely
+	  updated. They use code patching to modify the values inscribed in
+	  the instruction stream, which saves the precious cache lines that
+	  would otherwise be consumed by these variables. They can be
+	  disabled through the EMBEDDED menu.
+
+config HAVE_IMMEDIATE
+	def_bool n
+
 menuconfig EMBEDDED
 	bool "Configure standard kernel features (for small systems)"
 	help
@@ -774,6 +788,16 @@ config PROC_PAGE_MONITOR
 	  /proc/kpagecount, and /proc/kpageflags. Disabling these
           interfaces will reduce the size of the kernel by approximately 4kb.
 
+config DISABLE_IMMEDIATE
+	default y if EMBEDDED
+	bool "Disable immediate values" if EMBEDDED
+	depends on HAVE_IMMEDIATE
+	help
+	  Disable code patching based immediate values for embedded systems.
+	  Immediate values consume slightly more memory and require modifying
+	  the instruction stream each time a variable is updated. They should
+	  be disabled for embedded systems with read-only text.
+
 endmenu		# General setup
 
 config SLABINFO

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 13/17] Immediate Values - x86 Optimization
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (11 preceding siblings ...)
  2008-04-09 15:08 ` [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 18:01   ` H. Peter Anvin
  2008-04-09 15:08 ` [patch 14/17] Add text_poke and sync_core to powerpc Mathieu Desnoyers
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, Andi Kleen,
	H. Peter Anvin, Chuck Ebbert, Christoph Hellwig,
	Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk,
	Alexey Dobriyan, akpm

[-- Attachment #1: immediate-values-x86-optimization.patch --]
[-- Type: text/plain, Size: 5234 bytes --]

x86 optimization of the immediate values: it uses a movl whose immediate
operand is code-patched to populate the register used as the variable source.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
  non atomic writes to a code region only touched by us (nobody can execute it
  since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.

The most flexible solution I see, which should fit both x86 and x86_64,
would be:
1 byte  :       "=q" : "a", "b", "c", or "d" register for the i386.  For
                       x86-64 it is equivalent to "r" class (for 8-bit
                       instructions that do not use upper halves).
2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is in a
                       general register.

- "Redux" immediate values : no need to place a breakpoint, therefore no
  need to know where the instruction starts. It is thus OK to have a
  REX prefix.

- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
  immediate structure.
- Change the immediate.c update code to support variable length opcodes.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
  Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: akpm@osdl.org
---
 arch/x86/Kconfig            |    1 
 include/asm-x86/immediate.h |   77 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86/immediate.h	2008-03-06 08:55:22.000000000 -0500
@@ -0,0 +1,77 @@
+#ifndef _ASM_X86_IMMEDIATE_H
+#define _ASM_X86_IMMEDIATE_H
+
+/*
+ * Immediate values. x86 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ * If size is bigger than the architecture long size, fall back on a memory
+ * read.
+ *
+ * Make sure to populate the initial static 64 bits opcode with a value
+ * that will generate an instruction with an 8 bytes immediate value (not
+ * the REX.W prefixed one that loads a sign-extended 32 bits immediate
+ * value into a r64 register).
+ */
+#define imv_read(name)							\
+	({								\
+		__typeof__(name##__imv) value;				\
+		BUILD_BUG_ON(sizeof(value) > 8);			\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0,%0\n\t"				\
+				"3:\n\t"				\
+				: "=q" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		case 2:							\
+		case 4:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0,%0\n\t"				\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		case 8:							\
+			if (sizeof(long) < 8) {				\
+				value = name##__imv;			\
+				break;					\
+			}						\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0xFEFEFEFE01010101,%0\n\t" 	\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		};							\
+		value;							\
+	})
+
+#endif /* _ASM_X86_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/x86/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86/Kconfig	2008-03-06 08:45:31.000000000 -0500
+++ linux-2.6-lttng/arch/x86/Kconfig	2008-03-06 08:55:47.000000000 -0500
@@ -23,6 +23,7 @@ config X86
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
+	select HAVE_IMMEDIATE
 
 
 config GENERIC_LOCKBREAK

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 14/17] Add text_poke and sync_core to powerpc
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (12 preceding siblings ...)
  2008-04-09 15:08 ` [patch 13/17] Immediate Values - x86 Optimization Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 15/17] Immediate Values - Powerpc Optimization Mathieu Desnoyers
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, Christoph Hellwig,
	Paul Mackerras, Adrian Bunk, Alexey Dobriyan, akpm

[-- Attachment #1: add-text-poke-and-sync-core-to-powerpc.patch --]
[-- Type: text/plain, Size: 1503 bytes --]

- text_poke is needed on architectures where live instruction modification
  must be surrounded by a "WP flag disable" sequence.
- It turns into a plain memcpy on powerpc, since no WP flag is activated
  for instruction pages (yet..).
- Add an empty sync_core to powerpc so it can be used in architecture
  independent code.
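A userspace mock, under the assumption that it mirrors the powerpc definitions in the patch below, shows how architecture-independent update code can call text_poke()/sync_core() uniformly: on powerpc they reduce to memcpy and a no-op, because instruction pages carry no WP protection there. The update_one_site helper is hypothetical.

```c
#include <string.h>

#define text_poke	memcpy            /* powerpc: no WP flag to toggle */
#define text_poke_early	text_poke
#define sync_core()	do { } while (0)  /* no serializing step needed here */

/* Generic update path: the same two calls work on every architecture. */
static void update_one_site(unsigned char *insn, const unsigned char *val,
			    size_t size)
{
	text_poke(insn, val, size);   /* arch-neutral in the caller's eyes */
	sync_core();
}
```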

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Christoph Hellwig <hch@infradead.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: mingo@elte.hu
CC: akpm@osdl.org
---
 include/asm-powerpc/cacheflush.h |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/asm-powerpc/cacheflush.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/cacheflush.h	2007-11-19 12:05:50.000000000 -0500
+++ linux-2.6-lttng/include/asm-powerpc/cacheflush.h	2007-11-19 13:27:36.000000000 -0500
@@ -63,7 +63,9 @@ extern void flush_dcache_phys_range(unsi
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
 	memcpy(dst, src, len)
 
-
+#define text_poke	memcpy
+#define text_poke_early	text_poke
+#define sync_core()
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 /* internal debugging function */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 15/17] Immediate Values - Powerpc Optimization
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (13 preceding siblings ...)
  2008-04-09 15:08 ` [patch 14/17] Add text_poke and sync_core to powerpc Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-09 15:08 ` [patch 16/17] Immediate Values - Documentation Mathieu Desnoyers
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, Christoph Hellwig,
	Paul Mackerras, Adrian Bunk, Alexey Dobriyan, akpm

[-- Attachment #1: immediate-values-powerpc-optimization.patch --]
[-- Type: text/plain, Size: 3174 bytes --]

PowerPC optimization of the immediate values: it uses a li instruction whose
immediate operand is patched with the value.

Changelog:
- Put imv_set and _imv_set in the architecture independent header.
- Pack the __imv section. Use smallest types required for size (char).
- Remove architecture specific update code : now handled by architecture
  agnostic code.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Christoph Hellwig <hch@infradead.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: mingo@elte.hu
CC: akpm@osdl.org
---
 arch/powerpc/Kconfig            |    1 
 include/asm-powerpc/immediate.h |   55 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

Index: linux-2.6-lttng.mm/include/asm-powerpc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng.mm/include/asm-powerpc/immediate.h	2008-04-09 10:55:31.000000000 -0400
@@ -0,0 +1,55 @@
+#ifndef _ASM_POWERPC_IMMEDIATE_H
+#define _ASM_POWERPC_IMMEDIATE_H
+
+/*
+ * Immediate values. PowerPC architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm-compat.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ */
+#define imv_read(name)							\
+	({								\
+		__typeof__(name##__imv) value;				\
+		BUILD_BUG_ON(sizeof(value) > 8);			\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+					PPC_LONG "%c1, ((1f)-1)\n\t"	\
+					".byte 1\n\t"			\
+					".previous\n\t"			\
+					"li %0,0\n\t"			\
+					"1:\n\t"			\
+				: "=r" (value)				\
+				: "i" (&name##__imv));			\
+			break;						\
+		case 2:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+					PPC_LONG "%c1, ((1f)-2)\n\t"	\
+					".byte 2\n\t"			\
+					".previous\n\t"			\
+					"li %0,0\n\t"			\
+					"1:\n\t"			\
+				: "=r" (value)				\
+				: "i" (&name##__imv));			\
+			break;						\
+		case 4:							\
+		case 8:	value = name##__imv;				\
+			break;						\
+		};							\
+		value;							\
+	})
+
+#endif /* _ASM_POWERPC_IMMEDIATE_H */
Index: linux-2.6-lttng.mm/arch/powerpc/Kconfig
===================================================================
--- linux-2.6-lttng.mm.orig/arch/powerpc/Kconfig	2008-04-09 10:26:07.000000000 -0400
+++ linux-2.6-lttng.mm/arch/powerpc/Kconfig	2008-04-09 10:55:48.000000000 -0400
@@ -92,6 +92,7 @@ config PPC
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_LMB
+	select HAVE_IMMEDIATE
 
 config EARLY_PRINTK
 	bool

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [patch 16/17] Immediate Values - Documentation
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (14 preceding siblings ...)
  2008-04-09 15:08 ` [patch 15/17] Immediate Values - Powerpc Optimization Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-10  3:33   ` Rusty Russell
  2008-04-09 15:08 ` [patch 17/17] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

[-- Attachment #1: immediate-values-documentation.patch --]
[-- Type: text/plain, Size: 9057 bytes --]

Changelog:
- Remove imv_set_early (removed from API).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
---
 Documentation/immediate.txt |  221 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 221 insertions(+)

Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/immediate.txt	2008-02-01 07:42:01.000000000 -0500
@@ -0,0 +1,221 @@
+		        Using the Immediate Values
+
+			    Mathieu Desnoyers
+
+
+This document introduces Immediate Values and their use.
+
+
+* Purpose of immediate values
+
+An immediate value is a variable compiled into the kernel so that it sits
+within the instruction stream. Such variables are meant to be rarely updated
+but read often. Using immediate values for them saves cache lines.
+
+This infrastructure specializes in dynamically patching the values in the
+instruction stream while multiple CPUs are running, without disturbing the
+normal system behavior.
+
+Compiling in code meant to be rarely enabled at runtime can be done using
+if (unlikely(imv_read(var))) as the condition surrounding the code. The
+smallest data type required for the test (an 8-bit char) is preferred, since
+some architectures, such as powerpc, only allow up to 16-bit immediate values.
+
+
+* Usage
+
+In order to use the "immediate" macros, you should include linux/immediate.h.
+
+#include <linux/immediate.h>
+
+DEFINE_IMV(char, this_immediate);
+EXPORT_IMV_SYMBOL(this_immediate);
+
+
+And use, in the body of a function:
+
+Use imv_set(this_immediate) to set the immediate value.
+
+Use imv_read(this_immediate) to read the immediate value.
+
+The immediate mechanism supports inserting multiple instances of the same
+immediate. Immediate values can be put in inline functions, inlined static
+functions, and unrolled loops.
+
+If you have to read the immediate values from a function declared as __init or
+__exit, you should explicitly use _imv_read(), which will fall back on a
+global variable read. Failing to do so will leave a reference to the __init
+section after it is freed (it would generate a modpost warning).
+
+You can choose to set an initial static value to the immediate by using, for
+instance:
+
+DEFINE_IMV(long, myptr) = 10;
+
+
+* Optimization for a given architecture
+
+One can implement optimized immediate values for a given architecture by
+replacing asm-$ARCH/immediate.h.
+
+
+* Performance improvement
+
+
+  * Memory hit for a data-based branch
+
+Here are the results on a 3GHz Pentium 4:
+
+number of tests: 100
+number of branches per test: 100000
+memory hit cycles per iteration (mean): 636.611
+L1 cache hit cycles per iteration (mean): 89.6413
+instruction stream based test, cycles per iteration (mean): 85.3438
+Just getting the pointer from a modulo on a pseudo-random value, doing
+  nothing with it, cycles per iteration (mean): 77.5044
+
+So:
+Base case:                      77.50 cycles
+instruction stream based test:  +7.8394 cycles
+L1 cache hit based test:        +12.1369 cycles
+Memory load based test:         +559.1066 cycles
+
+So let's say we have a ping flood coming at
+(14014 packets transmitted, 14014 received, 0% packet loss, time 1826ms)
+7674 packets per second. If we put 2 markers for irq entry/exit, it
+brings us to 15348 markers sites executed per second.
+
+(15348 exec/s) * (559 cycles/exec) / (3G cycles/s) = 0.0029
+We therefore have a 0.29% slowdown just on this case.
+
+Compared to this, the instruction stream based test will cause a
+slowdown of:
+
+(15348 exec/s) * (7.84 cycles/exec) / (3G cycles/s) = 0.00004
+For a 0.004% slowdown.
+
+If we plan to use this for memory allocation, spinlock, and all sorts of
+very high event rate tracing, we can assume it will execute 10 to 100
+times more sites per second, which brings us to 0.4% slowdown with the
+instruction stream based test compared to 29% slowdown with the memory
+load based test on a system with high memory pressure.
+
+
+
+  * Markers impact under heavy memory load
+
+Running a kernel with my LTTng instrumentation set, in a test that
+generates memory pressure (from userspace) by thrashing L1 and L2 caches
+between calls to getppid() (note: syscall_trace is active and calls
+a marker upon syscall entry and syscall exit; markers are disarmed).
+This test is done in user-space, so there are some delays due to IRQs
+coming and to the scheduler. (UP 2.6.22-rc6-mm1 kernel, task with -20
+nice level)
+
+My first set of results: linear cache thrashing turned out not to be
+very interesting, because it seems like the linearity of the memset on a
+full array is somehow detected and it does not "really" thrash the
+caches.
+
+Now the most interesting result: random walk L1 and L2 thrashing
+surrounding a getppid() call.
+
+- Markers compiled out (but syscall_trace execution forced)
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 108.033 cycles
+getppid: 1681.4 cycles
+With memory pressure
+Reading timestamps takes 102.938 cycles
+getppid: 15691.6 cycles
+
+
+- With the immediate values based markers:
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 108.006 cycles
+getppid: 1681.84 cycles
+With memory pressure
+Reading timestamps takes 100.291 cycles
+getppid: 11793 cycles
+
+
+- With global variables based markers:
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 107.999 cycles
+getppid: 1669.06 cycles
+With memory pressure
+Reading timestamps takes 102.839 cycles
+getppid: 12535 cycles
+
+The result is quite interesting in that the kernel is slower without
+markers than with markers. I explain this by the fact that the data
+accessed is not laid out in the same manner in the cache lines when the
+markers are compiled in or out. It seems that compiling the markers in
+aligns the function's data better in this case.
+
+But since the interesting comparison is between the immediate values and
+global variables based markers, and because they share the same memory
+layout, except for the movl being replaced by a movz, we see that the
+global variable based markers (2 markers) add 742 cycles to each system
+call (syscall entry and exit are traced and memory locations for both
+global variables lie on the same cache line).
+
+
+- Test redone with less iterations, but with error estimates
+
+10 runs of 100 iterations each: Tests done on a 3GHz P4. Here I run getppid with
+syscall trace inactive, comparing the case with memory pressure and without
+memory pressure. (sorry, my system is not setup to execute syscall_trace this
+time, but it will make the point anyway).
+
+No memory pressure
+Reading timestamps:     150.92 cycles,     std dev.    1.01 cycles
+getppid:               1462.09 cycles,     std dev.   18.87 cycles
+
+With memory pressure
+Reading timestamps:     578.22 cycles,     std dev.  269.51 cycles
+getppid:              17113.33 cycles,     std dev. 1655.92 cycles
+
+
+Now for memory read timing: (10 runs, branches per test: 100000)
+Memory read based branch:
+                       644.09 cycles,      std dev.   11.39 cycles
+L1 cache hit based branch:
+                        88.16 cycles,      std dev.    1.35 cycles
+
+
+So, now that we have the raw results, let's calculate:
+
+Memory read:
+644.09±11.39 - 88.16±1.35 = 555.93±11.46 cycles
+
+Getppid without memory pressure:
+1462.09±18.87 - 150.92±1.01 = 1311.17±18.90 cycles
+
+Getppid with memory pressure:
+17113.33±1655.92 - 578.22±269.51 = 16535.11±1677.71 cycles
+
+Therefore, if we add 2 markers not based on immediate values to the getppid
+code, which would add 2 memory reads, we would add
+2 * 555.93±11.46 = 1111.86±22.92 cycles
+
+Therefore,
+
+1111.86±22.92 / 16535.11±1677.71 = 0.0672
+ relative error: sqrt(((22.92/1111.86)^2)+((1677.71/16535.11)^2))
+                     = 0.1035
+ absolute error: 0.1035 * 0.0672 = 0.0070
+
+Therefore: 0.0672±0.0070 * 100% = 6.72±0.70 %
+
+We can therefore affirm that adding 2 markers to getppid, on a system with high
+memory pressure, would have a performance hit of at least 6.0% on the system
+call time, all within the uncertainty limits of these tests. The same applies to
+other kernel code paths. The smaller those code paths are, the higher
+the impact ratio will be.
+
+Therefore, not only is it interesting to use the immediate values to dynamically
+activate dormant code such as the markers, but I think it should also be
+considered as a replacement for many of the "read-mostly" static variables.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 17/17] Scheduler Profiling - Use Immediate Values
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (15 preceding siblings ...)
  2008-04-09 15:08 ` [patch 16/17] Immediate Values - Documentation Mathieu Desnoyers
@ 2008-04-09 15:08 ` Mathieu Desnoyers
  2008-04-10  4:23 ` [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 KOSAKI Motohiro
  2008-04-10  7:31 ` Takashi Nishiie
  18 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 15:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

[-- Attachment #1: scheduler-profiling-use-immediate-values.patch --]
[-- Type: text/plain, Size: 6066 bytes --]

Use immediate values, which have a lower d-cache footprint in the optimized
version, as the condition for the scheduler profiling call.

Changelog :
- Use imv_* instead of immediate_*.
- Follow the white rabbit : kvm_main.c which becomes x86.c.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
---
 arch/x86/kvm/x86.c      |    2 +-
 include/linux/profile.h |    5 +++--
 kernel/profile.c        |   22 +++++++++++-----------
 kernel/sched_fair.c     |    5 +----
 4 files changed, 16 insertions(+), 18 deletions(-)

Index: linux-2.6-lttng.mm/kernel/profile.c
===================================================================
--- linux-2.6-lttng.mm.orig/kernel/profile.c	2008-04-09 10:23:26.000000000 -0400
+++ linux-2.6-lttng.mm/kernel/profile.c	2008-04-09 10:55:53.000000000 -0400
@@ -41,8 +41,8 @@ static int (*timer_hook)(struct pt_regs 
 static atomic_t *prof_buffer;
 static unsigned long prof_len, prof_shift;
 
-int prof_on __read_mostly;
-EXPORT_SYMBOL_GPL(prof_on);
+DEFINE_IMV(char, prof_on) __read_mostly;
+EXPORT_IMV_SYMBOL_GPL(prof_on);
 
 static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
 #ifdef CONFIG_SMP
@@ -60,7 +60,7 @@ static int __init profile_setup(char *st
 
 	if (!strncmp(str, sleepstr, strlen(sleepstr))) {
 #ifdef CONFIG_SCHEDSTATS
-		prof_on = SLEEP_PROFILING;
+		imv_set(prof_on, SLEEP_PROFILING);
 		if (str[strlen(sleepstr)] == ',')
 			str += strlen(sleepstr) + 1;
 		if (get_option(&str, &par))
@@ -73,7 +73,7 @@ static int __init profile_setup(char *st
 			"kernel sleep profiling requires CONFIG_SCHEDSTATS\n");
 #endif /* CONFIG_SCHEDSTATS */
 	} else if (!strncmp(str, schedstr, strlen(schedstr))) {
-		prof_on = SCHED_PROFILING;
+		imv_set(prof_on, SCHED_PROFILING);
 		if (str[strlen(schedstr)] == ',')
 			str += strlen(schedstr) + 1;
 		if (get_option(&str, &par))
@@ -82,7 +82,7 @@ static int __init profile_setup(char *st
 			"kernel schedule profiling enabled (shift: %ld)\n",
 			prof_shift);
 	} else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
-		prof_on = KVM_PROFILING;
+		imv_set(prof_on, KVM_PROFILING);
 		if (str[strlen(kvmstr)] == ',')
 			str += strlen(kvmstr) + 1;
 		if (get_option(&str, &par))
@@ -92,7 +92,7 @@ static int __init profile_setup(char *st
 			prof_shift);
 	} else if (get_option(&str, &par)) {
 		prof_shift = par;
-		prof_on = CPU_PROFILING;
+		imv_set(prof_on, CPU_PROFILING);
 		printk(KERN_INFO "kernel profiling enabled (shift: %ld)\n",
 			prof_shift);
 	}
@@ -103,7 +103,7 @@ __setup("profile=", profile_setup);
 
 void __init profile_init(void)
 {
-	if (!prof_on)
+	if (!_imv_read(prof_on))
 		return;
 
 	/* only text is profiled */
@@ -290,7 +290,7 @@ void profile_hits(int type, void *__pc, 
 	int i, j, cpu;
 	struct profile_hit *hits;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
 	i = primary = (pc & (NR_PROFILE_GRP - 1)) << PROFILE_GRPSHIFT;
@@ -400,7 +400,7 @@ void profile_hits(int type, void *__pc, 
 {
 	unsigned long pc;
 
-	if (prof_on != type || !prof_buffer)
+	if (!prof_buffer)
 		return;
 	pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
 	atomic_add(nr_hits, &prof_buffer[min(pc, prof_len - 1)]);
@@ -557,7 +557,7 @@ static int __init create_hash_tables(voi
 	}
 	return 0;
 out_cleanup:
-	prof_on = 0;
+	imv_set(prof_on, 0);
 	smp_mb();
 	on_each_cpu(profile_nop, NULL, 0, 1);
 	for_each_online_cpu(cpu) {
@@ -584,7 +584,7 @@ static int __init create_proc_profile(vo
 {
 	struct proc_dir_entry *entry;
 
-	if (!prof_on)
+	if (!_imv_read(prof_on))
 		return 0;
 	if (create_hash_tables())
 		return -1;
Index: linux-2.6-lttng.mm/include/linux/profile.h
===================================================================
--- linux-2.6-lttng.mm.orig/include/linux/profile.h	2008-04-09 10:28:31.000000000 -0400
+++ linux-2.6-lttng.mm/include/linux/profile.h	2008-04-09 10:55:53.000000000 -0400
@@ -5,10 +5,11 @@
 #include <linux/init.h>
 #include <linux/cpumask.h>
 #include <linux/cache.h>
+#include <linux/immediate.h>
 
 #include <asm/errno.h>
 
-extern int prof_on __read_mostly;
+DECLARE_IMV(char, prof_on) __read_mostly;
 
 #define CPU_PROFILING	1
 #define SCHED_PROFILING	2
@@ -36,7 +37,7 @@ static inline void profile_hit(int type,
 	/*
 	 * Speedup for the common (no profiling enabled) case:
 	 */
-	if (unlikely(prof_on == type))
+	if (unlikely(imv_read(prof_on) == type))
 		profile_hits(type, ip, 1);
 }
 
Index: linux-2.6-lttng.mm/kernel/sched_fair.c
===================================================================
--- linux-2.6-lttng.mm.orig/kernel/sched_fair.c	2008-04-09 10:26:45.000000000 -0400
+++ linux-2.6-lttng.mm/kernel/sched_fair.c	2008-04-09 10:55:53.000000000 -0400
@@ -455,11 +455,8 @@ static void enqueue_sleeper(struct cfs_r
 		 * get a milliseconds-range estimation of the amount of
 		 * time that the task spent sleeping:
 		 */
-		if (unlikely(prof_on == SLEEP_PROFILING)) {
-
-			profile_hits(SLEEP_PROFILING, (void *)get_wchan(tsk),
+		profile_hits(SLEEP_PROFILING, (void *)get_wchan(task_of(se)),
 				     delta >> 20);
-		}
 		account_scheduler_latency(tsk, delta >> 10, 0);
 	}
 #endif
Index: linux-2.6-lttng.mm/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6-lttng.mm.orig/arch/x86/kvm/x86.c	2008-04-09 10:26:33.000000000 -0400
+++ linux-2.6-lttng.mm/arch/x86/kvm/x86.c	2008-04-09 10:55:53.000000000 -0400
@@ -2814,7 +2814,7 @@ again:
 	/*
 	 * Profile KVM exit RIPs:
 	 */
-	if (unlikely(prof_on == KVM_PROFILING)) {
+	if (unlikely(imv_read(prof_on) == KVM_PROFILING)) {
 		kvm_x86_ops->cache_regs(vcpu);
 		profile_hit(KVM_PROFILING, (void *)vcpu->arch.rip);
 	}

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 13/17] Immediate Values - x86 Optimization
  2008-04-09 15:08 ` [patch 13/17] Immediate Values - x86 Optimization Mathieu Desnoyers
@ 2008-04-09 18:01   ` H. Peter Anvin
  2008-04-09 19:08     ` Mathieu Desnoyers
  2008-04-09 20:21     ` [patch 13/17] Immediate Values - x86 Optimization (updated) Mathieu Desnoyers
  0 siblings, 2 replies; 48+ messages in thread
From: H. Peter Anvin @ 2008-04-09 18:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

Mathieu Desnoyers wrote:
> Ok, so the most flexible solution that I see, that should fit for both
> x86 and x86_64 would be :
> 1 byte  :       "=q" : "a", "b", "c", or "d" register for the i386.  For
>                        x86-64 it is equivalent to "r" class (for 8-bit
>                        instructions that do not use upper halves).
> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is in a
>                        general register.

Any reason to keep carrying this completely misleading comment chunk still?

	-hpa


* Re: [patch 09/17] Add all cpus option to stop machine run
  2008-04-09 15:08 ` [patch 09/17] Add all cpus option to stop machine run Mathieu Desnoyers
@ 2008-04-09 18:10   ` Alexey Dobriyan
  2008-04-09 18:24     ` Andi Kleen
  2008-04-09 18:54     ` Mathieu Desnoyers
  0 siblings, 2 replies; 48+ messages in thread
From: Alexey Dobriyan @ 2008-04-09 18:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Jason Baron, Adrian Bunk, Christoph Hellwig, akpm

Please, stop Cc'ing me on this.

I continue to think that all this so-called "immediate" infrastructure
is an absolute overkill and declared D-cache line savings will _never_
matter and will never be measured on real life workloads. Right now you
claim 1 (one) cacheline saving in schedule() which can be trivially
moved under CONFIG_PROFILING umbrella. On the other hand is more ugly
assembler code. Sorry, people add infra with much better good/bad ratios.

Also, my gut feeling, as a guy who is also on the receiving end of bugreports
at SWsoft, is that the line with .text games was already crossed by SMP
alternatives and fooprobes. The rest won't matter because programs like firefox
will fuck up L1, L2 caches in one blow just by showing animation.

We don't need bugs when immediate update screwed up fighting for
cacheline.

And bugs when screwup happened out of the window of "Code:" printout, so
you won't have a chance to compare relevant vmlinux .text and printout.

And bugs when CPU executed bullshit and silently rebooted.

And so on.

	Alexey, more and more liking OpenBSD
		where such games won't even
	        hit mailing lists



* Re: [patch 09/17] Add all cpus option to stop machine run
  2008-04-09 18:10   ` Alexey Dobriyan
@ 2008-04-09 18:24     ` Andi Kleen
  2008-04-10  3:34       ` Rusty Russell
  2008-04-10  4:26       ` KOSAKI Motohiro
  2008-04-09 18:54     ` Mathieu Desnoyers
  1 sibling, 2 replies; 48+ messages in thread
From: Andi Kleen @ 2008-04-09 18:24 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Mathieu Desnoyers, akpm, Ingo Molnar, linux-kernel, Andi Kleen,
	Rusty Russell, Jason Baron, Adrian Bunk, Christoph Hellwig, akpm

On Wed, Apr 09, 2008 at 10:10:10PM +0400, Alexey Dobriyan wrote:
> I continue to think that all this so-called "immediate" infrastructure
> is an absolute overkill and declared D-cache line savings will _never_
> matter 

You are quite wrong on that.

> and will never be measured on real life workloads. Right now you
> claim 1 (one) cacheline saving in schedule() which can be trivially

That is just an example. Once the infrastructure is in, a lot more
flags would move into it. I think eventually most sysctls
should be immediate values, for one.

> Also, my gut feeling, as a guy who is also on the receiving end of bugreports
> at SWsoft, is that the line with .text games was already crossed by SMP alternatives and

You already lost -- Linux regularly rewrites itself. Ok not quite yet
but self-modifying code is already widespread and happens commonly
(e.g. with alternatives and some other cases) 

> 
> And bugs when CPU executed bullshit and silently rebooted.

So far nobody has seen that, and the probability of it actually happening
is rather remote too.


> And so on.
> 
> 	Alexey, more and more liking OpenBSD
> 		where such games won't even
> 	        hit mailing lists

Maybe that is why Linux scales to large systems and OpenBSD 
doesn't ...

-Andi



* Re: [patch 09/17] Add all cpus option to stop machine run
  2008-04-09 18:10   ` Alexey Dobriyan
  2008-04-09 18:24     ` Andi Kleen
@ 2008-04-09 18:54     ` Mathieu Desnoyers
  1 sibling, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 18:54 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Jason Baron, Adrian Bunk, Christoph Hellwig, akpm

* Alexey Dobriyan (adobriyan@gmail.com) wrote:
> Please, stop Cc'ing me on this.
> 

Fixed.

> I continue to think that all this so-called "immediate" infrastructure
> is an absolute overkill and declared D-cache line savings will _never_
> matter and will never be measured on real life workloads. Right now you
> claim 1 (one) cacheline saving in schedule() which can be trivially
> moved under CONFIG_PROFILING umbrella. On the other hand is more ugly
> assembler code. Sorry, people add infra with much better good/bad ratios.
> 
> Also, my gut feeling, as a guy who is also on the receiving end of bugreports
> at SWsoft, is that the line with .text games was already crossed by SMP
> alternatives and fooprobes. The rest won't matter because programs like
> firefox will fuck up L1, L2 caches in one blow just by showing animation.
> 

I did not know the only thing we should ever care about as kernel
developers was Firefox ? ;)

I find it quite amusing that at one end of the spectrum we have Ingo
fearing about a few bytes added to the scheduler object even when they
are never loaded in cache, and that at the complete other extreme we
find people arguing that cache lines used by the kernel do not matter
because this and this userspace app is-oh-so-bloated.

Am I the only one interested in finally getting a tracer infrastructure
in the Linux kernel ? (yes, this is the original objective behind
immediate values by the way)

Mathieu

> We don't need bugs when immediate update screwed up fighting for
> cacheline.
> 
> And bugs when screwup happened out of the window of "Code:" printout, so
> you won't have a chance to compare relevant vmlinux .text and printout.
> 
> And bugs when CPU executed bullshit and silently rebooted.
> 
> And so on.
> 
> 	Alexey, more and more liking OpenBSD
> 		where such games won't even
> 	        hit mailing lists
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 13/17] Immediate Values - x86 Optimization
  2008-04-09 18:01   ` H. Peter Anvin
@ 2008-04-09 19:08     ` Mathieu Desnoyers
  2008-04-09 22:33       ` H. Peter Anvin
  2008-04-09 20:21     ` [patch 13/17] Immediate Values - x86 Optimization (updated) Mathieu Desnoyers
  1 sibling, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 19:08 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Ok, so the most flexible solution that I see, that should fit for both
>> x86 and x86_64 would be :
>> 1 byte  :       "=q" : "a", "b", "c", or "d" register for the i386.  For
>>                        x86-64 it is equivalent to "r" class (for 8-bit
>>                        instructions that do not use upper halves).
>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is 
>> in a
>>                        general register.
>
> Any reason to keep carrying this completely misleading comment chunk still?
>
> 	-hpa

This comment explains why I use the =q constraint for the 1-byte
immediate value. It makes sure we use an instruction with 1-byte opcode,
without REX.R prefix, on x86_64.

That's required for the NMI-safe version of the immediate values, which
uses a breakpoint, but not for this version based on stop_machine_run().
However, to minimize the amount of changes between the two versions, I
left the =q constraint, which is more restrictive. Is it worth it to use
=r instead ? It will typically let the compiler use a wider range of
registers on x86_64.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 01/17] Kprobes - use a mutex to protect the instruction pages list.
  2008-04-09 15:08 ` [patch 01/17] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
@ 2008-04-09 20:08   ` Masami Hiramatsu
  0 siblings, 0 replies; 48+ messages in thread
From: Masami Hiramatsu @ 2008-04-09 20:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Ananth N Mavinakayanahalli, hch, anil.s.keshavamurthy, davem

Mathieu Desnoyers wrote:
> Protect the instruction pages list by a specific insn pages mutex, called in 
> get_insn_slot() and free_insn_slot(). It makes sure that architectures that do
> not need to call arch_remove_kprobe() do not take an unneeded kprobes mutex.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> CC: hch@infradead.org
> CC: anil.s.keshavamurthy@intel.com
> CC: davem@davemloft.net

These kprobes related patches are good to me.

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>

Thank you!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com




* Re: [patch 02/17] Kprobes - do not use kprobes mutex in arch code
  2008-04-09 15:08 ` [patch 02/17] Kprobes - do not use kprobes mutex in arch code Mathieu Desnoyers
@ 2008-04-09 20:08   ` Masami Hiramatsu
  0 siblings, 0 replies; 48+ messages in thread
From: Masami Hiramatsu @ 2008-04-09 20:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Ananth N Mavinakayanahalli, anil.s.keshavamurthy, davem

Mathieu Desnoyers wrote:
> Remove the kprobes mutex from kprobes.h, since it does not belong there. Also
> remove all use of this mutex in the architecture specific code, replacing it by
> a proper mutex lock/unlock in the architecture agnostic code.
> 
> Changelog :
> - remove unnecessary kprobe_mutex around arch_remove_kprobe()
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> CC: anil.s.keshavamurthy@intel.com
> CC: davem@davemloft.net

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>

Thanks,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com




* Re: [patch 03/17] Kprobes - declare kprobe_mutex static
  2008-04-09 15:08 ` [patch 03/17] Kprobes - declare kprobe_mutex static Mathieu Desnoyers
@ 2008-04-09 20:08   ` Masami Hiramatsu
  0 siblings, 0 replies; 48+ messages in thread
From: Masami Hiramatsu @ 2008-04-09 20:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Ananth N Mavinakayanahalli, hch, anil.s.keshavamurthy, davem

Mathieu Desnoyers wrote:
> Since it will not be used by other kernel objects, it makes sense to declare it
> static.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> CC: hch@infradead.org
> CC: anil.s.keshavamurthy@intel.com
> CC: davem@davemloft.net

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>

Thanks,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com




* Re: [patch 13/17] Immediate Values - x86 Optimization (updated)
  2008-04-09 18:01   ` H. Peter Anvin
  2008-04-09 19:08     ` Mathieu Desnoyers
@ 2008-04-09 20:21     ` Mathieu Desnoyers
  2008-04-09 22:33       ` H. Peter Anvin
  1 sibling, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 20:21 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Ok, so the most flexible solution that I see, that should fit for both
>> x86 and x86_64 would be :
>> 1 byte  :       "=q" : "a", "b", "c", or "d" register for the i386.  For
>>                        x86-64 it is equivalent to "r" class (for 8-bit
>>                        instructions that do not use upper halves).
>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is 
>> in a
>>                        general register.
>
> Any reason to keep carrying this completely misleading comment chunk still?
>
> 	-hpa

Hrm, since even the nmi-safe version supports REX-prefixed instructions,
there is no need for an =q constraint on single-byte immediate values
anymore. (thanks to your "discard" section used in the nmi-safe version)

Here is the updated patch for the "[patch 13/17] Immediate Values - x86
Optimization". Thanks!

Mathieu

Immediate Values - x86 Optimization

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
  non atomic writes to a code region only touched by us (nobody can execute it
  since we are protected by the imv_mutex).
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
  immediate structure.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
  Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: akpm@osdl.org
---
 arch/x86/Kconfig            |    1 
 include/asm-x86/immediate.h |   77 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86/immediate.h	2008-04-09 15:02:34.000000000 -0400
@@ -0,0 +1,67 @@
+#ifndef _ASM_X86_IMMEDIATE_H
+#define _ASM_X86_IMMEDIATE_H
+
+/*
+ * Immediate values. x86 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ * If size is bigger than the architecture long size, fall back on a memory
+ * read.
+ *
+ * Make sure to populate the initial static 64 bits opcode with a value
+ * that will generate an instruction with an 8-byte immediate value (not the REX.W
+ * prefixed one that loads a sign extended 32 bits immediate value in a r64
+ * register).
+ */
+#define imv_read(name)							\
+	({								\
+		__typeof__(name##__imv) value;				\
+		BUILD_BUG_ON(sizeof(value) > 8);			\
+		switch (sizeof(value)) {				\
+		case 1:							\
+		case 2:							\
+		case 4:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0,%0\n\t"				\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		case 8:							\
+			if (sizeof(long) < 8) {				\
+				value = name##__imv;			\
+				break;					\
+			}						\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0xFEFEFEFE01010101,%0\n\t" 	\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		};							\
+		value;							\
+	})
+
+#endif /* _ASM_X86_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/x86/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86/Kconfig	2008-04-09 11:04:58.000000000 -0400
+++ linux-2.6-lttng/arch/x86/Kconfig	2008-04-09 15:00:01.000000000 -0400
@@ -23,6 +23,7 @@ config X86
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
+	select HAVE_IMMEDIATE
 
 
 config GENERIC_LOCKBREAK


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 13/17] Immediate Values - x86 Optimization (updated)
  2008-04-09 20:21     ` [patch 13/17] Immediate Values - x86 Optimization (updated) Mathieu Desnoyers
@ 2008-04-09 22:33       ` H. Peter Anvin
  2008-04-09 23:15         ` Mathieu Desnoyers
  0 siblings, 1 reply; 48+ messages in thread
From: H. Peter Anvin @ 2008-04-09 22:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

Mathieu Desnoyers wrote:
> 
> Hrm, since even the nmi-safe version supports REX-prefixed instructions,
> there is no need for an =q constraint on single-byte immediate values
> anymore. (thanks to your "discard" section used in the nmi-safe version)
> 

WRONG!

Using =r for single-byte values is incorrect for 32-bit code -- that 
would permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode. 
That is not the incorrect bit, it's the description that is confused.

	-hpa


* Re: [patch 13/17] Immediate Values - x86 Optimization
  2008-04-09 19:08     ` Mathieu Desnoyers
@ 2008-04-09 22:33       ` H. Peter Anvin
  2008-04-10  0:42         ` Mathieu Desnoyers
  0 siblings, 1 reply; 48+ messages in thread
From: H. Peter Anvin @ 2008-04-09 22:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

Mathieu Desnoyers wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:
>> Mathieu Desnoyers wrote:
>>> Ok, so the most flexible solution that I see, that should fit for both
>>> x86 and x86_64 would be :
>>> 1 byte  :       "=q" : "a", "b", "c", or "d" register for the i386.  For
>>>                        x86-64 it is equivalent to "r" class (for 8-bit
>>>                        instructions that do not use upper halves).
>>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is 
>>> in a
>>>                        general register.
>> Any reason to keep carrying this completely misleading comment chunk still?
>>
>> 	-hpa
> 
> This comment explains why I use the =q constraint for the 1 bytes
> immediate value. It makes sure we use an instruction with 1-byte opcode,
> without REX.R prefix, on x86_64.

No, it doesn't.  That would be "=Q".

	-hpa


* Re: [patch 13/17] Immediate Values - x86 Optimization (updated)
  2008-04-09 22:33       ` H. Peter Anvin
@ 2008-04-09 23:15         ` Mathieu Desnoyers
  0 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-09 23:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Hrm, since even the nmi-safe version supports REX-prefixed instructions,
>> there is no need for an =q constraint on single-byte immediate values
>> anymore. (thanks to your "discard" section used in the nmi-safe version)
>
> WRONG!
>
> Using =r for single-byte values is incorrect for 32-bit code -- that would 
> permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode. That is not 
> the incorrect bit, it's the description that is confused.
>
> 	-hpa

Ah, right. I remembered there was something that made us use =q, but
could not remember what. I'll describe it correctly.

Therefore, this updated patch is bogus. The original one was ok, given
that we change the header to :


Immediate Values - x86 Optimization

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Note : a movb needs to get its value from a =q constraint.

Quoting "H. Peter Anvin" <hpa@zytor.com>

Using =r for single-byte values is incorrect for 32-bit code -- that would  
permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode. 

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
  non-atomic writes to a code region only touched by us (nobody can execute it
  since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix: an 8-byte (64-bit) immediate value was declared as "4 bytes" in the
  immediate structure.
- Change the immediate.c update code to support variable length opcodes.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
  Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.


Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 13/17] Immediate Values - x86 Optimization
  2008-04-09 22:33       ` H. Peter Anvin
@ 2008-04-10  0:42         ` Mathieu Desnoyers
  2008-04-10  0:47           ` H. Peter Anvin
  0 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-10  0:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> * H. Peter Anvin (hpa@zytor.com) wrote:
>>> Mathieu Desnoyers wrote:
>>>> Ok, so the most flexible solution that I see, that should fit for both
>>>> x86 and x86_64 would be :
>>>> 1 byte  :       "=q" : "a", "b", "c", or "d" register for the i386.  For
>>>>                        x86-64 it is equivalent to "r" class (for 8-bit
>>>>                        instructions that do not use upper halves).
>>>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is 
>>>> in a
>>>>                        general register.
>>> Any reason to keep carrying this completely misleading comment chunk 
>>> still?
>>>
>>> 	-hpa
>> This comment explains why I use the =q constraint for the 1 bytes
>> immediate value. It makes sure we use an instruction with 1-byte opcode,
>> without REX.R prefix, on x86_64.
>
> No, it doesn't.  That would be "=Q".
>
> 	-hpa

Ok. Sorry, it's been a few months since we looked at this. So the =q
constraint lets the compiler choose instructions with or without a REX prefix.
We can allow this because

- We don't need the opcode length in the stop_machine_run() version
- we support variable length opcode in the nmi-safe version

Am I remembering correctly now?

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 13/17] Immediate Values - x86 Optimization
  2008-04-10  0:42         ` Mathieu Desnoyers
@ 2008-04-10  0:47           ` H. Peter Anvin
  0 siblings, 0 replies; 48+ messages in thread
From: H. Peter Anvin @ 2008-04-10  0:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge,
	Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, akpm

Mathieu Desnoyers wrote:
> 
> Ok. Sorry, it's been a few months since we looked at this. So the =q
> opcode lets the compiler choose instructions with or without REX prefix.
> We can allow this because 
> 
> - We don't need the opcode length in the stop_machine_run() version
> - we support variable length opcode in the nmi-safe version
> 
> Am I remembering correctly now ?
> 

Yes.  Both =r and =q allow REX opcodes.

	-hpa

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED
  2008-04-09 15:08 ` [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
@ 2008-04-10  3:23   ` Rusty Russell
  2008-04-10 19:32     ` [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED (updated) Mathieu Desnoyers
  0 siblings, 1 reply; 48+ messages in thread
From: Rusty Russell @ 2008-04-10  3:23 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Christoph Hellwig, akpm

On Thursday 10 April 2008 01:08:41 Mathieu Desnoyers wrote:
> +config IMMEDIATE
> +	default y if !DISABLE_IMMEDIATE

Wouldn't it be simpler to roll DISABLE_IMMEDIATE into this?

ie.
	default y
	depends on HAVE_IMMEDIATE
	bool "Immediate value optimization" if EMBEDDED
	help
	  Immediate values are used as read-mostly variables that are rarely
	  updated. They use code patching to modify the values inscribed in the
	  instruction stream. It provides a way to save precious cache lines
	  that would otherwise have to be used by these variables.

	  It consumes slightly more memory and requires to modify the instruction
	  stream each time a variable is updated. Should really be disabled for
	  embedded systems with read-only text.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 16/17] Immediate Values - Documentation
  2008-04-09 15:08 ` [patch 16/17] Immediate Values - Documentation Mathieu Desnoyers
@ 2008-04-10  3:33   ` Rusty Russell
  2008-04-11  1:16     ` Mathieu Desnoyers
  2008-04-11 13:44     ` [RFC PATCH] Immediate Values Support init Mathieu Desnoyers
  0 siblings, 2 replies; 48+ messages in thread
From: Rusty Russell @ 2008-04-10  3:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> If you have to read the immediate values from a function declared as
> __init or __exit, you should explicitly use _imv_read(), which will fall
> back on a global variable read. Failing to do so will leave a reference to
> the __init section after it is freed (it would generate a modpost
> warning).

That's a real usability wart.  Couldn't we skip these in the patching loop if
required and revert, so no one can make this mistake?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 09/17] Add all cpus option to stop machine run
  2008-04-09 18:24     ` Andi Kleen
@ 2008-04-10  3:34       ` Rusty Russell
  2008-04-10  4:26       ` KOSAKI Motohiro
  1 sibling, 0 replies; 48+ messages in thread
From: Rusty Russell @ 2008-04-10  3:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexey Dobriyan, Mathieu Desnoyers, akpm, Ingo Molnar,
	linux-kernel, Jason Baron, Adrian Bunk, Christoph Hellwig, akpm

On Thursday 10 April 2008 04:24:54 Andi Kleen wrote:
> On Wed, Apr 09, 2008 at 10:10:10PM +0400, Alexey Dobriyan wrote:
> > I continue to think that all this so-called "immediate" infrastructure
> > is an absolute overkill and declared D-cache line savings will _never_
> > matter
>
> You are quite wrong on that.

Well, Alexey did make me read through the patches reasonably thoroughly.

I think it's interesting, and to be honest the current simplified approach 
just isn't that scary.  There are some minor warts, but it's generally sound.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (16 preceding siblings ...)
  2008-04-09 15:08 ` [patch 17/17] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
@ 2008-04-10  4:23 ` KOSAKI Motohiro
  2008-04-10  7:31 ` Takashi Nishiie
  18 siblings, 0 replies; 48+ messages in thread
From: KOSAKI Motohiro @ 2008-04-10  4:23 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: kosaki.motohiro, akpm, Ingo Molnar, linux-kernel, Andi Kleen,
	Rusty Russell

Hi Mathieu

> Here is the complete patchset required to get basic Immediate Values support
> ported to 2.6.25-rc8-mm1. It provides the "simple", non nmi-safe version of
> immediate values, which means that immediate values should not be used in code
> paths reachable by NMI or MCE handlers. This version also uses
> stop_machine_run() for immediate values updates, which is a very heavy lock.  In
> order to make incremental, easy to review, changes, I think we could start by
> this "simple" version as a first step before we switch to the nmi-safe version
> later.

I think this patch series is infrastructure for improving marker performance,
right? If so, I will help with debugging and testing.

At least, I like this patch series :)




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 09/17] Add all cpus option to stop machine run
  2008-04-09 18:24     ` Andi Kleen
  2008-04-10  3:34       ` Rusty Russell
@ 2008-04-10  4:26       ` KOSAKI Motohiro
  1 sibling, 0 replies; 48+ messages in thread
From: KOSAKI Motohiro @ 2008-04-10  4:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kosaki.motohiro, Alexey Dobriyan, Mathieu Desnoyers, akpm,
	Ingo Molnar, linux-kernel, Rusty Russell, Jason Baron,
	Adrian Bunk, Christoph Hellwig, akpm

> > and will never be measured on real life workloads. Right now you
> > claim 1 (one) cacheline saving in schedule() which can be trivially
> 
> That is just an example. Once the infrastructure is in a lot more
> flags would move it into it. I think eventually most sysctls
> should be immediate values for once.

That seems like a very good viewpoint.
If sysctl values become immediate values, I think some fast paths will see a
performance increase.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1
  2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
                   ` (17 preceding siblings ...)
  2008-04-10  4:23 ` [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 KOSAKI Motohiro
@ 2008-04-10  7:31 ` Takashi Nishiie
  18 siblings, 0 replies; 48+ messages in thread
From: Takashi Nishiie @ 2008-04-10  7:31 UTC (permalink / raw)
  To: linux-kernel

Hi Mathieu

> Here is the complete patchset required to get basic Immediate Values 
> support ported to 2.6.25-rc8-mm1. It provides the "simple", non 
> nmi-safe version of immediate values, which means that immediate 
> values should not be used in code paths reachable by NMI or MCE 
> handlers. This version also uses
> stop_machine_run() for immediate values updates, which is a very heavy 
> lock.  In order to make incremental, easy to review, changes, I think 
> we could start by this "simple" version as a first step before we 
> switch to the nmi-safe version later.

I am interested in the performance improvement of markers.
Keeping this first version "simple" is an important argument in its favor.
I will help with debugging and testing.

Thanks





^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 11/17] Implement immediate update via stop_machine_run
  2008-04-09 15:08 ` [patch 11/17] Implement immediate update via stop_machine_run Mathieu Desnoyers
@ 2008-04-10  8:04   ` KOSAKI Motohiro
  2008-04-10 20:01     ` Mathieu Desnoyers
  0 siblings, 1 reply; 48+ messages in thread
From: KOSAKI Motohiro @ 2008-04-10  8:04 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: kosaki.motohiro, akpm, Ingo Molnar, linux-kernel, Andi Kleen,
	Rusty Russell, Jason Baron, Adrian Bunk, Alexey Dobriyan,
	Christoph Hellwig, akpm

Hi

> -Updating immediate values, cannot rely on smp_call_function() b/c synchronizing
>  cpus using IPIs leads to deadlocks. Process A held a read lock on 
>  tasklist_lock, then process B called apply_imv_update(). Process A received the 
>  IPI and begins executing ipi_busy_loop(). Then process C takes a write lock 
>  irq on the task list lock, before receiving the IPI. Thus, process A holds up 
>  process C, and C can't get an IPI b/c interrupts are disabled. Solve this 
>  problem by using a new 'ALL_CPUS' parameter to stop_machine_run(). Which 
>  runs a function on all cpus after they are busy looping and have disabled 
>  irqs. Since this is done in a new process context, we don't have to worry 
>  about interrupted spin_locks. Also, less lines of code. Has survived 24 hours+
>  of testing...

It seems this patch is a must. Why do you separate patches [10/17] and [11/17]?
This patch removes almost all of [10/17].
IMHO, merging these patches into one would be better.


> +static int stop_machine_imv_update(void *imv_ptr)
> +{
> +	struct __imv *imv = imv_ptr;
> +
> +	if (!wrote_text) {

It seems racy.
Why isn't a test_and_set needed?

I think your stop_machine_run(ALL_CPUS) calls fn concurrently...


> +		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> +		wrote_text = 1;
> +		smp_wmb(); /* make sure other cpus see that this has run */
> +	} else
> +		sync_core();
> +
> +	flush_icache_range(imv->imv, imv->imv + imv->size);
> +
> +	return 0;
> +}
> +



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED (updated)
  2008-04-10  3:23   ` Rusty Russell
@ 2008-04-10 19:32     ` Mathieu Desnoyers
  2008-04-10 21:54       ` Rusty Russell
  0 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-10 19:32 UTC (permalink / raw)
  To: Rusty Russell
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Christoph Hellwig, akpm

* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Thursday 10 April 2008 01:08:41 Mathieu Desnoyers wrote:
> > +config IMMEDIATE
> > +	default y if !DISABLE_IMMEDIATE
> 
> Wouldn't it be simlpler to roll DISABLE_IMMEDIATE into this?
> 
> ie.
> 	default y
> 	depends on HAVE_IMMEDIATE
> 	bool "Immediate value optimization" if EMBEDDED
> 	help
> 	  Immediate values are used as read-mostly variables that are rarely
> 	  updated. They use code patching to modify the values inscribed in the
> 	  instruction stream. It provides a way to save precious cache lines
> 	  that would otherwise have to be used by these variables.
> 
> 	  It consumes slightly more memory and requires to modify the instruction
> 	  stream each time a variable is updated. Should really be disabled for
> 	  embedded systems with read-only text.
> 

Sure, thanks for the tip. Here is the updated version.


Immediate Values - Kconfig menu in EMBEDDED

Immediate values provide a way to use dynamic code patching to update variables
sitting within the instruction stream. It saves cache lines normally used by
static read-mostly variables. Enable it by default, but let users disable it
through the EMBEDDED menu with the "Disable immediate values" submenu entry.

Note: Since I think embedded systems developers using RO memory should really
have the option to disable immediate values, I chose to leave this
menu option there, in the EMBEDDED menu. Also, "CONFIG_IMMEDIATE" makes
sense because we want to compile out all the immediate code when we decide not
to use optimized immediate values at all (it removes otherwise unused code).

Changelog:
- Change ARCH_SUPPORTS_IMMEDIATE to HAVE_IMMEDIATE
- Turn DISABLE_IMMEDIATE into positive logic

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
---
 init/Kconfig |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig	2008-04-10 15:22:37.000000000 -0400
+++ linux-2.6-lttng/init/Kconfig	2008-04-10 15:29:09.000000000 -0400
@@ -758,6 +758,24 @@ config PROC_PAGE_MONITOR
 	  /proc/kpagecount, and /proc/kpageflags. Disabling these
           interfaces will reduce the size of the kernel by approximately 4kb.
 
+config HAVE_IMMEDIATE
+	def_bool n
+
+config IMMEDIATE
+	default y
+	depends on HAVE_IMMEDIATE
+	bool "Immediate value optimization" if EMBEDDED
+	help
+	  Immediate values are used as read-mostly variables that are rarely
+	  updated. They use code patching to modify the values inscribed in the
+	  instruction stream. It provides a way to save precious cache lines
+	  that would otherwise have to be used by these variables. They can be
+	  disabled through the EMBEDDED menu.
+
+	  It consumes slightly more memory and requires to modify the
+	  instruction stream each time a variable is updated. Should really be
+	  disabled for embedded systems with read-only text.
+
 endmenu		# General setup
 
 config SLABINFO

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 11/17] Implement immediate update via stop_machine_run
  2008-04-10  8:04   ` KOSAKI Motohiro
@ 2008-04-10 20:01     ` Mathieu Desnoyers
  2008-04-11  4:50       ` KOSAKI Motohiro
  0 siblings, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-10 20:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Rusty Russell,
	Jason Baron, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig,
	akpm

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
> 
> > -Updating immediate values, cannot rely on smp_call_function() b/c synchronizing
> >  cpus using IPIs leads to deadlocks. Process A held a read lock on 
> >  tasklist_lock, then process B called apply_imv_update(). Process A received the 
> >  IPI and begins executing ipi_busy_loop(). Then process C takes a write lock 
> >  irq on the task list lock, before receiving the IPI. Thus, process A holds up 
> >  process C, and C can't get an IPI b/c interrupts are disabled. Solve this 
> >  problem by using a new 'ALL_CPUS' parameter to stop_machine_run(). Which 
> >  runs a function on all cpus after they are busy looping and have disabled 
> >  irqs. Since this is done in a new process context, we don't have to worry 
> >  about interrupted spin_locks. Also, less lines of code. Has survived 24 hours+
> >  of testing...
> 
> it seems this patch is must, Why do you separate patch [10/17] and [11/17]?
> this patch remove almost portion of [10/17].
> IMHO these patch merge into 1 patch is better.
> 

Hi Kosaki,

You are right, I will merge them and resend them in the following post.

> 
> > +static int stop_machine_imv_update(void *imv_ptr)
> > +{
> > +	struct __imv *imv = imv_ptr;
> > +
> > +	if (!wrote_text) {
> 
> it seems racy.
> Why don't need test_and_set?
> 
> I think your stop_machine_run(ALL_CPUS) call fn concurrency...
> 
The answer to this mystery is in include/linux/stop_machine.h, modified
by add-all-cpus-option-to-stop-machine-run.patch:


 /**
  * stop_machine_run: freeze the machine on all CPUs and run this function
  * @fn: the function to run
  * @data: the data ptr for the @fn()
- * @cpu: the cpu to run @fn() on (or any, if @cpu == NR_CPUS.
+ * @cpu: if @cpu == n, run @fn() on cpu n
+ *       if @cpu == NR_CPUS, run @fn() on any cpu
+ *       if @cpu == ALL_CPUS, run @fn() first on the calling cpu, and then
+ *       concurrently on all the other cpus
  *
  * Description: This causes a thread to be scheduled on every other cpu,
  * each of which disables interrupts, and finally interrupts are disabled

Therefore, the first execution of the function is done before all other
executions.
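That ordering guarantee is why a plain flag suffices and no test_and_set is
needed. A rough user-space analogue in Python (names like wrote_text mirror
the patch; the Event stands in for stop_machine's sequencing, so this is a
sketch of the ordering, not of the kernel mechanism):

```python
import threading

wrote_text = False
log = []
start_others = threading.Event()

def imv_update():
    global wrote_text
    if not wrote_text:       # only the first runner (the "calling cpu") sees False
        log.append("patch")  # stands in for text_poke()
        wrote_text = True
        start_others.set()   # models "then run concurrently on the other cpus"
    else:
        log.append("sync")   # stands in for sync_core()

def other_cpu():
    start_others.wait()      # other cpus are held until the caller has run fn
    imv_update()

others = [threading.Thread(target=other_cpu) for _ in range(3)]
for t in others:
    t.start()
imv_update()                 # the calling cpu runs fn first, to completion
for t in others:
    t.join()

print(log)                   # "patch" once, then "sync" from every other cpu
```

Because the write to wrote_text completes before any other runner starts,
exactly one runner patches and the rest only resynchronize.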

Thanks,

Mathieu

> 
> > +		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> > +		wrote_text = 1;
> > +		smp_wmb(); /* make sure other cpus see that this has run */
> > +	} else
> > +		sync_core();
> > +
> > +	flush_icache_range(imv->imv, imv->imv + imv->size);
> > +
> > +	return 0;
> > +}
> > +
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED (updated)
  2008-04-10 19:32     ` [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED (updated) Mathieu Desnoyers
@ 2008-04-10 21:54       ` Rusty Russell
  2008-04-14 23:52         ` Mathieu Desnoyers
  0 siblings, 1 reply; 48+ messages in thread
From: Rusty Russell @ 2008-04-10 21:54 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Christoph Hellwig, akpm

On Friday 11 April 2008 05:32:26 Mathieu Desnoyers wrote:
> +	  It consumes slightly more memory and requires to modify the
> +	  instruction stream each time a variable is updated.

Sorry, I missed this grammar bug last time.  Also, you probably don't want to
scare someone into thinking we're changing instructions on every var change :)

	It consumes slightly more memory and modifies the instruction
	stream each time any specially-marked variable is updated.

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 16/17] Immediate Values - Documentation
  2008-04-10  3:33   ` Rusty Russell
@ 2008-04-11  1:16     ` Mathieu Desnoyers
  2008-04-11 15:06       ` Rusty Russell
  2008-04-11 13:44     ` [RFC PATCH] Immediate Values Support init Mathieu Desnoyers
  1 sibling, 1 reply; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-11  1:16 UTC (permalink / raw)
  To: Rusty Russell
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> > If you have to read the immediate values from a function declared as
> > __init or __exit, you should explicitly use _imv_read(), which will fall
> > back on a global variable read. Failing to do so will leave a reference to
> > the __init section after it is freed (it would generate a modpost
> > warning).
> 
> That's a real usability wart.  Couldn't we skip these in the patching loop if 
> required and revert so noone can make this mistake?
> 

Yeah, I know :(

Well, only if we can find a way to detect that the macro is placed within an
init or exit section. Is there some assembly trickery that would permit us to
do that?

Otherwise, given that the memory freed from the init section could be reused
later by the kernel, I don't see how we can tell whether a pointer leads to
a freed init section or to, say, a module.

Mathieu

> Thanks,
> Rusty.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 11/17] Implement immediate update via stop_machine_run
  2008-04-10 20:01     ` Mathieu Desnoyers
@ 2008-04-11  4:50       ` KOSAKI Motohiro
  0 siblings, 0 replies; 48+ messages in thread
From: KOSAKI Motohiro @ 2008-04-11  4:50 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: KOSAKI Motohiro, akpm, Ingo Molnar, linux-kernel, Andi Kleen,
	Rusty Russell, Jason Baron, Adrian Bunk, Alexey Dobriyan,
	Christoph Hellwig, akpm, linuxdev-kanex

Hi  Mathieu,
>> it seems this patch is must, Why do you separate patch [10/17] and [11/17]?
>> this patch remove almost portion of [10/17].
>> IMHO these patch merge into 1 patch is better.
>>
>>     
>
> Hi Kosaki,
>
> You are right, I will merge them and resend them in the following post.
>   
Thanks

>>> +static int stop_machine_imv_update(void *imv_ptr)
>>> +{
>>> +	struct __imv *imv = imv_ptr;
>>> +
>>> +	if (!wrote_text) {
>>>       
>> it seems racy.
>> Why don't need test_and_set?
>>
>> I think your stop_machine_run(ALL_CPUS) call fn concurrency...
>>     
> The answer to this mistery is in include/linux/stop_machine.h modified
> by add-all-cpus-option-to-stop-machine-run.patch :
>
>
>  /**
>   * stop_machine_run: freeze the machine on all CPUs and run this function
>   * @fn: the function to run
>   * @data: the data ptr for the @fn()
> - * @cpu: the cpu to run @fn() on (or any, if @cpu == NR_CPUS.
> + * @cpu: if @cpu == n, run @fn() on cpu n
> + *       if @cpu == NR_CPUS, run @fn() on any cpu
> + *       if @cpu == ALL_CPUS, run @fn() first on the calling cpu, and then
> + *       concurrently on all the other cpus
>   *
>   * Description: This causes a thread to be scheduled on every other cpu,
>   * each of which disables interrupts, and finally interrupts are disabled
>
> Therefore, the first execution of the function is done before all other
> executions.
>
>   
Ah, OK. I understand.
Thanks.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC PATCH] Immediate Values Support init
  2008-04-10  3:33   ` Rusty Russell
  2008-04-11  1:16     ` Mathieu Desnoyers
@ 2008-04-11 13:44     ` Mathieu Desnoyers
  1 sibling, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-11 13:44 UTC (permalink / raw)
  To: Rusty Russell
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

Supports placing immediate values in init code

We need to put the immediate values in a RW data section so we can edit them
before the init section is unloaded.

This code puts NULL pointers in lieu of the original pointers referencing init
code, before the init sections are freed, both in the core kernel and in modules.

(experimental patch submitted for comments and testing)
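A rough Python analogue of the pointer-range filtering that imv_unref_init()
performs in the patch (field names mirror struct __imv but the values and the
Imv class are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Imv:
    imv: int   # address of the instruction's immediate operand
    var: int   # address of the backing variable
    size: int  # immediate size in bytes

def imv_unref_init(table, init_start, init_size):
    """Null out entries whose patch site lies in the init range being freed."""
    for entry in table:
        if init_start <= entry.imv < init_start + init_size:
            entry.imv = 0  # marked removed; imv_update_range() skips zeros

table = [Imv(0x1000, 0x9000, 4), Imv(0x5000, 0x9100, 1), Imv(0x1200, 0x9200, 2)]
imv_unref_init(table, 0x1000, 0x400)  # free a 1 KiB "init" range at 0x1000
print([e.imv for e in table])         # entries inside the freed range are now 0
```

Entries pointing into the freed range are zeroed, which is exactly the
condition the update loop tests before applying a patch.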

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Frank Ch. Eigler" <fche@redhat.com>
---
 Documentation/immediate.txt       |    5 -----
 include/asm-generic/vmlinux.lds.h |    8 ++++----
 include/asm-powerpc/immediate.h   |    4 ++--
 include/asm-x86/immediate.h       |    6 +++---
 include/linux/immediate.h         |    7 ++++++-
 include/linux/module.h            |    2 +-
 init/main.c                       |    1 +
 kernel/immediate.c                |   32 ++++++++++++++++++++++++++++++--
 kernel/module.c                   |    2 ++
 9 files changed, 49 insertions(+), 18 deletions(-)

Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- linux-2.6-lttng.orig/kernel/immediate.c	2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/kernel/immediate.c	2008-04-11 09:41:33.000000000 -0400
@@ -22,6 +22,7 @@
 #include <linux/cpu.h>
 #include <linux/stop_machine.h>
 
+#include <asm/sections.h>
 #include <asm/cacheflush.h>
 
 /*
@@ -30,8 +31,8 @@
 static int imv_early_boot_complete;
 static int wrote_text;
 
-extern const struct __imv __start___imv[];
-extern const struct __imv __stop___imv[];
+extern struct __imv __start___imv[];
+extern struct __imv __stop___imv[];
 
 static int stop_machine_imv_update(void *imv_ptr)
 {
@@ -118,6 +119,8 @@ void imv_update_range(const struct __imv
 	int ret;
 	for (iter = begin; iter < end; iter++) {
 		mutex_lock(&imv_mutex);
+		if (!iter->imv)	/* Skip removed __init immediate values */
+			goto skip;
 		ret = apply_imv_update(iter);
 		if (imv_early_boot_complete && ret)
 			printk(KERN_WARNING
@@ -126,6 +129,7 @@ void imv_update_range(const struct __imv
 				"instruction at %p, size %hu\n",
 				(void *)iter->imv,
 				(void *)iter->var, iter->size);
+skip:
 		mutex_unlock(&imv_mutex);
 	}
 }
@@ -143,6 +147,30 @@ void core_imv_update(void)
 }
 EXPORT_SYMBOL_GPL(core_imv_update);
 
+/**
+ * imv_unref_init
+ *
+ * Deactivate any immediate value reference pointing into init code before
+ * it vanishes.
+ */
+void imv_unref_init(struct __imv *begin, struct __imv *end, void *init,
+		unsigned long init_size)
+{
+	struct __imv *iter;
+
+	for (iter = begin; iter < end; iter++)
+		if (iter->imv >= (unsigned long)init
+			&& iter->imv < (unsigned long)init + init_size)
+			iter->imv = 0UL;
+}
+
+void imv_unref_core_init(void)
+{
+	imv_unref_init(__start___imv, __stop___imv,
+		__init_begin,
+		(unsigned long)__init_end - (unsigned long)__init_begin);
+}
+
 void __init imv_init_complete(void)
 {
 	imv_early_boot_complete = 1;
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2008-04-11 09:35:54.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2008-04-11 09:36:58.000000000 -0400
@@ -2208,6 +2208,8 @@ sys_init_module(void __user *umod,
 	/* Drop initial reference. */
 	module_put(mod);
 	unwind_remove_table(mod->unwind_info, 1);
+	imv_unref_init(mod->immediate, mod->immediate + mod->num_immediate,
+		mod->module_init, mod->init_size);
 	module_free(mod, mod->module_init);
 	mod->module_init = NULL;
 	mod->init_size = 0;
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2008-04-11 09:35:43.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2008-04-11 09:36:58.000000000 -0400
@@ -357,7 +357,7 @@ struct module
 	   keeping pointers to this stuff */
 	char *args;
 #ifdef CONFIG_IMMEDIATE
-	const struct __imv *immediate;
+	struct __imv *immediate;
 	unsigned int num_immediate;
 #endif
 #ifdef CONFIG_MARKERS
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2008-04-11 09:35:37.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2008-04-11 09:36:58.000000000 -0400
@@ -52,7 +52,10 @@
 	. = ALIGN(8);							\
 	VMLINUX_SYMBOL(__start___markers) = .;				\
 	*(__markers)							\
-	VMLINUX_SYMBOL(__stop___markers) = .;
+	VMLINUX_SYMBOL(__stop___markers) = .;				\
+	VMLINUX_SYMBOL(__start___imv) = .;				\
+	*(__imv)		/* Immediate values: pointers */ 	\
+	VMLINUX_SYMBOL(__stop___imv) = .;
 
 #define RO_DATA(align)							\
 	. = ALIGN((align));						\
@@ -61,9 +64,6 @@
 		*(.rodata) *(.rodata.*)					\
 		*(__vermagic)		/* Kernel version magic */	\
 		*(__markers_strings)	/* Markers: strings */		\
-		VMLINUX_SYMBOL(__start___imv) = .;			\
-		*(__imv)		/* Immediate values: pointers */ \
-		VMLINUX_SYMBOL(__stop___imv) = .;			\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/immediate.h	2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/include/linux/immediate.h	2008-04-11 09:36:58.000000000 -0400
@@ -46,6 +46,9 @@ struct __imv {
 extern void core_imv_update(void);
 extern void imv_update_range(const struct __imv *begin,
 	const struct __imv *end);
+extern void imv_unref_core_init(void);
+extern void imv_unref_init(struct __imv *begin, struct __imv *end, void *init,
+		unsigned long init_size);
 
 #else
 
@@ -73,7 +76,9 @@ extern void imv_update_range(const struc
 
 static inline void core_imv_update(void) { }
 static inline void module_imv_update(void) { }
-
+static inline void imv_unref_core_init(void) { }
+static inline void imv_unref_init(struct __imv *begin, struct __imv *end,
+		void *init, unsigned long init_size) { }
 #endif
 
 #define DECLARE_IMV(type, name) extern __typeof__(type) name##__imv
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2008-04-11 09:35:37.000000000 -0400
+++ linux-2.6-lttng/init/main.c	2008-04-11 09:36:58.000000000 -0400
@@ -776,6 +776,7 @@ static void run_init_process(char *init_
  */
 static int noinline init_post(void)
 {
+	imv_unref_core_init();
 	free_initmem();
 	unlock_kernel();
 	mark_rodata_ro();
Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/immediate.h	2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86/immediate.h	2008-04-11 09:38:23.000000000 -0400
@@ -33,7 +33,7 @@
 		BUILD_BUG_ON(sizeof(value) > 8);			\
 		switch (sizeof(value)) {				\
 		case 1:							\
-			asm(".section __imv,\"a\",@progbits\n\t"	\
+			asm(".section __imv,\"aw\",@progbits\n\t"	\
 				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
 				".byte %c2\n\t"				\
 				".previous\n\t"				\
@@ -45,7 +45,7 @@
 			break;						\
 		case 2:							\
 		case 4:							\
-			asm(".section __imv,\"a\",@progbits\n\t"	\
+			asm(".section __imv,\"aw\",@progbits\n\t"	\
 				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
 				".byte %c2\n\t"				\
 				".previous\n\t"				\
@@ -60,7 +60,7 @@
 				value = name##__imv;			\
 				break;					\
 			}						\
-			asm(".section __imv,\"a\",@progbits\n\t"	\
+			asm(".section __imv,\"aw\",@progbits\n\t"	\
 				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
 				".byte %c2\n\t"				\
 				".previous\n\t"				\
Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/immediate.h	2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h	2008-04-11 09:36:58.000000000 -0400
@@ -26,7 +26,7 @@
 		BUILD_BUG_ON(sizeof(value) > 8);			\
 		switch (sizeof(value)) {				\
 		case 1:							\
-			asm(".section __imv,\"a\",@progbits\n\t"	\
+			asm(".section __imv,\"aw\",@progbits\n\t"	\
 					PPC_LONG "%c1, ((1f)-1)\n\t"	\
 					".byte 1\n\t"			\
 					".previous\n\t"			\
@@ -36,7 +36,7 @@
 				: "i" (&name##__imv));			\
 			break;						\
 		case 2:							\
-			asm(".section __imv,\"a\",@progbits\n\t"	\
+			asm(".section __imv,\"aw\",@progbits\n\t"	\
 					PPC_LONG "%c1, ((1f)-2)\n\t"	\
 					".byte 2\n\t"			\
 					".previous\n\t"			\
Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- linux-2.6-lttng.orig/Documentation/immediate.txt	2008-04-11 09:43:03.000000000 -0400
+++ linux-2.6-lttng/Documentation/immediate.txt	2008-04-11 09:43:19.000000000 -0400
@@ -42,11 +42,6 @@ The immediate mechanism supports inserti
 immediate. Immediate values can be put in inline functions, inlined static
 functions, and unrolled loops.
 
-If you have to read the immediate values from a function declared as __init or
-__exit, you should explicitly use _imv_read(), which will fall back on a
-global variable read. Failing to do so will leave a reference to the __init
-section after it is freed (it would generate a modpost warning).
-
 You can choose to set an initial static value to the immediate by using, for
 instance:
 
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [patch 16/17] Immediate Values - Documentation
  2008-04-11  1:16     ` Mathieu Desnoyers
@ 2008-04-11 15:06       ` Rusty Russell
  2008-04-15  0:12         ` Mathieu Desnoyers
  0 siblings, 1 reply; 48+ messages in thread
From: Rusty Russell @ 2008-04-11 15:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

On Friday 11 April 2008 11:16:47 Mathieu Desnoyers wrote:
> * Rusty Russell (rusty@rustcorp.com.au) wrote:
> > On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> > > If you have to read the immediate values from a function declared as
> > > __init or __exit, you should explicitly use _imv_read(), which will
> > > fall back on a global variable read. Failing to do so will leave a
> > > reference to the __init section after it is freed (it would generate a
> > > modpost warning).
> >
> > That's a real usability wart.  Couldn't we skip these in the patching
> > loop if required and revert so no one can make this mistake?
>
> Yeah, I know :(
>
> Well, only if we can find a way to detect that the macro is placed within
> an init or exit section. Is there some assembly trickery that would permit
> us to do that?
>
> Otherwise, given that the memory freed from the init section could be reused
> later by the kernel, I don't see how we can tell whether a pointer leads to
> a freed init section or to, say, a module.

In theory, although not in practice, since everyone vmallocs modules.  Let's
not rely on that, though.

How about we sweep the immediate table on init discard and remove/mark all the 
init and exit references?

Cheers,
Rusty.


* Re: [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED (updated)
  2008-04-10 21:54       ` Rusty Russell
@ 2008-04-14 23:52         ` Mathieu Desnoyers
  0 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-14 23:52 UTC (permalink / raw)
  To: Rusty Russell
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Christoph Hellwig, akpm

* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Friday 11 April 2008 05:32:26 Mathieu Desnoyers wrote:
> > +	  It consumes slightly more memory and requires to modify the
> > +	  instruction stream each time a variable is updated.
> 
> Sorry, I missed this grammar bug last time.  Also, we probably don't want to
> scare someone into thinking we're changing instructions on every var change :)
> 
> 	It consumes slightly more memory and modifies the instruction
> 	stream each time any specially-marked variable is updated.
> 

Will update, thanks!

> Thanks,
> Rusty.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [patch 16/17] Immediate Values - Documentation
  2008-04-11 15:06       ` Rusty Russell
@ 2008-04-15  0:12         ` Mathieu Desnoyers
  0 siblings, 0 replies; 48+ messages in thread
From: Mathieu Desnoyers @ 2008-04-15  0:12 UTC (permalink / raw)
  To: Rusty Russell
  Cc: akpm, Ingo Molnar, linux-kernel, Andi Kleen, Adrian Bunk,
	Alexey Dobriyan, Christoph Hellwig, akpm

* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Friday 11 April 2008 11:16:47 Mathieu Desnoyers wrote:
> > * Rusty Russell (rusty@rustcorp.com.au) wrote:
> > > On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> > > > If you have to read the immediate values from a function declared as
> > > > __init or __exit, you should explicitly use _imv_read(), which will
> > > > fall back on a global variable read. Failing to do so will leave a
> > > > reference to the __init section after it is freed (it would generate a
> > > > modpost warning).
> > >
> > > That's a real usability wart.  Couldn't we skip these in the patching
> > > loop if required and revert so no one can make this mistake?
> >
> > Yeah, I know :(
> >
> > Well, only if we can find a way to detect that the macro is placed within
> > an init or exit section. Is there some assembly trickery that would permit
> > us to do that?
> >
> > Otherwise, given that the memory freed from the init section could be reused
> > later by the kernel, I don't see how we can tell whether a pointer leads to
> > a freed init section or to, say, a module.
> 
> In theory, although not in practice, since everyone vmallocs modules.  Let's
> not rely on that, though.
> 
> How about we sweep the immediate table on init discard and remove/mark all the 
> init and exit references?
> 
> Cheers,
> Rusty.

I already posted a patch which nullifies the immediate values pointing
to init code after the init phase of the core kernel and the init phase
of modules, just before the init section is freed.

For the exit section, I could add some code which nullifies the
immediate values pointing to the exit section when !CONFIG_MODULE_UNLOAD.
However, I would need to get the equivalent of init and init_size for
.exit too.

I wonder what would happen if someone declares a __exit function in a
builtin object, with an immediate value in it? Is there a possibility
that it leaves a reference to code that is not even linked in?

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


end of thread, other threads:[~2008-04-15  0:12 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-09 15:08 [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 Mathieu Desnoyers
2008-04-09 15:08 ` [patch 01/17] Kprobes - use a mutex to protect the instruction pages list Mathieu Desnoyers
2008-04-09 20:08   ` Masami Hiramatsu
2008-04-09 15:08 ` [patch 02/17] Kprobes - do not use kprobes mutex in arch code Mathieu Desnoyers
2008-04-09 20:08   ` Masami Hiramatsu
2008-04-09 15:08 ` [patch 03/17] Kprobes - declare kprobe_mutex static Mathieu Desnoyers
2008-04-09 20:08   ` Masami Hiramatsu
2008-04-09 15:08 ` [patch 04/17] x86 - Enhance DEBUG_RODATA support - alternatives Mathieu Desnoyers
2008-04-09 15:08 ` [patch 05/17] x86 Fix text_poke for vmalloced pages Mathieu Desnoyers
2008-04-09 15:08 ` [patch 06/17] x86 - Enhance DEBUG_RODATA support for hotplug and kprobes Mathieu Desnoyers
2008-04-09 15:08 ` [patch 07/17] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2008-04-09 15:08 ` [patch 08/17] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2008-04-09 15:08 ` [patch 09/17] Add all cpus option to stop machine run Mathieu Desnoyers
2008-04-09 18:10   ` Alexey Dobriyan
2008-04-09 18:24     ` Andi Kleen
2008-04-10  3:34       ` Rusty Russell
2008-04-10  4:26       ` KOSAKI Motohiro
2008-04-09 18:54     ` Mathieu Desnoyers
2008-04-09 15:08 ` [patch 10/17] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2008-04-09 15:08 ` [patch 11/17] Implement immediate update via stop_machine_run Mathieu Desnoyers
2008-04-10  8:04   ` KOSAKI Motohiro
2008-04-10 20:01     ` Mathieu Desnoyers
2008-04-11  4:50       ` KOSAKI Motohiro
2008-04-09 15:08 ` [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2008-04-10  3:23   ` Rusty Russell
2008-04-10 19:32     ` [patch 12/17] Immediate Values - Kconfig menu in EMBEDDED (updated) Mathieu Desnoyers
2008-04-10 21:54       ` Rusty Russell
2008-04-14 23:52         ` Mathieu Desnoyers
2008-04-09 15:08 ` [patch 13/17] Immediate Values - x86 Optimization Mathieu Desnoyers
2008-04-09 18:01   ` H. Peter Anvin
2008-04-09 19:08     ` Mathieu Desnoyers
2008-04-09 22:33       ` H. Peter Anvin
2008-04-10  0:42         ` Mathieu Desnoyers
2008-04-10  0:47           ` H. Peter Anvin
2008-04-09 20:21     ` [patch 13/17] Immediate Values - x86 Optimization (updated) Mathieu Desnoyers
2008-04-09 22:33       ` H. Peter Anvin
2008-04-09 23:15         ` Mathieu Desnoyers
2008-04-09 15:08 ` [patch 14/17] Add text_poke and sync_core to powerpc Mathieu Desnoyers
2008-04-09 15:08 ` [patch 15/17] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2008-04-09 15:08 ` [patch 16/17] Immediate Values - Documentation Mathieu Desnoyers
2008-04-10  3:33   ` Rusty Russell
2008-04-11  1:16     ` Mathieu Desnoyers
2008-04-11 15:06       ` Rusty Russell
2008-04-15  0:12         ` Mathieu Desnoyers
2008-04-11 13:44     ` [RFC PATCH] Immediate Values Support init Mathieu Desnoyers
2008-04-09 15:08 ` [patch 17/17] Scheduler Profiling - Use Immediate Values Mathieu Desnoyers
2008-04-10  4:23 ` [patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1 KOSAKI Motohiro
2008-04-10  7:31 ` Takashi Nishiie
