public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] [0/11] Some more x86 patches for review
@ 2007-07-20 15:32 Andi Kleen
  2007-07-20 15:32 ` [PATCH] [1/11] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog Andi Kleen
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: patches, linux-kernel


Updated or new patches since the last batch. All are planned for merging
soon. Please review.

-Andi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [1/11] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:32 ` [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text Andi Kleen
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: B.Steinbrink, patches, linux-kernel


From: Björn Steinbrink <B.Steinbrink@gmx.de>

The Intel PerfMon NMI watchdog reserves the first performance counter,
but uses the second one. Make it correctly reserve the second one.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/cpu/perfctr-watchdog.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/cpu/perfctr-watchdog.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ linux/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -599,8 +599,8 @@ static struct wd_ops intel_arch_wd_ops =
 	.setup = setup_intel_arch_watchdog,
 	.rearm = p6_rearm,
 	.stop = single_msr_stop_watchdog,
-	.perfctr = MSR_ARCH_PERFMON_PERFCTR0,
-	.evntsel = MSR_ARCH_PERFMON_EVENTSEL0,
+	.perfctr = MSR_ARCH_PERFMON_PERFCTR1,
+	.evntsel = MSR_ARCH_PERFMON_EVENTSEL1,
 };
 
 static void probe_nmi_watchdog(void)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
  2007-07-20 15:32 ` [PATCH] [1/11] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:59   ` Mathieu Desnoyers
                     ` (2 more replies)
  2007-07-20 15:32 ` [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching Andi Kleen
                   ` (8 subsequent siblings)
  10 siblings, 3 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: jbeulich, jeremy, compudj, zach, patches, linux-kernel


Re-enable kprobes and alternative patching when the kernel text is
write-protected by DEBUG_RODATA.
Add a general utility function to change write-protected text. The new
function remaps the code using vmap to write it and takes care of CPU
synchronization. It also does CLFLUSH to make icache recovery faster.

There are some limitations on when the function can be used; see the
comment.

This is a newer version that also changes the paravirt_ops code.
text_poke also supports multi-byte patching now.

Contains bug fixes from Zach Amsden and suggestions from
Mathieu Desnoyers.

Cc: jbeulich@novell.com
Cc: jeremy@goop.org
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: zach@vmware.com

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/alternative.c   |   33 +++++++++++++++++++++++++++++++--
 arch/i386/kernel/kprobes.c       |    9 +++------
 arch/i386/mm/init.c              |   14 +++-----------
 arch/x86_64/kernel/kprobes.c     |   10 +++-------
 arch/x86_64/mm/init.c            |   10 ----------
 arch/x86_64/mm/pageattr.c        |    2 +-
 include/asm-i386/alternative.h   |    2 ++
 include/asm-x86_64/alternative.h |    2 ++
 include/asm-x86_64/pgtable.h     |    2 ++
 9 files changed, 47 insertions(+), 37 deletions(-)

Index: linux/arch/x86_64/kernel/kprobes.c
===================================================================
--- linux.orig/arch/x86_64/kernel/kprobes.c
+++ linux/arch/x86_64/kernel/kprobes.c
@@ -39,9 +39,9 @@
 #include <linux/module.h>
 #include <linux/kdebug.h>
 
-#include <asm/cacheflush.h>
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
+#include <asm/alternative.h>
 
 void jprobe_return_end(void);
 static void __kprobes arch_copy_kprobe(struct kprobe *p);
@@ -209,16 +209,12 @@ static void __kprobes arch_copy_kprobe(s
 
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
-	*p->addr = BREAKPOINT_INSTRUCTION;
-	flush_icache_range((unsigned long) p->addr,
-			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
 }
 
 void __kprobes arch_disarm_kprobe(struct kprobe *p)
 {
-	*p->addr = p->opcode;
-	flush_icache_range((unsigned long) p->addr,
-			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	text_poke(p->addr, &p->opcode, 1);
 }
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
Index: linux/arch/i386/kernel/alternative.c
===================================================================
--- linux.orig/arch/i386/kernel/alternative.c
+++ linux/arch/i386/kernel/alternative.c
@@ -2,8 +2,12 @@
 #include <linux/sched.h>
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/kprobes.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
+#include <asm/pgtable.h>
 
 #ifdef CONFIG_HOTPLUG_CPU
 static int smp_alt_once;
@@ -150,7 +154,7 @@ static void nop_out(void *insns, unsigne
 		unsigned int noplen = len;
 		if (noplen > ASM_NOP_MAX)
 			noplen = ASM_NOP_MAX;
-		memcpy(insns, noptable[noplen], noplen);
+		text_poke(insns, noptable[noplen], noplen);
 		insns += noplen;
 		len -= noplen;
 	}
@@ -202,7 +206,7 @@ static void alternatives_smp_lock(u8 **s
 			continue;
 		if (*ptr > text_end)
 			continue;
-		**ptr = 0xf0; /* lock prefix */
+		text_poke(*ptr, ((unsigned char []){0xf0}), 1); /* add lock prefix */
 	};
 }
 
@@ -360,10 +364,6 @@ void apply_paravirt(struct paravirt_patc
 		/* Pad the rest with nops */
 		nop_out(p->instr + used, p->len - used);
 	}
-
-	/* Sync to be conservative, in case we patched following
-	 * instructions */
-	sync_core();
 }
 extern struct paravirt_patch_site __start_parainstructions[],
 	__stop_parainstructions[];
@@ -406,3 +406,30 @@ void __init alternative_instructions(voi
  	apply_paravirt(__parainstructions, __parainstructions_end);
 	local_irq_restore(flags);
 }
+
+/*
+ * Warning:
+ * When you use this code to patch more than one byte of an instruction
+ * you need to make sure that other CPUs cannot execute this code in parallel.
+ * And on the local CPU you need to be protected against NMI or MCE handlers
+ * seeing an inconsistent instruction while you patch.
+ */
+void __kprobes text_poke(void *oaddr, unsigned char *opcode, int len)
+{
+	u8 *addr = oaddr;
+	if (!pte_write(*lookup_address((unsigned long)addr))) {
+		struct page *p[2] = { virt_to_page(addr), virt_to_page(addr+PAGE_SIZE) };
+		addr = vmap(p, 2, VM_MAP, PAGE_KERNEL);
+		if (!addr)
+			return;
+		addr += ((unsigned long)oaddr) % PAGE_SIZE;
+	}
+	memcpy(addr, opcode, len);
+	sync_core();
+	/* Not strictly needed, but can speed up CPU recovery. Ignore the
+	   cross-cacheline case. */
+	if (cpu_has_clflush)
+		asm("clflush (%0) " :: "r" (oaddr) : "memory");
+	if (addr != oaddr)
+		vunmap(addr);
+}
Index: linux/arch/x86_64/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86_64/mm/pageattr.c
+++ linux/arch/x86_64/mm/pageattr.c
@@ -13,7 +13,7 @@
 #include <asm/tlbflush.h>
 #include <asm/io.h>
 
-static inline pte_t *lookup_address(unsigned long address) 
+pte_t *lookup_address(unsigned long address)
 { 
 	pgd_t *pgd = pgd_offset_k(address);
 	pud_t *pud;
Index: linux/include/asm-i386/alternative.h
===================================================================
--- linux.orig/include/asm-i386/alternative.h
+++ linux/include/asm-i386/alternative.h
@@ -149,4 +149,6 @@ apply_paravirt(struct paravirt_patch_sit
 #define __parainstructions_end	NULL
 #endif
 
+extern void text_poke(void *addr, unsigned char *opcode, int len);
+
 #endif /* _I386_ALTERNATIVE_H */
Index: linux/include/asm-x86_64/alternative.h
===================================================================
--- linux.orig/include/asm-x86_64/alternative.h
+++ linux/include/asm-x86_64/alternative.h
@@ -154,4 +154,6 @@ apply_paravirt(struct paravirt_patch *st
 #define __parainstructions_end NULL
 #endif
 
+extern void text_poke(void *addr, unsigned char *opcode, int len);
+
 #endif /* _X86_64_ALTERNATIVE_H */
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -403,6 +403,8 @@ extern struct list_head pgd_list;
 
 extern int kern_addr_valid(unsigned long addr); 
 
+pte_t *lookup_address(unsigned long addr);
+
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
 		remap_pfn_range(vma, vaddr, pfn, size, prot)
 
Index: linux/arch/i386/kernel/kprobes.c
===================================================================
--- linux.orig/arch/i386/kernel/kprobes.c
+++ linux/arch/i386/kernel/kprobes.c
@@ -35,6 +35,7 @@
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
 #include <asm/uaccess.h>
+#include <asm/alternative.h>
 
 void jprobe_return_end(void);
 
@@ -169,16 +170,12 @@ int __kprobes arch_prepare_kprobe(struct
 
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
-	*p->addr = BREAKPOINT_INSTRUCTION;
-	flush_icache_range((unsigned long) p->addr,
-			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
 }
 
 void __kprobes arch_disarm_kprobe(struct kprobe *p)
 {
-	*p->addr = p->opcode;
-	flush_icache_range((unsigned long) p->addr,
-			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	text_poke(p->addr, &p->opcode, 1);
 }
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
Index: linux/arch/i386/mm/init.c
===================================================================
--- linux.orig/arch/i386/mm/init.c
+++ linux/arch/i386/mm/init.c
@@ -800,17 +800,9 @@ void mark_rodata_ro(void)
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long size = PFN_ALIGN(_etext) - start;
 
-#ifndef CONFIG_KPROBES
-#ifdef CONFIG_HOTPLUG_CPU
-	/* It must still be possible to apply SMP alternatives. */
-	if (num_possible_cpus() <= 1)
-#endif
-	{
-		change_page_attr(virt_to_page(start),
-		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
-		printk("Write protecting the kernel text: %luk\n", size >> 10);
-	}
-#endif
+	change_page_attr(virt_to_page(start),
+	                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
+	printk("Write protecting the kernel text: %luk\n", size >> 10);
 	start += size;
 	size = (unsigned long)__end_rodata - start;
 	change_page_attr(virt_to_page(start),
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -600,16 +600,6 @@ void mark_rodata_ro(void)
 {
 	unsigned long start = (unsigned long)_stext, end;
 
-#ifdef CONFIG_HOTPLUG_CPU
-	/* It must still be possible to apply SMP alternatives. */
-	if (num_possible_cpus() > 1)
-		start = (unsigned long)_etext;
-#endif
-
-#ifdef CONFIG_KPROBES
-	start = (unsigned long)__start_rodata;
-#endif
-	
 	end = (unsigned long)__end_rodata;
 	start = (start + PAGE_SIZE - 1) & PAGE_MASK;
 	end &= PAGE_MASK;
Index: linux/arch/i386/kernel/paravirt.c
===================================================================
--- linux.orig/arch/i386/kernel/paravirt.c
+++ linux/arch/i386/kernel/paravirt.c
@@ -124,20 +124,28 @@ unsigned paravirt_patch_ignore(unsigned 
 	return len;
 }
 
+struct branch { 
+	unsigned char opcode;
+	u32 delta;
+} __attribute__((packed)); 
+
 unsigned paravirt_patch_call(void *target, u16 tgt_clobbers,
 			     void *site, u16 site_clobbers,
 			     unsigned len)
 {
 	unsigned char *call = site;
 	unsigned long delta = (unsigned long)target - (unsigned long)(call+5);
+	struct branch b;
 
 	if (tgt_clobbers & ~site_clobbers)
 		return len;	/* target would clobber too much for this site */
 	if (len < 5)
 		return len;	/* call too long for patch site */
 
-	*call++ = 0xe8;		/* call */
-	*(unsigned long *)call = delta;
+	b.opcode = 0xe8; /* call */
+	b.delta = delta;
+	BUILD_BUG_ON(sizeof(b) != 5);
+	text_poke(call, (unsigned char *)&b, 5);
 
 	return 5;
 }
@@ -150,8 +158,9 @@ unsigned paravirt_patch_jmp(void *target
 	if (len < 5)
 		return len;	/* call too long for patch site */
 
-	*jmp++ = 0xe9;		/* jmp */
-	*(unsigned long *)jmp = delta;
+	b.opcode = 0xe9;	/* jmp */
+	b.delta = delta;
+	text_poke(jmp, (unsigned char *)&b, 5);
 
 	return 5;
 }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
  2007-07-20 15:32 ` [PATCH] [1/11] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog Andi Kleen
  2007-07-20 15:32 ` [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:45   ` Jeremy Fitzhardinge
  2007-07-20 16:03   ` Mathieu Desnoyers
  2007-07-20 15:32 ` [PATCH] [4/11] x86_64: Set K8 CPUID flag for K8/Fam10h/Fam11h Andi Kleen
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: compudj, jeremy, patches, linux-kernel


When a machine check or NMI occurs while a multi-byte instruction is being
patched, the CPU could theoretically see an inconsistent instruction and
crash. Prevent this by temporarily disabling MCEs and returning early in
the NMI handler.

Based on discussion with Mathieu Desnoyers.

Cc:  Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: jeremy@goop.org

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/i386/kernel/alternative.c
===================================================================
--- linux.orig/arch/i386/kernel/alternative.c
+++ linux/arch/i386/kernel/alternative.c
@@ -8,6 +8,8 @@
 #include <asm/alternative.h>
 #include <asm/sections.h>
 #include <asm/pgtable.h>
+#include <asm/mce.h>
+#include <asm/nmi.h>
 
 #ifdef CONFIG_HOTPLUG_CPU
 static int smp_alt_once;
@@ -373,6 +375,12 @@ void __init alternative_instructions(voi
 {
 	unsigned long flags;
 
+	/* The patching is not fully atomic, so try to avoid local interruptions
+	   that might execute the to-be-patched code.
+	   Other CPUs are not running. */
+	stop_nmi();
+	stop_mce();
+
 	local_irq_save(flags);
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
@@ -405,6 +413,9 @@ void __init alternative_instructions(voi
 #endif
  	apply_paravirt(__parainstructions, __parainstructions_end);
 	local_irq_restore(flags);
+
+	restart_nmi();
+	restart_mce();
 }
 
 /*
Index: linux/arch/x86_64/kernel/mce.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce.c
+++ linux/arch/x86_64/kernel/mce.c
@@ -667,6 +667,20 @@ static struct miscdevice mce_log_device 
 	&mce_chrdev_ops,
 };
 
+static unsigned long old_cr4 __initdata;
+
+void __init stop_mce(void) 
+{
+	old_cr4 = read_cr4();
+	clear_in_cr4(X86_CR4_MCE);
+} 
+
+void __init restart_mce(void)
+{
+	if (old_cr4 & X86_CR4_MCE)
+		set_in_cr4(X86_CR4_MCE);	
+}
+
 /* 
  * Old style boot options parsing. Only for compatibility. 
  */
Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -384,11 +384,14 @@ int __kprobes nmi_watchdog_tick(struct p
 	return rc;
 }
 
+static unsigned ignore_nmis;
+
 asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code)
 {
 	nmi_enter();
 	add_pda(__nmi_count,1);
-	default_do_nmi(regs);
+	if (!ignore_nmis) 
+		default_do_nmi(regs);
 	nmi_exit();
 }
 
@@ -401,6 +404,18 @@ int do_nmi_callback(struct pt_regs * reg
 	return 0;
 }
 
+void stop_nmi(void)
+{
+	ignore_nmis++;
+	acpi_nmi_disable();
+}
+
+void restart_nmi(void)
+{
+	ignore_nmis--;
+	acpi_nmi_enable();
+}
+
 #ifdef CONFIG_SYSCTL
 
 static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu)
Index: linux/include/asm-x86_64/mce.h
===================================================================
--- linux.orig/include/asm-x86_64/mce.h
+++ linux/include/asm-x86_64/mce.h
@@ -107,6 +107,9 @@ extern void do_machine_check(struct pt_r
 
 extern int mce_notify_user(void);
 
+extern void stop_mce(void);
+extern void restart_mce(void);
+
 #endif
 
 #endif
Index: linux/include/asm-x86_64/nmi.h
===================================================================
--- linux.orig/include/asm-x86_64/nmi.h
+++ linux/include/asm-x86_64/nmi.h
@@ -88,5 +88,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz
 int lapic_watchdog_ok(void);
 void disable_lapic_nmi_watchdog(void);
 void enable_lapic_nmi_watchdog(void);
+void stop_nmi(void);
+void restart_nmi(void);
 
 #endif /* ASM_NMI_H */
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -774,6 +774,8 @@ static __kprobes void default_do_nmi(str
 	reassert_nmi();
 }
 
+static int ignore_nmis;
+
 fastcall __kprobes void do_nmi(struct pt_regs * regs, long error_code)
 {
 	int cpu;
@@ -784,11 +786,24 @@ fastcall __kprobes void do_nmi(struct pt
 
 	++nmi_count(cpu);
 
-	default_do_nmi(regs);
+	if (!ignore_nmis)
+		default_do_nmi(regs);
 
 	nmi_exit();
 }
 
+void stop_nmi(void)
+{
+	ignore_nmis++;
+	acpi_nmi_disable();
+}
+
+void restart_nmi(void)
+{
+	ignore_nmis--;
+	acpi_nmi_enable();
+}
+
 #ifdef CONFIG_KPROBES
 fastcall void __kprobes do_int3(struct pt_regs *regs, long error_code)
 {
Index: linux/include/asm-i386/nmi.h
===================================================================
--- linux.orig/include/asm-i386/nmi.h
+++ linux/include/asm-i386/nmi.h
@@ -57,5 +57,7 @@ unsigned lapic_adjust_nmi_hz(unsigned hz
 int lapic_watchdog_ok(void);
 void disable_lapic_nmi_watchdog(void);
 void enable_lapic_nmi_watchdog(void);
+void stop_nmi(void);
+void restart_nmi(void);
 
 #endif /* ASM_NMI_H */
Index: linux/arch/i386/kernel/cpu/mcheck/mce.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mcheck/mce.c
+++ linux/arch/i386/kernel/cpu/mcheck/mce.c
@@ -60,6 +60,20 @@ void mcheck_init(struct cpuinfo_x86 *c)
 	}
 }
 
+static unsigned long old_cr4 __initdata;
+
+void __init stop_mce(void) 
+{
+	old_cr4 = read_cr4();
+	clear_in_cr4(X86_CR4_MCE);
+} 
+
+void __init restart_mce(void)
+{
+	if (old_cr4 & X86_CR4_MCE)
+		set_in_cr4(X86_CR4_MCE);	
+}
+
 static int __init mcheck_disable(char *str)
 {
 	mce_disabled = 1;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [4/11] x86_64: Set K8 CPUID flag for K8/Fam10h/Fam11h
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (2 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:32 ` [PATCH] [5/11] x86: Unify alternative nop handling between i386 and x86-64 Andi Kleen
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: patches, linux-kernel


Previously this flag was only used on 32-bit, but some shared code can use
it now.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -612,6 +612,9 @@ static void __cpuinit init_amd(struct cp
 	else
 		num_cache_leaves = 3;
 
+	if (c->x86 == 0xf || c->x86 == 0x10 || c->x86 == 0x11)
+		set_bit(X86_FEATURE_K8, &c->x86_capability);
+
 	/* RDTSC can be speculated around */
 	clear_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [5/11] x86: Unify alternative nop handling between i386 and x86-64
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (3 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [4/11] x86_64: Set K8 CPUID flag for K8/Fam10h/Fam11h Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:32 ` [PATCH] [6/11] i386: Tune AMD Fam10h/11h like K8 Andi Kleen
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: patches, linux-kernel


Move the nops into their own header file.
This removes some duplicated code and some ifdefs.
The x86-64 kernel now knows about the Intel (and K7) nops.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-i386/processor-nops.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/processor-nops.h
@@ -0,0 +1,66 @@
+#ifndef _ASM_PROCESSOR_NOPS_H
+#define _ASM_PROCESSOR_NOPS_H 1
+
+/* generic versions from gas */
+#define GENERIC_NOP1	".byte 0x90\n"
+#define GENERIC_NOP2    	".byte 0x89,0xf6\n"
+#define GENERIC_NOP3        ".byte 0x8d,0x76,0x00\n"
+#define GENERIC_NOP4        ".byte 0x8d,0x74,0x26,0x00\n"
+#define GENERIC_NOP5        GENERIC_NOP1 GENERIC_NOP4
+#define GENERIC_NOP6	".byte 0x8d,0xb6,0x00,0x00,0x00,0x00\n"
+#define GENERIC_NOP7	".byte 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00\n"
+#define GENERIC_NOP8	GENERIC_NOP1 GENERIC_NOP7
+
+/* Opteron nops */
+#define K8_NOP1 GENERIC_NOP1
+#define K8_NOP2	".byte 0x66,0x90\n" 
+#define K8_NOP3	".byte 0x66,0x66,0x90\n" 
+#define K8_NOP4	".byte 0x66,0x66,0x66,0x90\n" 
+#define K8_NOP5	K8_NOP3 K8_NOP2 
+#define K8_NOP6	K8_NOP3 K8_NOP3
+#define K8_NOP7	K8_NOP4 K8_NOP3
+#define K8_NOP8	K8_NOP4 K8_NOP4
+
+/* K7 nops */
+/* uses eax dependencies (arbitrary choice) */
+#define K7_NOP1  GENERIC_NOP1
+#define K7_NOP2	".byte 0x8b,0xc0\n" 
+#define K7_NOP3	".byte 0x8d,0x04,0x20\n"
+#define K7_NOP4	".byte 0x8d,0x44,0x20,0x00\n"
+#define K7_NOP5	K7_NOP4 ASM_NOP1
+#define K7_NOP6	".byte 0x8d,0x80,0,0,0,0\n"
+#define K7_NOP7        ".byte 0x8D,0x04,0x05,0,0,0,0\n"
+#define K7_NOP8        K7_NOP7 ASM_NOP1
+
+#ifdef CONFIG_MK8
+#define ASM_NOP1 K8_NOP1
+#define ASM_NOP2 K8_NOP2
+#define ASM_NOP3 K8_NOP3
+#define ASM_NOP4 K8_NOP4
+#define ASM_NOP5 K8_NOP5
+#define ASM_NOP6 K8_NOP6
+#define ASM_NOP7 K8_NOP7
+#define ASM_NOP8 K8_NOP8
+#elif defined(CONFIG_MK7)
+#define ASM_NOP1 K7_NOP1
+#define ASM_NOP2 K7_NOP2
+#define ASM_NOP3 K7_NOP3
+#define ASM_NOP4 K7_NOP4
+#define ASM_NOP5 K7_NOP5
+#define ASM_NOP6 K7_NOP6
+#define ASM_NOP7 K7_NOP7
+#define ASM_NOP8 K7_NOP8
+#else
+#define ASM_NOP1 GENERIC_NOP1
+#define ASM_NOP2 GENERIC_NOP2
+#define ASM_NOP3 GENERIC_NOP3
+#define ASM_NOP4 GENERIC_NOP4
+#define ASM_NOP5 GENERIC_NOP5
+#define ASM_NOP6 GENERIC_NOP6
+#define ASM_NOP7 GENERIC_NOP7
+#define ASM_NOP8 GENERIC_NOP8
+#endif
+
+#define ASM_NOP_MAX 8
+
+#endif
Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -22,6 +22,7 @@
 #include <linux/cpumask.h>
 #include <linux/init.h>
 #include <asm/processor-flags.h>
+#include <asm/processor-nops.h>
 
 /* flag for disabling the tsc */
 extern int tsc_disable;
@@ -654,68 +655,6 @@ static inline unsigned int cpuid_edx(uns
 	return edx;
 }
 
-/* generic versions from gas */
-#define GENERIC_NOP1	".byte 0x90\n"
-#define GENERIC_NOP2    	".byte 0x89,0xf6\n"
-#define GENERIC_NOP3        ".byte 0x8d,0x76,0x00\n"
-#define GENERIC_NOP4        ".byte 0x8d,0x74,0x26,0x00\n"
-#define GENERIC_NOP5        GENERIC_NOP1 GENERIC_NOP4
-#define GENERIC_NOP6	".byte 0x8d,0xb6,0x00,0x00,0x00,0x00\n"
-#define GENERIC_NOP7	".byte 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00\n"
-#define GENERIC_NOP8	GENERIC_NOP1 GENERIC_NOP7
-
-/* Opteron nops */
-#define K8_NOP1 GENERIC_NOP1
-#define K8_NOP2	".byte 0x66,0x90\n" 
-#define K8_NOP3	".byte 0x66,0x66,0x90\n" 
-#define K8_NOP4	".byte 0x66,0x66,0x66,0x90\n" 
-#define K8_NOP5	K8_NOP3 K8_NOP2 
-#define K8_NOP6	K8_NOP3 K8_NOP3
-#define K8_NOP7	K8_NOP4 K8_NOP3
-#define K8_NOP8	K8_NOP4 K8_NOP4
-
-/* K7 nops */
-/* uses eax dependencies (arbitary choice) */
-#define K7_NOP1  GENERIC_NOP1
-#define K7_NOP2	".byte 0x8b,0xc0\n" 
-#define K7_NOP3	".byte 0x8d,0x04,0x20\n"
-#define K7_NOP4	".byte 0x8d,0x44,0x20,0x00\n"
-#define K7_NOP5	K7_NOP4 ASM_NOP1
-#define K7_NOP6	".byte 0x8d,0x80,0,0,0,0\n"
-#define K7_NOP7        ".byte 0x8D,0x04,0x05,0,0,0,0\n"
-#define K7_NOP8        K7_NOP7 ASM_NOP1
-
-#ifdef CONFIG_MK8
-#define ASM_NOP1 K8_NOP1
-#define ASM_NOP2 K8_NOP2
-#define ASM_NOP3 K8_NOP3
-#define ASM_NOP4 K8_NOP4
-#define ASM_NOP5 K8_NOP5
-#define ASM_NOP6 K8_NOP6
-#define ASM_NOP7 K8_NOP7
-#define ASM_NOP8 K8_NOP8
-#elif defined(CONFIG_MK7)
-#define ASM_NOP1 K7_NOP1
-#define ASM_NOP2 K7_NOP2
-#define ASM_NOP3 K7_NOP3
-#define ASM_NOP4 K7_NOP4
-#define ASM_NOP5 K7_NOP5
-#define ASM_NOP6 K7_NOP6
-#define ASM_NOP7 K7_NOP7
-#define ASM_NOP8 K7_NOP8
-#else
-#define ASM_NOP1 GENERIC_NOP1
-#define ASM_NOP2 GENERIC_NOP2
-#define ASM_NOP3 GENERIC_NOP3
-#define ASM_NOP4 GENERIC_NOP4
-#define ASM_NOP5 GENERIC_NOP5
-#define ASM_NOP6 GENERIC_NOP6
-#define ASM_NOP7 GENERIC_NOP7
-#define ASM_NOP8 GENERIC_NOP8
-#endif
-
-#define ASM_NOP_MAX 8
-
 /* Prefetch instructions for Pentium III and AMD Athlon */
 /* It's not worth to care about 3dnow! prefetches for the K6
    because they are microcoded there and very slow.
Index: linux/include/asm-x86_64/processor.h
===================================================================
--- linux.orig/include/asm-x86_64/processor.h
+++ linux/include/asm-x86_64/processor.h
@@ -21,6 +21,7 @@
 #include <linux/personality.h>
 #include <linux/cpumask.h>
 #include <asm/processor-flags.h>
+#include <asm-i386/processor-nops.h>
 
 #define TF_MASK		0x00000100
 #define IF_MASK		0x00000200
@@ -339,28 +340,6 @@ struct extended_sigtable {
 	struct extended_signature sigs[0];
 };
 
-
-#define ASM_NOP1 K8_NOP1
-#define ASM_NOP2 K8_NOP2
-#define ASM_NOP3 K8_NOP3
-#define ASM_NOP4 K8_NOP4
-#define ASM_NOP5 K8_NOP5
-#define ASM_NOP6 K8_NOP6
-#define ASM_NOP7 K8_NOP7
-#define ASM_NOP8 K8_NOP8
-
-/* Opteron nops */
-#define K8_NOP1 ".byte 0x90\n"
-#define K8_NOP2	".byte 0x66,0x90\n" 
-#define K8_NOP3	".byte 0x66,0x66,0x90\n" 
-#define K8_NOP4	".byte 0x66,0x66,0x66,0x90\n" 
-#define K8_NOP5	K8_NOP3 K8_NOP2 
-#define K8_NOP6	K8_NOP3 K8_NOP3
-#define K8_NOP7	K8_NOP4 K8_NOP3
-#define K8_NOP8	K8_NOP4 K8_NOP4
-
-#define ASM_NOP_MAX 8
-
 /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
 static inline void rep_nop(void)
 {
Index: linux/arch/i386/kernel/alternative.c
===================================================================
--- linux.orig/arch/i386/kernel/alternative.c
+++ linux/arch/i386/kernel/alternative.c
@@ -56,7 +56,6 @@ __setup("noreplace-paravirt", setup_nore
 #define DPRINTK(fmt, args...) if (debug_alternative) \
 	printk(KERN_DEBUG fmt, args)
 
-#ifdef GENERIC_NOP1
 /* Use inline assembly to define this because the nops are defined
    as inline assembly strings in the include files and we cannot
    get them easily into strings. */
@@ -75,9 +74,7 @@ static unsigned char *intel_nops[ASM_NOP
 	intelnops + 1 + 2 + 3 + 4 + 5 + 6,
 	intelnops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
 };
-#endif
 
-#ifdef K8_NOP1
 asm("\t.data\nk8nops: "
 	K8_NOP1 K8_NOP2 K8_NOP3 K8_NOP4 K8_NOP5 K8_NOP6
 	K8_NOP7 K8_NOP8);
@@ -93,9 +90,7 @@ static unsigned char *k8_nops[ASM_NOP_MA
 	k8nops + 1 + 2 + 3 + 4 + 5 + 6,
 	k8nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
 };
-#endif
 
-#ifdef K7_NOP1
 asm("\t.data\nk7nops: "
 	K7_NOP1 K7_NOP2 K7_NOP3 K7_NOP4 K7_NOP5 K7_NOP6
 	K7_NOP7 K7_NOP8);
@@ -111,17 +106,10 @@ static unsigned char *k7_nops[ASM_NOP_MA
 	k7nops + 1 + 2 + 3 + 4 + 5 + 6,
 	k7nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
 };
-#endif
 
 #ifdef CONFIG_X86_64
-
 extern char __vsyscall_0;
-static inline unsigned char** find_nop_table(void)
-{
-	return k8_nops;
-}
-
-#else /* CONFIG_X86_64 */
+#endif
 
 static struct nop {
 	int cpuid;
@@ -146,8 +134,6 @@ static unsigned char** find_nop_table(vo
 	return noptable;
 }
 
-#endif /* CONFIG_X86_64 */
-
 static void nop_out(void *insns, unsigned int len)
 {
 	unsigned char **noptable = find_nop_table();

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [6/11] i386: Tune AMD Fam10h/11h like K8
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (4 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [5/11] x86: Unify alternative nop handling between i386 and x86-64 Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:32 ` [PATCH] [7/11] i386: Do not include other cpus' interrupt 0 in nmi_watchdog Andi Kleen
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: patches, linux-kernel


This mainly changes the nops used for alternatives, so it is not very
revolutionary.
 
Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/i386/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/amd.c
+++ linux/arch/i386/kernel/cpu/amd.c
@@ -230,6 +230,9 @@ static void __cpuinit init_amd(struct cp
 	}
 
 	switch (c->x86) {
+		/* Use K8 tuning for Fam10h and Fam11h */
+	case 0x10:	
+	case 0x11:
 	case 15:
 		set_bit(X86_FEATURE_K8, c->x86_capability);
 		break;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [7/11] i386: Do not include other cpus' interrupt 0 in nmi_watchdog
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (5 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [6/11] i386: Tune AMD Fam10h/11h like K8 Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:32 ` [PATCH] [8/11] x86_64: x86_64 - Use non locked version for local_cmpxchg() Andi Kleen
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: kaos, patches, linux-kernel


From: Keith Owens <kaos@ocs.com.au>
kstat_irqs(0) includes the count of interrupt 0 from all cpus, not just
the current cpu. Updates to interrupt 0 on other cpus can stop the
nmi_watchdog from tripping, so only include the current cpu's interrupt 0.

Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/nmi.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/i386/kernel/nmi.c
===================================================================
--- linux.orig/arch/i386/kernel/nmi.c
+++ linux/arch/i386/kernel/nmi.c
@@ -353,7 +353,7 @@ __kprobes int nmi_watchdog_tick(struct p
 	 * Take the local apic timer and PIT/HPET into account. We don't
 	 * know which one is active, when we have highres/dyntick on
 	 */
-	sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_irqs(0);
+	sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0];
 
 	/* if the none of the timers isn't firing, this cpu isn't doing much */
 	if (!touched && last_irq_sums[cpu] == sum) {

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] [8/11] x86_64: x86_64 - Use non locked version for local_cmpxchg()
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (6 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [7/11] i386: Do not include other cpus' interrupt 0 in nmi_watchdog Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:32 ` [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions Andi Kleen
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: mathieu.desnoyers, andi, patches, linux-kernel


From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
x86_64 - Use non locked version for local_cmpxchg()

local_cmpxchg() should not use any LOCK prefix. This change probably got lost in
the move to cmpxchg.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
CC: Andi Kleen <andi@firstfloor.org>
---
 include/asm-x86_64/cmpxchg.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/include/asm-x86_64/cmpxchg.h
===================================================================
--- linux.orig/include/asm-x86_64/cmpxchg.h
+++ linux/include/asm-x86_64/cmpxchg.h
@@ -128,7 +128,7 @@ static inline unsigned long __cmpxchg_lo
 	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
 #define cmpxchg_local(ptr,o,n)\
-	((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
+	((__typeof__(*(ptr)))__cmpxchg_local((ptr),(unsigned long)(o),\
 					(unsigned long)(n),sizeof(*(ptr))))
 
 #endif


* [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (7 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [8/11] x86_64: x86_64 - Use non locked version for local_cmpxchg() Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-21 12:00   ` Alexey Dobriyan
  2007-07-20 15:32 ` [PATCH] [10/11] i386: Handle P6s without performance counters in nmi watchdog Andi Kleen
  2007-07-20 15:32 ` [PATCH] [11/11] i386: Use patchable lock prefix in set_64bit Andi Kleen
  10 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: juergen, patches, linux-kernel


From: Juergen Beisert <juergen@kreuzholzen.de>
With the macros a line like this fails (and does nothing):
	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
With inlined functions this line will work as expected.

Note about a side effect: it seems that on Geode GX1 based systems the
"suspend on halt" power saving feature was never enabled due to this
wrong macro expansion. With inlined functions it will be enabled, but
this stops the TSC when the CPU runs into a HLT instruction. The
kernel then outputs something like this:
	Clocksource tsc unstable (delta = -472746897 ns)

This is the 3rd version of this patch.

 - Added the missing arch/i386/kernel/cpu/mtrr/state.c
	Thanks to Andres Salomon
 - Added some big fat comments to the new header file
	Suggested by Andi Kleen

AK: fixed x86-64 compilation

Signed-off-by: Juergen Beisert <juergen@kreuzholzen.de>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/cpu/cpufreq/gx-suspmod.c |    2 +-
 arch/i386/kernel/cpu/cyrix.c              |    2 +-
 arch/i386/kernel/cpu/mtrr/cyrix.c         |    1 +
 arch/i386/kernel/cpu/mtrr/state.c         |    1 +
 include/asm-i386/processor-cyrix.h        |   30 ++++++++++++++++++++++++++++++
 include/asm-i386/processor.h              |   11 -----------
 include/asm-x86_64/processor.h            |   11 -----------
 7 files changed, 34 insertions(+), 24 deletions(-)

Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -168,17 +168,6 @@ static inline void clear_in_cr4 (unsigne
 	write_cr4(cr4);
 }
 
-/*
- *      NSC/Cyrix CPU indexed register access macros
- */
-
-#define getCx86(reg) ({ outb((reg), 0x22); inb(0x23); })
-
-#define setCx86(reg, data) do { \
-	outb((reg), 0x22); \
-	outb((data), 0x23); \
-} while (0)
-
 /* Stop speculative execution */
 static inline void sync_core(void)
 {
Index: linux/include/asm-i386/processor-cyrix.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/processor-cyrix.h
@@ -0,0 +1,30 @@
+/*
+ * NSC/Cyrix CPU indexed register access. Must be inlined instead of
+ * macros to ensure correct access ordering
+ * Access order is always 0x22 (=offset), 0x23 (=value)
+ *
+ * When using the old macros a line like
+ *   setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
+ * gets expanded to:
+ *  do {
+ *    outb((CX86_CCR2), 0x22);
+ *    outb((({
+ *        outb((CX86_CCR2), 0x22);
+ *        inb(0x23);
+ *    }) | 0x88), 0x23);
+ *  } while (0);
+ *
+ * which in fact violates the access order (= 0x22, 0x22, 0x23, 0x23).
+ */
+
+static inline u8 getCx86(u8 reg)
+{
+	outb(reg, 0x22);
+	return inb(0x23);
+}
+
+static inline void setCx86(u8 reg, u8 data)
+{
+	outb(reg, 0x22);
+	outb(data, 0x23);
+}
Index: linux/arch/i386/kernel/cpu/cyrix.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/cyrix.c
+++ linux/arch/i386/kernel/cpu/cyrix.c
@@ -4,7 +4,7 @@
 #include <linux/pci.h>
 #include <asm/dma.h>
 #include <asm/io.h>
-#include <asm/processor.h>
+#include <asm/processor-cyrix.h>
 #include <asm/timer.h>
 #include <asm/pci-direct.h>
 #include <asm/tsc.h>
Index: linux/arch/i386/kernel/cpu/mtrr/cyrix.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mtrr/cyrix.c
+++ linux/arch/i386/kernel/cpu/mtrr/cyrix.c
@@ -3,6 +3,7 @@
 #include <asm/mtrr.h>
 #include <asm/msr.h>
 #include <asm/io.h>
+#include <asm/processor-cyrix.h>
 #include "mtrr.h"
 
 int arr3_protected;
Index: linux/arch/i386/kernel/cpu/cpufreq/gx-suspmod.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/cpufreq/gx-suspmod.c
+++ linux/arch/i386/kernel/cpu/cpufreq/gx-suspmod.c
@@ -79,7 +79,7 @@
 #include <linux/smp.h>
 #include <linux/cpufreq.h>
 #include <linux/pci.h>
-#include <asm/processor.h>
+#include <asm/processor-cyrix.h>
 #include <asm/errno.h>
 
 /* PCI config registers, all at F0 */
Index: linux/include/asm-x86_64/processor.h
===================================================================
--- linux.orig/include/asm-x86_64/processor.h
+++ linux/include/asm-x86_64/processor.h
@@ -389,17 +389,6 @@ static inline void prefetchw(void *x) 
 
 #define cpu_relax()   rep_nop()
 
-/*
- *      NSC/Cyrix CPU indexed register access macros
- */
-
-#define getCx86(reg) ({ outb((reg), 0x22); inb(0x23); })
-
-#define setCx86(reg, data) do { \
-	outb((reg), 0x22); \
-	outb((data), 0x23); \
-} while (0)
-
 static inline void serialize_cpu(void)
 {
 	__asm__ __volatile__ ("cpuid" : : : "ax", "bx", "cx", "dx");
Index: linux/arch/i386/kernel/cpu/mtrr/state.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mtrr/state.c
+++ linux/arch/i386/kernel/cpu/mtrr/state.c
@@ -3,6 +3,7 @@
 #include <asm/io.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
+#include <asm-i386/processor-cyrix.h>
 #include "mtrr.h"
 
 


* [PATCH] [10/11] i386: Handle P6s without performance counters in nmi watchdog
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (8 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  2007-07-20 15:32 ` [PATCH] [11/11] i386: Use patchable lock prefix in set_64bit Andi Kleen
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: avi, patches, linux-kernel


I got an oops while booting a 32-bit kernel on KVM because KVM doesn't
implement the performance counters used by the NMI watchdog. Handle this
case.

Cc: avi@qumranet.com

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/i386/kernel/cpu/perfctr-watchdog.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ linux/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -346,7 +346,9 @@ static int setup_p6_watchdog(unsigned nm
 	perfctr_msr = MSR_P6_PERFCTR0;
 	evntsel_msr = MSR_P6_EVNTSEL0;
 
-	wrmsrl(perfctr_msr, 0UL);
+	/* KVM doesn't implement this MSR */
+	if (wrmsr_safe(perfctr_msr, 0, 0) < 0) 
+		return 0;
 
 	evntsel = P6_EVNTSEL_INT
 		| P6_EVNTSEL_OS


* [PATCH] [11/11] i386: Use patchable lock prefix in set_64bit
  2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
                   ` (9 preceding siblings ...)
  2007-07-20 15:32 ` [PATCH] [10/11] i386: Handle P6s without performance counters in nmi watchdog Andi Kleen
@ 2007-07-20 15:32 ` Andi Kleen
  10 siblings, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-20 15:32 UTC (permalink / raw)
  To: patches, linux-kernel


Previously the lock prefix was used unconditionally, but it shouldn't
be needed on UP systems.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-i386/cmpxchg.h
===================================================================
--- linux.orig/include/asm-i386/cmpxchg.h
+++ linux/include/asm-i386/cmpxchg.h
@@ -34,7 +34,7 @@ static inline void __set_64bit (unsigned
 		"\n1:\t"
 		"movl (%0), %%eax\n\t"
 		"movl 4(%0), %%edx\n\t"
-		"lock cmpxchg8b (%0)\n\t"
+		LOCK_PREFIX "cmpxchg8b (%0)\n\t"
 		"jnz 1b"
 		: /* no outputs */
 		:	"D"(ptr),


* Re: [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching
  2007-07-20 15:32 ` [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching Andi Kleen
@ 2007-07-20 15:45   ` Jeremy Fitzhardinge
  2007-07-20 16:03   ` Mathieu Desnoyers
  1 sibling, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-20 15:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: compudj, patches, linux-kernel

Andi Kleen wrote:
> +void stop_nmi(void)
> +{
> +	ignore_nmis++;
> +	acpi_nmi_disable();
> +}
> +
> +void restart_nmi(void)
> +{
> +	ignore_nmis--;
> +	acpi_nmi_enable();
> +}
>   

Wouldn't it be better to just assign ignore_nmis rather than inc/dec it
(and perhaps BUG on a double stop or restart)?  Also, is
acpi_nmi_disable/enable's use of on_each_cpu safe with interrupts
disabled if we know there's only one CPU alive at this point?

    J


* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-20 15:32 ` [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text Andi Kleen
@ 2007-07-20 15:59   ` Mathieu Desnoyers
  2007-07-23 14:46   ` Mathieu Desnoyers
  2007-07-24  6:06   ` Rusty Russell
  2 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-07-20 15:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jbeulich, jeremy, zach, patches, linux-kernel

* Andi Kleen (ak@suse.de) wrote:
...
> +/*
> + * Warning: 
> + * When you use this code to patch more than one byte of an instruction
> + * you need to make sure that other CPUs cannot execute this code in parallel.
> + * And on the local CPU you need to be protected again NMI or MCE handlers
> + * seeing an inconsistent instruction while you patch.
> + */

I guess it would be good to also mention that no thread must be
preempted in the middle of a multi-instruction sequence that is being
combined into one; otherwise the iret could resume at an illegal
instruction when the thread is scheduled again.

Mathieu


> +void __kprobes text_poke(void *oaddr, unsigned char *opcode, int len)
> +{
> +        u8 *addr = oaddr;
> +	if (!pte_write(*lookup_address((unsigned long)addr))) {
> +		struct page *p[2] = { virt_to_page(addr), virt_to_page(addr+PAGE_SIZE) };
> +		addr = vmap(p, 2, VM_MAP, PAGE_KERNEL);
> +		if (!addr)
> +			return;
> +		addr += ((unsigned long)oaddr) % PAGE_SIZE;
> +	}
> +	memcpy(addr, opcode, len);
> +	sync_core();
> +	/* Not strictly needed, but can speed CPU recovery up. Ignore cross cacheline
> +	   case. */
> +	if (cpu_has_clflush)
> +		asm("clflush (%0) " :: "r" (oaddr) : "memory");
> +	if (addr != oaddr)
> +		vunmap(addr);
> +}
> Index: linux/arch/x86_64/mm/pageattr.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/pageattr.c
> +++ linux/arch/x86_64/mm/pageattr.c
> @@ -13,7 +13,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/io.h>
>  
> -static inline pte_t *lookup_address(unsigned long address) 
> +pte_t *lookup_address(unsigned long address)
>  { 
>  	pgd_t *pgd = pgd_offset_k(address);
>  	pud_t *pud;
> Index: linux/include/asm-i386/alternative.h
> ===================================================================
> --- linux.orig/include/asm-i386/alternative.h
> +++ linux/include/asm-i386/alternative.h
> @@ -149,4 +149,6 @@ apply_paravirt(struct paravirt_patch_sit
>  #define __parainstructions_end	NULL
>  #endif
>  
> +extern void text_poke(void *addr, unsigned char *opcode, int len);
> +
>  #endif /* _I386_ALTERNATIVE_H */
> Index: linux/include/asm-x86_64/alternative.h
> ===================================================================
> --- linux.orig/include/asm-x86_64/alternative.h
> +++ linux/include/asm-x86_64/alternative.h
> @@ -154,4 +154,6 @@ apply_paravirt(struct paravirt_patch *st
>  #define __parainstructions_end NULL
>  #endif
>  
> +extern void text_poke(void *addr, unsigned char *opcode, int len);
> +
>  #endif /* _X86_64_ALTERNATIVE_H */
> Index: linux/include/asm-x86_64/pgtable.h
> ===================================================================
> --- linux.orig/include/asm-x86_64/pgtable.h
> +++ linux/include/asm-x86_64/pgtable.h
> @@ -403,6 +403,8 @@ extern struct list_head pgd_list;
>  
>  extern int kern_addr_valid(unsigned long addr); 
>  
> +pte_t *lookup_address(unsigned long addr);
> +
>  #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
>  		remap_pfn_range(vma, vaddr, pfn, size, prot)
>  
> Index: linux/arch/i386/kernel/kprobes.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/kprobes.c
> +++ linux/arch/i386/kernel/kprobes.c
> @@ -35,6 +35,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/desc.h>
>  #include <asm/uaccess.h>
> +#include <asm/alternative.h>
>  
>  void jprobe_return_end(void);
>  
> @@ -169,16 +170,12 @@ int __kprobes arch_prepare_kprobe(struct
>  
>  void __kprobes arch_arm_kprobe(struct kprobe *p)
>  {
> -	*p->addr = BREAKPOINT_INSTRUCTION;
> -	flush_icache_range((unsigned long) p->addr,
> -			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
>  }
>  
>  void __kprobes arch_disarm_kprobe(struct kprobe *p)
>  {
> -	*p->addr = p->opcode;
> -	flush_icache_range((unsigned long) p->addr,
> -			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	text_poke(p->addr, &p->opcode, 1);
>  }
>  
>  void __kprobes arch_remove_kprobe(struct kprobe *p)
> Index: linux/arch/i386/mm/init.c
> ===================================================================
> --- linux.orig/arch/i386/mm/init.c
> +++ linux/arch/i386/mm/init.c
> @@ -800,17 +800,9 @@ void mark_rodata_ro(void)
>  	unsigned long start = PFN_ALIGN(_text);
>  	unsigned long size = PFN_ALIGN(_etext) - start;
>  
> -#ifndef CONFIG_KPROBES
> -#ifdef CONFIG_HOTPLUG_CPU
> -	/* It must still be possible to apply SMP alternatives. */
> -	if (num_possible_cpus() <= 1)
> -#endif
> -	{
> -		change_page_attr(virt_to_page(start),
> -		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
> -		printk("Write protecting the kernel text: %luk\n", size >> 10);
> -	}
> -#endif
> +	change_page_attr(virt_to_page(start),
> +	                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
> +	printk("Write protecting the kernel text: %luk\n", size >> 10);
>  	start += size;
>  	size = (unsigned long)__end_rodata - start;
>  	change_page_attr(virt_to_page(start),
> Index: linux/arch/x86_64/mm/init.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/init.c
> +++ linux/arch/x86_64/mm/init.c
> @@ -600,16 +600,6 @@ void mark_rodata_ro(void)
>  {
>  	unsigned long start = (unsigned long)_stext, end;
>  
> -#ifdef CONFIG_HOTPLUG_CPU
> -	/* It must still be possible to apply SMP alternatives. */
> -	if (num_possible_cpus() > 1)
> -		start = (unsigned long)_etext;
> -#endif
> -
> -#ifdef CONFIG_KPROBES
> -	start = (unsigned long)__start_rodata;
> -#endif
> -	
>  	end = (unsigned long)__end_rodata;
>  	start = (start + PAGE_SIZE - 1) & PAGE_MASK;
>  	end &= PAGE_MASK;
> Index: linux/arch/i386/kernel/paravirt.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/paravirt.c
> +++ linux/arch/i386/kernel/paravirt.c
> @@ -124,20 +124,28 @@ unsigned paravirt_patch_ignore(unsigned 
>  	return len;
>  }
>  
> +struct branch { 
> +	unsigned char opcode;
> +	u32 delta;
> +} __attribute__((packed)); 
> +
>  unsigned paravirt_patch_call(void *target, u16 tgt_clobbers,
>  			     void *site, u16 site_clobbers,
>  			     unsigned len)
>  {
>  	unsigned char *call = site;
>  	unsigned long delta = (unsigned long)target - (unsigned long)(call+5);
> +	struct branch b;
>  
>  	if (tgt_clobbers & ~site_clobbers)
>  		return len;	/* target would clobber too much for this site */
>  	if (len < 5)
>  		return len;	/* call too long for patch site */
>  
> -	*call++ = 0xe8;		/* call */
> -	*(unsigned long *)call = delta;
> +	b.opcode = 0xe8; /* call */
> +	b.delta = delta;
> +	BUILD_BUG_ON(sizeof(b) != 5);
> +	text_poke(call, (unsigned char *)&b, 5);
>  
>  	return 5;
>  }
> @@ -150,8 +158,9 @@ unsigned paravirt_patch_jmp(void *target
>  	if (len < 5)
>  		return len;	/* call too long for patch site */
>  
> -	*jmp++ = 0xe9;		/* jmp */
> -	*(unsigned long *)jmp = delta;
> +	b.opcode = 0xe9;	/* jmp */
> +	b.delta = delta;
> +	text_poke(call, (unsigned char *)&b, 5);
>  
>  	return 5;
>  }
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching
  2007-07-20 15:32 ` [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching Andi Kleen
  2007-07-20 15:45   ` Jeremy Fitzhardinge
@ 2007-07-20 16:03   ` Mathieu Desnoyers
  1 sibling, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-07-20 16:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jeremy, patches, linux-kernel

* Andi Kleen (ak@suse.de) wrote:

> Index: linux/arch/x86_64/kernel/nmi.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/nmi.c
> +++ linux/arch/x86_64/kernel/nmi.c
> @@ -384,11 +384,14 @@ int __kprobes nmi_watchdog_tick(struct p
>  	return rc;
>  }
>  
> +static unsigned ignore_nmis;
> +
>  asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code)
>  {
>  	nmi_enter();
>  	add_pda(__nmi_count,1);
> -	default_do_nmi(regs);
> +	if (!ignore_nmis) 
> +		default_do_nmi(regs);
>  	nmi_exit();
>  }
>  
> @@ -401,6 +404,18 @@ int do_nmi_callback(struct pt_regs * reg
>  	return 0;
>  }
>  
> +void stop_nmi(void)
> +{
> +	ignore_nmis++;
> +	acpi_nmi_disable();
> +}

I would do:

acpi_nmi_disable();
ignore_nmis++;

Instead, so that we minimize the chance of having to drop a do_nmi.

> +
> +void restart_nmi(void)
> +{
> +	ignore_nmis--;
> +	acpi_nmi_enable();
> +}
> +
>  #ifdef CONFIG_SYSCTL
>  
...
>  #endif /* ASM_NMI_H */
> Index: linux/arch/i386/kernel/traps.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/traps.c
> +++ linux/arch/i386/kernel/traps.c
> @@ -774,6 +774,8 @@ static __kprobes void default_do_nmi(str
>  	reassert_nmi();
>  }
>  
> +static int ignore_nmis;
> +
>  fastcall __kprobes void do_nmi(struct pt_regs * regs, long error_code)
>  {
>  	int cpu;
> @@ -784,11 +786,24 @@ fastcall __kprobes void do_nmi(struct pt
>  
>  	++nmi_count(cpu);
>  
> -	default_do_nmi(regs);
> +	if (!ignore_nmis)
> +		default_do_nmi(regs);
>  
>  	nmi_exit();
>  }
>  
> +void stop_nmi(void)
> +{
> +	ignore_nmis++;
> +	acpi_nmi_disable();
> +}
> +

Same here, I would swap the two.

> +void restart_nmi(void)
> +{
> +	ignore_nmis--;
> +	acpi_nmi_enable();
> +}
> +

Besides that, it looks good.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
  2007-07-20 15:32 ` [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions Andi Kleen
@ 2007-07-21 12:00   ` Alexey Dobriyan
  2007-07-21 12:04     ` Andi Kleen
  2007-07-22 10:45     ` Juergen Beisert
  0 siblings, 2 replies; 25+ messages in thread
From: Alexey Dobriyan @ 2007-07-21 12:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: juergen, patches, linux-kernel

On Fri, Jul 20, 2007 at 05:32:53PM +0200, Andi Kleen wrote:
> --- /dev/null
> +++ linux/include/asm-i386/processor-cyrix.h
> @@ -0,0 +1,30 @@
> +/*
> + * NSC/Cyrix CPU indexed register access. Must be inlined instead of
> + * macros to ensure correct access ordering
> + * Access order is always 0x22 (=offset), 0x23 (=value)
> + *
> + * When using the old macros a line like
> + *   setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
> + * gets expanded to:
> + *  do {
> + *    outb((CX86_CCR2), 0x22);
> + *    outb((({
> + *        outb((CX86_CCR2), 0x22);
> + *        inb(0x23);
> + *    }) | 0x88), 0x23);
> + *  } while (0);
> + *
> + * which in fact violates the access order (= 0x22, 0x22, 0x23, 0x23).
> + */
> +
> +static inline u8 getCx86(u8 reg)
> +{
> +	outb(reg, 0x22);
> +	return inb(0x23);
> +}
> +
> +static inline void setCx86(u8 reg, u8 data)
> +{
> +	outb(reg, 0x22);
> +	outb(data, 0x23);
> +}

Why bother with new header, just make static inline in asm/processor.h.



* Re: [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
  2007-07-21 12:00   ` Alexey Dobriyan
@ 2007-07-21 12:04     ` Andi Kleen
  2007-07-22 10:45     ` Juergen Beisert
  1 sibling, 0 replies; 25+ messages in thread
From: Andi Kleen @ 2007-07-21 12:04 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: juergen, patches, linux-kernel


> Why bother with new header, just make static inline in asm/processor.h.

processor.h is already far too big, some splitting up is a good thing.

-Andi


* Re: [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
  2007-07-21 12:00   ` Alexey Dobriyan
  2007-07-21 12:04     ` Andi Kleen
@ 2007-07-22 10:45     ` Juergen Beisert
  1 sibling, 0 replies; 25+ messages in thread
From: Juergen Beisert @ 2007-07-22 10:45 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Andi Kleen, patches, linux-kernel

On Saturday 21 July 2007 14:00, Alexey Dobriyan wrote:
> On Fri, Jul 20, 2007 at 05:32:53PM +0200, Andi Kleen wrote:
> > --- /dev/null
> > +++ linux/include/asm-i386/processor-cyrix.h
> > @@ -0,0 +1,30 @@
> > +/*
> > + * NSC/Cyrix CPU indexed register access. Must be inlined instead of
> > + * macros to ensure correct access ordering
> > + * Access order is always 0x22 (=offset), 0x23 (=value)
> > + *
> > + * When using the old macros a line like
> > + *   setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
> > + * gets expanded to:
> > + *  do {
> > + *    outb((CX86_CCR2), 0x22);
> > + *    outb((({
> > + *        outb((CX86_CCR2), 0x22);
> > + *        inb(0x23);
> > + *    }) | 0x88), 0x23);
> > + *  } while (0);
> > + *
> > + * which in fact violates the access order (= 0x22, 0x22, 0x23, 0x23).
> > + */
> > +
> > +static inline u8 getCx86(u8 reg)
> > +{
> > +	outb(reg, 0x22);
> > +	return inb(0x23);
> > +}
> > +
> > +static inline void setCx86(u8 reg, u8 data)
> > +{
> > +	outb(reg, 0x22);
> > +	outb(data, 0x23);
> > +}
>
> Why bother with new header, just make static inline in asm/processor.h.

Already tried. But it requires asm/io.h everywhere, because it contains 
inb/outb. When they were macros that didn't matter: asm/io.h was just 
included where the macros were used. But as inlined functions they need 
asm/io.h all the time.

Juergen


* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-20 15:32 ` [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text Andi Kleen
  2007-07-20 15:59   ` Mathieu Desnoyers
@ 2007-07-23 14:46   ` Mathieu Desnoyers
  2007-07-23 15:01     ` Andi Kleen
  2007-07-24  6:06   ` Rusty Russell
  2 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-07-23 14:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jbeulich, jeremy, zach, patches, linux-kernel

Hi Andi,

I would recommend prefixing with __init every function that is assumed
"safe" because it is executed at boot time. It would minimize the risk
of buggy usage of these text-modification primitives. That includes
text_poke() and a big chunk of the apply-alternative code. (It should
never be executed on a live system, except at boot time, right?)

Since the constraints are completely different between boot time code
patching and live code patching, there might be some need for two
text_poke versions.

Mathieu

* Andi Kleen (ak@suse.de) wrote:
> 
> Reenable kprobes and alternative patching when the kernel text is write
> protected by DEBUG_RODATA
> Add a general utility function to change write protected text.
> The new function remaps the code using vmap to write it and takes
> care of CPU synchronization. It also does CLFLUSH to make icache recovery
> faster. 
> 
> There are some limitations on when the function can be used, see
> the comment.
> 
> This is a newer version that also changes the paravirt_ops code.
> text_poke also supports multi byte patching now.
> 
> Contains bug fixes from Zach Amsden and suggestions from
> Mathieu Desnoyers.
> 
> Cc: jbeulich@novell.com
> Cc: jeremy@goop.org
> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
> Cc: zach@vmware.com
> 
> Signed-off-by: Andi Kleen <ak@suse.de>
> 
> ---
>  arch/i386/kernel/alternative.c   |   33 +++++++++++++++++++++++++++++++--
>  arch/i386/kernel/kprobes.c       |    9 +++------
>  arch/i386/mm/init.c              |   14 +++-----------
>  arch/x86_64/kernel/kprobes.c     |   10 +++-------
>  arch/x86_64/mm/init.c            |   10 ----------
>  arch/x86_64/mm/pageattr.c        |    2 +-
>  include/asm-i386/alternative.h   |    2 ++
>  include/asm-x86_64/alternative.h |    2 ++
>  include/asm-x86_64/pgtable.h     |    2 ++
>  9 files changed, 47 insertions(+), 37 deletions(-)
> 
> Index: linux/arch/x86_64/kernel/kprobes.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/kprobes.c
> +++ linux/arch/x86_64/kernel/kprobes.c
> @@ -39,9 +39,9 @@
>  #include <linux/module.h>
>  #include <linux/kdebug.h>
>  
> -#include <asm/cacheflush.h>
>  #include <asm/pgtable.h>
>  #include <asm/uaccess.h>
> +#include <asm/alternative.h>
>  
>  void jprobe_return_end(void);
>  static void __kprobes arch_copy_kprobe(struct kprobe *p);
> @@ -209,16 +209,12 @@ static void __kprobes arch_copy_kprobe(s
>  
>  void __kprobes arch_arm_kprobe(struct kprobe *p)
>  {
> -	*p->addr = BREAKPOINT_INSTRUCTION;
> -	flush_icache_range((unsigned long) p->addr,
> -			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
>  }
>  
>  void __kprobes arch_disarm_kprobe(struct kprobe *p)
>  {
> -	*p->addr = p->opcode;
> -	flush_icache_range((unsigned long) p->addr,
> -			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	text_poke(p->addr, &p->opcode, 1);
>  }
>  
>  void __kprobes arch_remove_kprobe(struct kprobe *p)
> Index: linux/arch/i386/kernel/alternative.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/alternative.c
> +++ linux/arch/i386/kernel/alternative.c
> @@ -2,8 +2,12 @@
>  #include <linux/sched.h>
>  #include <linux/spinlock.h>
>  #include <linux/list.h>
> +#include <linux/kprobes.h>
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
>  #include <asm/alternative.h>
>  #include <asm/sections.h>
> +#include <asm/pgtable.h>
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  static int smp_alt_once;
> @@ -150,7 +154,7 @@ static void nop_out(void *insns, unsigne
>  		unsigned int noplen = len;
>  		if (noplen > ASM_NOP_MAX)
>  			noplen = ASM_NOP_MAX;
> -		memcpy(insns, noptable[noplen], noplen);
> +		text_poke(insns, noptable[noplen], noplen);
>  		insns += noplen;
>  		len -= noplen;
>  	}
> @@ -202,7 +206,7 @@ static void alternatives_smp_lock(u8 **s
>  			continue;
>  		if (*ptr > text_end)
>  			continue;
> -		**ptr = 0xf0; /* lock prefix */
> +		text_poke(*ptr, ((unsigned char []){0xf0}), 1); /* add lock prefix */
>  	};
>  }
>  
> @@ -360,10 +364,6 @@ void apply_paravirt(struct paravirt_patc
>  		/* Pad the rest with nops */
>  		nop_out(p->instr + used, p->len - used);
>  	}
> -
> -	/* Sync to be conservative, in case we patched following
> -	 * instructions */
> -	sync_core();
>  }
>  extern struct paravirt_patch_site __start_parainstructions[],
>  	__stop_parainstructions[];
> @@ -406,3 +406,30 @@ void __init alternative_instructions(voi
>   	apply_paravirt(__parainstructions, __parainstructions_end);
>  	local_irq_restore(flags);
>  }
> +
> +/*
> + * Warning: 
> + * When you use this code to patch more than one byte of an instruction
> + * you need to make sure that other CPUs cannot execute this code in parallel.
> + * And on the local CPU you need to be protected again NMI or MCE handlers
> + * seeing an inconsistent instruction while you patch.
> + */
> +void __kprobes text_poke(void *oaddr, unsigned char *opcode, int len)
> +{
> +        u8 *addr = oaddr;
> +	if (!pte_write(*lookup_address((unsigned long)addr))) {
> +		struct page *p[2] = { virt_to_page(addr), virt_to_page(addr+PAGE_SIZE) };
> +		addr = vmap(p, 2, VM_MAP, PAGE_KERNEL);
> +		if (!addr)
> +			return;
> +		addr += ((unsigned long)oaddr) % PAGE_SIZE;
> +	}
> +	memcpy(addr, opcode, len);
> +	sync_core();
> +	/* Not strictly needed, but can speed CPU recovery up. Ignore cross cacheline
> +	   case. */
> +	if (cpu_has_clflush)
> +		asm("clflush (%0) " :: "r" (oaddr) : "memory");
> +	if (addr != oaddr)
> +		vunmap(addr);
> +}
> Index: linux/arch/x86_64/mm/pageattr.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/pageattr.c
> +++ linux/arch/x86_64/mm/pageattr.c
> @@ -13,7 +13,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/io.h>
>  
> -static inline pte_t *lookup_address(unsigned long address) 
> +pte_t *lookup_address(unsigned long address)
>  { 
>  	pgd_t *pgd = pgd_offset_k(address);
>  	pud_t *pud;
> Index: linux/include/asm-i386/alternative.h
> ===================================================================
> --- linux.orig/include/asm-i386/alternative.h
> +++ linux/include/asm-i386/alternative.h
> @@ -149,4 +149,6 @@ apply_paravirt(struct paravirt_patch_sit
>  #define __parainstructions_end	NULL
>  #endif
>  
> +extern void text_poke(void *addr, unsigned char *opcode, int len);
> +
>  #endif /* _I386_ALTERNATIVE_H */
> Index: linux/include/asm-x86_64/alternative.h
> ===================================================================
> --- linux.orig/include/asm-x86_64/alternative.h
> +++ linux/include/asm-x86_64/alternative.h
> @@ -154,4 +154,6 @@ apply_paravirt(struct paravirt_patch *st
>  #define __parainstructions_end NULL
>  #endif
>  
> +extern void text_poke(void *addr, unsigned char *opcode, int len);
> +
>  #endif /* _X86_64_ALTERNATIVE_H */
> Index: linux/include/asm-x86_64/pgtable.h
> ===================================================================
> --- linux.orig/include/asm-x86_64/pgtable.h
> +++ linux/include/asm-x86_64/pgtable.h
> @@ -403,6 +403,8 @@ extern struct list_head pgd_list;
>  
>  extern int kern_addr_valid(unsigned long addr); 
>  
> +pte_t *lookup_address(unsigned long addr);
> +
>  #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
>  		remap_pfn_range(vma, vaddr, pfn, size, prot)
>  
> Index: linux/arch/i386/kernel/kprobes.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/kprobes.c
> +++ linux/arch/i386/kernel/kprobes.c
> @@ -35,6 +35,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/desc.h>
>  #include <asm/uaccess.h>
> +#include <asm/alternative.h>
>  
>  void jprobe_return_end(void);
>  
> @@ -169,16 +170,12 @@ int __kprobes arch_prepare_kprobe(struct
>  
>  void __kprobes arch_arm_kprobe(struct kprobe *p)
>  {
> -	*p->addr = BREAKPOINT_INSTRUCTION;
> -	flush_icache_range((unsigned long) p->addr,
> -			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	text_poke(p->addr, ((unsigned char []){BREAKPOINT_INSTRUCTION}), 1);
>  }
>  
>  void __kprobes arch_disarm_kprobe(struct kprobe *p)
>  {
> -	*p->addr = p->opcode;
> -	flush_icache_range((unsigned long) p->addr,
> -			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
> +	text_poke(p->addr, &p->opcode, 1);
>  }
>  
>  void __kprobes arch_remove_kprobe(struct kprobe *p)
> Index: linux/arch/i386/mm/init.c
> ===================================================================
> --- linux.orig/arch/i386/mm/init.c
> +++ linux/arch/i386/mm/init.c
> @@ -800,17 +800,9 @@ void mark_rodata_ro(void)
>  	unsigned long start = PFN_ALIGN(_text);
>  	unsigned long size = PFN_ALIGN(_etext) - start;
>  
> -#ifndef CONFIG_KPROBES
> -#ifdef CONFIG_HOTPLUG_CPU
> -	/* It must still be possible to apply SMP alternatives. */
> -	if (num_possible_cpus() <= 1)
> -#endif
> -	{
> -		change_page_attr(virt_to_page(start),
> -		                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
> -		printk("Write protecting the kernel text: %luk\n", size >> 10);
> -	}
> -#endif
> +	change_page_attr(virt_to_page(start),
> +	                 size >> PAGE_SHIFT, PAGE_KERNEL_RX);
> +	printk("Write protecting the kernel text: %luk\n", size >> 10);
>  	start += size;
>  	size = (unsigned long)__end_rodata - start;
>  	change_page_attr(virt_to_page(start),
> Index: linux/arch/x86_64/mm/init.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/init.c
> +++ linux/arch/x86_64/mm/init.c
> @@ -600,16 +600,6 @@ void mark_rodata_ro(void)
>  {
>  	unsigned long start = (unsigned long)_stext, end;
>  
> -#ifdef CONFIG_HOTPLUG_CPU
> -	/* It must still be possible to apply SMP alternatives. */
> -	if (num_possible_cpus() > 1)
> -		start = (unsigned long)_etext;
> -#endif
> -
> -#ifdef CONFIG_KPROBES
> -	start = (unsigned long)__start_rodata;
> -#endif
> -	
>  	end = (unsigned long)__end_rodata;
>  	start = (start + PAGE_SIZE - 1) & PAGE_MASK;
>  	end &= PAGE_MASK;
> Index: linux/arch/i386/kernel/paravirt.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/paravirt.c
> +++ linux/arch/i386/kernel/paravirt.c
> @@ -124,20 +124,28 @@ unsigned paravirt_patch_ignore(unsigned 
>  	return len;
>  }
>  
> +struct branch { 
> +	unsigned char opcode;
> +	u32 delta;
> +} __attribute__((packed)); 
> +
>  unsigned paravirt_patch_call(void *target, u16 tgt_clobbers,
>  			     void *site, u16 site_clobbers,
>  			     unsigned len)
>  {
>  	unsigned char *call = site;
>  	unsigned long delta = (unsigned long)target - (unsigned long)(call+5);
> +	struct branch b;
>  
>  	if (tgt_clobbers & ~site_clobbers)
>  		return len;	/* target would clobber too much for this site */
>  	if (len < 5)
>  		return len;	/* call too long for patch site */
>  
> -	*call++ = 0xe8;		/* call */
> -	*(unsigned long *)call = delta;
> +	b.opcode = 0xe8; /* call */
> +	b.delta = delta;
> +	BUILD_BUG_ON(sizeof(b) != 5);
> +	text_poke(call, (unsigned char *)&b, 5);
>  
>  	return 5;
>  }
> @@ -150,8 +158,9 @@ unsigned paravirt_patch_jmp(void *target
>  	if (len < 5)
>  		return len;	/* call too long for patch site */
>  
> -	*jmp++ = 0xe9;		/* jmp */
> -	*(unsigned long *)jmp = delta;
> +	b.opcode = 0xe9;	/* jmp */
> +	b.delta = delta;
> +	text_poke(jmp, (unsigned char *)&b, 5);
>  
>  	return 5;
>  }
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-23 14:46   ` Mathieu Desnoyers
@ 2007-07-23 15:01     ` Andi Kleen
  2007-07-23 15:25       ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2007-07-23 15:01 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: jbeulich, jeremy, zach, patches, linux-kernel

On Monday 23 July 2007 16:46:03 Mathieu Desnoyers wrote:

> I would recommend prefixing with __init every function that is assumed
> "safe" because it is executed only at boot time. It would minimize the
> risk of buggy usage of these text-modification primitives. That includes
> text_poke() and a big chunk of the apply-alternatives code. (It should
> never be executed on a live system, except at boot time, right?)

lock prefix patching can happen later on cpu hotplug/unplug event.

-Andi


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-23 15:01     ` Andi Kleen
@ 2007-07-23 15:25       ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-07-23 15:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jbeulich, jeremy, zach, patches, linux-kernel

* Andi Kleen (ak@suse.de) wrote:
> On Monday 23 July 2007 16:46:03 Mathieu Desnoyers wrote:
> 
> > I would recommend to prefix with __init every function that is assumed
> > "safe" because executed at boot time. It would minimize the risks of
> > buggy usage of these text-modification primitives. That includes
> > text_poke() and a big chunk of the apply alternative code. (it should
> > never be executed on a live system, except at boot time, right ?)
> 
> lock prefix patching can happen later on cpu hotplug/unplug event.
> 

There seems to be a separate set of functions for applying SMP lock
prefixes and the more "general" alternatives. I guess the "general"
alternatives and paravirt patching could be declared __init, and text_poke
users would have to use it responsibly. Such as:

__init apply_alternatives
__init apply_paravirt

non-init:
alternatives_smp_*

Mathieu

> -Andi
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-20 15:32 ` [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text Andi Kleen
  2007-07-20 15:59   ` Mathieu Desnoyers
  2007-07-23 14:46   ` Mathieu Desnoyers
@ 2007-07-24  6:06   ` Rusty Russell
  2007-07-24 13:57     ` Mathieu Desnoyers
  2 siblings, 1 reply; 25+ messages in thread
From: Rusty Russell @ 2007-07-24  6:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jbeulich, jeremy, compudj, zach, patches, linux-kernel

On Fri, 2007-07-20 at 17:32 +0200, Andi Kleen wrote:
> Reenable kprobes and alternative patching when the kernel text is write
> protected by DEBUG_RODATA
> Add a general utility function to change write protected text.
> The new function remaps the code using vmap to write it and takes
> care of CPU synchronization. It also does CLFLUSH to make icache recovery
> faster. 
> 
> There are some limitations on when the function can be used, see
> the comment.
> 
> This is a newer version that also changes the paravirt_ops code.
> text_poke also supports multi byte patching now.

Hmm, I wrote this code originally, and would have liked to catch this
change before it went in.

It's broken on i386 w/ paravirt_ops.

Problem: lookup_address() contains paravirt_ops.  So we call
paravirt_ops.patch which patches part of the instructions, then calls
nop_out to patch the end of the instructions.  This calls text_poke
which calls lookup_address, which is what we've half-patched.  Mainly,
we get lucky at the moment (I just hit this in lguest).

We could use a simpler nop_out at boot, or fix this a little more
robustly by making everyone do their patching via a single call to
text_poke.

Needs testing for VMI and Xen (lguest and native booted here):
===
Make patching more robust, fix paravirt issue

Commit 19d36ccdc34f5ed444f8a6af0cbfdb6790eb1177 "x86: Fix alternatives
and kprobes to remap write-protected kernel text" uses code which is
being patched for patching.

In particular, paravirt_ops does patching in two stages: first it
calls paravirt_ops.patch, then it fills any remaining instructions
with nop_out().  nop_out calls text_poke() which calls
lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
that call site is one of the places we patch.

If we always do patching as one single call to text_poke(), we only
need to make sure we're not patching the memcpy in text_poke itself.
This means the prototype to paravirt_ops.patch needs to change, to
marshal the new code into a buffer rather than patching in place as it
does now.  It also means all patching goes through text_poke(), which
is known to be safe (apply_alternatives is also changed to make a
single patch).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff -r b81206bbf749 arch/i386/kernel/alternative.c
--- a/arch/i386/kernel/alternative.c	Tue Jul 24 13:05:36 2007 +1000
+++ b/arch/i386/kernel/alternative.c	Tue Jul 24 14:50:54 2007 +1000
@@ -10,6 +10,8 @@
 #include <asm/pgtable.h>
 #include <asm/mce.h>
 #include <asm/nmi.h>
+
+#define MAX_PATCH_LEN 128
 
 #ifdef CONFIG_HOTPLUG_CPU
 static int smp_alt_once;
@@ -148,7 +150,8 @@ static unsigned char** find_nop_table(vo
 
 #endif /* CONFIG_X86_64 */
 
-static void nop_out(void *insns, unsigned int len)
+/* Use this to add nops to a buffer, then text_poke the whole buffer. */
+static void add_nops(void *insns, unsigned int len)
 {
 	unsigned char **noptable = find_nop_table();
 
@@ -156,7 +159,7 @@ static void nop_out(void *insns, unsigne
 		unsigned int noplen = len;
 		if (noplen > ASM_NOP_MAX)
 			noplen = ASM_NOP_MAX;
-		text_poke(insns, noptable[noplen], noplen);
+		memcpy(insns, noptable[noplen], noplen);
 		insns += noplen;
 		len -= noplen;
 	}
@@ -174,15 +177,14 @@ void apply_alternatives(struct alt_instr
 void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
 {
 	struct alt_instr *a;
-	u8 *instr;
-	int diff;
+	char insnbuf[MAX_PATCH_LEN];
 
 	DPRINTK("%s: alt table %p -> %p\n", __FUNCTION__, start, end);
 	for (a = start; a < end; a++) {
 		BUG_ON(a->replacementlen > a->instrlen);
+		BUG_ON(a->instrlen > sizeof(insnbuf));
 		if (!boot_cpu_has(a->cpuid))
 			continue;
-		instr = a->instr;
 #ifdef CONFIG_X86_64
 		/* vsyscall code is not mapped yet. resolve it manually. */
 		if (instr >= (u8 *)VSYSCALL_START && instr < (u8*)VSYSCALL_END) {
@@ -191,9 +193,10 @@ void apply_alternatives(struct alt_instr
 				__FUNCTION__, a->instr, instr);
 		}
 #endif
-		memcpy(instr, a->replacement, a->replacementlen);
-		diff = a->instrlen - a->replacementlen;
-		nop_out(instr + a->replacementlen, diff);
+		memcpy(insnbuf, a->replacement, a->replacementlen);
+		add_nops(insnbuf + a->replacementlen,
+			 a->instrlen - a->replacementlen);
+		text_poke(a->instr, insnbuf, a->instrlen);
 	}
 }
 
@@ -215,16 +218,18 @@ static void alternatives_smp_unlock(u8 *
 static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
 {
 	u8 **ptr;
+	char insn[1];
 
 	if (noreplace_smp)
 		return;
 
+	add_nops(insn, 1);
 	for (ptr = start; ptr < end; ptr++) {
 		if (*ptr < text)
 			continue;
 		if (*ptr > text_end)
 			continue;
-		nop_out(*ptr, 1);
+		text_poke(*ptr, insn, 1);
 	};
 }
 
@@ -351,6 +356,7 @@ void apply_paravirt(struct paravirt_patc
 		    struct paravirt_patch_site *end)
 {
 	struct paravirt_patch_site *p;
+	char insnbuf[MAX_PATCH_LEN];
 
 	if (noreplace_paravirt)
 		return;
@@ -358,13 +364,15 @@ void apply_paravirt(struct paravirt_patc
 	for (p = start; p < end; p++) {
 		unsigned int used;
 
-		used = paravirt_ops.patch(p->instrtype, p->clobbers, p->instr,
-					  p->len);
+		BUG_ON(p->len > MAX_PATCH_LEN);
+		used = paravirt_ops.patch(p->instrtype, p->clobbers, insnbuf,
+					  (unsigned long)p->instr, p->len);
 
 		BUG_ON(used > p->len);
 
 		/* Pad the rest with nops */
-		nop_out(p->instr + used, p->len - used);
+		add_nops(insnbuf + used, p->len - used);
+		text_poke(p->instr, insnbuf, p->len);
 	}
 }
 extern struct paravirt_patch_site __start_parainstructions[],
diff -r b81206bbf749 arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c	Tue Jul 24 13:05:36 2007 +1000
+++ b/arch/i386/kernel/paravirt.c	Tue Jul 24 15:13:32 2007 +1000
@@ -69,7 +69,8 @@ DEF_NATIVE(read_tsc, "rdtsc");
 
 DEF_NATIVE(ud2a, "ud2a");
 
-static unsigned native_patch(u8 type, u16 clobbers, void *insns, unsigned len)
+static unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
+			     unsigned long addr, unsigned len)
 {
 	const unsigned char *start, *end;
 	unsigned ret;
@@ -90,7 +91,7 @@ static unsigned native_patch(u8 type, u1
 #undef SITE
 
 	patch_site:
-		ret = paravirt_patch_insns(insns, len, start, end);
+		ret = paravirt_patch_insns(ibuf, len, start, end);
 		break;
 
 	case PARAVIRT_PATCH(make_pgd):
@@ -107,7 +108,7 @@ static unsigned native_patch(u8 type, u1
 		break;
 
 	default:
-		ret = paravirt_patch_default(type, clobbers, insns, len);
+		ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
 		break;
 	}
 
@@ -129,68 +130,67 @@ struct branch {
 	u32 delta;
 } __attribute__((packed));
 
-unsigned paravirt_patch_call(void *target, u16 tgt_clobbers,
-			     void *site, u16 site_clobbers,
+unsigned paravirt_patch_call(void *insnbuf,
+			     const void *target, u16 tgt_clobbers,
+			     unsigned long addr, u16 site_clobbers,
 			     unsigned len)
 {
-	unsigned char *call = site;
-	unsigned long delta = (unsigned long)target - (unsigned long)(call+5);
-	struct branch b;
+	struct branch *b = insnbuf;
+	unsigned long delta = (unsigned long)target - (addr+5);
 
 	if (tgt_clobbers & ~site_clobbers)
 		return len;	/* target would clobber too much for this site */
 	if (len < 5)
 		return len;	/* call too long for patch site */
 
-	b.opcode = 0xe8; /* call */
-	b.delta = delta;
-	BUILD_BUG_ON(sizeof(b) != 5);
-	text_poke(call, (unsigned char *)&b, 5);
+	b->opcode = 0xe8; /* call */
+	b->delta = delta;
+	BUILD_BUG_ON(sizeof(*b) != 5);
 
 	return 5;
 }
 
-unsigned paravirt_patch_jmp(void *target, void *site, unsigned len)
-{
-	unsigned char *jmp = site;
-	unsigned long delta = (unsigned long)target - (unsigned long)(jmp+5);
-	struct branch b;
+unsigned paravirt_patch_jmp(const void *target, void *insnbuf,
+			    unsigned long addr, unsigned len)
+{
+	struct branch *b = insnbuf;
+	unsigned long delta = (unsigned long)target - (addr+5);
 
 	if (len < 5)
 		return len;	/* call too long for patch site */
 
-	b.opcode = 0xe9;	/* jmp */
-	b.delta = delta;
-	text_poke(jmp, (unsigned char *)&b, 5);
+	b->opcode = 0xe9;	/* jmp */
+	b->delta = delta;
 
 	return 5;
 }
 
-unsigned paravirt_patch_default(u8 type, u16 clobbers, void *site, unsigned len)
+unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
+				unsigned long addr, unsigned len)
 {
 	void *opfunc = *((void **)&paravirt_ops + type);
 	unsigned ret;
 
 	if (opfunc == NULL)
 		/* If there's no function, patch it with a ud2a (BUG) */
-		ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a);
+		ret = paravirt_patch_insns(insnbuf, len, start_ud2a, end_ud2a);
 	else if (opfunc == paravirt_nop)
 		/* If the operation is a nop, then nop the callsite */
 		ret = paravirt_patch_nop();
 	else if (type == PARAVIRT_PATCH(iret) ||
 		 type == PARAVIRT_PATCH(irq_enable_sysexit))
 		/* If operation requires a jmp, then jmp */
-		ret = paravirt_patch_jmp(opfunc, site, len);
+		ret = paravirt_patch_jmp(opfunc, insnbuf, addr, len);
 	else
 		/* Otherwise call the function; assume target could
 		   clobber any caller-save reg */
-		ret = paravirt_patch_call(opfunc, CLBR_ANY,
-					  site, clobbers, len);
+		ret = paravirt_patch_call(insnbuf, opfunc, CLBR_ANY,
+					  addr, clobbers, len);
 
 	return ret;
 }
 
-unsigned paravirt_patch_insns(void *site, unsigned len,
+unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
 			      const char *start, const char *end)
 {
 	unsigned insn_len = end - start;
@@ -198,7 +198,7 @@ unsigned paravirt_patch_insns(void *site
 	if (insn_len > len || start == NULL)
 		insn_len = len;
 	else
-		memcpy(site, start, insn_len);
+		memcpy(insnbuf, start, insn_len);
 
 	return insn_len;
 }
diff -r b81206bbf749 arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.c	Tue Jul 24 13:05:36 2007 +1000
+++ b/arch/i386/kernel/vmi.c	Tue Jul 24 15:12:42 2007 +1000
@@ -87,12 +87,14 @@ struct vmi_timer_ops vmi_timer_ops;
 #define IRQ_PATCH_INT_MASK 0
 #define IRQ_PATCH_DISABLE  5
 
-static inline void patch_offset(unsigned char *eip, unsigned char *dest)
-{
-        *(unsigned long *)(eip+1) = dest-eip-5;
-}
-
-static unsigned patch_internal(int call, unsigned len, void *insns)
+static inline void patch_offset(void *insnbuf,
+				unsigned long eip, unsigned long dest)
+{
+        *(unsigned long *)(insnbuf+1) = dest-eip-5;
+}
+
+static unsigned patch_internal(int call, unsigned len, void *insnbuf,
+			       unsigned long eip)
 {
 	u64 reloc;
 	struct vmi_relocation_info *const rel = (struct vmi_relocation_info *)&reloc;
@@ -100,14 +102,14 @@ static unsigned patch_internal(int call,
 	switch(rel->type) {
 		case VMI_RELOCATION_CALL_REL:
 			BUG_ON(len < 5);
-			*(char *)insns = MNEM_CALL;
-			patch_offset(insns, rel->eip);
+			*(char *)insnbuf = MNEM_CALL;
+			patch_offset(insnbuf, eip, (unsigned long)rel->eip);
 			return 5;
 
 		case VMI_RELOCATION_JUMP_REL:
 			BUG_ON(len < 5);
-			*(char *)insns = MNEM_JMP;
-			patch_offset(insns, rel->eip);
+			*(char *)insnbuf = MNEM_JMP;
+			patch_offset(insnbuf, eip, (unsigned long)rel->eip);
 			return 5;
 
 		case VMI_RELOCATION_NOP:
@@ -128,21 +130,26 @@ static unsigned patch_internal(int call,
  * Apply patch if appropriate, return length of new instruction
  * sequence.  The callee does nop padding for us.
  */
-static unsigned vmi_patch(u8 type, u16 clobbers, void *insns, unsigned len)
+static unsigned vmi_patch(u8 type, u16 clobbers, void *insns,
+			  unsigned long eip, unsigned len)
 {
 	switch (type) {
 		case PARAVIRT_PATCH(irq_disable):
-			return patch_internal(VMI_CALL_DisableInterrupts, len, insns);
+			return patch_internal(VMI_CALL_DisableInterrupts, len,
+					      insns, eip);
 		case PARAVIRT_PATCH(irq_enable):
-			return patch_internal(VMI_CALL_EnableInterrupts, len, insns);
+			return patch_internal(VMI_CALL_EnableInterrupts, len,
+					      insns, eip);
 		case PARAVIRT_PATCH(restore_fl):
-			return patch_internal(VMI_CALL_SetInterruptMask, len, insns);
+			return patch_internal(VMI_CALL_SetInterruptMask, len,
+					      insns, eip);
 		case PARAVIRT_PATCH(save_fl):
-			return patch_internal(VMI_CALL_GetInterruptMask, len, insns);
+			return patch_internal(VMI_CALL_GetInterruptMask, len,
+					      insns, eip);
 		case PARAVIRT_PATCH(iret):
-			return patch_internal(VMI_CALL_IRET, len, insns);
+			return patch_internal(VMI_CALL_IRET, len, insns, eip);
 		case PARAVIRT_PATCH(irq_enable_sysexit):
-			return patch_internal(VMI_CALL_SYSEXIT, len, insns);
+			return patch_internal(VMI_CALL_SYSEXIT, len, insns, eip);
 		default:
 			break;
 	}
diff -r b81206bbf749 arch/i386/xen/enlighten.c
--- a/arch/i386/xen/enlighten.c	Tue Jul 24 13:05:36 2007 +1000
+++ b/arch/i386/xen/enlighten.c	Tue Jul 24 15:10:37 2007 +1000
@@ -842,7 +842,8 @@ void __init xen_setup_vcpu_info_placemen
 	}
 }
 
-static unsigned xen_patch(u8 type, u16 clobbers, void *insns, unsigned len)
+static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf,
+			  unsigned long addr, unsigned len)
 {
 	char *start, *end, *reloc;
 	unsigned ret;
@@ -869,7 +870,7 @@ static unsigned xen_patch(u8 type, u16 c
 		if (start == NULL || (end-start) > len)
 			goto default_patch;
 
-		ret = paravirt_patch_insns(insns, len, start, end);
+		ret = paravirt_patch_insns(insnbuf, len, start, end);
 
 		/* Note: because reloc is assigned from something that
 		   appears to be an array, gcc assumes it's non-null,
@@ -877,8 +878,8 @@ static unsigned xen_patch(u8 type, u16 c
 		   end. */
 		if (reloc > start && reloc < end) {
 			int reloc_off = reloc - start;
-			long *relocp = (long *)(insns + reloc_off);
-			long delta = start - (char *)insns;
+			long *relocp = (long *)(insnbuf + reloc_off);
+			long delta = start - (char *)addr;
 
 			*relocp += delta;
 		}
@@ -886,7 +887,8 @@ static unsigned xen_patch(u8 type, u16 c
 
 	default_patch:
 	default:
-		ret = paravirt_patch_default(type, clobbers, insns, len);
+		ret = paravirt_patch_default(type, clobbers, insnbuf,
+					     addr, len);
 		break;
 	}
 
diff -r b81206bbf749 drivers/lguest/lguest.c
--- a/drivers/lguest/lguest.c	Tue Jul 24 13:05:36 2007 +1000
+++ b/drivers/lguest/lguest.c	Tue Jul 24 15:08:20 2007 +1000
@@ -521,21 +521,22 @@ static const struct lguest_insns
 	[PARAVIRT_PATCH(restore_fl)] = { lgstart_popf, lgend_popf },
 	[PARAVIRT_PATCH(save_fl)] = { lgstart_pushf, lgend_pushf },
 };
-static unsigned lguest_patch(u8 type, u16 clobber, void *insns, unsigned len)
+static unsigned lguest_patch(u8 type, u16 clobber, void *ibuf,
+			     unsigned long addr, unsigned len)
 {
 	unsigned int insn_len;
 
 	/* Don't touch it if we don't have a replacement */
 	if (type >= ARRAY_SIZE(lguest_insns) || !lguest_insns[type].start)
-		return paravirt_patch_default(type, clobber, insns, len);
+		return paravirt_patch_default(type, clobber, ibuf, addr, len);
 
 	insn_len = lguest_insns[type].end - lguest_insns[type].start;
 
 	/* Similarly if we can't fit replacement. */
 	if (len < insn_len)
-		return paravirt_patch_default(type, clobber, insns, len);
-
-	memcpy(insns, lguest_insns[type].start, insn_len);
+		return paravirt_patch_default(type, clobber, ibuf, addr, len);
+
+	memcpy(ibuf, lguest_insns[type].start, insn_len);
 	return insn_len;
 }
 
diff -r b81206bbf749 include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h	Tue Jul 24 13:05:36 2007 +1000
+++ b/include/asm-i386/paravirt.h	Tue Jul 24 14:57:44 2007 +1000
@@ -47,7 +47,8 @@ struct paravirt_ops
 	 * The patch function should return the number of bytes of code
 	 * generated, as we nop pad the rest in generic code.
 	 */
-	unsigned (*patch)(u8 type, u16 clobber, void *firstinsn, unsigned len);
+	unsigned (*patch)(u8 type, u16 clobber, void *insnbuf,
+			  unsigned long addr, unsigned len);
 
 	/* Basic arch-specific setup */
 	void (*arch_setup)(void);
@@ -253,13 +254,16 @@ extern struct paravirt_ops paravirt_ops;
 
 unsigned paravirt_patch_nop(void);
 unsigned paravirt_patch_ignore(unsigned len);
-unsigned paravirt_patch_call(void *target, u16 tgt_clobbers,
-			     void *site, u16 site_clobbers,
+unsigned paravirt_patch_call(void *insnbuf,
+			     const void *target, u16 tgt_clobbers,
+			     unsigned long addr, u16 site_clobbers,
 			     unsigned len);
-unsigned paravirt_patch_jmp(void *target, void *site, unsigned len);
-unsigned paravirt_patch_default(u8 type, u16 clobbers, void *site, unsigned len);
-
-unsigned paravirt_patch_insns(void *site, unsigned len,
+unsigned paravirt_patch_jmp(const void *target, void *insnbuf,
+			    unsigned long addr, unsigned len);
+unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
+				unsigned long addr, unsigned len);
+
+unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
 			      const char *start, const char *end);
 
 int paravirt_disable_iospace(void);




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-24  6:06   ` Rusty Russell
@ 2007-07-24 13:57     ` Mathieu Desnoyers
  2007-07-24 15:17       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-07-24 13:57 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Andi Kleen, jbeulich, jeremy, zach, patches, linux-kernel

Hi Rusty,

Good catch. I also still wonder why the alternative (except alternative-smp,
which is OK) and paravirt patching functions are not marked __init. I really
hope they are never used while NMIs or MCEs are enabled, or while threads may
have been preempted inside the site being patched, or it could result in an
illegal instruction.

Mathieu

* Rusty Russell (rusty@rustcorp.com.au) wrote:
> On Fri, 2007-07-20 at 17:32 +0200, Andi Kleen wrote:
> > Reenable kprobes and alternative patching when the kernel text is write
> > protected by DEBUG_RODATA
> > Add a general utility function to change write protected text.
> > The new function remaps the code using vmap to write it and takes
> > care of CPU synchronization. It also does CLFLUSH to make icache recovery
> > faster. 
> > 
> > There are some limitations on when the function can be used, see
> > the comment.
> > 
> > This is a newer version that also changes the paravirt_ops code.
> > text_poke also supports multi byte patching now.
> 
> Hmm, I wrote this code originally, and would have liked to catch this
> change before it went in.
> 
> It's broken on i386 w/ paravirt_ops.
> 
> Problem: lookup_address() contains paravirt_ops.  So we call
> paravirt_ops.patch which patches part of the instructions, then calls
> nop_out to patch the end of the instructions.  This calls text_poke
> which calls lookup_address, which is what we've half-patched.  Mainly,
> we get lucky at the moment (I just hit this in lguest).
> 
> We could use a simpler nop_out at boot, or fix this a little more
> robustly by making everyone do their patching via a single call to
> text_poke.
> 
> Needs testing for VMI and Xen (lguest and native booted here):
> ===
> Make patching more robust, fix paravirt issue
> 
> Commit 19d36ccdc34f5ed444f8a6af0cbfdb6790eb1177 "x86: Fix alternatives
> and kprobes to remap write-protected kernel text" uses code which is
> being patched for patching.
> 
> In particular, paravirt_ops does patching in two stages: first it
> calls paravirt_ops.patch, then it fills any remaining instructions
> with nop_out().  nop_out calls text_poke() which calls
> lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
> that call site is one of the places we patch.
> 
> If we always do patching as one single call to text_poke(), we only
> need to make sure we're not patching the memcpy in text_poke itself.
> This means the prototype to paravirt_ops.patch needs to change, to
> marshal the new code into a buffer rather than patching in place as it
> does now.  It also means all patching goes through text_poke(), which
> is known to be safe (apply_alternatives is also changed to make a
> single patch).
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff -r b81206bbf749 arch/i386/kernel/alternative.c
> --- a/arch/i386/kernel/alternative.c	Tue Jul 24 13:05:36 2007 +1000
> +++ b/arch/i386/kernel/alternative.c	Tue Jul 24 14:50:54 2007 +1000
> @@ -10,6 +10,8 @@
>  #include <asm/pgtable.h>
>  #include <asm/mce.h>
>  #include <asm/nmi.h>
> +
> +#define MAX_PATCH_LEN 128
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  static int smp_alt_once;
> @@ -148,7 +150,8 @@ static unsigned char** find_nop_table(vo
>  
>  #endif /* CONFIG_X86_64 */
>  
> -static void nop_out(void *insns, unsigned int len)
> +/* Use this to add nops to a buffer, then text_poke the whole buffer. */
> +static void add_nops(void *insns, unsigned int len)
>  {
>  	unsigned char **noptable = find_nop_table();
>  
> @@ -156,7 +159,7 @@ static void nop_out(void *insns, unsigne
>  		unsigned int noplen = len;
>  		if (noplen > ASM_NOP_MAX)
>  			noplen = ASM_NOP_MAX;
> -		text_poke(insns, noptable[noplen], noplen);
> +		memcpy(insns, noptable[noplen], noplen);
>  		insns += noplen;
>  		len -= noplen;
>  	}
> @@ -174,15 +177,14 @@ void apply_alternatives(struct alt_instr
>  void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
>  {
>  	struct alt_instr *a;
> -	u8 *instr;
> -	int diff;
> +	char insnbuf[MAX_PATCH_LEN];
>  
>  	DPRINTK("%s: alt table %p -> %p\n", __FUNCTION__, start, end);
>  	for (a = start; a < end; a++) {
>  		BUG_ON(a->replacementlen > a->instrlen);
> +		BUG_ON(a->instrlen > sizeof(insnbuf));
>  		if (!boot_cpu_has(a->cpuid))
>  			continue;
> -		instr = a->instr;
>  #ifdef CONFIG_X86_64
>  		/* vsyscall code is not mapped yet. resolve it manually. */
>  		if (instr >= (u8 *)VSYSCALL_START && instr < (u8*)VSYSCALL_END) {
> @@ -191,9 +193,10 @@ void apply_alternatives(struct alt_instr
>  				__FUNCTION__, a->instr, instr);
>  		}
>  #endif
> -		memcpy(instr, a->replacement, a->replacementlen);
> -		diff = a->instrlen - a->replacementlen;
> -		nop_out(instr + a->replacementlen, diff);
> +		memcpy(insnbuf, a->replacement, a->replacementlen);
> +		add_nops(insnbuf + a->replacementlen,
> +			 a->instrlen - a->replacementlen);
> +		text_poke(a->instr, insnbuf, a->instrlen);
>  	}
>  }
>  
> @@ -215,16 +218,18 @@ static void alternatives_smp_unlock(u8 *
>  static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
>  {
>  	u8 **ptr;
> +	char insn[1];
>  
>  	if (noreplace_smp)
>  		return;
>  
> +	add_nops(insn, 1);
>  	for (ptr = start; ptr < end; ptr++) {
>  		if (*ptr < text)
>  			continue;
>  		if (*ptr > text_end)
>  			continue;
> -		nop_out(*ptr, 1);
> +		text_poke(*ptr, insn, 1);
>  	};
>  }
>  
> @@ -351,6 +356,7 @@ void apply_paravirt(struct paravirt_patc
>  		    struct paravirt_patch_site *end)
>  {
>  	struct paravirt_patch_site *p;
> +	char insnbuf[MAX_PATCH_LEN];
>  
>  	if (noreplace_paravirt)
>  		return;
> @@ -358,13 +364,15 @@ void apply_paravirt(struct paravirt_patc
>  	for (p = start; p < end; p++) {
>  		unsigned int used;
>  
> -		used = paravirt_ops.patch(p->instrtype, p->clobbers, p->instr,
> -					  p->len);
> +		BUG_ON(p->len > MAX_PATCH_LEN);
> +		used = paravirt_ops.patch(p->instrtype, p->clobbers, insnbuf,
> +					  (unsigned long)p->instr, p->len);
>  
>  		BUG_ON(used > p->len);
>  
>  		/* Pad the rest with nops */
> -		nop_out(p->instr + used, p->len - used);
> +		add_nops(insnbuf + used, p->len - used);
> +		text_poke(p->instr, insnbuf, p->len);
>  	}
>  }
>  extern struct paravirt_patch_site __start_parainstructions[],
> diff -r b81206bbf749 arch/i386/kernel/paravirt.c
> --- a/arch/i386/kernel/paravirt.c	Tue Jul 24 13:05:36 2007 +1000
> +++ b/arch/i386/kernel/paravirt.c	Tue Jul 24 15:13:32 2007 +1000
> @@ -69,7 +69,8 @@ DEF_NATIVE(read_tsc, "rdtsc");
>  
>  DEF_NATIVE(ud2a, "ud2a");
>  
> -static unsigned native_patch(u8 type, u16 clobbers, void *insns, unsigned len)
> +static unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
> +			     unsigned long addr, unsigned len)
>  {
>  	const unsigned char *start, *end;
>  	unsigned ret;
> @@ -90,7 +91,7 @@ static unsigned native_patch(u8 type, u1
>  #undef SITE
>  
>  	patch_site:
> -		ret = paravirt_patch_insns(insns, len, start, end);
> +		ret = paravirt_patch_insns(ibuf, len, start, end);
>  		break;
>  
>  	case PARAVIRT_PATCH(make_pgd):
> @@ -107,7 +108,7 @@ static unsigned native_patch(u8 type, u1
>  		break;
>  
>  	default:
> -		ret = paravirt_patch_default(type, clobbers, insns, len);
> +		ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
>  		break;
>  	}
>  
> @@ -129,68 +130,67 @@ struct branch {
>  	u32 delta;
>  } __attribute__((packed));
>  
> -unsigned paravirt_patch_call(void *target, u16 tgt_clobbers,
> -			     void *site, u16 site_clobbers,
> +unsigned paravirt_patch_call(void *insnbuf,
> +			     const void *target, u16 tgt_clobbers,
> +			     unsigned long addr, u16 site_clobbers,
>  			     unsigned len)
>  {
> -	unsigned char *call = site;
> -	unsigned long delta = (unsigned long)target - (unsigned long)(call+5);
> -	struct branch b;
> +	struct branch *b = insnbuf;
> +	unsigned long delta = (unsigned long)target - (addr+5);
>  
>  	if (tgt_clobbers & ~site_clobbers)
>  		return len;	/* target would clobber too much for this site */
>  	if (len < 5)
>  		return len;	/* call too long for patch site */
>  
> -	b.opcode = 0xe8; /* call */
> -	b.delta = delta;
> -	BUILD_BUG_ON(sizeof(b) != 5);
> -	text_poke(call, (unsigned char *)&b, 5);
> +	b->opcode = 0xe8; /* call */
> +	b->delta = delta;
> +	BUILD_BUG_ON(sizeof(*b) != 5);
>  
>  	return 5;
>  }
>  
> -unsigned paravirt_patch_jmp(void *target, void *site, unsigned len)
> -{
> -	unsigned char *jmp = site;
> -	unsigned long delta = (unsigned long)target - (unsigned long)(jmp+5);
> -	struct branch b;
> +unsigned paravirt_patch_jmp(const void *target, void *insnbuf,
> +			    unsigned long addr, unsigned len)
> +{
> +	struct branch *b = insnbuf;
> +	unsigned long delta = (unsigned long)target - (addr+5);
>  
>  	if (len < 5)
>  		return len;	/* call too long for patch site */
>  
> -	b.opcode = 0xe9;	/* jmp */
> -	b.delta = delta;
> -	text_poke(jmp, (unsigned char *)&b, 5);
> +	b->opcode = 0xe9;	/* jmp */
> +	b->delta = delta;
>  
>  	return 5;
>  }
>  
> -unsigned paravirt_patch_default(u8 type, u16 clobbers, void *site, unsigned len)
> +unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
> +				unsigned long addr, unsigned len)
>  {
>  	void *opfunc = *((void **)&paravirt_ops + type);
>  	unsigned ret;
>  
>  	if (opfunc == NULL)
>  		/* If there's no function, patch it with a ud2a (BUG) */
> -		ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a);
> +		ret = paravirt_patch_insns(insnbuf, len, start_ud2a, end_ud2a);
>  	else if (opfunc == paravirt_nop)
>  		/* If the operation is a nop, then nop the callsite */
>  		ret = paravirt_patch_nop();
>  	else if (type == PARAVIRT_PATCH(iret) ||
>  		 type == PARAVIRT_PATCH(irq_enable_sysexit))
>  		/* If operation requires a jmp, then jmp */
> -		ret = paravirt_patch_jmp(opfunc, site, len);
> +		ret = paravirt_patch_jmp(opfunc, insnbuf, addr, len);
>  	else
>  		/* Otherwise call the function; assume target could
>  		   clobber any caller-save reg */
> -		ret = paravirt_patch_call(opfunc, CLBR_ANY,
> -					  site, clobbers, len);
> +		ret = paravirt_patch_call(insnbuf, opfunc, CLBR_ANY,
> +					  addr, clobbers, len);
>  
>  	return ret;
>  }
>  
> -unsigned paravirt_patch_insns(void *site, unsigned len,
> +unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
>  			      const char *start, const char *end)
>  {
>  	unsigned insn_len = end - start;
> @@ -198,7 +198,7 @@ unsigned paravirt_patch_insns(void *site
>  	if (insn_len > len || start == NULL)
>  		insn_len = len;
>  	else
> -		memcpy(site, start, insn_len);
> +		memcpy(insnbuf, start, insn_len);
>  
>  	return insn_len;
>  }
> diff -r b81206bbf749 arch/i386/kernel/vmi.c
> --- a/arch/i386/kernel/vmi.c	Tue Jul 24 13:05:36 2007 +1000
> +++ b/arch/i386/kernel/vmi.c	Tue Jul 24 15:12:42 2007 +1000
> @@ -87,12 +87,14 @@ struct vmi_timer_ops vmi_timer_ops;
>  #define IRQ_PATCH_INT_MASK 0
>  #define IRQ_PATCH_DISABLE  5
>  
> -static inline void patch_offset(unsigned char *eip, unsigned char *dest)
> -{
> -        *(unsigned long *)(eip+1) = dest-eip-5;
> -}
> -
> -static unsigned patch_internal(int call, unsigned len, void *insns)
> +static inline void patch_offset(void *insnbuf,
> +				unsigned long eip, unsigned long dest)
> +{
> +        *(unsigned long *)(insnbuf+1) = dest-eip-5;
> +}
> +
> +static unsigned patch_internal(int call, unsigned len, void *insnbuf,
> +			       unsigned long eip)
>  {
>  	u64 reloc;
>  	struct vmi_relocation_info *const rel = (struct vmi_relocation_info *)&reloc;
> @@ -100,14 +102,14 @@ static unsigned patch_internal(int call,
>  	switch(rel->type) {
>  		case VMI_RELOCATION_CALL_REL:
>  			BUG_ON(len < 5);
> -			*(char *)insns = MNEM_CALL;
> -			patch_offset(insns, rel->eip);
> +			*(char *)insnbuf = MNEM_CALL;
> +			patch_offset(insnbuf, eip, (unsigned long)rel->eip);
>  			return 5;
>  
>  		case VMI_RELOCATION_JUMP_REL:
>  			BUG_ON(len < 5);
> -			*(char *)insns = MNEM_JMP;
> -			patch_offset(insns, rel->eip);
> +			*(char *)insnbuf = MNEM_JMP;
> +			patch_offset(insnbuf, eip, (unsigned long)rel->eip);
>  			return 5;
>  
>  		case VMI_RELOCATION_NOP:
> @@ -128,21 +130,26 @@ static unsigned patch_internal(int call,
>   * Apply patch if appropriate, return length of new instruction
>   * sequence.  The callee does nop padding for us.
>   */
> -static unsigned vmi_patch(u8 type, u16 clobbers, void *insns, unsigned len)
> +static unsigned vmi_patch(u8 type, u16 clobbers, void *insns,
> +			  unsigned long eip, unsigned len)
>  {
>  	switch (type) {
>  		case PARAVIRT_PATCH(irq_disable):
> -			return patch_internal(VMI_CALL_DisableInterrupts, len, insns);
> +			return patch_internal(VMI_CALL_DisableInterrupts, len,
> +					      insns, eip);
>  		case PARAVIRT_PATCH(irq_enable):
> -			return patch_internal(VMI_CALL_EnableInterrupts, len, insns);
> +			return patch_internal(VMI_CALL_EnableInterrupts, len,
> +					      insns, eip);
>  		case PARAVIRT_PATCH(restore_fl):
> -			return patch_internal(VMI_CALL_SetInterruptMask, len, insns);
> +			return patch_internal(VMI_CALL_SetInterruptMask, len,
> +					      insns, eip);
>  		case PARAVIRT_PATCH(save_fl):
> -			return patch_internal(VMI_CALL_GetInterruptMask, len, insns);
> +			return patch_internal(VMI_CALL_GetInterruptMask, len,
> +					      insns, eip);
>  		case PARAVIRT_PATCH(iret):
> -			return patch_internal(VMI_CALL_IRET, len, insns);
> +			return patch_internal(VMI_CALL_IRET, len, insns, eip);
>  		case PARAVIRT_PATCH(irq_enable_sysexit):
> -			return patch_internal(VMI_CALL_SYSEXIT, len, insns);
> +			return patch_internal(VMI_CALL_SYSEXIT, len, insns, eip);
>  		default:
>  			break;
>  	}
> diff -r b81206bbf749 arch/i386/xen/enlighten.c
> --- a/arch/i386/xen/enlighten.c	Tue Jul 24 13:05:36 2007 +1000
> +++ b/arch/i386/xen/enlighten.c	Tue Jul 24 15:10:37 2007 +1000
> @@ -842,7 +842,8 @@ void __init xen_setup_vcpu_info_placemen
>  	}
>  }
>  
> -static unsigned xen_patch(u8 type, u16 clobbers, void *insns, unsigned len)
> +static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf,
> +			  unsigned long addr, unsigned len)
>  {
>  	char *start, *end, *reloc;
>  	unsigned ret;
> @@ -869,7 +870,7 @@ static unsigned xen_patch(u8 type, u16 c
>  		if (start == NULL || (end-start) > len)
>  			goto default_patch;
>  
> -		ret = paravirt_patch_insns(insns, len, start, end);
> +		ret = paravirt_patch_insns(insnbuf, len, start, end);
>  
>  		/* Note: because reloc is assigned from something that
>  		   appears to be an array, gcc assumes it's non-null,
> @@ -877,8 +878,8 @@ static unsigned xen_patch(u8 type, u16 c
>  		   end. */
>  		if (reloc > start && reloc < end) {
>  			int reloc_off = reloc - start;
> -			long *relocp = (long *)(insns + reloc_off);
> -			long delta = start - (char *)insns;
> +			long *relocp = (long *)(insnbuf + reloc_off);
> +			long delta = start - (char *)addr;
>  
>  			*relocp += delta;
>  		}
> @@ -886,7 +887,8 @@ static unsigned xen_patch(u8 type, u16 c
>  
>  	default_patch:
>  	default:
> -		ret = paravirt_patch_default(type, clobbers, insns, len);
> +		ret = paravirt_patch_default(type, clobbers, insnbuf,
> +					     addr, len);
>  		break;
>  	}
>  
> diff -r b81206bbf749 drivers/lguest/lguest.c
> --- a/drivers/lguest/lguest.c	Tue Jul 24 13:05:36 2007 +1000
> +++ b/drivers/lguest/lguest.c	Tue Jul 24 15:08:20 2007 +1000
> @@ -521,21 +521,22 @@ static const struct lguest_insns
>  	[PARAVIRT_PATCH(restore_fl)] = { lgstart_popf, lgend_popf },
>  	[PARAVIRT_PATCH(save_fl)] = { lgstart_pushf, lgend_pushf },
>  };
> -static unsigned lguest_patch(u8 type, u16 clobber, void *insns, unsigned len)
> +static unsigned lguest_patch(u8 type, u16 clobber, void *ibuf,
> +			     unsigned long addr, unsigned len)
>  {
>  	unsigned int insn_len;
>  
>  	/* Don't touch it if we don't have a replacement */
>  	if (type >= ARRAY_SIZE(lguest_insns) || !lguest_insns[type].start)
> -		return paravirt_patch_default(type, clobber, insns, len);
> +		return paravirt_patch_default(type, clobber, ibuf, addr, len);
>  
>  	insn_len = lguest_insns[type].end - lguest_insns[type].start;
>  
>  	/* Similarly if we can't fit replacement. */
>  	if (len < insn_len)
> -		return paravirt_patch_default(type, clobber, insns, len);
> -
> -	memcpy(insns, lguest_insns[type].start, insn_len);
> +		return paravirt_patch_default(type, clobber, ibuf, addr, len);
> +
> +	memcpy(ibuf, lguest_insns[type].start, insn_len);
>  	return insn_len;
>  }
>  
> diff -r b81206bbf749 include/asm-i386/paravirt.h
> --- a/include/asm-i386/paravirt.h	Tue Jul 24 13:05:36 2007 +1000
> +++ b/include/asm-i386/paravirt.h	Tue Jul 24 14:57:44 2007 +1000
> @@ -47,7 +47,8 @@ struct paravirt_ops
>  	 * The patch function should return the number of bytes of code
>  	 * generated, as we nop pad the rest in generic code.
>  	 */
> -	unsigned (*patch)(u8 type, u16 clobber, void *firstinsn, unsigned len);
> +	unsigned (*patch)(u8 type, u16 clobber, void *insnbuf,
> +			  unsigned long addr, unsigned len);
>  
>  	/* Basic arch-specific setup */
>  	void (*arch_setup)(void);
> @@ -253,13 +254,16 @@ extern struct paravirt_ops paravirt_ops;
>  
>  unsigned paravirt_patch_nop(void);
>  unsigned paravirt_patch_ignore(unsigned len);
> -unsigned paravirt_patch_call(void *target, u16 tgt_clobbers,
> -			     void *site, u16 site_clobbers,
> +unsigned paravirt_patch_call(void *insnbuf,
> +			     const void *target, u16 tgt_clobbers,
> +			     unsigned long addr, u16 site_clobbers,
>  			     unsigned len);
> -unsigned paravirt_patch_jmp(void *target, void *site, unsigned len);
> -unsigned paravirt_patch_default(u8 type, u16 clobbers, void *site, unsigned len);
> -
> -unsigned paravirt_patch_insns(void *site, unsigned len,
> +unsigned paravirt_patch_jmp(const void *target, void *insnbuf,
> +			    unsigned long addr, unsigned len);
> +unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
> +				unsigned long addr, unsigned len);
> +
> +unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
>  			      const char *start, const char *end);
>  
>  int paravirt_disable_iospace(void);
> 
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-24 13:57     ` Mathieu Desnoyers
@ 2007-07-24 15:17       ` Jeremy Fitzhardinge
  2007-07-24 15:20         ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-24 15:17 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Rusty Russell, Andi Kleen, jbeulich, zach, patches, linux-kernel

Mathieu Desnoyers wrote:
> Hi Rusty,
>
> Good catch. I also still wonder why the alternative (except alternative
> SMP, which is OK) and paravirt patching functions are not marked __init.
> I really hope they are never used while NMIs or MCEs are enabled, or when
> threads may have been preempted inside the site being patched, or it
> could result in an illegal instruction.
>   

We need to patch modules at load time.  Presumably an NMI/MCE handler
will not depend on code in a module that is still being loaded.

    J


* Re: [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text
  2007-07-24 15:17       ` Jeremy Fitzhardinge
@ 2007-07-24 15:20         ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-07-24 15:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Rusty Russell, Andi Kleen, jbeulich, zach, patches, linux-kernel

* Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> Mathieu Desnoyers wrote:
> > Hi Rusty,
> >
> > Good catch. I also still wonder why alternative (except alternative smp,
> > which is ok) and paravirt patching functions are not marked __init. I really
> > hope they are never used when NMI, MCE are enabled or when threads may
> > have been preempted in the site being patched, or it could result in an
> > illegal instruction.
> >   
> 
> We need to patch modules at load time.  Presumably an NMI/MCE handler
> will not depend on code in a being-loaded module.
> 
>     J
> 

Oh yes, I see. That's OK, since the code of a module that is still being
loaded is not executed yet.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


end of thread, other threads:[~2007-07-24 15:20 UTC | newest]

Thread overview: 25+ messages
-- links below jump to the message on this page --
2007-07-20 15:32 [PATCH] [0/11] Some more x86 patches for review Andi Kleen
2007-07-20 15:32 ` [PATCH] [1/11] i386: Reserve the right performance counter for the Intel PerfMon NMI watchdog Andi Kleen
2007-07-20 15:32 ` [PATCH] [2/11] x86: Fix alternatives and kprobes to remap write-protected kernel text Andi Kleen
2007-07-20 15:59   ` Mathieu Desnoyers
2007-07-23 14:46   ` Mathieu Desnoyers
2007-07-23 15:01     ` Andi Kleen
2007-07-23 15:25       ` Mathieu Desnoyers
2007-07-24  6:06   ` Rusty Russell
2007-07-24 13:57     ` Mathieu Desnoyers
2007-07-24 15:17       ` Jeremy Fitzhardinge
2007-07-24 15:20         ` Mathieu Desnoyers
2007-07-20 15:32 ` [PATCH] [3/11] x86: Stop MCEs and NMIs during code patching Andi Kleen
2007-07-20 15:45   ` Jeremy Fitzhardinge
2007-07-20 16:03   ` Mathieu Desnoyers
2007-07-20 15:32 ` [PATCH] [4/11] x86_64: Set K8 CPUID flag for K8/Fam10h/Fam11h Andi Kleen
2007-07-20 15:32 ` [PATCH] [5/11] x86: Unify alternative nop handling between i386 and x86-64 Andi Kleen
2007-07-20 15:32 ` [PATCH] [6/11] i386: Tune AMD Fam10h/11h like K8 Andi Kleen
2007-07-20 15:32 ` [PATCH] [7/11] i386: Do not include other cpus' interrupt 0 in nmi_watchdog Andi Kleen
2007-07-20 15:32 ` [PATCH] [8/11] x86_64: x86_64 - Use non locked version for local_cmpxchg() Andi Kleen
2007-07-20 15:32 ` [PATCH] [9/11] x86: Replace NSC/Cyrix specific chipset access macros by inlined functions Andi Kleen
2007-07-21 12:00   ` Alexey Dobriyan
2007-07-21 12:04     ` Andi Kleen
2007-07-22 10:45     ` Juergen Beisert
2007-07-20 15:32 ` [PATCH] [10/11] i386: Handle P6s without performance counters in nmi watchdog Andi Kleen
2007-07-20 15:32 ` [PATCH] [11/11] i386: Use patchable lock prefix in set_64bit Andi Kleen
