public inbox for linux-kernel@vger.kernel.org
* [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc
@ 2007-05-01  3:57 Andi Kleen
  2007-05-01  3:57 ` [PATCH] [1/30] x86_64: Dynamically adjust machine check interval Andi Kleen
                   ` (29 more replies)
  0 siblings, 30 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:57 UTC
  To: patches, linux-kernel


The last batch of x86 patches for .22 to review.

This one contains various patches that haven't hit l-k yet. Please
review closely.

- Finally vDSO support for x86-64
  * glibc support still missing unfortunately
- New early CPUID checking for i386
- Restructured NMI watchdog code
- Dynamic MCE polling interval adaptation from Tim Hockin
- Some NUMA changes
- Use RDTSCP for synchronous get_cycles if possible
- Fix APIC timer calibration on x86-64 (stable candidate too) 
- Drop CONFIG_REORDER as announced earlier
- Misc stuff

Happy reviewing!

-Andi



* [PATCH] [1/30] x86_64: Dynamically adjust machine check interval
From: Andi Kleen @ 2007-05-01  3:57 UTC
  To: thockin, patches, linux-kernel


From: Tim Hockin <thockin@google.com>

Background:
 We've found that MCEs (specifically DRAM SBEs) tend to come in bunches,
 especially when we are trying really hard to stress the system out.  The
 current MCE poller uses a static interval which does not care whether it
 has or has not found MCEs recently.

Description:
 This patch makes the MCE poller adjust the polling interval dynamically.
 If we find an MCE, poll 2x faster (down to 10 ms).  When we stop finding
 MCEs, poll 2x slower (up to check_interval seconds).  The check_interval
 tunable becomes the max polling interval.  The "Machine check events
 logged" printk() is rate limited to the check_interval, which should be
 identical behavior to the old functionality.
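
The adjustment rule described above can be sketched in user-space C. This
is only an illustration under stated assumptions: HZ is taken to be 1000,
and adjust_interval() and MIN_INTERVAL are hypothetical names that do not
appear in the patch.

```c
#include <assert.h>

#define HZ 1000                 /* assumed tick rate, for illustration only */
#define MIN_INTERVAL (HZ / 100) /* 10 ms floor, expressed in jiffies */

/*
 * Return the next polling interval in jiffies.
 * found_mce != 0: halve the interval, but never go below the 10 ms floor.
 * found_mce == 0: double the interval, but never exceed max_interval.
 */
static int adjust_interval(int cur, int found_mce, int max_interval)
{
	int next;

	assert(cur > 0 && max_interval >= MIN_INTERVAL);

	if (found_mce) {
		next = cur / 2;
		return next < MIN_INTERVAL ? MIN_INTERVAL : next;
	}

	next = cur * 2;
	return next > max_interval ? max_interval : next;
}
```

Calling adjust_interval() once per poll, with max_interval set to
check_interval * HZ, mirrors the speedup and backoff behavior described
here.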

Result:
 If you start to take a lot of correctable errors (not exceptions), you
 log them faster and more accurately (less chance of overflowing the MCA
 registers).  If you don't take a lot of errors, you will see no change.

Alternatives:
 I considered simply reducing the polling interval to 10 ms immediately
 and keeping it there as long as we continue to find errors.  This felt a
 bit heavy-handed, but does perform significantly better for the default
 check_interval of 5 minutes (we're using a few seconds when testing for
 DRAM errors).  I could be convinced to go with this, if anyone felt it
 was not too aggressive.
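
To make the trade-off concrete, the following user-space sketch counts how
many consecutive error-finding polls the halving policy needs before it
reaches the floor. It assumes HZ is 1000, and polls_to_reach_floor() is a
hypothetical helper written for this comparison, not code from the patch.

```c
#include <assert.h>

#define HZ 1000 /* assumed tick rate, for illustration only */

/*
 * Count the error-finding polls needed for the halving policy to bring
 * the interval from start down to floor (both in jiffies).
 */
static int polls_to_reach_floor(int start, int floor)
{
	int polls = 0;

	assert(start > 0 && floor > 0);

	while (start > floor) {
		start /= 2;
		if (start < floor)
			start = floor;
		polls++;
	}
	return polls;
}
```

Under these assumptions, polls_to_reach_floor(300 * HZ, HZ / 100) evaluates
to 15 for the default 5 minute check_interval, whereas the alternative
policy would reach 10 ms after a single error-finding poll.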

Testing:
 I used an error-injecting DIMM to create lots of correctable DRAM errors
 and verified that the polling interval accelerates.  The printk() only
 happens once per check_interval seconds.
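
The once-per-check_interval printk() relies on a tick comparison that
stays correct when the counter wraps. A minimal user-space sketch of that
idea follows; ticks_after_eq() is modelled on the kernel's time_after_eq()
macro, while struct print_limiter and may_print() are hypothetical names
introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Wraparound-safe "a is at or after b" for an unsigned tick counter.
 * The cast to long relies on two's-complement conversion, as on all
 * common platforms.
 */
static int ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

struct print_limiter {
	unsigned long last_print; /* tick of the most recent message */
	unsigned long period;     /* minimum ticks between messages */
};

/* Return 1 if a message may be printed at tick now, and record it. */
static int may_print(struct print_limiter *rl, unsigned long now)
{
	assert(rl != NULL);

	if (!ticks_after_eq(now, rl->last_print + rl->period))
		return 0;

	rl->last_print = now;
	return 1;
}
```

Polling can then run as often as every 10 ms while the message itself is
emitted at most once per period.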

Patch:
 This patch is against 2.6.21-rc7.

Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 Documentation/x86_64/machinecheck |    7 ++++++-
 arch/x86_64/kernel/mce.c          |   32 ++++++++++++++++++++++++--------
 2 files changed, 30 insertions(+), 9 deletions(-)

Index: linux/Documentation/x86_64/machinecheck
===================================================================
--- linux.orig/Documentation/x86_64/machinecheck
+++ linux/Documentation/x86_64/machinecheck
@@ -36,7 +36,12 @@ between all CPUs.
 
 check_interval
 	How often to poll for corrected machine check errors, in seconds
-	(Note output is hexademical). Default 5 minutes.
+	(Note output is hexademical). Default 5 minutes.  When the poller
+	finds MCEs it triggers an exponential speedup (poll more often) on
+	the polling interval.  When the poller stops finding MCEs, it
+	triggers an exponential backoff (poll less often) on the polling
+	interval. The check_interval variable is both the initial and
+	maximum polling interval.
 
 tolerant
 	Tolerance level. When a machine check exception occurs for a non
Index: linux/arch/x86_64/kernel/mce.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce.c
+++ linux/arch/x86_64/kernel/mce.c
@@ -323,10 +323,13 @@ void mce_log_therm_throt_event(unsigned 
 #endif /* CONFIG_X86_MCE_INTEL */
 
 /*
- * Periodic polling timer for "silent" machine check errors.
+ * Periodic polling timer for "silent" machine check errors.  If the
+ * poller finds an MCE, poll 2x faster.  When the poller finds no more
+ * errors, poll 2x slower (up to check_interval seconds).
  */
 
 static int check_interval = 5 * 60; /* 5 minutes */
+static int next_interval; /* in jiffies */
 static void mcheck_timer(struct work_struct *work);
 static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);
 
@@ -339,7 +342,6 @@ static void mcheck_check_cpu(void *info)
 static void mcheck_timer(struct work_struct *work)
 {
 	on_each_cpu(mcheck_check_cpu, NULL, 1, 1);
-	schedule_delayed_work(&mcheck_work, check_interval * HZ);
 
 	/*
 	 * It's ok to read stale data here for notify_user and
@@ -349,17 +351,30 @@ static void mcheck_timer(struct work_str
 	 * writes.
 	 */
 	if (notify_user && console_logged) {
+		static unsigned long last_print;
+		unsigned long now = jiffies;
+
+		/* if we logged an MCE, reduce the polling interval */
+		next_interval = max(next_interval/2, HZ/100);
 		notify_user = 0;
 		clear_bit(0, &console_logged);
-		printk(KERN_INFO "Machine check events logged\n");
+		if (time_after_eq(now, last_print + (check_interval*HZ))) {
+			last_print = now;
+			printk(KERN_INFO "Machine check events logged\n");
+		}
+	} else {
+		next_interval = min(next_interval*2, check_interval*HZ);
 	}
+
+	schedule_delayed_work(&mcheck_work, next_interval);
 }
 
 
 static __init int periodic_mcheck_init(void)
 { 
-	if (check_interval)
-		schedule_delayed_work(&mcheck_work, check_interval*HZ);
+	next_interval = check_interval * HZ;
+	if (next_interval)
+		schedule_delayed_work(&mcheck_work, next_interval);
 	return 0;
 } 
 __initcall(periodic_mcheck_init);
@@ -597,12 +612,13 @@ static int mce_resume(struct sys_device 
 /* Reinit MCEs after user configuration changes */
 static void mce_restart(void) 
 { 
-	if (check_interval)
+	if (next_interval)
 		cancel_delayed_work(&mcheck_work);
 	/* Timer race is harmless here */
 	on_each_cpu(mce_init, NULL, 1, 1);       
-	if (check_interval)
-		schedule_delayed_work(&mcheck_work, check_interval*HZ);
+	next_interval = check_interval * HZ;
+	if (next_interval)
+		schedule_delayed_work(&mcheck_work, next_interval);
 }
 
 static struct sysdev_class mce_sysclass = {


* [PATCH] [2/30] x86_64: set node_possible_map at runtime - try 2
From: Andi Kleen @ 2007-05-01  3:57 UTC
  To: suresh.b.siddha, andi, dada1, rientjes, clameter, patches,
	linux-kernel


From: Suresh Siddha <suresh.b.siddha@intel.com>

Set the node_possible_map at runtime on x86_64.  On a non-NUMA system,
num_possible_nodes() will now return 1.
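
As a rough user-space illustration of why this works: num_possible_nodes()
counts the bits set in node_possible_map, so clearing the map and setting
only node 0 on the non-NUMA fallback path yields a count of 1. The nodemask
type, the mask_*() helpers and MAX_NODES below are hypothetical stand-ins,
not the kernel's nodemask API.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64 /* hypothetical limit for this sketch */

typedef struct {
	uint64_t bits; /* bit n set means node n is possible */
} nodemask;

static void mask_clear(nodemask *m)
{
	m->bits = 0;
}

static void mask_set(nodemask *m, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	m->bits |= (uint64_t)1 << node;
}

static int mask_isset(const nodemask *m, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return (int)((m->bits >> node) & 1);
}

/* Number of set bits, i.e. the number of possible nodes. */
static int mask_weight(const nodemask *m)
{
	uint64_t b = m->bits;
	int n = 0;

	while (b) {
		b &= b - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```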

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
---
 arch/x86_64/mm/k8topology.c |    7 ++-----
 arch/x86_64/mm/numa.c       |   10 ++++++++--
 arch/x86_64/mm/srat.c       |    8 +++++---
 3 files changed, 15 insertions(+), 10 deletions(-)

Index: linux/arch/x86_64/mm/k8topology.c
===================================================================
--- linux.orig/arch/x86_64/mm/k8topology.c
+++ linux/arch/x86_64/mm/k8topology.c
@@ -49,11 +49,8 @@ int __init k8_scan_nodes(unsigned long s
 	int found = 0;
 	u32 reg;
 	unsigned numnodes;
-	nodemask_t nodes_parsed;
 	unsigned dualcore = 0;
 
-	nodes_clear(nodes_parsed);
-
 	if (!early_pci_allowed())
 		return -1;
 
@@ -102,7 +99,7 @@ int __init k8_scan_nodes(unsigned long s
 			       nodeid, (base>>8)&3, (limit>>8) & 3); 
 			return -1; 
 		}	
-		if (node_isset(nodeid, nodes_parsed)) { 
+		if (node_isset(nodeid, node_possible_map)) {
 			printk(KERN_INFO "Node %d already present. Skipping\n", 
 			       nodeid);
 			continue;
@@ -155,7 +152,7 @@ int __init k8_scan_nodes(unsigned long s
 
 		prevbase = base;
 
-		node_set(nodeid, nodes_parsed);
+		node_set(nodeid, node_possible_map);
 	} 
 
 	if (!found)
Index: linux/arch/x86_64/mm/numa.c
===================================================================
--- linux.orig/arch/x86_64/mm/numa.c
+++ linux/arch/x86_64/mm/numa.c
@@ -295,7 +295,7 @@ static int __init setup_node_range(int n
 		ret = -1;
 	}
 	nodes[nid].end = *addr;
-	node_set_online(nid);
+	node_set(nid, node_possible_map);
 	printk(KERN_INFO "Faking node %d at %016Lx-%016Lx (%LuMB)\n", nid,
 	       nodes[nid].start, nodes[nid].end,
 	       (nodes[nid].end - nodes[nid].start) >> 20);
@@ -479,7 +479,7 @@ out:
 	 * SRAT.
 	 */
 	remove_all_active_ranges();
-	for_each_online_node(i) {
+	for_each_node_mask(i, node_possible_map) {
 		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
 						nodes[i].end >> PAGE_SHIFT);
  		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
@@ -494,20 +494,25 @@ void __init numa_initmem_init(unsigned l
 { 
 	int i;
 
+	nodes_clear(node_possible_map);
+
 #ifdef CONFIG_NUMA_EMU
 	if (cmdline && !numa_emulation(start_pfn, end_pfn))
  		return;
+	nodes_clear(node_possible_map);
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
 	if (!numa_off && !acpi_scan_nodes(start_pfn << PAGE_SHIFT,
 					  end_pfn << PAGE_SHIFT))
  		return;
+	nodes_clear(node_possible_map);
 #endif
 
 #ifdef CONFIG_K8_NUMA
 	if (!numa_off && !k8_scan_nodes(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT))
 		return;
+	nodes_clear(node_possible_map);
 #endif
 	printk(KERN_INFO "%s\n",
 	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
@@ -521,6 +526,7 @@ void __init numa_initmem_init(unsigned l
 	memnodemap[0] = 0;
 	nodes_clear(node_online_map);
 	node_set_online(0);
+	node_set(0, node_possible_map);
 	for (i = 0; i < NR_CPUS; i++)
 		numa_set_node(i, 0);
 	node_to_cpumask[0] = cpumask_of_cpu(0);
Index: linux/arch/x86_64/mm/srat.c
===================================================================
--- linux.orig/arch/x86_64/mm/srat.c
+++ linux/arch/x86_64/mm/srat.c
@@ -419,19 +419,21 @@ int __init acpi_scan_nodes(unsigned long
 		return -1;
 	}
 
+	node_possible_map = nodes_parsed;
+
 	/* Finally register nodes */
-	for_each_node_mask(i, nodes_parsed)
+	for_each_node_mask(i, node_possible_map)
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	/* Try again in case setup_node_bootmem missed one due
 	   to missing bootmem */
-	for_each_node_mask(i, nodes_parsed)
+	for_each_node_mask(i, node_possible_map)
 		if (!node_online(i))
 			setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 
 	for (i = 0; i < NR_CPUS; i++) {
 		if (cpu_to_node[i] == NUMA_NO_NODE)
 			continue;
-		if (!node_isset(cpu_to_node[i], nodes_parsed))
+		if (!node_isset(cpu_to_node[i], node_possible_map))
 			numa_set_node(i, NUMA_NO_NODE);
 	}
 	numa_init_array();


* [PATCH] [3/30] i386: Clean up NMI watchdog code
From: Andi Kleen @ 2007-05-01  3:58 UTC
  To: patches, linux-kernel


- Introduce a wd_ops structure
- Convert the various nmi watchdogs over to it
- This allows the perfctr reservation to be split cleanly from the
  watchdog setup.
- Do perfctr reservation globally as it should have always been
- Remove dead code referenced only by unused EXPORT_SYMBOLs
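
The general shape of an operations table like wd_ops can be sketched in
user-space C as below. This is not the structure from the patch: the field
names, the vendor enum, the stub setup/stop functions and probe_wd_ops()
are all hypothetical and only illustrate dispatch through function
pointers selected once per CPU type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table; the real wd_ops fields may differ. */
struct wd_ops_sketch {
	const char *name;
	int (*setup)(unsigned int nmi_hz); /* 1 on success, 0 on failure */
	void (*stop)(void);
};

enum cpu_vendor { VENDOR_AMD, VENDOR_INTEL, VENDOR_OTHER };

static int watchdog_running; /* stands in for programming the hardware */

static int stub_setup(unsigned int nmi_hz)
{
	watchdog_running = nmi_hz > 0;
	return watchdog_running;
}

static void stub_stop(void)
{
	watchdog_running = 0;
}

static const struct wd_ops_sketch k7_ops = { "K7", stub_setup, stub_stop };
static const struct wd_ops_sketch p6_ops = { "P6", stub_setup, stub_stop };

/* Pick the table once; callers no longer switch on the vendor. */
static const struct wd_ops_sketch *probe_wd_ops(enum cpu_vendor vendor)
{
	switch (vendor) {
	case VENDOR_AMD:
		return &k7_ops;
	case VENDOR_INTEL:
		return &p6_ops;
	default:
		return NULL;
	}
}

/* Start the watchdog through the table; fails if no table was found. */
static int start_watchdog(const struct wd_ops_sketch *ops, unsigned int nmi_hz)
{
	if (ops == NULL)
		return 0;
	assert(ops->setup != NULL && ops->stop != NULL);
	return ops->setup(nmi_hz);
}
```

With this layout, generic code only calls through the selected table, and
per-CPU details stay in the functions behind it.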

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/cpu/Makefile           |    2 
 arch/i386/kernel/cpu/perfctr-watchdog.c |  658 +++++++++++++++++++++++++
 arch/i386/kernel/nmi.c                  |  829 ++------------------------------
 include/asm-i386/nmi.h                  |    8 
 4 files changed, 721 insertions(+), 776 deletions(-)

Index: linux/arch/i386/kernel/nmi.c
===================================================================
--- linux.orig/arch/i386/kernel/nmi.c
+++ linux/arch/i386/kernel/nmi.c
@@ -20,7 +20,6 @@
 #include <linux/sysdev.h>
 #include <linux/sysctl.h>
 #include <linux/percpu.h>
-#include <linux/dmi.h>
 #include <linux/kprobes.h>
 #include <linux/cpumask.h>
 #include <linux/kernel_stat.h>
@@ -28,30 +27,14 @@
 #include <asm/smp.h>
 #include <asm/nmi.h>
 #include <asm/kdebug.h>
-#include <asm/intel_arch_perfmon.h>
 
 #include "mach_traps.h"
 
 int unknown_nmi_panic;
 int nmi_watchdog_enabled;
 
-/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
- * evtsel_nmi_owner tracks the ownership of the event selection
- * - different performance counters/ event selection may be reserved for
- *   different subsystems this reservation system just tries to coordinate
- *   things a little
- */
-
-/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
- * offset from MSR_P4_BSU_ESCR0.  It will be the max for all platforms (for now)
- */
-#define NMI_MAX_COUNTER_BITS 66
-#define NMI_MAX_COUNTER_LONGS BITS_TO_LONGS(NMI_MAX_COUNTER_BITS)
-
-static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-
 static cpumask_t backtrace_mask = CPU_MASK_NONE;
+
 /* nmi_active:
  * >0: the lapic NMI watchdog is active, but can be disabled
  * <0: the lapic NMI watchdog has not been set up, and cannot
@@ -63,203 +46,11 @@ atomic_t nmi_active = ATOMIC_INIT(0);		/
 unsigned int nmi_watchdog = NMI_DEFAULT;
 static unsigned int nmi_hz = HZ;
 
-struct nmi_watchdog_ctlblk {
-	int enabled;
-	u64 check_bit;
-	unsigned int cccr_msr;
-	unsigned int perfctr_msr;  /* the MSR to reset in NMI handler */
-	unsigned int evntsel_msr;  /* the MSR to select the events to handle */
-};
-static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
+static DEFINE_PER_CPU(short, wd_enabled);
 
 /* local prototypes */
 static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
 
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
-{
-	/* returns the bit offset of the performance counter register */
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		return (msr - MSR_K7_PERFCTR0);
-	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
-			return (msr - MSR_ARCH_PERFMON_PERFCTR0);
-
-		switch (boot_cpu_data.x86) {
-		case 6:
-			return (msr - MSR_P6_PERFCTR0);
-		case 15:
-			return (msr - MSR_P4_BPU_PERFCTR0);
-		}
-	}
-	return 0;
-}
-
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
-{
-	/* returns the bit offset of the event selection register */
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		return (msr - MSR_K7_EVNTSEL0);
-	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
-			return (msr - MSR_ARCH_PERFMON_EVENTSEL0);
-
-		switch (boot_cpu_data.x86) {
-		case 6:
-			return (msr - MSR_P6_EVNTSEL0);
-		case 15:
-			return (msr - MSR_P4_BSU_ESCR0);
-		}
-	}
-	return 0;
-}
-
-/* checks for a bit availability (hack for oprofile) */
-int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
-{
-	int cpu;
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-	for_each_possible_cpu (cpu) {
-		if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]))
-			return 0;
-	}
-	return 1;
-}
-
-/* checks the an msr for availability */
-int avail_to_resrv_perfctr_nmi(unsigned int msr)
-{
-	unsigned int counter;
-	int cpu;
-
-	counter = nmi_perfctr_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	for_each_possible_cpu (cpu) {
-		if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]))
-			return 0;
-	}
-	return 1;
-}
-
-static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_perfctr_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	if (!test_and_set_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]))
-		return 1;
-	return 0;
-}
-
-static void __release_perfctr_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_perfctr_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	clear_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]);
-}
-
-int reserve_perfctr_nmi(unsigned int msr)
-{
-	int cpu, i;
-	for_each_possible_cpu (cpu) {
-		if (!__reserve_perfctr_nmi(cpu, msr)) {
-			for_each_possible_cpu (i) {
-				if (i >= cpu)
-					break;
-				__release_perfctr_nmi(i, msr);
-			}
-			return 0;
-		}
-	}
-	return 1;
-}
-
-void release_perfctr_nmi(unsigned int msr)
-{
-	int cpu;
-	for_each_possible_cpu (cpu) {
-		__release_perfctr_nmi(cpu, msr);
-	}
-}
-
-int __reserve_evntsel_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_evntsel_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	if (!test_and_set_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]))
-		return 1;
-	return 0;
-}
-
-static void __release_evntsel_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_evntsel_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	clear_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]);
-}
-
-int reserve_evntsel_nmi(unsigned int msr)
-{
-	int cpu, i;
-	for_each_possible_cpu (cpu) {
-		if (!__reserve_evntsel_nmi(cpu, msr)) {
-			for_each_possible_cpu (i) {
-				if (i >= cpu)
-					break;
-				__release_evntsel_nmi(i, msr);
-			}
-			return 0;
-		}
-	}
-	return 1;
-}
-
-void release_evntsel_nmi(unsigned int msr)
-{
-	int cpu;
-	for_each_possible_cpu (cpu) {
-		__release_evntsel_nmi(cpu, msr);
-	}
-}
-
-static __cpuinit inline int nmi_known_cpu(void)
-{
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6)
-			|| (boot_cpu_data.x86 == 16));
-	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
-			return 1;
-		else
-			return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6));
-	}
-	return 0;
-}
-
 static int endflag __initdata = 0;
 
 #ifdef CONFIG_SMP
@@ -281,28 +72,6 @@ static __init void nmi_cpu_busy(void *da
 }
 #endif
 
-static unsigned int adjust_for_32bit_ctr(unsigned int hz)
-{
-	u64 counter_val;
-	unsigned int retval = hz;
-
-	/*
-	 * On Intel CPUs with P6/ARCH_PERFMON only 32 bits in the counter
-	 * are writable, with higher bits sign extending from bit 31.
-	 * So, we can only program the counter with 31 bit values and
-	 * 32nd bit should be 1, for 33.. to be 1.
-	 * Find the appropriate nmi_hz
-	 */
-	counter_val = (u64)cpu_khz * 1000;
-	do_div(counter_val, retval);
- 	if (counter_val > 0x7fffffffULL) {
-		u64 count = (u64)cpu_khz * 1000;
-		do_div(count, 0x7fffffffUL);
-		retval = count + 1;
-	}
-	return retval;
-}
-
 static int __init check_nmi_watchdog(void)
 {
 	unsigned int *prev_nmi_count;
@@ -335,14 +104,14 @@ static int __init check_nmi_watchdog(voi
 		if (!cpu_isset(cpu, cpu_callin_map))
 			continue;
 #endif
-		if (!per_cpu(nmi_watchdog_ctlblk, cpu).enabled)
+		if (!per_cpu(wd_enabled, cpu))
 			continue;
 		if (nmi_count(cpu) - prev_nmi_count[cpu] <= 5) {
 			printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
 				cpu,
 				prev_nmi_count[cpu],
 				nmi_count(cpu));
-			per_cpu(nmi_watchdog_ctlblk, cpu).enabled = 0;
+			per_cpu(wd_enabled, cpu) = 0;
 			atomic_dec(&nmi_active);
 		}
 	}
@@ -356,16 +125,8 @@ static int __init check_nmi_watchdog(voi
 
 	/* now that we know it works we can reduce NMI frequency to
 	   something more reasonable; makes a difference in some configs */
-	if (nmi_watchdog == NMI_LOCAL_APIC) {
-		struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-		nmi_hz = 1;
-
-		if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
-		    wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) {
-			nmi_hz = adjust_for_32bit_ctr(nmi_hz);
-		}
-	}
+	if (nmi_watchdog == NMI_LOCAL_APIC)
+		nmi_hz = lapic_adjust_nmi_hz(1);
 
 	kfree(prev_nmi_count);
 	return 0;
@@ -388,85 +149,8 @@ static int __init setup_nmi_watchdog(cha
 
 __setup("nmi_watchdog=", setup_nmi_watchdog);
 
-static void disable_lapic_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
-	if (atomic_read(&nmi_active) <= 0)
-		return;
-
-	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
-	BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-static void enable_lapic_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
-	/* are we already enabled */
-	if (atomic_read(&nmi_active) != 0)
-		return;
-
-	/* are we lapic aware */
-	if (nmi_known_cpu() <= 0)
-		return;
-
-	on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
-	touch_nmi_watchdog();
-}
 
-void disable_timer_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
-	if (atomic_read(&nmi_active) <= 0)
-		return;
-
-	disable_irq(0);
-	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
-	BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-void enable_timer_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
-	if (atomic_read(&nmi_active) == 0) {
-		touch_nmi_watchdog();
-		on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
-		enable_irq(0);
-	}
-}
-
-static void __acpi_nmi_disable(void *__unused)
-{
-	apic_write_around(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
-}
-
-/*
- * Disable timer based NMIs on all CPUs:
- */
-void acpi_nmi_disable(void)
-{
-	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
-		on_each_cpu(__acpi_nmi_disable, NULL, 0, 1);
-}
-
-static void __acpi_nmi_enable(void *__unused)
-{
-	apic_write_around(APIC_LVT0, APIC_DM_NMI);
-}
-
-/*
- * Enable timer based NMIs on all CPUs:
- */
-void acpi_nmi_enable(void)
-{
-	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
-		on_each_cpu(__acpi_nmi_enable, NULL, 0, 1);
-}
+/* Suspend/resume support */
 
 #ifdef CONFIG_PM
 
@@ -513,7 +197,7 @@ static int __init init_lapic_nmi_sysfs(v
 	if (nmi_watchdog != NMI_LOCAL_APIC)
 		return 0;
 
-	if ( atomic_read(&nmi_active) < 0 )
+	if (atomic_read(&nmi_active) < 0)
 		return 0;
 
 	error = sysdev_class_register(&nmi_sysclass);
@@ -526,433 +210,69 @@ late_initcall(init_lapic_nmi_sysfs);
 
 #endif	/* CONFIG_PM */
 
-/*
- * Activate the NMI watchdog via the local APIC.
- * Original code written by Keith Owens.
- */
-
-static void write_watchdog_counter(unsigned int perfctr_msr, const char *descr)
-{
-	u64 count = (u64)cpu_khz * 1000;
-
-	do_div(count, nmi_hz);
-	if(descr)
-		Dprintk("setting %s to -0x%08Lx\n", descr, count);
-	wrmsrl(perfctr_msr, 0 - count);
-}
-
-static void write_watchdog_counter32(unsigned int perfctr_msr,
-		const char *descr)
-{
-	u64 count = (u64)cpu_khz * 1000;
-
-	do_div(count, nmi_hz);
-	if(descr)
-		Dprintk("setting %s to -0x%08Lx\n", descr, count);
-	wrmsr(perfctr_msr, (u32)(-count), 0);
-}
-
-/* Note that these events don't tick when the CPU idles. This means
-   the frequency varies with CPU load. */
-
-#define K7_EVNTSEL_ENABLE	(1 << 22)
-#define K7_EVNTSEL_INT		(1 << 20)
-#define K7_EVNTSEL_OS		(1 << 17)
-#define K7_EVNTSEL_USR		(1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
-#define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
-
-static int setup_k7_watchdog(void)
-{
-	unsigned int perfctr_msr, evntsel_msr;
-	unsigned int evntsel;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	perfctr_msr = MSR_K7_PERFCTR0;
-	evntsel_msr = MSR_K7_EVNTSEL0;
-	if (!__reserve_perfctr_nmi(-1, perfctr_msr))
-		goto fail;
-
-	if (!__reserve_evntsel_nmi(-1, evntsel_msr))
-		goto fail1;
-
-	wrmsrl(perfctr_msr, 0UL);
-
-	evntsel = K7_EVNTSEL_INT
-		| K7_EVNTSEL_OS
-		| K7_EVNTSEL_USR
-		| K7_NMI_EVENT;
-
-	/* setup the timer */
-	wrmsr(evntsel_msr, evntsel, 0);
-	write_watchdog_counter(perfctr_msr, "K7_PERFCTR0");
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	evntsel |= K7_EVNTSEL_ENABLE;
-	wrmsr(evntsel_msr, evntsel, 0);
-
-	wd->perfctr_msr = perfctr_msr;
-	wd->evntsel_msr = evntsel_msr;
-	wd->cccr_msr = 0;  //unused
-	wd->check_bit = 1ULL<<63;
-	return 1;
-fail1:
-	__release_perfctr_nmi(-1, perfctr_msr);
-fail:
-	return 0;
-}
-
-static void stop_k7_watchdog(void)
-{
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	wrmsr(wd->evntsel_msr, 0, 0);
-
-	__release_evntsel_nmi(-1, wd->evntsel_msr);
-	__release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-#define P6_EVNTSEL0_ENABLE	(1 << 22)
-#define P6_EVNTSEL_INT		(1 << 20)
-#define P6_EVNTSEL_OS		(1 << 17)
-#define P6_EVNTSEL_USR		(1 << 16)
-#define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
-#define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
-
-static int setup_p6_watchdog(void)
-{
-	unsigned int perfctr_msr, evntsel_msr;
-	unsigned int evntsel;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	perfctr_msr = MSR_P6_PERFCTR0;
-	evntsel_msr = MSR_P6_EVNTSEL0;
-	if (!__reserve_perfctr_nmi(-1, perfctr_msr))
-		goto fail;
-
-	if (!__reserve_evntsel_nmi(-1, evntsel_msr))
-		goto fail1;
-
-	wrmsrl(perfctr_msr, 0UL);
-
-	evntsel = P6_EVNTSEL_INT
-		| P6_EVNTSEL_OS
-		| P6_EVNTSEL_USR
-		| P6_NMI_EVENT;
-
-	/* setup the timer */
-	wrmsr(evntsel_msr, evntsel, 0);
-	nmi_hz = adjust_for_32bit_ctr(nmi_hz);
-	write_watchdog_counter32(perfctr_msr, "P6_PERFCTR0");
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	evntsel |= P6_EVNTSEL0_ENABLE;
-	wrmsr(evntsel_msr, evntsel, 0);
-
-	wd->perfctr_msr = perfctr_msr;
-	wd->evntsel_msr = evntsel_msr;
-	wd->cccr_msr = 0;  //unused
-	wd->check_bit = 1ULL<<39;
-	return 1;
-fail1:
-	__release_perfctr_nmi(-1, perfctr_msr);
-fail:
-	return 0;
-}
-
-static void stop_p6_watchdog(void)
-{
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	wrmsr(wd->evntsel_msr, 0, 0);
-
-	__release_evntsel_nmi(-1, wd->evntsel_msr);
-	__release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-/* Note that these events don't tick when the CPU idles. This means
-   the frequency varies with CPU load. */
-
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL	(1<<7)
-#define P4_ESCR_EVENT_SELECT(N)	((N)<<25)
-#define P4_ESCR_OS		(1<<3)
-#define P4_ESCR_USR		(1<<2)
-#define P4_CCCR_OVF_PMI0	(1<<26)
-#define P4_CCCR_OVF_PMI1	(1<<27)
-#define P4_CCCR_THRESHOLD(N)	((N)<<20)
-#define P4_CCCR_COMPLEMENT	(1<<19)
-#define P4_CCCR_COMPARE		(1<<18)
-#define P4_CCCR_REQUIRED	(3<<16)
-#define P4_CCCR_ESCR_SELECT(N)	((N)<<13)
-#define P4_CCCR_ENABLE		(1<<12)
-#define P4_CCCR_OVF 		(1<<31)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
-   CRU_ESCR0 (with any non-null event selector) through a complemented
-   max threshold. [IA32-Vol3, Section 14.9.9] */
-
-static int setup_p4_watchdog(void)
+static void __acpi_nmi_enable(void *__unused)
 {
-	unsigned int perfctr_msr, evntsel_msr, cccr_msr;
-	unsigned int evntsel, cccr_val;
-	unsigned int misc_enable, dummy;
-	unsigned int ht_num;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
-	if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
-		return 0;
-
-#ifdef CONFIG_SMP
-	/* detect which hyperthread we are on */
-	if (smp_num_siblings == 2) {
-		unsigned int ebx, apicid;
-
-        	ebx = cpuid_ebx(1);
-	        apicid = (ebx >> 24) & 0xff;
-        	ht_num = apicid & 1;
-	} else
-#endif
-		ht_num = 0;
-
-	/* performance counters are shared resources
-	 * assign each hyperthread its own set
-	 * (re-use the ESCR0 register, seems safe
-	 * and keeps the cccr_val the same)
-	 */
-	if (!ht_num) {
-		/* logical cpu 0 */
-		perfctr_msr = MSR_P4_IQ_PERFCTR0;
-		evntsel_msr = MSR_P4_CRU_ESCR0;
-		cccr_msr = MSR_P4_IQ_CCCR0;
-		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
-	} else {
-		/* logical cpu 1 */
-		perfctr_msr = MSR_P4_IQ_PERFCTR1;
-		evntsel_msr = MSR_P4_CRU_ESCR0;
-		cccr_msr = MSR_P4_IQ_CCCR1;
-		cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
-	}
-
-	if (!__reserve_perfctr_nmi(-1, perfctr_msr))
-		goto fail;
-
-	if (!__reserve_evntsel_nmi(-1, evntsel_msr))
-		goto fail1;
-
-	evntsel = P4_ESCR_EVENT_SELECT(0x3F)
-	 	| P4_ESCR_OS
-		| P4_ESCR_USR;
-
-	cccr_val |= P4_CCCR_THRESHOLD(15)
-		 | P4_CCCR_COMPLEMENT
-		 | P4_CCCR_COMPARE
-		 | P4_CCCR_REQUIRED;
-
-	wrmsr(evntsel_msr, evntsel, 0);
-	wrmsr(cccr_msr, cccr_val, 0);
-	write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0");
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	cccr_val |= P4_CCCR_ENABLE;
-	wrmsr(cccr_msr, cccr_val, 0);
-	wd->perfctr_msr = perfctr_msr;
-	wd->evntsel_msr = evntsel_msr;
-	wd->cccr_msr = cccr_msr;
-	wd->check_bit = 1ULL<<39;
-	return 1;
-fail1:
-	__release_perfctr_nmi(-1, perfctr_msr);
-fail:
-	return 0;
+	apic_write_around(APIC_LVT0, APIC_DM_NMI);
 }
 
-static void stop_p4_watchdog(void)
+/*
+ * Enable timer based NMIs on all CPUs:
+ */
+void acpi_nmi_enable(void)
 {
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	wrmsr(wd->cccr_msr, 0, 0);
-	wrmsr(wd->evntsel_msr, 0, 0);
-
-	__release_evntsel_nmi(-1, wd->evntsel_msr);
-	__release_perfctr_nmi(-1, wd->perfctr_msr);
+	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
+		on_each_cpu(__acpi_nmi_enable, NULL, 0, 1);
 }
 
-#define ARCH_PERFMON_NMI_EVENT_SEL	ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
-#define ARCH_PERFMON_NMI_EVENT_UMASK	ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
-
-static int setup_intel_arch_watchdog(void)
+static void __acpi_nmi_disable(void *__unused)
 {
-	unsigned int ebx;
-	union cpuid10_eax eax;
-	unsigned int unused;
-	unsigned int perfctr_msr, evntsel_msr;
-	unsigned int evntsel;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	/*
-	 * Check whether the Architectural PerfMon supports
-	 * Unhalted Core Cycles Event or not.
-	 * NOTE: Corresponding bit = 0 in ebx indicates event present.
-	 */
-	cpuid(10, &(eax.full), &ebx, &unused, &unused);
-	if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
-	    (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
-		goto fail;
-
-	perfctr_msr = MSR_ARCH_PERFMON_PERFCTR1;
-	evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL1;
-
-	if (!__reserve_perfctr_nmi(-1, perfctr_msr))
-		goto fail;
-
-	if (!__reserve_evntsel_nmi(-1, evntsel_msr))
-		goto fail1;
-
-	wrmsrl(perfctr_msr, 0UL);
-
-	evntsel = ARCH_PERFMON_EVENTSEL_INT
-		| ARCH_PERFMON_EVENTSEL_OS
-		| ARCH_PERFMON_EVENTSEL_USR
-		| ARCH_PERFMON_NMI_EVENT_SEL
-		| ARCH_PERFMON_NMI_EVENT_UMASK;
-
-	/* setup the timer */
-	wrmsr(evntsel_msr, evntsel, 0);
-	nmi_hz = adjust_for_32bit_ctr(nmi_hz);
-	write_watchdog_counter32(perfctr_msr, "INTEL_ARCH_PERFCTR0");
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
-	wrmsr(evntsel_msr, evntsel, 0);
-
-	wd->perfctr_msr = perfctr_msr;
-	wd->evntsel_msr = evntsel_msr;
-	wd->cccr_msr = 0;  //unused
-	wd->check_bit = 1ULL << (eax.split.bit_width - 1);
-	return 1;
-fail1:
-	__release_perfctr_nmi(-1, perfctr_msr);
-fail:
-	return 0;
+	apic_write(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
 }
 
-static void stop_intel_arch_watchdog(void)
+/*
+ * Disable timer based NMIs on all CPUs:
+ */
+void acpi_nmi_disable(void)
 {
-	unsigned int ebx;
-	union cpuid10_eax eax;
-	unsigned int unused;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	/*
-	 * Check whether the Architectural PerfMon supports
-	 * Unhalted Core Cycles Event or not.
-	 * NOTE: Corresponding bit = 0 in ebx indicates event present.
-	 */
-	cpuid(10, &(eax.full), &ebx, &unused, &unused);
-	if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
-	    (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
-		return;
-
-	wrmsr(wd->evntsel_msr, 0, 0);
-	__release_evntsel_nmi(-1, wd->evntsel_msr);
-	__release_perfctr_nmi(-1, wd->perfctr_msr);
+	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
+		on_each_cpu(__acpi_nmi_disable, NULL, 0, 1);
 }
 
 void setup_apic_nmi_watchdog (void *unused)
 {
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	/* only support LOCAL and IO APICs for now */
-	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
-	    (nmi_watchdog != NMI_IO_APIC))
-	    	return;
-
-	if (wd->enabled == 1)
-		return;
+	if (__get_cpu_var(wd_enabled))
+		return;
 
 	/* cheap hack to support suspend/resume */
 	/* if cpu0 is not active neither should the other cpus */
 	if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
 		return;
 
-	if (nmi_watchdog == NMI_LOCAL_APIC) {
-		switch (boot_cpu_data.x86_vendor) {
-		case X86_VENDOR_AMD:
-			if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15 &&
-				boot_cpu_data.x86 != 16)
-				return;
-			if (!setup_k7_watchdog())
-				return;
-			break;
-		case X86_VENDOR_INTEL:
-			if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
-				if (!setup_intel_arch_watchdog())
-					return;
-				break;
-			}
-			switch (boot_cpu_data.x86) {
-			case 6:
-				if (boot_cpu_data.x86_model > 0xd)
-					return;
-
-				if (!setup_p6_watchdog())
-					return;
-				break;
-			case 15:
-				if (boot_cpu_data.x86_model > 0x4)
-					return;
-
-				if (!setup_p4_watchdog())
-					return;
-				break;
-			default:
-				return;
-			}
-			break;
-		default:
+	switch (nmi_watchdog) {
+	case NMI_LOCAL_APIC:
+		__get_cpu_var(wd_enabled) = 1; /* enable it before to avoid race with handler */
+		if (lapic_watchdog_init(nmi_hz) < 0) {
+			__get_cpu_var(wd_enabled) = 0;
 			return;
 		}
+		/* FALL THROUGH */
+	case NMI_IO_APIC:
+		__get_cpu_var(wd_enabled) = 1;
+		atomic_inc(&nmi_active);
 	}
-	wd->enabled = 1;
-	atomic_inc(&nmi_active);
 }
 
 void stop_apic_nmi_watchdog(void *unused)
 {
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
 	/* only support LOCAL and IO APICs for now */
 	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
 	    (nmi_watchdog != NMI_IO_APIC))
 	    	return;
-
-	if (wd->enabled == 0)
+	if (__get_cpu_var(wd_enabled) == 0)
 		return;
-
-	if (nmi_watchdog == NMI_LOCAL_APIC) {
-		switch (boot_cpu_data.x86_vendor) {
-		case X86_VENDOR_AMD:
-			stop_k7_watchdog();
-			break;
-		case X86_VENDOR_INTEL:
-			if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
-				stop_intel_arch_watchdog();
-				break;
-			}
-			switch (boot_cpu_data.x86) {
-			case 6:
-				if (boot_cpu_data.x86_model > 0xd)
-					break;
-				stop_p6_watchdog();
-				break;
-			case 15:
-				if (boot_cpu_data.x86_model > 0x4)
-					break;
-				stop_p4_watchdog();
-				break;
-			}
-			break;
-		default:
-			return;
-		}
-	}
-	wd->enabled = 0;
+	if (nmi_watchdog == NMI_LOCAL_APIC)
+		lapic_watchdog_stop();
+	__get_cpu_var(wd_enabled) = 0;
 	atomic_dec(&nmi_active);
 }
 
@@ -1008,8 +328,6 @@ __kprobes int nmi_watchdog_tick(struct p
 	unsigned int sum;
 	int touched = 0;
 	int cpu = smp_processor_id();
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-	u64 dummy;
 	int rc=0;
 
 	/* check for other users first */
@@ -1052,53 +370,20 @@ __kprobes int nmi_watchdog_tick(struct p
 		alert_counter[cpu] = 0;
 	}
 	/* see if the nmi watchdog went off */
-	if (wd->enabled) {
-		if (nmi_watchdog == NMI_LOCAL_APIC) {
-			rdmsrl(wd->perfctr_msr, dummy);
-			if (dummy & wd->check_bit){
-				/* this wasn't a watchdog timer interrupt */
-				goto done;
-			}
-
-			/* only Intel P4 uses the cccr msr */
-	 		if (wd->cccr_msr != 0) {
-	 			/*
-	 			 * P4 quirks:
-	 			 * - An overflown perfctr will assert its interrupt
-	 			 *   until the OVF flag in its CCCR is cleared.
-	 			 * - LVTPC is masked on interrupt and must be
-	 			 *   unmasked by the LVTPC handler.
-	 			 */
-				rdmsrl(wd->cccr_msr, dummy);
-				dummy &= ~P4_CCCR_OVF;
-	 			wrmsrl(wd->cccr_msr, dummy);
-	 			apic_write(APIC_LVTPC, APIC_DM_NMI);
-				/* start the cycle over again */
-				write_watchdog_counter(wd->perfctr_msr, NULL);
-	 		}
-			else if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
-				 wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) {
-				/* P6 based Pentium M need to re-unmask
-				 * the apic vector but it doesn't hurt
-				 * other P6 variant.
-				 * ArchPerfom/Core Duo also needs this */
-				apic_write(APIC_LVTPC, APIC_DM_NMI);
-				/* P6/ARCH_PERFMON has 32 bit counter write */
-				write_watchdog_counter32(wd->perfctr_msr, NULL);
-			} else {
-				/* start the cycle over again */
-				write_watchdog_counter(wd->perfctr_msr, NULL);
-			}
-			rc = 1;
-		} else if (nmi_watchdog == NMI_IO_APIC) {
-			/* don't know how to accurately check for this.
-			 * just assume it was a watchdog timer interrupt
-			 * This matches the old behaviour.
-			 */
-			rc = 1;
-		}
+	if (!__get_cpu_var(wd_enabled))
+		return rc;
+	switch (nmi_watchdog) {
+	case NMI_LOCAL_APIC:
+		rc |= lapic_wd_event(nmi_hz);
+		break;
+	case NMI_IO_APIC:
+		/* don't know how to accurately check for this.
+		 * just assume it was a watchdog timer interrupt
+		 * This matches the old behaviour.
+		 */
+		rc = 1;
+		break;
 	}
-done:
 	return rc;
 }
 
@@ -1143,7 +428,7 @@ int proc_nmi_enabled(struct ctl_table *t
 	}
 
 	if (nmi_watchdog == NMI_DEFAULT) {
-		if (nmi_known_cpu() > 0)
+		if (lapic_watchdog_ok())
 			nmi_watchdog = NMI_LOCAL_APIC;
 		else
 			nmi_watchdog = NMI_IO_APIC;
@@ -1179,11 +464,3 @@ void __trigger_all_cpu_backtrace(void)
 
 EXPORT_SYMBOL(nmi_active);
 EXPORT_SYMBOL(nmi_watchdog);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
-EXPORT_SYMBOL(reserve_perfctr_nmi);
-EXPORT_SYMBOL(release_perfctr_nmi);
-EXPORT_SYMBOL(reserve_evntsel_nmi);
-EXPORT_SYMBOL(release_evntsel_nmi);
-EXPORT_SYMBOL(disable_timer_nmi_watchdog);
-EXPORT_SYMBOL(enable_timer_nmi_watchdog);
Index: linux/arch/i386/kernel/cpu/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/cpu/Makefile
+++ linux/arch/i386/kernel/cpu/Makefile
@@ -17,3 +17,5 @@ obj-$(CONFIG_X86_MCE)	+=	mcheck/
 
 obj-$(CONFIG_MTRR)	+= 	mtrr/
 obj-$(CONFIG_CPU_FREQ)	+=	cpufreq/
+
+obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o
Index: linux/arch/i386/kernel/cpu/perfctr-watchdog.c
===================================================================
--- /dev/null
+++ linux/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -0,0 +1,658 @@
+/* local apic based NMI watchdog for various CPUs.
+   This file also handles reservation of performance counters for coordination
+   with other users (like oprofile).
+
+   Note that these events normally don't tick when the CPU idles. This means
+   the frequency varies with CPU load.
+
+   Original code for K7/P6 written by Keith Owens */
+
+#include <linux/percpu.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+#include <linux/smp.h>
+#include <linux/nmi.h>
+#include <asm/apic.h>
+#include <asm/intel_arch_perfmon.h>
+
+struct nmi_watchdog_ctlblk {
+	unsigned int cccr_msr;
+	unsigned int perfctr_msr;  /* the MSR to reset in NMI handler */
+	unsigned int evntsel_msr;  /* the MSR to select the events to handle */
+};
+
+/* Interface defining a CPU specific perfctr watchdog */
+struct wd_ops {
+	int (*reserve)(void);
+	void (*unreserve)(void);
+	int (*setup)(unsigned nmi_hz);
+	void (*rearm)(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz);
+	void (*stop)(void *);
+	unsigned perfctr;
+	unsigned evntsel;
+	u64 checkbit;
+};
+
+static struct wd_ops *wd_ops;
+
+/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and its
+ * offset from MSR_P4_BSU_ESCR0.  It will be the max for all platforms (for now)
+ */
+#define NMI_MAX_COUNTER_BITS 66
+
+/* perfctr_nmi_owner tracks the ownership of the perfctr registers;
+ * evntsel_nmi_owner tracks the ownership of the event selection registers.
+ * - different performance counters/event selections may be reserved for
+ *   different subsystems; this reservation system just tries to coordinate
+ *   things a little
+ */
+static DECLARE_BITMAP(perfctr_nmi_owner, NMI_MAX_COUNTER_BITS);
+static DECLARE_BITMAP(evntsel_nmi_owner, NMI_MAX_COUNTER_BITS);
+
+static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
+
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
+{
+	return wd_ops ? msr - wd_ops->perfctr : 0;
+}
+
+/* converts an msr to an appropriate reservation bit */
+/* returns the bit offset of the event selection register */
+static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
+{
+	return wd_ops ? msr - wd_ops->evntsel : 0;
+}
+
+/* checks for a bit availability (hack for oprofile) */
+int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
+{
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	return (!test_bit(counter, perfctr_nmi_owner));
+}
+
+/* checks an msr for availability */
+int avail_to_resrv_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	return (!test_bit(counter, perfctr_nmi_owner));
+}
+
+int reserve_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	if (!test_and_set_bit(counter, perfctr_nmi_owner))
+		return 1;
+	return 0;
+}
+
+void release_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	clear_bit(counter, perfctr_nmi_owner);
+}
+
+int reserve_evntsel_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_evntsel_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	if (!test_and_set_bit(counter, evntsel_nmi_owner))
+		return 1;
+	return 0;
+}
+
+void release_evntsel_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_evntsel_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	clear_bit(counter, evntsel_nmi_owner);
+}
+
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
+EXPORT_SYMBOL(reserve_perfctr_nmi);
+EXPORT_SYMBOL(release_perfctr_nmi);
+EXPORT_SYMBOL(reserve_evntsel_nmi);
+EXPORT_SYMBOL(release_evntsel_nmi);
+
+void disable_lapic_nmi_watchdog(void)
+{
+	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+	if (atomic_read(&nmi_active) <= 0)
+		return;
+
+	on_each_cpu(wd_ops->stop, NULL, 0, 1);
+	wd_ops->unreserve();
+
+	BUG_ON(atomic_read(&nmi_active) != 0);
+}
+
+void enable_lapic_nmi_watchdog(void)
+{
+	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+	/* are we already enabled */
+	if (atomic_read(&nmi_active) != 0)
+		return;
+
+	/* are we lapic aware */
+	if (!wd_ops)
+		return;
+	if (!wd_ops->reserve()) {
+		printk(KERN_ERR "NMI watchdog: cannot reserve perfctrs\n");
+		return;
+	}
+
+	on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+	touch_nmi_watchdog();
+}
+
+/*
+ * Activate the NMI watchdog via the local APIC.
+ */
+
+static unsigned int adjust_for_32bit_ctr(unsigned int hz)
+{
+	u64 counter_val;
+	unsigned int retval = hz;
+
+	/*
+	 * On Intel CPUs with P6/ARCH_PERFMON only 32 bits in the counter
+	 * are writable, with higher bits sign extending from bit 31.
+	 * So we can only program the counter with 31 bit values; bit 31
+	 * must be 1 so that bits 32 and up are sign extended to 1.
+	 * Find the appropriate nmi_hz
+	 */
+	counter_val = (u64)cpu_khz * 1000;
+	do_div(counter_val, retval);
+	if (counter_val > 0x7fffffffULL) {
+		u64 count = (u64)cpu_khz * 1000;
+		do_div(count, 0x7fffffffUL);
+		retval = count + 1;
+	}
+	return retval;
+}
+
+static void
+write_watchdog_counter(unsigned int perfctr_msr, const char *descr, unsigned nmi_hz)
+{
+	u64 count = (u64)cpu_khz * 1000;
+
+	do_div(count, nmi_hz);
+	if(descr)
+		Dprintk("setting %s to -0x%08Lx\n", descr, count);
+	wrmsrl(perfctr_msr, 0 - count);
+}
+
+static void write_watchdog_counter32(unsigned int perfctr_msr,
+		const char *descr, unsigned nmi_hz)
+{
+	u64 count = (u64)cpu_khz * 1000;
+
+	do_div(count, nmi_hz);
+	if(descr)
+		Dprintk("setting %s to -0x%08Lx\n", descr, count);
+	wrmsr(perfctr_msr, (u32)(-count), 0);
+}
+
+/* AMD K7/K8/Family10h/Family11h support. AMD keeps this interface
+   nicely stable so there is not much variety */
+
+#define K7_EVNTSEL_ENABLE	(1 << 22)
+#define K7_EVNTSEL_INT		(1 << 20)
+#define K7_EVNTSEL_OS		(1 << 17)
+#define K7_EVNTSEL_USR		(1 << 16)
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
+#define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+
+static int setup_k7_watchdog(unsigned nmi_hz)
+{
+	unsigned int perfctr_msr, evntsel_msr;
+	unsigned int evntsel;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	perfctr_msr = MSR_K7_PERFCTR0;
+	evntsel_msr = MSR_K7_EVNTSEL0;
+
+	wrmsrl(perfctr_msr, 0UL);
+
+	evntsel = K7_EVNTSEL_INT
+		| K7_EVNTSEL_OS
+		| K7_EVNTSEL_USR
+		| K7_NMI_EVENT;
+
+	/* setup the timer */
+	wrmsr(evntsel_msr, evntsel, 0);
+	write_watchdog_counter(perfctr_msr, "K7_PERFCTR0",nmi_hz);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	evntsel |= K7_EVNTSEL_ENABLE;
+	wrmsr(evntsel_msr, evntsel, 0);
+
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = 0;  //unused
+	return 1;
+}
+
+static void single_msr_stop_watchdog(void *arg)
+{
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	wrmsr(wd->evntsel_msr, 0, 0);
+}
+
+static int single_msr_reserve(void)
+{
+	if (!reserve_perfctr_nmi(wd_ops->perfctr))
+		return 0;
+
+	if (!reserve_evntsel_nmi(wd_ops->evntsel)) {
+		release_perfctr_nmi(wd_ops->perfctr);
+		return 0;
+	}
+	return 1;
+}
+
+static void single_msr_unreserve(void)
+{
+	release_evntsel_nmi(wd_ops->evntsel);
+	release_perfctr_nmi(wd_ops->perfctr);
+}
+
+static void single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+{
+	/* start the cycle over again */
+	write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
+}
+
+static struct wd_ops k7_wd_ops = {
+	.reserve = single_msr_reserve,
+	.unreserve = single_msr_unreserve,
+	.setup = setup_k7_watchdog,
+	.rearm = single_msr_rearm,
+	.stop = single_msr_stop_watchdog,
+	.perfctr = MSR_K7_PERFCTR0,
+	.evntsel = MSR_K7_EVNTSEL0,
+	.checkbit = 1ULL<<63,
+};
+
+/* Intel Family 6 (PPro+,P2,P3,P-M,Core1) */
+
+#define P6_EVNTSEL0_ENABLE	(1 << 22)
+#define P6_EVNTSEL_INT		(1 << 20)
+#define P6_EVNTSEL_OS		(1 << 17)
+#define P6_EVNTSEL_USR		(1 << 16)
+#define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
+#define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
+
+static int setup_p6_watchdog(unsigned nmi_hz)
+{
+	unsigned int perfctr_msr, evntsel_msr;
+	unsigned int evntsel;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	perfctr_msr = MSR_P6_PERFCTR0;
+	evntsel_msr = MSR_P6_EVNTSEL0;
+
+	wrmsrl(perfctr_msr, 0UL);
+
+	evntsel = P6_EVNTSEL_INT
+		| P6_EVNTSEL_OS
+		| P6_EVNTSEL_USR
+		| P6_NMI_EVENT;
+
+	/* setup the timer */
+	wrmsr(evntsel_msr, evntsel, 0);
+	nmi_hz = adjust_for_32bit_ctr(nmi_hz);
+	write_watchdog_counter32(perfctr_msr, "P6_PERFCTR0",nmi_hz);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	evntsel |= P6_EVNTSEL0_ENABLE;
+	wrmsr(evntsel_msr, evntsel, 0);
+
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = 0;  //unused
+	return 1;
+}
+
+static void p6_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+{
+	/* P6 based Pentium M need to re-unmask
+	 * the apic vector but it doesn't hurt
+	 * other P6 variants.
+	 * ArchPerfmon/Core Duo also needs this */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	/* P6/ARCH_PERFMON has 32 bit counter write */
+	write_watchdog_counter32(wd->perfctr_msr, NULL,nmi_hz);
+}
+
+static struct wd_ops p6_wd_ops = {
+	.reserve = single_msr_reserve,
+	.unreserve = single_msr_unreserve,
+	.setup = setup_p6_watchdog,
+	.rearm = p6_rearm,
+	.stop = single_msr_stop_watchdog,
+	.perfctr = MSR_P6_PERFCTR0,
+	.evntsel = MSR_P6_EVNTSEL0,
+	.checkbit = 1ULL<<39,
+};
+
+/* Intel P4 performance counters. By far the most complicated of all. */
+
+#define MSR_P4_MISC_ENABLE_PERF_AVAIL	(1<<7)
+#define P4_ESCR_EVENT_SELECT(N)	((N)<<25)
+#define P4_ESCR_OS		(1<<3)
+#define P4_ESCR_USR		(1<<2)
+#define P4_CCCR_OVF_PMI0	(1<<26)
+#define P4_CCCR_OVF_PMI1	(1<<27)
+#define P4_CCCR_THRESHOLD(N)	((N)<<20)
+#define P4_CCCR_COMPLEMENT	(1<<19)
+#define P4_CCCR_COMPARE		(1<<18)
+#define P4_CCCR_REQUIRED	(3<<16)
+#define P4_CCCR_ESCR_SELECT(N)	((N)<<13)
+#define P4_CCCR_ENABLE		(1<<12)
+#define P4_CCCR_OVF 		(1<<31)
+
+/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
+   CRU_ESCR0 (with any non-null event selector) through a complemented
+   max threshold. [IA32-Vol3, Section 14.9.9] */
+
+static int setup_p4_watchdog(unsigned nmi_hz)
+{
+	unsigned int perfctr_msr, evntsel_msr, cccr_msr;
+	unsigned int evntsel, cccr_val;
+	unsigned int misc_enable, dummy;
+	unsigned int ht_num;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
+	if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
+		return 0;
+
+#ifdef CONFIG_SMP
+	/* detect which hyperthread we are on */
+	if (smp_num_siblings == 2) {
+		unsigned int ebx, apicid;
+
+		ebx = cpuid_ebx(1);
+		apicid = (ebx >> 24) & 0xff;
+		ht_num = apicid & 1;
+	} else
+#endif
+		ht_num = 0;
+
+	/* performance counters are shared resources
+	 * assign each hyperthread its own set
+	 * (re-use the ESCR0 register, seems safe
+	 * and keeps the cccr_val the same)
+	 */
+	if (!ht_num) {
+		/* logical cpu 0 */
+		perfctr_msr = MSR_P4_IQ_PERFCTR0;
+		evntsel_msr = MSR_P4_CRU_ESCR0;
+		cccr_msr = MSR_P4_IQ_CCCR0;
+		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+	} else {
+		/* logical cpu 1 */
+		perfctr_msr = MSR_P4_IQ_PERFCTR1;
+		evntsel_msr = MSR_P4_CRU_ESCR0;
+		cccr_msr = MSR_P4_IQ_CCCR1;
+		cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
+	}
+
+	evntsel = P4_ESCR_EVENT_SELECT(0x3F)
+	 	| P4_ESCR_OS
+		| P4_ESCR_USR;
+
+	cccr_val |= P4_CCCR_THRESHOLD(15)
+		 | P4_CCCR_COMPLEMENT
+		 | P4_CCCR_COMPARE
+		 | P4_CCCR_REQUIRED;
+
+	wrmsr(evntsel_msr, evntsel, 0);
+	wrmsr(cccr_msr, cccr_val, 0);
+	write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0", nmi_hz);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	cccr_val |= P4_CCCR_ENABLE;
+	wrmsr(cccr_msr, cccr_val, 0);
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = cccr_msr;
+	return 1;
+}
+
+static void stop_p4_watchdog(void *arg)
+{
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+	wrmsr(wd->cccr_msr, 0, 0);
+	wrmsr(wd->evntsel_msr, 0, 0);
+}
+
+static int p4_reserve(void)
+{
+	if (!reserve_perfctr_nmi(MSR_P4_IQ_PERFCTR0))
+		return 0;
+#ifdef CONFIG_SMP
+	if (smp_num_siblings > 1 && !reserve_perfctr_nmi(MSR_P4_IQ_PERFCTR1))
+		goto fail1;
+#endif
+	if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0))
+		goto fail2;
+	/* RED-PEN why is ESCR1 not reserved here? */
+	return 1;
+ fail2:
+#ifdef CONFIG_SMP
+	if (smp_num_siblings > 1)
+		release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
+ fail1:
+#endif
+	release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
+	return 0;
+}
+
+static void p4_unreserve(void)
+{
+#ifdef CONFIG_SMP
+	if (smp_num_siblings > 1)
+		release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
+#endif
+	release_evntsel_nmi(MSR_P4_CRU_ESCR0);
+	release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
+}
+
+static void p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+{
+	unsigned dummy;
+	/*
+	 * P4 quirks:
+	 * - An overflown perfctr will assert its interrupt
+	 *   until the OVF flag in its CCCR is cleared.
+	 * - LVTPC is masked on interrupt and must be
+	 *   unmasked by the LVTPC handler.
+	 */
+	rdmsrl(wd->cccr_msr, dummy);
+	dummy &= ~P4_CCCR_OVF;
+	wrmsrl(wd->cccr_msr, dummy);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	/* start the cycle over again */
+	write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
+}
+
+static struct wd_ops p4_wd_ops = {
+	.reserve = p4_reserve,
+	.unreserve = p4_unreserve,
+	.setup = setup_p4_watchdog,
+	.rearm = p4_rearm,
+	.stop = stop_p4_watchdog,
+	/* RED-PEN this is wrong for the other sibling */
+	.perfctr = MSR_P4_BPU_PERFCTR0,
+	.evntsel = MSR_P4_BSU_ESCR0,
+	.checkbit = 1ULL<<39,
+};
+
+/* Watchdog using the Intel architected PerfMon. Used for Core2 and hopefully
+   all future Intel CPUs. */
+
+#define ARCH_PERFMON_NMI_EVENT_SEL	ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
+#define ARCH_PERFMON_NMI_EVENT_UMASK	ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
+
+static int setup_intel_arch_watchdog(unsigned nmi_hz)
+{
+	unsigned int ebx;
+	union cpuid10_eax eax;
+	unsigned int unused;
+	unsigned int perfctr_msr, evntsel_msr;
+	unsigned int evntsel;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	/*
+	 * Check whether the Architectural PerfMon supports
+	 * Unhalted Core Cycles Event or not.
+	 * NOTE: Corresponding bit = 0 in ebx indicates event present.
+	 */
+	cpuid(10, &(eax.full), &ebx, &unused, &unused);
+	if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
+	    (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
+		return 0;
+
+	perfctr_msr = MSR_ARCH_PERFMON_PERFCTR1;
+	evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL1;
+
+	wrmsrl(perfctr_msr, 0UL);
+
+	evntsel = ARCH_PERFMON_EVENTSEL_INT
+		| ARCH_PERFMON_EVENTSEL_OS
+		| ARCH_PERFMON_EVENTSEL_USR
+		| ARCH_PERFMON_NMI_EVENT_SEL
+		| ARCH_PERFMON_NMI_EVENT_UMASK;
+
+	/* setup the timer */
+	wrmsr(evntsel_msr, evntsel, 0);
+	nmi_hz = adjust_for_32bit_ctr(nmi_hz);
+	write_watchdog_counter32(perfctr_msr, "INTEL_ARCH_PERFCTR0", nmi_hz);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
+	wrmsr(evntsel_msr, evntsel, 0);
+
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = 0;  //unused
+	wd_ops->checkbit = 1ULL << (eax.split.bit_width - 1);
+	return 1;
+}
+
+static struct wd_ops intel_arch_wd_ops = {
+	.reserve = single_msr_reserve,
+	.unreserve = single_msr_unreserve,
+	.setup = setup_intel_arch_watchdog,
+	.rearm = p6_rearm,
+	.stop = single_msr_stop_watchdog,
+	.perfctr = MSR_ARCH_PERFMON_PERFCTR0,
+	.evntsel = MSR_ARCH_PERFMON_EVENTSEL0,
+};
+
+static void probe_nmi_watchdog(void)
+{
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15 &&
+		    boot_cpu_data.x86 != 16)
+			return;
+		wd_ops = &k7_wd_ops;
+		break;
+	case X86_VENDOR_INTEL:
+		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+			wd_ops = &intel_arch_wd_ops;
+			break;
+		}
+		switch (boot_cpu_data.x86) {
+		case 6:
+			if (boot_cpu_data.x86_model > 0xd)
+				return;
+
+			wd_ops = &p6_wd_ops;
+			break;
+		case 15:
+			if (boot_cpu_data.x86_model > 0x4)
+				return;
+
+			wd_ops = &p4_wd_ops;
+			break;
+		default:
+			return;
+		}
+		break;
+	}
+}
+
+/* Interface to nmi.c */
+
+int lapic_watchdog_init(unsigned nmi_hz)
+{
+	if (!wd_ops) {
+		probe_nmi_watchdog();
+		if (!wd_ops)
+			return -1;
+	}
+
+	if (!(wd_ops->setup(nmi_hz))) {
+		printk(KERN_ERR "Cannot setup NMI watchdog on CPU %d\n",
+		       raw_smp_processor_id());
+		return -1;
+	}
+
+	return 0;
+}
+
+void lapic_watchdog_stop(void)
+{
+	if (wd_ops)
+		wd_ops->stop(NULL);
+}
+
+unsigned lapic_adjust_nmi_hz(unsigned hz)
+{
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+	if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
+	    wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
+		hz = adjust_for_32bit_ctr(hz);
+	return hz;
+}
+
+int lapic_wd_event(unsigned nmi_hz)
+{
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+	u64 ctr;
+	rdmsrl(wd->perfctr_msr, ctr);
+	if (ctr & wd_ops->checkbit) { /* perfctr still running? */
+		return 0;
+	}
+	wd_ops->rearm(wd, nmi_hz);
+	return 1;
+}
+
+int lapic_watchdog_ok(void)
+{
+	return wd_ops != NULL;
+}
Index: linux/include/asm-i386/nmi.h
===================================================================
--- linux.orig/include/asm-i386/nmi.h
+++ linux/include/asm-i386/nmi.h
@@ -50,4 +50,12 @@ void __trigger_all_cpu_backtrace(void);
 
 #endif
 
+void lapic_watchdog_stop(void);
+int lapic_watchdog_init(unsigned nmi_hz);
+int lapic_wd_event(unsigned nmi_hz);
+unsigned lapic_adjust_nmi_hz(unsigned hz);
+int lapic_watchdog_ok(void);
+void disable_lapic_nmi_watchdog(void);
+void enable_lapic_nmi_watchdog(void);
+
 #endif /* ASM_NMI_H */
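
For reviewers who want to check the 32-bit counter arithmetic without
booting a kernel, here is a rough user-space sketch of what
adjust_for_32bit_ctr() and the checkbit test in lapic_wd_event() compute.
It is not part of the patch. The names adjust_hz_for_32bit_ctr() and
preload_40bit() are made up for illustration, cpu_khz is passed as a
parameter instead of being read from the kernel global, and the 40-bit
counter width is an assumption taken from the 1ULL<<39 checkbit in
p6_wd_ops.

```c
#include <stdint.h>

/*
 * User-space model of adjust_for_32bit_ctr(). If the requested period
 * (cycles per NMI) does not fit in 31 bits, raise the rate until it does.
 */
static unsigned int adjust_hz_for_32bit_ctr(uint64_t cpu_khz, unsigned int hz)
{
	uint64_t cycles_per_sec = cpu_khz * 1000;

	if (cycles_per_sec / hz > 0x7fffffffULL)
		return (unsigned int)(cycles_per_sec / 0x7fffffffULL) + 1;
	return hz;
}

/*
 * Model of the counter value after a 32-bit write of -count: only the
 * low 32 bits are written and the upper bits sign extend from bit 31.
 * The result is truncated to an assumed 40-bit counter.
 */
static uint64_t preload_40bit(uint64_t count)
{
	uint32_t low = (uint32_t)(0 - count);
	uint64_t ext = (uint64_t)(int64_t)(int32_t)low;

	return ext & ((1ULL << 40) - 1);
}
```

With a hypothetical 3 GHz CPU (cpu_khz = 3000000) and a requested rate of
1 Hz, the period of 3000000000 cycles does not fit in 31 bits, so the
rate is raised to 2 Hz. The resulting preload of -1500000000 has bit 39
set, which is the "perfctr still running" condition that lapic_wd_event()
tests before it rearms the counter.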


* [PATCH] [4/30] x86_64: Use the 32bit wd_ops for 64bit too.
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (2 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [3/30] i386: Clean up NMI watchdog code Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [5/30] x86_64: Define IGNORE_IOCTL() macro for compat_ioctls Andi Kleen
                   ` (25 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


This mainly removes a lot of code, replacing it with calls into the new 32bit
perfctr-watchdog.c.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/Makefile |    3 
 arch/x86_64/kernel/nmi.c    |  678 ++------------------------------------------
 include/asm-x86_64/nmi.h    |    9 
 3 files changed, 44 insertions(+), 646 deletions(-)

Index: linux/arch/x86_64/kernel/Makefile
===================================================================
--- linux.orig/arch/x86_64/kernel/Makefile
+++ linux/arch/x86_64/kernel/Makefile
@@ -9,7 +9,7 @@ obj-y	:= process.o signal.o entry.o trap
 		x8664_ksyms.o i387.o syscall.o vsyscall.o \
 		setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
 		pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o sched-clock.o \
-		bugs.o
+		bugs.o perfctr-watchdog.o
 
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-$(CONFIG_X86_MCE)		+= mce.o therm_throt.o
@@ -59,3 +59,4 @@ msr-$(subst m,y,$(CONFIG_X86_MSR))  += .
 alternative-y			+= ../../i386/kernel/alternative.o
 pcspeaker-y			+= ../../i386/kernel/pcspeaker.o
 sched-clock-y			+= ../../i386/kernel/sched-clock.o
+perfctr-watchdog-y		+= ../../i386/kernel/cpu/perfctr-watchdog.o
Index: linux/include/asm-x86_64/nmi.h
===================================================================
--- linux.orig/include/asm-x86_64/nmi.h
+++ linux/include/asm-x86_64/nmi.h
@@ -80,4 +80,13 @@ extern int unknown_nmi_panic;
 void __trigger_all_cpu_backtrace(void);
 #define trigger_all_cpu_backtrace() __trigger_all_cpu_backtrace()
 
+
+void lapic_watchdog_stop(void);
+int lapic_watchdog_init(unsigned nmi_hz);
+int lapic_wd_event(unsigned nmi_hz);
+unsigned lapic_adjust_nmi_hz(unsigned hz);
+int lapic_watchdog_ok(void);
+void disable_lapic_nmi_watchdog(void);
+void enable_lapic_nmi_watchdog(void);
+
 #endif /* ASM_NMI_H */
Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -27,28 +27,11 @@
 #include <asm/proto.h>
 #include <asm/kdebug.h>
 #include <asm/mce.h>
-#include <asm/intel_arch_perfmon.h>
 
 int unknown_nmi_panic;
 int nmi_watchdog_enabled;
 int panic_on_unrecovered_nmi;
 
-/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
- * evtsel_nmi_owner tracks the ownership of the event selection
- * - different performance counters/ event selection may be reserved for
- *   different subsystems this reservation system just tries to coordinate
- *   things a little
- */
-
-/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
- * offset from MSR_P4_BSU_ESCR0.  It will be the max for all platforms (for now)
- */
-#define NMI_MAX_COUNTER_BITS 66
-#define NMI_MAX_COUNTER_LONGS BITS_TO_LONGS(NMI_MAX_COUNTER_BITS)
-
-static DEFINE_PER_CPU(unsigned, perfctr_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-static DEFINE_PER_CPU(unsigned, evntsel_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-
 static cpumask_t backtrace_mask = CPU_MASK_NONE;
 
 /* nmi_active:
@@ -63,191 +46,11 @@ int panic_on_timeout;
 unsigned int nmi_watchdog = NMI_DEFAULT;
 static unsigned int nmi_hz = HZ;
 
-struct nmi_watchdog_ctlblk {
-	int enabled;
-	u64 check_bit;
-	unsigned int cccr_msr;
-	unsigned int perfctr_msr;  /* the MSR to reset in NMI handler */
-	unsigned int evntsel_msr;  /* the MSR to select the events to handle */
-};
-static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
+static DEFINE_PER_CPU(short, wd_enabled);
 
 /* local prototypes */
 static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
 
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
-{
-	/* returns the bit offset of the performance counter register */
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		return (msr - MSR_K7_PERFCTR0);
-	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
-			return (msr - MSR_ARCH_PERFMON_PERFCTR0);
-		else
-			return (msr - MSR_P4_BPU_PERFCTR0);
-	}
-	return 0;
-}
-
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
-{
-	/* returns the bit offset of the event selection register */
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		return (msr - MSR_K7_EVNTSEL0);
-	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
-			return (msr - MSR_ARCH_PERFMON_EVENTSEL0);
-		else
-			return (msr - MSR_P4_BSU_ESCR0);
-	}
-	return 0;
-}
-
-/* checks for a bit availability (hack for oprofile) */
-int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
-{
-	int cpu;
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-	for_each_possible_cpu (cpu) {
-		if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)))
-			return 0;
-	}
-	return 1;
-}
-
-/* checks the an msr for availability */
-int avail_to_resrv_perfctr_nmi(unsigned int msr)
-{
-	unsigned int counter;
-	int cpu;
-
-	counter = nmi_perfctr_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	for_each_possible_cpu (cpu) {
-		if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)))
-			return 0;
-	}
-	return 1;
-}
-
-static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_perfctr_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	if (!test_and_set_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)))
-		return 1;
-	return 0;
-}
-
-static void __release_perfctr_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_perfctr_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	clear_bit(counter, &per_cpu(perfctr_nmi_owner, cpu));
-}
-
-int reserve_perfctr_nmi(unsigned int msr)
-{
-	int cpu, i;
-	for_each_possible_cpu (cpu) {
-		if (!__reserve_perfctr_nmi(cpu, msr)) {
-			for_each_possible_cpu (i) {
-				if (i >= cpu)
-					break;
-				__release_perfctr_nmi(i, msr);
-			}
-			return 0;
-		}
-	}
-	return 1;
-}
-
-void release_perfctr_nmi(unsigned int msr)
-{
-	int cpu;
-	for_each_possible_cpu (cpu)
-		__release_perfctr_nmi(cpu, msr);
-}
-
-int __reserve_evntsel_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_evntsel_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	if (!test_and_set_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]))
-		return 1;
-	return 0;
-}
-
-static void __release_evntsel_nmi(int cpu, unsigned int msr)
-{
-	unsigned int counter;
-	if (cpu < 0)
-		cpu = smp_processor_id();
-
-	counter = nmi_evntsel_msr_to_bit(msr);
-	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
-	clear_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]);
-}
-
-int reserve_evntsel_nmi(unsigned int msr)
-{
-	int cpu, i;
-	for_each_possible_cpu (cpu) {
-		if (!__reserve_evntsel_nmi(cpu, msr)) {
-			for_each_possible_cpu (i) {
-				if (i >= cpu)
-					break;
-				__release_evntsel_nmi(i, msr);
-			}
-			return 0;
-		}
-	}
-	return 1;
-}
-
-void release_evntsel_nmi(unsigned int msr)
-{
-	int cpu;
-	for_each_possible_cpu (cpu) {
-		__release_evntsel_nmi(cpu, msr);
-	}
-}
-
-static __cpuinit inline int nmi_known_cpu(void)
-{
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		return boot_cpu_data.x86 == 15 || boot_cpu_data.x86 == 16;
-	case X86_VENDOR_INTEL:
-		if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
-			return 1;
-		else
-			return (boot_cpu_data.x86 == 15);
-	}
-	return 0;
-}
-
 /* Run after command line and cpu_init init, but before all other checks */
 void nmi_watchdog_default(void)
 {
@@ -277,23 +80,6 @@ static __init void nmi_cpu_busy(void *da
 }
 #endif
 
-static unsigned int adjust_for_32bit_ctr(unsigned int hz)
-{
-	unsigned int retval = hz;
-
-	/*
-	 * On Intel CPUs with ARCH_PERFMON only 32 bits in the counter
-	 * are writable, with higher bits sign extending from bit 31.
-	 * So, we can only program the counter with 31 bit values and
-	 * 32nd bit should be 1, for 33.. to be 1.
-	 * Find the appropriate nmi_hz
-	 */
- 	if ((((u64)cpu_khz * 1000) / retval) > 0x7fffffffULL) {
-		retval = ((u64)cpu_khz * 1000) / 0x7fffffffUL + 1;
-	}
-	return retval;
-}
-
 int __init check_nmi_watchdog (void)
 {
 	int *counts;
@@ -322,14 +108,14 @@ int __init check_nmi_watchdog (void)
 	mdelay((20*1000)/nmi_hz); // wait 20 ticks
 
 	for_each_online_cpu(cpu) {
-		if (!per_cpu(nmi_watchdog_ctlblk, cpu).enabled)
+		if (!per_cpu(wd_enabled, cpu))
 			continue;
 		if (cpu_pda(cpu)->__nmi_count - counts[cpu] <= 5) {
 			printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
 			       cpu,
 			       counts[cpu],
 			       cpu_pda(cpu)->__nmi_count);
-			per_cpu(nmi_watchdog_ctlblk, cpu).enabled = 0;
+			per_cpu(wd_enabled, cpu) = 0;
 			atomic_dec(&nmi_active);
 		}
 	}
@@ -344,13 +130,8 @@ int __init check_nmi_watchdog (void)
 
 	/* now that we know it works we can reduce NMI frequency to
 	   something more reasonable; makes a difference in some configs */
-	if (nmi_watchdog == NMI_LOCAL_APIC) {
-		struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-		nmi_hz = 1;
-	 	if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
-			nmi_hz = adjust_for_32bit_ctr(nmi_hz);
-	}
+	if (nmi_watchdog == NMI_LOCAL_APIC)
+		nmi_hz = lapic_adjust_nmi_hz(1);
 
 	kfree(counts);
 	return 0;
@@ -379,57 +160,6 @@ int __init setup_nmi_watchdog(char *str)
 
 __setup("nmi_watchdog=", setup_nmi_watchdog);
 
-static void disable_lapic_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
-	if (atomic_read(&nmi_active) <= 0)
-		return;
-
-	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
-	BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-static void enable_lapic_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
-	/* are we already enabled */
-	if (atomic_read(&nmi_active) != 0)
-		return;
-
-	/* are we lapic aware */
-	if (nmi_known_cpu() <= 0)
-		return;
-
-	on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
-	touch_nmi_watchdog();
-}
-
-void disable_timer_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
-	if (atomic_read(&nmi_active) <= 0)
-		return;
-
-	disable_irq(0);
-	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
-	BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-void enable_timer_nmi_watchdog(void)
-{
-	BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
-	if (atomic_read(&nmi_active) == 0) {
-		touch_nmi_watchdog();
-		on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
-		enable_irq(0);
-	}
-}
 
 static void __acpi_nmi_disable(void *__unused)
 {
@@ -515,275 +245,9 @@ late_initcall(init_lapic_nmi_sysfs);
 
 #endif	/* CONFIG_PM */
 
-/*
- * Activate the NMI watchdog via the local APIC.
- * Original code written by Keith Owens.
- */
-
-/* Note that these events don't tick when the CPU idles. This means
-   the frequency varies with CPU load. */
-
-#define K7_EVNTSEL_ENABLE	(1 << 22)
-#define K7_EVNTSEL_INT		(1 << 20)
-#define K7_EVNTSEL_OS		(1 << 17)
-#define K7_EVNTSEL_USR		(1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
-#define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
-
-static int setup_k7_watchdog(void)
-{
-	unsigned int perfctr_msr, evntsel_msr;
-	unsigned int evntsel;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	perfctr_msr = MSR_K7_PERFCTR0;
-	evntsel_msr = MSR_K7_EVNTSEL0;
-	if (!__reserve_perfctr_nmi(-1, perfctr_msr))
-		goto fail;
-
-	if (!__reserve_evntsel_nmi(-1, evntsel_msr))
-		goto fail1;
-
-	/* Simulator may not support it */
-	if (checking_wrmsrl(evntsel_msr, 0UL))
-		goto fail2;
-	wrmsrl(perfctr_msr, 0UL);
-
-	evntsel = K7_EVNTSEL_INT
-		| K7_EVNTSEL_OS
-		| K7_EVNTSEL_USR
-		| K7_NMI_EVENT;
-
-	/* setup the timer */
-	wrmsr(evntsel_msr, evntsel, 0);
-	wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	evntsel |= K7_EVNTSEL_ENABLE;
-	wrmsr(evntsel_msr, evntsel, 0);
-
-	wd->perfctr_msr = perfctr_msr;
-	wd->evntsel_msr = evntsel_msr;
-	wd->cccr_msr = 0;  //unused
-	wd->check_bit = 1ULL<<63;
-	return 1;
-fail2:
-	__release_evntsel_nmi(-1, evntsel_msr);
-fail1:
-	__release_perfctr_nmi(-1, perfctr_msr);
-fail:
-	return 0;
-}
-
-static void stop_k7_watchdog(void)
-{
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	wrmsr(wd->evntsel_msr, 0, 0);
-
-	__release_evntsel_nmi(-1, wd->evntsel_msr);
-	__release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-/* Note that these events don't tick when the CPU idles. This means
-   the frequency varies with CPU load. */
-
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL	(1<<7)
-#define P4_ESCR_EVENT_SELECT(N)	((N)<<25)
-#define P4_ESCR_OS		(1<<3)
-#define P4_ESCR_USR		(1<<2)
-#define P4_CCCR_OVF_PMI0	(1<<26)
-#define P4_CCCR_OVF_PMI1	(1<<27)
-#define P4_CCCR_THRESHOLD(N)	((N)<<20)
-#define P4_CCCR_COMPLEMENT	(1<<19)
-#define P4_CCCR_COMPARE		(1<<18)
-#define P4_CCCR_REQUIRED	(3<<16)
-#define P4_CCCR_ESCR_SELECT(N)	((N)<<13)
-#define P4_CCCR_ENABLE		(1<<12)
-#define P4_CCCR_OVF 		(1<<31)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
-   CRU_ESCR0 (with any non-null event selector) through a complemented
-   max threshold. [IA32-Vol3, Section 14.9.9] */
-
-static int setup_p4_watchdog(void)
-{
-	unsigned int perfctr_msr, evntsel_msr, cccr_msr;
-	unsigned int evntsel, cccr_val;
-	unsigned int misc_enable, dummy;
-	unsigned int ht_num;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
-	if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
-		return 0;
-
-#ifdef CONFIG_SMP
-	/* detect which hyperthread we are on */
-	if (smp_num_siblings == 2) {
-		unsigned int ebx, apicid;
-
-        	ebx = cpuid_ebx(1);
-	        apicid = (ebx >> 24) & 0xff;
-        	ht_num = apicid & 1;
-	} else
-#endif
-		ht_num = 0;
-
-	/* performance counters are shared resources
-	 * assign each hyperthread its own set
-	 * (re-use the ESCR0 register, seems safe
-	 * and keeps the cccr_val the same)
-	 */
-	if (!ht_num) {
-		/* logical cpu 0 */
-		perfctr_msr = MSR_P4_IQ_PERFCTR0;
-		evntsel_msr = MSR_P4_CRU_ESCR0;
-		cccr_msr = MSR_P4_IQ_CCCR0;
-		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
-	} else {
-		/* logical cpu 1 */
-		perfctr_msr = MSR_P4_IQ_PERFCTR1;
-		evntsel_msr = MSR_P4_CRU_ESCR0;
-		cccr_msr = MSR_P4_IQ_CCCR1;
-		cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
-	}
-
-	if (!__reserve_perfctr_nmi(-1, perfctr_msr))
-		goto fail;
-
-	if (!__reserve_evntsel_nmi(-1, evntsel_msr))
-		goto fail1;
-
-	evntsel = P4_ESCR_EVENT_SELECT(0x3F)
-	 	| P4_ESCR_OS
-		| P4_ESCR_USR;
-
-	cccr_val |= P4_CCCR_THRESHOLD(15)
-		 | P4_CCCR_COMPLEMENT
-		 | P4_CCCR_COMPARE
-		 | P4_CCCR_REQUIRED;
-
-	wrmsr(evntsel_msr, evntsel, 0);
-	wrmsr(cccr_msr, cccr_val, 0);
-	wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	cccr_val |= P4_CCCR_ENABLE;
-	wrmsr(cccr_msr, cccr_val, 0);
-
-	wd->perfctr_msr = perfctr_msr;
-	wd->evntsel_msr = evntsel_msr;
-	wd->cccr_msr = cccr_msr;
-	wd->check_bit = 1ULL<<39;
-	return 1;
-fail1:
-	__release_perfctr_nmi(-1, perfctr_msr);
-fail:
-	return 0;
-}
-
-static void stop_p4_watchdog(void)
-{
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	wrmsr(wd->cccr_msr, 0, 0);
-	wrmsr(wd->evntsel_msr, 0, 0);
-
-	__release_evntsel_nmi(-1, wd->evntsel_msr);
-	__release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-#define ARCH_PERFMON_NMI_EVENT_SEL	ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
-#define ARCH_PERFMON_NMI_EVENT_UMASK	ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
-
-static int setup_intel_arch_watchdog(void)
-{
-	unsigned int ebx;
-	union cpuid10_eax eax;
-	unsigned int unused;
-	unsigned int perfctr_msr, evntsel_msr;
-	unsigned int evntsel;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	/*
-	 * Check whether the Architectural PerfMon supports
-	 * Unhalted Core Cycles Event or not.
-	 * NOTE: Corresponding bit = 0 in ebx indicates event present.
-	 */
-	cpuid(10, &(eax.full), &ebx, &unused, &unused);
-	if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
-	    (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
-		goto fail;
-
-	perfctr_msr = MSR_ARCH_PERFMON_PERFCTR1;
-	evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL1;
-
-	if (!__reserve_perfctr_nmi(-1, perfctr_msr))
-		goto fail;
-
-	if (!__reserve_evntsel_nmi(-1, evntsel_msr))
-		goto fail1;
-
-	wrmsrl(perfctr_msr, 0UL);
-
-	evntsel = ARCH_PERFMON_EVENTSEL_INT
-		| ARCH_PERFMON_EVENTSEL_OS
-		| ARCH_PERFMON_EVENTSEL_USR
-		| ARCH_PERFMON_NMI_EVENT_SEL
-		| ARCH_PERFMON_NMI_EVENT_UMASK;
-
-	/* setup the timer */
-	wrmsr(evntsel_msr, evntsel, 0);
-
-	nmi_hz = adjust_for_32bit_ctr(nmi_hz);
-	wrmsr(perfctr_msr, (u32)(-((u64)cpu_khz * 1000 / nmi_hz)), 0);
-
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
-	wrmsr(evntsel_msr, evntsel, 0);
-
-	wd->perfctr_msr = perfctr_msr;
-	wd->evntsel_msr = evntsel_msr;
-	wd->cccr_msr = 0;  //unused
-	wd->check_bit = 1ULL << (eax.split.bit_width - 1);
-	return 1;
-fail1:
-	__release_perfctr_nmi(-1, perfctr_msr);
-fail:
-	return 0;
-}
-
-static void stop_intel_arch_watchdog(void)
-{
-	unsigned int ebx;
-	union cpuid10_eax eax;
-	unsigned int unused;
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	/*
-	 * Check whether the Architectural PerfMon supports
-	 * Unhalted Core Cycles Event or not.
-	 * NOTE: Corresponding bit = 0 in ebx indicates event present.
-	 */
-	cpuid(10, &(eax.full), &ebx, &unused, &unused);
-	if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
-	    (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
-		return;
-
-	wrmsr(wd->evntsel_msr, 0, 0);
-
-	__release_evntsel_nmi(-1, wd->evntsel_msr);
-	__release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
 void setup_apic_nmi_watchdog(void *unused)
 {
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
-	/* only support LOCAL and IO APICs for now */
-	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
-	    (nmi_watchdog != NMI_IO_APIC))
-	    	return;
-
-	if (wd->enabled == 1)
+	if (__get_cpu_var(wd_enabled) == 1)
 		return;
 
 	/* cheap hack to support suspend/resume */
@@ -791,62 +255,31 @@ void setup_apic_nmi_watchdog(void *unuse
 	if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
 		return;
 
-	if (nmi_watchdog == NMI_LOCAL_APIC) {
-		switch (boot_cpu_data.x86_vendor) {
-		case X86_VENDOR_AMD:
-			if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
-				return;
-			if (!setup_k7_watchdog())
-				return;
-			break;
-		case X86_VENDOR_INTEL:
-			if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
-				if (!setup_intel_arch_watchdog())
-					return;
-				break;
-			}
-			if (!setup_p4_watchdog())
-				return;
-			break;
-		default:
+	switch (nmi_watchdog) {
+	case NMI_LOCAL_APIC:
+		__get_cpu_var(wd_enabled) = 1;
+		if (lapic_watchdog_init(nmi_hz) < 0) {
+			__get_cpu_var(wd_enabled) = 0;
 			return;
 		}
+		/* FALL THROUGH */
+	case NMI_IO_APIC:
+		__get_cpu_var(wd_enabled) = 1;
+		atomic_inc(&nmi_active);
 	}
-	wd->enabled = 1;
-	atomic_inc(&nmi_active);
 }
 
 void stop_apic_nmi_watchdog(void *unused)
 {
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
 	/* only support LOCAL and IO APICs for now */
 	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
 	    (nmi_watchdog != NMI_IO_APIC))
 	    	return;
-
-	if (wd->enabled == 0)
+	if (__get_cpu_var(wd_enabled) == 0)
 		return;
-
-	if (nmi_watchdog == NMI_LOCAL_APIC) {
-		switch (boot_cpu_data.x86_vendor) {
-		case X86_VENDOR_AMD:
-			if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
-				return;
-			stop_k7_watchdog();
-			break;
-		case X86_VENDOR_INTEL:
-			if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
-				stop_intel_arch_watchdog();
-				break;
-			}
-			stop_p4_watchdog();
-			break;
-		default:
-			return;
-		}
-	}
-	wd->enabled = 0;
+	if (nmi_watchdog == NMI_LOCAL_APIC)
+		lapic_watchdog_stop();
+	__get_cpu_var(wd_enabled) = 0;
 	atomic_dec(&nmi_active);
 }
 
@@ -885,9 +318,7 @@ int __kprobes nmi_watchdog_tick(struct p
 	int sum;
 	int touched = 0;
 	int cpu = smp_processor_id();
-	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-	u64 dummy;
-	int rc=0;
+	int rc = 0;
 
 	/* check for other users first */
 	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
@@ -934,55 +365,20 @@ int __kprobes nmi_watchdog_tick(struct p
 	}
 
 	/* see if the nmi watchdog went off */
-	if (wd->enabled) {
-		if (nmi_watchdog == NMI_LOCAL_APIC) {
-			rdmsrl(wd->perfctr_msr, dummy);
-			if (dummy & wd->check_bit){
-				/* this wasn't a watchdog timer interrupt */
-				goto done;
-			}
-
-			/* only Intel uses the cccr msr */
-	 		if (wd->cccr_msr != 0) {
-	 			/*
-	 			 * P4 quirks:
-	 			 * - An overflown perfctr will assert its interrupt
-	 			 *   until the OVF flag in its CCCR is cleared.
-	 			 * - LVTPC is masked on interrupt and must be
-	 			 *   unmasked by the LVTPC handler.
-	 			 */
-				rdmsrl(wd->cccr_msr, dummy);
-				dummy &= ~P4_CCCR_OVF;
-	 			wrmsrl(wd->cccr_msr, dummy);
-	 			apic_write(APIC_LVTPC, APIC_DM_NMI);
-				/* start the cycle over again */
-				wrmsrl(wd->perfctr_msr,
-				       -((u64)cpu_khz * 1000 / nmi_hz));
-	 		} else if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) {
-				/*
-				 * ArchPerfom/Core Duo needs to re-unmask
-				 * the apic vector
-				 */
-				apic_write(APIC_LVTPC, APIC_DM_NMI);
-				/* ARCH_PERFMON has 32 bit counter writes */
-				wrmsr(wd->perfctr_msr,
-				     (u32)(-((u64)cpu_khz * 1000 / nmi_hz)), 0);
-			} else {
-				/* start the cycle over again */
-				wrmsrl(wd->perfctr_msr,
-				       -((u64)cpu_khz * 1000 / nmi_hz));
-			}
-			rc = 1;
-		} else 	if (nmi_watchdog == NMI_IO_APIC) {
-			/* don't know how to accurately check for this.
-			 * just assume it was a watchdog timer interrupt
-			 * This matches the old behaviour.
-			 */
-			rc = 1;
-		} else
-			printk(KERN_WARNING "Unknown enabled NMI hardware?!\n");
+	if (!__get_cpu_var(wd_enabled))
+		return rc;
+	switch (nmi_watchdog) {
+	case NMI_LOCAL_APIC:
+		rc |= lapic_wd_event(nmi_hz);
+		break;
+	case NMI_IO_APIC:
+		/* don't know how to accurately check for this.
+		 * just assume it was a watchdog timer interrupt
+		 * This matches the old behaviour.
+		 */
+		rc = 1;
+		break;
 	}
-done:
 	return rc;
 }
 
@@ -1067,12 +463,4 @@ void __trigger_all_cpu_backtrace(void)
 
 EXPORT_SYMBOL(nmi_active);
 EXPORT_SYMBOL(nmi_watchdog);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
-EXPORT_SYMBOL(reserve_perfctr_nmi);
-EXPORT_SYMBOL(release_perfctr_nmi);
-EXPORT_SYMBOL(reserve_evntsel_nmi);
-EXPORT_SYMBOL(release_evntsel_nmi);
-EXPORT_SYMBOL(disable_timer_nmi_watchdog);
-EXPORT_SYMBOL(enable_timer_nmi_watchdog);
 EXPORT_SYMBOL(touch_nmi_watchdog);


* [PATCH] [5/30] x86_64: Define IGNORE_IOCTL() macro for compat_ioctls
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (3 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [4/30] x86_64: Use the 32bit wd_ops for 64bit too Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [6/30] x86_64: Shut up 32bit emulation for SIOCGIFCOUNT Andi Kleen
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Define a new IGNORE_IOCTL() to let a compat ioctl not be warned about even when
it is not implemented.

This is the same as COMPATIBLE_IOCTL internally, but better self-documenting.

Valid reasons to use this:
- It is implemented with ->compat_ioctl on some device, but programs
  call it on others too.
- The ioctl is not implemented in the native kernel, but programs
  call it commonly anyways.
Most other reasons are not valid. 
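
For illustration, the aliasing can be sketched as a standalone userspace
mock. The struct layout, the DEMO_* command numbers and quiet_ioctl() are
hypothetical and only mirror the idea; the real table is the one in
fs/compat_ioctl.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a translation table entry. */
struct demo_ioctl_trans {
	unsigned int cmd;
	int (*handler)(unsigned int cmd);	/* NULL: no translation needed */
};

#define COMPATIBLE_IOCTL(cmd)	{ (cmd), NULL },
/* Same table entry; the name documents why the ioctl is listed. */
#define IGNORE_IOCTL(cmd)	COMPATIBLE_IOCTL(cmd)

/* Made-up command numbers for the demo. */
#define DEMO_IMPLEMENTED	0x1001u
#define DEMO_UNIMPLEMENTED	0x1002u

static const struct demo_ioctl_trans demo_table[] = {
	COMPATIBLE_IOCTL(DEMO_IMPLEMENTED)
	IGNORE_IOCTL(DEMO_UNIMPLEMENTED)
};

/* Returns 1 if cmd is listed, i.e. no "unknown ioctl" warning is due. */
static int quiet_ioctl(unsigned int cmd)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
		if (demo_table[i].cmd == cmd)
			return 1;
	return 0;
}

/* Usage: listed commands are quiet, unlisted ones are not. */
static void quiet_ioctl_demo(void)
{
	assert(quiet_ioctl(DEMO_UNIMPLEMENTED));
	assert(!quiet_ioctl(0x1003u));
}
```

Both macros expand to the same entry, so the only difference is what the
table tells the next reader.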

Signed-off-by: Andi Kleen <ak@suse.de>

---
 fs/compat_ioctl.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux/fs/compat_ioctl.c
===================================================================
--- linux.orig/fs/compat_ioctl.c
+++ linux/fs/compat_ioctl.c
@@ -2396,6 +2396,14 @@ lp_timeout_trans(unsigned int fd, unsign
 #define ULONG_IOCTL(cmd) \
 	{ (cmd), (ioctl_trans_handler_t)sys_ioctl },
 
+/* ioctl should not be warned about even if it's not implemented.
+   Valid reasons to use this:
+   - It is implemented with ->compat_ioctl on some device, but programs
+   call it on others too.
+   - The ioctl is not implemented in the native kernel, but programs
+   call it commonly anyways.
+   Most other reasons are not valid. */
+#define IGNORE_IOCTL(cmd) COMPATIBLE_IOCTL(cmd)
 
 struct ioctl_trans ioctl_start[] = {
 #include <linux/compat_ioctl.h>


* [PATCH] [6/30] x86_64: Shut up 32bit emulation for SIOCGIFCOUNT
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (4 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [5/30] x86_64: Define IGNORE_IOCTL() macro for compat_ioctls Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [7/30] x86_64: Avoid overflows during apic timer calibration Andi Kleen
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


The kernel doesn't implement it, but some programs like java use it 
anyways. Shut the code up.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 fs/compat_ioctl.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux/fs/compat_ioctl.c
===================================================================
--- linux.orig/fs/compat_ioctl.c
+++ linux/fs/compat_ioctl.c
@@ -2602,6 +2602,8 @@ HANDLE_IOCTL(SIOCGIWENCODEEXT, do_wirele
 HANDLE_IOCTL(SIOCSIWPMKSA, do_wireless_ioctl)
 HANDLE_IOCTL(SIOCSIFBR, old_bridge_ioctl)
 HANDLE_IOCTL(SIOCGIFBR, old_bridge_ioctl)
+/* Not implemented in the native kernel */
+IGNORE_IOCTL(SIOCGIFCOUNT)
 HANDLE_IOCTL(RTC_IRQP_READ32, rtc_ioctl)
 HANDLE_IOCTL(RTC_IRQP_SET32, rtc_ioctl)
 HANDLE_IOCTL(RTC_EPOCH_READ32, rtc_ioctl)


* [PATCH] [7/30] x86_64: Avoid overflows during apic timer calibration
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (5 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [6/30] x86_64: Shut up 32bit emulation for SIOCGIFCOUNT Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu Andi Kleen
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: dpreed, patches, linux-kernel


From: David P. Reed <dpreed@reed.com>

- Use 64bit TSC calculations to avoid handling overflow
- Use 32bit unsigned arithmetic for the APIC timer. This
way overflows are handled correctly.
- Fix exit check of loop to account for apic timer counting down
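
The arithmetic behind these three points can be sketched in userspace.
This is only an illustration under assumed names (down_counter_elapsed,
up_counter_elapsed and keep_sampling are hypothetical helpers, not
functions from apic.c):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ticks elapsed on a 32-bit timer that counts DOWN from start to now.
 * Unsigned subtraction is modulo 2^32, so the result stays correct
 * even if the counter wrapped once between the two reads.
 */
static uint32_t down_counter_elapsed(uint32_t start, uint32_t now)
{
	return (uint32_t)(start - now);
}

/* Cycles elapsed on a 64-bit counter that counts UP (like the TSC). */
static uint64_t up_counter_elapsed(uint64_t start, uint64_t now)
{
	return now - start;
}

/* Keep sampling until either counter has advanced by tick_count. */
static int keep_sampling(uint64_t tsc_delta, uint32_t apic_delta,
			 uint32_t tick_count)
{
	return tsc_delta < tick_count && apic_delta < tick_count;
}

static void calibration_demo(void)
{
	/* 4000000000 fits in 32 unsigned bits but not in a signed int. */
	assert(down_counter_elapsed(4000000000u, 3999000000u) == 1000000u);
	/* Wrapped past zero: still 10 ticks. */
	assert(down_counter_elapsed(5u, 0xFFFFFFFBu) == 10u);
	assert(up_counter_elapsed(100, 250) == 150);
	assert(keep_sampling(10, 10, 100));
	assert(!keep_sampling(10, 100, 100));
}
```

Because the timer counts down, the elapsed value is start minus current;
subtracting the other way round on unsigned values would give a huge
number instead.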

Signed-off-by: dpreed@reed.com
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/apic.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -839,14 +839,15 @@ static void setup_APIC_timer(unsigned in
 
 static int __init calibrate_APIC_clock(void)
 {
-	int apic, apic_start, tsc, tsc_start;
+	unsigned apic, apic_start;
+	unsigned long tsc, tsc_start;
 	int result;
 	/*
 	 * Put whatever arbitrary (but long enough) timeout
 	 * value into the APIC clock, we just want to get the
 	 * counter running for calibration.
 	 */
-	__setup_APIC_LVTT(1000000000);
+	__setup_APIC_LVTT(4000000000);
 
 	apic_start = apic_read(APIC_TMCCT);
 #ifdef CONFIG_X86_PM_TIMER
@@ -857,13 +858,13 @@ static int __init calibrate_APIC_clock(v
 	} else
 #endif
 	{
-		rdtscl(tsc_start);
+		rdtscll(tsc_start);
 
 		do {
 			apic = apic_read(APIC_TMCCT);
-			rdtscl(tsc);
+			rdtscll(tsc);
 		} while ((tsc - tsc_start) < TICK_COUNT &&
-				(apic - apic_start) < TICK_COUNT);
+				(apic_start - apic) < TICK_COUNT);
 
 		result = (apic_start - apic) * 1000L * tsc_khz /
 					(tsc - tsc_start);


* [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (6 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [7/30] x86_64: Avoid overflows during apic timer calibration Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  5:57   ` Jeremy Fitzhardinge
  2007-05-01  3:58 ` [PATCH] [9/30] x86_64: Use symbolic CPU features in early CPUID check Andi Kleen
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


This implements a new vDSO for x86-64.  The concept is similar
to the existing vDSOs on i386 and PPC.  x86-64 has had static
vsyscalls before, but these are not flexible enough anymore.

A vDSO is an ELF shared library supplied by the kernel that is mapped into
user address space.  The vDSO mapping is randomized for each process
for security reasons.
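
As a rough userspace sketch of what "mapped into user address space"
means, the following looks for the mapping and checks that it starts
like an ELF file. It assumes the base address is advertised through the
AT_SYSINFO_EHDR auxiliary vector entry, as on other Linux architectures,
and it uses getauxval() from later C libraries, which is not part of
this patch. vdso_base() and vdso_looks_like_elf() are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns the vDSO base for this process, or 0 if none is advertised. */
static unsigned long vdso_base(void)
{
	return getauxval(AT_SYSINFO_EHDR);
}

/* Returns 1 if a vDSO is mapped and starts with the ELF magic, else 0. */
static int vdso_looks_like_elf(void)
{
	unsigned long base = vdso_base();

	if (base == 0)
		return 0;	/* e.g. booted with vdso=0 */
	return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}

static void vdso_demo(void)
{
	/* If a base is advertised at all, it should be an ELF image. */
	assert(vdso_base() == 0 || vdso_looks_like_elf());
}
```

With randomization the base differs from one process to the next, which
is why user code has to discover it at run time instead of hard-coding
an address.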

Doing this was needed for clock_gettime, because clock_gettime
always needs a syscall fallback and having one at a fixed
address would have made buffer overflow exploits too easy to write.

The vdso can be disabled with vdso=0

It currently includes a new gettimeofday implementation and optimized
clock_gettime(). The gettimeofday implementation is slightly faster
than the one in the old vsyscall.  clock_gettime is significantly faster 
than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.

The new calls are generally faster than the old vsyscall. 

TBD: add new benchmarks
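
Until real numbers are available, a crude userspace timing loop along
these lines could serve as a starting point. It is only a sketch:
avg_clock_gettime_ns() and ts_to_ns() are hypothetical helpers, and
whether clock_gettime() actually goes through the vDSO depends on the C
library in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Average cost in nanoseconds of one clock_gettime(clk) call,
 * measured over iters calls.  Returns -1 on error or if iters <= 0.
 */
static int64_t avg_clock_gettime_ns(clockid_t clk, long iters)
{
	struct timespec start, end, tmp;
	long i;

	if (iters <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1;
	for (i = 0; i < iters; i++)
		if (clock_gettime(clk, &tmp) != 0)
			return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1;
	return (ts_to_ns(&end) - ts_to_ns(&start)) / iters;
}

static void bench_demo(void)
{
	assert(avg_clock_gettime_ns(CLOCK_REALTIME, 0) == -1);
	assert(avg_clock_gettime_ns(CLOCK_MONOTONIC, 1000) >= 0);
}
```

Comparing the result for CLOCK_MONOTONIC and CLOCK_REALTIME with and
without vdso=0 would give a first impression; a serious benchmark would
also need to pin the CPU and repeat the measurement.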

Advantages over the old x86-64 vsyscalls:
- Extensible
- Randomized
- Cleaner
- Easier to virtualize (the old static address range caused
overhead e.g. for Xen because it has to create special page tables for it)

Weak points: 
- glibc support still to be written

The VM interface is partly based on Ingo Molnar's i386 version.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 Documentation/kernel-parameters.txt |    2 
 arch/x86_64/Makefile                |    3 
 arch/x86_64/ia32/ia32_binfmt.c      |    1 
 arch/x86_64/kernel/time.c           |    1 
 arch/x86_64/kernel/vmlinux.lds.S    |   12 +++
 arch/x86_64/kernel/vsyscall.c       |   22 +----
 arch/x86_64/mm/init.c               |   17 ++++
 arch/x86_64/vdso/Makefile           |   49 ++++++++++++
 arch/x86_64/vdso/vclock_gettime.c   |  120 +++++++++++++++++++++++++++++++
 arch/x86_64/vdso/vdso-note.S        |   25 ++++++
 arch/x86_64/vdso/vdso-start.S       |    2 
 arch/x86_64/vdso/vdso.S             |    2 
 arch/x86_64/vdso/vdso.lds.S         |   77 ++++++++++++++++++++
 arch/x86_64/vdso/vextern.h          |   16 ++++
 arch/x86_64/vdso/vgetcpu.c          |   50 +++++++++++++
 arch/x86_64/vdso/vma.c              |  137 ++++++++++++++++++++++++++++++++++++
 arch/x86_64/vdso/voffset.h          |    1 
 arch/x86_64/vdso/vvar.c             |   12 +++
 include/asm-x86_64/auxvec.h         |    2 
 include/asm-x86_64/elf.h            |   13 +++
 include/asm-x86_64/mmu.h            |    1 
 include/asm-x86_64/pgtable.h        |    8 +-
 include/asm-x86_64/vgtod.h          |   29 +++++++
 include/asm-x86_64/vsyscall.h       |    3 
 24 files changed, 583 insertions(+), 22 deletions(-)

Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -38,6 +38,7 @@
 
 int sysctl_vsyscall32 = 1;
 
+#undef ARCH_DLINFO
 #define ARCH_DLINFO do {  \
 	if (sysctl_vsyscall32) { \
 	NEW_AUX_ENT(AT_SYSINFO, (u32)(u64)VSYSCALL32_VSYSCALL); \
Index: linux/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux/arch/x86_64/kernel/vmlinux.lds.S
@@ -94,6 +94,9 @@ SECTIONS
   .vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data))
 		{ *(.vsyscall_gtod_data) }
   vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data);
+  .vsyscall_clock : AT(VLOAD(.vsyscall_clock))
+		{ *(.vsyscall_clock) }
+  vsyscall_clock = VVIRT(.vsyscall_clock);
 
 
   .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1))
@@ -153,6 +156,8 @@ SECTIONS
 
   . = ALIGN(4096);		/* Init code and data */
   __init_begin = .;
+
+
   .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
 	_sinittext = .;
 	*(.init.text)
@@ -190,6 +195,12 @@ SECTIONS
   .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
   .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
 
+/* vdso blob that is mapped into user space */
+  vdso_start = . ;
+  .vdso  : AT(ADDR(.vdso) - LOAD_OFFSET) { *(.vdso) }
+  . = ALIGN(4096);
+  vdso_end = .;
+
 #ifdef CONFIG_BLK_DEV_INITRD
   . = ALIGN(4096);
   __initramfs_start = .;
@@ -202,6 +213,7 @@ SECTIONS
   .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
   __per_cpu_end = .;
   . = ALIGN(4096);
+
   __init_end = .;
 
   . = ALIGN(4096);
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -159,6 +159,14 @@ static __init void set_pte_phys(unsigned
 	__flush_tlb_one(vaddr);
 }
 
+void __init
+set_kernel_map(void *vaddr,unsigned long len,unsigned long phys,pgprot_t prot)
+{
+	void *end = vaddr + ALIGN(len, PAGE_SIZE);
+	for (; vaddr < end; vaddr += PAGE_SIZE, phys += PAGE_SIZE)
+		set_pte_phys((unsigned long)vaddr, phys, prot);
+}
+
 /* NOTE: this is meant to be run only at boot */
 void __init 
 __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
@@ -756,3 +764,12 @@ int in_gate_area_no_task(unsigned long a
 {
 	return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
 }
+
+const char *arch_vma_name(struct vm_area_struct *vma)
+{
+	if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
+		return "[vdso]";
+	if (vma == &gate_vma)
+		return "[vsyscall]";
+	return NULL;
+}
Index: linux/arch/x86_64/vdso/vdso-note.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso-note.S
@@ -0,0 +1,25 @@
+/*
+ * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
+ * Here we can supply some information useful to userland.
+ */
+
+#include <linux/uts.h>
+#include <linux/version.h>
+
+#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type)			      \
+	.section name, flags;						      \
+	.balign 4;							      \
+	.long 1f - 0f;		/* name length */			      \
+	.long 3f - 2f;		/* data length */			      \
+	.long type;		/* note type */				      \
+0:	.asciz vendor;		/* vendor name */			      \
+1:	.balign 4;							      \
+2:
+
+#define ASM_ELF_NOTE_END						      \
+3:	.balign 4;		/* pad out section */			      \
+	.previous
+
+	ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0)
+	.long LINUX_VERSION_CODE
+	ASM_ELF_NOTE_END
Index: linux/arch/x86_64/vdso/vdso.lds.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso.lds.S
@@ -0,0 +1,77 @@
+/*
+ * Linker script for vsyscall DSO.  The vsyscall page is an ELF shared
+ * object prelinked to its virtual address, and with only one read-only
+ * segment (that fits in one page).  This script controls its layout.
+ */
+#include <asm/asm-offsets.h>
+#include "voffset.h"
+
+#define VDSO_PRELINK 0xffffffffff700000
+
+SECTIONS
+{
+  . = VDSO_PRELINK + SIZEOF_HEADERS;
+
+  .hash           : { *(.hash) }		:text
+  .gnu.hash       : { *(.gnu.hash) }
+  .dynsym         : { *(.dynsym) }
+  .dynstr         : { *(.dynstr) }
+  .gnu.version    : { *(.gnu.version) }
+  .gnu.version_d  : { *(.gnu.version_d) }
+  .gnu.version_r  : { *(.gnu.version_r) }
+
+  /* This linker script is used both with -r and with -shared.
+     For the layouts to match, we need to skip more than enough
+     space for the dynamic symbol table et al.  If this amount
+     is insufficient, ld -shared will barf.  Just increase it here.  */
+  . = VDSO_PRELINK + VDSO_TEXT_OFFSET;
+
+  .text           : { *(.text) }		:text
+  .text.ptr       : { *(.text.ptr) }		:text
+  . = VDSO_PRELINK + 0x900;
+  .data           : { *(.data) }		:text
+  .bss            : { *(.bss) }			:text
+
+  .altinstructions : { *(.altinstructions) }			:text
+  .altinstr_replacement  : { *(.altinstr_replacement) }	:text
+
+  .note		  : { *(.note.*) }		:text :note
+  .eh_frame_hdr   : { *(.eh_frame_hdr) }	:text :eh_frame_hdr
+  .eh_frame       : { KEEP (*(.eh_frame)) }	:text
+  .dynamic        : { *(.dynamic) }		:text :dynamic
+  .useless        : {
+  	*(.got.plt) *(.got)
+	*(.gnu.linkonce.d.*)
+	*(.dynbss)
+	*(.gnu.linkonce.b.*)
+  }						:text
+}
+
+/*
+ * We must supply the ELF program headers explicitly to get just one
+ * PT_LOAD segment, and set the flags explicitly to make segments read-only.
+ */
+PHDRS
+{
+  text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */
+  dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
+  note PT_NOTE FLAGS(4); /* PF_R */
+  eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */
+}
+
+/*
+ * This controls what symbols we export from the DSO.
+ */
+VERSION
+{
+  LINUX_2.6 {
+    global:
+	clock_gettime;
+	__vdso_clock_gettime;
+	gettimeofday;
+	__vdso_gettimeofday;
+	getcpu;
+	__vdso_getcpu;
+    local: *;
+  };
+}
Index: linux/arch/x86_64/vdso/Makefile
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/Makefile
@@ -0,0 +1,49 @@
+#
+# x86-64 vDSO.
+#
+
+# files to link into the vdso
+# vdso-start.o has to be first
+vobjs-y := vdso-start.o vdso-note.o vclock_gettime.o vgetcpu.o vvar.o
+
+# files to link into kernel
+obj-y := vma.o vdso.o vdso-syms.o
+
+vobjs := $(foreach F,$(vobjs-y),$(obj)/$F)
+
+$(obj)/vdso.o: $(obj)/vdso.so
+
+targets += vdso.so vdso.lds $(vobjs-y) vdso-syms.o
+
+# The DSO images are built using a special linker script.
+quiet_cmd_syscall = SYSCALL $@
+      cmd_syscall = $(CC) -m elf_x86_64 -nostdlib $(SYSCFLAGS_$(@F)) \
+		          -Wl,-T,$(filter-out FORCE,$^) -o $@
+
+export CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
+
+vdso-flags = -fPIC -shared -Wl,-soname=linux-vdso.so.1 \
+		 $(call ld-option, -Wl$(comma)--hash-style=sysv) \
+		-Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
+SYSCFLAGS_vdso.so = $(vdso-flags)
+
+$(obj)/vdso.o: $(src)/vdso.S $(obj)/vdso.so
+
+$(obj)/vdso.so: $(src)/vdso.lds $(vobjs) FORCE
+	$(call if_changed,syscall)
+
+CF := $(PROFILING) -mcmodel=small -fPIC -g0 -O2 -fasynchronous-unwind-tables -m64
+
+$(obj)/vclock_gettime.o: CFLAGS = $(CF)
+$(obj)/vgetcpu.o: CFLAGS = $(CF)
+
+# We also create a special relocatable object that should mirror the symbol
+# table and layout of the linked DSO.  With ld -R we can then refer to
+# these symbols in the kernel code rather than hand-coded addresses.
+extra-y += vdso-syms.o
+$(obj)/built-in.o: $(obj)/vdso-syms.o
+$(obj)/built-in.o: ld_flags += -R $(obj)/vdso-syms.o
+
+SYSCFLAGS_vdso-syms.o = -r -d
+$(obj)/vdso-syms.o: $(src)/vdso.lds $(vobjs) FORCE
+	$(call if_changed,syscall)
Index: linux/arch/x86_64/vdso/vclock_gettime.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vclock_gettime.c
@@ -0,0 +1,120 @@
+/*
+ * Copyright 2006 Andi Kleen, SUSE Labs.
+ * Subject to the GNU Public License, v.2
+ *
+ * Fast user context implementation of clock_gettime and gettimeofday.
+ *
+ * The code should have no internal unresolved relocations.
+ * Check with readelf after changing.
+ * Also alternative() doesn't work.
+ */
+
+#include <linux/kernel.h>
+#include <linux/posix-timers.h>
+#include <linux/time.h>
+#include <linux/string.h>
+#include <asm/vsyscall.h>
+#include <asm/vgtod.h>
+#include <asm/timex.h>
+#include <asm/hpet.h>
+#include <asm/unistd.h>
+#include <asm/io.h>
+#include <asm/vgtod.h>
+#include "vextern.h"
+
+#define gtod vdso_vsyscall_gtod_data
+
+static long vdso_fallback_gettime(long clock, struct timespec *ts)
+{
+	long ret;
+	asm("syscall" : "=a" (ret) :
+	    "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
+	return ret;
+}
+
+static inline long vgetns(void)
+{
+	cycles_t (*vread)(void);
+	vread = gtod->clock.vread;
+	return ((vread() - gtod->clock.cycle_last) * gtod->clock.mult) >>
+		gtod->clock.shift;
+}
+
+static noinline int do_realtime(struct timespec *ts)
+{
+	unsigned long seq, ns;
+	do {
+		seq = read_seqbegin(&gtod->lock);
+		ts->tv_sec = gtod->wall_time_sec;
+		ts->tv_nsec = gtod->wall_time_nsec;
+		ns = vgetns();
+	} while (unlikely(read_seqretry(&gtod->lock, seq)));
+	timespec_add_ns(ts, ns);
+	return 0;
+}
+
+/* Copy of the version in kernel/time.c which we cannot directly access */
+static void vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
+{
+	while (nsec >= NSEC_PER_SEC) {
+		nsec -= NSEC_PER_SEC;
+		++sec;
+	}
+	while (nsec < 0) {
+		nsec += NSEC_PER_SEC;
+		--sec;
+	}
+	ts->tv_sec = sec;
+	ts->tv_nsec = nsec;
+}
+
+static noinline int do_monotonic(struct timespec *ts)
+{
+	unsigned long seq, ns, secs;
+	do {
+		seq = read_seqbegin(&gtod->lock);
+		secs = gtod->wall_time_sec;
+		ns = gtod->wall_time_nsec + vgetns();
+		secs += gtod->wall_to_monotonic.tv_sec;
+		ns += gtod->wall_to_monotonic.tv_nsec;
+	} while (unlikely(read_seqretry(&gtod->lock, seq)));
+	vset_normalized_timespec(ts, secs, ns);
+	return 0;
+}
+
+int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
+{
+	if (likely(gtod->sysctl_enabled && gtod->clock.vread))
+		switch (clock) {
+		case CLOCK_REALTIME:
+			return do_realtime(ts);
+		case CLOCK_MONOTONIC:
+			return do_monotonic(ts);
+		}
+	return vdso_fallback_gettime(clock, ts);
+}
+int clock_gettime(clockid_t, struct timespec *)
+	__attribute__((weak, alias("__vdso_clock_gettime")));
+
+int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
+{
+	long ret;
+	if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
+		BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
+			     offsetof(struct timespec, tv_nsec) ||
+			     sizeof(*tv) != sizeof(struct timespec));
+		do_realtime((struct timespec *)tv);
+		tv->tv_usec /= 1000;
+		if (unlikely(tz != NULL)) {
+			/* This relies on gcc inlining the memcpy. We'll notice
+			   if it ever fails to do so. */
+			memcpy(tz, &gtod->sys_tz, sizeof(struct timezone));
+		}
+		return 0;
+	}
+	asm("syscall" : "=a" (ret) :
+	    "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory");
+	return ret;
+}
+int gettimeofday(struct timeval *, struct timezone *)
+	__attribute__((weak, alias("__vdso_gettimeofday")));
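
Note on the arithmetic in vclock_gettime.c above: vgetns() scales a cycle delta with a multiply and a shift, and vset_normalized_timespec() folds the nanosecond part back into the range [0, 1s). A minimal user-space sketch of both steps follows. The names cycles_to_ns(), normalize() and struct ts are hypothetical and not part of the patch, and the mult/shift values in the usage note are made up for illustration rather than taken from a real clocksource.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec. */
struct ts {
	long sec;
	long nsec;
};

/* Same scaling as vgetns(): (delta cycles * mult) >> shift. */
static uint64_t cycles_to_ns(uint64_t now, uint64_t cycle_last,
			     uint32_t mult, uint32_t shift)
{
	return ((now - cycle_last) * mult) >> shift;
}

/* Same loops as vset_normalized_timespec(): fold nsec into [0, 1s). */
static struct ts normalize(long sec, long nsec)
{
	struct ts t;

	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		--sec;
	}
	t.sec = sec;
	t.nsec = nsec;
	return t;
}
```

With mult = 1 << 21 and shift = 22 (half a nanosecond per cycle), a delta of 2000 cycles scales to 1000 ns. normalize(5, 1500000000) gives 6 s and 500000000 ns.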
Index: linux/arch/x86_64/vdso/vma.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vma.c
@@ -0,0 +1,137 @@
+/*
+ * Set up the VMAs to tell the VM about the vDSO.
+ * Copyright 2007 Andi Kleen, SUSE Labs.
+ * Subject to the GPL, v.2
+ */
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/random.h>
+#include <asm/vsyscall.h>
+#include <asm/vgtod.h>
+#include <asm/proto.h>
+#include "voffset.h"
+
+int vdso_enabled = 1;
+
+#define VEXTERN(x) extern typeof(__ ## x) *vdso_ ## x;
+#include "vextern.h"
+#undef VEXTERN
+
+extern char vdso_kernel_start[], vdso_start[], vdso_end[];
+extern unsigned short vdso_sync_cpuid;
+
+struct page **vdso_pages;
+
+static inline void *var_ref(void *vbase, char *var, char *name)
+{
+	unsigned offset = var - &vdso_kernel_start[0] + VDSO_TEXT_OFFSET;
+	void *p = vbase + offset;
+	if (*(void **)p != (void *)VMAGIC) {
+		printk("VDSO: variable %s broken\n", name);
+		vdso_enabled = 0;
+	}
+	return p;
+}
+
+static int __init init_vdso_vars(void)
+{
+	int npages = (vdso_end - vdso_start + PAGE_SIZE - 1) / PAGE_SIZE;
+	int i;
+	char *vbase;
+
+	vdso_pages = kmalloc(sizeof(struct page *) * npages, GFP_KERNEL);
+	if (!vdso_pages)
+		goto oom;
+	for (i = 0; i < npages; i++) {
+		struct page *p;
+		p = alloc_page(GFP_KERNEL);
+		if (!p)
+			goto oom;
+		vdso_pages[i] = p;
+		copy_page(page_address(p), vdso_start + i*PAGE_SIZE);
+	}
+
+	vbase = vmap(vdso_pages, npages, 0, PAGE_KERNEL);
+	if (!vbase)
+		goto oom;
+
+	if (memcmp(vbase, "\177ELF", 4)) {
+		printk("VDSO: I'm broken; not ELF\n");
+		vdso_enabled = 0;
+	}
+
+#define V(x) *(typeof(x) *) var_ref(vbase, (char *)RELOC_HIDE(&x, 0), #x)
+#define VEXTERN(x) \
+	V(vdso_ ## x) = &__ ## x;
+#include "vextern.h"
+#undef VEXTERN
+	return 0;
+
+ oom:
+	printk("Cannot allocate vdso\n");
+	vdso_enabled = 0;
+	return -ENOMEM;
+}
+__initcall(init_vdso_vars);
+
+struct linux_binprm;
+
+/* Put the vdso above the (randomized) stack with another randomized offset.
+   This way there is no hole in the middle of address space.
+   To save memory make sure it is still in the same PTE as the stack top.
+   This doesn't give that many random bits */
+static unsigned long vdso_addr(unsigned long start, unsigned len)
+{
+	unsigned long addr, end;
+	unsigned offset;
+	end = (start + PMD_SIZE - 1) & PMD_MASK;
+	if (end >= TASK_SIZE64)
+		end = TASK_SIZE64;
+	end -= len;
+	/* This loses some more bits than a modulo, but is cheaper */
+	offset = get_random_int() & (PTRS_PER_PTE - 1);
+	addr = start + (offset << PAGE_SHIFT);
+	if (addr >= end)
+		addr = end;
+	return addr;
+}
+
+/* Setup a VMA at program startup for the vsyscall page.
+   Not called for compat tasks */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr;
+	int ret;
+	unsigned len = round_up(vdso_end - vdso_start, PAGE_SIZE);
+
+	if (!vdso_enabled)
+		return 0;
+
+	down_write(&mm->mmap_sem);
+	addr = get_unmapped_area(NULL, vdso_addr(mm->start_stack, len), len, 0, 0);
+	if (IS_ERR_VALUE(addr)) {
+		ret = addr;
+		goto up_fail;
+	}
+
+	ret = install_special_mapping(mm, addr, len,
+				      VM_READ|VM_EXEC|
+				      VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+				      VM_ALWAYSDUMP,
+				      vdso_pages);
+	if (ret)
+		goto up_fail;
+
+	current->mm->context.vdso = (void *)addr;
+up_fail:
+	up_write(&mm->mmap_sem);
+	return ret;
+}
+
+static __init int vdso_setup(char *s)
+{
+	vdso_enabled = simple_strtoul(s, NULL, 0);
+	return 0;
+}
+__setup("vdso=", vdso_setup);
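
Note on vdso_addr() above: the placement can be modelled in user space to see how the random page offset is clamped so that the mapping ends no later than the end of the stack's PMD. This is only a sketch. The constants are assumed x86-64 values of the time (4K pages, 2MB PMDs, 512 PTEs per page table), the random value is passed in as a parameter instead of coming from get_random_int(), and model_vdso_addr() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed x86-64 constants; not taken from this patch. */
#define PAGE_SHIFT	12
#define PMD_SIZE	(UINT64_C(1) << 21)
#define PMD_MASK	(~(PMD_SIZE - 1))
#define PTRS_PER_PTE	512
#define TASK_SIZE64	(UINT64_C(0x800000000000) - 4096)

/* Same steps as vdso_addr(), with the random input made explicit. */
static uint64_t model_vdso_addr(uint64_t start, unsigned len, unsigned rnd)
{
	uint64_t addr, end;
	unsigned offset;

	end = (start + PMD_SIZE - 1) & PMD_MASK;  /* round up to PMD end */
	if (end >= TASK_SIZE64)
		end = TASK_SIZE64;
	end -= len;                               /* last usable start */
	offset = rnd & (PTRS_PER_PTE - 1);        /* 0..511 pages */
	addr = start + ((uint64_t)offset << PAGE_SHIFT);
	if (addr >= end)
		addr = end;
	return addr;
}
```

For start = 0x7ffff7a01000 and len = 4096, a random value of 3 yields 0x7ffff7a04000, while 511 would run past the PMD and is clamped to 0x7ffff7bff000.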
Index: linux/arch/x86_64/vdso/vdso.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso.S
@@ -0,0 +1,2 @@
+	.section ".vdso","a"
+	.incbin "arch/x86_64/vdso/vdso.so"
Index: linux/arch/x86_64/vdso/vdso-start.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso-start.S
@@ -0,0 +1,2 @@
+	.globl vdso_kernel_start
+vdso_kernel_start:
Index: linux/arch/x86_64/Makefile
===================================================================
--- linux.orig/arch/x86_64/Makefile
+++ linux/arch/x86_64/Makefile
@@ -77,7 +77,8 @@ head-y := arch/x86_64/kernel/head.o arch
 libs-y 					+= arch/x86_64/lib/
 core-y					+= arch/x86_64/kernel/ \
 					   arch/x86_64/mm/ \
-					   arch/x86_64/crypto/
+					   arch/x86_64/crypto/ \
+					   arch/x86_64/vdso/
 core-$(CONFIG_IA32_EMULATION)		+= arch/x86_64/ia32/
 drivers-$(CONFIG_PCI)			+= arch/x86_64/pci/
 drivers-$(CONFIG_OPROFILE)		+= arch/x86_64/oprofile/
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -415,6 +415,9 @@ void vmalloc_sync_all(void);
 
 extern int kern_addr_valid(unsigned long addr); 
 
+extern void set_kernel_map(void *vaddr, unsigned long len,
+			   unsigned long phys, pgprot_t prot);
+
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
 		remap_pfn_range(vma, vaddr, pfn, size, prot)
 
@@ -442,6 +445,9 @@ extern int kern_addr_valid(unsigned long
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 #define __HAVE_ARCH_PTE_SAME
 #include <asm-generic/pgtable.h>
-#endif /* !__ASSEMBLY__ */
 
+extern void
+fix_set_pte_phys(unsigned long vaddr, unsigned long phys, pgprot_t prot);
+
+#endif /* !__ASSEMBLY__ */
 #endif /* _X86_64_PGTABLE_H */
Index: linux/include/asm-x86_64/mmu.h
===================================================================
--- linux.orig/include/asm-x86_64/mmu.h
+++ linux/include/asm-x86_64/mmu.h
@@ -15,6 +15,7 @@ typedef struct { 
 	rwlock_t ldtlock; 
 	int size;
 	struct semaphore sem; 
+	void *vdso;
 } mm_context_t;
 
 #endif
Index: linux/include/asm-x86_64/vsyscall.h
===================================================================
--- linux.orig/include/asm-x86_64/vsyscall.h
+++ linux/include/asm-x86_64/vsyscall.h
@@ -22,6 +22,8 @@ enum vsyscall_num {
 /* Definitions for CONFIG_GENERIC_TIME definitions */
 #define __section_vsyscall_gtod_data __attribute__ \
 	((unused, __section__ (".vsyscall_gtod_data"),aligned(16)))
+#define __section_vsyscall_clock __attribute__ \
+	((unused, __section__ (".vsyscall_clock"),aligned(16)))
 #define __vsyscall_fn __attribute__ ((unused,__section__(".vsyscall_fn")))
 
 #define VGETCPU_RDTSCP	1
@@ -36,7 +38,6 @@ extern volatile unsigned long __jiffies;
 /* kernel space (writeable) */
 extern int vgetcpu_mode;
 extern struct timezone sys_tz;
-extern struct vsyscall_gtod_data_t vsyscall_gtod_data;
 
 #endif /* __KERNEL__ */
 
Index: linux/include/asm-x86_64/auxvec.h
===================================================================
--- linux.orig/include/asm-x86_64/auxvec.h
+++ linux/include/asm-x86_64/auxvec.h
@@ -1,4 +1,6 @@
 #ifndef __ASM_X86_64_AUXVEC_H
 #define __ASM_X86_64_AUXVEC_H
 
+#define AT_SYSINFO_EHDR		33
+
 #endif
Index: linux/include/asm-x86_64/elf.h
===================================================================
--- linux.orig/include/asm-x86_64/elf.h
+++ linux/include/asm-x86_64/elf.h
@@ -162,6 +162,19 @@ extern int dump_task_fpu (struct task_st
 /* 1GB for 64bit, 8MB for 32bit */
 #define STACK_RND_MASK (test_thread_flag(TIF_IA32) ? 0x7ff : 0x3fffff)
 
+
+#define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
+struct linux_binprm;
+extern int arch_setup_additional_pages(struct linux_binprm *bprm,
+                                       int executable_stack);
+
+extern int vdso_enabled;
+
+#define ARCH_DLINFO						\
+do if (vdso_enabled) {						\
+	NEW_AUX_ENT(AT_SYSINFO_EHDR,(unsigned long)current->mm->context.vdso);\
+} while (0)
+
 #endif
 
 #endif
Index: linux/arch/x86_64/vdso/vextern.h
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vextern.h
@@ -0,0 +1,16 @@
+#ifndef VEXTERN
+#include <asm/vsyscall.h>
+#define VEXTERN(x) \
+	extern typeof(x) *vdso_ ## x __attribute__((visibility("hidden")));
+#endif
+
+#define VMAGIC 0xfeedbabeabcdefabUL
+
+/* Any kernel variables used in the vDSO must be exported in the main
+   kernel's vmlinux.lds.S/vsyscall.h/proper __section and
+   put into vextern.h and be referenced as a pointer with vdso prefix.
+   The main kernel later fills in the values.   */
+
+VEXTERN(jiffies)
+VEXTERN(vgetcpu_mode)
+VEXTERN(vsyscall_gtod_data)
Index: linux/arch/x86_64/vdso/vgetcpu.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vgetcpu.c
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2006 Andi Kleen, SUSE Labs.
+ * Subject to the GNU Public License, v.2
+ *
+ * Fast user context implementation of getcpu()
+ */
+
+#include <linux/kernel.h>
+#include <linux/getcpu.h>
+#include <linux/jiffies.h>
+#include <linux/time.h>
+#include <asm/vsyscall.h>
+#include <asm/vgtod.h>
+#include "vextern.h"
+
+long __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+{
+	unsigned int dummy, p;
+	unsigned long j = 0;
+
+	/* Fast cache - only recompute value once per jiffies and avoid
+	   relatively costly rdtscp/cpuid otherwise.
+	   This works because the scheduler usually keeps the process
+	   on the same CPU and this syscall doesn't guarantee its
+	   results anyways.
+	   We do this here because otherwise user space would do it on
+	   its own in a likely inferior way (no access to jiffies).
+	   If you don't like it pass NULL. */
+	if (tcache && tcache->blob[0] == (j = *vdso_jiffies)) {
+		p = tcache->blob[1];
+	} else if (*vdso_vgetcpu_mode == VGETCPU_RDTSCP) {
+		/* Load per CPU data from RDTSCP */
+		rdtscp(dummy, dummy, p);
+	} else {
+		/* Load per CPU data from GDT */
+		asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+	}
+	if (tcache) {
+		tcache->blob[0] = j;
+		tcache->blob[1] = p;
+	}
+	if (cpu)
+		*cpu = p & 0xfff;
+	if (node)
+		*node = p >> 12;
+	return 0;
+}
+
+long getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+	__attribute__((weak, alias("__vdso_getcpu")));
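
Note on __vdso_getcpu() above: the per-CPU value p carries the CPU number in the low 12 bits and the node number in the bits above. A minimal sketch of that layout follows. unpack_cpu_node() mirrors the decode in the patch. pack_cpu_node() is an assumption inferred from the decode, and both names are hypothetical.

```c
/* Hypothetical pair type for the decoded result. */
struct cpu_node {
	unsigned cpu;
	unsigned node;
};

/* Assumed encoding, inferred from the decode below. */
static unsigned pack_cpu_node(unsigned cpu, unsigned node)
{
	return (node << 12) | (cpu & 0xfff);
}

/* Same decode as __vdso_getcpu(): cpu = p & 0xfff, node = p >> 12. */
static struct cpu_node unpack_cpu_node(unsigned p)
{
	struct cpu_node r;

	r.cpu = p & 0xfff;
	r.node = p >> 12;
	return r;
}
```

For example, CPU 5 on node 2 packs to 0x2005 and unpacks back to the same pair.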
Index: linux/arch/x86_64/vdso/vvar.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vvar.c
@@ -0,0 +1,12 @@
+/* Define pointer to external vDSO variables.
+   These are part of the vDSO. The kernel fills in the real addresses
+   at boot time. This is done because when the vdso is linked the
+   kernel isn't yet and we don't know the final addresses. */
+#include <linux/kernel.h>
+#include <linux/time.h>
+#include <asm/vsyscall.h>
+#include <asm/timex.h>
+#include <asm/vgtod.h>
+
+#define VEXTERN(x) typeof (__ ## x) *vdso_ ## x = (void *)VMAGIC;
+#include "vextern.h"
Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -1813,7 +1813,7 @@ and is between 256 and 4096 characters. 
 	usbhid.mousepoll=
 			[USBHID] The interval which mice are to be polled at.
 
-	vdso=		[IA-32,SH]
+	vdso=		[IA-32,SH,x86-64]
 			vdso=2: enable compat VDSO (default with COMPAT_VDSO)
 			vdso=1: enable VDSO (default)
 			vdso=0: disable VDSO mapping
Index: linux/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -42,6 +42,7 @@
 #include <asm/segment.h>
 #include <asm/desc.h>
 #include <asm/topology.h>
+#include <asm/vgtod.h>
 
 #define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
 #define __syscall_clobber "r11","rcx","memory"
@@ -57,26 +58,9 @@
  * - writen by timer interrupt or systcl (/proc/sys/kernel/vsyscall64)
  * Try to keep this structure as small as possible to avoid cache line ping pongs
  */
-struct vsyscall_gtod_data_t {
-	seqlock_t	lock;
-
-	/* open coded 'struct timespec' */
-	time_t		wall_time_sec;
-	u32		wall_time_nsec;
-
-	int		sysctl_enabled;
-	struct timezone sys_tz;
-	struct { /* extract of a clocksource struct */
-		cycle_t (*vread)(void);
-		cycle_t	cycle_last;
-		cycle_t	mask;
-		u32	mult;
-		u32	shift;
-	} clock;
-};
 int __vgetcpu_mode __section_vgetcpu_mode;
 
-struct vsyscall_gtod_data_t __vsyscall_gtod_data __section_vsyscall_gtod_data =
+struct vsyscall_gtod_data __vsyscall_gtod_data __section_vsyscall_gtod_data =
 {
 	.lock = SEQLOCK_UNLOCKED,
 	.sysctl_enabled = 1,
@@ -96,6 +80,8 @@ void update_vsyscall(struct timespec *wa
 	vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
 	vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
 	vsyscall_gtod_data.sys_tz = sys_tz;
+	vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
+	vsyscall_gtod_data.wall_to_monotonic = wall_to_monotonic;
 	write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
 }
 
Index: linux/arch/x86_64/kernel/time.c
===================================================================
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -44,6 +44,7 @@
 #include <asm/hpet.h>
 #include <asm/mpspec.h>
 #include <asm/nmi.h>
+#include <asm/vgtod.h>
 
 static char *timename = NULL;
 
Index: linux/include/asm-x86_64/vgtod.h
===================================================================
--- /dev/null
+++ linux/include/asm-x86_64/vgtod.h
@@ -0,0 +1,29 @@
+#ifndef _ASM_VGTOD_H
+#define _ASM_VGTOD_H 1
+
+#include <asm/vsyscall.h>
+#include <linux/clocksource.h>
+
+struct vsyscall_gtod_data {
+	seqlock_t	lock;
+
+	/* open coded 'struct timespec' */
+	time_t		wall_time_sec;
+	u32		wall_time_nsec;
+
+	int		sysctl_enabled;
+	struct timezone sys_tz;
+	struct { /* extract of a clocksource struct */
+		cycle_t (*vread)(void);
+		cycle_t	cycle_last;
+		cycle_t	mask;
+		u32	mult;
+		u32	shift;
+	} clock;
+	struct timespec wall_to_monotonic;
+};
+extern struct vsyscall_gtod_data __vsyscall_gtod_data
+__section_vsyscall_gtod_data;
+extern struct vsyscall_gtod_data vsyscall_gtod_data;
+
+#endif
Index: linux/arch/x86_64/vdso/voffset.h
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/voffset.h
@@ -0,0 +1 @@
+#define VDSO_TEXT_OFFSET 0x500
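
Note on the user-space side of this patch: the kernel hands the vDSO base to the process through the AT_SYSINFO_EHDR auxiliary vector entry (see ARCH_DLINFO above). A minimal sketch of how a program could find the mapping and repeat the "\177ELF" sanity check from init_vdso_vars() follows. It assumes a Linux C library that provides getauxval() in <sys/auxv.h> (glibc 2.16 or later), which postdates this patch, and vdso_has_elf_magic() is a hypothetical helper.

```c
#include <string.h>
#include <sys/auxv.h>	/* getauxval(), AT_SYSINFO_EHDR */

/* Returns 1 if the mapping starts with the ELF magic, 0 if it does not,
   and -1 if the kernel did not pass AT_SYSINFO_EHDR at all
   (for example when booted with vdso=0). */
static int vdso_has_elf_magic(void)
{
	unsigned long base = getauxval(AT_SYSINFO_EHDR);

	if (!base)
		return -1;
	return memcmp((const void *)base, "\177ELF", 4) == 0;
}
```

A dynamic linker would go on to parse the ELF headers at that address to resolve the exported __vdso_* symbols. That part is left out here.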

* [PATCH] [9/30] x86_64: Use symbolic CPU features in early CPUID check
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (7 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [10/30] x86_64: Drop -traditional for arch/x86_64/boot Andi Kleen
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Death to magic numbers!

Generated code is the same.
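
The claim can be checked at compile time. The sketch below restates the old numeric masks next to the symbolic ones and asserts that they are equal. The X86_FEATURE_* values are the CPUID bit numbers as found in asm/cpufeature.h. The OLD_ and NEW_ prefixes and masks_match() are hypothetical names, and M2/M4 carry extra parentheses here.

```c
/* Bit numbers as in asm/cpufeature.h: word 0 is CPUID 1 EDX,
   word 1 is CPUID 0x80000001 EDX. */
#define X86_FEATURE_FPU		(0*32+ 0)
#define X86_FEATURE_PSE		(0*32+ 3)
#define X86_FEATURE_TSC		(0*32+ 4)
#define X86_FEATURE_MSR		(0*32+ 5)
#define X86_FEATURE_PAE		(0*32+ 6)
#define X86_FEATURE_CX8		(0*32+ 8)
#define X86_FEATURE_PGE		(0*32+13)
#define X86_FEATURE_CMOV	(0*32+15)
#define X86_FEATURE_FXSR	(0*32+24)
#define X86_FEATURE_XMM		(0*32+25)
#define X86_FEATURE_XMM2	(0*32+26)
#define X86_FEATURE_LM		(1*32+29)

#define M(x) (1<<(x))
#define M2(a,b) (M(a)|M(b))
#define M4(a,b,c,d) (M(a)|M(b)|M(c)|M(d))

/* Masks before the patch. */
#define OLD_SSE_MASK ((1<<25)|(1<<26))
#define OLD_REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
			    (1<<13)|(1<<15)|(1<<24))
#define OLD_REQUIRED_MASK2 (1<<29)

/* Masks after the patch. */
#define NEW_SSE_MASK (M2(X86_FEATURE_XMM,X86_FEATURE_XMM2))
#define NEW_REQUIRED_MASK1 \
	(M4(X86_FEATURE_FPU,X86_FEATURE_PSE,X86_FEATURE_TSC,X86_FEATURE_MSR)|\
	 M4(X86_FEATURE_PAE,X86_FEATURE_CX8,X86_FEATURE_PGE,X86_FEATURE_CMOV)|\
	 M(X86_FEATURE_FXSR))
#define NEW_REQUIRED_MASK2 (M(X86_FEATURE_LM - 32))

_Static_assert(NEW_SSE_MASK == OLD_SSE_MASK, "SSE mask changed");
_Static_assert(NEW_REQUIRED_MASK1 == OLD_REQUIRED_MASK1, "mask 1 changed");
_Static_assert(NEW_REQUIRED_MASK2 == OLD_REQUIRED_MASK2, "mask 2 changed");

/* Run-time form of the same comparison. */
static int masks_match(void)
{
	return NEW_SSE_MASK == OLD_SSE_MASK &&
	       NEW_REQUIRED_MASK1 == OLD_REQUIRED_MASK1 &&
	       NEW_REQUIRED_MASK2 == OLD_REQUIRED_MASK2;
}
```

If any symbolic constant mapped to a different bit than the old literal, the build of this sketch would fail.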

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/verify_cpu.S |   23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

Index: linux/arch/x86_64/kernel/verify_cpu.S
===================================================================
--- linux.orig/arch/x86_64/kernel/verify_cpu.S
+++ linux/arch/x86_64/kernel/verify_cpu.S
@@ -30,18 +30,27 @@
  * 	appropriately. Either display a message or halt.
  */
 
-verify_cpu:
+#include <asm/cpufeature.h>
 
+verify_cpu:
 	pushfl				# Save caller passed flags
 	pushl	$0			# Kill any dangerous flags
 	popfl
 
-	/* minimum CPUID flags for x86-64 */
-	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
-#define SSE_MASK ((1<<25)|(1<<26))
-#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
-					   (1<<13)|(1<<15)|(1<<24))
-#define REQUIRED_MASK2 (1<<29)
+	/* minimum CPUID flags for x86-64 as defined by AMD */
+#define M(x) (1<<(x))
+#define M2(a,b) M(a)|M(b)
+#define M4(a,b,c,d) M(a)|M(b)|M(c)|M(d)
+
+#define SSE_MASK \
+	(M2(X86_FEATURE_XMM,X86_FEATURE_XMM2))
+#define REQUIRED_MASK1 \
+	(M4(X86_FEATURE_FPU,X86_FEATURE_PSE,X86_FEATURE_TSC,X86_FEATURE_MSR)|\
+	 M4(X86_FEATURE_PAE,X86_FEATURE_CX8,X86_FEATURE_PGE,X86_FEATURE_CMOV)|\
+	 M(X86_FEATURE_FXSR))
+#define REQUIRED_MASK2 \
+	(M(X86_FEATURE_LM - 32))
+
 	pushfl				# standard way to check for cpuid
 	popl	%eax
 	movl	%eax,%ebx

* [PATCH] [10/30] x86_64: Drop -traditional for arch/x86_64/boot
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (8 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [9/30] x86_64: Use symbolic CPU features in early CPUID check Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [11/30] i386: Drop -traditional in arch/i386/boot Andi Kleen
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Follows i386 and is a useful cleanup.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/boot/Makefile |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86_64/boot/Makefile
===================================================================
--- linux.orig/arch/x86_64/boot/Makefile
+++ linux/arch/x86_64/boot/Makefile
@@ -36,7 +36,7 @@ subdir-		:= compressed/	#Let make clean 
 # ---------------------------------------------------------------------------
 
 $(obj)/bzImage: IMAGE_OFFSET := 0x100000
-$(obj)/bzImage: EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
+$(obj)/bzImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
 $(obj)/bzImage: BUILDFLAGS   := -b
 
 quiet_cmd_image = BUILD   $@

* [PATCH] [11/30] i386: Drop -traditional in arch/i386/boot
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (9 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [10/30] x86_64: Drop -traditional for arch/x86_64/boot Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [12/30] i386: Verify important CPUID bits in real mode Andi Kleen
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Needed for the follow-on patch.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/boot/Makefile |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/boot/Makefile
===================================================================
--- linux.orig/arch/i386/boot/Makefile
+++ linux/arch/i386/boot/Makefile
@@ -36,9 +36,9 @@ HOSTCFLAGS_build.o := $(LINUXINCLUDE)
 # ---------------------------------------------------------------------------
 
 $(obj)/zImage:  IMAGE_OFFSET := 0x1000
-$(obj)/zImage:  EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK)
+$(obj)/zImage:  EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK)
 $(obj)/bzImage: IMAGE_OFFSET := 0x100000
-$(obj)/bzImage: EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
+$(obj)/bzImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
 $(obj)/bzImage: BUILDFLAGS   := -b
 
 quiet_cmd_image = BUILD   $@

* [PATCH] [12/30] i386: Verify important CPUID bits in real mode
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (10 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [11/30] i386: Drop -traditional in arch/i386/boot Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [13/30] i386: Evaluate constant cpu features at runtime Andi Kleen
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Check some CPUID bits that are needed for compiler-generated code early in
boot. While the system is still in real mode, before the VESA BIOS mode is
changed, it is still possible to display a visible error message on the screen.

Similar to x86-64.

Includes cleanups from Eric Biederman
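
The feature test at the end of verify_cpu is an and/xor pair: after masking EDX with REQUIRED_MASK1, the xor with the same mask leaves exactly the required bits that the CPU did not report, so a non-zero result means "bad". A minimal C sketch of that step follows. missing_features() is a hypothetical name, and the mask used in the usage note is an example configuration rather than the one any particular kernel is built with.

```c
#include <stdint.h>

/* Mirrors "andl $REQUIRED_MASK1,%edx; xorl $REQUIRED_MASK1,%edx":
   the result is the set of required bits missing from edx. */
static uint32_t missing_features(uint32_t edx, uint32_t required)
{
	return (edx & required) ^ required;
}
```

With required = (1<<6)|(1<<8)|(1<<15), i.e. PAE, CX8 and CMOV, an EDX value that lacks only bit 15 yields 1<<15 and the check fails. An EDX value with all three bits set yields 0 regardless of any other bits.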

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/Kconfig.cpu                |   22 ++++++++++-
 arch/i386/boot/setup.S               |   17 ++++++++
 arch/i386/kernel/verify_cpu.S        |   68 +++++++++++++++++++++++++++++++++++
 include/asm-i386/cpufeature.h        |    3 +
 include/asm-i386/required-features.h |   34 +++++++++++++++++
 5 files changed, 142 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/verify_cpu.S
===================================================================
--- /dev/null
+++ linux/arch/i386/kernel/verify_cpu.S
@@ -0,0 +1,65 @@
+/* Check if CPU has some minimum CPUID bits
+   This runs in 16bit mode so that the caller can still use the BIOS
+   to output errors on the screen */
+#include <asm/cpufeature.h>
+
+verify_cpu:
+	pushfl				# Save caller passed flags
+	pushl	$0			# Kill any dangerous flags
+	popfl
+
+#if CONFIG_X86_MINIMUM_CPU_MODEL >= 4
+	pushfl
+	orl	$(1<<18),(%esp)		# try setting AC
+	popfl
+	pushfl
+	popl    %eax
+	testl	$(1<<18),%eax
+	jz	bad
+#endif
+#if REQUIRED_MASK1 != 0
+	pushfl				# standard way to check for cpuid
+	popl	%eax
+	movl	%eax,%ebx
+	xorl	$0x200000,%eax
+	pushl	%eax
+	popfl
+	pushfl
+	popl	%eax
+	cmpl	%eax,%ebx
+	pushfl				# standard way to check for cpuid
+	popl	%eax
+	movl	%eax,%ebx
+	xorl	$0x200000,%eax
+	pushl	%eax
+	popfl
+	pushfl
+	popl	%eax
+	cmpl	%eax,%ebx
+	jz	bad			# REQUIRED_MASK1 != 0 requires CPUID
+
+	movl	$0x0,%eax		# See if cpuid 1 is implemented
+	cpuid
+	cmpl	$0x1,%eax
+	jb	bad			# no cpuid 1
+
+	movl    $0x1,%eax		# Does the cpu have what it takes
+	cpuid
+
+#if CONFIG_X86_MINIMUM_CPU_MODEL > 4
+#error	add proper model checking here
+#endif
+
+	andl	$REQUIRED_MASK1,%edx
+	xorl	$REQUIRED_MASK1,%edx
+	jnz	bad
+#endif /* REQUIRED_MASK1 */
+
+	popfl
+	xor	%eax,%eax
+	ret
+
+bad:
+	popfl
+	movl	$1,%eax
+	ret
Index: linux/arch/i386/Kconfig.cpu
===================================================================
--- linux.orig/arch/i386/Kconfig.cpu
+++ linux/arch/i386/Kconfig.cpu
@@ -240,14 +240,19 @@ config X86_L1_CACHE_SHIFT
 	default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
 	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7
 
+config X86_XADD
+	bool
+	depends on !M386
+	default y
+
 config RWSEM_GENERIC_SPINLOCK
 	bool
-	depends on M386
+	depends on !X86_XADD
 	default y
 
 config RWSEM_XCHGADD_ALGORITHM
 	bool
-	depends on !M386
+	depends on X86_XADD
 	default y
 
 config ARCH_HAS_ILOG2_U32
@@ -331,3 +336,16 @@ config X86_TSC
 	bool
 	depends on (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2) && !X86_NUMAQ
 	default y
+
+# this should be set for all -march=.. options where the compiler
+# generates cmov.
+config X86_CMOV
+	bool
+	depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7)
+	default y
+
+config X86_MINIMUM_CPU_MODEL
+	int
+	default "4" if X86_XADD || X86_CMPXCHG || X86_BSWAP
+	default "0"
+
Index: linux/arch/i386/boot/setup.S
===================================================================
--- linux.orig/arch/i386/boot/setup.S
+++ linux/arch/i386/boot/setup.S
@@ -302,7 +302,24 @@ good_sig:
 
 loader_panic_mess: .string "Wrong loader, giving up..."
 
+# check minimum cpuid
+# we do this here because it is the last place we can actually
+# show a user-visible error message. Later the video mode
+# might already be messed up.
 loader_ok:
+	call verify_cpu
+	testl  %eax,%eax
+	jz	cpu_ok
+	lea	cpu_panic_mess,%si
+	call	prtstr
+1:	jmp	1b
+
+cpu_panic_mess:
+	.asciz  "PANIC: CPU too old for this kernel."
+
+#include "../kernel/verify_cpu.S"
+
+cpu_ok:
 # Get memory size (extended mem, kB)
 
 	xorl	%eax, %eax
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -7,7 +7,10 @@
 #ifndef __ASM_I386_CPUFEATURE_H
 #define __ASM_I386_CPUFEATURE_H
 
+#ifndef __ASSEMBLY__
 #include <linux/bitops.h>
+#endif
+#include <asm/required-features.h>
 
 #define NCAPINTS	7	/* N 32-bit words worth of info */
 
Index: linux/include/asm-i386/required-features.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/required-features.h
@@ -0,0 +1,34 @@
+#ifndef _ASM_REQUIRED_FEATURES_H
+#define _ASM_REQUIRED_FEATURES_H 1
+
+/* Define the minimum CPUID feature set for the kernel.  These bits are
+   checked really early so that a visible error message can be displayed
+   before the kernel dies.  Only add word 0 bits here.
+
+   Some requirements that are not in CPUID are covered by
+   CONFIG_X86_MINIMUM_CPU_MODEL, which is checked too.
+
+   The real information is in arch/i386/Kconfig.cpu; this just converts
+   the CONFIGs into a bitmask.  */
+
+#ifdef CONFIG_X86_PAE
+#define NEED_PAE	(1<<X86_FEATURE_PAE)
+#else
+#define NEED_PAE	0
+#endif
+
+#ifdef CONFIG_X86_CMOV
+#define NEED_CMOV	(1<<X86_FEATURE_CMOV)
+#else
+#define NEED_CMOV	0
+#endif
+
+#ifdef CONFIG_X86_CMPXCHG64
+#define NEED_CMPXCHG64  (1<<X86_FEATURE_CX8)
+#else
+#define NEED_CMPXCHG64  0
+#endif
+
+#define REQUIRED_MASK1	(NEED_PAE|NEED_CMOV|NEED_CMPXCHG64)
+
+#endif

* [PATCH] [13/30] i386: Evaluate constant cpu features at runtime
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (11 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [12/30] i386: Verify important CPUID bits in real mode Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [14/30] i386: Implement alternative_io for i386 Andi Kleen
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Redefine cpu_has() to evaluate cpu features already checked in early 
boot at compile time.  This way the compiler might eliminate some dead code.
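
A minimal user-space sketch of the idea, assuming GCC or a compatible compiler for __builtin_constant_p: when the bit is a compile-time constant below 32 and is part of REQUIRED_MASK1, the macro folds to 1 and the capability array is never read. struct cpuinfo and test_bit32() are hypothetical stand-ins for the kernel's cpuinfo_x86 and test_bit(), and the REQUIRED_MASK1 value is an example configuration.

```c
#include <stdint.h>

#define X86_FEATURE_PAE		6
#define X86_FEATURE_CMOV	15

/* Assumption: a configuration in which only CMOV is required. */
#define REQUIRED_MASK1	(1UL << X86_FEATURE_CMOV)

/* Hypothetical stand-in for struct cpuinfo_x86. */
struct cpuinfo {
	uint32_t x86_capability[7];
};

/* Hypothetical stand-in for the kernel's test_bit(). */
static int test_bit32(int bit, const uint32_t *words)
{
	return (words[bit / 32] >> (bit % 32)) & 1;
}

/* Same shape as the new cpu_has(): constant, required bits fold to 1. */
#define cpu_has(c, bit)						\
	((__builtin_constant_p(bit) && (bit) < 32 &&		\
	  ((1UL << (bit)) & REQUIRED_MASK1)) ?			\
	 1 : test_bit32(bit, (c)->x86_capability))
```

With an all-zero capability array, cpu_has(&c, X86_FEATURE_CMOV) still evaluates to 1 because the early boot check already guarantees the bit, while cpu_has(&c, X86_FEATURE_PAE) falls through to the run-time test.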
Signed-off-by: Andi Kleen <ak@suse.de>

---
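For illustration only (not part of the patch): a minimal userspace sketch of the compile-time evaluation described in the changelog. It assumes GCC or Clang for __builtin_constant_p(); every sketch_* name is hypothetical, and the required mask here contains only CMOV as an example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCAPINTS      7
#define SKETCH_FEATURE_FPU   0	/* word 0, bit 0 */
#define SKETCH_FEATURE_CMOV 15	/* word 0, bit 15 */

/* Assumed example: pretend the early boot check already verified CMOV. */
#define SKETCH_REQUIRED_MASK (UINT32_C(1) << SKETCH_FEATURE_CMOV)

struct sketch_cpuinfo {
	uint32_t capability[SKETCH_NCAPINTS];
};

/* Runtime lookup, standing in for the kernel's test_bit(). */
static int sketch_test_bit(int bit, const uint32_t *words)
{
	assert(bit >= 0 && bit < SKETCH_NCAPINTS * 32);
	return (int)((words[bit / 32] >> (bit % 32)) & 1);
}

/*
 * Constant word 0 bits that are in the required mask fold to 1, so the
 * compiler can drop the runtime test and any code guarded by its failure.
 * The "& 31" only keeps the shift count in range for bits >= 32; those
 * are already rejected by the (bit) < 32 test.
 */
#define sketch_cpu_has(c, bit)						\
	((__builtin_constant_p(bit) && (bit) < 32 &&			\
	  ((UINT32_C(1) << ((bit) & 31)) & SKETCH_REQUIRED_MASK)) ?	\
		1 :							\
	 sketch_test_bit((bit), (c)->capability))
```

With this definition, sketch_cpu_has(&c, SKETCH_FEATURE_CMOV) is 1 regardless of the contents of c, while a feature outside the mask such as SKETCH_FEATURE_FPU still goes through the runtime bit test.
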
 include/asm-i386/cpufeature.h |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -106,8 +106,12 @@
 #define X86_FEATURE_LAHF_LM	(6*32+ 0) /* LAHF/SAHF in long mode */
 #define X86_FEATURE_CMP_LEGACY	(6*32+ 1) /* If yes HyperThreading not valid */
 
-#define cpu_has(c, bit)		test_bit(bit, (c)->x86_capability)
-#define boot_cpu_has(bit)	test_bit(bit, boot_cpu_data.x86_capability)
+#define cpu_has(c, bit)					\
+	((__builtin_constant_p(bit) && (bit) < 32 && 	\
+		(1UL << (bit)) & REQUIRED_MASK1) ?	\
+		1 : 					\
+	test_bit(bit, (c)->x86_capability))
+#define boot_cpu_has(bit)	cpu_has(&boot_cpu_data, bit)
 
 #define cpu_has_fpu		boot_cpu_has(X86_FEATURE_FPU)
 #define cpu_has_vme		boot_cpu_has(X86_FEATURE_VME)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [14/30] i386: Implement alternative_io for i386
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (12 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [13/30] i386: Evaluate constant cpu features at runtime Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [15/30] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 Andi Kleen
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Ported from x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>

---
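For illustration only (not part of the patch): a minimal userspace sketch of what one .altinstructions record emitted by this macro describes, and how such a record could be applied to a byte buffer. The struct layout mirrors the fields emitted by the asm above (label, new instruction, feature bit, sourcelen, replacementlen); the sketch_* names, the use of plain pointers, and padding with 0x90 (one-byte NOP) are assumptions of this sketch, not the kernel's actual patching code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NOP 0x90	/* one-byte x86 NOP used as padding */

/* Hypothetical in-memory mirror of one .altinstructions record. */
struct sketch_alt_instr {
	uint8_t *instr;			/* original instruction (661) */
	const uint8_t *replacement;	/* replacement instruction (663) */
	uint8_t feature;		/* feature bit number */
	uint8_t instrlen;		/* 662b-661b */
	uint8_t replacementlen;		/* 664f-663f */
};

/*
 * If the CPU has the feature, overwrite the original instruction with
 * the replacement and pad the remainder with NOPs.
 * Returns 1 if the code was patched, 0 if it was left alone.
 */
static int sketch_apply_alternative(const struct sketch_alt_instr *a,
				    const uint32_t *caps)
{
	if (!((caps[a->feature / 32] >> (a->feature % 32)) & 1))
		return 0;

	/* The replacement must fit into the space of the original. */
	assert(a->replacementlen <= a->instrlen);

	memcpy(a->instr, a->replacement, a->replacementlen);
	memset(a->instr + a->replacementlen, SKETCH_NOP,
	       a->instrlen - a->replacementlen);
	return 1;
}
```

The caps array is expected to hold enough 32-bit words to cover the feature number stored in the record.
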
 include/asm-i386/alternative.h |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

Index: linux/include/asm-i386/alternative.h
===================================================================
--- linux.orig/include/asm-i386/alternative.h
+++ linux/include/asm-i386/alternative.h
@@ -82,6 +82,21 @@ static inline void alternatives_smp_swit
 		      "663:\n\t" newinstr "\n664:\n"   /* replacement */\
 		      ".previous" :: "i" (feature), ##input)
 
+/* Like alternative_input, but with a single output argument */
+#define alternative_io(oldinstr, newinstr, feature, output, input...) \
+	asm volatile ("661:\n\t" oldinstr "\n662:\n"			\
+		      ".section .altinstructions,\"a\"\n"		\
+		      "  .align 4\n"					\
+		      "  .long 661b\n"            /* label */		\
+		      "  .long 663f\n"		  /* new instruction */	\
+		      "  .byte %c[feat]\n"        /* feature bit */	\
+		      "  .byte 662b-661b\n"       /* sourcelen */	\
+		      "  .byte 664f-663f\n"       /* replacementlen */	\
+		      ".previous\n"					\
+		      ".section .altinstr_replacement,\"ax\"\n"		\
+		      "663:\n\t" newinstr "\n664:\n"   /* replacement */ \
+		      ".previous" : output : [feat] "i" (feature), ##input)
+
 /*
  * Alternative inline assembly for SMP.
  *

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [15/30] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (13 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [14/30] i386: Implement alternative_io for i386 Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [16/30] i386: Add X86_FEATURE_RDTSCP Andi Kleen
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Syncs up with x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/cpu/intel.c  |    4 +++-
 include/asm-i386/cpufeature.h |    1 +
 include/asm-i386/tsc.h        |    4 ----
 3 files changed, 4 insertions(+), 5 deletions(-)

Index: linux/arch/i386/kernel/cpu/intel.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/intel.c
+++ linux/arch/i386/kernel/cpu/intel.c
@@ -188,8 +188,10 @@ static void __cpuinit init_intel(struct 
 	}
 #endif
 
-	if (c->x86 == 15)
+	if (c->x86 == 15) {
 		set_bit(X86_FEATURE_P4, c->x86_capability);
+		set_bit(X86_FEATURE_SYNC_RDTSC, c->x86_capability);
+	}
 	if (c->x86 == 6) 
 		set_bit(X86_FEATURE_P3, c->x86_capability);
 	if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -79,6 +79,7 @@
 #define X86_FEATURE_PEBS	(3*32+12)  /* Precise-Event Based Sampling */
 #define X86_FEATURE_BTS		(3*32+13)  /* Branch Trace Store */
 #define X86_FEATURE_LAPIC_TIMER_BROKEN (3*32+ 14) /* lapic timer broken in C1 */
+#define X86_FEATURE_SYNC_RDTSC	(3*32+15)  /* RDTSC synchronizes the CPU */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
 #define X86_FEATURE_XMM3	(4*32+ 0) /* Streaming SIMD Extensions-3 */
Index: linux/include/asm-i386/tsc.h
===================================================================
--- linux.orig/include/asm-i386/tsc.h
+++ linux/include/asm-i386/tsc.h
@@ -35,7 +35,6 @@ static inline cycles_t get_cycles(void)
 static __always_inline cycles_t get_cycles_sync(void)
 {
 	unsigned long long ret;
-#ifdef X86_FEATURE_SYNC_RDTSC
 	unsigned eax;
 
 	/*
@@ -44,9 +43,6 @@ static __always_inline cycles_t get_cycl
 	 */
 	alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
 			  "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
-#else
-	sync_core();
-#endif
 	rdtscll(ret);
 
 	return ret;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [16/30] i386: Add X86_FEATURE_RDTSCP
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (14 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [15/30] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [17/30] x86: Use RDTSCP for synchronous get_cycles if possible Andi Kleen
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Follows x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-i386/cpufeature.h |    1 +
 1 file changed, 1 insertion(+)

Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -52,6 +52,7 @@
 #define X86_FEATURE_MP		(1*32+19) /* MP Capable. */
 #define X86_FEATURE_NX		(1*32+20) /* Execute Disable */
 #define X86_FEATURE_MMXEXT	(1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_RDTSCP	(1*32+27) /* RDTSCP */
 #define X86_FEATURE_LM		(1*32+29) /* Long Mode (x86-64) */
 #define X86_FEATURE_3DNOWEXT	(1*32+30) /* AMD 3DNow! extensions */
 #define X86_FEATURE_3DNOW	(1*32+31) /* 3DNow! */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [17/30] x86: Use RDTSCP for synchronous get_cycles if possible
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (15 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [16/30] i386: Add X86_FEATURE_RDTSCP Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [18/30] x86_64: Don't enable NUMA for a single node in K8 NUMA scanning Andi Kleen
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: joerg.roedel, patches, linux-kernel


RDTSCP is already synchronous and doesn't need an explicit CPUID.
This is a little faster and, more importantly, avoids VMEXITs on hypervisors.

Original patch from Joerg Roedel, reworked by AK.
Also includes a miscompilation fix by Eric Biederman.

Cc: "Joerg Roedel" <joerg.roedel@amd.com>

Signed-off-by: Andi Kleen <ak@suse.de>

---
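For illustration only (not part of the patch): a portable sketch of the control flow this change gives get_cycles_sync(). The hardware is simulated by a hypothetical struct sketch_cpu, so no RDTSCP, CPUID or RDTSC instruction is actually executed; the point is only the "zero means the RDTSCP path was not patched in" convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU state the real code probes. */
struct sketch_cpu {
	int has_rdtscp;		/* X86_FEATURE_RDTSCP set */
	int rdtsc_is_sync;	/* X86_FEATURE_SYNC_RDTSC set */
	uint64_t tsc;		/* value the counter would return */
	unsigned cpuid_count;	/* how often the serializing CPUID ran */
};

static uint64_t sketch_get_cycles_sync(struct sketch_cpu *cpu)
{
	uint64_t ret = 0;	/* stays 0 if RDTSCP is not patched in */

	assert(cpu);

	/* First alternative: NOPs by default, RDTSCP if available. */
	if (cpu->has_rdtscp)
		ret = cpu->tsc;
	if (ret)
		return ret;

	/* Second alternative: CPUID unless RDTSC is known synchronous. */
	if (!cpu->rdtsc_is_sync)
		cpu->cpuid_count++;

	return cpu->tsc;	/* plain RDTSC */
}
```

One consequence visible in the sketch: a counter value of exactly zero from the RDTSCP path falls through to the slower path, which costs time but still returns a valid reading.
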
 include/asm-i386/tsc.h |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux/include/asm-i386/tsc.h
===================================================================
--- linux.orig/include/asm-i386/tsc.h
+++ linux/include/asm-i386/tsc.h
@@ -38,6 +38,15 @@ static __always_inline cycles_t get_cycl
 	unsigned eax;
 
 	/*
+  	 * Use RDTSCP if possible; it is guaranteed to be synchronous
+ 	 * and doesn't cause a VMEXIT on Hypervisors
+	 */
+	alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
+			 	 "=A" (ret), "0" (0ULL) : "ecx", "memory");
+	if (ret)
+		return ret;
+
+	/*
 	 * Don't do an additional sync on CPUs where we know
 	 * RDTSC is already synchronous:
 	 */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [18/30] x86_64: Don't enable NUMA for a single node in K8 NUMA scanning
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (16 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [17/30] x86: Use RDTSCP for synchronous get_cycles if possible Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [19/30] i386: Little cleanups in smpboot.c Andi Kleen
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


This was supposed to see the full memory on an ASUS A8SX motherboard
with 4GB RAM where the northbridge reports less memory, but it didn't
help there. But it's a reasonable change, so let's include it anyway.

Signed-off-by: Andi Kleen <ak@suse.de>

---
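For illustration only (not part of the patch): a minimal sketch of the node-count decoding and the new early exit. The function name is hypothetical; the bit extraction is copied from the hunk below, and the register value would come from read_pci_config() in the real code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the node count from the northbridge register value and
 * refuse to set up NUMA for a single node.
 * Returns the node count (2..16), or -1 when NUMA should not be used.
 */
static int sketch_k8_numnodes(uint32_t reg)
{
	int numnodes = (int)((reg >> 4) & 0xF) + 1;

	/* The field is 4 bits wide, so the count is always 1..16. */
	assert(numnodes >= 1 && numnodes <= 16);

	if (numnodes <= 1)
		return -1;
	return numnodes;
}
```
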
 arch/x86_64/mm/k8topology.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux/arch/x86_64/mm/k8topology.c
===================================================================
--- linux.orig/arch/x86_64/mm/k8topology.c
+++ linux/arch/x86_64/mm/k8topology.c
@@ -62,6 +62,8 @@ int __init k8_scan_nodes(unsigned long s
 
 	reg = read_pci_config(0, nb, 0, 0x60); 
 	numnodes = ((reg >> 4) & 0xF) + 1;
+	if (numnodes <= 1)
+		return -1;
 
 	printk(KERN_INFO "Number of nodes %d\n", numnodes);
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [19/30] i386: Little cleanups in smpboot.c
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (17 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [18/30] x86_64: Don't enable NUMA for a single node in K8 NUMA scanning Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [20/30] i386: Remove copy_*_user BUG_ONs for (size < 0) Andi Kleen
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


- Remove #if that is always set
- Fix warning

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/smpboot.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Index: linux/arch/i386/kernel/smpboot.c
===================================================================
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -516,7 +516,6 @@ static void unmap_cpu_to_logical_apicid(
 	unmap_cpu_to_node(cpu);
 }
 
-#if APIC_DEBUG
 static inline void __inquire_remote_apic(int apicid)
 {
 	int i, regs[] = { APIC_ID >> 4, APIC_LVR >> 4, APIC_SPIV >> 4 };
@@ -548,14 +547,13 @@ static inline void __inquire_remote_apic
 		switch (status) {
 		case APIC_ICR_RR_VALID:
 			status = apic_read(APIC_RRR);
-			printk("%08x\n", status);
+			printk("%lx\n", status);
 			break;
 		default:
 			printk("failed\n");
 		}
 	}
 }
-#endif
 
 #ifdef WAKE_SECONDARY_VIA_NMI
 /* 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [20/30] i386: Remove copy_*_user BUG_ONs for (size < 0)
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (18 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [19/30] i386: Little cleanups in smpboot.c Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [21/30] x86_64: Print type and size correctly for unknown compat ioctls Andi Kleen
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


access_ok() checks this case anyway, so there is no need to check twice.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/lib/usercopy.c |    7 -------
 1 file changed, 7 deletions(-)

Index: linux/arch/i386/lib/usercopy.c
===================================================================
--- linux.orig/arch/i386/lib/usercopy.c
+++ linux/arch/i386/lib/usercopy.c
@@ -716,7 +716,6 @@ do {									\
 unsigned long __copy_to_user_ll(void __user *to, const void *from,
 				unsigned long n)
 {
-	BUG_ON((long) n < 0);
 #ifndef CONFIG_X86_WP_WORKS_OK
 	if (unlikely(boot_cpu_data.wp_works_ok == 0) &&
 			((unsigned long )to) < TASK_SIZE) {
@@ -785,7 +784,6 @@ EXPORT_SYMBOL(__copy_to_user_ll);
 unsigned long __copy_from_user_ll(void *to, const void __user *from,
 					unsigned long n)
 {
-	BUG_ON((long)n < 0);
 	if (movsl_is_ok(to, from, n))
 		__copy_user_zeroing(to, from, n);
 	else
@@ -797,7 +795,6 @@ EXPORT_SYMBOL(__copy_from_user_ll);
 unsigned long __copy_from_user_ll_nozero(void *to, const void __user *from,
 					 unsigned long n)
 {
-	BUG_ON((long)n < 0);
 	if (movsl_is_ok(to, from, n))
 		__copy_user(to, from, n);
 	else
@@ -810,7 +807,6 @@ EXPORT_SYMBOL(__copy_from_user_ll_nozero
 unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from,
 					unsigned long n)
 {
-	BUG_ON((long)n < 0);
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if ( n > 64 && cpu_has_xmm2)
                 n = __copy_user_zeroing_intel_nocache(to, from, n);
@@ -825,7 +821,6 @@ unsigned long __copy_from_user_ll_nocach
 unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
 					unsigned long n)
 {
-	BUG_ON((long)n < 0);
 #ifdef CONFIG_X86_INTEL_USERCOPY
 	if ( n > 64 && cpu_has_xmm2)
                 n = __copy_user_intel_nocache(to, from, n);
@@ -853,7 +848,6 @@ unsigned long __copy_from_user_ll_nocach
 unsigned long
 copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	BUG_ON((long) n < 0);
 	if (access_ok(VERIFY_WRITE, to, n))
 		n = __copy_to_user(to, from, n);
 	return n;
@@ -879,7 +873,6 @@ EXPORT_SYMBOL(copy_to_user);
 unsigned long
 copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-	BUG_ON((long) n < 0);
 	if (access_ok(VERIFY_READ, from, n))
 		n = __copy_from_user(to, from, n);
 	else

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [21/30] x86_64: Print type and size correctly for unknown compat ioctls
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (19 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [20/30] i386: Remove copy_*_user BUG_ONs for (size < 0) Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [22/30] x86_64: Remove CONFIG_REORDER Andi Kleen
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Signed-off-by: Andi Kleen <ak@suse.de>

---
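For illustration only (not part of the patch): a userspace sketch of the type and size decoding the new message uses. The SKETCH_IOC_* constants are assumed to match the generic ioctl number layout used on x86 (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); the function name and the shortened message format are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror _IOC_TYPESHIFT/_IOC_TYPEMASK and
 * _IOC_SIZESHIFT/_IOC_SIZEMASK of the generic layout. */
#define SKETCH_IOC_TYPESHIFT	8
#define SKETCH_IOC_TYPEMASK	0xffu
#define SKETCH_IOC_SIZESHIFT	16
#define SKETCH_IOC_SIZEMASK	0x3fffu

/* Format "cmd(xxxxxxxx){t:<type>;sz:<size>}" into out. */
static int sketch_format_ioctl(char *out, size_t len, unsigned int cmd)
{
	unsigned int type = (cmd >> SKETCH_IOC_TYPESHIFT) & SKETCH_IOC_TYPEMASK;
	unsigned int size = (cmd >> SKETCH_IOC_SIZESHIFT) & SKETCH_IOC_SIZEMASK;
	char buf[8];

	assert(out != NULL && len > 0);

	/* Show the type as a quoted character if printable, else as hex. */
	if (isprint((int)type))
		snprintf(buf, sizeof(buf), "'%c'", (int)type);
	else
		snprintf(buf, sizeof(buf), "%02x", type);

	return snprintf(out, len, "cmd(%08x){t:%s;sz:%u}", cmd, buf, size);
}
```

For example, the command value 0x80086601 decodes to type 'f' and size 8, whereas the old expression (cmd>>24) & 0x3f would have picked bits out of the size and direction fields instead of the type.
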
 fs/compat.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux/fs/compat.c
===================================================================
--- linux.orig/fs/compat.c
+++ linux/fs/compat.c
@@ -371,13 +371,14 @@ static void compat_ioctl_error(struct fi
 			fn = "?";
 	}
 
-	sprintf(buf,"'%c'", (cmd>>24) & 0x3f);
+	sprintf(buf,"'%c'", (cmd>>_IOC_TYPESHIFT) & _IOC_TYPEMASK);
 	if (!isprint(buf[1]))
 		sprintf(buf, "%02x", buf[1]);
 	compat_printk("ioctl32(%s:%d): Unknown cmd fd(%d) "
-			"cmd(%08x){%s} arg(%08x) on %s\n",
+			"cmd(%08x){t:%s;sz:%u} arg(%08x) on %s\n",
 			current->comm, current->pid,
 			(int)fd, (unsigned int)cmd, buf,
+			(cmd >> _IOC_SIZESHIFT) & _IOC_SIZEMASK,
 			(unsigned int)arg, fn);
 
 	if (path)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [22/30] x86_64: Remove CONFIG_REORDER
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (20 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [21/30] x86_64: Print type and size correctly for unknown compat ioctls Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [23/30] x86_64: Share identical video.S between i386 and x86-64 Andi Kleen
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: arjan, patches, linux-kernel


The option never worked well and the functionlist wasn't well maintained.
Also it made the build very slow on many binutils versions.

So just remove it.

Cc: arjan@linux.intel.com
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/Kconfig              |    8 
 arch/x86_64/Makefile             |    1 
 arch/x86_64/kernel/functionlist  | 1284 ---------------------------------------
 arch/x86_64/kernel/vmlinux.lds.S |    3 
 4 files changed, 1296 deletions(-)

Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -660,14 +660,6 @@ config CC_STACKPROTECTOR_ALL
 
 source kernel/Kconfig.hz
 
-config REORDER
-	bool "Function reordering"
-	default n
-	help
-         This option enables the toolchain to reorder functions for a more 
-         optimal TLB usage. If you have pretty much any version of binutils, 
-	 this can increase your kernel build time by roughly one minute.
-
 config K8_NB
 	def_bool y
 	depends on AGP_AMD64 || IOMMU || (PCI && NUMA)
Index: linux/arch/x86_64/Makefile
===================================================================
--- linux.orig/arch/x86_64/Makefile
+++ linux/arch/x86_64/Makefile
@@ -40,7 +40,6 @@ cflags-y += -m64
 cflags-y += -mno-red-zone
 cflags-y += -mcmodel=kernel
 cflags-y += -pipe
-cflags-kernel-$(CONFIG_REORDER) += -ffunction-sections
 cflags-y += -Wno-sign-compare
 cflags-y += -fno-asynchronous-unwind-tables
 ifneq ($(CONFIG_DEBUG_INFO),y)
Index: linux/arch/x86_64/kernel/functionlist
===================================================================
--- linux.orig/arch/x86_64/kernel/functionlist
+++ /dev/null
@@ -1,1284 +0,0 @@
-*(.text.flush_thread)
-*(.text.check_poison_obj)
-*(.text.copy_page)
-*(.text.__set_personality)
-*(.text.gart_map_sg)
-*(.text.kmem_cache_free)
-*(.text.find_get_page)
-*(.text._raw_spin_lock)
-*(.text.ide_outb)
-*(.text.unmap_vmas)
-*(.text.copy_page_range)
-*(.text.kprobe_handler)
-*(.text.__handle_mm_fault)
-*(.text.__d_lookup)
-*(.text.copy_user_generic)
-*(.text.__link_path_walk)
-*(.text.get_page_from_freelist)
-*(.text.kmem_cache_alloc)
-*(.text.drive_cmd_intr)
-*(.text.ia32_setup_sigcontext)
-*(.text.huge_pte_offset)
-*(.text.do_page_fault)
-*(.text.page_remove_rmap)
-*(.text.release_pages)
-*(.text.ide_end_request)
-*(.text.__mutex_lock_slowpath)
-*(.text.__find_get_block)
-*(.text.kfree)
-*(.text.vfs_read)
-*(.text._raw_spin_unlock)
-*(.text.free_hot_cold_page)
-*(.text.fget_light)
-*(.text.schedule)
-*(.text.memcmp)
-*(.text.touch_atime)
-*(.text.__might_sleep)
-*(.text.__down_read_trylock)
-*(.text.arch_pick_mmap_layout)
-*(.text.find_vma)
-*(.text.__make_request)
-*(.text.do_generic_mapping_read)
-*(.text.mutex_lock_interruptible)
-*(.text.__generic_file_aio_read)
-*(.text._atomic_dec_and_lock)
-*(.text.__wake_up_bit)
-*(.text.add_to_page_cache)
-*(.text.cache_alloc_debugcheck_after)
-*(.text.vm_normal_page)
-*(.text.mutex_debug_check_no_locks_freed)
-*(.text.net_rx_action)
-*(.text.__find_first_zero_bit)
-*(.text.put_page)
-*(.text._raw_read_lock)
-*(.text.__delay)
-*(.text.dnotify_parent)
-*(.text.do_path_lookup)
-*(.text.do_sync_read)
-*(.text.do_lookup)
-*(.text.bit_waitqueue)
-*(.text.file_read_actor)
-*(.text.strncpy_from_user)
-*(.text.__pagevec_lru_add_active)
-*(.text.fget)
-*(.text.dput)
-*(.text.__strnlen_user)
-*(.text.inotify_inode_queue_event)
-*(.text.rw_verify_area)
-*(.text.ide_intr)
-*(.text.inotify_dentry_parent_queue_event)
-*(.text.permission)
-*(.text.memscan)
-*(.text.hpet_rtc_interrupt)
-*(.text.do_mmap_pgoff)
-*(.text.current_fs_time)
-*(.text.vfs_getattr)
-*(.text.kmem_flagcheck)
-*(.text.mark_page_accessed)
-*(.text.free_pages_and_swap_cache)
-*(.text.generic_fillattr)
-*(.text.__block_prepare_write)
-*(.text.__set_page_dirty_nobuffers)
-*(.text.link_path_walk)
-*(.text.find_get_pages_tag)
-*(.text.ide_do_request)
-*(.text.__alloc_pages)
-*(.text.generic_permission)
-*(.text.mod_page_state_offset)
-*(.text.free_pgd_range)
-*(.text.generic_file_buffered_write)
-*(.text.number)
-*(.text.ide_do_rw_disk)
-*(.text.__brelse)
-*(.text.__mod_page_state_offset)
-*(.text.rotate_reclaimable_page)
-*(.text.find_vma_prepare)
-*(.text.find_vma_prev)
-*(.text.lru_cache_add_active)
-*(.text.__kmalloc_track_caller)
-*(.text.smp_invalidate_interrupt)
-*(.text.handle_IRQ_event)
-*(.text.__find_get_block_slow)
-*(.text.do_wp_page)
-*(.text.do_select)
-*(.text.set_user_nice)
-*(.text.sys_read)
-*(.text.do_munmap)
-*(.text.csum_partial)
-*(.text.__do_softirq)
-*(.text.may_open)
-*(.text.getname)
-*(.text.get_empty_filp)
-*(.text.__fput)
-*(.text.remove_mapping)
-*(.text.filp_ctor)
-*(.text.poison_obj)
-*(.text.unmap_region)
-*(.text.test_set_page_writeback)
-*(.text.__do_page_cache_readahead)
-*(.text.sock_def_readable)
-*(.text.ide_outl)
-*(.text.shrink_zone)
-*(.text.rb_insert_color)
-*(.text.get_request)
-*(.text.sys_pread64)
-*(.text.spin_bug)
-*(.text.ide_outsl)
-*(.text.mask_and_ack_8259A)
-*(.text.filemap_nopage)
-*(.text.page_add_file_rmap)
-*(.text.find_lock_page)
-*(.text.tcp_poll)
-*(.text.__mark_inode_dirty)
-*(.text.file_ra_state_init)
-*(.text.generic_file_llseek)
-*(.text.__pagevec_lru_add)
-*(.text.page_cache_readahead)
-*(.text.n_tty_receive_buf)
-*(.text.zonelist_policy)
-*(.text.vma_adjust)
-*(.text.test_clear_page_dirty)
-*(.text.sync_buffer)
-*(.text.do_exit)
-*(.text.__bitmap_weight)
-*(.text.alloc_pages_current)
-*(.text.get_unused_fd)
-*(.text.zone_watermark_ok)
-*(.text.cpuset_update_task_memory_state)
-*(.text.__bitmap_empty)
-*(.text.sys_munmap)
-*(.text.__inode_dir_notify)
-*(.text.__generic_file_aio_write_nolock)
-*(.text.__pte_alloc)
-*(.text.sys_select)
-*(.text.vm_acct_memory)
-*(.text.vfs_write)
-*(.text.__lru_add_drain)
-*(.text.prio_tree_insert)
-*(.text.generic_file_aio_read)
-*(.text.vma_merge)
-*(.text.block_write_full_page)
-*(.text.__page_set_anon_rmap)
-*(.text.apic_timer_interrupt)
-*(.text.release_console_sem)
-*(.text.sys_write)
-*(.text.sys_brk)
-*(.text.dup_mm)
-*(.text.read_current_timer)
-*(.text.ll_rw_block)
-*(.text.blk_rq_map_sg)
-*(.text.dbg_userword)
-*(.text.__block_commit_write)
-*(.text.cache_grow)
-*(.text.copy_strings)
-*(.text.release_task)
-*(.text.do_sync_write)
-*(.text.unlock_page)
-*(.text.load_elf_binary)
-*(.text.__follow_mount)
-*(.text.__getblk)
-*(.text.do_sys_open)
-*(.text.current_kernel_time)
-*(.text.call_rcu)
-*(.text.write_chan)
-*(.text.vsnprintf)
-*(.text.dummy_inode_setsecurity)
-*(.text.submit_bh)
-*(.text.poll_freewait)
-*(.text.bio_alloc_bioset)
-*(.text.skb_clone)
-*(.text.page_waitqueue)
-*(.text.__mutex_lock_interruptible_slowpath)
-*(.text.get_index)
-*(.text.csum_partial_copy_generic)
-*(.text.bad_range)
-*(.text.remove_vma)
-*(.text.cp_new_stat)
-*(.text.alloc_arraycache)
-*(.text.test_clear_page_writeback)
-*(.text.strsep)
-*(.text.open_namei)
-*(.text._raw_read_unlock)
-*(.text.get_vma_policy)
-*(.text.__down_write_trylock)
-*(.text.find_get_pages)
-*(.text.tcp_rcv_established)
-*(.text.generic_make_request)
-*(.text.__block_write_full_page)
-*(.text.cfq_set_request)
-*(.text.sys_inotify_init)
-*(.text.split_vma)
-*(.text.__mod_timer)
-*(.text.get_options)
-*(.text.vma_link)
-*(.text.mpage_writepages)
-*(.text.truncate_complete_page)
-*(.text.tcp_recvmsg)
-*(.text.sigprocmask)
-*(.text.filemap_populate)
-*(.text.sys_close)
-*(.text.inotify_dev_queue_event)
-*(.text.do_task_stat)
-*(.text.__dentry_open)
-*(.text.unlink_file_vma)
-*(.text.__pollwait)
-*(.text.packet_rcv_spkt)
-*(.text.drop_buffers)
-*(.text.free_pgtables)
-*(.text.generic_file_direct_write)
-*(.text.copy_process)
-*(.text.netif_receive_skb)
-*(.text.dnotify_flush)
-*(.text.print_bad_pte)
-*(.text.anon_vma_unlink)
-*(.text.sys_mprotect)
-*(.text.sync_sb_inodes)
-*(.text.find_inode_fast)
-*(.text.dummy_inode_readlink)
-*(.text.putname)
-*(.text.init_smp_flush)
-*(.text.dbg_redzone2)
-*(.text.sk_run_filter)
-*(.text.may_expand_vm)
-*(.text.generic_file_aio_write)
-*(.text.find_next_zero_bit)
-*(.text.file_kill)
-*(.text.audit_getname)
-*(.text.arch_unmap_area_topdown)
-*(.text.alloc_page_vma)
-*(.text.tcp_transmit_skb)
-*(.text.rb_next)
-*(.text.dbg_redzone1)
-*(.text.generic_file_mmap)
-*(.text.vfs_fstat)
-*(.text.sys_time)
-*(.text.page_lock_anon_vma)
-*(.text.get_unmapped_area)
-*(.text.remote_llseek)
-*(.text.__up_read)
-*(.text.fd_install)
-*(.text.eventpoll_init_file)
-*(.text.dma_alloc_coherent)
-*(.text.create_empty_buffers)
-*(.text.__mutex_unlock_slowpath)
-*(.text.dup_fd)
-*(.text.d_alloc)
-*(.text.tty_ldisc_try)
-*(.text.sys_stime)
-*(.text.__rb_rotate_right)
-*(.text.d_validate)
-*(.text.rb_erase)
-*(.text.path_release)
-*(.text.memmove)
-*(.text.invalidate_complete_page)
-*(.text.clear_inode)
-*(.text.cache_estimate)
-*(.text.alloc_buffer_head)
-*(.text.smp_call_function_interrupt)
-*(.text.flush_tlb_others)
-*(.text.file_move)
-*(.text.balance_dirty_pages_ratelimited)
-*(.text.vma_prio_tree_add)
-*(.text.timespec_trunc)
-*(.text.mempool_alloc)
-*(.text.iget_locked)
-*(.text.d_alloc_root)
-*(.text.cpuset_populate_dir)
-*(.text.anon_vma_prepare)
-*(.text.sys_newstat)
-*(.text.alloc_page_interleave)
-*(.text.__path_lookup_intent_open)
-*(.text.__pagevec_free)
-*(.text.inode_init_once)
-*(.text.free_vfsmnt)
-*(.text.__user_walk_fd)
-*(.text.cfq_idle_slice_timer)
-*(.text.sys_mmap)
-*(.text.sys_llseek)
-*(.text.prio_tree_remove)
-*(.text.filp_close)
-*(.text.file_permission)
-*(.text.vma_prio_tree_remove)
-*(.text.tcp_ack)
-*(.text.nameidata_to_filp)
-*(.text.sys_lseek)
-*(.text.percpu_counter_mod)
-*(.text.igrab)
-*(.text.__bread)
-*(.text.alloc_inode)
-*(.text.filldir)
-*(.text.__rb_rotate_left)
-*(.text.irq_affinity_write_proc)
-*(.text.init_request_from_bio)
-*(.text.find_or_create_page)
-*(.text.tty_poll)
-*(.text.tcp_sendmsg)
-*(.text.ide_wait_stat)
-*(.text.free_buffer_head)
-*(.text.flush_signal_handlers)
-*(.text.tcp_v4_rcv)
-*(.text.nr_blockdev_pages)
-*(.text.locks_remove_flock)
-*(.text.__iowrite32_copy)
-*(.text.do_filp_open)
-*(.text.try_to_release_page)
-*(.text.page_add_new_anon_rmap)
-*(.text.kmem_cache_size)
-*(.text.eth_type_trans)
-*(.text.try_to_free_buffers)
-*(.text.schedule_tail)
-*(.text.proc_lookup)
-*(.text.no_llseek)
-*(.text.kfree_skbmem)
-*(.text.do_wait)
-*(.text.do_mpage_readpage)
-*(.text.vfs_stat_fd)
-*(.text.tty_write)
-*(.text.705)
-*(.text.sync_page)
-*(.text.__remove_shared_vm_struct)
-*(.text.__kfree_skb)
-*(.text.sock_poll)
-*(.text.get_request_wait)
-*(.text.do_sigaction)
-*(.text.do_brk)
-*(.text.tcp_event_data_recv)
-*(.text.read_chan)
-*(.text.pipe_writev)
-*(.text.__emul_lookup_dentry)
-*(.text.rtc_get_rtc_time)
-*(.text.print_objinfo)
-*(.text.file_update_time)
-*(.text.do_signal)
-*(.text.disable_8259A_irq)
-*(.text.blk_queue_bounce)
-*(.text.__anon_vma_link)
-*(.text.__vma_link)
-*(.text.vfs_rename)
-*(.text.sys_newlstat)
-*(.text.sys_newfstat)
-*(.text.sys_mknod)
-*(.text.__show_regs)
-*(.text.iput)
-*(.text.get_signal_to_deliver)
-*(.text.flush_tlb_page)
-*(.text.debug_mutex_wake_waiter)
-*(.text.copy_thread)
-*(.text.clear_page_dirty_for_io)
-*(.text.buffer_io_error)
-*(.text.vfs_permission)
-*(.text.truncate_inode_pages_range)
-*(.text.sys_recvfrom)
-*(.text.remove_suid)
-*(.text.mark_buffer_dirty)
-*(.text.local_bh_enable)
-*(.text.get_zeroed_page)
-*(.text.get_vmalloc_info)
-*(.text.flush_old_exec)
-*(.text.dummy_inode_permission)
-*(.text.__bio_add_page)
-*(.text.prio_tree_replace)
-*(.text.notify_change)
-*(.text.mntput_no_expire)
-*(.text.fput)
-*(.text.__end_that_request_first)
-*(.text.wake_up_bit)
-*(.text.unuse_mm)
-*(.text.shrink_icache_memory)
-*(.text.sched_balance_self)
-*(.text.__pmd_alloc)
-*(.text.pipe_poll)
-*(.text.normal_poll)
-*(.text.__free_pages)
-*(.text.follow_mount)
-*(.text.cdrom_start_packet_command)
-*(.text.blk_recount_segments)
-*(.text.bio_put)
-*(.text.__alloc_skb)
-*(.text.__wake_up)
-*(.text.vm_stat_account)
-*(.text.sys_fcntl)
-*(.text.sys_fadvise64)
-*(.text._raw_write_unlock)
-*(.text.__pud_alloc)
-*(.text.alloc_page_buffers)
-*(.text.vfs_llseek)
-*(.text.sockfd_lookup)
-*(.text._raw_write_lock)
-*(.text.put_compound_page)
-*(.text.prune_dcache)
-*(.text.pipe_readv)
-*(.text.mempool_free)
-*(.text.make_ahead_window)
-*(.text.lru_add_drain)
-*(.text.constant_test_bit)
-*(.text.__clear_user)
-*(.text.arch_unmap_area)
-*(.text.anon_vma_link)
-*(.text.sys_chroot)
-*(.text.setup_arg_pages)
-*(.text.radix_tree_preload)
-*(.text.init_rwsem)
-*(.text.generic_osync_inode)
-*(.text.generic_delete_inode)
-*(.text.do_sys_poll)
-*(.text.dev_queue_xmit)
-*(.text.default_llseek)
-*(.text.__writeback_single_inode)
-*(.text.vfs_ioctl)
-*(.text.__up_write)
-*(.text.unix_poll)
-*(.text.sys_rt_sigprocmask)
-*(.text.sock_recvmsg)
-*(.text.recalc_bh_state)
-*(.text.__put_unused_fd)
-*(.text.process_backlog)
-*(.text.locks_remove_posix)
-*(.text.lease_modify)
-*(.text.expand_files)
-*(.text.end_buffer_read_nobh)
-*(.text.d_splice_alias)
-*(.text.debug_mutex_init_waiter)
-*(.text.copy_from_user)
-*(.text.cap_vm_enough_memory)
-*(.text.show_vfsmnt)
-*(.text.release_sock)
-*(.text.pfifo_fast_enqueue)
-*(.text.half_md4_transform)
-*(.text.fs_may_remount_ro)
-*(.text.do_fork)
-*(.text.copy_hugetlb_page_range)
-*(.text.cache_free_debugcheck)
-*(.text.__tcp_select_window)
-*(.text.task_handoff_register)
-*(.text.sys_open)
-*(.text.strlcpy)
-*(.text.skb_copy_datagram_iovec)
-*(.text.set_up_list3s)
-*(.text.release_open_intent)
-*(.text.qdisc_restart)
-*(.text.n_tty_chars_in_buffer)
-*(.text.inode_change_ok)
-*(.text.__downgrade_write)
-*(.text.debug_mutex_unlock)
-*(.text.add_timer_randomness)
-*(.text.sock_common_recvmsg)
-*(.text.set_bh_page)
-*(.text.printk_lock)
-*(.text.path_release_on_umount)
-*(.text.ip_output)
-*(.text.ide_build_dmatable)
-*(.text.__get_user_8)
-*(.text.end_buffer_read_sync)
-*(.text.__d_path)
-*(.text.d_move)
-*(.text.del_timer)
-*(.text.constant_test_bit)
-*(.text.blockable_page_cache_readahead)
-*(.text.tty_read)
-*(.text.sys_readlink)
-*(.text.sys_faccessat)
-*(.text.read_swap_cache_async)
-*(.text.pty_write_room)
-*(.text.page_address_in_vma)
-*(.text.kthread)
-*(.text.cfq_exit_io_context)
-*(.text.__tcp_push_pending_frames)
-*(.text.sys_pipe)
-*(.text.submit_bio)
-*(.text.pid_revalidate)
-*(.text.page_referenced_file)
-*(.text.lock_sock)
-*(.text.get_page_state_node)
-*(.text.generic_block_bmap)
-*(.text.do_setitimer)
-*(.text.dev_queue_xmit_nit)
-*(.text.copy_from_read_buf)
-*(.text.__const_udelay)
-*(.text.console_conditional_schedule)
-*(.text.wake_up_new_task)
-*(.text.wait_for_completion_interruptible)
-*(.text.tcp_rcv_rtt_update)
-*(.text.sys_mlockall)
-*(.text.set_fs_altroot)
-*(.text.schedule_timeout)
-*(.text.nr_free_pagecache_pages)
-*(.text.nf_iterate)
-*(.text.mapping_tagged)
-*(.text.ip_queue_xmit)
-*(.text.ip_local_deliver)
-*(.text.follow_page)
-*(.text.elf_map)
-*(.text.dummy_file_permission)
-*(.text.dispose_list)
-*(.text.dentry_open)
-*(.text.dentry_iput)
-*(.text.bio_alloc)
-*(.text.wait_on_page_bit)
-*(.text.vfs_readdir)
-*(.text.vfs_lstat)
-*(.text.seq_escape)
-*(.text.__posix_lock_file)
-*(.text.mm_release)
-*(.text.kref_put)
-*(.text.ip_rcv)
-*(.text.__iget)
-*(.text.free_pages)
-*(.text.find_mergeable_anon_vma)
-*(.text.find_extend_vma)
-*(.text.dummy_inode_listsecurity)
-*(.text.bio_add_page)
-*(.text.__vm_enough_memory)
-*(.text.vfs_stat)
-*(.text.tty_paranoia_check)
-*(.text.tcp_read_sock)
-*(.text.tcp_data_queue)
-*(.text.sys_uname)
-*(.text.sys_renameat)
-*(.text.__strncpy_from_user)
-*(.text.__mutex_init)
-*(.text.__lookup_hash)
-*(.text.kref_get)
-*(.text.ip_route_input)
-*(.text.__insert_inode_hash)
-*(.text.do_sock_write)
-*(.text.blk_done_softirq)
-*(.text.__wake_up_sync)
-*(.text.__vma_link_rb)
-*(.text.tty_ioctl)
-*(.text.tracesys)
-*(.text.sys_getdents)
-*(.text.sys_dup)
-*(.text.stub_execve)
-*(.text.sha_transform)
-*(.text.radix_tree_tag_clear)
-*(.text.put_unused_fd)
-*(.text.put_files_struct)
-*(.text.mpage_readpages)
-*(.text.may_delete)
-*(.text.kmem_cache_create)
-*(.text.ip_mc_output)
-*(.text.interleave_nodes)
-*(.text.groups_search)
-*(.text.generic_drop_inode)
-*(.text.generic_commit_write)
-*(.text.fcntl_setlk)
-*(.text.exit_mmap)
-*(.text.end_page_writeback)
-*(.text.__d_rehash)
-*(.text.debug_mutex_free_waiter)
-*(.text.csum_ipv6_magic)
-*(.text.count)
-*(.text.cleanup_rbuf)
-*(.text.check_spinlock_acquired_node)
-*(.text.can_vma_merge_after)
-*(.text.bio_endio)
-*(.text.alloc_pidmap)
-*(.text.write_ldt)
-*(.text.vmtruncate_range)
-*(.text.vfs_create)
-*(.text.__user_walk)
-*(.text.update_send_head)
-*(.text.unmap_underlying_metadata)
-*(.text.tty_ldisc_deref)
-*(.text.tcp_setsockopt)
-*(.text.tcp_send_ack)
-*(.text.sys_pause)
-*(.text.sys_gettimeofday)
-*(.text.sync_dirty_buffer)
-*(.text.strncmp)
-*(.text.release_posix_timer)
-*(.text.proc_file_read)
-*(.text.prepare_to_wait)
-*(.text.locks_mandatory_locked)
-*(.text.interruptible_sleep_on_timeout)
-*(.text.inode_sub_bytes)
-*(.text.in_group_p)
-*(.text.hrtimer_try_to_cancel)
-*(.text.filldir64)
-*(.text.fasync_helper)
-*(.text.dummy_sb_pivotroot)
-*(.text.d_lookup)
-*(.text.d_instantiate)
-*(.text.__d_find_alias)
-*(.text.cpu_idle_wait)
-*(.text.cond_resched_lock)
-*(.text.chown_common)
-*(.text.blk_congestion_wait)
-*(.text.activate_page)
-*(.text.unlock_buffer)
-*(.text.tty_wakeup)
-*(.text.tcp_v4_do_rcv)
-*(.text.tcp_current_mss)
-*(.text.sys_openat)
-*(.text.sys_fchdir)
-*(.text.strnlen_user)
-*(.text.strnlen)
-*(.text.strchr)
-*(.text.sock_common_getsockopt)
-*(.text.skb_checksum)
-*(.text.remove_wait_queue)
-*(.text.rb_replace_node)
-*(.text.radix_tree_node_ctor)
-*(.text.pty_chars_in_buffer)
-*(.text.profile_hit)
-*(.text.prio_tree_left)
-*(.text.pgd_clear_bad)
-*(.text.pfifo_fast_dequeue)
-*(.text.page_referenced)
-*(.text.open_exec)
-*(.text.mmput)
-*(.text.mm_init)
-*(.text.__ide_dma_off_quietly)
-*(.text.ide_dma_intr)
-*(.text.hrtimer_start)
-*(.text.get_io_context)
-*(.text.__get_free_pages)
-*(.text.find_first_zero_bit)
-*(.text.file_free_rcu)
-*(.text.dummy_socket_sendmsg)
-*(.text.do_unlinkat)
-*(.text.do_arch_prctl)
-*(.text.destroy_inode)
-*(.text.can_vma_merge_before)
-*(.text.block_sync_page)
-*(.text.block_prepare_write)
-*(.text.bio_init)
-*(.text.arch_ptrace)
-*(.text.wake_up_inode)
-*(.text.wait_on_retry_sync_kiocb)
-*(.text.vma_prio_tree_next)
-*(.text.tcp_rcv_space_adjust)
-*(.text.__tcp_ack_snd_check)
-*(.text.sys_utime)
-*(.text.sys_recvmsg)
-*(.text.sys_mremap)
-*(.text.sys_bdflush)
-*(.text.sleep_on)
-*(.text.set_page_dirty_lock)
-*(.text.seq_path)
-*(.text.schedule_timeout_interruptible)
-*(.text.sched_fork)
-*(.text.rt_run_flush)
-*(.text.profile_munmap)
-*(.text.prepare_binprm)
-*(.text.__pagevec_release_nonlru)
-*(.text.m_show)
-*(.text.lookup_mnt)
-*(.text.__lookup_mnt)
-*(.text.lock_timer_base)
-*(.text.is_subdir)
-*(.text.invalidate_bh_lru)
-*(.text.init_buffer_head)
-*(.text.ifind_fast)
-*(.text.ide_dma_start)
-*(.text.__get_page_state)
-*(.text.flock_to_posix_lock)
-*(.text.__find_symbol)
-*(.text.do_futex)
-*(.text.do_execve)
-*(.text.dirty_writeback_centisecs_handler)
-*(.text.dev_watchdog)
-*(.text.can_share_swap_page)
-*(.text.blkdev_put)
-*(.text.bio_get_nr_vecs)
-*(.text.xfrm_compile_policy)
-*(.text.vma_prio_tree_insert)
-*(.text.vfs_lstat_fd)
-*(.text.__user_path_lookup_open)
-*(.text.thread_return)
-*(.text.tcp_send_delayed_ack)
-*(.text.sock_def_error_report)
-*(.text.shrink_slab)
-*(.text.serial_out)
-*(.text.seq_read)
-*(.text.secure_ip_id)
-*(.text.search_binary_handler)
-*(.text.proc_pid_unhash)
-*(.text.pagevec_lookup)
-*(.text.new_inode)
-*(.text.memcpy_toiovec)
-*(.text.locks_free_lock)
-*(.text.__lock_page)
-*(.text.__lock_buffer)
-*(.text.load_module)
-*(.text.is_bad_inode)
-*(.text.invalidate_inode_buffers)
-*(.text.insert_vm_struct)
-*(.text.inode_setattr)
-*(.text.inode_add_bytes)
-*(.text.ide_read_24)
-*(.text.ide_get_error_location)
-*(.text.ide_do_drive_cmd)
-*(.text.get_locked_pte)
-*(.text.get_filesystem_list)
-*(.text.generic_file_open)
-*(.text.follow_down)
-*(.text.find_next_bit)
-*(.text.__find_first_bit)
-*(.text.exit_mm)
-*(.text.exec_keys)
-*(.text.end_buffer_write_sync)
-*(.text.end_bio_bh_io_sync)
-*(.text.dummy_socket_shutdown)
-*(.text.d_rehash)
-*(.text.d_path)
-*(.text.do_ioctl)
-*(.text.dget_locked)
-*(.text.copy_thread_group_keys)
-*(.text.cdrom_end_request)
-*(.text.cap_bprm_apply_creds)
-*(.text.blk_rq_bio_prep)
-*(.text.__bitmap_intersects)
-*(.text.bio_phys_segments)
-*(.text.bio_free)
-*(.text.arch_get_unmapped_area_topdown)
-*(.text.writeback_in_progress)
-*(.text.vfs_follow_link)
-*(.text.tcp_rcv_state_process)
-*(.text.tcp_check_space)
-*(.text.sys_stat)
-*(.text.sys_rt_sigreturn)
-*(.text.sys_rt_sigaction)
-*(.text.sys_remap_file_pages)
-*(.text.sys_pwrite64)
-*(.text.sys_fchownat)
-*(.text.sys_fchmodat)
-*(.text.strncat)
-*(.text.strlcat)
-*(.text.strcmp)
-*(.text.steal_locks)
-*(.text.sock_create)
-*(.text.sk_stream_rfree)
-*(.text.sk_stream_mem_schedule)
-*(.text.skip_atoi)
-*(.text.sk_alloc)
-*(.text.show_stat)
-*(.text.set_fs_pwd)
-*(.text.set_binfmt)
-*(.text.pty_unthrottle)
-*(.text.proc_symlink)
-*(.text.pipe_release)
-*(.text.pageout)
-*(.text.n_tty_write_wakeup)
-*(.text.n_tty_ioctl)
-*(.text.nr_free_zone_pages)
-*(.text.migration_thread)
-*(.text.mempool_free_slab)
-*(.text.meminfo_read_proc)
-*(.text.max_sane_readahead)
-*(.text.lru_cache_add)
-*(.text.kill_fasync)
-*(.text.kernel_read)
-*(.text.invalidate_mapping_pages)
-*(.text.inode_has_buffers)
-*(.text.init_once)
-*(.text.inet_sendmsg)
-*(.text.idedisk_issue_flush)
-*(.text.generic_file_write)
-*(.text.free_more_memory)
-*(.text.__free_fdtable)
-*(.text.filp_dtor)
-*(.text.exit_sem)
-*(.text.exit_itimers)
-*(.text.error_interrupt)
-*(.text.end_buffer_async_write)
-*(.text.eligible_child)
-*(.text.elf_map)
-*(.text.dump_task_regs)
-*(.text.dummy_task_setscheduler)
-*(.text.dummy_socket_accept)
-*(.text.dummy_file_free_security)
-*(.text.__down_read)
-*(.text.do_sock_read)
-*(.text.do_sigaltstack)
-*(.text.do_mremap)
-*(.text.current_io_context)
-*(.text.cpu_swap_callback)
-*(.text.copy_vma)
-*(.text.cap_bprm_set_security)
-*(.text.blk_insert_request)
-*(.text.bio_map_kern_endio)
-*(.text.bio_hw_segments)
-*(.text.bictcp_cong_avoid)
-*(.text.add_interrupt_randomness)
-*(.text.wait_for_completion)
-*(.text.version_read_proc)
-*(.text.unix_write_space)
-*(.text.tty_ldisc_ref_wait)
-*(.text.tty_ldisc_put)
-*(.text.try_to_wake_up)
-*(.text.tcp_v4_tw_remember_stamp)
-*(.text.tcp_try_undo_dsack)
-*(.text.tcp_may_send_now)
-*(.text.sys_waitid)
-*(.text.sys_sched_getparam)
-*(.text.sys_getppid)
-*(.text.sys_getcwd)
-*(.text.sys_dup2)
-*(.text.sys_chmod)
-*(.text.sys_chdir)
-*(.text.sprintf)
-*(.text.sock_wfree)
-*(.text.sock_aio_write)
-*(.text.skb_drop_fraglist)
-*(.text.skb_dequeue)
-*(.text.set_close_on_exec)
-*(.text.set_brk)
-*(.text.seq_puts)
-*(.text.SELECT_DRIVE)
-*(.text.sched_exec)
-*(.text.return_EIO)
-*(.text.remove_from_page_cache)
-*(.text.rcu_start_batch)
-*(.text.__put_task_struct)
-*(.text.proc_pid_readdir)
-*(.text.proc_get_inode)
-*(.text.prepare_to_wait_exclusive)
-*(.text.pipe_wait)
-*(.text.pipe_new)
-*(.text.pdflush_operation)
-*(.text.__pagevec_release)
-*(.text.pagevec_lookup_tag)
-*(.text.packet_rcv)
-*(.text.n_tty_set_room)
-*(.text.nr_free_pages)
-*(.text.__net_timestamp)
-*(.text.mpage_end_io_read)
-*(.text.mod_timer)
-*(.text.__memcpy)
-*(.text.mb_cache_shrink_fn)
-*(.text.lock_rename)
-*(.text.kstrdup)
-*(.text.is_ignored)
-*(.text.int_very_careful)
-*(.text.inotify_inode_is_dead)
-*(.text.inotify_get_cookie)
-*(.text.inode_get_bytes)
-*(.text.init_timer)
-*(.text.init_dev)
-*(.text.inet_getname)
-*(.text.ide_map_sg)
-*(.text.__ide_dma_end)
-*(.text.hrtimer_get_remaining)
-*(.text.get_task_mm)
-*(.text.get_random_int)
-*(.text.free_pipe_info)
-*(.text.filemap_write_and_wait_range)
-*(.text.exit_thread)
-*(.text.enter_idle)
-*(.text.end_that_request_first)
-*(.text.end_8259A_irq)
-*(.text.dummy_file_alloc_security)
-*(.text.do_group_exit)
-*(.text.debug_mutex_init)
-*(.text.cpuset_exit)
-*(.text.cpu_idle)
-*(.text.copy_semundo)
-*(.text.copy_files)
-*(.text.chrdev_open)
-*(.text.cdrom_transfer_packet_command)
-*(.text.cdrom_mode_sense)
-*(.text.blk_phys_contig_segment)
-*(.text.blk_get_queue)
-*(.text.bio_split)
-*(.text.audit_alloc)
-*(.text.anon_pipe_buf_release)
-*(.text.add_wait_queue_exclusive)
-*(.text.add_wait_queue)
-*(.text.acct_process)
-*(.text.account)
-*(.text.zeromap_page_range)
-*(.text.yield)
-*(.text.writeback_acquire)
-*(.text.worker_thread)
-*(.text.wait_on_page_writeback_range)
-*(.text.__wait_on_buffer)
-*(.text.vscnprintf)
-*(.text.vmalloc_to_pfn)
-*(.text.vgacon_save_screen)
-*(.text.vfs_unlink)
-*(.text.vfs_rmdir)
-*(.text.unregister_md_personality)
-*(.text.unlock_new_inode)
-*(.text.unix_stream_sendmsg)
-*(.text.unix_stream_recvmsg)
-*(.text.unhash_process)
-*(.text.udp_v4_lookup_longway)
-*(.text.tty_ldisc_flush)
-*(.text.tty_ldisc_enable)
-*(.text.tty_hung_up_p)
-*(.text.tty_buffer_free_all)
-*(.text.tso_fragment)
-*(.text.try_to_del_timer_sync)
-*(.text.tcp_v4_err)
-*(.text.tcp_unhash)
-*(.text.tcp_seq_next)
-*(.text.tcp_select_initial_window)
-*(.text.tcp_sacktag_write_queue)
-*(.text.tcp_cwnd_validate)
-*(.text.sys_vhangup)
-*(.text.sys_uselib)
-*(.text.sys_symlink)
-*(.text.sys_signal)
-*(.text.sys_poll)
-*(.text.sys_mount)
-*(.text.sys_kill)
-*(.text.sys_ioctl)
-*(.text.sys_inotify_add_watch)
-*(.text.sys_getuid)
-*(.text.sys_getrlimit)
-*(.text.sys_getitimer)
-*(.text.sys_getgroups)
-*(.text.sys_ftruncate)
-*(.text.sysfs_lookup)
-*(.text.sys_exit_group)
-*(.text.stub_fork)
-*(.text.sscanf)
-*(.text.sock_map_fd)
-*(.text.sock_get_timestamp)
-*(.text.__sock_create)
-*(.text.smp_call_function_single)
-*(.text.sk_stop_timer)
-*(.text.skb_copy_and_csum_datagram)
-*(.text.__skb_checksum_complete)
-*(.text.single_next)
-*(.text.sigqueue_alloc)
-*(.text.shrink_dcache_parent)
-*(.text.select_idle_routine)
-*(.text.run_workqueue)
-*(.text.run_local_timers)
-*(.text.remove_inode_hash)
-*(.text.remove_dquot_ref)
-*(.text.register_binfmt)
-*(.text.read_cache_pages)
-*(.text.rb_last)
-*(.text.pty_open)
-*(.text.proc_root_readdir)
-*(.text.proc_pid_flush)
-*(.text.proc_pident_lookup)
-*(.text.proc_fill_super)
-*(.text.proc_exe_link)
-*(.text.posix_locks_deadlock)
-*(.text.pipe_iov_copy_from_user)
-*(.text.opost)
-*(.text.nf_register_hook)
-*(.text.netif_rx_ni)
-*(.text.m_start)
-*(.text.mpage_writepage)
-*(.text.mm_alloc)
-*(.text.memory_open)
-*(.text.mark_buffer_async_write)
-*(.text.lru_add_drain_all)
-*(.text.locks_init_lock)
-*(.text.locks_delete_lock)
-*(.text.lock_hrtimer_base)
-*(.text.load_script)
-*(.text.__kill_fasync)
-*(.text.ip_mc_sf_allow)
-*(.text.__ioremap)
-*(.text.int_with_check)
-*(.text.int_sqrt)
-*(.text.install_thread_keyring)
-*(.text.init_page_buffers)
-*(.text.inet_sock_destruct)
-*(.text.idle_notifier_register)
-*(.text.ide_execute_command)
-*(.text.ide_end_drive_cmd)
-*(.text.__ide_dma_host_on)
-*(.text.hrtimer_run_queues)
-*(.text.hpet_mask_rtc_irq_bit)
-*(.text.__get_zone_counts)
-*(.text.get_zone_counts)
-*(.text.get_write_access)
-*(.text.get_fs_struct)
-*(.text.get_dirty_limits)
-*(.text.generic_readlink)
-*(.text.free_hot_page)
-*(.text.finish_wait)
-*(.text.find_inode)
-*(.text.find_first_bit)
-*(.text.__filemap_fdatawrite_range)
-*(.text.__filemap_copy_from_user_iovec)
-*(.text.exit_aio)
-*(.text.elv_set_request)
-*(.text.elv_former_request)
-*(.text.dup_namespace)
-*(.text.dupfd)
-*(.text.dummy_socket_getsockopt)
-*(.text.dummy_sb_post_mountroot)
-*(.text.dummy_quotactl)
-*(.text.dummy_inode_rename)
-*(.text.__do_SAK)
-*(.text.do_pipe)
-*(.text.do_fsync)
-*(.text.d_instantiate_unique)
-*(.text.d_find_alias)
-*(.text.deny_write_access)
-*(.text.dentry_unhash)
-*(.text.d_delete)
-*(.text.datagram_poll)
-*(.text.cpuset_fork)
-*(.text.cpuid_read)
-*(.text.copy_namespace)
-*(.text.cond_resched)
-*(.text.check_version)
-*(.text.__change_page_attr)
-*(.text.cfq_slab_kill)
-*(.text.cfq_completed_request)
-*(.text.cdrom_pc_intr)
-*(.text.cdrom_decode_status)
-*(.text.cap_capset_check)
-*(.text.blk_put_request)
-*(.text.bio_fs_destructor)
-*(.text.bictcp_min_cwnd)
-*(.text.alloc_chrdev_region)
-*(.text.add_element)
-*(.text.acct_update_integrals)
-*(.text.write_boundary_block)
-*(.text.writeback_release)
-*(.text.writeback_inodes)
-*(.text.wake_up_state)
-*(.text.__wake_up_locked)
-*(.text.wake_futex)
-*(.text.wait_task_inactive)
-*(.text.__wait_on_freeing_inode)
-*(.text.wait_noreap_copyout)
-*(.text.vmstat_start)
-*(.text.vgacon_do_font_op)
-*(.text.vfs_readv)
-*(.text.vfs_quota_sync)
-*(.text.update_queue)
-*(.text.unshare_files)
-*(.text.unmap_vm_area)
-*(.text.unix_socketpair)
-*(.text.unix_release_sock)
-*(.text.unix_detach_fds)
-*(.text.unix_create1)
-*(.text.unix_bind)
-*(.text.udp_sendmsg)
-*(.text.udp_rcv)
-*(.text.udp_queue_rcv_skb)
-*(.text.uart_write)
-*(.text.uart_startup)
-*(.text.uart_open)
-*(.text.tty_vhangup)
-*(.text.tty_termios_baud_rate)
-*(.text.tty_release)
-*(.text.tty_ldisc_ref)
-*(.text.throttle_vm_writeout)
-*(.text.058)
-*(.text.tcp_xmit_probe_skb)
-*(.text.tcp_v4_send_check)
-*(.text.tcp_v4_destroy_sock)
-*(.text.tcp_sync_mss)
-*(.text.tcp_snd_test)
-*(.text.tcp_slow_start)
-*(.text.tcp_send_fin)
-*(.text.tcp_rtt_estimator)
-*(.text.tcp_parse_options)
-*(.text.tcp_ioctl)
-*(.text.tcp_init_tso_segs)
-*(.text.tcp_init_cwnd)
-*(.text.tcp_getsockopt)
-*(.text.tcp_fin)
-*(.text.tcp_connect)
-*(.text.tcp_cong_avoid)
-*(.text.__tcp_checksum_complete_user)
-*(.text.task_dumpable)
-*(.text.sys_wait4)
-*(.text.sys_utimes)
-*(.text.sys_symlinkat)
-*(.text.sys_socketpair)
-*(.text.sys_rmdir)
-*(.text.sys_readahead)
-*(.text.sys_nanosleep)
-*(.text.sys_linkat)
-*(.text.sys_fstat)
-*(.text.sysfs_readdir)
-*(.text.sys_execve)
-*(.text.sysenter_tracesys)
-*(.text.sys_chown)
-*(.text.stub_clone)
-*(.text.strrchr)
-*(.text.strncpy)
-*(.text.stopmachine_set_state)
-*(.text.sock_sendmsg)
-*(.text.sock_release)
-*(.text.sock_fasync)
-*(.text.sock_close)
-*(.text.sk_stream_write_space)
-*(.text.sk_reset_timer)
-*(.text.skb_split)
-*(.text.skb_recv_datagram)
-*(.text.skb_queue_tail)
-*(.text.sk_attach_filter)
-*(.text.si_swapinfo)
-*(.text.simple_strtoll)
-*(.text.set_termios)
-*(.text.set_task_comm)
-*(.text.set_shrinker)
-*(.text.set_normalized_timespec)
-*(.text.set_brk)
-*(.text.serial_in)
-*(.text.seq_printf)
-*(.text.secure_dccp_sequence_number)
-*(.text.rwlock_bug)
-*(.text.rt_hash_code)
-*(.text.__rta_fill)
-*(.text.__request_resource)
-*(.text.relocate_new_kernel)
-*(.text.release_thread)
-*(.text.release_mem)
-*(.text.rb_prev)
-*(.text.rb_first)
-*(.text.random_poll)
-*(.text.__put_super_and_need_restart)
-*(.text.pty_write)
-*(.text.ptrace_stop)
-*(.text.proc_self_readlink)
-*(.text.proc_root_lookup)
-*(.text.proc_root_link)
-*(.text.proc_pid_make_inode)
-*(.text.proc_pid_attr_write)
-*(.text.proc_lookupfd)
-*(.text.proc_delete_inode)
-*(.text.posix_same_owner)
-*(.text.posix_block_lock)
-*(.text.poll_initwait)
-*(.text.pipe_write)
-*(.text.pipe_read_fasync)
-*(.text.pipe_ioctl)
-*(.text.pdflush)
-*(.text.pci_user_read_config_dword)
-*(.text.page_readlink)
-*(.text.null_lseek)
-*(.text.nf_hook_slow)
-*(.text.netlink_sock_destruct)
-*(.text.netlink_broadcast)
-*(.text.neigh_resolve_output)
-*(.text.name_to_int)
-*(.text.mwait_idle)
-*(.text.mutex_trylock)
-*(.text.mutex_debug_check_no_locks_held)
-*(.text.m_stop)
-*(.text.mpage_end_io_write)
-*(.text.mpage_alloc)
-*(.text.move_page_tables)
-*(.text.mounts_open)
-*(.text.__memset)
-*(.text.memcpy_fromiovec)
-*(.text.make_8259A_irq)
-*(.text.lookup_user_key_possessed)
-*(.text.lookup_create)
-*(.text.locks_insert_lock)
-*(.text.locks_alloc_lock)
-*(.text.kthread_should_stop)
-*(.text.kswapd)
-*(.text.kobject_uevent)
-*(.text.kobject_get_path)
-*(.text.kobject_get)
-*(.text.klist_children_put)
-*(.text.__ip_route_output_key)
-*(.text.ip_flush_pending_frames)
-*(.text.ip_compute_csum)
-*(.text.ip_append_data)
-*(.text.ioc_set_batching)
-*(.text.invalidate_inode_pages)
-*(.text.__invalidate_device)
-*(.text.install_arg_page)
-*(.text.in_sched_functions)
-*(.text.inotify_unmount_inodes)
-*(.text.init_once)
-*(.text.init_cdrom_command)
-*(.text.inet_stream_connect)
-*(.text.inet_sk_rebuild_header)
-*(.text.inet_csk_addr2sockaddr)
-*(.text.inet_create)
-*(.text.ifind)
-*(.text.ide_setup_dma)
-*(.text.ide_outsw)
-*(.text.ide_fixstring)
-*(.text.ide_dma_setup)
-*(.text.ide_cdrom_packet)
-*(.text.ide_cd_put)
-*(.text.ide_build_sglist)
-*(.text.i8259A_shutdown)
-*(.text.hung_up_tty_ioctl)
-*(.text.hrtimer_nanosleep)
-*(.text.hrtimer_init)
-*(.text.hrtimer_cancel)
-*(.text.hash_futex)
-*(.text.group_send_sig_info)
-*(.text.grab_cache_page_nowait)
-*(.text.get_wchan)
-*(.text.get_stack)
-*(.text.get_page_state)
-*(.text.getnstimeofday)
-*(.text.get_node)
-*(.text.get_kprobe)
-*(.text.generic_unplug_device)
-*(.text.free_task)
-*(.text.frag_show)
-*(.text.find_next_zero_string)
-*(.text.filp_open)
-*(.text.fillonedir)
-*(.text.exit_io_context)
-*(.text.exit_idle)
-*(.text.exact_lock)
-*(.text.eth_header)
-*(.text.dummy_unregister_security)
-*(.text.dummy_socket_post_create)
-*(.text.dummy_socket_listen)
-*(.text.dummy_quota_on)
-*(.text.dummy_inode_follow_link)
-*(.text.dummy_file_receive)
-*(.text.dummy_file_mprotect)
-*(.text.dummy_file_lock)
-*(.text.dummy_file_ioctl)
-*(.text.dummy_bprm_post_apply_creds)
-*(.text.do_writepages)
-*(.text.__down_interruptible)
-*(.text.do_notify_resume)
-*(.text.do_acct_process)
-*(.text.del_timer_sync)
-*(.text.default_rebuild_header)
-*(.text.d_callback)
-*(.text.dcache_readdir)
-*(.text.ctrl_dumpfamily)
-*(.text.cpuset_rmdir)
-*(.text.copy_strings_kernel)
-*(.text.con_write_room)
-*(.text.complete_all)
-*(.text.collect_sigign_sigcatch)
-*(.text.clear_user)
-*(.text.check_unthrottle)
-*(.text.cdrom_release)
-*(.text.cdrom_newpc_intr)
-*(.text.cdrom_ioctl)
-*(.text.cdrom_check_status)
-*(.text.cdev_put)
-*(.text.cdev_add)
-*(.text.cap_ptrace)
-*(.text.cap_bprm_secureexec)
-*(.text.cache_alloc_refill)
-*(.text.bmap)
-*(.text.blk_run_queue)
-*(.text.blk_queue_dma_alignment)
-*(.text.blk_ordered_req_seq)
-*(.text.blk_backing_dev_unplug)
-*(.text.__bitmap_subset)
-*(.text.__bitmap_and)
-*(.text.bio_unmap_user)
-*(.text.__bforget)
-*(.text.bd_forget)
-*(.text.bad_pipe_w)
-*(.text.bad_get_user)
-*(.text.audit_free)
-*(.text.anon_vma_ctor)
-*(.text.anon_pipe_buf_map)
-*(.text.alloc_sock_iocb)
-*(.text.alloc_fdset)
-*(.text.aio_kick_handler)
-*(.text.__add_entropy_words)
-*(.text.add_disk_randomness)
Index: linux/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux/arch/x86_64/kernel/vmlinux.lds.S
@@ -30,9 +30,6 @@ SECTIONS
 	/* First the code that has to be first for bootstrapping */
 	*(.bootstrap.text)
 	_stext = .;
-	/* Then all the functions that are "hot" in profiles, to group them
-           onto the same hugetlb entry */
-	#include "functionlist"
 	/* Then the rest */
 	*(.text)
 	SCHED_TEXT


* [PATCH] [23/30] x86_64: Share identical video.S between i386 and x86-64
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (21 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [22/30] x86_64: Remove CONFIG_REORDER Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems Andi Kleen
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel



Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/boot/setup.S |    2 
 arch/x86_64/boot/video.S | 2043 -----------------------------------------------
 2 files changed, 1 insertion(+), 2044 deletions(-)

Index: linux/arch/x86_64/boot/setup.S
===================================================================
--- linux.orig/arch/x86_64/boot/setup.S
+++ linux/arch/x86_64/boot/setup.S
@@ -807,7 +807,7 @@ gdt_48:
 
 # Include video setup & detection code
 
-#include "video.S"
+#include "../../i386/boot/video.S"
 
 # Setup signature -- must be last
 setup_sig1:	.word	SIG1
Index: linux/arch/x86_64/boot/video.S
===================================================================
--- linux.orig/arch/x86_64/boot/video.S
+++ /dev/null
@@ -1,2043 +0,0 @@
-/*	video.S
- *
- *	Display adapter & video mode setup, version 2.13 (14-May-99)
- *
- *	Copyright (C) 1995 -- 1998 Martin Mares <mj@ucw.cz>
- *	Based on the original setup.S code (C) Linus Torvalds and Mats Anderson
- *
- *	Rewritten to use GNU 'as' by Chris Noe <stiker@northlink.com> May 1999
- *
- *	For further information, look at Documentation/svga.txt.
- *
- */
-
-/* Enable autodetection of SVGA adapters and modes. */
-#undef CONFIG_VIDEO_SVGA
-
-/* Enable autodetection of VESA modes */
-#define CONFIG_VIDEO_VESA
-
-/* Enable compacting of mode table */
-#define CONFIG_VIDEO_COMPACT
-
-/* Retain screen contents when switching modes */
-#define CONFIG_VIDEO_RETAIN
-
-/* Enable local mode list */
-#undef CONFIG_VIDEO_LOCAL
-
-/* Force 400 scan lines for standard modes (hack to fix bad BIOS behaviour */
-#undef CONFIG_VIDEO_400_HACK
-
-/* Hack that lets you force specific BIOS mode ID and specific dimensions */
-#undef CONFIG_VIDEO_GFX_HACK
-#define VIDEO_GFX_BIOS_AX 0x4f02	/* 800x600 on ThinkPad */
-#define VIDEO_GFX_BIOS_BX 0x0102
-#define VIDEO_GFX_DUMMY_RESOLUTION 0x6425	/* 100x37 */
-
-/* This code uses an extended set of video mode numbers. These include:
- * Aliases for standard modes
- *	NORMAL_VGA (-1)
- *	EXTENDED_VGA (-2)
- *	ASK_VGA (-3)
- * Video modes numbered by menu position -- NOT RECOMMENDED because of lack
- * of compatibility when extending the table. These are between 0x00 and 0xff.
- */
-#define VIDEO_FIRST_MENU 0x0000
-
-/* Standard BIOS video modes (BIOS number + 0x0100) */
-#define VIDEO_FIRST_BIOS 0x0100
-
-/* VESA BIOS video modes (VESA number + 0x0200) */
-#define VIDEO_FIRST_VESA 0x0200
-
-/* Video7 special modes (BIOS number + 0x0900) */
-#define VIDEO_FIRST_V7 0x0900
-
-/* Special video modes */
-#define VIDEO_FIRST_SPECIAL 0x0f00
-#define VIDEO_80x25 0x0f00
-#define VIDEO_8POINT 0x0f01
-#define VIDEO_80x43 0x0f02
-#define VIDEO_80x28 0x0f03
-#define VIDEO_CURRENT_MODE 0x0f04
-#define VIDEO_80x30 0x0f05
-#define VIDEO_80x34 0x0f06
-#define VIDEO_80x60 0x0f07
-#define VIDEO_GFX_HACK 0x0f08
-#define VIDEO_LAST_SPECIAL 0x0f09
-
-/* Video modes given by resolution */
-#define VIDEO_FIRST_RESOLUTION 0x1000
-
-/* The "recalculate timings" flag */
-#define VIDEO_RECALC 0x8000
-
-/* Positions of various video parameters passed to the kernel */
-/* (see also include/linux/tty.h) */
-#define PARAM_CURSOR_POS	0x00
-#define PARAM_VIDEO_PAGE	0x04
-#define PARAM_VIDEO_MODE	0x06
-#define PARAM_VIDEO_COLS	0x07
-#define PARAM_VIDEO_EGA_BX	0x0a
-#define PARAM_VIDEO_LINES	0x0e
-#define PARAM_HAVE_VGA		0x0f
-#define PARAM_FONT_POINTS	0x10
-
-#define PARAM_LFB_WIDTH		0x12
-#define PARAM_LFB_HEIGHT	0x14
-#define PARAM_LFB_DEPTH		0x16
-#define PARAM_LFB_BASE		0x18
-#define PARAM_LFB_SIZE		0x1c
-#define PARAM_LFB_LINELENGTH	0x24
-#define PARAM_LFB_COLORS	0x26
-#define PARAM_VESAPM_SEG	0x2e
-#define PARAM_VESAPM_OFF	0x30
-#define PARAM_LFB_PAGES		0x32
-#define PARAM_VESA_ATTRIB	0x34
-#define PARAM_CAPABILITIES	0x36
-
-/* Define DO_STORE according to CONFIG_VIDEO_RETAIN */
-#ifdef CONFIG_VIDEO_RETAIN
-#define DO_STORE call store_screen
-#else
-#define DO_STORE
-#endif /* CONFIG_VIDEO_RETAIN */
-
-# This is the main entry point called by setup.S
-# %ds *must* be pointing to the bootsector
-video:	pushw	%ds		# We use different segments
-	pushw	%ds		# FS contains original DS
-	popw	%fs
-	pushw	%cs		# DS is equal to CS
-	popw	%ds
-	pushw	%cs		# ES is equal to CS
-	popw	%es
-	xorw	%ax, %ax
-	movw	%ax, %gs	# GS is zero
-	cld
-	call	basic_detect	# Basic adapter type testing (EGA/VGA/MDA/CGA)
-#ifdef CONFIG_VIDEO_SELECT
-	movw	%fs:(0x01fa), %ax		# User selected video mode
-	cmpw	$ASK_VGA, %ax			# Bring up the menu
-	jz	vid2
-
-	call	mode_set			# Set the mode
-	jc	vid1
-
-	leaw	badmdt, %si			# Invalid mode ID
-	call	prtstr
-vid2:	call	mode_menu
-vid1:
-#ifdef CONFIG_VIDEO_RETAIN
-	call	restore_screen			# Restore screen contents
-#endif /* CONFIG_VIDEO_RETAIN */
-	call	store_edid
-#endif /* CONFIG_VIDEO_SELECT */
-	call	mode_params			# Store mode parameters
-	popw	%ds				# Restore original DS
-	ret
-
-# Detect if we have CGA, MDA, EGA or VGA and pass it to the kernel.
-basic_detect:
-	movb	$0, %fs:(PARAM_HAVE_VGA)
-	movb	$0x12, %ah	# Check EGA/VGA
-	movb	$0x10, %bl
-	int	$0x10
-	movw	%bx, %fs:(PARAM_VIDEO_EGA_BX)	# Identifies EGA to the kernel
-	cmpb	$0x10, %bl			# No, it's a CGA/MDA/HGA card.
-	je	basret
-
-	incb	adapter
-	movw	$0x1a00, %ax			# Check EGA or VGA?
-	int	$0x10
-	cmpb	$0x1a, %al			# 1a means VGA...
-	jne	basret				# anything else is EGA.
-	
-	incb	%fs:(PARAM_HAVE_VGA)		# We've detected a VGA
-	incb	adapter
-basret:	ret
-
-# Store the video mode parameters for later usage by the kernel.
-# This is done by asking the BIOS except for the rows/columns
-# parameters in the default 80x25 mode -- these are set directly,
-# because some very obscure BIOSes supply insane values.
-mode_params:
-#ifdef CONFIG_VIDEO_SELECT
-	cmpb	$0, graphic_mode
-	jnz	mopar_gr
-#endif
-	movb	$0x03, %ah			# Read cursor position
-	xorb	%bh, %bh
-	int	$0x10
-	movw	%dx, %fs:(PARAM_CURSOR_POS)
-	movb	$0x0f, %ah			# Read page/mode/width
-	int	$0x10
-	movw	%bx, %fs:(PARAM_VIDEO_PAGE)
-	movw	%ax, %fs:(PARAM_VIDEO_MODE)	# Video mode and screen width
-	cmpb	$0x7, %al			# MDA/HGA => segment differs
-	jnz	mopar0
-
-	movw	$0xb000, video_segment
-mopar0: movw	%gs:(0x485), %ax		# Font size
-	movw	%ax, %fs:(PARAM_FONT_POINTS)	# (valid only on EGA/VGA)
-	movw	force_size, %ax			# Forced size?
-	orw	%ax, %ax
-	jz	mopar1
-
-	movb	%ah, %fs:(PARAM_VIDEO_COLS)
-	movb	%al, %fs:(PARAM_VIDEO_LINES)
-	ret
-
-mopar1:	movb	$25, %al
-	cmpb	$0, adapter			# If we are on CGA/MDA/HGA, the
-	jz	mopar2				# screen must have 25 lines.
-
-	movb	%gs:(0x484), %al		# On EGA/VGA, use the EGA+ BIOS
-	incb	%al				# location of max lines.
-mopar2: movb	%al, %fs:(PARAM_VIDEO_LINES)
-	ret
-
-#ifdef CONFIG_VIDEO_SELECT
-# Fetching of VESA frame buffer parameters
-mopar_gr:
-	leaw	modelist+1024, %di
-	movb	$0x23, %fs:(PARAM_HAVE_VGA)
-	movw	16(%di), %ax
-	movw	%ax, %fs:(PARAM_LFB_LINELENGTH)
-	movw	18(%di), %ax
-	movw	%ax, %fs:(PARAM_LFB_WIDTH)
-	movw	20(%di), %ax
-	movw	%ax, %fs:(PARAM_LFB_HEIGHT)
-	movb	25(%di), %al
-	movb	$0, %ah
-	movw	%ax, %fs:(PARAM_LFB_DEPTH)
-	movb	29(%di), %al	
-	movb	$0, %ah
-	movw	%ax, %fs:(PARAM_LFB_PAGES)
-	movl	40(%di), %eax
-	movl	%eax, %fs:(PARAM_LFB_BASE)
-	movl	31(%di), %eax
-	movl	%eax, %fs:(PARAM_LFB_COLORS)
-	movl	35(%di), %eax
-	movl	%eax, %fs:(PARAM_LFB_COLORS+4)
-	movw	0(%di), %ax
-	movw	%ax, %fs:(PARAM_VESA_ATTRIB)
-
-# get video mem size
-	leaw	modelist+1024, %di
-	movw	$0x4f00, %ax
-	int	$0x10
-	xorl	%eax, %eax
-	movw	18(%di), %ax
-	movl	%eax, %fs:(PARAM_LFB_SIZE)
-
-# store mode capabilities
-	movl 10(%di), %eax
-	movl %eax, %fs:(PARAM_CAPABILITIES)
-
-# switching the DAC to 8-bit is for <= 8 bpp only
-	movw	%fs:(PARAM_LFB_DEPTH), %ax
-	cmpw	$8, %ax
-	jg	dac_done
-
-# get DAC switching capability
-	xorl	%eax, %eax
-	movb	10(%di), %al
-	testb	$1, %al
-	jz	dac_set
-
-# attempt to switch DAC to 8-bit
-	movw	$0x4f08, %ax
-	movw	$0x0800, %bx
-	int	$0x10
-	cmpw	$0x004f, %ax
-	jne     dac_set
-	movb    %bh, dac_size		# store actual DAC size
-
-dac_set:
-# set color size to DAC size
-	movb	dac_size, %al
-	movb	%al, %fs:(PARAM_LFB_COLORS+0)
-	movb	%al, %fs:(PARAM_LFB_COLORS+2)
-	movb	%al, %fs:(PARAM_LFB_COLORS+4)
-	movb	%al, %fs:(PARAM_LFB_COLORS+6)
-
-# set color offsets to 0
-	movb	$0, %fs:(PARAM_LFB_COLORS+1)
-	movb	$0, %fs:(PARAM_LFB_COLORS+3)
-	movb	$0, %fs:(PARAM_LFB_COLORS+5)
-	movb	$0, %fs:(PARAM_LFB_COLORS+7)
-
-dac_done:
-# get protected mode interface informations
-	movw	$0x4f0a, %ax
-	xorw	%bx, %bx
-	xorw	%di, %di
-	int	$0x10
-	cmp	$0x004f, %ax
-	jnz	no_pm
-
-	movw	%es, %fs:(PARAM_VESAPM_SEG)
-	movw	%di, %fs:(PARAM_VESAPM_OFF)
-no_pm:	ret
-
-# The video mode menu
-mode_menu:
-	leaw	keymsg, %si			# "Return/Space/Timeout" message
-	call	prtstr
-	call	flush
-nokey:	call	getkt
-
-	cmpb	$0x0d, %al			# ENTER ?
-	je	listm				# yes - manual mode selection
-
-	cmpb	$0x20, %al			# SPACE ?
-	je	defmd1				# no - repeat
-
-	call 	beep
-	jmp	nokey
-
-defmd1:	ret					# No mode chosen? Default 80x25
-
-listm:	call	mode_table			# List mode table
-listm0:	leaw	name_bann, %si			# Print adapter name
-	call	prtstr
-	movw	card_name, %si
-	orw	%si, %si
-	jnz	an2
-
-	movb	adapter, %al
-	leaw	old_name, %si
-	orb	%al, %al
-	jz	an1
-
-	leaw	ega_name, %si
-	decb	%al
-	jz	an1
-
-	leaw	vga_name, %si
-	jmp	an1
-
-an2:	call	prtstr
-	leaw	svga_name, %si
-an1:	call	prtstr
-	leaw	listhdr, %si			# Table header
-	call	prtstr
-	movb	$0x30, %dl			# DL holds mode number
-	leaw	modelist, %si
-lm1:	cmpw	$ASK_VGA, (%si)			# End?
-	jz	lm2
-
-	movb	%dl, %al			# Menu selection number
-	call	prtchr
-	call	prtsp2
-	lodsw
-	call	prthw				# Mode ID
-	call	prtsp2
-	movb	0x1(%si), %al
-	call	prtdec				# Rows
-	movb	$0x78, %al			# the letter 'x'
-	call	prtchr
-	lodsw
-	call	prtdec				# Columns
-	movb	$0x0d, %al			# New line
-	call	prtchr
-	movb	$0x0a, %al
-	call	prtchr
-	incb	%dl				# Next character
-	cmpb	$0x3a, %dl
-	jnz	lm1
-
-	movb	$0x61, %dl
-	jmp	lm1
-
-lm2:	leaw	prompt, %si			# Mode prompt
-	call	prtstr
-	leaw	edit_buf, %di			# Editor buffer
-lm3:	call	getkey
-	cmpb	$0x0d, %al			# Enter?
-	jz	lment
-
-	cmpb	$0x08, %al			# Backspace?
-	jz	lmbs
-
-	cmpb	$0x20, %al			# Printable?
-	jc	lm3
-
-	cmpw	$edit_buf+4, %di		# Enough space?
-	jz	lm3
-
-	stosb
-	call	prtchr
-	jmp	lm3
-
-lmbs:	cmpw	$edit_buf, %di			# Backspace
-	jz	lm3
-
-	decw	%di
-	movb	$0x08, %al
-	call	prtchr
-	call	prtspc
-	movb	$0x08, %al
-	call	prtchr
-	jmp	lm3
-	
-lment:	movb	$0, (%di)
-	leaw	crlft, %si
-	call	prtstr
-	leaw	edit_buf, %si
-	cmpb	$0, (%si)			# Empty string = default mode
-	jz	lmdef
-
-	cmpb	$0, 1(%si)			# One character = menu selection
-	jz	mnusel
-
-	cmpw	$0x6373, (%si)			# "scan" => mode scanning
-	jnz	lmhx
-
-	cmpw	$0x6e61, 2(%si)
-	jz	lmscan
-
-lmhx:	xorw	%bx, %bx			# Else => mode ID in hex
-lmhex:	lodsb
-	orb	%al, %al
-	jz	lmuse1
-
-	subb	$0x30, %al
-	jc	lmbad
-
-	cmpb	$10, %al
-	jc	lmhx1
-
-	subb	$7, %al
-	andb	$0xdf, %al
-	cmpb	$10, %al
-	jc	lmbad
-
-	cmpb	$16, %al
-	jnc	lmbad
-
-lmhx1:	shlw	$4, %bx
-	orb	%al, %bl
-	jmp	lmhex
-
-lmuse1:	movw	%bx, %ax
-	jmp	lmuse
-
-mnusel:	lodsb					# Menu selection
-	xorb	%ah, %ah
-	subb	$0x30, %al
-	jc	lmbad
-
-	cmpb	$10, %al
-	jc	lmuse
-	
-	cmpb	$0x61-0x30, %al
-	jc	lmbad
-	
-	subb	$0x61-0x30-10, %al
-	cmpb	$36, %al
-	jnc	lmbad
-
-lmuse:	call	mode_set
-	jc	lmdef
-
-lmbad:	leaw	unknt, %si
-	call	prtstr
-	jmp	lm2
-lmscan:	cmpb	$0, adapter			# Scanning only on EGA/VGA
-	jz	lmbad
-
-	movw	$0, mt_end			# Scanning of modes is
-	movb	$1, scanning			# done as new autodetection.
-	call	mode_table
-	jmp	listm0
-lmdef:	ret
-
-# Additional parts of mode_set... (relative jumps, you know)
-setv7:						# Video7 extended modes
-	DO_STORE
-	subb	$VIDEO_FIRST_V7>>8, %bh
-	movw	$0x6f05, %ax
-	int	$0x10
-	stc
-	ret
-
-_setrec:	jmp	setrec			# Ugly...
-_set_80x25:	jmp	set_80x25
-
-# Aliases for backward compatibility.
-setalias:
-	movw	$VIDEO_80x25, %ax
-	incw	%bx
-	jz	mode_set
-
-	movb	$VIDEO_8POINT-VIDEO_FIRST_SPECIAL, %al
-	incw	%bx
-	jnz	setbad				# Fall-through!
-
-# Setting of user mode (AX=mode ID) => CF=success
-mode_set:
-	movw	%ax, %fs:(0x01fa)		# Store mode for use in acpi_wakeup.S
-	movw	%ax, %bx
-	cmpb	$0xff, %ah
-	jz	setalias
-
-	testb	$VIDEO_RECALC>>8, %ah
-	jnz	_setrec
-
-	cmpb	$VIDEO_FIRST_RESOLUTION>>8, %ah
-	jnc	setres
-	
-	cmpb	$VIDEO_FIRST_SPECIAL>>8, %ah
-	jz	setspc
-	
-	cmpb	$VIDEO_FIRST_V7>>8, %ah
-	jz	setv7
-	
-	cmpb	$VIDEO_FIRST_VESA>>8, %ah
-	jnc	check_vesa
-	
-	orb	%ah, %ah
-	jz	setmenu
-	
-	decb	%ah
-	jz	setbios
-
-setbad:	clc
-	movb	$0, do_restore			# The screen needn't be restored
-	ret
-
-setvesa:
-	DO_STORE
-	subb	$VIDEO_FIRST_VESA>>8, %bh
-	movw	$0x4f02, %ax			# VESA BIOS mode set call
-	int	$0x10
-	cmpw	$0x004f, %ax			# AL=4f if implemented
-	jnz	setbad				# AH=0 if OK
-
-	stc
-	ret
-
-setbios:
-	DO_STORE
-	int	$0x10				# Standard BIOS mode set call
-	pushw	%bx
-	movb	$0x0f, %ah			# Check if really set
-	int	$0x10
-	popw	%bx
-	cmpb	%bl, %al
-	jnz	setbad
-	
-	stc
-	ret
-
-setspc:	xorb	%bh, %bh			# Set special mode
-	cmpb	$VIDEO_LAST_SPECIAL-VIDEO_FIRST_SPECIAL, %bl
-	jnc	setbad
-	
-	addw	%bx, %bx
-	jmp	*spec_inits(%bx)
-
-setmenu:
-	orb	%al, %al			# 80x25 is an exception
-	jz	_set_80x25
-	
-	pushw	%bx				# Set mode chosen from menu
-	call	mode_table			# Build the mode table
-	popw	%ax
-	shlw	$2, %ax
-	addw	%ax, %si
-	cmpw	%di, %si
-	jnc	setbad
-	
-	movw	(%si), %ax			# Fetch mode ID
-_m_s:	jmp	mode_set
-
-setres:	pushw	%bx				# Set mode chosen by resolution
-	call	mode_table
-	popw	%bx
-	xchgb	%bl, %bh
-setr1:	lodsw
-	cmpw	$ASK_VGA, %ax			# End of the list?
-	jz	setbad
-	
-	lodsw
-	cmpw	%bx, %ax
-	jnz	setr1
-	
-	movw	-4(%si), %ax			# Fetch mode ID
-	jmp	_m_s
-
-check_vesa:
-#ifdef CONFIG_FIRMWARE_EDID
-	leaw	modelist+1024, %di
-	movw	$0x4f00, %ax
-	int	$0x10
-	cmpw	$0x004f, %ax
-	jnz	setbad
-
-	movw	4(%di), %ax
-	movw	%ax, vbe_version
-#endif
-	leaw	modelist+1024, %di
-	subb	$VIDEO_FIRST_VESA>>8, %bh
-	movw	%bx, %cx			# Get mode information structure
-	movw	$0x4f01, %ax
-	int	$0x10
-	addb	$VIDEO_FIRST_VESA>>8, %bh
-	cmpw	$0x004f, %ax
-	jnz	setbad
-
-	movb	(%di), %al			# Check capabilities.
-	andb	$0x19, %al
-	cmpb	$0x09, %al
-	jz	setvesa				# This is a text mode
-
-	movb	(%di), %al			# Check capabilities.
-	andb	$0x99, %al
-	cmpb	$0x99, %al
-	jnz	_setbad				# Doh! No linear frame buffer.
-
-	subb	$VIDEO_FIRST_VESA>>8, %bh
-	orw	$0x4000, %bx			# Use linear frame buffer
-	movw	$0x4f02, %ax			# VESA BIOS mode set call
-	int	$0x10
-	cmpw	$0x004f, %ax			# AL=4f if implemented
-	jnz	_setbad				# AH=0 if OK
-
-	movb	$1, graphic_mode		# flag graphic mode
-	movb	$0, do_restore			# no screen restore
-	stc
-	ret
-
-_setbad:	jmp	setbad          	# Ugly...
-
-# Recalculate vertical display end registers -- this fixes various
-# inconsistencies of extended modes on many adapters. Called when
-# the VIDEO_RECALC flag is set in the mode ID.
-
-setrec:	subb	$VIDEO_RECALC>>8, %ah		# Set the base mode
-	call	mode_set
-	jnc	rct3
-
-	movw	%gs:(0x485), %ax		# Font size in pixels
-	movb	%gs:(0x484), %bl		# Number of rows
-	incb	%bl
-	mulb	%bl				# Number of visible
-	decw	%ax				# scan lines - 1
-	movw	$0x3d4, %dx
-	movw	%ax, %bx
-	movb	$0x12, %al			# Lower 8 bits
-	movb	%bl, %ah
-	outw	%ax, %dx
-	movb	$0x07, %al		# Bits 8 and 9 in the overflow register
-	call	inidx
-	xchgb	%al, %ah
-	andb	$0xbd, %ah
-	shrb	%bh
-	jnc	rct1
-	orb	$0x02, %ah
-rct1:	shrb	%bh
-	jnc	rct2
-	orb	$0x40, %ah
-rct2:	movb	$0x07, %al
-	outw	%ax, %dx
-	stc
-rct3:	ret
-
-# Table of routines for setting of the special modes.
-spec_inits:
-	.word	set_80x25
-	.word	set_8pixel
-	.word	set_80x43
-	.word	set_80x28
-	.word	set_current
-	.word	set_80x30
-	.word	set_80x34
-	.word	set_80x60
-	.word	set_gfx
-
-# Set the 80x25 mode. If already set, do nothing.
-set_80x25:
-	movw	$0x5019, force_size		# Override possibly broken BIOS
-use_80x25:
-#ifdef CONFIG_VIDEO_400_HACK
-	movw	$0x1202, %ax			# Force 400 scan lines
-	movb	$0x30, %bl
-	int	$0x10
-#else
-	movb	$0x0f, %ah			# Get current mode ID
-	int	$0x10
-	cmpw	$0x5007, %ax	# Mode 7 (80x25 mono) is the only one available
-	jz	st80		# on CGA/MDA/HGA and is also available on EGAM
-
-	cmpw	$0x5003, %ax	# Unknown mode, force 80x25 color
-	jnz	force3
-
-st80:	cmpb	$0, adapter	# CGA/MDA/HGA => mode 3/7 is always 80x25
-	jz	set80
-
-	movb	%gs:(0x0484), %al	# This is EGA+ -- beware of 80x50 etc.
-	orb	%al, %al		# Some buggy BIOS'es set 0 rows
-	jz	set80
-	
-	cmpb	$24, %al		# It's hopefully correct
-	jz	set80
-#endif /* CONFIG_VIDEO_400_HACK */
-force3:	DO_STORE
-	movw	$0x0003, %ax			# Forced set
-	int	$0x10
-set80:	stc
-	ret
-
-# Set the 80x50/80x43 8-pixel mode. Simple BIOS calls.
-set_8pixel:
-	DO_STORE
-	call	use_80x25			# The base is 80x25
-set_8pt:
-	movw	$0x1112, %ax			# Use 8x8 font
-	xorb	%bl, %bl
-	int	$0x10
-	movw	$0x1200, %ax			# Use alternate print screen
-	movb	$0x20, %bl
-	int	$0x10
-	movw	$0x1201, %ax			# Turn off cursor emulation
-	movb	$0x34, %bl
-	int	$0x10
-	movb	$0x01, %ah			# Define cursor scan lines 6-7
-	movw	$0x0607, %cx
-	int	$0x10
-set_current:
-	stc
-	ret
-
-# Set the 80x28 mode. This mode works on all VGA's, because it's a standard
-# 80x25 mode with 14-point fonts instead of 16-point.
-set_80x28:
-	DO_STORE
-	call	use_80x25			# The base is 80x25
-set14:	movw	$0x1111, %ax			# Use 9x14 font
-	xorb	%bl, %bl
-	int	$0x10
-	movb	$0x01, %ah			# Define cursor scan lines 11-12
-	movw	$0x0b0c, %cx
-	int	$0x10
-	stc
-	ret
-
-# Set the 80x43 mode. This mode is works on all VGA's.
-# It's a 350-scanline mode with 8-pixel font.
-set_80x43:
-	DO_STORE
-	movw	$0x1201, %ax			# Set 350 scans
-	movb	$0x30, %bl
-	int	$0x10
-	movw	$0x0003, %ax			# Reset video mode
-	int	$0x10
-	jmp	set_8pt				# Use 8-pixel font
-
-# Set the 80x30 mode (all VGA's). 480 scanlines, 16-pixel font.
-set_80x30:
-	call	use_80x25			# Start with real 80x25
-	DO_STORE
-	movw	$0x3cc, %dx			# Get CRTC port
-	inb	%dx, %al
-	movb	$0xd4, %dl
-	rorb	%al				# Mono or color?
-	jc	set48a
-
-	movb	$0xb4, %dl
-set48a:	movw	$0x0c11, %ax		# Vertical sync end (also unlocks CR0-7)
- 	call	outidx
-	movw	$0x0b06, %ax			# Vertical total
- 	call	outidx
-	movw	$0x3e07, %ax			# (Vertical) overflow
- 	call	outidx
-	movw	$0xea10, %ax			# Vertical sync start
- 	call	outidx
-	movw	$0xdf12, %ax			# Vertical display end
-	call	outidx
-	movw	$0xe715, %ax			# Vertical blank start
- 	call	outidx
-	movw	$0x0416, %ax			# Vertical blank end
- 	call	outidx
-	pushw	%dx
-	movb	$0xcc, %dl			# Misc output register (read)
- 	inb	%dx, %al
- 	movb	$0xc2, %dl			# (write)
- 	andb	$0x0d, %al	# Preserve clock select bits and color bit
- 	orb	$0xe2, %al			# Set correct sync polarity
- 	outb	%al, %dx
-	popw	%dx
-	movw	$0x501e, force_size
-	stc					# That's all.
-	ret
-
-# Set the 80x34 mode (all VGA's). 480 scans, 14-pixel font.
-set_80x34:
-	call	set_80x30			# Set 480 scans
-	call	set14				# And 14-pt font
-	movw	$0xdb12, %ax			# VGA vertical display end
-	movw	$0x5022, force_size
-setvde:	call	outidx
-	stc
-	ret
-
-# Set the 80x60 mode (all VGA's). 480 scans, 8-pixel font.
-set_80x60:
-	call	set_80x30			# Set 480 scans
-	call	set_8pt				# And 8-pt font
-	movw	$0xdf12, %ax			# VGA vertical display end
-	movw	$0x503c, force_size
-	jmp	setvde
-
-# Special hack for ThinkPad graphics
-set_gfx:
-#ifdef CONFIG_VIDEO_GFX_HACK
-	movw	$VIDEO_GFX_BIOS_AX, %ax
-	movw	$VIDEO_GFX_BIOS_BX, %bx
-	int	$0x10
-	movw	$VIDEO_GFX_DUMMY_RESOLUTION, force_size
-	stc
-#endif
-	ret
-
-#ifdef CONFIG_VIDEO_RETAIN
-
-# Store screen contents to temporary buffer.
-store_screen:
-	cmpb	$0, do_restore			# Already stored?
-	jnz	stsr
-
-	testb	$CAN_USE_HEAP, loadflags	# Have we space for storing?
-	jz	stsr
-	
-	pushw	%ax
-	pushw	%bx
-	pushw	force_size			# Don't force specific size
-	movw	$0, force_size
-	call	mode_params			# Obtain params of current mode
-	popw	force_size
-	movb	%fs:(PARAM_VIDEO_LINES), %ah
-	movb	%fs:(PARAM_VIDEO_COLS), %al
-	movw	%ax, %bx			# BX=dimensions
-	mulb	%ah
-	movw	%ax, %cx			# CX=number of characters
-	addw	%ax, %ax			# Calculate image size
-	addw	$modelist+1024+4, %ax
-	cmpw	heap_end_ptr, %ax
-	jnc	sts1				# Unfortunately, out of memory
-
-	movw	%fs:(PARAM_CURSOR_POS), %ax	# Store mode params
-	leaw	modelist+1024, %di
-	stosw
-	movw	%bx, %ax
-	stosw
-	pushw	%ds				# Store the screen
-	movw	video_segment, %ds
-	xorw	%si, %si
-	rep
-	movsw
-	popw	%ds
-	incb	do_restore			# Screen will be restored later
-sts1:	popw	%bx
-	popw	%ax
-stsr:	ret
-
-# Restore screen contents from temporary buffer.
-restore_screen:
-	cmpb	$0, do_restore			# Has the screen been stored?
-	jz	res1
-
-	call	mode_params			# Get parameters of current mode
-	movb	%fs:(PARAM_VIDEO_LINES), %cl
-	movb	%fs:(PARAM_VIDEO_COLS), %ch
-	leaw	modelist+1024, %si		# Screen buffer
-	lodsw					# Set cursor position
-	movw	%ax, %dx
-	cmpb	%cl, %dh
-	jc	res2
-	
-	movb	%cl, %dh
-	decb	%dh
-res2:	cmpb	%ch, %dl
-	jc	res3
-	
-	movb	%ch, %dl
-	decb	%dl
-res3:	movb	$0x02, %ah
-	movb	$0x00, %bh
-	int	$0x10
-	lodsw					# Display size
-	movb	%ah, %dl			# DL=number of lines
-	movb	$0, %ah				# BX=phys. length of orig. line
-	movw	%ax, %bx
-	cmpb	%cl, %dl			# Too many?
-	jc	res4
-
-	pushw	%ax
-	movb	%dl, %al
-	subb	%cl, %al
-	mulb	%bl
-	addw	%ax, %si
-	addw	%ax, %si
-	popw	%ax
-	movb	%cl, %dl
-res4:	cmpb	%ch, %al			# Too wide?
-	jc	res5
-	
-	movb	%ch, %al			# AX=width of src. line
-res5:	movb	$0, %cl
-	xchgb	%ch, %cl
-	movw	%cx, %bp			# BP=width of dest. line
-	pushw	%es
-	movw	video_segment, %es
-	xorw	%di, %di			# Move the data
-	addw	%bx, %bx			# Convert BX and BP to _bytes_
-	addw	%bp, %bp
-res6:	pushw	%si
-	pushw	%di
-	movw	%ax, %cx
-	rep
-	movsw
-	popw	%di
-	popw	%si
-	addw	%bp, %di
-	addw	%bx, %si
-	decb	%dl
-	jnz	res6
-	
-	popw	%es				# Done
-res1:	ret
-#endif /* CONFIG_VIDEO_RETAIN */
-
-# Write to indexed VGA register (AL=index, AH=data, DX=index reg. port)
-outidx:	outb	%al, %dx
-	pushw	%ax
-	movb	%ah, %al
-	incw	%dx
-	outb	%al, %dx
-	decw	%dx
-	popw	%ax
-	ret
-
-# Build the table of video modes (stored after the setup.S code at the
-# `modelist' label. Each video mode record looks like:
-#	.word	MODE-ID		(our special mode ID (see above))
-#	.byte	rows		(number of rows)
-#	.byte	columns		(number of columns)
-# Returns address of the end of the table in DI, the end is marked
-# with a ASK_VGA ID.
-mode_table:
-	movw	mt_end, %di			# Already filled?
-	orw	%di, %di
-	jnz	mtab1x
-	
-	leaw	modelist, %di			# Store standard modes:
-	movl	$VIDEO_80x25 + 0x50190000, %eax	# The 80x25 mode (ALL)
-	stosl
-	movb	adapter, %al			# CGA/MDA/HGA -- no more modes
-	orb	%al, %al
-	jz	mtabe
-	
-	decb	%al
-	jnz	mtabv
-	
-	movl	$VIDEO_8POINT + 0x502b0000, %eax	# The 80x43 EGA mode
-	stosl
-	jmp	mtabe
-
-mtab1x:	jmp	mtab1
-
-mtabv:	leaw	vga_modes, %si			# All modes for std VGA
-	movw	$vga_modes_end-vga_modes, %cx
-	rep	# I'm unable to use movsw as I don't know how to store a half
-	movsb	# of the expression above to cx without using explicit shr.
-
-	cmpb	$0, scanning			# Mode scan requested?
-	jz	mscan1
-	
-	call	mode_scan
-mscan1:
-
-#ifdef CONFIG_VIDEO_LOCAL
-	call	local_modes
-#endif /* CONFIG_VIDEO_LOCAL */
-
-#ifdef CONFIG_VIDEO_VESA
-	call	vesa_modes			# Detect VESA VGA modes
-#endif /* CONFIG_VIDEO_VESA */
-
-#ifdef CONFIG_VIDEO_SVGA
-	cmpb	$0, scanning			# Bypass when scanning
-	jnz	mscan2
-	
-	call	svga_modes			# Detect SVGA cards & modes
-mscan2:
-#endif /* CONFIG_VIDEO_SVGA */
-
-mtabe:
-
-#ifdef CONFIG_VIDEO_COMPACT
-	leaw	modelist, %si
-	movw	%di, %dx
-	movw	%si, %di
-cmt1:	cmpw	%dx, %si			# Scan all modes
-	jz	cmt2
-
-	leaw	modelist, %bx			# Find in previous entries
-	movw	2(%si), %cx
-cmt3:	cmpw	%bx, %si
-	jz	cmt4
-
-	cmpw	2(%bx), %cx			# Found => don't copy this entry
-	jz	cmt5
-
-	addw	$4, %bx
-	jmp	cmt3
-
-cmt4:	movsl					# Copy entry
-	jmp	cmt1
-
-cmt5:	addw	$4, %si				# Skip entry
-	jmp	cmt1
-
-cmt2:
-#endif	/* CONFIG_VIDEO_COMPACT */
-
-	movw	$ASK_VGA, (%di)			# End marker
-	movw	%di, mt_end
-mtab1:	leaw	modelist, %si			# SI=mode list, DI=list end
-ret0:	ret
-
-# Modes usable on all standard VGAs
-vga_modes:
-	.word	VIDEO_8POINT
-	.word	0x5032				# 80x50
-	.word	VIDEO_80x43
-	.word	0x502b				# 80x43
-	.word	VIDEO_80x28
-	.word	0x501c				# 80x28
-	.word	VIDEO_80x30
-	.word	0x501e				# 80x30
-	.word	VIDEO_80x34
-	.word	0x5022				# 80x34
-	.word	VIDEO_80x60
-	.word	0x503c				# 80x60
-#ifdef CONFIG_VIDEO_GFX_HACK
-	.word	VIDEO_GFX_HACK
-	.word	VIDEO_GFX_DUMMY_RESOLUTION
-#endif
-
-vga_modes_end:
-# Detect VESA modes.
-
-#ifdef CONFIG_VIDEO_VESA
-vesa_modes:
-	cmpb	$2, adapter			# VGA only
-	jnz	ret0
-
-	movw	%di, %bp			# BP=original mode table end
-	addw	$0x200, %di			# Buffer space
-	movw	$0x4f00, %ax			# VESA Get card info call
-	int	$0x10
-	movw	%bp, %di
-	cmpw	$0x004f, %ax			# Successful?
-	jnz	ret0
-	
-	cmpw	$0x4556, 0x200(%di)
-	jnz	ret0
-	
-	cmpw	$0x4153, 0x202(%di)
-	jnz	ret0
-	
-	movw	$vesa_name, card_name		# Set name to "VESA VGA"
-	pushw	%gs
-	lgsw	0x20e(%di), %si			# GS:SI=mode list
-	movw	$128, %cx			# Iteration limit
-vesa1:
-# gas version 2.9.1, using BFD version 2.9.1.0.23 buggers the next inst.
-# XXX:	lodsw	%gs:(%si), %ax			# Get next mode in the list
-	gs; lodsw
-	cmpw	$0xffff, %ax			# End of the table?
-	jz	vesar
-	
-	cmpw	$0x0080, %ax			# Check validity of mode ID
-	jc	vesa2
-	
-	orb	%ah, %ah		# Valid IDs: 0x0000-0x007f/0x0100-0x07ff
-	jz	vesan			# Certain BIOSes report 0x80-0xff!
-
-	cmpw	$0x0800, %ax
-	jnc	vesae
-
-vesa2:	pushw	%cx
-	movw	%ax, %cx			# Get mode information structure
-	movw	$0x4f01, %ax
-	int	$0x10
-	movw	%cx, %bx			# BX=mode number
-	addb	$VIDEO_FIRST_VESA>>8, %bh
-	popw	%cx
-	cmpw	$0x004f, %ax
-	jnz	vesan			# Don't report errors (buggy BIOSES)
-
-	movb	(%di), %al			# Check capabilities. We require
-	andb	$0x19, %al			# a color text mode.
-	cmpb	$0x09, %al
-	jnz	vesan
-	
-	cmpw	$0xb800, 8(%di)		# Standard video memory address required
-	jnz	vesan
-
-	testb	$2, (%di)			# Mode characteristics supplied?
-	movw	%bx, (%di)			# Store mode number
-	jz	vesa3
-	
-	xorw	%dx, %dx
-	movw	0x12(%di), %bx			# Width
-	orb	%bh, %bh
-	jnz	vesan
-	
-	movb	%bl, 0x3(%di)
-	movw	0x14(%di), %ax			# Height
-	orb	%ah, %ah
-	jnz	vesan
-	
-	movb	%al, 2(%di)
-	mulb	%bl
-	cmpw	$8193, %ax		# Small enough for Linux console driver?
-	jnc	vesan
-
-	jmp	vesaok
-
-vesa3:	subw	$0x8108, %bx	# This mode has no detailed info specified,
-	jc	vesan		# so it must be a standard VESA mode.
-
-	cmpw	$5, %bx
-	jnc	vesan
-
-	movw	vesa_text_mode_table(%bx), %ax
-	movw	%ax, 2(%di)
-vesaok:	addw	$4, %di				# The mode is valid. Store it.
-vesan:	loop	vesa1			# Next mode. Limit exceeded => error
-vesae:	leaw	vesaer, %si
-	call	prtstr
-	movw	%bp, %di			# Discard already found modes.
-vesar:	popw	%gs
-	ret
-
-# Dimensions of standard VESA text modes
-vesa_text_mode_table:
-	.byte	60, 80				# 0108
-	.byte	25, 132				# 0109
-	.byte	43, 132				# 010A
-	.byte	50, 132				# 010B
-	.byte	60, 132				# 010C
-#endif	/* CONFIG_VIDEO_VESA */
-
-# Scan for video modes. A bit dirty, but should work.
-mode_scan:
-	movw	$0x0100, %cx			# Start with mode 0
-scm1:	movb	$0, %ah				# Test the mode
-	movb	%cl, %al
-	int	$0x10
-	movb	$0x0f, %ah
-	int	$0x10
-	cmpb	%cl, %al
-	jnz	scm2				# Mode not set
-
-	movw	$0x3c0, %dx			# Test if it's a text mode
-	movb	$0x10, %al			# Mode bits
-	call	inidx
-	andb	$0x03, %al
-	jnz	scm2
-	
-	movb	$0xce, %dl			# Another set of mode bits
-	movb	$0x06, %al
-	call	inidx
-	shrb	%al
-	jc	scm2
-	
-	movb	$0xd4, %dl			# Cursor location
-	movb	$0x0f, %al
-	call	inidx
-	orb	%al, %al
-	jnz	scm2
-	
-	movw	%cx, %ax			# Ok, store the mode
-	stosw
-	movb	%gs:(0x484), %al		# Number of rows
-	incb	%al
-	stosb
-	movw	%gs:(0x44a), %ax		# Number of columns
-	stosb
-scm2:	incb	%cl
-	jns	scm1
-	
-	movw	$0x0003, %ax			# Return back to mode 3
-	int	$0x10
-	ret
-
-tstidx:	outw	%ax, %dx			# OUT DX,AX and inidx
-inidx:	outb	%al, %dx			# Read from indexed VGA register
-	incw	%dx			# AL=index, DX=index reg port -> AL=data
-	inb	%dx, %al
-	decw	%dx
-	ret
-
-# Try to detect type of SVGA card and supply (usually approximate) video
-# mode table for it.
-
-#ifdef CONFIG_VIDEO_SVGA
-svga_modes:
-	leaw	svga_table, %si			# Test all known SVGA adapters
-dosvga:	lodsw
-	movw	%ax, %bp			# Default mode table
-	orw	%ax, %ax
-	jz	didsv1
-
-	lodsw					# Pointer to test routine
-	pushw	%si
-	pushw	%di
-	pushw	%es
-	movw	$0xc000, %bx
-	movw	%bx, %es
-	call	*%ax				# Call test routine
-	popw	%es
-	popw	%di
-	popw	%si
-	orw	%bp, %bp
-	jz	dosvga
-	
-	movw	%bp, %si			# Found, copy the modes
-	movb	svga_prefix, %ah
-cpsvga:	lodsb
-	orb	%al, %al
-	jz	didsv
-	
-	stosw
-	movsw
-	jmp	cpsvga
-
-didsv:	movw	%si, card_name			# Store pointer to card name
-didsv1:	ret
-
-# Table of all known SVGA cards. For each card, we store a pointer to
-# a table of video modes supported by the card and a pointer to a routine
-# used for testing of presence of the card. The video mode table is always
-# followed by the name of the card or the chipset.
-svga_table:
-	.word	ati_md, ati_test
-	.word	oak_md, oak_test
-	.word	paradise_md, paradise_test
-	.word	realtek_md, realtek_test
-	.word	s3_md, s3_test
-	.word	chips_md, chips_test
-	.word	video7_md, video7_test
-	.word	cirrus5_md, cirrus5_test
-	.word	cirrus6_md, cirrus6_test
-	.word	cirrus1_md, cirrus1_test
-	.word	ahead_md, ahead_test
-	.word	everex_md, everex_test
-	.word	genoa_md, genoa_test
-	.word	trident_md, trident_test
-	.word	tseng_md, tseng_test
-	.word	0
-
-# Test routines and mode tables:
-
-# S3 - The test algorithm was taken from the SuperProbe package
-# for XFree86 1.2.1. Report bugs to Christoph.Niemann@linux.org
-s3_test:
-	movw	$0x0f35, %cx	# we store some constants in cl/ch
-	movw	$0x03d4, %dx
-	movb	$0x38, %al
-	call	inidx
-	movb	%al, %bh	# store current CRT-register 0x38
-	movw	$0x0038, %ax
-	call	outidx		# disable writing to special regs
-	movb	%cl, %al	# check whether we can write special reg 0x35
-	call	inidx
-	movb	%al, %bl	# save the current value of CRT reg 0x35
-	andb	$0xf0, %al	# clear bits 0-3
-	movb	%al, %ah
-	movb	%cl, %al	# and write it to CRT reg 0x35
-	call	outidx
-	call	inidx		# now read it back
-	andb	%ch, %al	# clear the upper 4 bits
-	jz	s3_2		# the first test failed. But we have a
-
-	movb	%bl, %ah	# second chance
-	movb	%cl, %al
-	call	outidx
-	jmp	s3_1		# do the other tests
-
-s3_2:	movw	%cx, %ax	# load ah with 0xf and al with 0x35
-	orb	%bl, %ah	# set the upper 4 bits of ah with the orig value
-	call	outidx		# write ...
-	call	inidx		# ... and reread 
-	andb	%cl, %al	# turn off the upper 4 bits
-	pushw	%ax
-	movb	%bl, %ah	# restore old value in register 0x35
-	movb	%cl, %al
-	call	outidx
-	popw	%ax
-	cmpb	%ch, %al	# setting lower 4 bits was successful => bad
-	je	no_s3		# writing is allowed => this is not an S3
-
-s3_1:	movw	$0x4838, %ax	# allow writing to special regs by putting
-	call	outidx		# magic number into CRT-register 0x38
-	movb	%cl, %al	# check whether we can write special reg 0x35
-	call	inidx
-	movb	%al, %bl
-	andb	$0xf0, %al
-	movb	%al, %ah
-	movb	%cl, %al
-	call	outidx
-	call	inidx
-	andb	%ch, %al
-	jnz	no_s3		# no, we can't write => no S3
-
-	movw	%cx, %ax
-	orb	%bl, %ah
-	call	outidx
-	call	inidx
-	andb	%ch, %al
-	pushw	%ax
-	movb	%bl, %ah	# restore old value in register 0x35
-	movb	%cl, %al
-	call	outidx
-	popw	%ax
-	cmpb	%ch, %al
-	jne	no_s31		# writing not possible => no S3
-	movb	$0x30, %al
-	call	inidx		# now get the S3 id ...
-	leaw	idS3, %di
-	movw	$0x10, %cx
-	repne
-	scasb
-	je	no_s31
-
-	movb	%bh, %ah
-	movb	$0x38, %al
-	jmp	s3rest
-
-no_s3:	movb	$0x35, %al	# restore CRT register 0x35
-	movb	%bl, %ah
-	call	outidx
-no_s31:	xorw	%bp, %bp	# Detection failed
-s3rest:	movb	%bh, %ah
-	movb	$0x38, %al	# restore old value of CRT register 0x38
-	jmp	outidx
-
-idS3:	.byte	0x81, 0x82, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95
-	.byte	0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa8, 0xb0
-
-s3_md:	.byte	0x54, 0x2b, 0x84
-	.byte	0x55, 0x19, 0x84
-	.byte	0
-	.ascii	"S3"
-	.byte	0
-
-# ATI cards.
-ati_test:
-	leaw 	idati, %si
-	movw	$0x31, %di
-	movw	$0x09, %cx
-	repe
-	cmpsb
-	je	atiok
-
-	xorw	%bp, %bp
-atiok:	ret
-
-idati:	.ascii	"761295520"
-
-ati_md:	.byte	0x23, 0x19, 0x84
-	.byte	0x33, 0x2c, 0x84
-	.byte	0x22, 0x1e, 0x64
-	.byte	0x21, 0x19, 0x64
-	.byte	0x58, 0x21, 0x50
-	.byte	0x5b, 0x1e, 0x50
-	.byte	0
-	.ascii	"ATI"
-	.byte	0
-
-# AHEAD
-ahead_test:
-	movw	$0x200f, %ax
-	movw	$0x3ce, %dx
-	outw	%ax, %dx
-	incw	%dx
-	inb	%dx, %al
-	cmpb	$0x20, %al
-	je	isahed
-
-	cmpb	$0x21, %al
-	je	isahed
-	
-	xorw	%bp, %bp
-isahed:	ret
-
-ahead_md:
-	.byte	0x22, 0x2c, 0x84
-	.byte	0x23, 0x19, 0x84
-	.byte	0x24, 0x1c, 0x84
-	.byte	0x2f, 0x32, 0xa0
-	.byte	0x32, 0x22, 0x50
-	.byte	0x34, 0x42, 0x50
-	.byte	0
-	.ascii	"Ahead"
-	.byte	0
-
-# Chips & Tech.
-chips_test:
-	movw	$0x3c3, %dx
-	inb	%dx, %al
-	orb	$0x10, %al
-	outb	%al, %dx
-	movw	$0x104, %dx
-	inb	%dx, %al
-	movb	%al, %bl
-	movw	$0x3c3, %dx
-	inb	%dx, %al
-	andb	$0xef, %al
-	outb	%al, %dx
-	cmpb	$0xa5, %bl
-	je	cantok
-	
-	xorw	%bp, %bp
-cantok:	ret
-
-chips_md:
-	.byte	0x60, 0x19, 0x84
-	.byte	0x61, 0x32, 0x84
-	.byte	0
-	.ascii	"Chips & Technologies"
-	.byte	0
-
-# Cirrus Logic 5X0
-cirrus1_test:
-	movw	$0x3d4, %dx
-	movb	$0x0c, %al
-	outb	%al, %dx
-	incw	%dx
-	inb	%dx, %al
-	movb	%al, %bl
-	xorb	%al, %al
-	outb	%al, %dx
-	decw	%dx
-	movb	$0x1f, %al
-	outb	%al, %dx
-	incw	%dx
-	inb	%dx, %al
-	movb	%al, %bh
-	xorb	%ah, %ah
-	shlb	$4, %al
-	movw	%ax, %cx
-	movb	%bh, %al
-	shrb	$4, %al
-	addw	%ax, %cx
-	shlw	$8, %cx
-	addw	$6, %cx
-	movw	%cx, %ax
-	movw	$0x3c4, %dx
-	outw	%ax, %dx
-	incw	%dx
-	inb	%dx, %al
-	andb	%al, %al
-	jnz	nocirr
-	
-	movb	%bh, %al
-	outb	%al, %dx
-	inb	%dx, %al
-	cmpb	$0x01, %al
-	je	iscirr
-
-nocirr:	xorw	%bp, %bp
-iscirr: movw	$0x3d4, %dx
-	movb	%bl, %al
-	xorb	%ah, %ah
-	shlw	$8, %ax
-	addw	$0x0c, %ax
-	outw	%ax, %dx
-	ret
-
-cirrus1_md:
-	.byte	0x1f, 0x19, 0x84
-	.byte	0x20, 0x2c, 0x84
-	.byte	0x22, 0x1e, 0x84
-	.byte	0x31, 0x25, 0x64
-	.byte	0
-	.ascii	"Cirrus Logic 5X0"
-	.byte	0
-
-# Cirrus Logic 54XX
-cirrus5_test:
-	movw	$0x3c4, %dx
-	movb	$6, %al
-	call	inidx
-	movb	%al, %bl			# BL=backup
-	movw	$6, %ax
-	call	tstidx
-	cmpb	$0x0f, %al
-	jne	c5fail
-	
-	movw	$0x1206, %ax
-	call	tstidx
-	cmpb	$0x12, %al
-	jne	c5fail
-	
-	movb	$0x1e, %al
-	call	inidx
-	movb	%al, %bh
-	movb	%bh, %ah
-	andb	$0xc0, %ah
-	movb	$0x1e, %al
-	call	tstidx
-	andb	$0x3f, %al
-	jne	c5xx
-	
-	movb	$0x1e, %al
-	movb	%bh, %ah
-	orb	$0x3f, %ah
-	call	tstidx
-	xorb	$0x3f, %al
-	andb	$0x3f, %al
-c5xx:	pushf
-	movb	$0x1e, %al
-	movb	%bh, %ah
-	outw	%ax, %dx
-	popf
-	je	c5done
-
-c5fail:	xorw	%bp, %bp
-c5done:	movb	$6, %al
-	movb	%bl, %ah
-	outw	%ax, %dx
-	ret
-
-cirrus5_md:
-	.byte	0x14, 0x19, 0x84
-	.byte	0x54, 0x2b, 0x84
-	.byte	0
-	.ascii	"Cirrus Logic 54XX"
-	.byte	0
-
-# Cirrus Logic 64XX -- no known extra modes, but must be identified, because
-# it's misidentified by the Ahead test.
-cirrus6_test:
-	movw	$0x3ce, %dx
-	movb	$0x0a, %al
-	call	inidx
-	movb	%al, %bl	# BL=backup
-	movw	$0xce0a, %ax
-	call	tstidx
-	orb	%al, %al
-	jne	c2fail
-	
-	movw	$0xec0a, %ax
-	call	tstidx
-	cmpb	$0x01, %al
-	jne	c2fail
-	
-	movb	$0xaa, %al
-	call	inidx		# 4X, 5X, 7X and 8X are valid 64XX chip ID's. 
-	shrb	$4, %al
-	subb	$4, %al
-	jz	c6done
-	
-	decb	%al
-	jz	c6done
-	
-	subb	$2, %al
-	jz	c6done
-	
-	decb	%al
-	jz	c6done
-	
-c2fail:	xorw	%bp, %bp
-c6done:	movb	$0x0a, %al
-	movb	%bl, %ah
-	outw	%ax, %dx
-	ret
-
-cirrus6_md:
-	.byte	0
-	.ascii	"Cirrus Logic 64XX"
-	.byte	0
-
-# Everex / Trident
-everex_test:
-	movw	$0x7000, %ax
-	xorw	%bx, %bx
-	int	$0x10
-	cmpb	$0x70, %al
-	jne	noevrx
-	
-	shrw	$4, %dx
-	cmpw	$0x678, %dx
-	je	evtrid
-	
-	cmpw	$0x236, %dx
-	jne	evrxok
-
-evtrid:	leaw	trident_md, %bp
-evrxok:	ret
-
-noevrx:	xorw	%bp, %bp
-	ret
-
-everex_md:
-	.byte	0x03, 0x22, 0x50
-	.byte	0x04, 0x3c, 0x50
-	.byte	0x07, 0x2b, 0x64
-	.byte	0x08, 0x4b, 0x64
-	.byte	0x0a, 0x19, 0x84
-	.byte	0x0b, 0x2c, 0x84
-	.byte	0x16, 0x1e, 0x50
-	.byte	0x18, 0x1b, 0x64
-	.byte	0x21, 0x40, 0xa0
-	.byte	0x40, 0x1e, 0x84
-	.byte	0
-	.ascii	"Everex/Trident"
-	.byte	0
-
-# Genoa.
-genoa_test:
-	leaw	idgenoa, %si			# Check Genoa 'clues'
-	xorw	%ax, %ax
-	movb	%es:(0x37), %al
-	movw	%ax, %di
-	movw	$0x04, %cx
-	decw	%si
-	decw	%di
-l1:	incw	%si
-	incw	%di
-	movb	(%si), %al
-	testb	%al, %al
-	jz	l2
-
-	cmpb	%es:(%di), %al
-l2:	loope 	l1
-	orw	%cx, %cx
-	je	isgen
-	
-	xorw	%bp, %bp
-isgen:	ret
-
-idgenoa: .byte	0x77, 0x00, 0x99, 0x66
-
-genoa_md:
-	.byte	0x58, 0x20, 0x50
-	.byte	0x5a, 0x2a, 0x64
-	.byte	0x60, 0x19, 0x84
-	.byte	0x61, 0x1d, 0x84
-	.byte	0x62, 0x20, 0x84
-	.byte	0x63, 0x2c, 0x84
-	.byte	0x64, 0x3c, 0x84
-	.byte	0x6b, 0x4f, 0x64
-	.byte	0x72, 0x3c, 0x50
-	.byte	0x74, 0x42, 0x50
-	.byte	0x78, 0x4b, 0x64
-	.byte	0
-	.ascii	"Genoa"
-	.byte	0
-
-# OAK
-oak_test:
-	leaw	idoakvga, %si
-	movw	$0x08, %di
-	movw	$0x08, %cx
-	repe
-	cmpsb
-	je	isoak
-	
-	xorw	%bp, %bp
-isoak:	ret
-
-idoakvga: .ascii  "OAK VGA "
-
-oak_md: .byte	0x4e, 0x3c, 0x50
-	.byte	0x4f, 0x3c, 0x84
-	.byte	0x50, 0x19, 0x84
-	.byte	0x51, 0x2b, 0x84
-	.byte	0
-	.ascii	"OAK"
-	.byte	0
-
-# WD Paradise.
-paradise_test:
-	leaw	idparadise, %si
-	movw	$0x7d, %di
-	movw	$0x04, %cx
-	repe
-	cmpsb
-	je	ispara
-	
-	xorw	%bp, %bp
-ispara:	ret
-
-idparadise:	.ascii	"VGA="
-
-paradise_md:
-	.byte	0x41, 0x22, 0x50
-	.byte	0x47, 0x1c, 0x84
-	.byte	0x55, 0x19, 0x84
-	.byte	0x54, 0x2c, 0x84
-	.byte	0
-	.ascii	"Paradise"
-	.byte	0
-
-# Trident.
-trident_test:
-	movw	$0x3c4, %dx
-	movb	$0x0e, %al
-	outb	%al, %dx
-	incw	%dx
-	inb	%dx, %al
-	xchgb	%al, %ah
-	xorb	%al, %al
-	outb	%al, %dx
-	inb	%dx, %al
-	xchgb	%ah, %al
-	movb	%al, %bl	# Strange thing ... in the book this wasn't
-	andb	$0x02, %bl	# necessary but it worked on my card which
-	jz	setb2		# is a trident. Without it the screen goes
-				# blurred ...
-	andb	$0xfd, %al
-	jmp	clrb2		
-
-setb2:	orb	$0x02, %al	
-clrb2:	outb	%al, %dx
-	andb	$0x0f, %ah
-	cmpb	$0x02, %ah
-	je	istrid
-
-	xorw	%bp, %bp
-istrid:	ret
-
-trident_md:
-	.byte	0x50, 0x1e, 0x50
-	.byte	0x51, 0x2b, 0x50
-	.byte	0x52, 0x3c, 0x50
-	.byte	0x57, 0x19, 0x84
-	.byte	0x58, 0x1e, 0x84
-	.byte	0x59, 0x2b, 0x84
-	.byte	0x5a, 0x3c, 0x84
-	.byte	0
-	.ascii	"Trident"
-	.byte	0
-
-# Tseng.
-tseng_test:
-	movw	$0x3cd, %dx
-	inb	%dx, %al	# Could things be this simple ! :-)
-	movb	%al, %bl
-	movb	$0x55, %al
-	outb	%al, %dx
-	inb	%dx, %al
-	movb	%al, %ah
-	movb	%bl, %al
-	outb	%al, %dx
-	cmpb	$0x55, %ah
- 	je	istsen
-
-isnot:	xorw	%bp, %bp
-istsen:	ret
-
-tseng_md:
-	.byte	0x26, 0x3c, 0x50
-	.byte	0x2a, 0x28, 0x64
-	.byte	0x23, 0x19, 0x84
-	.byte	0x24, 0x1c, 0x84
-	.byte	0x22, 0x2c, 0x84
-	.byte	0x21, 0x3c, 0x84
-	.byte	0
-	.ascii	"Tseng"
-	.byte	0
-
-# Video7.
-video7_test:
-	movw	$0x3cc, %dx
-	inb	%dx, %al
-	movw	$0x3b4, %dx
-	andb	$0x01, %al
-	jz	even7
-
-	movw	$0x3d4, %dx
-even7:	movb	$0x0c, %al
-	outb	%al, %dx
-	incw	%dx
-	inb	%dx, %al
-	movb	%al, %bl
-	movb	$0x55, %al
-	outb	%al, %dx
-	inb	%dx, %al
-	decw	%dx
-	movb	$0x1f, %al
-	outb	%al, %dx
-	incw	%dx
-	inb	%dx, %al
-	movb	%al, %bh
-	decw	%dx
-	movb	$0x0c, %al
-	outb	%al, %dx
-	incw	%dx
-	movb	%bl, %al
-	outb	%al, %dx
-	movb	$0x55, %al
-	xorb	$0xea, %al
-	cmpb	%bh, %al
-	jne	isnot
-	
-	movb	$VIDEO_FIRST_V7>>8, svga_prefix # Use special mode switching
-	ret
-
-video7_md:
-	.byte	0x40, 0x2b, 0x50
-	.byte	0x43, 0x3c, 0x50
-	.byte	0x44, 0x3c, 0x64
-	.byte	0x41, 0x19, 0x84
-	.byte	0x42, 0x2c, 0x84
-	.byte	0x45, 0x1c, 0x84
-	.byte	0
-	.ascii	"Video 7"
-	.byte	0
-
-# Realtek VGA
-realtek_test:
-	leaw	idrtvga, %si
-	movw	$0x45, %di
-	movw	$0x0b, %cx
-	repe
-	cmpsb
-	je	isrt
-	
-	xorw	%bp, %bp
-isrt:	ret
-
-idrtvga:	.ascii	"REALTEK VGA"
-
-realtek_md:
-	.byte	0x1a, 0x3c, 0x50
-	.byte	0x1b, 0x19, 0x84
-	.byte	0x1c, 0x1e, 0x84
-	.byte	0x1d, 0x2b, 0x84
-	.byte	0x1e, 0x3c, 0x84
-	.byte	0
-	.ascii	"REALTEK"
-	.byte	0
-
-#endif	/* CONFIG_VIDEO_SVGA */
-
-# User-defined local mode table (VGA only)
-#ifdef CONFIG_VIDEO_LOCAL
-local_modes:
-	leaw	local_mode_table, %si
-locm1:	lodsw
-	orw	%ax, %ax
-	jz	locm2
-	
-	stosw
-	movsw
-	jmp	locm1
-
-locm2:	ret
-
-# This is the table of local video modes which can be supplied manually
-# by the user. Each entry consists of mode ID (word) and dimensions
-# (byte for column count and another byte for row count). These modes
-# are placed before all SVGA and VESA modes and override them if table
-# compacting is enabled. The table must end with a zero word followed
-# by NUL-terminated video adapter name.
-local_mode_table:
-	.word	0x0100				# Example: 40x25
-	.byte	25,40
-	.word	0
-	.ascii	"Local"
-	.byte	0
-#endif	/* CONFIG_VIDEO_LOCAL */
-
-# Read a key and return the ASCII code in al, scan code in ah
-getkey:	xorb	%ah, %ah
-	int	$0x16
-	ret
-
-# Read a key with a timeout of 30 seconds.
-# The hardware clock is used to get the time.
-getkt:	call	gettime
-	addb	$30, %al			# Wait 30 seconds
-	cmpb	$60, %al
-	jl	lminute
-
-	subb	$60, %al
-lminute:
-	movb	%al, %cl
-again:	movb	$0x01, %ah
-	int	$0x16
-	jnz	getkey				# key pressed, so get it
-
-	call	gettime
-	cmpb	%cl, %al
-	jne	again
-
-	movb	$0x20, %al			# timeout, return `space'
-	ret
-
-# Flush the keyboard buffer
-flush:	movb	$0x01, %ah
-	int	$0x16
-	jz	empty
-	
-	xorb	%ah, %ah
-	int	$0x16
-	jmp	flush
-
-empty:	ret
-
-# Print hexadecimal number.
-prthw:	pushw	%ax
-	movb	%ah, %al
-	call	prthb
-	popw	%ax
-prthb:	pushw	%ax
-	shrb	$4, %al
-	call	prthn
-	popw	%ax
-	andb	$0x0f, %al
-prthn:	cmpb	$0x0a, %al
-	jc	prth1
-
-	addb	$0x07, %al
-prth1:	addb	$0x30, %al
-	jmp	prtchr
-
-# Print decimal number in al
-prtdec:	pushw	%ax
-	pushw	%cx
-	xorb	%ah, %ah
-	movb	$0x0a, %cl
-	idivb	%cl
-	cmpb	$0x09, %al
-	jbe	lt100
-
-	call	prtdec
-	jmp	skip10
-
-lt100:	addb	$0x30, %al
-	call	prtchr
-skip10:	movb	%ah, %al
-	addb	$0x30, %al
-	call	prtchr	
-	popw	%cx
-	popw	%ax
-	ret
-
-store_edid:
-#ifdef CONFIG_FIRMWARE_EDID
-	pushw	%es				# just save all registers
-	pushw	%ax
-	pushw	%bx
-	pushw   %cx
-	pushw	%dx
-	pushw   %di
-
-	pushw	%fs
-	popw    %es
-
-	movl	$0x13131313, %eax		# memset block with 0x13
-	movw    $32, %cx
-	movw	$0x140, %di
-	cld
-	rep
-	stosl
-
-	cmpw	$0x0200, vbe_version		# only do EDID on >= VBE2.0
-	jl	no_edid
-
-	pushw   %es				# save ES
-	xorw    %di, %di                        # Report Capability
-	pushw   %di
-	popw    %es                             # ES:DI must be 0:0
-	movw	$0x4f15, %ax
-	xorw	%bx, %bx
-	xorw	%cx, %cx
-	int	$0x10
-	popw    %es                             # restore ES
-
-	cmpb    $0x00, %ah                      # call successful
-	jne     no_edid
-
-	cmpb    $0x4f, %al                      # function supported
-	jne     no_edid
-
-	movw	$0x4f15, %ax                    # do VBE/DDC
-	movw	$0x01, %bx
-	movw	$0x00, %cx
-	movw    $0x00, %dx
-	movw	$0x140, %di
-	int	$0x10
-
-no_edid:
-	popw	%di				# restore all registers
-	popw	%dx
-	popw	%cx
-	popw	%bx
-	popw	%ax
-	popw	%es
-#endif
-	ret
-
-# VIDEO_SELECT-only variables
-mt_end:		.word	0	# End of video mode table if built
-edit_buf:	.space	6	# Line editor buffer
-card_name:	.word	0	# Pointer to adapter name
-scanning:	.byte	0	# Performing mode scan
-do_restore:	.byte	0	# Screen contents altered during mode change
-svga_prefix:	.byte	VIDEO_FIRST_BIOS>>8	# Default prefix for BIOS modes
-graphic_mode:	.byte	0	# Graphic mode with a linear frame buffer
-dac_size:	.byte	6	# DAC bit depth
-vbe_version:	.word	0	# VBE bios version
-
-# Status messages
-keymsg:		.ascii	"Press <RETURN> to see video modes available, "
-		.ascii	"<SPACE> to continue or wait 30 secs"
-		.byte	0x0d, 0x0a, 0
-
-listhdr:	.byte	0x0d, 0x0a
-		.ascii	"Mode:    COLSxROWS:"
-
-crlft:		.byte	0x0d, 0x0a, 0
-
-prompt:		.byte	0x0d, 0x0a
-		.asciz	"Enter mode number or `scan': "
-
-unknt:		.asciz	"Unknown mode ID. Try again."
-
-badmdt:		.ascii	"You passed an undefined mode number."
-		.byte	0x0d, 0x0a, 0
-
-vesaer:		.ascii	"Error: Scanning of VESA modes failed. Please "
-		.ascii	"report to <mj@ucw.cz>."
-		.byte	0x0d, 0x0a, 0
-
-old_name:	.asciz	"CGA/MDA/HGA"
-
-ega_name:	.asciz	"EGA"
-
-svga_name:	.ascii	" "
-
-vga_name:	.asciz	"VGA"
-
-vesa_name:	.asciz	"VESA"
-
-name_bann:	.asciz	"Video adapter: "
-#endif /* CONFIG_VIDEO_SELECT */
-
-# Other variables:
-adapter:	.byte	0	# Video adapter: 0=CGA/MDA/HGA,1=EGA,2=VGA
-video_segment:	.word	0xb800	# Video memory segment
-force_size:	.word	0	# Use this size instead of the one in BIOS vars

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (22 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [23/30] x86_64: Share identical video.S between i386 and x86-64 Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01 15:45   ` Chuck Ebbert
  2007-05-01  3:58 ` [PATCH] [25/30] x86_64: Fix allnoconfig error in genapic_flat.c Andi Kleen
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


vfat implements compat handlers for these ioctls, but when they
were executed on other file systems the kernel would still complain
about an unknown compat ioctl.  Just declare them as compatible
and let them be rejected when not needed by the normal path.

This makes wine runs a lot quieter.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 fs/compat_ioctl.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux/fs/compat_ioctl.c
===================================================================
--- linux.orig/fs/compat_ioctl.c
+++ linux/fs/compat_ioctl.c
@@ -2627,6 +2627,15 @@ COMPATIBLE_IOCTL(LPRESET)
 /*LPGETSTATS not implemented, but no kernels seem to compile it in anyways*/
 COMPATIBLE_IOCTL(LPGETFLAGS)
 HANDLE_IOCTL(LPSETTIMEOUT, lp_timeout_trans)
+
+/* fat 'r' ioctls. These are handled by fat with ->compat_ioctl,
+   but we don't want warnings on other file systems. So declare
+   them as compatible here. */
+#define VFAT_IOCTL_READDIR_BOTH32       _IOR('r', 1, struct compat_dirent[2])
+#define VFAT_IOCTL_READDIR_SHORT32      _IOR('r', 2, struct compat_dirent[2])
+
+IGNORE_IOCTL(VFAT_IOCTL_READDIR_BOTH32)
+IGNORE_IOCTL(VFAT_IOCTL_READDIR_SHORT32)
 };
 
 int ioctl_table_size = ARRAY_SIZE(ioctl_start);
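
For readers wondering why the 32-bit variants need their own command numbers: an ioctl number built with _IOR() encodes the size of its argument, so a structure that shrinks under a 32-bit ABI yields a different command value. The sketch below illustrates this in user space. The two demo_* structures are hypothetical stand-ins, not the kernel's struct dirent or struct compat_dirent, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>	/* _IOR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */

/* Hypothetical native layout: uses long, so it is wider on 64-bit builds. */
struct demo_native_dirent {
	long		d_ino;
	long		d_off;
	unsigned short	d_reclen;
	char		d_name[256];
};

/* Hypothetical 32-bit layout: fixed-width fields only. */
struct demo_compat_dirent {
	uint32_t	d_ino;
	int32_t		d_off;
	uint16_t	d_reclen;
	char		d_name[256];
};

/* Command number as a native caller would compute it. */
unsigned long demo_cmd_native(void)
{
	return _IOR('r', 1, struct demo_native_dirent[2]);
}

/* Command number as a 32-bit caller would compute it. */
unsigned long demo_cmd_compat(void)
{
	return _IOR('r', 1, struct demo_compat_dirent[2]);
}

/* The argument size can be decoded back out of the command number. */
void demo_ioctl_selfcheck(void)
{
	assert(_IOC_SIZE(demo_cmd_compat()) ==
	       sizeof(struct demo_compat_dirent[2]));
}
```

On a 64-bit build the two helpers return different values even though the type character and the number are the same, which is why the compat table has to list the 32-bit encodings separately.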

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [25/30] x86_64: Fix allnoconfig error in genapic_flat.c
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (23 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [26/30] i386: Drop noisy e820 debugging printks Andi Kleen
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel



Fix:

In file included from include2/asm/apic.h:5,
                 from include2/asm/smp.h:15,
                 from linux/arch/x86_64/kernel/genapic_flat.c:18:
linux/include/linux/pm.h: In function ‘call_platform_enable_wakeup’:
linux/include/linux/pm.h:331: error: ‘EIO’ undeclared (first use in this function)
linux/include/linux/pm.h:331: error: (Each undeclared identifier is reported only once
linux/include/linux/pm.h:331: error: for each function it appears in.)

Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/genapic_flat.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux/arch/x86_64/kernel/genapic_flat.c
===================================================================
--- linux.orig/arch/x86_64/kernel/genapic_flat.c
+++ linux/arch/x86_64/kernel/genapic_flat.c
@@ -8,6 +8,7 @@
  * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and
  * James Cleverdon.
  */
+#include <linux/errno.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
 #include <linux/string.h>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [26/30] i386: Drop noisy e820 debugging printks
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (24 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [25/30] x86_64: Fix allnoconfig error in genapic_flat.c Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [27/30] i386: white space fixes in i387.h Andi Kleen
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/e820.c |   15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

Index: linux/arch/i386/kernel/e820.c
===================================================================
--- linux.orig/arch/i386/kernel/e820.c
+++ linux/arch/i386/kernel/e820.c
@@ -393,10 +393,8 @@ int __init sanitize_e820_map(struct e820
 		   ____________________33__
 		   ______________________4_
 	*/
-	printk("sanitize start\n");
 	/* if there's only one memory region, don't bother */
 	if (*pnr_map < 2) {
-		printk("sanitize bail 0\n");
 		return -1;
 	}
 
@@ -405,7 +403,6 @@ int __init sanitize_e820_map(struct e820
 	/* bail out if we find any unreasonable addresses in bios map */
 	for (i=0; i<old_nr; i++)
 		if (biosmap[i].addr + biosmap[i].size < biosmap[i].addr) {
-			printk("sanitize bail 1\n");
 			return -1;
 		}
 
@@ -501,7 +498,6 @@ int __init sanitize_e820_map(struct e820
 	memcpy(biosmap, new_bios, new_nr*sizeof(struct e820entry));
 	*pnr_map = new_nr;
 
-	printk("sanitize end\n");
 	return 0;
 }
 
@@ -532,7 +528,6 @@ int __init copy_e820_map(struct e820entr
 		unsigned long long size = biosmap->size;
 		unsigned long long end = start + size;
 		unsigned long type = biosmap->type;
-		printk("copy_e820_map() start: %016Lx size: %016Lx end: %016Lx type: %ld\n", start, size, end, type);
 
 		/* Overflow in 64 bits? Ignore the memory map. */
 		if (start > end)
@@ -543,17 +538,11 @@ int __init copy_e820_map(struct e820entr
 		 * Not right. Fix it up.
 		 */
 		if (type == E820_RAM) {
-			printk("copy_e820_map() type is E820_RAM\n");
 			if (start < 0x100000ULL && end > 0xA0000ULL) {
-				printk("copy_e820_map() lies in range...\n");
-				if (start < 0xA0000ULL) {
-					printk("copy_e820_map() start < 0xA0000ULL\n");
+				if (start < 0xA0000ULL)
 					add_memory_region(start, 0xA0000ULL-start, type);
-				}
-				if (end <= 0x100000ULL) {
-					printk("copy_e820_map() end <= 0x100000ULL\n");
+				if (end <= 0x100000ULL)
 					continue;
-				}
 				start = 0x100000ULL;
 				size = end - start;
 			}
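
With the printks gone, the logic left in the copy_e820_map() hunk is easier to follow. A stand-alone sketch of it, assuming only what the diff context shows plus the final add of the remaining range: a RAM entry that overlaps the 640K-1M hole is split so that only the parts outside the hole are kept. demo_clip_ram() and struct demo_region are hypothetical names for this illustration and stand in for the calls to add_memory_region().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HOLE_START	0xA0000ULL	/* 640 KiB */
#define DEMO_HOLE_END	0x100000ULL	/* 1 MiB */

struct demo_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Clip one RAM entry against the legacy hole.
 * Returns the number of pieces written to out[] (0, 1 or 2),
 * or -1 if start + size wraps around in 64 bits.
 */
int demo_clip_ram(uint64_t start, uint64_t size, struct demo_region out[2])
{
	uint64_t end = start + size;
	int n = 0;

	/* Overflow in 64 bits: reject the entry. */
	if (start > end)
		return -1;

	if (start < DEMO_HOLE_END && end > DEMO_HOLE_START) {
		/* Keep the part below the hole, if any. */
		if (start < DEMO_HOLE_START) {
			out[n].start = start;
			out[n].size = DEMO_HOLE_START - start;
			n++;
		}
		/* Nothing extends above the hole: done. */
		if (end <= DEMO_HOLE_END)
			return n;
		/* Restart the remaining range at 1 MiB. */
		start = DEMO_HOLE_END;
		size = end - start;
	}

	out[n].start = start;
	out[n].size = size;
	return n + 1;
}

void demo_clip_selfcheck(void)
{
	struct demo_region r[2];

	assert(demo_clip_ram(0, 0x200000ULL, r) == 2);
}
```

An entry covering 0 to 2 MiB comes back as two pieces, one below 0xA0000 and one starting at 0x100000.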

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [27/30] i386: white space fixes in i387.h
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (25 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [26/30] i386: Drop noisy e820 debugging printks Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [28/30] i386: avoid redundant preempt_disable in __unlazy_fpu Andi Kleen
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: jan.kiszka, patches, linux-kernel


From: Jan Kiszka <jan.kiszka@web.de>

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-i386/i387.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux/include/asm-i386/i387.h
===================================================================
--- linux.orig/include/asm-i386/i387.h
+++ linux/include/asm-i386/i387.h
@@ -83,8 +83,8 @@ static inline void __save_init_fpu( stru
 
 #define __clear_fpu( tsk )					\
 do {								\
-	if (task_thread_info(tsk)->status & TS_USEDFPU) {		\
-		asm volatile("fnclex ; fwait");				\
+	if (task_thread_info(tsk)->status & TS_USEDFPU) {	\
+		asm volatile("fnclex ; fwait");			\
 		task_thread_info(tsk)->status &= ~TS_USEDFPU;	\
 		stts();						\
 	}							\
@@ -113,7 +113,7 @@ static inline void save_init_fpu( struct
 	__clear_fpu( tsk );	\
 	preempt_enable();	\
 } while (0)
-					\
+
 /*
  * FPU state interaction...
  */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [28/30] i386: avoid redundant preempt_disable in __unlazy_fpu
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (26 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [27/30] i386: white space fixes in i387.h Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [29/30] x86_64: Don't exclude asm-offsets.c in Documentation/dontdiff Andi Kleen
  2007-05-01  3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split Andi Kleen
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: jan.kiszka, patches, linux-kernel


From: Jan Kiszka <jan.kiszka@web.de>

There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and
neither of them appears to require additional preempt_disable/enable here.
Let's open-code save_init_fpu in __unlazy_fpu to save a few ops.

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 include/asm-i386/i387.h |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: linux/include/asm-i386/i387.h
===================================================================
--- linux.orig/include/asm-i386/i387.h
+++ linux/include/asm-i386/i387.h
@@ -74,11 +74,12 @@ static inline void __save_init_fpu( stru
 	task_thread_info(tsk)->status &= ~TS_USEDFPU;
 }
 
-#define __unlazy_fpu( tsk ) do { \
-	if (task_thread_info(tsk)->status & TS_USEDFPU) \
-		save_init_fpu( tsk ); 			\
-	else						\
-		tsk->fpu_counter = 0;			\
+#define __unlazy_fpu( tsk ) do {				\
+	if (task_thread_info(tsk)->status & TS_USEDFPU) {	\
+		__save_init_fpu(tsk);				\
+		stts();						\
+	} else							\
+		tsk->fpu_counter = 0;				\
 } while (0)
 
 #define __clear_fpu( tsk )					\

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [29/30] x86_64: Don't exclude asm-offsets.c in Documentation/dontdiff
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (27 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [28/30] i386: avoid redundant preempt_disable in __unlazy_fpu Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split Andi Kleen
  29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: patches, linux-kernel


asm-offsets.c is valid source code and needs to be diffed.

Signed-off-by: Andi Kleen <ak@suse.de>

---
 Documentation/dontdiff |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/Documentation/dontdiff
===================================================================
--- linux.orig/Documentation/dontdiff
+++ linux/Documentation/dontdiff
@@ -55,8 +55,8 @@ aic7*seq.h*
 aicasm
 aicdb.h*
 asm
-asm-offsets.*
-asm_offsets.*
+asm-offsets.h
+asm_offsets.h
 autoconf.h*
 bbootsect
 bin2c

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split.
  2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
                   ` (28 preceding siblings ...)
  2007-05-01  3:58 ` [PATCH] [29/30] x86_64: Don't exclude asm-offsets.c in Documentation/dontdiff Andi Kleen
@ 2007-05-01  3:58 ` Andi Kleen
  2007-05-01  4:26   ` Eric Dumazet
  2007-05-01  4:37   ` William Lee Irwin III
  29 siblings, 2 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  3:58 UTC (permalink / raw)
  To: ebiederm, patches, linux-kernel


From: ebiederm@xmission.com

When in PAE mode we require that the user kernel divide to be
on a 1G boundary.  The 2G/2G split does not have that property
so require !X86_PAE

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/i386/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 1a94a73..80003de 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -570,6 +570,7 @@ choice
 		depends on !HIGHMEM
 		bool "3G/1G user/kernel split (for full 1G low memory)"
 	config VMSPLIT_2G
+		depends on !X86_PAE
 		bool "2G/2G user/kernel split"
 	config VMSPLIT_1G
 		bool "1G/3G user/kernel split"
-- 
1.5.1.1.181.g2de0
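
As a rough illustration of the boundary requirement in the changelog: under PAE the top-level table has four entries, each covering 1 GiB, so a user/kernel divide that is not a multiple of 1 GiB falls inside one of those entries. The sketch below checks the PAGE_OFFSET defaults from arch/i386/Kconfig against that rule. The helper names are hypothetical and the code is only arithmetic, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_GIB	(1UL << 30)

/* True if the user/kernel divide sits exactly on a 1 GiB boundary. */
int demo_on_1g_boundary(uint32_t page_offset)
{
	return (page_offset & (DEMO_GIB - 1)) == 0;
}

/* Which of the four 1 GiB top-level slots contains the divide. */
unsigned int demo_top_level_slot(uint32_t page_offset)
{
	return page_offset >> 30;
}

void demo_boundary_selfcheck(void)
{
	/* The VMSPLIT_2G default is 128 MiB short of the 2 GiB boundary. */
	assert(!demo_on_1g_boundary(0x78000000u));
	assert(demo_on_1g_boundary(0x80000000u));
}
```

With 0x78000000 the divide lands inside slot 1, so that slot would have to be shared between user and kernel mappings. With 0x80000000 the kernel starts cleanly at slot 2.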



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split.
  2007-05-01  3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split Andi Kleen
@ 2007-05-01  4:26   ` Eric Dumazet
  2007-05-01  6:21     ` Andi Kleen
  2007-05-01  4:37   ` William Lee Irwin III
  1 sibling, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01  4:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ebiederm, patches, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1241 bytes --]

Andi Kleen wrote:
> From: ebiederm@xmission.com
> 
> When in PAE mode we require that the user kernel divide to be
> on a 1G boundary.  The 2G/2G split does not have that property
> so require !X86_PAE
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  arch/i386/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> index 1a94a73..80003de 100644
> --- a/arch/i386/Kconfig
> +++ b/arch/i386/Kconfig
> @@ -570,6 +570,7 @@ choice
>  		depends on !HIGHMEM
>  		bool "3G/1G user/kernel split (for full 1G low memory)"
>  	config VMSPLIT_2G
> +		depends on !X86_PAE
>  		bool "2G/2G user/kernel split"
>  	config VMSPLIT_1G
>  		bool "1G/3G user/kernel split"

Hmm... We lose a useful 2G/2G split. Shouldn't we use a patch to change 
PAGE_OFFSET to 0x80000000 instead of 0x78000000 and keep the 2G/2G split?

Maybe the following patch is better?

[PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE

When in PAE mode we require that the user kernel divide to be
on a 1G boundary.  We must therefore make sure PAGE_OFFSET is correctly 
defined when the 2G/2G split is combined with PAE mode.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

[-- Attachment #2: i386_CONFIG_PAGE_OFFSET.patch --]
[-- Type: text/plain, Size: 414 bytes --]

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 53d6237..32356f2 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -578,7 +578,8 @@ endchoice
 config PAGE_OFFSET
 	hex
 	default 0xB0000000 if VMSPLIT_3G_OPT
-	default 0x78000000 if VMSPLIT_2G
+	default 0x78000000 if (VMSPLIT_2G && !X86_PAE)
+	default 0x80000000 if (VMSPLIT_2G && X86_PAE)
 	default 0x40000000 if VMSPLIT_1G
 	default 0xC0000000
 

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split.
  2007-05-01  3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split Andi Kleen
  2007-05-01  4:26   ` Eric Dumazet
@ 2007-05-01  4:37   ` William Lee Irwin III
  2007-05-01  4:57     ` Eric Dumazet
                       ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: William Lee Irwin III @ 2007-05-01  4:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Lord, ebiederm, patches, linux-kernel

On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
> From: ebiederm@xmission.com
> When in PAE mode we require that the user kernel divide to be
> on a 1G boundary.  The 2G/2G split does not have that property
> so require !X86_PAE
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  arch/i386/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)

What on earth?

config PAGE_OFFSET
        hex
        default 0xB0000000 if VMSPLIT_3G_OPT
        default 0x78000000 if VMSPLIT_2G
        default 0x40000000 if VMSPLIT_1G
        default 0xC0000000

This appears to have been introduced by:
commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
Author: Mark Lord <lkml@rtr.ca>
Date:   Wed Feb 1 03:06:11 2006 -0800
    [PATCH] VMSPLIT config options

There's some sort of insanity going on here. Since when is 0x78000000
a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
nothing else, as it's certainly not 2GB/2GB.

These VMSPLIT config options vs. PAE are foul as they're now done in
any event. If they were done properly, they'd properly set up the pmd
within which the division point between user and kernelspace falls.

This patch, I suppose, stops people from shooting themselves in the
foot, but (IMHO) the VMSPLIT patches shouldn't have been merged
without handling the partial pmd case. 2MB/4MB resolution is enough
granularity for any reasonable purpose, so split ptes aren't worth the
effort, but this nonsense with PAE vs. VMSPLIT is just preposterous.
If you're going to play the VMSPLIT game at all, handle split pmd's.

I'll see what else is pending in the i386 pagetable arena and clear
this up if there aren't other objections (this is where Andi gets to
complain that things are too complex already and preemptively NAK to
save me the effort, if it's not seen to be desirable). Eric, your patch
is a reasonable stop-gap measure for the original deficiency.


-- wli

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split.
  2007-05-01  4:37   ` William Lee Irwin III
@ 2007-05-01  4:57     ` Eric Dumazet
  2007-05-01  5:11       ` William Lee Irwin III
  2007-05-01  5:35     ` Eric W. Biederman
  2007-05-01 13:32     ` Mark Lord
  2 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01  4:57 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Mark Lord, ebiederm, patches, linux-kernel

William Lee Irwin III wrote:
> On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
>> From: ebiederm@xmission.com
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary.  The 2G/2G split does not have that property
>> so require !X86_PAE
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  arch/i386/Kconfig |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> What on earth?
> 
> config PAGE_OFFSET
>         hex
>         default 0xB0000000 if VMSPLIT_3G_OPT
>         default 0x78000000 if VMSPLIT_2G
>         default 0x40000000 if VMSPLIT_1G
>         default 0xC0000000
> 
> This appears to have been introduced by:
> commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
> Author: Mark Lord <lkml@rtr.ca>
> Date:   Wed Feb 1 03:06:11 2006 -0800
>     [PATCH] VMSPLIT config options
> 
> There's some sort of insanity going on here. Since when is 0x78000000
> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
> nothing else, as it's certainly not 2GB/2GB.

Could you please stop saying others are insane?

They are like you and can fail sometimes. Apparently when the patch came, 
nobody (including you) commented.

It's not that difficult to think about VMALLOC space (I might be wrong about 
this, but I feel this explains 78000000 vs 80000000).

> 
> These VMSPLIT config options vs. PAE are foul as they're now done in
> any event. If they were done properly, they'd properly set up the pmd
> within which the division point between user and kernelspace falls.
> 
> This patch, I suppose, stops people from shooting themselves in the
> foot, but (IMHO) the VMSPLIT patches shouldn't have been merged
> without handling the partial pmd case. 2MB/4MB resolution is enough
> granularity for any reasonable purpose, so split ptes aren't worth the
> effort, but this nonsense with PAE vs. VMSPLIT is just preposterous.
> If you're going to play the VMSPLIT game at all, handle split pmd's.
> 
> I'll see what else is pending in the i386 pagetable arena and clear
> this up if there aren't other objections (this is where Andi gets to
> complain that things are too complex already and preemptively NAK to
> save me the effort, if it's not seen to be desirable). Eric, your patch
> is a reasonable stop-gap measure for the original deficiency.



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split.
  2007-05-01  4:57     ` Eric Dumazet
@ 2007-05-01  5:11       ` William Lee Irwin III
  0 siblings, 0 replies; 52+ messages in thread
From: William Lee Irwin III @ 2007-05-01  5:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, Mark Lord, ebiederm, patches, linux-kernel

William Lee Irwin III wrote:
>> There's some sort of insanity going on here. Since when is 0x78000000
>> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
>> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
>> nothing else, as it's certainly not 2GB/2GB.

On Tue, May 01, 2007 at 06:57:19AM +0200, Eric Dumazet wrote:
> Could you please stop saying others are insane?
> They are like you and can fail sometimes. Apparently when the patch came, 
> nobody (including you) commented.
> It's not that difficult to think about VMALLOC space (I might be wrong 
> about this, but I feel this explains 78000000 vs 80000000).

I'm obviously aware of vmallocspace. Read carefully:

    >> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
    >> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
    >> nothing else, as it's certainly not 2GB/2GB.

The meaning of "for laptops" is that it's carving out a chunk of
user virtualspace to use for vmallocspace in lieu of carving out a
piece of the 1:1 mapping of physical memory for the same purpose.
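
The arithmetic behind the 0x78000000 versus 0x80000000 argument can be sketched as below. This is a deliberately simplified model: it assumes the kernel's virtual range runs from PAGE_OFFSET to 4 GiB, that all RAM is direct-mapped starting at PAGE_OFFSET, and that whatever is left over is what remains for vmalloc and friends. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MIB	(1ULL << 20)
#define DEMO_GIB	(1ULL << 30)

/* Kernel share of the 32-bit address space: PAGE_OFFSET up to 4 GiB. */
uint64_t demo_kernel_space(uint32_t page_offset)
{
	return 4 * DEMO_GIB - page_offset;
}

/* Virtual space left over once ram_bytes of RAM are mapped 1:1. */
uint64_t demo_room_beyond_ram(uint32_t page_offset, uint64_t ram_bytes)
{
	uint64_t kspace = demo_kernel_space(page_offset);

	return kspace > ram_bytes ? kspace - ram_bytes : 0;
}

void demo_vmsplit_selfcheck(void)
{
	/* 0x78000000: 1920 MiB for user space, 2176 MiB for the kernel. */
	assert(demo_kernel_space(0x78000000u) == 2176 * DEMO_MIB);
	/* A true 2G/2G split leaves nothing beyond 2 GiB of direct map. */
	assert(demo_room_beyond_ram(0x80000000u, 2 * DEMO_GIB) == 0);
}
```

Under these assumptions a machine with 2 GiB of RAM has 128 MiB of kernel virtual space left at 0x78000000 and none at 0x80000000, which matches the description of taking the extra room from user space instead of from the 1:1 mapping.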


-- wli

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependency to the 2G/2G split.
  2007-05-01  4:37   ` William Lee Irwin III
  2007-05-01  4:57     ` Eric Dumazet
@ 2007-05-01  5:35     ` Eric W. Biederman
  2007-05-01 13:32     ` Mark Lord
  2 siblings, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-05-01  5:35 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andi Kleen, Mark Lord, patches, linux-kernel

William Lee Irwin III <wli@holomorphy.com> writes:

> On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
>> From: ebiederm@xmission.com
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary.  The 2G/2G split does not have that property
>> so require !X86_PAE
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  arch/i386/Kconfig |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> What on earth?
>
> config PAGE_OFFSET
>         hex
>         default 0xB0000000 if VMSPLIT_3G_OPT
>         default 0x78000000 if VMSPLIT_2G
>         default 0x40000000 if VMSPLIT_1G
>         default 0xC0000000
>
> This appears to have been introduced by:
> commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
> Author: Mark Lord <lkml@rtr.ca>
> Date:   Wed Feb 1 03:06:11 2006 -0800
>     [PATCH] VMSPLIT config options
>
> There's some sort of insanity going on here. Since when is 0x78000000
> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
> nothing else, as it's certainly not 2GB/2GB.

It makes a little more sense when you realize all of the options
were originally !X86_PAE.  So they were designed with highmem
disabled.

> These VMSPLIT config options vs. PAE are foul as they're now done in
> any event. If they were done properly, they'd properly set up the pmd
> within which the division point between user and kernelspace falls.

They were designed to avoid highmem, a totally different design point.

> This patch, I suppose, stops people from shooting themselves in the
> foot, but (IMHO) the VMSPLIT patches shouldn't have been merged
> without handling the partial pmd case. 2MB/4MB resolution is enough
> granularity for any reasonable purpose, so split ptes aren't worth the
> effort, but this nonsense with PAE vs. VMSPLIT is just preposterous.
> If you're going to play the VMSPLIT game at all, handle split pmd's.

What I find telling is that I fixed this based on code review not
based on bug reports.

> I'll see what else is pending in the i386 pagetable arena and clear
> this up if there aren't other objections (this is where Andi gets to
> complain that things are too complex already and preemptively NAK to
> save me the effort, if it's not seen to be desirable). Eric, your patch
> is a reasonable stop-gap measure for the original deficiency.

Frankly, rather than putting much effort into this, I suspect we should
just delete these options entirely.  We are long past the point where
they matter.

Eric

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
  2007-05-01  3:58 ` [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu Andi Kleen
@ 2007-05-01  5:57   ` Jeremy Fitzhardinge
  2007-05-01  7:23     ` Andi Kleen
  0 siblings, 1 reply; 52+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-01  5:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: patches, linux-kernel

Andi Kleen wrote:
> This implements new vDSO for x86-64.  The concept is similar
> to the existing vDSOs on i386 and PPC.  x86-64 has had static
> vsyscalls before,  but these are not flexible enough anymore.
>
> A vDSO is an ELF shared library supplied by the kernel that is mapped into 
> user address space.  The vDSO mapping is randomized for each process
> for security reasons.
>
> Doing this was needed for clock_gettime, because clock_gettime
> always needs a syscall fallback and having one at a fixed
> address would have made buffer overflow exploits too easy to write.
>
> The vdso can be disabled with vdso=0
>
> It currently includes a new gettimeofday implementation and optimized
> clock_gettime(). The gettimeofday implementation is slightly faster
> than the one in the old vsyscall.  clock_gettime is significantly faster 
> than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
>
> The new calls are generally faster than the old vsyscall. 
>
> TBD: add new benchmarks
>
> Advantages over the old x86-64 vsyscalls:
> - Extensible
> - Randomized
> - Cleaner
> - Easier to virtualize (the old static address range previously caused
> overhead e.g. for Xen because it has to create special page tables for it) 
>
> Weak points: 
> - glibc support still to be written
>
> The VM interface is partly based on Ingo Molnar's i386 version.
>
> Signed-off-by: Andi Kleen <ak@suse.de>
>
> ---
>  Documentation/kernel-parameters.txt |    2 
>  arch/x86_64/Makefile                |    3 
>  arch/x86_64/ia32/ia32_binfmt.c      |    1 
>  arch/x86_64/kernel/time.c           |    1 
>  arch/x86_64/kernel/vmlinux.lds.S    |   12 +++
>  arch/x86_64/kernel/vsyscall.c       |   22 +----
>  arch/x86_64/mm/init.c               |   17 ++++
>  arch/x86_64/vdso/Makefile           |   49 ++++++++++++
>  arch/x86_64/vdso/vclock_gettime.c   |  120 +++++++++++++++++++++++++++++++
>  arch/x86_64/vdso/vdso-note.S        |   25 ++++++
>  arch/x86_64/vdso/vdso-start.S       |    2 
>  arch/x86_64/vdso/vdso.S             |    2 
>  arch/x86_64/vdso/vdso.lds.S         |   77 ++++++++++++++++++++
>  arch/x86_64/vdso/vextern.h          |   16 ++++
>  arch/x86_64/vdso/vgetcpu.c          |   50 +++++++++++++
>  arch/x86_64/vdso/vma.c              |  137 ++++++++++++++++++++++++++++++++++++
>  arch/x86_64/vdso/voffset.h          |    1 
>  arch/x86_64/vdso/vvar.c             |   12 +++
>  include/asm-x86_64/auxvec.h         |    2 
>  include/asm-x86_64/elf.h            |   13 +++
>  include/asm-x86_64/mmu.h            |    1 
>  include/asm-x86_64/pgtable.h        |    8 +-
>  include/asm-x86_64/vgtod.h          |   29 +++++++
>  include/asm-x86_64/vsyscall.h       |    3 
>  24 files changed, 583 insertions(+), 22 deletions(-)
>
> Index: linux/arch/x86_64/ia32/ia32_binfmt.c
> ===================================================================
> --- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
> +++ linux/arch/x86_64/ia32/ia32_binfmt.c
> @@ -38,6 +38,7 @@
>  
>  int sysctl_vsyscall32 = 1;
>  
> +#undef ARCH_DLINFO
>  #define ARCH_DLINFO do {  \
>  	if (sysctl_vsyscall32) { \
>  	NEW_AUX_ENT(AT_SYSINFO, (u32)(u64)VSYSCALL32_VSYSCALL); \
> Index: linux/arch/x86_64/kernel/vmlinux.lds.S
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
> +++ linux/arch/x86_64/kernel/vmlinux.lds.S
> @@ -94,6 +94,9 @@ SECTIONS
>    .vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data))
>  		{ *(.vsyscall_gtod_data) }
>    vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data);
> +  .vsyscall_clock : AT(VLOAD(.vsyscall_clock))
> +		{ *(.vsyscall_clock) }
> +  vsyscall_clock = VVIRT(.vsyscall_clock);
>  
>  
>    .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1))
> @@ -153,6 +156,8 @@ SECTIONS
>  
>    . = ALIGN(4096);		/* Init code and data */
>    __init_begin = .;
> +
> +
>    .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
>  	_sinittext = .;
>  	*(.init.text)
> @@ -190,6 +195,12 @@ SECTIONS
>    .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
>    .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
>  
> +/* vdso blob that is mapped into user space */
> +  vdso_start = . ;
> +  .vdso  : AT(ADDR(.vdso) - LOAD_OFFSET) { *(.vdso) }
> +  . = ALIGN(4096);
> +  vdso_end = .;
> +
>  #ifdef CONFIG_BLK_DEV_INITRD
>    . = ALIGN(4096);
>    __initramfs_start = .;
> @@ -202,6 +213,7 @@ SECTIONS
>    .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
>    __per_cpu_end = .;
>    . = ALIGN(4096);
> +
>    __init_end = .;
>  
>    . = ALIGN(4096);
> Index: linux/arch/x86_64/mm/init.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/init.c
> +++ linux/arch/x86_64/mm/init.c
> @@ -159,6 +159,14 @@ static __init void set_pte_phys(unsigned
>  	__flush_tlb_one(vaddr);
>  }
>  
> +void __init
> +set_kernel_map(void *vaddr,unsigned long len,unsigned long phys,pgprot_t prot)
> +{
> +	void *end = vaddr + ALIGN(len, PAGE_SIZE);
> +	for (; vaddr < end; vaddr += PAGE_SIZE, phys += PAGE_SIZE)
> +		set_pte_phys((unsigned long)vaddr, phys, prot);
> +}
> +
>  /* NOTE: this is meant to be run only at boot */
>  void __init 
>  __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
> @@ -756,3 +764,12 @@ int in_gate_area_no_task(unsigned long a
>  {
>  	return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
>  }
> +
> +const char *arch_vma_name(struct vm_area_struct *vma)
> +{
> +	if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
> +		return "[vdso]";
> +	if (vma == &gate_vma)
> +		return "[vsyscall]";
> +	return NULL;
> +}
> Index: linux/arch/x86_64/vdso/vdso-note.S
> ===================================================================
> --- /dev/null
> +++ linux/arch/x86_64/vdso/vdso-note.S
> @@ -0,0 +1,25 @@
> +/*
> + * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
> + * Here we can supply some information useful to userland.
> + */
> +
> +#include <linux/uts.h>
> +#include <linux/version.h>
> +
> +#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type)			      \
>   

Use linux/elfnote.h?

    J

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01  4:26   ` Eric Dumazet
@ 2007-05-01  6:21     ` Andi Kleen
  2007-05-01 13:01       ` Bill Irwin
  0 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  6:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, ebiederm, patches, linux-kernel, bill.irwin

On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
> Andi Kleen a écrit :
> >From: ebiederm@xmission.com
> >
> >When in PAE mode we require that the user kernel divide to be
> >on a 1G boundary.  The 2G/2G split does not have that property
> >so require !X86_PAE
> >
> >Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> >---
> > arch/i386/Kconfig |    1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> >index 1a94a73..80003de 100644
> >--- a/arch/i386/Kconfig
> >+++ b/arch/i386/Kconfig
> >@@ -570,6 +570,7 @@ choice
> > 		depends on !HIGHMEM
> > 		bool "3G/1G user/kernel split (for full 1G low memory)"
> > 	config VMSPLIT_2G
> >+		depends on !X86_PAE
> > 		bool "2G/2G user/kernel split"
> > 	config VMSPLIT_1G
> > 		bool "1G/3G user/kernel split"
> 
> Hum... We lose a useful 2G/2G split. Shouldn't we use a patch to change 
> PAGE_OFFSET to 0x80000000 instead of 0x78000000 and keep the 2G/2G split?

I dropped the patch for now.

> [PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE
> 
> When in PAE mode we require that the user kernel divide to be
> on a 1G boundary.  We must therefore make sure PAGE_OFFSET is correctly 
> defined in the 2G/2G split and PAE mode.

Looks reasonable. Did you test both cases? wli, ok for you too?

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
  2007-05-01  5:57   ` Jeremy Fitzhardinge
@ 2007-05-01  7:23     ` Andi Kleen
  2007-05-01  8:00       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01  7:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: patches, linux-kernel


> Use linux/elfnote.h?

Good point. Will fix. I just copied it from i386, but it should
probably be fixed there too.

Actually they are not completely equivalent because the old macro
generates the section name differently from the name in the payload
(kernelversion, Linux). But I split that up into two notes now.

If that is done on i386 it might break gdb reading core files though?

BTW, in case anybody noticed: the patch also contained a few dead hunks (two unused
functions/prototypes and some whitespace changes). These are already gone now too.

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
  2007-05-01  7:23     ` Andi Kleen
@ 2007-05-01  8:00       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 52+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-01  8:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: patches, linux-kernel

Andi Kleen wrote:
>> Use linux/elfnote.h?
>>     
>
> Good point. Will fix. I just copied it from i386, but it should
> probably be fixed there too.
>   

Already has been.  I posted a patch a day or so ago (reposted below).

> Actually they are not completely equivalent because the old macro
> generates the section name differently from the name in the payload
> (kernelversion, Linux). But I split that up into two notes now.
>   

Roland agreed that it doesn't matter.

> If that is done on i386 it might break gdb reading core files though?
>   

No.  The notename in the final output is the same either way.  The
linker packs .note.* into a single .note section.
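
For readers following the note discussion, here is a rough user-space
sketch (not part of the patch) of the record layout that both the old
ASM_ELF_NOTE_BEGIN macro and ELFNOTE_START emit: three 32-bit words
(namesz, descsz, type), the NUL-terminated name padded to 4 bytes, then
the descriptor padded to 4 bytes. The helper names note_align4() and
build_note() are made up for illustration, and the words are written in
host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to the next multiple of 4, as the ".align 4" directives do. */
static size_t note_align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: write one ELF note into buf and return the number
 * of bytes used, or 0 if buf is too small.  namesz counts the trailing
 * NUL but not the padding; descsz likewise excludes the padding.
 */
static size_t build_note(uint8_t *buf, size_t buflen, const char *name,
			 uint32_t type, const void *desc, uint32_t descsz)
{
	uint32_t namesz;
	size_t total;

	assert(buf != NULL && name != NULL);
	assert(desc != NULL || descsz == 0);

	namesz = (uint32_t)strlen(name) + 1;	/* includes the NUL */
	total = 12 + note_align4(namesz) + note_align4(descsz);
	if (total > buflen)
		return 0;

	memset(buf, 0, total);			/* zero all padding bytes */
	memcpy(buf, &namesz, 4);
	memcpy(buf + 4, &descsz, 4);
	memcpy(buf + 8, &type, 4);
	memcpy(buf + 12, name, namesz);
	if (descsz)
		memcpy(buf + 12 + note_align4(namesz), desc, descsz);
	return total;
}
```

With the name "Linux" and a 4-byte descriptor such as
LINUX_VERSION_CODE, this uses 12 + 8 + 4 = 24 bytes: namesz is 6 (five
letters plus the NUL), padded to 8.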

Subject: use elfnote.h to generate vsyscall notes.

Use existing elfnote.h to generate vsyscall notes, rather than doing
it locally.  Changes elfnote.h a bit to suit, since this is the first
asm user, and it wasn't quite right.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>

---
 arch/i386/kernel/vsyscall-note.S |   23 ++++++-----------------
 include/linux/elfnote.h          |   18 +++++++++++++-----
 2 files changed, 19 insertions(+), 22 deletions(-)

===================================================================
--- a/arch/i386/kernel/vsyscall-note.S
+++ b/arch/i386/kernel/vsyscall-note.S
@@ -3,23 +3,12 @@
  * Here we can supply some information useful to userland.
  */
 
-#include <linux/uts.h>
 #include <linux/version.h>
+#include <linux/elfnote.h>
 
-#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type)			      \
-	.section name, flags;						      \
-	.balign 4;							      \
-	.long 1f - 0f;		/* name length */			      \
-	.long 3f - 2f;		/* data length */			      \
-	.long type;		/* note type */				      \
-0:	.asciz vendor;		/* vendor name */			      \
-1:	.balign 4;							      \
-2:
-
-#define ASM_ELF_NOTE_END						      \
-3:	.balign 4;		/* pad out section */			      \
-	.previous
-
-	ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0)
+/* Ideally this would use UTS_NAME, but using a quoted string here
+   doesn't work. Remember to change this when changing the
+   kernel's name. */
+ELFNOTE_START(Linux, 0, "a")
 	.long LINUX_VERSION_CODE
-	ASM_ELF_NOTE_END
+ELFNOTE_END
===================================================================
--- a/include/linux/elfnote.h
+++ b/include/linux/elfnote.h
@@ -38,17 +38,25 @@
  * e.g. ELFNOTE(XYZCo, 42, .asciz, "forty-two")
  *      ELFNOTE(XYZCo, 12, .long, 0xdeadbeef)
  */
-#define ELFNOTE(name, type, desctype, descdata)	\
-.pushsection .note.name, "",@note	;	\
+#define ELFNOTE_START(name, type, flags)	\
+.pushsection .note.name, flags,@note	;	\
   .align 4				;	\
   .long 2f - 1f		/* namesz */	;	\
-  .long 4f - 3f		/* descsz */	;	\
+  .long 4484f - 3f	/* descsz */	;	\
   .long type				;	\
 1:.asciz #name				;	\
 2:.align 4				;	\
-3:desctype descdata			;	\
-4:.align 4				;	\
+3:
+
+#define ELFNOTE_END				\
+4484:.align 4				;	\
 .popsection				;
+
+#define ELFNOTE(name, type, desc)		\
+	ELFNOTE_START(name, type, "")		\
+		desc			;	\
+	ELFNOTE_END
+
 #else	/* !__ASSEMBLER__ */
 #include <linux/elf.h>
 /*

    J

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01  6:21     ` Andi Kleen
@ 2007-05-01 13:01       ` Bill Irwin
  2007-05-01 13:49         ` Mark Lord
  2007-05-01 15:51         ` Eric Dumazet
  0 siblings, 2 replies; 52+ messages in thread
From: Bill Irwin @ 2007-05-01 13:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Eric Dumazet, ebiederm, patches, linux-kernel, bill.irwin

On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>> Hum... We lose a useful 2G/2G split. Shouldn't we use a patch to change 
>> PAGE_OFFSET to 0x80000000 instead of 0x78000000 and keep the 2G/2G split?

On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
> I dropped the patch for now.

I'm more miffed about what it's cleaning up after than the patch itself.


On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>> [PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary.  We must therefore make sure PAGE_OFFSET is correctly 
>> defined in the 2G/2G split and PAE mode.

On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
> Looks reasonable. Did you test both cases? wli, ok for you too?

Sorry about the delay in replying.

I don't mind so long as we're not letting doorstop configs through. I'd
probably do something more like

Index: sched/arch/i386/Kconfig
===================================================================
--- sched.orig/arch/i386/Kconfig	2007-05-01 04:35:47.065162310 -0700
+++ sched/arch/i386/Kconfig	2007-05-01 04:36:50.100754504 -0700
@@ -571,6 +571,9 @@
 		bool "3G/1G user/kernel split (for full 1G low memory)"
 	config VMSPLIT_2G
 		bool "2G/2G user/kernel split"
+	config VMSPLIT_2G_OPT
+		depends on !HIGHMEM
+		bool "2G/2G user/kernel split (for full 2G low memory)"
 	config VMSPLIT_1G
 		bool "1G/3G user/kernel split"
 endchoice
@@ -578,7 +581,8 @@
 config PAGE_OFFSET
 	hex
 	default 0xB0000000 if VMSPLIT_3G_OPT
-	default 0x78000000 if VMSPLIT_2G
+	default 0x80000000 if VMSPLIT_2G
+	default 0x78000000 if VMSPLIT_2G_OPT
 	default 0x40000000 if VMSPLIT_1G
 	default 0xC0000000
 
as a stopgap measure, but I'm not all that interested in grabbing patch
credits where others could do it easily enough. Either of the config
alterations is fine by me as they now stand; maybe Eric Dumazet might
care to do something like my suggestion at some point.

My interest here is in approaches that aren't really centered around
config options. Those are pmd handling for 1GB-unaligned PAGE_OFFSET
in PAE and dynamic vmallocspace reservation, the latter of which is
more complex than the first. I'd probably only do the pmd handling as
it's much easier than dynamic vmallocspace, which does substantial
violence to the core in order to reserve chunks of ZONE_NORMAL's
virtualspace so that no boot-time virtualspace reservations need to be
made for vmalloc(). Basically it would make vmalloc() proper use
physically contiguous memory and vmap() fiddle with ZONE_NORMAL
pagetables while reserving physical memory underlying the virtualspace
reserved for vmap() so that it's no longer necessary to carve
vmallocspace out of userspace to avoid highmem. That would occur at the
cost of runtime memory footprint of the rarely-called vmap() and
ZONE_NORMAL mapping updates. It would also alleviate pressure on
vmallocspace in configurations where it would be severely limited.

I'm doing a bit of thinking about this laptop-avoiding-highmem problem.
I've not come up with any better ideas than the dynamic vmallocspace
approach to avoid ABI damage while avoiding both highmem and sacrificing
memory for 1GB laptops, and slightly mitigating ABI damage for 2GB
laptops' highmem avoidance efforts. I'm thinking the applicability
isn't broad enough to merit the effort of dynamic vmallocspace. The pmd
fixup for 1GB-unaligned splits is not such a big deal in comparison.


-- wli

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01  4:37   ` William Lee Irwin III
  2007-05-01  4:57     ` Eric Dumazet
  2007-05-01  5:35     ` Eric W. Biederman
@ 2007-05-01 13:32     ` Mark Lord
  2007-05-01 14:17       ` William Lee Irwin III
  2 siblings, 1 reply; 52+ messages in thread
From: Mark Lord @ 2007-05-01 13:32 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andi Kleen, ebiederm, patches, linux-kernel

William Lee Irwin III wrote:
> On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
>> From: ebiederm@xmission.com
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary.  The 2G/2G split does not have that property
>> so require !X86_PAE
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  arch/i386/Kconfig |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> What on earth?
> 
> config PAGE_OFFSET
>         hex
>         default 0xB0000000 if VMSPLIT_3G_OPT
>         default 0x78000000 if VMSPLIT_2G
>         default 0x40000000 if VMSPLIT_1G
>         default 0xC0000000
> 
> This appears to have been introduced by:
> commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
> Author: Mark Lord <lkml@rtr.ca>
> Date:   Wed Feb 1 03:06:11 2006 -0800
>     [PATCH] VMSPLIT config options
> 
> There's some sort of insanity going on here. Since when is 0x78000000
> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
> nothing else, as it's certainly not 2GB/2GB.

You need to go search the archives and read the *extensive* thread
on this from when it was introduced.  Lots of high profile kernel
developers were in on this one.

The idea is really simple:  eliminate the need for HIGHMEM
on common machines.  

And yes, VMSPLIT_2G really means VMSPLIT_2G_OPT,
in the same way as the (added last) VMSPLIT_3G_OPT flag.
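
To make the arithmetic behind that concrete, here is a small sketch of
how much RAM each PAGE_OFFSET can direct-map without HIGHMEM. It assumes
a 32-bit (4GB) address space and a 128MB vmalloc reserve (the figure
quoted later in this thread); the function names are invented for the
example and the results should be read as approximate.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 128MB of kernel address space is set aside for vmalloc. */
#define VMALLOC_RESERVE_MB 128u

/* MB of kernel virtual address space above page_offset in a 4GB space. */
static uint32_t kernel_space_mb(uint32_t page_offset)
{
	return (uint32_t)((0x100000000ULL - page_offset) >> 20);
}

/* Approximate MB of RAM that can be direct-mapped without HIGHMEM. */
static uint32_t max_lowmem_mb(uint32_t page_offset)
{
	uint32_t space = kernel_space_mb(page_offset);

	assert(space > VMALLOC_RESERVE_MB);
	return space - VMALLOC_RESERVE_MB;
}
```

Under those assumptions the default 0xC0000000 gives about 896MB of
lowmem, 0xB0000000 about 1152MB and 0x78000000 about 2048MB, which is
how the two unaligned offsets cover 1GB and 2GB machines without
HIGHMEM. A true 2G/2G split at 0x80000000 would come out at about
1920MB.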

Cheers

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 13:01       ` Bill Irwin
@ 2007-05-01 13:49         ` Mark Lord
  2007-05-01 15:51         ` Eric Dumazet
  1 sibling, 0 replies; 52+ messages in thread
From: Mark Lord @ 2007-05-01 13:49 UTC (permalink / raw)
  To: Bill Irwin, Andi Kleen, Eric Dumazet, ebiederm, patches,
	linux-kernel, Jens Axboe

Bill Irwin wrote:
>
> I don't mind so long as we're not letting doorstop configs through. I'd
> probably do something more like
> 
> Index: sched/arch/i386/Kconfig
> ===================================================================
> --- sched.orig/arch/i386/Kconfig	2007-05-01 04:35:47.065162310 -0700
> +++ sched/arch/i386/Kconfig	2007-05-01 04:36:50.100754504 -0700
> @@ -571,6 +571,9 @@
>  		bool "3G/1G user/kernel split (for full 1G low memory)"
>  	config VMSPLIT_2G
>  		bool "2G/2G user/kernel split"
> +	config VMSPLIT_2G_OPT
> +		depends on !HIGHMEM
> +		bool "2G/2G user/kernel split (for full 2G low memory)"
>  	config VMSPLIT_1G
>  		bool "1G/3G user/kernel split"
>  endchoice
> @@ -578,7 +581,8 @@
>  config PAGE_OFFSET
>  	hex
>  	default 0xB0000000 if VMSPLIT_3G_OPT
> -	default 0x78000000 if VMSPLIT_2G
> +	default 0x80000000 if VMSPLIT_2G
> +	default 0x78000000 if VMSPLIT_2G_OPT
>  	default 0x40000000 if VMSPLIT_1G
>  	default 0xC0000000
>  
> as a stopgap measure, but I'm not all that interested in grabbing patch
..

Yup, I second that one.  The idea of the original 2G split
was to avoid the need for HIGHMEM entirely, reducing overhead
on slower machines.

Having both kinds of splits is fine, but probably just the original one
with the !PAE is okay too.

-ml

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 13:32     ` Mark Lord
@ 2007-05-01 14:17       ` William Lee Irwin III
  2007-05-01 14:20         ` Mark Lord
  0 siblings, 1 reply; 52+ messages in thread
From: William Lee Irwin III @ 2007-05-01 14:17 UTC (permalink / raw)
  To: Mark Lord; +Cc: Andi Kleen, ebiederm, patches, linux-kernel

William Lee Irwin III wrote:
>> There's some sort of insanity going on here. Since when is 0x78000000
>> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
>> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
>> nothing else, as it's certainly not 2GB/2GB.

On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
> You need to go search the archives and read the *extensive* thread
> on this from when it was introduced.  Lots of high profile kernel
> developers were in on this one.

Various good points were raised. The NX bit requiring
3-level pagetables suggests that 3-level pagetables without highmem
would be valuable, at which point the unaligned splits to avoid highmem
with 3-level pagetables gain additional relevance. I'll queue that up
to be written along with split pmd affairs, and check for features
dependent on 4-level pagetables, too.


On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
> The idea is really simple:  eliminate the need for HIGHMEM
> on common machines.  

Of course. Everything is simple until it's done properly. It's
why everyone forgets the "and no simpler" half of the quote, which
turns the meaning of what everyone tries to quote on its head.

I guess looking over it I don't blame you so much. I should've been
around to clean up these issues at the time it was happening. People
seem to have been trying to use the RAM in their laptops without big
slowdowns, and may not have had the VM "sophistication" to deal with
the sorts of issues I'm raising.


On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
> And yes, VMSPLIT_2G really means VMSPLIT_2G_OPT,
> in the same way as the (added last) VMSPLIT_3G_OPT flag.

This was a large portion of the substance of my complaint.


-- wli

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 14:17       ` William Lee Irwin III
@ 2007-05-01 14:20         ` Mark Lord
  0 siblings, 0 replies; 52+ messages in thread
From: Mark Lord @ 2007-05-01 14:20 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andi Kleen, ebiederm, patches, linux-kernel

William Lee Irwin III wrote:
>..
> On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
>> You need to go search the archives and read the *extensive* thread
>> on this from when it was introduced.  Lots of high profile kernel
>> developers were in on this one.
> 
> Various good points were raised. The NX bit requiring
> 3-level pagetables suggests that 3-level pagetables without highmem
> would be valuable, at which point the unaligned splits to avoid highmem
> with 3-level pagetables gain additional relevance. I'll queue that up
> to be written along with split pmd affairs, and check for features
> dependent on 4-level pagetables, too.

Excellent!  That would be super to have in there, William.

Cheers!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems
  2007-05-01  3:58 ` [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems Andi Kleen
@ 2007-05-01 15:45   ` Chuck Ebbert
  2007-05-02 10:46     ` Andi Kleen
  0 siblings, 1 reply; 52+ messages in thread
From: Chuck Ebbert @ 2007-05-01 15:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: patches, linux-kernel

Andi Kleen wrote:
> vfat implements compat handlers for these ioctls, but when they
> were executed on other file systems the kernel would still complain
> about an unknown compat ioctl.  Just declare them as compatible
> and let them be rejected when not needed by the normal path.
> 
> This makes wine runs a lot quieter

Does this also restore the original -ENOTTY return code?

The change that made it return -EINVAL broke a few Wine apps.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 13:01       ` Bill Irwin
  2007-05-01 13:49         ` Mark Lord
@ 2007-05-01 15:51         ` Eric Dumazet
  2007-05-01 17:00           ` Bill Irwin
  1 sibling, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01 15:51 UTC (permalink / raw)
  To: Bill Irwin, Andi Kleen, Eric Dumazet, ebiederm, patches,
	linux-kernel

Bill Irwin a écrit :
> On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>>> Hum... We lose a useful 2G/2G split. Shouldn't we use a patch to change 
>>> PAGE_OFFSET to 0x80000000 instead of 0x78000000 and keep the 2G/2G split?
> 
> On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
>> I dropped the patch for now.
> 
> I'm more miffed about what it's cleaning up after than the patch itself.
> 
> 
> On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>>> [PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE
>>> When in PAE mode we require that the user kernel divide to be
>>> on a 1G boundary.  We must therefore make sure PAGE_OFFSET is correctly 
>>> defined in the 2G/2G split and PAE mode.
> 
> On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
>> Looks reasonable. Did you test both cases? wli, ok for you too?
> 
> Sorry about the delay in replying.
> 
> I don't mind so long as we're not letting doorstop configs through. I'd
> probably do something more like
> 
> Index: sched/arch/i386/Kconfig
> ===================================================================
> --- sched.orig/arch/i386/Kconfig	2007-05-01 04:35:47.065162310 -0700
> +++ sched/arch/i386/Kconfig	2007-05-01 04:36:50.100754504 -0700
> @@ -571,6 +571,9 @@
>  		bool "3G/1G user/kernel split (for full 1G low memory)"
>  	config VMSPLIT_2G
>  		bool "2G/2G user/kernel split"
> +	config VMSPLIT_2G_OPT
> +		depends on !HIGHMEM
> +		bool "2G/2G user/kernel split (for full 2G low memory)"
>  	config VMSPLIT_1G
>  		bool "1G/3G user/kernel split"
>  endchoice
> @@ -578,7 +581,8 @@
>  config PAGE_OFFSET
>  	hex
>  	default 0xB0000000 if VMSPLIT_3G_OPT
> -	default 0x78000000 if VMSPLIT_2G
> +	default 0x80000000 if VMSPLIT_2G
> +	default 0x78000000 if VMSPLIT_2G_OPT
>  	default 0x40000000 if VMSPLIT_1G
>  	default 0xC0000000
>  
> as a stopgap measure, but I'm not all that interested in grabbing patch
> credits where others could do it easily enough. Either of the config
> alterations is fine by me as they now stand; maybe Eric Dumazet might
> care to do something like my suggestion at some point.

Your patch is very fine, Bill; please resubmit it with a proper Signed-off-by.

My first patch was a trivial reaction to try to keep the 2G/2G split alive; yours 
is better for fine tuning.

Thank you



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 15:51         ` Eric Dumazet
@ 2007-05-01 17:00           ` Bill Irwin
  2007-05-01 17:17             ` Eric W. Biederman
  2007-05-02  9:38             ` Andi Kleen
  0 siblings, 2 replies; 52+ messages in thread
From: Bill Irwin @ 2007-05-01 17:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Bill Irwin, Andi Kleen, ebiederm, patches, linux-kernel

Bill Irwin a écrit :
>> as a stopgap measure, but I'm not all that interested in grabbing patch
>> credits where others could do it easily enough. Either of the config
>> alterations is fine by me as they now stand; maybe Eric Dumazet might
>> care to do something like my suggestion at some point.

On Tue, May 01, 2007 at 05:51:44PM +0200, Eric Dumazet wrote:
> Your patch is very fine, Bill; please resubmit it with a proper Signed-off-by.
> My first patch was a trivial reaction to try to keep the 2G/2G split alive; 
> yours is better for fine tuning.

I was hoping you would submit it as an update, but maybe adding your
Signed-off-by: to my own will do.


-- wli


Only 1GB-aligned kernel/user splits are now handled for PAE. The
2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
mapping for physical memory by using an actual split of 1.875/2.125
to accommodate 128MB of vmallocspace out of what would otherwise
be a full 2GB for userspace. That attempt disturbs the alignment
required by PAE for 2GB/2GB splits, and furthermore does not provide
a 2GB/2GB split as advertised.

This patch resolves the issues here in two manners. The first is
by providing a true 2GB/2GB split in addition to the 1.875/2.125
split. The second is by renaming the 1.875/2.125 split to
CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
performs a similar maneuver to avoid aliasing vmallocspace with
the 1:1 mapping for physical memory around the 3GB boundary. With
the 1.875/2.125 split properly-named, its config option is then
tagged as depending on !HIGHMEM to express the PAE implementation's
current inability to deal with such unaligned splits.
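
As an illustration of the numbers above (a sketch, not part of the
patch): with PAE the top-level table has four entries of 1GB each, so a
split is "aligned" when PAGE_OFFSET is a multiple of 1GB and no
top-level entry is shared between user and kernel. The helper names
below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GB_SHIFT 30	/* each top-level PAE entry covers 1GB */

/* User portion in MB: everything below page_offset. */
static uint32_t user_mb(uint32_t page_offset)
{
	assert(page_offset != 0);	/* a split at 0 leaves no user space */
	return page_offset >> 20;
}

/* Kernel portion in MB: everything from page_offset up to 4GB. */
static uint32_t kernel_mb(uint32_t page_offset)
{
	return (uint32_t)((0x100000000ULL - page_offset) >> 20);
}

/* True if the split falls exactly on a 1GB boundary. */
static bool split_is_1g_aligned(uint32_t page_offset)
{
	return (page_offset & ((1u << GB_SHIFT) - 1)) == 0;
}

/* Index (0..3) of the 1GB slot that contains page_offset. */
static unsigned int top_level_slot(uint32_t page_offset)
{
	return page_offset >> GB_SHIFT;
}
```

For 0x78000000 this gives 1920MB user and 2176MB kernel (1.875/2.125),
not aligned, with the boundary falling inside slot 1. For 0x80000000 it
gives 2048MB each and the boundary sits exactly at the start of slot 2.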

This patch is essentially a combination of two patches, one written
by Eric Biederman and the other by Eric Dumazet. If they could add
their Signed-off-by: to this, I'd be much obliged.

Signed-off-by: William Irwin <wli@holomorphy.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Mark Lord <lkml@rtr.ca>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>


Index: sched/arch/i386/Kconfig
===================================================================
--- sched.orig/arch/i386/Kconfig	2007-05-01 04:35:47.065162310 -0700
+++ sched/arch/i386/Kconfig	2007-05-01 04:36:50.100754504 -0700
@@ -571,6 +571,9 @@
 		bool "3G/1G user/kernel split (for full 1G low memory)"
 	config VMSPLIT_2G
 		bool "2G/2G user/kernel split"
+	config VMSPLIT_2G_OPT
+		depends on !HIGHMEM
+		bool "2G/2G user/kernel split (for full 2G low memory)"
 	config VMSPLIT_1G
 		bool "1G/3G user/kernel split"
 endchoice
@@ -578,7 +581,8 @@
 config PAGE_OFFSET
 	hex
 	default 0xB0000000 if VMSPLIT_3G_OPT
-	default 0x78000000 if VMSPLIT_2G
+	default 0x80000000 if VMSPLIT_2G
+	default 0x78000000 if VMSPLIT_2G_OPT
 	default 0x40000000 if VMSPLIT_1G
 	default 0xC0000000
 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 17:00           ` Bill Irwin
@ 2007-05-01 17:17             ` Eric W. Biederman
  2007-05-01 20:41               ` Eric Dumazet
  2007-05-02  9:38             ` Andi Kleen
  1 sibling, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2007-05-01 17:17 UTC (permalink / raw)
  To: Bill Irwin; +Cc: Andi Kleen, patches, linux-kernel

Bill Irwin <bill.irwin@oracle.com> writes:

>
> Only 1GB-aligned kernel/user splits are now handled for PAE. The
> 2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
> mapping for physical memory by using an actual split of 1.875/2.125
> to accommodate 128MB of vmallocspace out of what would otherwise
> be a full 2GB for userspace. That attempt disturbs the alignment
> required by PAE for 2GB/2GB splits, and furthermore does not provide
> a 2GB/2GB split as advertised.
>
> This patch resolves the issues here in two manners. The first is
> by providing a true 2GB/2GB split in addition to the 1.875/2.125
> split. The second is by renaming the 1.875/2.125 split to
> CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
> performs a similar maneuver to avoid aliasing vmallocspace with
> the 1:1 mapping for physical memory around the 3GB boundary. With
> the 1.875/2.125 split properly-named, its config option is then
> tagged as depending on !HIGHMEM to express the PAE implementation's
> current inability to deal with such unaligned splits.
>
> This patch is essentially a combination of two patches, one written
> by Eric Biederman and the other by Eric Dumazet. If they could add
> their Signed-off-by: to this, I'd be much obliged.
>
> Signed-off-by: William Irwin <wli@holomorphy.com>
> Cc: Eric Dumazet <dada1@cosmosbay.com>
> Cc: Mark Lord <lkml@rtr.ca>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Andi Kleen <ak@suse.de>

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
>
> Index: sched/arch/i386/Kconfig
> ===================================================================
> --- sched.orig/arch/i386/Kconfig	2007-05-01 04:35:47.065162310 -0700
> +++ sched/arch/i386/Kconfig	2007-05-01 04:36:50.100754504 -0700
> @@ -571,6 +571,9 @@
>  		bool "3G/1G user/kernel split (for full 1G low memory)"
>  	config VMSPLIT_2G
>  		bool "2G/2G user/kernel split"
> +	config VMSPLIT_2G_OPT
> +		depends on !HIGHMEM
> +		bool "2G/2G user/kernel split (for full 2G low memory)"
>  	config VMSPLIT_1G
>  		bool "1G/3G user/kernel split"
>  endchoice
> @@ -578,7 +581,8 @@
>  config PAGE_OFFSET
>  	hex
>  	default 0xB0000000 if VMSPLIT_3G_OPT
> -	default 0x78000000 if VMSPLIT_2G
> +	default 0x80000000 if VMSPLIT_2G
> +	default 0x78000000 if VMSPLIT_2G_OPT
>  	default 0x40000000 if VMSPLIT_1G
>  	default 0xC0000000
>  

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 17:17             ` Eric W. Biederman
@ 2007-05-01 20:41               ` Eric Dumazet
  0 siblings, 0 replies; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01 20:41 UTC (permalink / raw)
  To: Bill Irwin; +Cc: Eric W. Biederman, Andi Kleen, patches, linux-kernel

Eric W. Biederman a écrit :
> Bill Irwin <bill.irwin@oracle.com> writes:
> 
>> Only 1GB-aligned kernel/user splits are now handled for PAE. The
>> 2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
>> mapping for physical memory by using an actual split of 1.875/2.125
>> to accommodate 128MB of vmallocspace out of what would otherwise
>> be a full 2GB for userspace. That attempt disturbs the alignment
>> required by PAE for 2GB/2GB splits, and furthermore does not provide
>> a 2GB/2GB split as advertised.
>>
>> This patch resolves the issues here in two manners. The first is
>> by providing a true 2GB/2GB split in addition to the 1.875/2.125
>> split. The second is by renaming the 1.875/2.125 split to
>> CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
>> performs a similar maneuver to avoid aliasing vmallocspace with
>> the 1:1 mapping for physical memory around the 3GB boundary. With
>> the 1.875/2.125 split properly-named, its config option is then
>> tagged as depending on !HIGHMEM to express the PAE implementation's
>> current inability to deal with such unaligned splits.
>>
>> This patch is essentially a combination of two patches, one written
>> by Eric Biederman and the other by Eric Dumazet. If they could add
>> their Signed-off-by: to this, I'd be much obliged.
>>
>> Signed-off-by: William Irwin <wli@holomorphy.com>
>> Cc: Eric Dumazet <dada1@cosmosbay.com>
>> Cc: Mark Lord <lkml@rtr.ca>
>> Cc: Eric W. Biederman <ebiederm@xmission.com>
>> Cc: Andi Kleen <ak@suse.de>
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Eric Dumazet <dada1@cosmosbay.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
  2007-05-01 17:00           ` Bill Irwin
  2007-05-01 17:17             ` Eric W. Biederman
@ 2007-05-02  9:38             ` Andi Kleen
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-02  9:38 UTC (permalink / raw)
  To: Bill Irwin; +Cc: Eric Dumazet, ebiederm, patches, linux-kernel

On Tuesday 01 May 2007 19:00:46 Bill Irwin wrote:
> Bill Irwin a écrit :
> >> as a stopgap measure, but I'm not all that interested in grabbing patch
> >> credits where others could do it easily enough. Either of the config
> >> alterations is fine by me as they now stand; maybe Eric Dumazet might
> >> care to do something like my suggestion at some point.
> 
> On Tue, May 01, 2007 at 05:51:44PM +0200, Eric Dumazet wrote:
> > Your patch is very fine, Bill; please resubmit it with a proper Signed-off-by.
> > My first patch was a trivial reaction to try to keep the 2G/2G split alive; 
> > yours is better for fine tuning.
> 
> I was hoping you would submit it as an update, but maybe adding your
> Signed-off-by: to my own will do.

Added, thanks.

-Andi


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems
  2007-05-01 15:45   ` Chuck Ebbert
@ 2007-05-02 10:46     ` Andi Kleen
  0 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-02 10:46 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: patches, linux-kernel

On Tuesday 01 May 2007 17:45:52 Chuck Ebbert wrote:
> Andi Kleen wrote:
> > vfat implements compat handlers for these ioctls, but when they
> > were executed on other file systems the kernel would still complain
> > about an unknown compat ioctl.  Just declare them as compatible
> > and let them be rejected when not needed by the normal path.
> > 
> > This makes wine runs a lot quieter
> 
> Does this also restore the original -ENOTTY return code?
> 
> The change that made it return -EINVAL broke a few Wine apps.

No, it's still -EINVAL as before. I just eliminated the printks.

I guess one could add an IGNORE_IOCTL_ENOTTY() or some such, though, if it's
really a problem.

-Andi



end of thread, other threads:[~2007-05-02 10:47 UTC | newest]

Thread overview: 52+ messages
2007-05-01  3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
2007-05-01  3:57 ` [PATCH] [1/30] x86_64: Dynamically adjust machine check interval Andi Kleen
2007-05-01  3:57 ` [PATCH] [2/30] x86_64: set node_possible_map at runtime - try 2 Andi Kleen
2007-05-01  3:58 ` [PATCH] [3/30] i386: Clean up NMI watchdog code Andi Kleen
2007-05-01  3:58 ` [PATCH] [4/30] x86_64: Use the 32bit wd_ops for 64bit too Andi Kleen
2007-05-01  3:58 ` [PATCH] [5/30] x86_64: Define IGNORE_IOCTL() macro for compat_ioctls Andi Kleen
2007-05-01  3:58 ` [PATCH] [6/30] x86_64: Shut up 32bit emulation for SIOCGIFCOUNT Andi Kleen
2007-05-01  3:58 ` [PATCH] [7/30] x86_64: Avoid overflows during apic timer calibration Andi Kleen
2007-05-01  3:58 ` [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu Andi Kleen
2007-05-01  5:57   ` Jeremy Fitzhardinge
2007-05-01  7:23     ` Andi Kleen
2007-05-01  8:00       ` Jeremy Fitzhardinge
2007-05-01  3:58 ` [PATCH] [9/30] x86_64: Use symbolic CPU features in early CPUID check Andi Kleen
2007-05-01  3:58 ` [PATCH] [10/30] x86_64: Drop -traditional for arch/x86_64/boot Andi Kleen
2007-05-01  3:58 ` [PATCH] [11/30] i386: Drop -traditional in arch/i386/boot Andi Kleen
2007-05-01  3:58 ` [PATCH] [12/30] i386: Verify important CPUID bits in real mode Andi Kleen
2007-05-01  3:58 ` [PATCH] [13/30] i386: Evaluate constant cpu features at runtime Andi Kleen
2007-05-01  3:58 ` [PATCH] [14/30] i386: Implement alternative_io for i386 Andi Kleen
2007-05-01  3:58 ` [PATCH] [15/30] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 Andi Kleen
2007-05-01  3:58 ` [PATCH] [16/30] i386: Add X86_FEATURE_RDTSCP Andi Kleen
2007-05-01  3:58 ` [PATCH] [17/30] x86: Use RDTSCP for synchronous get_cycles if possible Andi Kleen
2007-05-01  3:58 ` [PATCH] [18/30] x86_64: Don't enable NUMA for a single node in K8 NUMA scanning Andi Kleen
2007-05-01  3:58 ` [PATCH] [19/30] i386: Little cleanups in smpboot.c Andi Kleen
2007-05-01  3:58 ` [PATCH] [20/30] i386: Remove copy_*_user BUG_ONs for (size < 0) Andi Kleen
2007-05-01  3:58 ` [PATCH] [21/30] x86_64: Print type and size correctly for unknown compat ioctls Andi Kleen
2007-05-01  3:58 ` [PATCH] [22/30] x86_64: Remove CONFIG_REORDER Andi Kleen
2007-05-01  3:58 ` [PATCH] [23/30] x86_64: Share identical video.S between i386 and x86-64 Andi Kleen
2007-05-01  3:58 ` [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems Andi Kleen
2007-05-01 15:45   ` Chuck Ebbert
2007-05-02 10:46     ` Andi Kleen
2007-05-01  3:58 ` [PATCH] [25/30] x86_64: Fix allnoconfig error in genapic_flat.c Andi Kleen
2007-05-01  3:58 ` [PATCH] [26/30] i386: Drop noisy e820 debugging printks Andi Kleen
2007-05-01  3:58 ` [PATCH] [27/30] i386: white space fixes in i387.h Andi Kleen
2007-05-01  3:58 ` [PATCH] [28/30] i386: avoid redundant preempt_disable in __unlazy_fpu Andi Kleen
2007-05-01  3:58 ` [PATCH] [29/30] x86_64: Don't exclude asm-offsets.c in Documentation/dontdiff Andi Kleen
2007-05-01  3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split Andi Kleen
2007-05-01  4:26   ` Eric Dumazet
2007-05-01  6:21     ` Andi Kleen
2007-05-01 13:01       ` Bill Irwin
2007-05-01 13:49         ` Mark Lord
2007-05-01 15:51         ` Eric Dumazet
2007-05-01 17:00           ` Bill Irwin
2007-05-01 17:17             ` Eric W. Biederman
2007-05-01 20:41               ` Eric Dumazet
2007-05-02  9:38             ` Andi Kleen
2007-05-01  4:37   ` William Lee Irwin III
2007-05-01  4:57     ` Eric Dumazet
2007-05-01  5:11       ` William Lee Irwin III
2007-05-01  5:35     ` Eric W. Biederman
2007-05-01 13:32     ` Mark Lord
2007-05-01 14:17       ` William Lee Irwin III
2007-05-01 14:20         ` Mark Lord
