* [PATCH] [1/30] x86_64: Dynamically adjust machine check interval
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
@ 2007-05-01 3:57 ` Andi Kleen
2007-05-01 3:57 ` [PATCH] [2/30] x86_64: set node_possible_map at runtime - try 2 Andi Kleen
` (28 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:57 UTC (permalink / raw)
To: thockin, patches, linux-kernel
From: Tim Hockin <thockin@google.com>
Background:
We've found that MCEs (specifically DRAM SBEs) tend to come in bunches,
especially when we are trying really hard to stress the system out. The
current MCE poller uses a static interval which does not care whether it
has or has not found MCEs recently.
Description:
This patch makes the MCE poller adjust the polling interval dynamically.
If we find an MCE, poll 2x faster (down to 10 ms). When we stop finding
MCEs, poll 2x slower (up to check_interval seconds). The check_interval
tunable becomes the max polling interval. The "Machine check events
logged" printk() is rate limited to the check_interval, which should be
identical behavior to the old functionality.
Result:
If you start to take a lot of correctable errors (not exceptions), you
log them faster and more accurately (less chance of overflowing the MCA
registers). If you don't take a lot of errors, you will see no change.
Alternatives:
I considered simply reducing the polling interval to 10 ms immediately
and keeping it there as long as we continue to find errors. This felt a
bit heavy handed, but does perform significantly better for the default
check_interval of 5 minutes (we're using a few seconds when testing for
DRAM errors). I could be convinced to go with this, if anyone felt it
was not too aggressive.
Testing:
I used an error-injecting DIMM to create lots of correctable DRAM errors
and verified that the polling interval accelerates. The printk() only
happens once per check_interval seconds.
Patch:
This patch is against 2.6.21-rc7.
Signed-Off-By: Tim Hockin <thockin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
---
Documentation/x86_64/machinecheck | 7 ++++++-
arch/x86_64/kernel/mce.c | 32 ++++++++++++++++++++++++--------
2 files changed, 30 insertions(+), 9 deletions(-)
Index: linux/Documentation/x86_64/machinecheck
===================================================================
--- linux.orig/Documentation/x86_64/machinecheck
+++ linux/Documentation/x86_64/machinecheck
@@ -36,7 +36,12 @@ between all CPUs.
check_interval
How often to poll for corrected machine check errors, in seconds
- (Note output is hexademical). Default 5 minutes.
+ (Note output is hexademical). Default 5 minutes. When the poller
+ finds MCEs it triggers an exponential speedup (poll more often) on
+ the polling interval. When the poller stops finding MCEs, it
+ triggers an exponential backoff (poll less often) on the polling
+ interval. The check_interval variable is both the initial and
+ maximum polling interval.
tolerant
Tolerance level. When a machine check exception occurs for a non
Index: linux/arch/x86_64/kernel/mce.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce.c
+++ linux/arch/x86_64/kernel/mce.c
@@ -323,10 +323,13 @@ void mce_log_therm_throt_event(unsigned
#endif /* CONFIG_X86_MCE_INTEL */
/*
- * Periodic polling timer for "silent" machine check errors.
+ * Periodic polling timer for "silent" machine check errors. If the
+ * poller finds an MCE, poll 2x faster. When the poller finds no more
+ * errors, poll 2x slower (up to check_interval seconds).
*/
static int check_interval = 5 * 60; /* 5 minutes */
+static int next_interval; /* in jiffies */
static void mcheck_timer(struct work_struct *work);
static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);
@@ -339,7 +342,6 @@ static void mcheck_check_cpu(void *info)
static void mcheck_timer(struct work_struct *work)
{
on_each_cpu(mcheck_check_cpu, NULL, 1, 1);
- schedule_delayed_work(&mcheck_work, check_interval * HZ);
/*
* It's ok to read stale data here for notify_user and
@@ -349,17 +351,30 @@ static void mcheck_timer(struct work_str
* writes.
*/
if (notify_user && console_logged) {
+ static unsigned long last_print;
+ unsigned long now = jiffies;
+
+ /* if we logged an MCE, reduce the polling interval */
+ next_interval = max(next_interval/2, HZ/100);
notify_user = 0;
clear_bit(0, &console_logged);
- printk(KERN_INFO "Machine check events logged\n");
+ if (time_after_eq(now, last_print + (check_interval*HZ))) {
+ last_print = now;
+ printk(KERN_INFO "Machine check events logged\n");
+ }
+ } else {
+ next_interval = min(next_interval*2, check_interval*HZ);
}
+
+ schedule_delayed_work(&mcheck_work, next_interval);
}
static __init int periodic_mcheck_init(void)
{
- if (check_interval)
- schedule_delayed_work(&mcheck_work, check_interval*HZ);
+ next_interval = check_interval * HZ;
+ if (next_interval)
+ schedule_delayed_work(&mcheck_work, next_interval);
return 0;
}
__initcall(periodic_mcheck_init);
@@ -597,12 +612,13 @@ static int mce_resume(struct sys_device
/* Reinit MCEs after user configuration changes */
static void mce_restart(void)
{
- if (check_interval)
+ if (next_interval)
cancel_delayed_work(&mcheck_work);
/* Timer race is harmless here */
on_each_cpu(mce_init, NULL, 1, 1);
- if (check_interval)
- schedule_delayed_work(&mcheck_work, check_interval*HZ);
+ next_interval = check_interval * HZ;
+ if (next_interval)
+ schedule_delayed_work(&mcheck_work, next_interval);
}
static struct sysdev_class mce_sysclass = {
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [2/30] x86_64: set node_possible_map at runtime - try 2
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
2007-05-01 3:57 ` [PATCH] [1/30] x86_64: Dynamically adjust machine check interval Andi Kleen
@ 2007-05-01 3:57 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [3/30] i386: Clean up NMI watchdog code Andi Kleen
` (27 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:57 UTC (permalink / raw)
To: suresh.b.siddha, andi, dada1, rientjes, clameter, patches,
linux-kernel
From: Suresh Siddha <suresh.b.siddha@intel.com>
Set the node_possible_map at runtime on x86_64. On a non NUMA system,
num_possible_nodes() will now say '1'.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
---
---
arch/x86_64/mm/k8topology.c | 7 ++-----
arch/x86_64/mm/numa.c | 10 ++++++++--
arch/x86_64/mm/srat.c | 8 +++++---
3 files changed, 15 insertions(+), 10 deletions(-)
Index: linux/arch/x86_64/mm/k8topology.c
===================================================================
--- linux.orig/arch/x86_64/mm/k8topology.c
+++ linux/arch/x86_64/mm/k8topology.c
@@ -49,11 +49,8 @@ int __init k8_scan_nodes(unsigned long s
int found = 0;
u32 reg;
unsigned numnodes;
- nodemask_t nodes_parsed;
unsigned dualcore = 0;
- nodes_clear(nodes_parsed);
-
if (!early_pci_allowed())
return -1;
@@ -102,7 +99,7 @@ int __init k8_scan_nodes(unsigned long s
nodeid, (base>>8)&3, (limit>>8) & 3);
return -1;
}
- if (node_isset(nodeid, nodes_parsed)) {
+ if (node_isset(nodeid, node_possible_map)) {
printk(KERN_INFO "Node %d already present. Skipping\n",
nodeid);
continue;
@@ -155,7 +152,7 @@ int __init k8_scan_nodes(unsigned long s
prevbase = base;
- node_set(nodeid, nodes_parsed);
+ node_set(nodeid, node_possible_map);
}
if (!found)
Index: linux/arch/x86_64/mm/numa.c
===================================================================
--- linux.orig/arch/x86_64/mm/numa.c
+++ linux/arch/x86_64/mm/numa.c
@@ -295,7 +295,7 @@ static int __init setup_node_range(int n
ret = -1;
}
nodes[nid].end = *addr;
- node_set_online(nid);
+ node_set(nid, node_possible_map);
printk(KERN_INFO "Faking node %d at %016Lx-%016Lx (%LuMB)\n", nid,
nodes[nid].start, nodes[nid].end,
(nodes[nid].end - nodes[nid].start) >> 20);
@@ -479,7 +479,7 @@ out:
* SRAT.
*/
remove_all_active_ranges();
- for_each_online_node(i) {
+ for_each_node_mask(i, node_possible_map) {
e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
@@ -494,20 +494,25 @@ void __init numa_initmem_init(unsigned l
{
int i;
+ nodes_clear(node_possible_map);
+
#ifdef CONFIG_NUMA_EMU
if (cmdline && !numa_emulation(start_pfn, end_pfn))
return;
+ nodes_clear(node_possible_map);
#endif
#ifdef CONFIG_ACPI_NUMA
if (!numa_off && !acpi_scan_nodes(start_pfn << PAGE_SHIFT,
end_pfn << PAGE_SHIFT))
return;
+ nodes_clear(node_possible_map);
#endif
#ifdef CONFIG_K8_NUMA
if (!numa_off && !k8_scan_nodes(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT))
return;
+ nodes_clear(node_possible_map);
#endif
printk(KERN_INFO "%s\n",
numa_off ? "NUMA turned off" : "No NUMA configuration found");
@@ -521,6 +526,7 @@ void __init numa_initmem_init(unsigned l
memnodemap[0] = 0;
nodes_clear(node_online_map);
node_set_online(0);
+ node_set(0, node_possible_map);
for (i = 0; i < NR_CPUS; i++)
numa_set_node(i, 0);
node_to_cpumask[0] = cpumask_of_cpu(0);
Index: linux/arch/x86_64/mm/srat.c
===================================================================
--- linux.orig/arch/x86_64/mm/srat.c
+++ linux/arch/x86_64/mm/srat.c
@@ -419,19 +419,21 @@ int __init acpi_scan_nodes(unsigned long
return -1;
}
+ node_possible_map = nodes_parsed;
+
/* Finally register nodes */
- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, node_possible_map)
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
/* Try again in case setup_node_bootmem missed one due
to missing bootmem */
- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, node_possible_map)
if (!node_online(i))
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
for (i = 0; i < NR_CPUS; i++) {
if (cpu_to_node[i] == NUMA_NO_NODE)
continue;
- if (!node_isset(cpu_to_node[i], nodes_parsed))
+ if (!node_isset(cpu_to_node[i], node_possible_map))
numa_set_node(i, NUMA_NO_NODE);
}
numa_init_array();
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [3/30] i386: Clean up NMI watchdog code
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
2007-05-01 3:57 ` [PATCH] [1/30] x86_64: Dynamically adjust machine check interval Andi Kleen
2007-05-01 3:57 ` [PATCH] [2/30] x86_64: set node_possible_map at runtime - try 2 Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [4/30] x86_64: Use the 32bit wd_ops for 64bit too Andi Kleen
` (26 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
- Introduce a wd_ops structure
- Convert the various nmi watchdogs over to it
- This allows to split the perfctr reservation from the watchdog
setup cleanly.
- Do perfctr reservation globally as it should have always been
- Remove dead code referenced only by unused EXPORT_SYMBOLs
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/i386/kernel/cpu/Makefile | 2
arch/i386/kernel/cpu/perfctr-watchdog.c | 658 +++++++++++++++++++++++++
arch/i386/kernel/nmi.c | 829 ++------------------------------
include/asm-i386/nmi.h | 8
4 files changed, 721 insertions(+), 776 deletions(-)
Index: linux/arch/i386/kernel/nmi.c
===================================================================
--- linux.orig/arch/i386/kernel/nmi.c
+++ linux/arch/i386/kernel/nmi.c
@@ -20,7 +20,6 @@
#include <linux/sysdev.h>
#include <linux/sysctl.h>
#include <linux/percpu.h>
-#include <linux/dmi.h>
#include <linux/kprobes.h>
#include <linux/cpumask.h>
#include <linux/kernel_stat.h>
@@ -28,30 +27,14 @@
#include <asm/smp.h>
#include <asm/nmi.h>
#include <asm/kdebug.h>
-#include <asm/intel_arch_perfmon.h>
#include "mach_traps.h"
int unknown_nmi_panic;
int nmi_watchdog_enabled;
-/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
- * evtsel_nmi_owner tracks the ownership of the event selection
- * - different performance counters/ event selection may be reserved for
- * different subsystems this reservation system just tries to coordinate
- * things a little
- */
-
-/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
- * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
- */
-#define NMI_MAX_COUNTER_BITS 66
-#define NMI_MAX_COUNTER_LONGS BITS_TO_LONGS(NMI_MAX_COUNTER_BITS)
-
-static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-
static cpumask_t backtrace_mask = CPU_MASK_NONE;
+
/* nmi_active:
* >0: the lapic NMI watchdog is active, but can be disabled
* <0: the lapic NMI watchdog has not been set up, and cannot
@@ -63,203 +46,11 @@ atomic_t nmi_active = ATOMIC_INIT(0); /
unsigned int nmi_watchdog = NMI_DEFAULT;
static unsigned int nmi_hz = HZ;
-struct nmi_watchdog_ctlblk {
- int enabled;
- u64 check_bit;
- unsigned int cccr_msr;
- unsigned int perfctr_msr; /* the MSR to reset in NMI handler */
- unsigned int evntsel_msr; /* the MSR to select the events to handle */
-};
-static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
+static DEFINE_PER_CPU(short, wd_enabled);
/* local prototypes */
static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
-{
- /* returns the bit offset of the performance counter register */
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- return (msr - MSR_K7_PERFCTR0);
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
- return (msr - MSR_ARCH_PERFMON_PERFCTR0);
-
- switch (boot_cpu_data.x86) {
- case 6:
- return (msr - MSR_P6_PERFCTR0);
- case 15:
- return (msr - MSR_P4_BPU_PERFCTR0);
- }
- }
- return 0;
-}
-
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
-{
- /* returns the bit offset of the event selection register */
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- return (msr - MSR_K7_EVNTSEL0);
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
- return (msr - MSR_ARCH_PERFMON_EVENTSEL0);
-
- switch (boot_cpu_data.x86) {
- case 6:
- return (msr - MSR_P6_EVNTSEL0);
- case 15:
- return (msr - MSR_P4_BSU_ESCR0);
- }
- }
- return 0;
-}
-
-/* checks for a bit availability (hack for oprofile) */
-int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
-{
- int cpu;
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
- for_each_possible_cpu (cpu) {
- if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]))
- return 0;
- }
- return 1;
-}
-
-/* checks the an msr for availability */
-int avail_to_resrv_perfctr_nmi(unsigned int msr)
-{
- unsigned int counter;
- int cpu;
-
- counter = nmi_perfctr_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- for_each_possible_cpu (cpu) {
- if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]))
- return 0;
- }
- return 1;
-}
-
-static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_perfctr_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- if (!test_and_set_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]))
- return 1;
- return 0;
-}
-
-static void __release_perfctr_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_perfctr_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- clear_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)[0]);
-}
-
-int reserve_perfctr_nmi(unsigned int msr)
-{
- int cpu, i;
- for_each_possible_cpu (cpu) {
- if (!__reserve_perfctr_nmi(cpu, msr)) {
- for_each_possible_cpu (i) {
- if (i >= cpu)
- break;
- __release_perfctr_nmi(i, msr);
- }
- return 0;
- }
- }
- return 1;
-}
-
-void release_perfctr_nmi(unsigned int msr)
-{
- int cpu;
- for_each_possible_cpu (cpu) {
- __release_perfctr_nmi(cpu, msr);
- }
-}
-
-int __reserve_evntsel_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_evntsel_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- if (!test_and_set_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]))
- return 1;
- return 0;
-}
-
-static void __release_evntsel_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_evntsel_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- clear_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]);
-}
-
-int reserve_evntsel_nmi(unsigned int msr)
-{
- int cpu, i;
- for_each_possible_cpu (cpu) {
- if (!__reserve_evntsel_nmi(cpu, msr)) {
- for_each_possible_cpu (i) {
- if (i >= cpu)
- break;
- __release_evntsel_nmi(i, msr);
- }
- return 0;
- }
- }
- return 1;
-}
-
-void release_evntsel_nmi(unsigned int msr)
-{
- int cpu;
- for_each_possible_cpu (cpu) {
- __release_evntsel_nmi(cpu, msr);
- }
-}
-
-static __cpuinit inline int nmi_known_cpu(void)
-{
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6)
- || (boot_cpu_data.x86 == 16));
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
- return 1;
- else
- return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6));
- }
- return 0;
-}
-
static int endflag __initdata = 0;
#ifdef CONFIG_SMP
@@ -281,28 +72,6 @@ static __init void nmi_cpu_busy(void *da
}
#endif
-static unsigned int adjust_for_32bit_ctr(unsigned int hz)
-{
- u64 counter_val;
- unsigned int retval = hz;
-
- /*
- * On Intel CPUs with P6/ARCH_PERFMON only 32 bits in the counter
- * are writable, with higher bits sign extending from bit 31.
- * So, we can only program the counter with 31 bit values and
- * 32nd bit should be 1, for 33.. to be 1.
- * Find the appropriate nmi_hz
- */
- counter_val = (u64)cpu_khz * 1000;
- do_div(counter_val, retval);
- if (counter_val > 0x7fffffffULL) {
- u64 count = (u64)cpu_khz * 1000;
- do_div(count, 0x7fffffffUL);
- retval = count + 1;
- }
- return retval;
-}
-
static int __init check_nmi_watchdog(void)
{
unsigned int *prev_nmi_count;
@@ -335,14 +104,14 @@ static int __init check_nmi_watchdog(voi
if (!cpu_isset(cpu, cpu_callin_map))
continue;
#endif
- if (!per_cpu(nmi_watchdog_ctlblk, cpu).enabled)
+ if (!per_cpu(wd_enabled, cpu))
continue;
if (nmi_count(cpu) - prev_nmi_count[cpu] <= 5) {
printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
cpu,
prev_nmi_count[cpu],
nmi_count(cpu));
- per_cpu(nmi_watchdog_ctlblk, cpu).enabled = 0;
+ per_cpu(wd_enabled, cpu) = 0;
atomic_dec(&nmi_active);
}
}
@@ -356,16 +125,8 @@ static int __init check_nmi_watchdog(voi
/* now that we know it works we can reduce NMI frequency to
something more reasonable; makes a difference in some configs */
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- nmi_hz = 1;
-
- if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
- wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) {
- nmi_hz = adjust_for_32bit_ctr(nmi_hz);
- }
- }
+ if (nmi_watchdog == NMI_LOCAL_APIC)
+ nmi_hz = lapic_adjust_nmi_hz(1);
kfree(prev_nmi_count);
return 0;
@@ -388,85 +149,8 @@ static int __init setup_nmi_watchdog(cha
__setup("nmi_watchdog=", setup_nmi_watchdog);
-static void disable_lapic_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
- if (atomic_read(&nmi_active) <= 0)
- return;
-
- on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
- BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-static void enable_lapic_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
- /* are we already enabled */
- if (atomic_read(&nmi_active) != 0)
- return;
-
- /* are we lapic aware */
- if (nmi_known_cpu() <= 0)
- return;
-
- on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
- touch_nmi_watchdog();
-}
-void disable_timer_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
- if (atomic_read(&nmi_active) <= 0)
- return;
-
- disable_irq(0);
- on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
- BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-void enable_timer_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
- if (atomic_read(&nmi_active) == 0) {
- touch_nmi_watchdog();
- on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
- enable_irq(0);
- }
-}
-
-static void __acpi_nmi_disable(void *__unused)
-{
- apic_write_around(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
-}
-
-/*
- * Disable timer based NMIs on all CPUs:
- */
-void acpi_nmi_disable(void)
-{
- if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
- on_each_cpu(__acpi_nmi_disable, NULL, 0, 1);
-}
-
-static void __acpi_nmi_enable(void *__unused)
-{
- apic_write_around(APIC_LVT0, APIC_DM_NMI);
-}
-
-/*
- * Enable timer based NMIs on all CPUs:
- */
-void acpi_nmi_enable(void)
-{
- if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
- on_each_cpu(__acpi_nmi_enable, NULL, 0, 1);
-}
+/* Suspend/resume support */
#ifdef CONFIG_PM
@@ -513,7 +197,7 @@ static int __init init_lapic_nmi_sysfs(v
if (nmi_watchdog != NMI_LOCAL_APIC)
return 0;
- if ( atomic_read(&nmi_active) < 0 )
+ if (atomic_read(&nmi_active) < 0)
return 0;
error = sysdev_class_register(&nmi_sysclass);
@@ -526,433 +210,69 @@ late_initcall(init_lapic_nmi_sysfs);
#endif /* CONFIG_PM */
-/*
- * Activate the NMI watchdog via the local APIC.
- * Original code written by Keith Owens.
- */
-
-static void write_watchdog_counter(unsigned int perfctr_msr, const char *descr)
-{
- u64 count = (u64)cpu_khz * 1000;
-
- do_div(count, nmi_hz);
- if(descr)
- Dprintk("setting %s to -0x%08Lx\n", descr, count);
- wrmsrl(perfctr_msr, 0 - count);
-}
-
-static void write_watchdog_counter32(unsigned int perfctr_msr,
- const char *descr)
-{
- u64 count = (u64)cpu_khz * 1000;
-
- do_div(count, nmi_hz);
- if(descr)
- Dprintk("setting %s to -0x%08Lx\n", descr, count);
- wrmsr(perfctr_msr, (u32)(-count), 0);
-}
-
-/* Note that these events don't tick when the CPU idles. This means
- the frequency varies with CPU load. */
-
-#define K7_EVNTSEL_ENABLE (1 << 22)
-#define K7_EVNTSEL_INT (1 << 20)
-#define K7_EVNTSEL_OS (1 << 17)
-#define K7_EVNTSEL_USR (1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
-#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
-
-static int setup_k7_watchdog(void)
-{
- unsigned int perfctr_msr, evntsel_msr;
- unsigned int evntsel;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- perfctr_msr = MSR_K7_PERFCTR0;
- evntsel_msr = MSR_K7_EVNTSEL0;
- if (!__reserve_perfctr_nmi(-1, perfctr_msr))
- goto fail;
-
- if (!__reserve_evntsel_nmi(-1, evntsel_msr))
- goto fail1;
-
- wrmsrl(perfctr_msr, 0UL);
-
- evntsel = K7_EVNTSEL_INT
- | K7_EVNTSEL_OS
- | K7_EVNTSEL_USR
- | K7_NMI_EVENT;
-
- /* setup the timer */
- wrmsr(evntsel_msr, evntsel, 0);
- write_watchdog_counter(perfctr_msr, "K7_PERFCTR0");
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- evntsel |= K7_EVNTSEL_ENABLE;
- wrmsr(evntsel_msr, evntsel, 0);
-
- wd->perfctr_msr = perfctr_msr;
- wd->evntsel_msr = evntsel_msr;
- wd->cccr_msr = 0; //unused
- wd->check_bit = 1ULL<<63;
- return 1;
-fail1:
- __release_perfctr_nmi(-1, perfctr_msr);
-fail:
- return 0;
-}
-
-static void stop_k7_watchdog(void)
-{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- wrmsr(wd->evntsel_msr, 0, 0);
-
- __release_evntsel_nmi(-1, wd->evntsel_msr);
- __release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-#define P6_EVNTSEL0_ENABLE (1 << 22)
-#define P6_EVNTSEL_INT (1 << 20)
-#define P6_EVNTSEL_OS (1 << 17)
-#define P6_EVNTSEL_USR (1 << 16)
-#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79
-#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED
-
-static int setup_p6_watchdog(void)
-{
- unsigned int perfctr_msr, evntsel_msr;
- unsigned int evntsel;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- perfctr_msr = MSR_P6_PERFCTR0;
- evntsel_msr = MSR_P6_EVNTSEL0;
- if (!__reserve_perfctr_nmi(-1, perfctr_msr))
- goto fail;
-
- if (!__reserve_evntsel_nmi(-1, evntsel_msr))
- goto fail1;
-
- wrmsrl(perfctr_msr, 0UL);
-
- evntsel = P6_EVNTSEL_INT
- | P6_EVNTSEL_OS
- | P6_EVNTSEL_USR
- | P6_NMI_EVENT;
-
- /* setup the timer */
- wrmsr(evntsel_msr, evntsel, 0);
- nmi_hz = adjust_for_32bit_ctr(nmi_hz);
- write_watchdog_counter32(perfctr_msr, "P6_PERFCTR0");
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- evntsel |= P6_EVNTSEL0_ENABLE;
- wrmsr(evntsel_msr, evntsel, 0);
-
- wd->perfctr_msr = perfctr_msr;
- wd->evntsel_msr = evntsel_msr;
- wd->cccr_msr = 0; //unused
- wd->check_bit = 1ULL<<39;
- return 1;
-fail1:
- __release_perfctr_nmi(-1, perfctr_msr);
-fail:
- return 0;
-}
-
-static void stop_p6_watchdog(void)
-{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- wrmsr(wd->evntsel_msr, 0, 0);
-
- __release_evntsel_nmi(-1, wd->evntsel_msr);
- __release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-/* Note that these events don't tick when the CPU idles. This means
- the frequency varies with CPU load. */
-
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7)
-#define P4_ESCR_EVENT_SELECT(N) ((N)<<25)
-#define P4_ESCR_OS (1<<3)
-#define P4_ESCR_USR (1<<2)
-#define P4_CCCR_OVF_PMI0 (1<<26)
-#define P4_CCCR_OVF_PMI1 (1<<27)
-#define P4_CCCR_THRESHOLD(N) ((N)<<20)
-#define P4_CCCR_COMPLEMENT (1<<19)
-#define P4_CCCR_COMPARE (1<<18)
-#define P4_CCCR_REQUIRED (3<<16)
-#define P4_CCCR_ESCR_SELECT(N) ((N)<<13)
-#define P4_CCCR_ENABLE (1<<12)
-#define P4_CCCR_OVF (1<<31)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
- CRU_ESCR0 (with any non-null event selector) through a complemented
- max threshold. [IA32-Vol3, Section 14.9.9] */
-
-static int setup_p4_watchdog(void)
+static void __acpi_nmi_enable(void *__unused)
{
- unsigned int perfctr_msr, evntsel_msr, cccr_msr;
- unsigned int evntsel, cccr_val;
- unsigned int misc_enable, dummy;
- unsigned int ht_num;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
- if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
- return 0;
-
-#ifdef CONFIG_SMP
- /* detect which hyperthread we are on */
- if (smp_num_siblings == 2) {
- unsigned int ebx, apicid;
-
- ebx = cpuid_ebx(1);
- apicid = (ebx >> 24) & 0xff;
- ht_num = apicid & 1;
- } else
-#endif
- ht_num = 0;
-
- /* performance counters are shared resources
- * assign each hyperthread its own set
- * (re-use the ESCR0 register, seems safe
- * and keeps the cccr_val the same)
- */
- if (!ht_num) {
- /* logical cpu 0 */
- perfctr_msr = MSR_P4_IQ_PERFCTR0;
- evntsel_msr = MSR_P4_CRU_ESCR0;
- cccr_msr = MSR_P4_IQ_CCCR0;
- cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
- } else {
- /* logical cpu 1 */
- perfctr_msr = MSR_P4_IQ_PERFCTR1;
- evntsel_msr = MSR_P4_CRU_ESCR0;
- cccr_msr = MSR_P4_IQ_CCCR1;
- cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
- }
-
- if (!__reserve_perfctr_nmi(-1, perfctr_msr))
- goto fail;
-
- if (!__reserve_evntsel_nmi(-1, evntsel_msr))
- goto fail1;
-
- evntsel = P4_ESCR_EVENT_SELECT(0x3F)
- | P4_ESCR_OS
- | P4_ESCR_USR;
-
- cccr_val |= P4_CCCR_THRESHOLD(15)
- | P4_CCCR_COMPLEMENT
- | P4_CCCR_COMPARE
- | P4_CCCR_REQUIRED;
-
- wrmsr(evntsel_msr, evntsel, 0);
- wrmsr(cccr_msr, cccr_val, 0);
- write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0");
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- cccr_val |= P4_CCCR_ENABLE;
- wrmsr(cccr_msr, cccr_val, 0);
- wd->perfctr_msr = perfctr_msr;
- wd->evntsel_msr = evntsel_msr;
- wd->cccr_msr = cccr_msr;
- wd->check_bit = 1ULL<<39;
- return 1;
-fail1:
- __release_perfctr_nmi(-1, perfctr_msr);
-fail:
- return 0;
+ apic_write_around(APIC_LVT0, APIC_DM_NMI);
}
-static void stop_p4_watchdog(void)
+/*
+ * Enable timer based NMIs on all CPUs:
+ */
+void acpi_nmi_enable(void)
{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- wrmsr(wd->cccr_msr, 0, 0);
- wrmsr(wd->evntsel_msr, 0, 0);
-
- __release_evntsel_nmi(-1, wd->evntsel_msr);
- __release_perfctr_nmi(-1, wd->perfctr_msr);
+ if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
+ on_each_cpu(__acpi_nmi_enable, NULL, 0, 1);
}
-#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
-#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
-
-static int setup_intel_arch_watchdog(void)
+static void __acpi_nmi_disable(void *__unused)
{
- unsigned int ebx;
- union cpuid10_eax eax;
- unsigned int unused;
- unsigned int perfctr_msr, evntsel_msr;
- unsigned int evntsel;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- /*
- * Check whether the Architectural PerfMon supports
- * Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebx indicates event present.
- */
- cpuid(10, &(eax.full), &ebx, &unused, &unused);
- if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
- (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- goto fail;
-
- perfctr_msr = MSR_ARCH_PERFMON_PERFCTR1;
- evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL1;
-
- if (!__reserve_perfctr_nmi(-1, perfctr_msr))
- goto fail;
-
- if (!__reserve_evntsel_nmi(-1, evntsel_msr))
- goto fail1;
-
- wrmsrl(perfctr_msr, 0UL);
-
- evntsel = ARCH_PERFMON_EVENTSEL_INT
- | ARCH_PERFMON_EVENTSEL_OS
- | ARCH_PERFMON_EVENTSEL_USR
- | ARCH_PERFMON_NMI_EVENT_SEL
- | ARCH_PERFMON_NMI_EVENT_UMASK;
-
- /* setup the timer */
- wrmsr(evntsel_msr, evntsel, 0);
- nmi_hz = adjust_for_32bit_ctr(nmi_hz);
- write_watchdog_counter32(perfctr_msr, "INTEL_ARCH_PERFCTR0");
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
- wrmsr(evntsel_msr, evntsel, 0);
-
- wd->perfctr_msr = perfctr_msr;
- wd->evntsel_msr = evntsel_msr;
- wd->cccr_msr = 0; //unused
- wd->check_bit = 1ULL << (eax.split.bit_width - 1);
- return 1;
-fail1:
- __release_perfctr_nmi(-1, perfctr_msr);
-fail:
- return 0;
+ apic_write(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
}
-static void stop_intel_arch_watchdog(void)
+/*
+ * Disable timer based NMIs on all CPUs:
+ */
+void acpi_nmi_disable(void)
{
- unsigned int ebx;
- union cpuid10_eax eax;
- unsigned int unused;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- /*
- * Check whether the Architectural PerfMon supports
- * Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebx indicates event present.
- */
- cpuid(10, &(eax.full), &ebx, &unused, &unused);
- if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
- (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- return;
-
- wrmsr(wd->evntsel_msr, 0, 0);
- __release_evntsel_nmi(-1, wd->evntsel_msr);
- __release_perfctr_nmi(-1, wd->perfctr_msr);
+ if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
+ on_each_cpu(__acpi_nmi_disable, NULL, 0, 1);
}
void setup_apic_nmi_watchdog (void *unused)
{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- /* only support LOCAL and IO APICs for now */
- if ((nmi_watchdog != NMI_LOCAL_APIC) &&
- (nmi_watchdog != NMI_IO_APIC))
- return;
-
- if (wd->enabled == 1)
- return;
+ if (__get_cpu_var(wd_enabled))
+ return;
/* cheap hack to support suspend/resume */
/* if cpu0 is not active neither should the other cpus */
if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
return;
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15 &&
- boot_cpu_data.x86 != 16)
- return;
- if (!setup_k7_watchdog())
- return;
- break;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- if (!setup_intel_arch_watchdog())
- return;
- break;
- }
- switch (boot_cpu_data.x86) {
- case 6:
- if (boot_cpu_data.x86_model > 0xd)
- return;
-
- if (!setup_p6_watchdog())
- return;
- break;
- case 15:
- if (boot_cpu_data.x86_model > 0x4)
- return;
-
- if (!setup_p4_watchdog())
- return;
- break;
- default:
- return;
- }
- break;
- default:
+ switch (nmi_watchdog) {
+ case NMI_LOCAL_APIC:
+ __get_cpu_var(wd_enabled) = 1; /* enable it before to avoid race with handler */
+ if (lapic_watchdog_init(nmi_hz) < 0) {
+ __get_cpu_var(wd_enabled) = 0;
return;
}
+ /* FALL THROUGH */
+ case NMI_IO_APIC:
+ __get_cpu_var(wd_enabled) = 1;
+ atomic_inc(&nmi_active);
}
- wd->enabled = 1;
- atomic_inc(&nmi_active);
}
void stop_apic_nmi_watchdog(void *unused)
{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
/* only support LOCAL and IO APICs for now */
if ((nmi_watchdog != NMI_LOCAL_APIC) &&
(nmi_watchdog != NMI_IO_APIC))
return;
-
- if (wd->enabled == 0)
+ if (__get_cpu_var(wd_enabled) == 0)
return;
-
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- stop_k7_watchdog();
- break;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- stop_intel_arch_watchdog();
- break;
- }
- switch (boot_cpu_data.x86) {
- case 6:
- if (boot_cpu_data.x86_model > 0xd)
- break;
- stop_p6_watchdog();
- break;
- case 15:
- if (boot_cpu_data.x86_model > 0x4)
- break;
- stop_p4_watchdog();
- break;
- }
- break;
- default:
- return;
- }
- }
- wd->enabled = 0;
+ if (nmi_watchdog == NMI_LOCAL_APIC)
+ lapic_watchdog_stop();
+ __get_cpu_var(wd_enabled) = 0;
atomic_dec(&nmi_active);
}
@@ -1008,8 +328,6 @@ __kprobes int nmi_watchdog_tick(struct p
unsigned int sum;
int touched = 0;
int cpu = smp_processor_id();
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- u64 dummy;
int rc=0;
/* check for other users first */
@@ -1052,53 +370,20 @@ __kprobes int nmi_watchdog_tick(struct p
alert_counter[cpu] = 0;
}
/* see if the nmi watchdog went off */
- if (wd->enabled) {
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- rdmsrl(wd->perfctr_msr, dummy);
- if (dummy & wd->check_bit){
- /* this wasn't a watchdog timer interrupt */
- goto done;
- }
-
- /* only Intel P4 uses the cccr msr */
- if (wd->cccr_msr != 0) {
- /*
- * P4 quirks:
- * - An overflown perfctr will assert its interrupt
- * until the OVF flag in its CCCR is cleared.
- * - LVTPC is masked on interrupt and must be
- * unmasked by the LVTPC handler.
- */
- rdmsrl(wd->cccr_msr, dummy);
- dummy &= ~P4_CCCR_OVF;
- wrmsrl(wd->cccr_msr, dummy);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- /* start the cycle over again */
- write_watchdog_counter(wd->perfctr_msr, NULL);
- }
- else if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
- wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) {
- /* P6 based Pentium M need to re-unmask
- * the apic vector but it doesn't hurt
- * other P6 variant.
- * ArchPerfom/Core Duo also needs this */
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- /* P6/ARCH_PERFMON has 32 bit counter write */
- write_watchdog_counter32(wd->perfctr_msr, NULL);
- } else {
- /* start the cycle over again */
- write_watchdog_counter(wd->perfctr_msr, NULL);
- }
- rc = 1;
- } else if (nmi_watchdog == NMI_IO_APIC) {
- /* don't know how to accurately check for this.
- * just assume it was a watchdog timer interrupt
- * This matches the old behaviour.
- */
- rc = 1;
- }
+ if (!__get_cpu_var(wd_enabled))
+ return rc;
+ switch (nmi_watchdog) {
+ case NMI_LOCAL_APIC:
+ rc |= lapic_wd_event(nmi_hz);
+ break;
+ case NMI_IO_APIC:
+ /* don't know how to accurately check for this.
+ * just assume it was a watchdog timer interrupt
+ * This matches the old behaviour.
+ */
+ rc = 1;
+ break;
}
-done:
return rc;
}
@@ -1143,7 +428,7 @@ int proc_nmi_enabled(struct ctl_table *t
}
if (nmi_watchdog == NMI_DEFAULT) {
- if (nmi_known_cpu() > 0)
+ if (lapic_watchdog_ok())
nmi_watchdog = NMI_LOCAL_APIC;
else
nmi_watchdog = NMI_IO_APIC;
@@ -1179,11 +464,3 @@ void __trigger_all_cpu_backtrace(void)
EXPORT_SYMBOL(nmi_active);
EXPORT_SYMBOL(nmi_watchdog);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
-EXPORT_SYMBOL(reserve_perfctr_nmi);
-EXPORT_SYMBOL(release_perfctr_nmi);
-EXPORT_SYMBOL(reserve_evntsel_nmi);
-EXPORT_SYMBOL(release_evntsel_nmi);
-EXPORT_SYMBOL(disable_timer_nmi_watchdog);
-EXPORT_SYMBOL(enable_timer_nmi_watchdog);
Index: linux/arch/i386/kernel/cpu/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/cpu/Makefile
+++ linux/arch/i386/kernel/cpu/Makefile
@@ -17,3 +17,5 @@ obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/
obj-$(CONFIG_CPU_FREQ) += cpufreq/
+
+obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o
Index: linux/arch/i386/kernel/cpu/perfctr-watchdog.c
===================================================================
--- /dev/null
+++ linux/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -0,0 +1,658 @@
+/* local apic based NMI watchdog for various CPUs.
+ This file also handles reservation of performance counters for coordination
+ with other users (like oprofile).
+
+ Note that these events normally don't tick when the CPU idles. This means
+ the frequency varies with CPU load.
+
+ Original code for K7/P6 written by Keith Owens */
+
+#include <linux/percpu.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+#include <linux/smp.h>
+#include <linux/nmi.h>
+#include <asm/apic.h>
+#include <asm/intel_arch_perfmon.h>
+
+struct nmi_watchdog_ctlblk {
+ unsigned int cccr_msr;
+ unsigned int perfctr_msr; /* the MSR to reset in NMI handler */
+ unsigned int evntsel_msr; /* the MSR to select the events to handle */
+};
+
+/* Interface defining a CPU specific perfctr watchdog */
+struct wd_ops {
+ int (*reserve)(void);
+ void (*unreserve)(void);
+ int (*setup)(unsigned nmi_hz);
+ void (*rearm)(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz);
+ void (*stop)(void *);
+ unsigned perfctr;
+ unsigned evntsel;
+ u64 checkbit;
+};
+
+static struct wd_ops *wd_ops;
+
+/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
+ * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
+ */
+#define NMI_MAX_COUNTER_BITS 66
+
+/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
+ * evtsel_nmi_owner tracks the ownership of the event selection
+ * - different performance counters/ event selection may be reserved for
+ * different subsystems this reservation system just tries to coordinate
+ * things a little
+ */
+static DECLARE_BITMAP(perfctr_nmi_owner, NMI_MAX_COUNTER_BITS);
+static DECLARE_BITMAP(evntsel_nmi_owner, NMI_MAX_COUNTER_BITS);
+
+static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
+
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
+{
+ return wd_ops ? msr - wd_ops->perfctr : 0;
+}
+
+/* converts an msr to an appropriate reservation bit */
+/* returns the bit offset of the event selection register */
+static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
+{
+ return wd_ops ? msr - wd_ops->evntsel : 0;
+}
+
+/* checks for a bit availability (hack for oprofile) */
+int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
+{
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ return (!test_bit(counter, perfctr_nmi_owner));
+}
+
+/* checks the an msr for availability */
+int avail_to_resrv_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ return (!test_bit(counter, perfctr_nmi_owner));
+}
+
+int reserve_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ if (!test_and_set_bit(counter, perfctr_nmi_owner))
+ return 1;
+ return 0;
+}
+
+void release_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ clear_bit(counter, perfctr_nmi_owner);
+}
+
+int reserve_evntsel_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_evntsel_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ if (!test_and_set_bit(counter, evntsel_nmi_owner))
+ return 1;
+ return 0;
+}
+
+void release_evntsel_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_evntsel_msr_to_bit(msr);
+ BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+ clear_bit(counter, evntsel_nmi_owner);
+}
+
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
+EXPORT_SYMBOL(reserve_perfctr_nmi);
+EXPORT_SYMBOL(release_perfctr_nmi);
+EXPORT_SYMBOL(reserve_evntsel_nmi);
+EXPORT_SYMBOL(release_evntsel_nmi);
+
+void disable_lapic_nmi_watchdog(void)
+{
+ BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+ if (atomic_read(&nmi_active) <= 0)
+ return;
+
+ on_each_cpu(wd_ops->stop, NULL, 0, 1);
+ wd_ops->unreserve();
+
+ BUG_ON(atomic_read(&nmi_active) != 0);
+}
+
+void enable_lapic_nmi_watchdog(void)
+{
+ BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+ /* are we already enabled */
+ if (atomic_read(&nmi_active) != 0)
+ return;
+
+ /* are we lapic aware */
+ if (!wd_ops)
+ return;
+ if (!wd_ops->reserve()) {
+ printk(KERN_ERR "NMI watchdog: cannot reserve perfctrs\n");
+ return;
+ }
+
+ on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+ touch_nmi_watchdog();
+}
+
+/*
+ * Activate the NMI watchdog via the local APIC.
+ */
+
+static unsigned int adjust_for_32bit_ctr(unsigned int hz)
+{
+ u64 counter_val;
+ unsigned int retval = hz;
+
+ /*
+ * On Intel CPUs with P6/ARCH_PERFMON only 32 bits in the counter
+ * are writable, with higher bits sign extending from bit 31.
+ * So, we can only program the counter with 31 bit values and
+ * 32nd bit should be 1, for 33.. to be 1.
+ * Find the appropriate nmi_hz
+ */
+ counter_val = (u64)cpu_khz * 1000;
+ do_div(counter_val, retval);
+ if (counter_val > 0x7fffffffULL) {
+ u64 count = (u64)cpu_khz * 1000;
+ do_div(count, 0x7fffffffUL);
+ retval = count + 1;
+ }
+ return retval;
+}
+
+static void
+write_watchdog_counter(unsigned int perfctr_msr, const char *descr, unsigned nmi_hz)
+{
+ u64 count = (u64)cpu_khz * 1000;
+
+ do_div(count, nmi_hz);
+ if(descr)
+ Dprintk("setting %s to -0x%08Lx\n", descr, count);
+ wrmsrl(perfctr_msr, 0 - count);
+}
+
+static void write_watchdog_counter32(unsigned int perfctr_msr,
+ const char *descr, unsigned nmi_hz)
+{
+ u64 count = (u64)cpu_khz * 1000;
+
+ do_div(count, nmi_hz);
+ if(descr)
+ Dprintk("setting %s to -0x%08Lx\n", descr, count);
+ wrmsr(perfctr_msr, (u32)(-count), 0);
+}
+
+/* AMD K7/K8/Family10h/Family11h support. AMD keeps this interface
+ nicely stable so there is not much variety */
+
+#define K7_EVNTSEL_ENABLE (1 << 22)
+#define K7_EVNTSEL_INT (1 << 20)
+#define K7_EVNTSEL_OS (1 << 17)
+#define K7_EVNTSEL_USR (1 << 16)
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
+#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+
+static int setup_k7_watchdog(unsigned nmi_hz)
+{
+ unsigned int perfctr_msr, evntsel_msr;
+ unsigned int evntsel;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ perfctr_msr = MSR_K7_PERFCTR0;
+ evntsel_msr = MSR_K7_EVNTSEL0;
+
+ wrmsrl(perfctr_msr, 0UL);
+
+ evntsel = K7_EVNTSEL_INT
+ | K7_EVNTSEL_OS
+ | K7_EVNTSEL_USR
+ | K7_NMI_EVENT;
+
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ write_watchdog_counter(perfctr_msr, "K7_PERFCTR0",nmi_hz);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ evntsel |= K7_EVNTSEL_ENABLE;
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ return 1;
+}
+
+static void single_msr_stop_watchdog(void *arg)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ wrmsr(wd->evntsel_msr, 0, 0);
+}
+
+static int single_msr_reserve(void)
+{
+ if (!reserve_perfctr_nmi(wd_ops->perfctr))
+ return 0;
+
+ if (!reserve_evntsel_nmi(wd_ops->evntsel)) {
+ release_perfctr_nmi(wd_ops->perfctr);
+ return 0;
+ }
+ return 1;
+}
+
+static void single_msr_unreserve(void)
+{
+ release_evntsel_nmi(wd_ops->perfctr);
+ release_perfctr_nmi(wd_ops->evntsel);
+}
+
+static void single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+{
+ /* start the cycle over again */
+ write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
+}
+
+static struct wd_ops k7_wd_ops = {
+ .reserve = single_msr_reserve,
+ .unreserve = single_msr_unreserve,
+ .setup = setup_k7_watchdog,
+ .rearm = single_msr_rearm,
+ .stop = single_msr_stop_watchdog,
+ .perfctr = MSR_K7_PERFCTR0,
+ .evntsel = MSR_K7_EVNTSEL0,
+ .checkbit = 1ULL<<63,
+};
+
+/* Intel Model 6 (PPro+,P2,P3,P-M,Core1) */
+
+#define P6_EVNTSEL0_ENABLE (1 << 22)
+#define P6_EVNTSEL_INT (1 << 20)
+#define P6_EVNTSEL_OS (1 << 17)
+#define P6_EVNTSEL_USR (1 << 16)
+#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79
+#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED
+
+static int setup_p6_watchdog(unsigned nmi_hz)
+{
+ unsigned int perfctr_msr, evntsel_msr;
+ unsigned int evntsel;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ perfctr_msr = MSR_P6_PERFCTR0;
+ evntsel_msr = MSR_P6_EVNTSEL0;
+
+ wrmsrl(perfctr_msr, 0UL);
+
+ evntsel = P6_EVNTSEL_INT
+ | P6_EVNTSEL_OS
+ | P6_EVNTSEL_USR
+ | P6_NMI_EVENT;
+
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ nmi_hz = adjust_for_32bit_ctr(nmi_hz);
+ write_watchdog_counter32(perfctr_msr, "P6_PERFCTR0",nmi_hz);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ evntsel |= P6_EVNTSEL0_ENABLE;
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ return 1;
+}
+
+static void p6_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+{
+ /* P6 based Pentium M need to re-unmask
+ * the apic vector but it doesn't hurt
+ * other P6 variant.
+ * ArchPerfom/Core Duo also needs this */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ /* P6/ARCH_PERFMON has 32 bit counter write */
+ write_watchdog_counter32(wd->perfctr_msr, NULL,nmi_hz);
+}
+
+static struct wd_ops p6_wd_ops = {
+ .reserve = single_msr_reserve,
+ .unreserve = single_msr_unreserve,
+ .setup = setup_p6_watchdog,
+ .rearm = p6_rearm,
+ .stop = single_msr_stop_watchdog,
+ .perfctr = MSR_P6_PERFCTR0,
+ .evntsel = MSR_P6_EVNTSEL0,
+ .checkbit = 1ULL<<39,
+};
+
+/* Intel P4 performance counters. By far the most complicated of all. */
+
+#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7)
+#define P4_ESCR_EVENT_SELECT(N) ((N)<<25)
+#define P4_ESCR_OS (1<<3)
+#define P4_ESCR_USR (1<<2)
+#define P4_CCCR_OVF_PMI0 (1<<26)
+#define P4_CCCR_OVF_PMI1 (1<<27)
+#define P4_CCCR_THRESHOLD(N) ((N)<<20)
+#define P4_CCCR_COMPLEMENT (1<<19)
+#define P4_CCCR_COMPARE (1<<18)
+#define P4_CCCR_REQUIRED (3<<16)
+#define P4_CCCR_ESCR_SELECT(N) ((N)<<13)
+#define P4_CCCR_ENABLE (1<<12)
+#define P4_CCCR_OVF (1<<31)
+
+/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
+ CRU_ESCR0 (with any non-null event selector) through a complemented
+ max threshold. [IA32-Vol3, Section 14.9.9] */
+
+static int setup_p4_watchdog(unsigned nmi_hz)
+{
+ unsigned int perfctr_msr, evntsel_msr, cccr_msr;
+ unsigned int evntsel, cccr_val;
+ unsigned int misc_enable, dummy;
+ unsigned int ht_num;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
+ if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
+ return 0;
+
+#ifdef CONFIG_SMP
+ /* detect which hyperthread we are on */
+ if (smp_num_siblings == 2) {
+ unsigned int ebx, apicid;
+
+ ebx = cpuid_ebx(1);
+ apicid = (ebx >> 24) & 0xff;
+ ht_num = apicid & 1;
+ } else
+#endif
+ ht_num = 0;
+
+ /* performance counters are shared resources
+ * assign each hyperthread its own set
+ * (re-use the ESCR0 register, seems safe
+ * and keeps the cccr_val the same)
+ */
+ if (!ht_num) {
+ /* logical cpu 0 */
+ perfctr_msr = MSR_P4_IQ_PERFCTR0;
+ evntsel_msr = MSR_P4_CRU_ESCR0;
+ cccr_msr = MSR_P4_IQ_CCCR0;
+ cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+ } else {
+ /* logical cpu 1 */
+ perfctr_msr = MSR_P4_IQ_PERFCTR1;
+ evntsel_msr = MSR_P4_CRU_ESCR0;
+ cccr_msr = MSR_P4_IQ_CCCR1;
+ cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
+ }
+
+ evntsel = P4_ESCR_EVENT_SELECT(0x3F)
+ | P4_ESCR_OS
+ | P4_ESCR_USR;
+
+ cccr_val |= P4_CCCR_THRESHOLD(15)
+ | P4_CCCR_COMPLEMENT
+ | P4_CCCR_COMPARE
+ | P4_CCCR_REQUIRED;
+
+ wrmsr(evntsel_msr, evntsel, 0);
+ wrmsr(cccr_msr, cccr_val, 0);
+ write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0", nmi_hz);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ cccr_val |= P4_CCCR_ENABLE;
+ wrmsr(cccr_msr, cccr_val, 0);
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = cccr_msr;
+ return 1;
+}
+
+static void stop_p4_watchdog(void *arg)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+ wrmsr(wd->cccr_msr, 0, 0);
+ wrmsr(wd->evntsel_msr, 0, 0);
+}
+
+static int p4_reserve(void)
+{
+ if (!reserve_perfctr_nmi(MSR_P4_IQ_PERFCTR0))
+ return 0;
+#ifdef CONFIG_SMP
+ if (smp_num_siblings > 1 && !reserve_perfctr_nmi(MSR_P4_IQ_PERFCTR1))
+ goto fail1;
+#endif
+ if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0))
+ goto fail2;
+ /* RED-PEN why is ESCR1 not reserved here? */
+ return 1;
+ fail2:
+#ifdef CONFIG_SMP
+ if (smp_num_siblings > 1)
+ release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
+ fail1:
+#endif
+ release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
+ return 0;
+}
+
+static void p4_unreserve(void)
+{
+#ifdef CONFIG_SMP
+ if (smp_num_siblings > 1)
+ release_evntsel_nmi(MSR_P4_IQ_PERFCTR1);
+#endif
+ release_evntsel_nmi(MSR_P4_IQ_PERFCTR0);
+ release_perfctr_nmi(MSR_P4_CRU_ESCR0);
+}
+
+static void p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
+{
+ unsigned dummy;
+ /*
+ * P4 quirks:
+ * - An overflown perfctr will assert its interrupt
+ * until the OVF flag in its CCCR is cleared.
+ * - LVTPC is masked on interrupt and must be
+ * unmasked by the LVTPC handler.
+ */
+ rdmsrl(wd->cccr_msr, dummy);
+ dummy &= ~P4_CCCR_OVF;
+ wrmsrl(wd->cccr_msr, dummy);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ /* start the cycle over again */
+ write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
+}
+
+static struct wd_ops p4_wd_ops = {
+ .reserve = p4_reserve,
+ .unreserve = p4_unreserve,
+ .setup = setup_p4_watchdog,
+ .rearm = p4_rearm,
+ .stop = stop_p4_watchdog,
+ /* RED-PEN this is wrong for the other sibling */
+ .perfctr = MSR_P4_BPU_PERFCTR0,
+ .evntsel = MSR_P4_BSU_ESCR0,
+ .checkbit = 1ULL<<39,
+};
+
+/* Watchdog using the Intel architected PerfMon. Used for Core2 and hopefully
+ all future Intel CPUs. */
+
+#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
+#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
+
+static int setup_intel_arch_watchdog(unsigned nmi_hz)
+{
+ unsigned int ebx;
+ union cpuid10_eax eax;
+ unsigned int unused;
+ unsigned int perfctr_msr, evntsel_msr;
+ unsigned int evntsel;
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+ /*
+ * Check whether the Architectural PerfMon supports
+ * Unhalted Core Cycles Event or not.
+ * NOTE: Corresponding bit = 0 in ebx indicates event present.
+ */
+ cpuid(10, &(eax.full), &ebx, &unused, &unused);
+ if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
+ (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
+ return 0;
+
+ perfctr_msr = MSR_ARCH_PERFMON_PERFCTR1;
+ evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL1;
+
+ wrmsrl(perfctr_msr, 0UL);
+
+ evntsel = ARCH_PERFMON_EVENTSEL_INT
+ | ARCH_PERFMON_EVENTSEL_OS
+ | ARCH_PERFMON_EVENTSEL_USR
+ | ARCH_PERFMON_NMI_EVENT_SEL
+ | ARCH_PERFMON_NMI_EVENT_UMASK;
+
+ /* setup the timer */
+ wrmsr(evntsel_msr, evntsel, 0);
+ nmi_hz = adjust_for_32bit_ctr(nmi_hz);
+ write_watchdog_counter32(perfctr_msr, "INTEL_ARCH_PERFCTR0", nmi_hz);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
+ wrmsr(evntsel_msr, evntsel, 0);
+
+ wd->perfctr_msr = perfctr_msr;
+ wd->evntsel_msr = evntsel_msr;
+ wd->cccr_msr = 0; //unused
+ wd_ops->checkbit = 1ULL << (eax.split.bit_width - 1);
+ return 1;
+}
+
+static struct wd_ops intel_arch_wd_ops = {
+ .reserve = single_msr_reserve,
+ .unreserve = single_msr_unreserve,
+ .setup = setup_intel_arch_watchdog,
+ .rearm = p6_rearm,
+ .stop = single_msr_stop_watchdog,
+ .perfctr = MSR_ARCH_PERFMON_PERFCTR0,
+ .evntsel = MSR_ARCH_PERFMON_EVENTSEL0,
+};
+
+static void probe_nmi_watchdog(void)
+{
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15 &&
+ boot_cpu_data.x86 != 16)
+ return;
+ wd_ops = &k7_wd_ops;
+ break;
+ case X86_VENDOR_INTEL:
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+ wd_ops = &intel_arch_wd_ops;
+ break;
+ }
+ switch (boot_cpu_data.x86) {
+ case 6:
+ if (boot_cpu_data.x86_model > 0xd)
+ return;
+
+ wd_ops = &p6_wd_ops;
+ break;
+ case 15:
+ if (boot_cpu_data.x86_model > 0x4)
+ return;
+
+ wd_ops = &p4_wd_ops;
+ break;
+ default:
+ return;
+ }
+ break;
+ }
+}
+
+/* Interface to nmi.c */
+
+int lapic_watchdog_init(unsigned nmi_hz)
+{
+ if (!wd_ops) {
+ probe_nmi_watchdog();
+ if (!wd_ops)
+ return -1;
+ }
+
+ if (!(wd_ops->setup(nmi_hz))) {
+ printk(KERN_ERR "Cannot setup NMI watchdog on CPU %d\n",
+ raw_smp_processor_id());
+ return -1;
+ }
+
+ return 0;
+}
+
+void lapic_watchdog_stop(void)
+{
+ if (wd_ops)
+ wd_ops->stop(NULL);
+}
+
+unsigned lapic_adjust_nmi_hz(unsigned hz)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+ if (wd->perfctr_msr == MSR_P6_PERFCTR0 ||
+ wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
+ hz = adjust_for_32bit_ctr(hz);
+ return hz;
+}
+
+int lapic_wd_event(unsigned nmi_hz)
+{
+ struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+ u64 ctr;
+ rdmsrl(wd->perfctr_msr, ctr);
+ if (ctr & wd_ops->checkbit) { /* perfctr still running? */
+ return 0;
+ }
+ wd_ops->rearm(wd, nmi_hz);
+ return 1;
+}
+
+int lapic_watchdog_ok(void)
+{
+ return wd_ops != NULL;
+}
Index: linux/include/asm-i386/nmi.h
===================================================================
--- linux.orig/include/asm-i386/nmi.h
+++ linux/include/asm-i386/nmi.h
@@ -50,4 +50,12 @@ void __trigger_all_cpu_backtrace(void);
#endif
+void lapic_watchdog_stop(void);
+int lapic_watchdog_init(unsigned nmi_hz);
+int lapic_wd_event(unsigned nmi_hz);
+unsigned lapic_adjust_nmi_hz(unsigned hz);
+int lapic_watchdog_ok(void);
+void disable_lapic_nmi_watchdog(void);
+void enable_lapic_nmi_watchdog(void);
+
#endif /* ASM_NMI_H */
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [4/30] x86_64: Use the 32bit wd_ops for 64bit too.
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (2 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [3/30] i386: Clean up NMI watchdog code Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [5/30] x86_64: Define IGNORE_IOCTL() macro for compat_ioctls Andi Kleen
` (25 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
This mainly removes a lot of code, replacing it with calls into the new 32bit
perfctr-watchdog.c
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/kernel/Makefile | 3
arch/x86_64/kernel/nmi.c | 678 ++------------------------------------------
include/asm-x86_64/nmi.h | 9
3 files changed, 44 insertions(+), 646 deletions(-)
Index: linux/arch/x86_64/kernel/Makefile
===================================================================
--- linux.orig/arch/x86_64/kernel/Makefile
+++ linux/arch/x86_64/kernel/Makefile
@@ -9,7 +9,7 @@ obj-y := process.o signal.o entry.o trap
x8664_ksyms.o i387.o syscall.o vsyscall.o \
setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o sched-clock.o \
- bugs.o
+ bugs.o perfctr-watchdog.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_X86_MCE) += mce.o therm_throt.o
@@ -59,3 +59,4 @@ msr-$(subst m,y,$(CONFIG_X86_MSR)) += .
alternative-y += ../../i386/kernel/alternative.o
pcspeaker-y += ../../i386/kernel/pcspeaker.o
sched-clock-y += ../../i386/kernel/sched-clock.o
+perfctr-watchdog-y += ../../i386/kernel/cpu/perfctr-watchdog.o
Index: linux/include/asm-x86_64/nmi.h
===================================================================
--- linux.orig/include/asm-x86_64/nmi.h
+++ linux/include/asm-x86_64/nmi.h
@@ -80,4 +80,13 @@ extern int unknown_nmi_panic;
void __trigger_all_cpu_backtrace(void);
#define trigger_all_cpu_backtrace() __trigger_all_cpu_backtrace()
+
+void lapic_watchdog_stop(void);
+int lapic_watchdog_init(unsigned nmi_hz);
+int lapic_wd_event(unsigned nmi_hz);
+unsigned lapic_adjust_nmi_hz(unsigned hz);
+int lapic_watchdog_ok(void);
+void disable_lapic_nmi_watchdog(void);
+void enable_lapic_nmi_watchdog(void);
+
#endif /* ASM_NMI_H */
Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -27,28 +27,11 @@
#include <asm/proto.h>
#include <asm/kdebug.h>
#include <asm/mce.h>
-#include <asm/intel_arch_perfmon.h>
int unknown_nmi_panic;
int nmi_watchdog_enabled;
int panic_on_unrecovered_nmi;
-/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
- * evtsel_nmi_owner tracks the ownership of the event selection
- * - different performance counters/ event selection may be reserved for
- * different subsystems this reservation system just tries to coordinate
- * things a little
- */
-
-/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
- * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
- */
-#define NMI_MAX_COUNTER_BITS 66
-#define NMI_MAX_COUNTER_LONGS BITS_TO_LONGS(NMI_MAX_COUNTER_BITS)
-
-static DEFINE_PER_CPU(unsigned, perfctr_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-static DEFINE_PER_CPU(unsigned, evntsel_nmi_owner[NMI_MAX_COUNTER_LONGS]);
-
static cpumask_t backtrace_mask = CPU_MASK_NONE;
/* nmi_active:
@@ -63,191 +46,11 @@ int panic_on_timeout;
unsigned int nmi_watchdog = NMI_DEFAULT;
static unsigned int nmi_hz = HZ;
-struct nmi_watchdog_ctlblk {
- int enabled;
- u64 check_bit;
- unsigned int cccr_msr;
- unsigned int perfctr_msr; /* the MSR to reset in NMI handler */
- unsigned int evntsel_msr; /* the MSR to select the events to handle */
-};
-static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
+static DEFINE_PER_CPU(short, wd_enabled);
/* local prototypes */
static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
-{
- /* returns the bit offset of the performance counter register */
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- return (msr - MSR_K7_PERFCTR0);
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
- return (msr - MSR_ARCH_PERFMON_PERFCTR0);
- else
- return (msr - MSR_P4_BPU_PERFCTR0);
- }
- return 0;
-}
-
-/* converts an msr to an appropriate reservation bit */
-static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
-{
- /* returns the bit offset of the event selection register */
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- return (msr - MSR_K7_EVNTSEL0);
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
- return (msr - MSR_ARCH_PERFMON_EVENTSEL0);
- else
- return (msr - MSR_P4_BSU_ESCR0);
- }
- return 0;
-}
-
-/* checks for a bit availability (hack for oprofile) */
-int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
-{
- int cpu;
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
- for_each_possible_cpu (cpu) {
- if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)))
- return 0;
- }
- return 1;
-}
-
-/* checks the an msr for availability */
-int avail_to_resrv_perfctr_nmi(unsigned int msr)
-{
- unsigned int counter;
- int cpu;
-
- counter = nmi_perfctr_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- for_each_possible_cpu (cpu) {
- if (test_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)))
- return 0;
- }
- return 1;
-}
-
-static int __reserve_perfctr_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_perfctr_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- if (!test_and_set_bit(counter, &per_cpu(perfctr_nmi_owner, cpu)))
- return 1;
- return 0;
-}
-
-static void __release_perfctr_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_perfctr_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- clear_bit(counter, &per_cpu(perfctr_nmi_owner, cpu));
-}
-
-int reserve_perfctr_nmi(unsigned int msr)
-{
- int cpu, i;
- for_each_possible_cpu (cpu) {
- if (!__reserve_perfctr_nmi(cpu, msr)) {
- for_each_possible_cpu (i) {
- if (i >= cpu)
- break;
- __release_perfctr_nmi(i, msr);
- }
- return 0;
- }
- }
- return 1;
-}
-
-void release_perfctr_nmi(unsigned int msr)
-{
- int cpu;
- for_each_possible_cpu (cpu)
- __release_perfctr_nmi(cpu, msr);
-}
-
-int __reserve_evntsel_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_evntsel_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- if (!test_and_set_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]))
- return 1;
- return 0;
-}
-
-static void __release_evntsel_nmi(int cpu, unsigned int msr)
-{
- unsigned int counter;
- if (cpu < 0)
- cpu = smp_processor_id();
-
- counter = nmi_evntsel_msr_to_bit(msr);
- BUG_ON(counter > NMI_MAX_COUNTER_BITS);
-
- clear_bit(counter, &per_cpu(evntsel_nmi_owner, cpu)[0]);
-}
-
-int reserve_evntsel_nmi(unsigned int msr)
-{
- int cpu, i;
- for_each_possible_cpu (cpu) {
- if (!__reserve_evntsel_nmi(cpu, msr)) {
- for_each_possible_cpu (i) {
- if (i >= cpu)
- break;
- __release_evntsel_nmi(i, msr);
- }
- return 0;
- }
- }
- return 1;
-}
-
-void release_evntsel_nmi(unsigned int msr)
-{
- int cpu;
- for_each_possible_cpu (cpu) {
- __release_evntsel_nmi(cpu, msr);
- }
-}
-
-static __cpuinit inline int nmi_known_cpu(void)
-{
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- return boot_cpu_data.x86 == 15 || boot_cpu_data.x86 == 16;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
- return 1;
- else
- return (boot_cpu_data.x86 == 15);
- }
- return 0;
-}
-
/* Run after command line and cpu_init init, but before all other checks */
void nmi_watchdog_default(void)
{
@@ -277,23 +80,6 @@ static __init void nmi_cpu_busy(void *da
}
#endif
-static unsigned int adjust_for_32bit_ctr(unsigned int hz)
-{
- unsigned int retval = hz;
-
- /*
- * On Intel CPUs with ARCH_PERFMON only 32 bits in the counter
- * are writable, with higher bits sign extending from bit 31.
- * So, we can only program the counter with 31 bit values and
- * 32nd bit should be 1, for 33.. to be 1.
- * Find the appropriate nmi_hz
- */
- if ((((u64)cpu_khz * 1000) / retval) > 0x7fffffffULL) {
- retval = ((u64)cpu_khz * 1000) / 0x7fffffffUL + 1;
- }
- return retval;
-}
-
int __init check_nmi_watchdog (void)
{
int *counts;
@@ -322,14 +108,14 @@ int __init check_nmi_watchdog (void)
mdelay((20*1000)/nmi_hz); // wait 20 ticks
for_each_online_cpu(cpu) {
- if (!per_cpu(nmi_watchdog_ctlblk, cpu).enabled)
+ if (!per_cpu(wd_enabled, cpu))
continue;
if (cpu_pda(cpu)->__nmi_count - counts[cpu] <= 5) {
printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
cpu,
counts[cpu],
cpu_pda(cpu)->__nmi_count);
- per_cpu(nmi_watchdog_ctlblk, cpu).enabled = 0;
+ per_cpu(wd_enabled, cpu) = 0;
atomic_dec(&nmi_active);
}
}
@@ -344,13 +130,8 @@ int __init check_nmi_watchdog (void)
/* now that we know it works we can reduce NMI frequency to
something more reasonable; makes a difference in some configs */
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- nmi_hz = 1;
- if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1)
- nmi_hz = adjust_for_32bit_ctr(nmi_hz);
- }
+ if (nmi_watchdog == NMI_LOCAL_APIC)
+ nmi_hz = lapic_adjust_nmi_hz(1);
kfree(counts);
return 0;
@@ -379,57 +160,6 @@ int __init setup_nmi_watchdog(char *str)
__setup("nmi_watchdog=", setup_nmi_watchdog);
-static void disable_lapic_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
- if (atomic_read(&nmi_active) <= 0)
- return;
-
- on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
- BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-static void enable_lapic_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
-
- /* are we already enabled */
- if (atomic_read(&nmi_active) != 0)
- return;
-
- /* are we lapic aware */
- if (nmi_known_cpu() <= 0)
- return;
-
- on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
- touch_nmi_watchdog();
-}
-
-void disable_timer_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
- if (atomic_read(&nmi_active) <= 0)
- return;
-
- disable_irq(0);
- on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
-
- BUG_ON(atomic_read(&nmi_active) != 0);
-}
-
-void enable_timer_nmi_watchdog(void)
-{
- BUG_ON(nmi_watchdog != NMI_IO_APIC);
-
- if (atomic_read(&nmi_active) == 0) {
- touch_nmi_watchdog();
- on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
- enable_irq(0);
- }
-}
static void __acpi_nmi_disable(void *__unused)
{
@@ -515,275 +245,9 @@ late_initcall(init_lapic_nmi_sysfs);
#endif /* CONFIG_PM */
-/*
- * Activate the NMI watchdog via the local APIC.
- * Original code written by Keith Owens.
- */
-
-/* Note that these events don't tick when the CPU idles. This means
- the frequency varies with CPU load. */
-
-#define K7_EVNTSEL_ENABLE (1 << 22)
-#define K7_EVNTSEL_INT (1 << 20)
-#define K7_EVNTSEL_OS (1 << 17)
-#define K7_EVNTSEL_USR (1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
-#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
-
-static int setup_k7_watchdog(void)
-{
- unsigned int perfctr_msr, evntsel_msr;
- unsigned int evntsel;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- perfctr_msr = MSR_K7_PERFCTR0;
- evntsel_msr = MSR_K7_EVNTSEL0;
- if (!__reserve_perfctr_nmi(-1, perfctr_msr))
- goto fail;
-
- if (!__reserve_evntsel_nmi(-1, evntsel_msr))
- goto fail1;
-
- /* Simulator may not support it */
- if (checking_wrmsrl(evntsel_msr, 0UL))
- goto fail2;
- wrmsrl(perfctr_msr, 0UL);
-
- evntsel = K7_EVNTSEL_INT
- | K7_EVNTSEL_OS
- | K7_EVNTSEL_USR
- | K7_NMI_EVENT;
-
- /* setup the timer */
- wrmsr(evntsel_msr, evntsel, 0);
- wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- evntsel |= K7_EVNTSEL_ENABLE;
- wrmsr(evntsel_msr, evntsel, 0);
-
- wd->perfctr_msr = perfctr_msr;
- wd->evntsel_msr = evntsel_msr;
- wd->cccr_msr = 0; //unused
- wd->check_bit = 1ULL<<63;
- return 1;
-fail2:
- __release_evntsel_nmi(-1, evntsel_msr);
-fail1:
- __release_perfctr_nmi(-1, perfctr_msr);
-fail:
- return 0;
-}
-
-static void stop_k7_watchdog(void)
-{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- wrmsr(wd->evntsel_msr, 0, 0);
-
- __release_evntsel_nmi(-1, wd->evntsel_msr);
- __release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-/* Note that these events don't tick when the CPU idles. This means
- the frequency varies with CPU load. */
-
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7)
-#define P4_ESCR_EVENT_SELECT(N) ((N)<<25)
-#define P4_ESCR_OS (1<<3)
-#define P4_ESCR_USR (1<<2)
-#define P4_CCCR_OVF_PMI0 (1<<26)
-#define P4_CCCR_OVF_PMI1 (1<<27)
-#define P4_CCCR_THRESHOLD(N) ((N)<<20)
-#define P4_CCCR_COMPLEMENT (1<<19)
-#define P4_CCCR_COMPARE (1<<18)
-#define P4_CCCR_REQUIRED (3<<16)
-#define P4_CCCR_ESCR_SELECT(N) ((N)<<13)
-#define P4_CCCR_ENABLE (1<<12)
-#define P4_CCCR_OVF (1<<31)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
- CRU_ESCR0 (with any non-null event selector) through a complemented
- max threshold. [IA32-Vol3, Section 14.9.9] */
-
-static int setup_p4_watchdog(void)
-{
- unsigned int perfctr_msr, evntsel_msr, cccr_msr;
- unsigned int evntsel, cccr_val;
- unsigned int misc_enable, dummy;
- unsigned int ht_num;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
- if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
- return 0;
-
-#ifdef CONFIG_SMP
- /* detect which hyperthread we are on */
- if (smp_num_siblings == 2) {
- unsigned int ebx, apicid;
-
- ebx = cpuid_ebx(1);
- apicid = (ebx >> 24) & 0xff;
- ht_num = apicid & 1;
- } else
-#endif
- ht_num = 0;
-
- /* performance counters are shared resources
- * assign each hyperthread its own set
- * (re-use the ESCR0 register, seems safe
- * and keeps the cccr_val the same)
- */
- if (!ht_num) {
- /* logical cpu 0 */
- perfctr_msr = MSR_P4_IQ_PERFCTR0;
- evntsel_msr = MSR_P4_CRU_ESCR0;
- cccr_msr = MSR_P4_IQ_CCCR0;
- cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
- } else {
- /* logical cpu 1 */
- perfctr_msr = MSR_P4_IQ_PERFCTR1;
- evntsel_msr = MSR_P4_CRU_ESCR0;
- cccr_msr = MSR_P4_IQ_CCCR1;
- cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
- }
-
- if (!__reserve_perfctr_nmi(-1, perfctr_msr))
- goto fail;
-
- if (!__reserve_evntsel_nmi(-1, evntsel_msr))
- goto fail1;
-
- evntsel = P4_ESCR_EVENT_SELECT(0x3F)
- | P4_ESCR_OS
- | P4_ESCR_USR;
-
- cccr_val |= P4_CCCR_THRESHOLD(15)
- | P4_CCCR_COMPLEMENT
- | P4_CCCR_COMPARE
- | P4_CCCR_REQUIRED;
-
- wrmsr(evntsel_msr, evntsel, 0);
- wrmsr(cccr_msr, cccr_val, 0);
- wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- cccr_val |= P4_CCCR_ENABLE;
- wrmsr(cccr_msr, cccr_val, 0);
-
- wd->perfctr_msr = perfctr_msr;
- wd->evntsel_msr = evntsel_msr;
- wd->cccr_msr = cccr_msr;
- wd->check_bit = 1ULL<<39;
- return 1;
-fail1:
- __release_perfctr_nmi(-1, perfctr_msr);
-fail:
- return 0;
-}
-
-static void stop_p4_watchdog(void)
-{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- wrmsr(wd->cccr_msr, 0, 0);
- wrmsr(wd->evntsel_msr, 0, 0);
-
- __release_evntsel_nmi(-1, wd->evntsel_msr);
- __release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
-#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
-#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK
-
-static int setup_intel_arch_watchdog(void)
-{
- unsigned int ebx;
- union cpuid10_eax eax;
- unsigned int unused;
- unsigned int perfctr_msr, evntsel_msr;
- unsigned int evntsel;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- /*
- * Check whether the Architectural PerfMon supports
- * Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebx indicates event present.
- */
- cpuid(10, &(eax.full), &ebx, &unused, &unused);
- if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
- (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- goto fail;
-
- perfctr_msr = MSR_ARCH_PERFMON_PERFCTR1;
- evntsel_msr = MSR_ARCH_PERFMON_EVENTSEL1;
-
- if (!__reserve_perfctr_nmi(-1, perfctr_msr))
- goto fail;
-
- if (!__reserve_evntsel_nmi(-1, evntsel_msr))
- goto fail1;
-
- wrmsrl(perfctr_msr, 0UL);
-
- evntsel = ARCH_PERFMON_EVENTSEL_INT
- | ARCH_PERFMON_EVENTSEL_OS
- | ARCH_PERFMON_EVENTSEL_USR
- | ARCH_PERFMON_NMI_EVENT_SEL
- | ARCH_PERFMON_NMI_EVENT_UMASK;
-
- /* setup the timer */
- wrmsr(evntsel_msr, evntsel, 0);
-
- nmi_hz = adjust_for_32bit_ctr(nmi_hz);
- wrmsr(perfctr_msr, (u32)(-((u64)cpu_khz * 1000 / nmi_hz)), 0);
-
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- evntsel |= ARCH_PERFMON_EVENTSEL0_ENABLE;
- wrmsr(evntsel_msr, evntsel, 0);
-
- wd->perfctr_msr = perfctr_msr;
- wd->evntsel_msr = evntsel_msr;
- wd->cccr_msr = 0; //unused
- wd->check_bit = 1ULL << (eax.split.bit_width - 1);
- return 1;
-fail1:
- __release_perfctr_nmi(-1, perfctr_msr);
-fail:
- return 0;
-}
-
-static void stop_intel_arch_watchdog(void)
-{
- unsigned int ebx;
- union cpuid10_eax eax;
- unsigned int unused;
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- /*
- * Check whether the Architectural PerfMon supports
- * Unhalted Core Cycles Event or not.
- * NOTE: Corresponding bit = 0 in ebx indicates event present.
- */
- cpuid(10, &(eax.full), &ebx, &unused, &unused);
- if ((eax.split.mask_length < (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX+1)) ||
- (ebx & ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT))
- return;
-
- wrmsr(wd->evntsel_msr, 0, 0);
-
- __release_evntsel_nmi(-1, wd->evntsel_msr);
- __release_perfctr_nmi(-1, wd->perfctr_msr);
-}
-
void setup_apic_nmi_watchdog(void *unused)
{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
- /* only support LOCAL and IO APICs for now */
- if ((nmi_watchdog != NMI_LOCAL_APIC) &&
- (nmi_watchdog != NMI_IO_APIC))
- return;
-
- if (wd->enabled == 1)
+ if (__get_cpu_var(wd_enabled) == 1)
return;
/* cheap hack to support suspend/resume */
@@ -791,62 +255,31 @@ void setup_apic_nmi_watchdog(void *unuse
if ((smp_processor_id() != 0) && (atomic_read(&nmi_active) <= 0))
return;
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
- return;
- if (!setup_k7_watchdog())
- return;
- break;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- if (!setup_intel_arch_watchdog())
- return;
- break;
- }
- if (!setup_p4_watchdog())
- return;
- break;
- default:
+ switch (nmi_watchdog) {
+ case NMI_LOCAL_APIC:
+ __get_cpu_var(wd_enabled) = 1;
+ if (lapic_watchdog_init(nmi_hz) < 0) {
+ __get_cpu_var(wd_enabled) = 0;
return;
}
+ /* FALL THROUGH */
+ case NMI_IO_APIC:
+ __get_cpu_var(wd_enabled) = 1;
+ atomic_inc(&nmi_active);
}
- wd->enabled = 1;
- atomic_inc(&nmi_active);
}
void stop_apic_nmi_watchdog(void *unused)
{
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
-
/* only support LOCAL and IO APICs for now */
if ((nmi_watchdog != NMI_LOCAL_APIC) &&
(nmi_watchdog != NMI_IO_APIC))
return;
-
- if (wd->enabled == 0)
+ if (__get_cpu_var(wd_enabled) == 0)
return;
-
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- switch (boot_cpu_data.x86_vendor) {
- case X86_VENDOR_AMD:
- if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
- return;
- stop_k7_watchdog();
- break;
- case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
- stop_intel_arch_watchdog();
- break;
- }
- stop_p4_watchdog();
- break;
- default:
- return;
- }
- }
- wd->enabled = 0;
+ if (nmi_watchdog == NMI_LOCAL_APIC)
+ lapic_watchdog_stop();
+ __get_cpu_var(wd_enabled) = 0;
atomic_dec(&nmi_active);
}
@@ -885,9 +318,7 @@ int __kprobes nmi_watchdog_tick(struct p
int sum;
int touched = 0;
int cpu = smp_processor_id();
- struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
- u64 dummy;
- int rc=0;
+ int rc = 0;
/* check for other users first */
if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
@@ -934,55 +365,20 @@ int __kprobes nmi_watchdog_tick(struct p
}
/* see if the nmi watchdog went off */
- if (wd->enabled) {
- if (nmi_watchdog == NMI_LOCAL_APIC) {
- rdmsrl(wd->perfctr_msr, dummy);
- if (dummy & wd->check_bit){
- /* this wasn't a watchdog timer interrupt */
- goto done;
- }
-
- /* only Intel uses the cccr msr */
- if (wd->cccr_msr != 0) {
- /*
- * P4 quirks:
- * - An overflown perfctr will assert its interrupt
- * until the OVF flag in its CCCR is cleared.
- * - LVTPC is masked on interrupt and must be
- * unmasked by the LVTPC handler.
- */
- rdmsrl(wd->cccr_msr, dummy);
- dummy &= ~P4_CCCR_OVF;
- wrmsrl(wd->cccr_msr, dummy);
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- /* start the cycle over again */
- wrmsrl(wd->perfctr_msr,
- -((u64)cpu_khz * 1000 / nmi_hz));
- } else if (wd->perfctr_msr == MSR_ARCH_PERFMON_PERFCTR1) {
- /*
- * ArchPerfom/Core Duo needs to re-unmask
- * the apic vector
- */
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- /* ARCH_PERFMON has 32 bit counter writes */
- wrmsr(wd->perfctr_msr,
- (u32)(-((u64)cpu_khz * 1000 / nmi_hz)), 0);
- } else {
- /* start the cycle over again */
- wrmsrl(wd->perfctr_msr,
- -((u64)cpu_khz * 1000 / nmi_hz));
- }
- rc = 1;
- } else if (nmi_watchdog == NMI_IO_APIC) {
- /* don't know how to accurately check for this.
- * just assume it was a watchdog timer interrupt
- * This matches the old behaviour.
- */
- rc = 1;
- } else
- printk(KERN_WARNING "Unknown enabled NMI hardware?!\n");
+ if (!__get_cpu_var(wd_enabled))
+ return rc;
+ switch (nmi_watchdog) {
+ case NMI_LOCAL_APIC:
+ rc |= lapic_wd_event(nmi_hz);
+ break;
+ case NMI_IO_APIC:
+ /* don't know how to accurately check for this.
+ * just assume it was a watchdog timer interrupt
+ * This matches the old behaviour.
+ */
+ rc = 1;
+ break;
}
-done:
return rc;
}
@@ -1067,12 +463,4 @@ void __trigger_all_cpu_backtrace(void)
EXPORT_SYMBOL(nmi_active);
EXPORT_SYMBOL(nmi_watchdog);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
-EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
-EXPORT_SYMBOL(reserve_perfctr_nmi);
-EXPORT_SYMBOL(release_perfctr_nmi);
-EXPORT_SYMBOL(reserve_evntsel_nmi);
-EXPORT_SYMBOL(release_evntsel_nmi);
-EXPORT_SYMBOL(disable_timer_nmi_watchdog);
-EXPORT_SYMBOL(enable_timer_nmi_watchdog);
EXPORT_SYMBOL(touch_nmi_watchdog);
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [5/30] x86_64: Define IGNORE_IOCTL() macro for compat_ioctls
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (3 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [4/30] x86_64: Use the 32bit wd_ops for 64bit too Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [6/30] x86_64: Shut up 32bit emulation for SIOCGIFCOUNT Andi Kleen
` (24 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Define a new IGNORE_IOCTL() to let a compat ioctl not be warned about even when
it is not implemented.
This is the same as COMPATIBLE_IOCTL internally, but better self documentng.
Valid reasons to use this:
- It is implemented with ->compat_ioctl on some device, but programs
call it on others too.
- The ioctl is not implemented in the native kernel, but programs
call it commonly anyways.
Most other reasons are not valid.
Signed-off-by: Andi Kleen <ak@suse.de>
---
fs/compat_ioctl.c | 8 ++++++++
1 file changed, 8 insertions(+)
Index: linux/fs/compat_ioctl.c
===================================================================
--- linux.orig/fs/compat_ioctl.c
+++ linux/fs/compat_ioctl.c
@@ -2396,6 +2396,14 @@ lp_timeout_trans(unsigned int fd, unsign
#define ULONG_IOCTL(cmd) \
{ (cmd), (ioctl_trans_handler_t)sys_ioctl },
+/* ioctl should not be warned about even if it's not implemented.
+ Valid reasons to use this:
+ - It is implemented with ->compat_ioctl on some device, but programs
+ call it on others too.
+ - The ioctl is not implemented in the native kernel, but programs
+ call it commonly anyways.
+ Most other reasons are not valid. */
+#define IGNORE_IOCTL(cmd) COMPATIBLE_IOCTL(cmd)
struct ioctl_trans ioctl_start[] = {
#include <linux/compat_ioctl.h>
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [6/30] x86_64: Shut up 32bit emulation for SIOCGIFCOUNT
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (4 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [5/30] x86_64: Define IGNORE_IOCTL() macro for compat_ioctls Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [7/30] x86_64: Avoid overflows during apic timer calibration Andi Kleen
` (23 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
The kernel doesn't implement it, but some programs like java use it
anyways. Shut the code up.
Signed-off-by: Andi Kleen <ak@suse.de>
---
fs/compat_ioctl.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux/fs/compat_ioctl.c
===================================================================
--- linux.orig/fs/compat_ioctl.c
+++ linux/fs/compat_ioctl.c
@@ -2602,6 +2602,8 @@ HANDLE_IOCTL(SIOCGIWENCODEEXT, do_wirele
HANDLE_IOCTL(SIOCSIWPMKSA, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIFBR, old_bridge_ioctl)
HANDLE_IOCTL(SIOCGIFBR, old_bridge_ioctl)
+/* Not implemented in the native kernel */
+IGNORE_IOCTL(SIOCGIFCOUNT)
HANDLE_IOCTL(RTC_IRQP_READ32, rtc_ioctl)
HANDLE_IOCTL(RTC_IRQP_SET32, rtc_ioctl)
HANDLE_IOCTL(RTC_EPOCH_READ32, rtc_ioctl)
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [7/30] x86_64: Avoid overflows during apic timer calibration
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (5 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [6/30] x86_64: Shut up 32bit emulation for SIOCGIFCOUNT Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu Andi Kleen
` (22 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: dpreed, patches, linux-kernel
From: David P. Reed <dpreed@reed.com>
- Use 64bit TSC calculations to avoid handling overflow
- Use 32bit unsigned arithmetic for the APIC timer. This
way overflows are handled correctly.
- Fix exit check of loop to account for apic timer counting down
Signed-off-by: dpreed@reed.com
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/kernel/apic.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -839,14 +839,15 @@ static void setup_APIC_timer(unsigned in
static int __init calibrate_APIC_clock(void)
{
- int apic, apic_start, tsc, tsc_start;
+ unsigned apic, apic_start;
+ unsigned long tsc, tsc_start;
int result;
/*
* Put whatever arbitrary (but long enough) timeout
* value into the APIC clock, we just want to get the
* counter running for calibration.
*/
- __setup_APIC_LVTT(1000000000);
+ __setup_APIC_LVTT(4000000000);
apic_start = apic_read(APIC_TMCCT);
#ifdef CONFIG_X86_PM_TIMER
@@ -857,13 +858,13 @@ static int __init calibrate_APIC_clock(v
} else
#endif
{
- rdtscl(tsc_start);
+ rdtscll(tsc_start);
do {
apic = apic_read(APIC_TMCCT);
- rdtscl(tsc);
+ rdtscll(tsc);
} while ((tsc - tsc_start) < TICK_COUNT &&
- (apic - apic_start) < TICK_COUNT);
+ (apic_start - apic) < TICK_COUNT);
result = (apic_start - apic) * 1000L * tsc_khz /
(tsc - tsc_start);
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (6 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [7/30] x86_64: Avoid overflows during apic timer calibration Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 5:57 ` Jeremy Fitzhardinge
2007-05-01 3:58 ` [PATCH] [9/30] x86_64: Use symbolic CPU features in early CPUID check Andi Kleen
` (21 subsequent siblings)
29 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
This implements new vDSO for x86-64. The concept is similar
to the existing vDSOs on i386 and PPC. x86-64 has had static
vsyscalls before, but these are not flexible enough anymore.
A vDSO is a ELF shared library supplied by the kernel that is mapped into
user address space. The vDSO mapping is randomized for each process
for security reasons.
Doing this was needed for clock_gettime, because clock_gettime
always needs a syscall fallback and having one at a fixed
address would have made buffer overflow exploits too easy to write.
The vdso can be disabled with vdso=0
It currently includes a new gettimeofday implemention and optimized
clock_gettime(). The gettimeofday implementation is slightly faster
than the one in the old vsyscall. clock_gettime is significantly faster
than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
The new calls are generally faster than the old vsyscall.
TBD: add new benchmarks
Advantages over the old x86-64 vsyscalls:
- Extensible
- Randomized
- Cleaner
- Easier to virtualize (the old static address range previously causes
overhead e.g. for Xen because it has to create special page tables for it)
Weak points:
- glibc support still to be written
The VM interface is partly based on Ingo Molnar's i386 version.
Signed-off-by: Andi Kleen <ak@suse.de>
---
Documentation/kernel-parameters.txt | 2
arch/x86_64/Makefile | 3
arch/x86_64/ia32/ia32_binfmt.c | 1
arch/x86_64/kernel/time.c | 1
arch/x86_64/kernel/vmlinux.lds.S | 12 +++
arch/x86_64/kernel/vsyscall.c | 22 +----
arch/x86_64/mm/init.c | 17 ++++
arch/x86_64/vdso/Makefile | 49 ++++++++++++
arch/x86_64/vdso/vclock_gettime.c | 120 +++++++++++++++++++++++++++++++
arch/x86_64/vdso/vdso-note.S | 25 ++++++
arch/x86_64/vdso/vdso-start.S | 2
arch/x86_64/vdso/vdso.S | 2
arch/x86_64/vdso/vdso.lds.S | 77 ++++++++++++++++++++
arch/x86_64/vdso/vextern.h | 16 ++++
arch/x86_64/vdso/vgetcpu.c | 50 +++++++++++++
arch/x86_64/vdso/vma.c | 137 ++++++++++++++++++++++++++++++++++++
arch/x86_64/vdso/voffset.h | 1
arch/x86_64/vdso/vvar.c | 12 +++
include/asm-x86_64/auxvec.h | 2
include/asm-x86_64/elf.h | 13 +++
include/asm-x86_64/mmu.h | 1
include/asm-x86_64/pgtable.h | 8 +-
include/asm-x86_64/vgtod.h | 29 +++++++
include/asm-x86_64/vsyscall.h | 3
24 files changed, 583 insertions(+), 22 deletions(-)
Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -38,6 +38,7 @@
int sysctl_vsyscall32 = 1;
+#undef ARCH_DLINFO
#define ARCH_DLINFO do { \
if (sysctl_vsyscall32) { \
NEW_AUX_ENT(AT_SYSINFO, (u32)(u64)VSYSCALL32_VSYSCALL); \
Index: linux/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux/arch/x86_64/kernel/vmlinux.lds.S
@@ -94,6 +94,9 @@ SECTIONS
.vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data))
{ *(.vsyscall_gtod_data) }
vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data);
+ .vsyscall_clock : AT(VLOAD(.vsyscall_clock))
+ { *(.vsyscall_clock) }
+ vsyscall_clock = VVIRT(.vsyscall_clock);
.vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1))
@@ -153,6 +156,8 @@ SECTIONS
. = ALIGN(4096); /* Init code and data */
__init_begin = .;
+
+
.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
_sinittext = .;
*(.init.text)
@@ -190,6 +195,12 @@ SECTIONS
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
.exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
+/* vdso blob that is mapped into user space */
+ vdso_start = . ;
+ .vdso : AT(ADDR(.vdso) - LOAD_OFFSET) { *(.vdso) }
+ . = ALIGN(4096);
+ vdso_end = .;
+
#ifdef CONFIG_BLK_DEV_INITRD
. = ALIGN(4096);
__initramfs_start = .;
@@ -202,6 +213,7 @@ SECTIONS
.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
__per_cpu_end = .;
. = ALIGN(4096);
+
__init_end = .;
. = ALIGN(4096);
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -159,6 +159,14 @@ static __init void set_pte_phys(unsigned
__flush_tlb_one(vaddr);
}
+void __init
+set_kernel_map(void *vaddr,unsigned long len,unsigned long phys,pgprot_t prot)
+{
+ void *end = vaddr + ALIGN(len, PAGE_SIZE);
+ for (; vaddr < end; vaddr += PAGE_SIZE, phys += PAGE_SIZE)
+ set_pte_phys((unsigned long)vaddr, phys, prot);
+}
+
/* NOTE: this is meant to be run only at boot */
void __init
__set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
@@ -756,3 +764,12 @@ int in_gate_area_no_task(unsigned long a
{
return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
}
+
+const char *arch_vma_name(struct vm_area_struct *vma)
+{
+ if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
+ return "[vdso]";
+ if (vma == &gate_vma)
+ return "[vsyscall]";
+ return NULL;
+}
Index: linux/arch/x86_64/vdso/vdso-note.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso-note.S
@@ -0,0 +1,25 @@
+/*
+ * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
+ * Here we can supply some information useful to userland.
+ */
+
+#include <linux/uts.h>
+#include <linux/version.h>
+
+#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type) \
+ .section name, flags; \
+ .balign 4; \
+ .long 1f - 0f; /* name length */ \
+ .long 3f - 2f; /* data length */ \
+ .long type; /* note type */ \
+0: .asciz vendor; /* vendor name */ \
+1: .balign 4; \
+2:
+
+#define ASM_ELF_NOTE_END \
+3: .balign 4; /* pad out section */ \
+ .previous
+
+ ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0)
+ .long LINUX_VERSION_CODE
+ ASM_ELF_NOTE_END
Index: linux/arch/x86_64/vdso/vdso.lds.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso.lds.S
@@ -0,0 +1,77 @@
+/*
+ * Linker script for vsyscall DSO. The vsyscall page is an ELF shared
+ * object prelinked to its virtual address, and with only one read-only
+ * segment (that fits in one page). This script controls its layout.
+ */
+#include <asm/asm-offsets.h>
+#include "voffset.h"
+
+#define VDSO_PRELINK 0xffffffffff700000
+
+SECTIONS
+{
+ . = VDSO_PRELINK + SIZEOF_HEADERS;
+
+ .hash : { *(.hash) } :text
+ .gnu.hash : { *(.gnu.hash) }
+ .dynsym : { *(.dynsym) }
+ .dynstr : { *(.dynstr) }
+ .gnu.version : { *(.gnu.version) }
+ .gnu.version_d : { *(.gnu.version_d) }
+ .gnu.version_r : { *(.gnu.version_r) }
+
+ /* This linker script is used both with -r and with -shared.
+ For the layouts to match, we need to skip more than enough
+ space for the dynamic symbol table et al. If this amount
+ is insufficient, ld -shared will barf. Just increase it here. */
+ . = VDSO_PRELINK + VDSO_TEXT_OFFSET;
+
+ .text : { *(.text) } :text
+ .text.ptr : { *(.text.ptr) } :text
+ . = VDSO_PRELINK + 0x900;
+ .data : { *(.data) } :text
+ .bss : { *(.bss) } :text
+
+ .altinstructions : { *(.altinstructions) } :text
+ .altinstr_replacement : { *(.altinstr_replacement) } :text
+
+ .note : { *(.note.*) } :text :note
+ .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr
+ .eh_frame : { KEEP (*(.eh_frame)) } :text
+ .dynamic : { *(.dynamic) } :text :dynamic
+ .useless : {
+ *(.got.plt) *(.got)
+ *(.gnu.linkonce.d.*)
+ *(.dynbss)
+ *(.gnu.linkonce.b.*)
+ } :text
+}
+
+/*
+ * We must supply the ELF program headers explicitly to get just one
+ * PT_LOAD segment, and set the flags explicitly to make segments read-only.
+ */
+PHDRS
+{
+ text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */
+ dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
+ note PT_NOTE FLAGS(4); /* PF_R */
+ eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */
+}
+
+/*
+ * This controls what symbols we export from the DSO.
+ */
+VERSION
+{
+ LINUX_2.6 {
+ global:
+ clock_gettime;
+ __vdso_clock_gettime;
+ gettimeofday;
+ __vdso_gettimeofday;
+ getcpu;
+ __vdso_getcpu;
+ local: *;
+ };
+}
Index: linux/arch/x86_64/vdso/Makefile
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/Makefile
@@ -0,0 +1,49 @@
+#
+# x86-64 vDSO.
+#
+
+# files to link into the vdso
+# vdso-start.o has to be first
+vobjs-y := vdso-start.o vdso-note.o vclock_gettime.o vgetcpu.o vvar.o
+
+# files to link into kernel
+obj-y := vma.o vdso.o vdso-syms.o
+
+vobjs := $(foreach F,$(vobjs-y),$(obj)/$F)
+
+$(obj)/vdso.o: $(obj)/vdso.so
+
+targets += vdso.so vdso.lds $(vobjs-y) vdso-syms.o
+
+# The DSO images are built using a special linker script.
+quiet_cmd_syscall = SYSCALL $@
+ cmd_syscall = $(CC) -m elf_x86_64 -nostdlib $(SYSCFLAGS_$(@F)) \
+ -Wl,-T,$(filter-out FORCE,$^) -o $@
+
+export CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
+
+vdso-flags = -fPIC -shared -Wl,-soname=linux-vdso.so.1 \
+ $(call ld-option, -Wl$(comma)--hash-style=sysv) \
+ -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
+SYSCFLAGS_vdso.so = $(vdso-flags)
+
+$(obj)/vdso.o: $(src)/vdso.S $(obj)/vdso.so
+
+$(obj)/vdso.so: $(src)/vdso.lds $(vobjs) FORCE
+ $(call if_changed,syscall)
+
+CF := $(PROFILING) -mcmodel=small -fPIC -g0 -O2 -fasynchronous-unwind-tables -m64
+
+$(obj)/vclock_gettime.o: CFLAGS = $(CF)
+$(obj)/vgetcpu.o: CFLAGS = $(CF)
+
+# We also create a special relocatable object that should mirror the symbol
+# table and layout of the linked DSO. With ld -R we can then refer to
+# these symbols in the kernel code rather than hand-coded addresses.
+extra-y += vdso-syms.o
+$(obj)/built-in.o: $(obj)/vdso-syms.o
+$(obj)/built-in.o: ld_flags += -R $(obj)/vdso-syms.o
+
+SYSCFLAGS_vdso-syms.o = -r -d
+$(obj)/vdso-syms.o: $(src)/vdso.lds $(vobjs) FORCE
+ $(call if_changed,syscall)
Index: linux/arch/x86_64/vdso/vclock_gettime.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vclock_gettime.c
@@ -0,0 +1,120 @@
+/*
+ * Copyright 2006 Andi Kleen, SUSE Labs.
+ * Subject to the GNU Public License, v.2
+ *
+ * Fast user context implementation of clock_gettime and gettimeofday.
+ *
+ * The code should have no internal unresolved relocations.
+ * Check with readelf after changing.
+ * Also alternative() doesn't work.
+ */
+
+#include <linux/kernel.h>
+#include <linux/posix-timers.h>
+#include <linux/time.h>
+#include <linux/string.h>
+#include <asm/vsyscall.h>
+#include <asm/vgtod.h>
+#include <asm/timex.h>
+#include <asm/hpet.h>
+#include <asm/unistd.h>
+#include <asm/io.h>
+#include <asm/vgtod.h>
+#include "vextern.h"
+
+#define gtod vdso_vsyscall_gtod_data
+
+static long vdso_fallback_gettime(long clock, struct timespec *ts)
+{
+ long ret;
+ asm("syscall" : "=a" (ret) :
+ "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
+ return ret;
+}
+
+static inline long vgetns(void)
+{
+ cycles_t (*vread)(void);
+ vread = gtod->clock.vread;
+ return ((vread() - gtod->clock.cycle_last) * gtod->clock.mult) >>
+ gtod->clock.shift;
+}
+
+static noinline int do_realtime(struct timespec *ts)
+{
+ unsigned long seq, ns;
+ do {
+ seq = read_seqbegin(>od->lock);
+ ts->tv_sec = gtod->wall_time_sec;
+ ts->tv_nsec = gtod->wall_time_nsec;
+ ns = vgetns();
+ } while (unlikely(read_seqretry(>od->lock, seq)));
+ timespec_add_ns(ts, ns);
+ return 0;
+}
+
+/* Copy of the version in kernel/time.c which we cannot directly access */
+static void vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
+{
+ while (nsec >= NSEC_PER_SEC) {
+ nsec -= NSEC_PER_SEC;
+ ++sec;
+ }
+ while (nsec < 0) {
+ nsec += NSEC_PER_SEC;
+ --sec;
+ }
+ ts->tv_sec = sec;
+ ts->tv_nsec = nsec;
+}
+
+static noinline int do_monotonic(struct timespec *ts)
+{
+ unsigned long seq, ns, secs;
+ do {
+ seq = read_seqbegin(>od->lock);
+ secs = gtod->wall_time_sec;
+ ns = gtod->wall_time_nsec + vgetns();
+ secs += gtod->wall_to_monotonic.tv_sec;
+ ns += gtod->wall_to_monotonic.tv_nsec;
+ } while (unlikely(read_seqretry(>od->lock, seq)));
+ vset_normalized_timespec(ts, secs, ns);
+ return 0;
+}
+
+int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
+{
+ if (likely(gtod->sysctl_enabled && gtod->clock.vread))
+ switch (clock) {
+ case CLOCK_REALTIME:
+ return do_realtime(ts);
+ case CLOCK_MONOTONIC:
+ return do_monotonic(ts);
+ }
+ return vdso_fallback_gettime(clock, ts);
+}
+int clock_gettime(clockid_t, struct timespec *)
+ __attribute__((weak, alias("__vdso_clock_gettime")));
+
+int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
+{
+ long ret;
+ if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
+ BUILD_BUG_ON(offsetof(struct timeval, tv_usec) !=
+ offsetof(struct timespec, tv_nsec) ||
+ sizeof(*tv) != sizeof(struct timespec));
+ do_realtime((struct timespec *)tv);
+ tv->tv_usec /= 1000;
+ if (unlikely(tz != NULL)) {
+ /* This relies on gcc inlining the memcpy. We'll notice
+ if it ever fails to do so. */
+ memcpy(tz, >od->sys_tz, sizeof(struct timezone));
+ }
+ return 0;
+ }
+ asm("syscall" : "=a" (ret) :
+ "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory");
+ return ret;
+}
+int gettimeofday(struct timeval *, struct timezone *)
+ __attribute__((weak, alias("__vdso_gettimeofday")));
Index: linux/arch/x86_64/vdso/vma.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vma.c
@@ -0,0 +1,137 @@
+/*
+ * Set up the VMAs to tell the VM about the vDSO.
+ * Copyright 2007 Andi Kleen, SUSE Labs.
+ * Subject to the GPL, v.2
+ */
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/random.h>
+#include <asm/vsyscall.h>
+#include <asm/vgtod.h>
+#include <asm/proto.h>
+#include "voffset.h"
+
+int vdso_enabled = 1;
+
+#define VEXTERN(x) extern typeof(__ ## x) *vdso_ ## x;
+#include "vextern.h"
+#undef VEXTERN
+
+extern char vdso_kernel_start[], vdso_start[], vdso_end[];
+extern unsigned short vdso_sync_cpuid;
+
+struct page **vdso_pages;
+
+static inline void *var_ref(void *vbase, char *var, char *name)
+{
+ unsigned offset = var - &vdso_kernel_start[0] + VDSO_TEXT_OFFSET;
+ void *p = vbase + offset;
+ if (*(void **)p != (void *)VMAGIC) {
+ printk("VDSO: variable %s broken\n", name);
+ vdso_enabled = 0;
+ }
+ return p;
+}
+
+static int __init init_vdso_vars(void)
+{
+ int npages = (vdso_end - vdso_start + PAGE_SIZE - 1) / PAGE_SIZE;
+ int i;
+ char *vbase;
+
+ vdso_pages = kmalloc(sizeof(struct page *) * npages, GFP_KERNEL);
+ if (!vdso_pages)
+ goto oom;
+ for (i = 0; i < npages; i++) {
+ struct page *p;
+ p = alloc_page(GFP_KERNEL);
+ if (!p)
+ goto oom;
+ vdso_pages[i] = p;
+ copy_page(page_address(p), vdso_start + i*PAGE_SIZE);
+ }
+
+ vbase = vmap(vdso_pages, npages, 0, PAGE_KERNEL);
+ if (!vbase)
+ goto oom;
+
+ if (memcmp(vbase, "\177ELF", 4)) {
+ printk("VDSO: I'm broken; not ELF\n");
+ vdso_enabled = 0;
+ }
+
+#define V(x) *(typeof(x) *) var_ref(vbase, (char *)RELOC_HIDE(&x, 0), #x)
+#define VEXTERN(x) \
+ V(vdso_ ## x) = &__ ## x;
+#include "vextern.h"
+#undef VEXTERN
+ return 0;
+
+ oom:
+ printk("Cannot allocate vdso\n");
+ vdso_enabled = 0;
+ return -ENOMEM;
+}
+__initcall(init_vdso_vars);
+
+struct linux_binprm;
+
+/* Put the vdso above the (randomized) stack with another randomized offset.
+ This way there is no hole in the middle of address space.
+ To save memory make sure it is still in the same PTE as the stack top.
+ This doesn't give that many random bits */
+static unsigned long vdso_addr(unsigned long start, unsigned len)
+{
+ unsigned long addr, end;
+ unsigned offset;
+ end = (start + PMD_SIZE - 1) & PMD_MASK;
+ if (end >= TASK_SIZE64)
+ end = TASK_SIZE64;
+ end -= len;
+ /* This loses some more bits than a modulo, but is cheaper */
+ offset = get_random_int() & (PTRS_PER_PTE - 1);
+ addr = start + (offset << PAGE_SHIFT);
+ if (addr >= end)
+ addr = end;
+ return addr;
+}
+
+/* Setup a VMA at program startup for the vsyscall page.
+ Not called for compat tasks */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned long addr;
+ int ret;
+ unsigned len = round_up(vdso_end - vdso_start, PAGE_SIZE);
+
+ if (!vdso_enabled)
+ return 0;
+
+ down_write(&mm->mmap_sem);
+ addr = get_unmapped_area(NULL, vdso_addr(mm->start_stack, len), len, 0, 0);
+ if (IS_ERR_VALUE(addr)) {
+ ret = addr;
+ goto up_fail;
+ }
+
+ ret = install_special_mapping(mm, addr, len,
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+ VM_ALWAYSDUMP,
+ vdso_pages);
+ if (ret)
+ goto up_fail;
+
+ current->mm->context.vdso = (void *)addr;
+up_fail:
+ up_write(&mm->mmap_sem);
+ return ret;
+}
+
+static __init int vdso_setup(char *s)
+{
+ vdso_enabled = simple_strtoul(s, NULL, 0);
+ return 0;
+}
+__setup("vdso=", vdso_setup);
Index: linux/arch/x86_64/vdso/vdso.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso.S
@@ -0,0 +1,2 @@
+ .section ".vdso","a"
+ .incbin "arch/x86_64/vdso/vdso.so"
Index: linux/arch/x86_64/vdso/vdso-start.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vdso-start.S
@@ -0,0 +1,2 @@
+ .globl vdso_kernel_start
+vdso_kernel_start:
Index: linux/arch/x86_64/Makefile
===================================================================
--- linux.orig/arch/x86_64/Makefile
+++ linux/arch/x86_64/Makefile
@@ -77,7 +77,8 @@ head-y := arch/x86_64/kernel/head.o arch
libs-y += arch/x86_64/lib/
core-y += arch/x86_64/kernel/ \
arch/x86_64/mm/ \
- arch/x86_64/crypto/
+ arch/x86_64/crypto/ \
+ arch/x86_64/vdso/
core-$(CONFIG_IA32_EMULATION) += arch/x86_64/ia32/
drivers-$(CONFIG_PCI) += arch/x86_64/pci/
drivers-$(CONFIG_OPROFILE) += arch/x86_64/oprofile/
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -415,6 +415,9 @@ void vmalloc_sync_all(void);
extern int kern_addr_valid(unsigned long addr);
+extern void set_kernel_map(void *vaddr, unsigned long len,
+ unsigned long phys, pgprot_t prot);
+
#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
remap_pfn_range(vma, vaddr, pfn, size, prot)
@@ -442,6 +445,9 @@ extern int kern_addr_valid(unsigned long
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
#define __HAVE_ARCH_PTE_SAME
#include <asm-generic/pgtable.h>
-#endif /* !__ASSEMBLY__ */
+extern void
+fix_set_pte_phys(unsigned long vaddr, unsigned long phys, pgprot_t prot);
+
+#endif /* !__ASSEMBLY__ */
#endif /* _X86_64_PGTABLE_H */
Index: linux/include/asm-x86_64/mmu.h
===================================================================
--- linux.orig/include/asm-x86_64/mmu.h
+++ linux/include/asm-x86_64/mmu.h
@@ -15,6 +15,7 @@ typedef struct {
rwlock_t ldtlock;
int size;
struct semaphore sem;
+ void *vdso;
} mm_context_t;
#endif
Index: linux/include/asm-x86_64/vsyscall.h
===================================================================
--- linux.orig/include/asm-x86_64/vsyscall.h
+++ linux/include/asm-x86_64/vsyscall.h
@@ -22,6 +22,8 @@ enum vsyscall_num {
/* Definitions for CONFIG_GENERIC_TIME definitions */
#define __section_vsyscall_gtod_data __attribute__ \
((unused, __section__ (".vsyscall_gtod_data"),aligned(16)))
+#define __section_vsyscall_clock __attribute__ \
+ ((unused, __section__ (".vsyscall_clock"),aligned(16)))
#define __vsyscall_fn __attribute__ ((unused,__section__(".vsyscall_fn")))
#define VGETCPU_RDTSCP 1
@@ -36,7 +38,6 @@ extern volatile unsigned long __jiffies;
/* kernel space (writeable) */
extern int vgetcpu_mode;
extern struct timezone sys_tz;
-extern struct vsyscall_gtod_data_t vsyscall_gtod_data;
#endif /* __KERNEL__ */
Index: linux/include/asm-x86_64/auxvec.h
===================================================================
--- linux.orig/include/asm-x86_64/auxvec.h
+++ linux/include/asm-x86_64/auxvec.h
@@ -1,4 +1,6 @@
#ifndef __ASM_X86_64_AUXVEC_H
#define __ASM_X86_64_AUXVEC_H
+#define AT_SYSINFO_EHDR 33
+
#endif
Index: linux/include/asm-x86_64/elf.h
===================================================================
--- linux.orig/include/asm-x86_64/elf.h
+++ linux/include/asm-x86_64/elf.h
@@ -162,6 +162,19 @@ extern int dump_task_fpu (struct task_st
/* 1GB for 64bit, 8MB for 32bit */
#define STACK_RND_MASK (test_thread_flag(TIF_IA32) ? 0x7ff : 0x3fffff)
+
+#define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
+struct linux_binprm;
+extern int arch_setup_additional_pages(struct linux_binprm *bprm,
+ int executable_stack);
+
+extern int vdso_enabled;
+
+#define ARCH_DLINFO \
+do if (vdso_enabled) { \
+ NEW_AUX_ENT(AT_SYSINFO_EHDR,(unsigned long)current->mm->context.vdso);\
+} while (0)
+
#endif
#endif
Index: linux/arch/x86_64/vdso/vextern.h
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vextern.h
@@ -0,0 +1,16 @@
+#ifndef VEXTERN
+#include <asm/vsyscall.h>
+#define VEXTERN(x) \
+ extern typeof(x) *vdso_ ## x __attribute__((visibility("hidden")));
+#endif
+
+#define VMAGIC 0xfeedbabeabcdefabUL
+
+/* Any kernel variables used in the vDSO must be exported in the main
+ kernel's vmlinux.lds.S/vsyscall.h/proper __section and
+ put into vextern.h and be referenced as a pointer with vdso prefix.
+ The main kernel later fills in the values. */
+
+VEXTERN(jiffies)
+VEXTERN(vgetcpu_mode)
+VEXTERN(vsyscall_gtod_data)
Index: linux/arch/x86_64/vdso/vgetcpu.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vgetcpu.c
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2006 Andi Kleen, SUSE Labs.
+ * Subject to the GNU Public License, v.2
+ *
+ * Fast user context implementation of getcpu()
+ */
+
+#include <linux/kernel.h>
+#include <linux/getcpu.h>
+#include <linux/jiffies.h>
+#include <linux/time.h>
+#include <asm/vsyscall.h>
+#include <asm/vgtod.h>
+#include "vextern.h"
+
+long __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+{
+ unsigned int dummy, p;
+ unsigned long j = 0;
+
+ /* Fast cache - only recompute value once per jiffies and avoid
+ relatively costly rdtscp/cpuid otherwise.
+ This works because the scheduler usually keeps the process
+ on the same CPU and this syscall doesn't guarantee its
+ results anyways.
+ We do this here because otherwise user space would do it on
+ its own in a likely inferior way (no access to jiffies).
+ If you don't like it pass NULL. */
+ if (tcache && tcache->blob[0] == (j = *vdso_jiffies)) {
+ p = tcache->blob[1];
+ } else if (*vdso_vgetcpu_mode == VGETCPU_RDTSCP) {
+ /* Load per CPU data from RDTSCP */
+ rdtscp(dummy, dummy, p);
+ } else {
+ /* Load per CPU data from GDT */
+ asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
+ }
+ if (tcache) {
+ tcache->blob[0] = j;
+ tcache->blob[1] = p;
+ }
+ if (cpu)
+ *cpu = p & 0xfff;
+ if (node)
+ *node = p >> 12;
+ return 0;
+}
+
+long getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+ __attribute__((weak, alias("__vdso_getcpu")));
Index: linux/arch/x86_64/vdso/vvar.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/vvar.c
@@ -0,0 +1,12 @@
+/* Define pointer to external vDSO variables.
+ These are part of the vDSO. The kernel fills in the real addresses
+ at boot time. This is done because when the vdso is linked the
+ kernel isn't yet and we don't know the final addresses. */
+#include <linux/kernel.h>
+#include <linux/time.h>
+#include <asm/vsyscall.h>
+#include <asm/timex.h>
+#include <asm/vgtod.h>
+
+#define VEXTERN(x) typeof (__ ## x) *vdso_ ## x = (void *)VMAGIC;
+#include "vextern.h"
Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -1813,7 +1813,7 @@ and is between 256 and 4096 characters.
usbhid.mousepoll=
[USBHID] The interval which mice are to be polled at.
- vdso= [IA-32,SH]
+ vdso= [IA-32,SH,x86-64]
vdso=2: enable compat VDSO (default with COMPAT_VDSO)
vdso=1: enable VDSO (default)
vdso=0: disable VDSO mapping
Index: linux/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -42,6 +42,7 @@
#include <asm/segment.h>
#include <asm/desc.h>
#include <asm/topology.h>
+#include <asm/vgtod.h>
#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
#define __syscall_clobber "r11","rcx","memory"
@@ -57,26 +58,9 @@
* - writen by timer interrupt or systcl (/proc/sys/kernel/vsyscall64)
* Try to keep this structure as small as possible to avoid cache line ping pongs
*/
-struct vsyscall_gtod_data_t {
- seqlock_t lock;
-
- /* open coded 'struct timespec' */
- time_t wall_time_sec;
- u32 wall_time_nsec;
-
- int sysctl_enabled;
- struct timezone sys_tz;
- struct { /* extract of a clocksource struct */
- cycle_t (*vread)(void);
- cycle_t cycle_last;
- cycle_t mask;
- u32 mult;
- u32 shift;
- } clock;
-};
int __vgetcpu_mode __section_vgetcpu_mode;
-struct vsyscall_gtod_data_t __vsyscall_gtod_data __section_vsyscall_gtod_data =
+struct vsyscall_gtod_data __vsyscall_gtod_data __section_vsyscall_gtod_data =
{
.lock = SEQLOCK_UNLOCKED,
.sysctl_enabled = 1,
@@ -96,6 +80,8 @@ void update_vsyscall(struct timespec *wa
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
vsyscall_gtod_data.sys_tz = sys_tz;
+ vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
+ vsyscall_gtod_data.wall_to_monotonic = wall_to_monotonic;
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}
Index: linux/arch/x86_64/kernel/time.c
===================================================================
--- linux.orig/arch/x86_64/kernel/time.c
+++ linux/arch/x86_64/kernel/time.c
@@ -44,6 +44,7 @@
#include <asm/hpet.h>
#include <asm/mpspec.h>
#include <asm/nmi.h>
+#include <asm/vgtod.h>
static char *timename = NULL;
Index: linux/include/asm-x86_64/vgtod.h
===================================================================
--- /dev/null
+++ linux/include/asm-x86_64/vgtod.h
@@ -0,0 +1,29 @@
+#ifndef _ASM_VGTOD_H
+#define _ASM_VGTOD_H 1
+
+#include <asm/vsyscall.h>
+#include <linux/clocksource.h>
+
+struct vsyscall_gtod_data {
+ seqlock_t lock;
+
+ /* open coded 'struct timespec' */
+ time_t wall_time_sec;
+ u32 wall_time_nsec;
+
+ int sysctl_enabled;
+ struct timezone sys_tz;
+ struct { /* extract of a clocksource struct */
+ cycle_t (*vread)(void);
+ cycle_t cycle_last;
+ cycle_t mask;
+ u32 mult;
+ u32 shift;
+ } clock;
+ struct timespec wall_to_monotonic;
+};
+extern struct vsyscall_gtod_data __vsyscall_gtod_data
+__section_vsyscall_gtod_data;
+extern struct vsyscall_gtod_data vsyscall_gtod_data;
+
+#endif
Index: linux/arch/x86_64/vdso/voffset.h
===================================================================
--- /dev/null
+++ linux/arch/x86_64/vdso/voffset.h
@@ -0,0 +1 @@
+#define VDSO_TEXT_OFFSET 0x500
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
2007-05-01 3:58 ` [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu Andi Kleen
@ 2007-05-01 5:57 ` Jeremy Fitzhardinge
2007-05-01 7:23 ` Andi Kleen
0 siblings, 1 reply; 52+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-01 5:57 UTC (permalink / raw)
To: Andi Kleen; +Cc: patches, linux-kernel
Andi Kleen wrote:
> This implements new vDSO for x86-64. The concept is similar
> to the existing vDSOs on i386 and PPC. x86-64 has had static
> vsyscalls before, but these are not flexible enough anymore.
>
> A vDSO is a ELF shared library supplied by the kernel that is mapped into
> user address space. The vDSO mapping is randomized for each process
> for security reasons.
>
> Doing this was needed for clock_gettime, because clock_gettime
> always needs a syscall fallback and having one at a fixed
> address would have made buffer overflow exploits too easy to write.
>
> The vdso can be disabled with vdso=0
>
> It currently includes a new gettimeofday implemention and optimized
> clock_gettime(). The gettimeofday implementation is slightly faster
> than the one in the old vsyscall. clock_gettime is significantly faster
> than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
>
> The new calls are generally faster than the old vsyscall.
>
> TBD: add new benchmarks
>
> Advantages over the old x86-64 vsyscalls:
> - Extensible
> - Randomized
> - Cleaner
> - Easier to virtualize (the old static address range previously causes
> overhead e.g. for Xen because it has to create special page tables for it)
>
> Weak points:
> - glibc support still to be written
>
> The VM interface is partly based on Ingo Molnar's i386 version.
>
> Signed-off-by: Andi Kleen <ak@suse.de>
>
> ---
> Documentation/kernel-parameters.txt | 2
> arch/x86_64/Makefile | 3
> arch/x86_64/ia32/ia32_binfmt.c | 1
> arch/x86_64/kernel/time.c | 1
> arch/x86_64/kernel/vmlinux.lds.S | 12 +++
> arch/x86_64/kernel/vsyscall.c | 22 +----
> arch/x86_64/mm/init.c | 17 ++++
> arch/x86_64/vdso/Makefile | 49 ++++++++++++
> arch/x86_64/vdso/vclock_gettime.c | 120 +++++++++++++++++++++++++++++++
> arch/x86_64/vdso/vdso-note.S | 25 ++++++
> arch/x86_64/vdso/vdso-start.S | 2
> arch/x86_64/vdso/vdso.S | 2
> arch/x86_64/vdso/vdso.lds.S | 77 ++++++++++++++++++++
> arch/x86_64/vdso/vextern.h | 16 ++++
> arch/x86_64/vdso/vgetcpu.c | 50 +++++++++++++
> arch/x86_64/vdso/vma.c | 137 ++++++++++++++++++++++++++++++++++++
> arch/x86_64/vdso/voffset.h | 1
> arch/x86_64/vdso/vvar.c | 12 +++
> include/asm-x86_64/auxvec.h | 2
> include/asm-x86_64/elf.h | 13 +++
> include/asm-x86_64/mmu.h | 1
> include/asm-x86_64/pgtable.h | 8 +-
> include/asm-x86_64/vgtod.h | 29 +++++++
> include/asm-x86_64/vsyscall.h | 3
> 24 files changed, 583 insertions(+), 22 deletions(-)
>
> Index: linux/arch/x86_64/ia32/ia32_binfmt.c
> ===================================================================
> --- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
> +++ linux/arch/x86_64/ia32/ia32_binfmt.c
> @@ -38,6 +38,7 @@
>
> int sysctl_vsyscall32 = 1;
>
> +#undef ARCH_DLINFO
> #define ARCH_DLINFO do { \
> if (sysctl_vsyscall32) { \
> NEW_AUX_ENT(AT_SYSINFO, (u32)(u64)VSYSCALL32_VSYSCALL); \
> Index: linux/arch/x86_64/kernel/vmlinux.lds.S
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
> +++ linux/arch/x86_64/kernel/vmlinux.lds.S
> @@ -94,6 +94,9 @@ SECTIONS
> .vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data))
> { *(.vsyscall_gtod_data) }
> vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data);
> + .vsyscall_clock : AT(VLOAD(.vsyscall_clock))
> + { *(.vsyscall_clock) }
> + vsyscall_clock = VVIRT(.vsyscall_clock);
>
>
> .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1))
> @@ -153,6 +156,8 @@ SECTIONS
>
> . = ALIGN(4096); /* Init code and data */
> __init_begin = .;
> +
> +
> .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
> _sinittext = .;
> *(.init.text)
> @@ -190,6 +195,12 @@ SECTIONS
> .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
> .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
>
> +/* vdso blob that is mapped into user space */
> + vdso_start = . ;
> + .vdso : AT(ADDR(.vdso) - LOAD_OFFSET) { *(.vdso) }
> + . = ALIGN(4096);
> + vdso_end = .;
> +
> #ifdef CONFIG_BLK_DEV_INITRD
> . = ALIGN(4096);
> __initramfs_start = .;
> @@ -202,6 +213,7 @@ SECTIONS
> .data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
> __per_cpu_end = .;
> . = ALIGN(4096);
> +
> __init_end = .;
>
> . = ALIGN(4096);
> Index: linux/arch/x86_64/mm/init.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/init.c
> +++ linux/arch/x86_64/mm/init.c
> @@ -159,6 +159,14 @@ static __init void set_pte_phys(unsigned
> __flush_tlb_one(vaddr);
> }
>
> +void __init
> +set_kernel_map(void *vaddr,unsigned long len,unsigned long phys,pgprot_t prot)
> +{
> + void *end = vaddr + ALIGN(len, PAGE_SIZE);
> + for (; vaddr < end; vaddr += PAGE_SIZE, phys += PAGE_SIZE)
> + set_pte_phys((unsigned long)vaddr, phys, prot);
> +}
> +
> /* NOTE: this is meant to be run only at boot */
> void __init
> __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
> @@ -756,3 +764,12 @@ int in_gate_area_no_task(unsigned long a
> {
> return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
> }
> +
> +const char *arch_vma_name(struct vm_area_struct *vma)
> +{
> + if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
> + return "[vdso]";
> + if (vma == &gate_vma)
> + return "[vsyscall]";
> + return NULL;
> +}
> Index: linux/arch/x86_64/vdso/vdso-note.S
> ===================================================================
> --- /dev/null
> +++ linux/arch/x86_64/vdso/vdso-note.S
> @@ -0,0 +1,25 @@
> +/*
> + * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
> + * Here we can supply some information useful to userland.
> + */
> +
> +#include <linux/uts.h>
> +#include <linux/version.h>
> +
> +#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type) \
>
Use linux/elfnote.h?
J
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
2007-05-01 5:57 ` Jeremy Fitzhardinge
@ 2007-05-01 7:23 ` Andi Kleen
2007-05-01 8:00 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 7:23 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: patches, linux-kernel
> Use linux/elfnote.h?
Good point. Will fix. I just copied it from i386, but it probably
should be there fixed too.
Actually they are not completely equivalent because the old macro
generates the section name differently from the name in the payload
(kernelversion, Linux). But I split that up into two notes now.
If that is done on i386 it might break gdb reading core files though?
BTW if anybody noticed the patch also contained a few dead hunks (two unused functions/
prototypes and some white space change). These are already gone now too.
-Andi
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
2007-05-01 7:23 ` Andi Kleen
@ 2007-05-01 8:00 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 52+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-01 8:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: patches, linux-kernel
Andi Kleen wrote:
>> Use linux/elfnote.h?
>>
>
> Good point. Will fix. I just copied it from i386, but it probably
> should be there fixed too.
>
Already has been. I posted a patch a day or so ago (reposted below).
> Actually they are not completely equivalent because the old macro
> generates the section name differently from the name in the payload
> (kernelversion, Linux). But I split that up into two notes now.
>
Roland agreed that it doesn't matter.
> If that is done on i386 it might break gdb reading core files though?
>
No. The notename in the final output is the same either way. The
linker packs .note.* into a single .note section.
Subject: use elfnote.h to generate vsyscall notes.
Use existing elfnote.h to generate vsyscall notes, rather than doing
it locally. Changes elfnote.h a bit to suite, since this is the first
asm user, and it wasn't quite right.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
---
arch/i386/kernel/vsyscall-note.S | 23 ++++++-----------------
include/linux/elfnote.h | 18 +++++++++++++-----
2 files changed, 19 insertions(+), 22 deletions(-)
===================================================================
--- a/arch/i386/kernel/vsyscall-note.S
+++ b/arch/i386/kernel/vsyscall-note.S
@@ -3,23 +3,12 @@
* Here we can supply some information useful to userland.
*/
-#include <linux/uts.h>
#include <linux/version.h>
+#include <linux/elfnote.h>
-#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type) \
- .section name, flags; \
- .balign 4; \
- .long 1f - 0f; /* name length */ \
- .long 3f - 2f; /* data length */ \
- .long type; /* note type */ \
-0: .asciz vendor; /* vendor name */ \
-1: .balign 4; \
-2:
-
-#define ASM_ELF_NOTE_END \
-3: .balign 4; /* pad out section */ \
- .previous
-
- ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0)
+/* Ideally this would use UTS_NAME, but using a quoted string here
+ doesn't work. Remember to change this when changing the
+ kernel's name. */
+ELFNOTE_START(Linux, 0, "a")
.long LINUX_VERSION_CODE
- ASM_ELF_NOTE_END
+ELFNOTE_END
===================================================================
--- a/include/linux/elfnote.h
+++ b/include/linux/elfnote.h
@@ -38,17 +38,25 @@
* e.g. ELFNOTE(XYZCo, 42, .asciz, "forty-two")
* ELFNOTE(XYZCo, 12, .long, 0xdeadbeef)
*/
-#define ELFNOTE(name, type, desctype, descdata) \
-.pushsection .note.name, "",@note ; \
+#define ELFNOTE_START(name, type, flags) \
+.pushsection .note.name, flags,@note ; \
.align 4 ; \
.long 2f - 1f /* namesz */ ; \
- .long 4f - 3f /* descsz */ ; \
+ .long 4484f - 3f /* descsz */ ; \
.long type ; \
1:.asciz #name ; \
2:.align 4 ; \
-3:desctype descdata ; \
-4:.align 4 ; \
+3:
+
+#define ELFNOTE_END \
+4484:.align 4 ; \
.popsection ;
+
+#define ELFNOTE(name, type, desc) \
+ ELFNOTE_START(name, type, "") \
+ desc ; \
+ ELFNOTE_END
+
#else /* !__ASSEMBLER__ */
#include <linux/elf.h>
/*
J
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] [9/30] x86_64: Use symbolic CPU features in early CPUID check
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (7 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [10/30] x86_64: Drop -traditional for arch/x86_64/boot Andi Kleen
` (20 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Dead to magic numbers!
Generated code is the same.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/kernel/verify_cpu.S | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
Index: linux/arch/x86_64/kernel/verify_cpu.S
===================================================================
--- linux.orig/arch/x86_64/kernel/verify_cpu.S
+++ linux/arch/x86_64/kernel/verify_cpu.S
@@ -30,18 +30,27 @@
* appropriately. Either display a message or halt.
*/
-verify_cpu:
+#include <asm/cpufeature.h>
+verify_cpu:
pushfl # Save caller passed flags
pushl $0 # Kill any dangerous flags
popfl
- /* minimum CPUID flags for x86-64 */
- /* see http://www.x86-64.org/lists/discuss/msg02971.html */
-#define SSE_MASK ((1<<25)|(1<<26))
-#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
- (1<<13)|(1<<15)|(1<<24))
-#define REQUIRED_MASK2 (1<<29)
+ /* minimum CPUID flags for x86-64 as defined by AMD */
+#define M(x) (1<<(x))
+#define M2(a,b) M(a)|M(b)
+#define M4(a,b,c,d) M(a)|M(b)|M(c)|M(d)
+
+#define SSE_MASK \
+ (M2(X86_FEATURE_XMM,X86_FEATURE_XMM2))
+#define REQUIRED_MASK1 \
+ (M4(X86_FEATURE_FPU,X86_FEATURE_PSE,X86_FEATURE_TSC,X86_FEATURE_MSR)|\
+ M4(X86_FEATURE_PAE,X86_FEATURE_CX8,X86_FEATURE_PGE,X86_FEATURE_CMOV)|\
+ M(X86_FEATURE_FXSR))
+#define REQUIRED_MASK2 \
+ (M(X86_FEATURE_LM - 32))
+
pushfl # standard way to check for cpuid
popl %eax
movl %eax,%ebx
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [10/30] x86_64: Drop -traditional for arch/x86_64/boot
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (8 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [9/30] x86_64: Use symbolic CPU features in early CPUID check Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [11/30] i386: Drop -traditional in arch/i386/boot Andi Kleen
` (19 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Follows i386 and useful cleanup.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/boot/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/arch/x86_64/boot/Makefile
===================================================================
--- linux.orig/arch/x86_64/boot/Makefile
+++ linux/arch/x86_64/boot/Makefile
@@ -36,7 +36,7 @@ subdir- := compressed/ #Let make clean
# ---------------------------------------------------------------------------
$(obj)/bzImage: IMAGE_OFFSET := 0x100000
-$(obj)/bzImage: EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
+$(obj)/bzImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
$(obj)/bzImage: BUILDFLAGS := -b
quiet_cmd_image = BUILD $@
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [11/30] i386: Drop -traditional in arch/i386/boot
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (9 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [10/30] x86_64: Drop -traditional for arch/x86_64/boot Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [12/30] i386: Verify important CPUID bits in real mode Andi Kleen
` (18 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Needed for followon patch
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/i386/boot/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux/arch/i386/boot/Makefile
===================================================================
--- linux.orig/arch/i386/boot/Makefile
+++ linux/arch/i386/boot/Makefile
@@ -36,9 +36,9 @@ HOSTCFLAGS_build.o := $(LINUXINCLUDE)
# ---------------------------------------------------------------------------
$(obj)/zImage: IMAGE_OFFSET := 0x1000
-$(obj)/zImage: EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK)
+$(obj)/zImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK)
$(obj)/bzImage: IMAGE_OFFSET := 0x100000
-$(obj)/bzImage: EXTRA_AFLAGS := -traditional $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
+$(obj)/bzImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
$(obj)/bzImage: BUILDFLAGS := -b
quiet_cmd_image = BUILD $@
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [12/30] i386: Verify important CPUID bits in real mode
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (10 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [11/30] i386: Drop -traditional in arch/i386/boot Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [13/30] i386: Evaluate constant cpu features at runtime Andi Kleen
` (17 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Check some CPUID bits that are needed for compiler generated early in boot.
When the system is still in real mode before changing the VESA BIOS mode
it is possible to still display an visible error message on the screen.
Similar to x86-64.
Includes cleanups from Eric Biederman
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/i386/Kconfig.cpu | 22 ++++++++++-
arch/i386/boot/setup.S | 17 ++++++++
arch/i386/kernel/verify_cpu.S | 68 +++++++++++++++++++++++++++++++++++
include/asm-i386/cpufeature.h | 3 +
include/asm-i386/required-features.h | 34 +++++++++++++++++
5 files changed, 142 insertions(+), 2 deletions(-)
Index: linux/arch/i386/kernel/verify_cpu.S
===================================================================
--- /dev/null
+++ linux/arch/i386/kernel/verify_cpu.S
@@ -0,0 +1,65 @@
+/* Check if CPU has some minimum CPUID bits
+ This runs in 16bit mode so that the caller can still use the BIOS
+ to output errors on the screen */
+#include <asm/cpufeature.h>
+
+verify_cpu:
+ pushfl # Save caller passed flags
+ pushl $0 # Kill any dangerous flags
+ popfl
+
+#if CONFIG_X86_MINIMUM_CPU_MODEL >= 4
+ pushfl
+ orl $(1<<18),(%esp) # try setting AC
+ popfl
+ pushfl
+ popl %eax
+ testl $(1<<18),%eax
+ jz bad
+#endif
+#if REQUIRED_MASK1 != 0
+ pushfl # standard way to check for cpuid
+ popl %eax
+ movl %eax,%ebx
+ xorl $0x200000,%eax
+ pushl %eax
+ popfl
+ pushfl
+ popl %eax
+ cmpl %eax,%ebx
+ pushfl # standard way to check for cpuid
+ popl %eax
+ movl %eax,%ebx
+ xorl $0x200000,%eax
+ pushl %eax
+ popfl
+ pushfl
+ popl %eax
+ cmpl %eax,%ebx
+ jz bad # REQUIRED_MASK1 != 0 requires CPUID
+
+ movl $0x0,%eax # See if cpuid 1 is implemented
+ cpuid
+ cmpl $0x1,%eax
+ jb bad # no cpuid 1
+
+ movl $0x1,%eax # Does the cpu have what it takes
+ cpuid
+
+#if CONFIG_X86_MINIMUM_CPU_MODEL > 4
+#error add proper model checking here
+#endif
+
+ andl $REQUIRED_MASK1,%edx
+ xorl $REQUIRED_MASK1,%edx
+ jnz bad
+#endif /* REQUIRED_MASK1 */
+
+ popfl
+ xor %eax,%eax
+ ret
+
+bad:
+ popfl
+ movl $1,%eax
+ ret
Index: linux/arch/i386/Kconfig.cpu
===================================================================
--- linux.orig/arch/i386/Kconfig.cpu
+++ linux/arch/i386/Kconfig.cpu
@@ -240,14 +240,19 @@ config X86_L1_CACHE_SHIFT
default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7
+config X86_XADD
+ bool
+ depends on !M386
+ default y
+
config RWSEM_GENERIC_SPINLOCK
bool
- depends on M386
+ depends on !X86_XADD
default y
config RWSEM_XCHGADD_ALGORITHM
bool
- depends on !M386
+ depends on X86_XADD
default y
config ARCH_HAS_ILOG2_U32
@@ -331,3 +336,16 @@ config X86_TSC
bool
depends on (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2) && !X86_NUMAQ
default y
+
+# this should be set for all -march=.. options where the compiler
+# generates cmov.
+config X86_CMOV
+ bool
+ depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7)
+ default y
+
+config X86_MINIMUM_CPU_MODEL
+ int
+ default "4" if X86_XADD || X86_CMPXCHG || X86_BSWAP
+ default "0"
+
Index: linux/arch/i386/boot/setup.S
===================================================================
--- linux.orig/arch/i386/boot/setup.S
+++ linux/arch/i386/boot/setup.S
@@ -302,7 +302,24 @@ good_sig:
loader_panic_mess: .string "Wrong loader, giving up..."
+# check minimum cpuid
+# we do this here because it is the last place we can actually
+# show a user visible error message. Later the video modus
+# might be already messed up.
loader_ok:
+ call verify_cpu
+ testl %eax,%eax
+ jz cpu_ok
+ lea cpu_panic_mess,%si
+ call prtstr
+1: jmp 1b
+
+cpu_panic_mess:
+ .asciz "PANIC: CPU too old for this kernel."
+
+#include "../kernel/verify_cpu.S"
+
+cpu_ok:
# Get memory size (extended mem, kB)
xorl %eax, %eax
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -7,7 +7,10 @@
#ifndef __ASM_I386_CPUFEATURE_H
#define __ASM_I386_CPUFEATURE_H
+#ifndef __ASSEMBLY__
#include <linux/bitops.h>
+#endif
+#include <asm/required-features.h>
#define NCAPINTS 7 /* N 32-bit words worth of info */
Index: linux/include/asm-i386/required-features.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/required-features.h
@@ -0,0 +1,34 @@
+#ifndef _ASM_REQUIRED_FEATURES_H
+#define _ASM_REQUIRED_FEATURES_H 1
+
+/* Define minimum CPUID feature set for kernel These bits are checked
+ really early to actually display a visible error message before the
+ kernel dies. Only add word 0 bits here
+
+ Some requirements that are not in CPUID yet are also in the
+ CONFIG_X86_MINIMUM_CPU mode which is checked too.
+
+ The real information is in arch/i386/Kconfig.cpu, this just converts
+ the CONFIGs into a bitmask */
+
+#ifdef CONFIG_X86_PAE
+#define NEED_PAE (1<<X86_FEATURE_PAE)
+#else
+#define NEED_PAE 0
+#endif
+
+#ifdef CONFIG_X86_CMOV
+#define NEED_CMOV (1<<X86_FEATURE_CMOV)
+#else
+#define NEED_CMOV 0
+#endif
+
+#ifdef CONFIG_X86_CMPXCHG64
+#define NEED_CMPXCHG64 (1<<X86_FEATURE_CX8)
+#else
+#define NEED_CMPXCHG64 0
+#endif
+
+#define REQUIRED_MASK1 (NEED_PAE|NEED_CMOV|NEED_CMPXCHG64)
+
+#endif
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [13/30] i386: Evaluate constant cpu features at runtime
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (11 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [12/30] i386: Verify important CPUID bits in real mode Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [14/30] i386: Implement alternative_io for i386 Andi Kleen
` (16 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Redefine cpu_has() to evaluate cpu features already checked in early
boot at compile time. This way the compiler might eliminate some dead code.
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-i386/cpufeature.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -106,8 +106,12 @@
#define X86_FEATURE_LAHF_LM (6*32+ 0) /* LAHF/SAHF in long mode */
#define X86_FEATURE_CMP_LEGACY (6*32+ 1) /* If yes HyperThreading not valid */
-#define cpu_has(c, bit) test_bit(bit, (c)->x86_capability)
-#define boot_cpu_has(bit) test_bit(bit, boot_cpu_data.x86_capability)
+#define cpu_has(c, bit) \
+ ((__builtin_constant_p(bit) && (bit) < 32 && \
+ (1UL << (bit)) & REQUIRED_MASK1) ? \
+ 1 : \
+ test_bit(bit, (c)->x86_capability))
+#define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)
#define cpu_has_fpu boot_cpu_has(X86_FEATURE_FPU)
#define cpu_has_vme boot_cpu_has(X86_FEATURE_VME)
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [14/30] i386: Implement alternative_io for i386
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (12 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [13/30] i386: Evaluate constant cpu features at runtime Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [15/30] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 Andi Kleen
` (15 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Ported from x86-64.
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-i386/alternative.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
Index: linux/include/asm-i386/alternative.h
===================================================================
--- linux.orig/include/asm-i386/alternative.h
+++ linux/include/asm-i386/alternative.h
@@ -82,6 +82,21 @@ static inline void alternatives_smp_swit
"663:\n\t" newinstr "\n664:\n" /* replacement */\
".previous" :: "i" (feature), ##input)
+/* Like alternative_input, but with a single output argument */
+#define alternative_io(oldinstr, newinstr, feature, output, input...) \
+ asm volatile ("661:\n\t" oldinstr "\n662:\n" \
+ ".section .altinstructions,\"a\"\n" \
+ " .align 4\n" \
+ " .long 661b\n" /* label */ \
+ " .long 663f\n" /* new instruction */ \
+ " .byte %c[feat]\n" /* feature bit */ \
+ " .byte 662b-661b\n" /* sourcelen */ \
+ " .byte 664f-663f\n" /* replacementlen */ \
+ ".previous\n" \
+ ".section .altinstr_replacement,\"ax\"\n" \
+ "663:\n\t" newinstr "\n664:\n" /* replacement */ \
+ ".previous" : output : [feat] "i" (feature), ##input)
+
/*
* Alternative inline assembly for SMP.
*
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [15/30] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (13 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [14/30] i386: Implement alternative_io for i386 Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [16/30] i386: Add X86_FEATURE_RDTSCP Andi Kleen
` (14 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Syncs up with x86-64.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/i386/kernel/cpu/intel.c | 4 +++-
include/asm-i386/cpufeature.h | 1 +
include/asm-i386/tsc.h | 4 ----
3 files changed, 4 insertions(+), 5 deletions(-)
Index: linux/arch/i386/kernel/cpu/intel.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/intel.c
+++ linux/arch/i386/kernel/cpu/intel.c
@@ -188,8 +188,10 @@ static void __cpuinit init_intel(struct
}
#endif
- if (c->x86 == 15)
+ if (c->x86 == 15) {
set_bit(X86_FEATURE_P4, c->x86_capability);
+ set_bit(X86_FEATURE_SYNC_RDTSC, c->x86_capability);
+ }
if (c->x86 == 6)
set_bit(X86_FEATURE_P3, c->x86_capability);
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -79,6 +79,7 @@
#define X86_FEATURE_PEBS (3*32+12) /* Precise-Event Based Sampling */
#define X86_FEATURE_BTS (3*32+13) /* Branch Trace Store */
#define X86_FEATURE_LAPIC_TIMER_BROKEN (3*32+ 14) /* lapic timer broken in C1 */
+#define X86_FEATURE_SYNC_RDTSC (3*32+15) /* RDTSC synchronizes the CPU */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_XMM3 (4*32+ 0) /* Streaming SIMD Extensions-3 */
Index: linux/include/asm-i386/tsc.h
===================================================================
--- linux.orig/include/asm-i386/tsc.h
+++ linux/include/asm-i386/tsc.h
@@ -35,7 +35,6 @@ static inline cycles_t get_cycles(void)
static __always_inline cycles_t get_cycles_sync(void)
{
unsigned long long ret;
-#ifdef X86_FEATURE_SYNC_RDTSC
unsigned eax;
/*
@@ -44,9 +43,6 @@ static __always_inline cycles_t get_cycl
*/
alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
"=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
-#else
- sync_core();
-#endif
rdtscll(ret);
return ret;
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [16/30] i386: Add X86_FEATURE_RDTSCP
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (14 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [15/30] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [17/30] x86: Use RDTSCP for synchronous get_cycles if possible Andi Kleen
` (13 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Following x86-64
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-i386/cpufeature.h | 1 +
1 file changed, 1 insertion(+)
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -52,6 +52,7 @@
#define X86_FEATURE_MP (1*32+19) /* MP Capable. */
#define X86_FEATURE_NX (1*32+20) /* Execute Disable */
#define X86_FEATURE_MMXEXT (1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_RDTSCP (1*32+27) /* RDTSCP */
#define X86_FEATURE_LM (1*32+29) /* Long Mode (x86-64) */
#define X86_FEATURE_3DNOWEXT (1*32+30) /* AMD 3DNow! extensions */
#define X86_FEATURE_3DNOW (1*32+31) /* 3DNow! */
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [17/30] x86: Use RDTSCP for synchronous get_cycles if possible
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (15 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [16/30] i386: Add X86_FEATURE_RDTSCP Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [18/30] x86_64: Don't enable NUMA for a single node in K8 NUMA scanning Andi Kleen
` (12 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: joerg.roedel, patches, linux-kernel
RDTSCP is already synchronous and doesn't need an explicit CPUID.
This is a little faster and more importantly avoids VMEXITs on Hypervisors.
Original patch from Joerg Roedel, but reworked by AK
Also includes miscompilation fix by Eric Biederman
Cc: "Joerg Roedel" <joerg.roedel@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-i386/tsc.h | 9 +++++++++
1 file changed, 9 insertions(+)
Index: linux/include/asm-i386/tsc.h
===================================================================
--- linux.orig/include/asm-i386/tsc.h
+++ linux/include/asm-i386/tsc.h
@@ -38,6 +38,15 @@ static __always_inline cycles_t get_cycl
unsigned eax;
/*
+ * Use RDTSCP if possible; it is guaranteed to be synchronous
+ * and doesn't cause a VMEXIT on Hypervisors
+ */
+ alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
+ "=A" (ret), "0" (0ULL) : "ecx", "memory");
+ if (ret)
+ return ret;
+
+ /*
* Don't do an additional sync on CPUs where we know
* RDTSC is already synchronous:
*/
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [18/30] x86_64: Don't enable NUMA for a single node in K8 NUMA scanning
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (16 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [17/30] x86: Use RDTSCP for synchronous get_cycles if possible Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [19/30] i386: Little cleanups in smpboot.c Andi Kleen
` (11 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
This was supposed to see the full memory on a ASUS A8SX motherboard
with 4GB RAM where the northbridge reports less memory, but it didn't
help there. But it's a reasonable change so let's include it anyways.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/mm/k8topology.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux/arch/x86_64/mm/k8topology.c
===================================================================
--- linux.orig/arch/x86_64/mm/k8topology.c
+++ linux/arch/x86_64/mm/k8topology.c
@@ -62,6 +62,8 @@ int __init k8_scan_nodes(unsigned long s
reg = read_pci_config(0, nb, 0, 0x60);
numnodes = ((reg >> 4) & 0xF) + 1;
+ if (numnodes <= 1)
+ return -1;
printk(KERN_INFO "Number of nodes %d\n", numnodes);
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [19/30] i386: Little cleanups in smpboot.c
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (17 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [18/30] x86_64: Don't enable NUMA for a single node in K8 NUMA scanning Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [20/30] i386: Remove copy_*_user BUG_ONs for (size < 0) Andi Kleen
` (10 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
- Remove #if that is always set
- Fix warning
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/i386/kernel/smpboot.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Index: linux/arch/i386/kernel/smpboot.c
===================================================================
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -516,7 +516,6 @@ static void unmap_cpu_to_logical_apicid(
unmap_cpu_to_node(cpu);
}
-#if APIC_DEBUG
static inline void __inquire_remote_apic(int apicid)
{
int i, regs[] = { APIC_ID >> 4, APIC_LVR >> 4, APIC_SPIV >> 4 };
@@ -548,14 +547,13 @@ static inline void __inquire_remote_apic
switch (status) {
case APIC_ICR_RR_VALID:
status = apic_read(APIC_RRR);
- printk("%08x\n", status);
+ printk("%lx\n", status);
break;
default:
printk("failed\n");
}
}
}
-#endif
#ifdef WAKE_SECONDARY_VIA_NMI
/*
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [20/30] i386: Remove copy_*_user BUG_ONs for (size < 0)
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (18 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [19/30] i386: Little cleanups in smpboot.c Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [21/30] x86_64: Print type and size correctly for unknown compat ioctls Andi Kleen
` (9 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
access_ok checks this case anyways, no need to check twice.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/i386/lib/usercopy.c | 7 -------
1 file changed, 7 deletions(-)
Index: linux/arch/i386/lib/usercopy.c
===================================================================
--- linux.orig/arch/i386/lib/usercopy.c
+++ linux/arch/i386/lib/usercopy.c
@@ -716,7 +716,6 @@ do { \
unsigned long __copy_to_user_ll(void __user *to, const void *from,
unsigned long n)
{
- BUG_ON((long) n < 0);
#ifndef CONFIG_X86_WP_WORKS_OK
if (unlikely(boot_cpu_data.wp_works_ok == 0) &&
((unsigned long )to) < TASK_SIZE) {
@@ -785,7 +784,6 @@ EXPORT_SYMBOL(__copy_to_user_ll);
unsigned long __copy_from_user_ll(void *to, const void __user *from,
unsigned long n)
{
- BUG_ON((long)n < 0);
if (movsl_is_ok(to, from, n))
__copy_user_zeroing(to, from, n);
else
@@ -797,7 +795,6 @@ EXPORT_SYMBOL(__copy_from_user_ll);
unsigned long __copy_from_user_ll_nozero(void *to, const void __user *from,
unsigned long n)
{
- BUG_ON((long)n < 0);
if (movsl_is_ok(to, from, n))
__copy_user(to, from, n);
else
@@ -810,7 +807,6 @@ EXPORT_SYMBOL(__copy_from_user_ll_nozero
unsigned long __copy_from_user_ll_nocache(void *to, const void __user *from,
unsigned long n)
{
- BUG_ON((long)n < 0);
#ifdef CONFIG_X86_INTEL_USERCOPY
if ( n > 64 && cpu_has_xmm2)
n = __copy_user_zeroing_intel_nocache(to, from, n);
@@ -825,7 +821,6 @@ unsigned long __copy_from_user_ll_nocach
unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
unsigned long n)
{
- BUG_ON((long)n < 0);
#ifdef CONFIG_X86_INTEL_USERCOPY
if ( n > 64 && cpu_has_xmm2)
n = __copy_user_intel_nocache(to, from, n);
@@ -853,7 +848,6 @@ unsigned long __copy_from_user_ll_nocach
unsigned long
copy_to_user(void __user *to, const void *from, unsigned long n)
{
- BUG_ON((long) n < 0);
if (access_ok(VERIFY_WRITE, to, n))
n = __copy_to_user(to, from, n);
return n;
@@ -879,7 +873,6 @@ EXPORT_SYMBOL(copy_to_user);
unsigned long
copy_from_user(void *to, const void __user *from, unsigned long n)
{
- BUG_ON((long) n < 0);
if (access_ok(VERIFY_READ, from, n))
n = __copy_from_user(to, from, n);
else
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [21/30] x86_64: Print type and size correctly for unknown compat ioctls
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (19 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [20/30] i386: Remove copy_*_user BUG_ONs for (size < 0) Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [22/30] x86_64: Remove CONFIG_REORDER Andi Kleen
` (8 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Signed-off-by: Andi Kleen <ak@suse.de>
---
fs/compat.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Index: linux/fs/compat.c
===================================================================
--- linux.orig/fs/compat.c
+++ linux/fs/compat.c
@@ -371,13 +371,14 @@ static void compat_ioctl_error(struct fi
fn = "?";
}
- sprintf(buf,"'%c'", (cmd>>24) & 0x3f);
+ sprintf(buf,"'%c'", (cmd>>_IOC_TYPESHIFT) & _IOC_TYPEMASK);
if (!isprint(buf[1]))
sprintf(buf, "%02x", buf[1]);
compat_printk("ioctl32(%s:%d): Unknown cmd fd(%d) "
- "cmd(%08x){%s} arg(%08x) on %s\n",
+ "cmd(%08x){t:%s;sz:%u} arg(%08x) on %s\n",
current->comm, current->pid,
(int)fd, (unsigned int)cmd, buf,
+ (cmd >> _IOC_SIZESHIFT) & _IOC_SIZEMASK,
(unsigned int)arg, fn);
if (path)
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [22/30] x86_64: Remove CONFIG_REORDER
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (20 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [21/30] x86_64: Print type and size correctly for unknown compat ioctls Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [23/30] x86_64: Share identical video.S between i386 and x86-64 Andi Kleen
` (7 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: arjan, patches, linux-kernel
The option never worked well and functionlist wasn't well maintained.
Also it made the build very slow on many binutils version.
So just remove it.
Cc: arjan@linux.intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/Kconfig | 8
arch/x86_64/Makefile | 1
arch/x86_64/kernel/functionlist | 1284 ---------------------------------------
arch/x86_64/kernel/vmlinux.lds.S | 3
4 files changed, 1296 deletions(-)
Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -660,14 +660,6 @@ config CC_STACKPROTECTOR_ALL
source kernel/Kconfig.hz
-config REORDER
- bool "Function reordering"
- default n
- help
- This option enables the toolchain to reorder functions for a more
- optimal TLB usage. If you have pretty much any version of binutils,
- this can increase your kernel build time by roughly one minute.
-
config K8_NB
def_bool y
depends on AGP_AMD64 || IOMMU || (PCI && NUMA)
Index: linux/arch/x86_64/Makefile
===================================================================
--- linux.orig/arch/x86_64/Makefile
+++ linux/arch/x86_64/Makefile
@@ -40,7 +40,6 @@ cflags-y += -m64
cflags-y += -mno-red-zone
cflags-y += -mcmodel=kernel
cflags-y += -pipe
-cflags-kernel-$(CONFIG_REORDER) += -ffunction-sections
cflags-y += -Wno-sign-compare
cflags-y += -fno-asynchronous-unwind-tables
ifneq ($(CONFIG_DEBUG_INFO),y)
Index: linux/arch/x86_64/kernel/functionlist
===================================================================
--- linux.orig/arch/x86_64/kernel/functionlist
+++ /dev/null
@@ -1,1284 +0,0 @@
-*(.text.flush_thread)
-*(.text.check_poison_obj)
-*(.text.copy_page)
-*(.text.__set_personality)
-*(.text.gart_map_sg)
-*(.text.kmem_cache_free)
-*(.text.find_get_page)
-*(.text._raw_spin_lock)
-*(.text.ide_outb)
-*(.text.unmap_vmas)
-*(.text.copy_page_range)
-*(.text.kprobe_handler)
-*(.text.__handle_mm_fault)
-*(.text.__d_lookup)
-*(.text.copy_user_generic)
-*(.text.__link_path_walk)
-*(.text.get_page_from_freelist)
-*(.text.kmem_cache_alloc)
-*(.text.drive_cmd_intr)
-*(.text.ia32_setup_sigcontext)
-*(.text.huge_pte_offset)
-*(.text.do_page_fault)
-*(.text.page_remove_rmap)
-*(.text.release_pages)
-*(.text.ide_end_request)
-*(.text.__mutex_lock_slowpath)
-*(.text.__find_get_block)
-*(.text.kfree)
-*(.text.vfs_read)
-*(.text._raw_spin_unlock)
-*(.text.free_hot_cold_page)
-*(.text.fget_light)
-*(.text.schedule)
-*(.text.memcmp)
-*(.text.touch_atime)
-*(.text.__might_sleep)
-*(.text.__down_read_trylock)
-*(.text.arch_pick_mmap_layout)
-*(.text.find_vma)
-*(.text.__make_request)
-*(.text.do_generic_mapping_read)
-*(.text.mutex_lock_interruptible)
-*(.text.__generic_file_aio_read)
-*(.text._atomic_dec_and_lock)
-*(.text.__wake_up_bit)
-*(.text.add_to_page_cache)
-*(.text.cache_alloc_debugcheck_after)
-*(.text.vm_normal_page)
-*(.text.mutex_debug_check_no_locks_freed)
-*(.text.net_rx_action)
-*(.text.__find_first_zero_bit)
-*(.text.put_page)
-*(.text._raw_read_lock)
-*(.text.__delay)
-*(.text.dnotify_parent)
-*(.text.do_path_lookup)
-*(.text.do_sync_read)
-*(.text.do_lookup)
-*(.text.bit_waitqueue)
-*(.text.file_read_actor)
-*(.text.strncpy_from_user)
-*(.text.__pagevec_lru_add_active)
-*(.text.fget)
-*(.text.dput)
-*(.text.__strnlen_user)
-*(.text.inotify_inode_queue_event)
-*(.text.rw_verify_area)
-*(.text.ide_intr)
-*(.text.inotify_dentry_parent_queue_event)
-*(.text.permission)
-*(.text.memscan)
-*(.text.hpet_rtc_interrupt)
-*(.text.do_mmap_pgoff)
-*(.text.current_fs_time)
-*(.text.vfs_getattr)
-*(.text.kmem_flagcheck)
-*(.text.mark_page_accessed)
-*(.text.free_pages_and_swap_cache)
-*(.text.generic_fillattr)
-*(.text.__block_prepare_write)
-*(.text.__set_page_dirty_nobuffers)
-*(.text.link_path_walk)
-*(.text.find_get_pages_tag)
-*(.text.ide_do_request)
-*(.text.__alloc_pages)
-*(.text.generic_permission)
-*(.text.mod_page_state_offset)
-*(.text.free_pgd_range)
-*(.text.generic_file_buffered_write)
-*(.text.number)
-*(.text.ide_do_rw_disk)
-*(.text.__brelse)
-*(.text.__mod_page_state_offset)
-*(.text.rotate_reclaimable_page)
-*(.text.find_vma_prepare)
-*(.text.find_vma_prev)
-*(.text.lru_cache_add_active)
-*(.text.__kmalloc_track_caller)
-*(.text.smp_invalidate_interrupt)
-*(.text.handle_IRQ_event)
-*(.text.__find_get_block_slow)
-*(.text.do_wp_page)
-*(.text.do_select)
-*(.text.set_user_nice)
-*(.text.sys_read)
-*(.text.do_munmap)
-*(.text.csum_partial)
-*(.text.__do_softirq)
-*(.text.may_open)
-*(.text.getname)
-*(.text.get_empty_filp)
-*(.text.__fput)
-*(.text.remove_mapping)
-*(.text.filp_ctor)
-*(.text.poison_obj)
-*(.text.unmap_region)
-*(.text.test_set_page_writeback)
-*(.text.__do_page_cache_readahead)
-*(.text.sock_def_readable)
-*(.text.ide_outl)
-*(.text.shrink_zone)
-*(.text.rb_insert_color)
-*(.text.get_request)
-*(.text.sys_pread64)
-*(.text.spin_bug)
-*(.text.ide_outsl)
-*(.text.mask_and_ack_8259A)
-*(.text.filemap_nopage)
-*(.text.page_add_file_rmap)
-*(.text.find_lock_page)
-*(.text.tcp_poll)
-*(.text.__mark_inode_dirty)
-*(.text.file_ra_state_init)
-*(.text.generic_file_llseek)
-*(.text.__pagevec_lru_add)
-*(.text.page_cache_readahead)
-*(.text.n_tty_receive_buf)
-*(.text.zonelist_policy)
-*(.text.vma_adjust)
-*(.text.test_clear_page_dirty)
-*(.text.sync_buffer)
-*(.text.do_exit)
-*(.text.__bitmap_weight)
-*(.text.alloc_pages_current)
-*(.text.get_unused_fd)
-*(.text.zone_watermark_ok)
-*(.text.cpuset_update_task_memory_state)
-*(.text.__bitmap_empty)
-*(.text.sys_munmap)
-*(.text.__inode_dir_notify)
-*(.text.__generic_file_aio_write_nolock)
-*(.text.__pte_alloc)
-*(.text.sys_select)
-*(.text.vm_acct_memory)
-*(.text.vfs_write)
-*(.text.__lru_add_drain)
-*(.text.prio_tree_insert)
-*(.text.generic_file_aio_read)
-*(.text.vma_merge)
-*(.text.block_write_full_page)
-*(.text.__page_set_anon_rmap)
-*(.text.apic_timer_interrupt)
-*(.text.release_console_sem)
-*(.text.sys_write)
-*(.text.sys_brk)
-*(.text.dup_mm)
-*(.text.read_current_timer)
-*(.text.ll_rw_block)
-*(.text.blk_rq_map_sg)
-*(.text.dbg_userword)
-*(.text.__block_commit_write)
-*(.text.cache_grow)
-*(.text.copy_strings)
-*(.text.release_task)
-*(.text.do_sync_write)
-*(.text.unlock_page)
-*(.text.load_elf_binary)
-*(.text.__follow_mount)
-*(.text.__getblk)
-*(.text.do_sys_open)
-*(.text.current_kernel_time)
-*(.text.call_rcu)
-*(.text.write_chan)
-*(.text.vsnprintf)
-*(.text.dummy_inode_setsecurity)
-*(.text.submit_bh)
-*(.text.poll_freewait)
-*(.text.bio_alloc_bioset)
-*(.text.skb_clone)
-*(.text.page_waitqueue)
-*(.text.__mutex_lock_interruptible_slowpath)
-*(.text.get_index)
-*(.text.csum_partial_copy_generic)
-*(.text.bad_range)
-*(.text.remove_vma)
-*(.text.cp_new_stat)
-*(.text.alloc_arraycache)
-*(.text.test_clear_page_writeback)
-*(.text.strsep)
-*(.text.open_namei)
-*(.text._raw_read_unlock)
-*(.text.get_vma_policy)
-*(.text.__down_write_trylock)
-*(.text.find_get_pages)
-*(.text.tcp_rcv_established)
-*(.text.generic_make_request)
-*(.text.__block_write_full_page)
-*(.text.cfq_set_request)
-*(.text.sys_inotify_init)
-*(.text.split_vma)
-*(.text.__mod_timer)
-*(.text.get_options)
-*(.text.vma_link)
-*(.text.mpage_writepages)
-*(.text.truncate_complete_page)
-*(.text.tcp_recvmsg)
-*(.text.sigprocmask)
-*(.text.filemap_populate)
-*(.text.sys_close)
-*(.text.inotify_dev_queue_event)
-*(.text.do_task_stat)
-*(.text.__dentry_open)
-*(.text.unlink_file_vma)
-*(.text.__pollwait)
-*(.text.packet_rcv_spkt)
-*(.text.drop_buffers)
-*(.text.free_pgtables)
-*(.text.generic_file_direct_write)
-*(.text.copy_process)
-*(.text.netif_receive_skb)
-*(.text.dnotify_flush)
-*(.text.print_bad_pte)
-*(.text.anon_vma_unlink)
-*(.text.sys_mprotect)
-*(.text.sync_sb_inodes)
-*(.text.find_inode_fast)
-*(.text.dummy_inode_readlink)
-*(.text.putname)
-*(.text.init_smp_flush)
-*(.text.dbg_redzone2)
-*(.text.sk_run_filter)
-*(.text.may_expand_vm)
-*(.text.generic_file_aio_write)
-*(.text.find_next_zero_bit)
-*(.text.file_kill)
-*(.text.audit_getname)
-*(.text.arch_unmap_area_topdown)
-*(.text.alloc_page_vma)
-*(.text.tcp_transmit_skb)
-*(.text.rb_next)
-*(.text.dbg_redzone1)
-*(.text.generic_file_mmap)
-*(.text.vfs_fstat)
-*(.text.sys_time)
-*(.text.page_lock_anon_vma)
-*(.text.get_unmapped_area)
-*(.text.remote_llseek)
-*(.text.__up_read)
-*(.text.fd_install)
-*(.text.eventpoll_init_file)
-*(.text.dma_alloc_coherent)
-*(.text.create_empty_buffers)
-*(.text.__mutex_unlock_slowpath)
-*(.text.dup_fd)
-*(.text.d_alloc)
-*(.text.tty_ldisc_try)
-*(.text.sys_stime)
-*(.text.__rb_rotate_right)
-*(.text.d_validate)
-*(.text.rb_erase)
-*(.text.path_release)
-*(.text.memmove)
-*(.text.invalidate_complete_page)
-*(.text.clear_inode)
-*(.text.cache_estimate)
-*(.text.alloc_buffer_head)
-*(.text.smp_call_function_interrupt)
-*(.text.flush_tlb_others)
-*(.text.file_move)
-*(.text.balance_dirty_pages_ratelimited)
-*(.text.vma_prio_tree_add)
-*(.text.timespec_trunc)
-*(.text.mempool_alloc)
-*(.text.iget_locked)
-*(.text.d_alloc_root)
-*(.text.cpuset_populate_dir)
-*(.text.anon_vma_prepare)
-*(.text.sys_newstat)
-*(.text.alloc_page_interleave)
-*(.text.__path_lookup_intent_open)
-*(.text.__pagevec_free)
-*(.text.inode_init_once)
-*(.text.free_vfsmnt)
-*(.text.__user_walk_fd)
-*(.text.cfq_idle_slice_timer)
-*(.text.sys_mmap)
-*(.text.sys_llseek)
-*(.text.prio_tree_remove)
-*(.text.filp_close)
-*(.text.file_permission)
-*(.text.vma_prio_tree_remove)
-*(.text.tcp_ack)
-*(.text.nameidata_to_filp)
-*(.text.sys_lseek)
-*(.text.percpu_counter_mod)
-*(.text.igrab)
-*(.text.__bread)
-*(.text.alloc_inode)
-*(.text.filldir)
-*(.text.__rb_rotate_left)
-*(.text.irq_affinity_write_proc)
-*(.text.init_request_from_bio)
-*(.text.find_or_create_page)
-*(.text.tty_poll)
-*(.text.tcp_sendmsg)
-*(.text.ide_wait_stat)
-*(.text.free_buffer_head)
-*(.text.flush_signal_handlers)
-*(.text.tcp_v4_rcv)
-*(.text.nr_blockdev_pages)
-*(.text.locks_remove_flock)
-*(.text.__iowrite32_copy)
-*(.text.do_filp_open)
-*(.text.try_to_release_page)
-*(.text.page_add_new_anon_rmap)
-*(.text.kmem_cache_size)
-*(.text.eth_type_trans)
-*(.text.try_to_free_buffers)
-*(.text.schedule_tail)
-*(.text.proc_lookup)
-*(.text.no_llseek)
-*(.text.kfree_skbmem)
-*(.text.do_wait)
-*(.text.do_mpage_readpage)
-*(.text.vfs_stat_fd)
-*(.text.tty_write)
-*(.text.705)
-*(.text.sync_page)
-*(.text.__remove_shared_vm_struct)
-*(.text.__kfree_skb)
-*(.text.sock_poll)
-*(.text.get_request_wait)
-*(.text.do_sigaction)
-*(.text.do_brk)
-*(.text.tcp_event_data_recv)
-*(.text.read_chan)
-*(.text.pipe_writev)
-*(.text.__emul_lookup_dentry)
-*(.text.rtc_get_rtc_time)
-*(.text.print_objinfo)
-*(.text.file_update_time)
-*(.text.do_signal)
-*(.text.disable_8259A_irq)
-*(.text.blk_queue_bounce)
-*(.text.__anon_vma_link)
-*(.text.__vma_link)
-*(.text.vfs_rename)
-*(.text.sys_newlstat)
-*(.text.sys_newfstat)
-*(.text.sys_mknod)
-*(.text.__show_regs)
-*(.text.iput)
-*(.text.get_signal_to_deliver)
-*(.text.flush_tlb_page)
-*(.text.debug_mutex_wake_waiter)
-*(.text.copy_thread)
-*(.text.clear_page_dirty_for_io)
-*(.text.buffer_io_error)
-*(.text.vfs_permission)
-*(.text.truncate_inode_pages_range)
-*(.text.sys_recvfrom)
-*(.text.remove_suid)
-*(.text.mark_buffer_dirty)
-*(.text.local_bh_enable)
-*(.text.get_zeroed_page)
-*(.text.get_vmalloc_info)
-*(.text.flush_old_exec)
-*(.text.dummy_inode_permission)
-*(.text.__bio_add_page)
-*(.text.prio_tree_replace)
-*(.text.notify_change)
-*(.text.mntput_no_expire)
-*(.text.fput)
-*(.text.__end_that_request_first)
-*(.text.wake_up_bit)
-*(.text.unuse_mm)
-*(.text.shrink_icache_memory)
-*(.text.sched_balance_self)
-*(.text.__pmd_alloc)
-*(.text.pipe_poll)
-*(.text.normal_poll)
-*(.text.__free_pages)
-*(.text.follow_mount)
-*(.text.cdrom_start_packet_command)
-*(.text.blk_recount_segments)
-*(.text.bio_put)
-*(.text.__alloc_skb)
-*(.text.__wake_up)
-*(.text.vm_stat_account)
-*(.text.sys_fcntl)
-*(.text.sys_fadvise64)
-*(.text._raw_write_unlock)
-*(.text.__pud_alloc)
-*(.text.alloc_page_buffers)
-*(.text.vfs_llseek)
-*(.text.sockfd_lookup)
-*(.text._raw_write_lock)
-*(.text.put_compound_page)
-*(.text.prune_dcache)
-*(.text.pipe_readv)
-*(.text.mempool_free)
-*(.text.make_ahead_window)
-*(.text.lru_add_drain)
-*(.text.constant_test_bit)
-*(.text.__clear_user)
-*(.text.arch_unmap_area)
-*(.text.anon_vma_link)
-*(.text.sys_chroot)
-*(.text.setup_arg_pages)
-*(.text.radix_tree_preload)
-*(.text.init_rwsem)
-*(.text.generic_osync_inode)
-*(.text.generic_delete_inode)
-*(.text.do_sys_poll)
-*(.text.dev_queue_xmit)
-*(.text.default_llseek)
-*(.text.__writeback_single_inode)
-*(.text.vfs_ioctl)
-*(.text.__up_write)
-*(.text.unix_poll)
-*(.text.sys_rt_sigprocmask)
-*(.text.sock_recvmsg)
-*(.text.recalc_bh_state)
-*(.text.__put_unused_fd)
-*(.text.process_backlog)
-*(.text.locks_remove_posix)
-*(.text.lease_modify)
-*(.text.expand_files)
-*(.text.end_buffer_read_nobh)
-*(.text.d_splice_alias)
-*(.text.debug_mutex_init_waiter)
-*(.text.copy_from_user)
-*(.text.cap_vm_enough_memory)
-*(.text.show_vfsmnt)
-*(.text.release_sock)
-*(.text.pfifo_fast_enqueue)
-*(.text.half_md4_transform)
-*(.text.fs_may_remount_ro)
-*(.text.do_fork)
-*(.text.copy_hugetlb_page_range)
-*(.text.cache_free_debugcheck)
-*(.text.__tcp_select_window)
-*(.text.task_handoff_register)
-*(.text.sys_open)
-*(.text.strlcpy)
-*(.text.skb_copy_datagram_iovec)
-*(.text.set_up_list3s)
-*(.text.release_open_intent)
-*(.text.qdisc_restart)
-*(.text.n_tty_chars_in_buffer)
-*(.text.inode_change_ok)
-*(.text.__downgrade_write)
-*(.text.debug_mutex_unlock)
-*(.text.add_timer_randomness)
-*(.text.sock_common_recvmsg)
-*(.text.set_bh_page)
-*(.text.printk_lock)
-*(.text.path_release_on_umount)
-*(.text.ip_output)
-*(.text.ide_build_dmatable)
-*(.text.__get_user_8)
-*(.text.end_buffer_read_sync)
-*(.text.__d_path)
-*(.text.d_move)
-*(.text.del_timer)
-*(.text.constant_test_bit)
-*(.text.blockable_page_cache_readahead)
-*(.text.tty_read)
-*(.text.sys_readlink)
-*(.text.sys_faccessat)
-*(.text.read_swap_cache_async)
-*(.text.pty_write_room)
-*(.text.page_address_in_vma)
-*(.text.kthread)
-*(.text.cfq_exit_io_context)
-*(.text.__tcp_push_pending_frames)
-*(.text.sys_pipe)
-*(.text.submit_bio)
-*(.text.pid_revalidate)
-*(.text.page_referenced_file)
-*(.text.lock_sock)
-*(.text.get_page_state_node)
-*(.text.generic_block_bmap)
-*(.text.do_setitimer)
-*(.text.dev_queue_xmit_nit)
-*(.text.copy_from_read_buf)
-*(.text.__const_udelay)
-*(.text.console_conditional_schedule)
-*(.text.wake_up_new_task)
-*(.text.wait_for_completion_interruptible)
-*(.text.tcp_rcv_rtt_update)
-*(.text.sys_mlockall)
-*(.text.set_fs_altroot)
-*(.text.schedule_timeout)
-*(.text.nr_free_pagecache_pages)
-*(.text.nf_iterate)
-*(.text.mapping_tagged)
-*(.text.ip_queue_xmit)
-*(.text.ip_local_deliver)
-*(.text.follow_page)
-*(.text.elf_map)
-*(.text.dummy_file_permission)
-*(.text.dispose_list)
-*(.text.dentry_open)
-*(.text.dentry_iput)
-*(.text.bio_alloc)
-*(.text.wait_on_page_bit)
-*(.text.vfs_readdir)
-*(.text.vfs_lstat)
-*(.text.seq_escape)
-*(.text.__posix_lock_file)
-*(.text.mm_release)
-*(.text.kref_put)
-*(.text.ip_rcv)
-*(.text.__iget)
-*(.text.free_pages)
-*(.text.find_mergeable_anon_vma)
-*(.text.find_extend_vma)
-*(.text.dummy_inode_listsecurity)
-*(.text.bio_add_page)
-*(.text.__vm_enough_memory)
-*(.text.vfs_stat)
-*(.text.tty_paranoia_check)
-*(.text.tcp_read_sock)
-*(.text.tcp_data_queue)
-*(.text.sys_uname)
-*(.text.sys_renameat)
-*(.text.__strncpy_from_user)
-*(.text.__mutex_init)
-*(.text.__lookup_hash)
-*(.text.kref_get)
-*(.text.ip_route_input)
-*(.text.__insert_inode_hash)
-*(.text.do_sock_write)
-*(.text.blk_done_softirq)
-*(.text.__wake_up_sync)
-*(.text.__vma_link_rb)
-*(.text.tty_ioctl)
-*(.text.tracesys)
-*(.text.sys_getdents)
-*(.text.sys_dup)
-*(.text.stub_execve)
-*(.text.sha_transform)
-*(.text.radix_tree_tag_clear)
-*(.text.put_unused_fd)
-*(.text.put_files_struct)
-*(.text.mpage_readpages)
-*(.text.may_delete)
-*(.text.kmem_cache_create)
-*(.text.ip_mc_output)
-*(.text.interleave_nodes)
-*(.text.groups_search)
-*(.text.generic_drop_inode)
-*(.text.generic_commit_write)
-*(.text.fcntl_setlk)
-*(.text.exit_mmap)
-*(.text.end_page_writeback)
-*(.text.__d_rehash)
-*(.text.debug_mutex_free_waiter)
-*(.text.csum_ipv6_magic)
-*(.text.count)
-*(.text.cleanup_rbuf)
-*(.text.check_spinlock_acquired_node)
-*(.text.can_vma_merge_after)
-*(.text.bio_endio)
-*(.text.alloc_pidmap)
-*(.text.write_ldt)
-*(.text.vmtruncate_range)
-*(.text.vfs_create)
-*(.text.__user_walk)
-*(.text.update_send_head)
-*(.text.unmap_underlying_metadata)
-*(.text.tty_ldisc_deref)
-*(.text.tcp_setsockopt)
-*(.text.tcp_send_ack)
-*(.text.sys_pause)
-*(.text.sys_gettimeofday)
-*(.text.sync_dirty_buffer)
-*(.text.strncmp)
-*(.text.release_posix_timer)
-*(.text.proc_file_read)
-*(.text.prepare_to_wait)
-*(.text.locks_mandatory_locked)
-*(.text.interruptible_sleep_on_timeout)
-*(.text.inode_sub_bytes)
-*(.text.in_group_p)
-*(.text.hrtimer_try_to_cancel)
-*(.text.filldir64)
-*(.text.fasync_helper)
-*(.text.dummy_sb_pivotroot)
-*(.text.d_lookup)
-*(.text.d_instantiate)
-*(.text.__d_find_alias)
-*(.text.cpu_idle_wait)
-*(.text.cond_resched_lock)
-*(.text.chown_common)
-*(.text.blk_congestion_wait)
-*(.text.activate_page)
-*(.text.unlock_buffer)
-*(.text.tty_wakeup)
-*(.text.tcp_v4_do_rcv)
-*(.text.tcp_current_mss)
-*(.text.sys_openat)
-*(.text.sys_fchdir)
-*(.text.strnlen_user)
-*(.text.strnlen)
-*(.text.strchr)
-*(.text.sock_common_getsockopt)
-*(.text.skb_checksum)
-*(.text.remove_wait_queue)
-*(.text.rb_replace_node)
-*(.text.radix_tree_node_ctor)
-*(.text.pty_chars_in_buffer)
-*(.text.profile_hit)
-*(.text.prio_tree_left)
-*(.text.pgd_clear_bad)
-*(.text.pfifo_fast_dequeue)
-*(.text.page_referenced)
-*(.text.open_exec)
-*(.text.mmput)
-*(.text.mm_init)
-*(.text.__ide_dma_off_quietly)
-*(.text.ide_dma_intr)
-*(.text.hrtimer_start)
-*(.text.get_io_context)
-*(.text.__get_free_pages)
-*(.text.find_first_zero_bit)
-*(.text.file_free_rcu)
-*(.text.dummy_socket_sendmsg)
-*(.text.do_unlinkat)
-*(.text.do_arch_prctl)
-*(.text.destroy_inode)
-*(.text.can_vma_merge_before)
-*(.text.block_sync_page)
-*(.text.block_prepare_write)
-*(.text.bio_init)
-*(.text.arch_ptrace)
-*(.text.wake_up_inode)
-*(.text.wait_on_retry_sync_kiocb)
-*(.text.vma_prio_tree_next)
-*(.text.tcp_rcv_space_adjust)
-*(.text.__tcp_ack_snd_check)
-*(.text.sys_utime)
-*(.text.sys_recvmsg)
-*(.text.sys_mremap)
-*(.text.sys_bdflush)
-*(.text.sleep_on)
-*(.text.set_page_dirty_lock)
-*(.text.seq_path)
-*(.text.schedule_timeout_interruptible)
-*(.text.sched_fork)
-*(.text.rt_run_flush)
-*(.text.profile_munmap)
-*(.text.prepare_binprm)
-*(.text.__pagevec_release_nonlru)
-*(.text.m_show)
-*(.text.lookup_mnt)
-*(.text.__lookup_mnt)
-*(.text.lock_timer_base)
-*(.text.is_subdir)
-*(.text.invalidate_bh_lru)
-*(.text.init_buffer_head)
-*(.text.ifind_fast)
-*(.text.ide_dma_start)
-*(.text.__get_page_state)
-*(.text.flock_to_posix_lock)
-*(.text.__find_symbol)
-*(.text.do_futex)
-*(.text.do_execve)
-*(.text.dirty_writeback_centisecs_handler)
-*(.text.dev_watchdog)
-*(.text.can_share_swap_page)
-*(.text.blkdev_put)
-*(.text.bio_get_nr_vecs)
-*(.text.xfrm_compile_policy)
-*(.text.vma_prio_tree_insert)
-*(.text.vfs_lstat_fd)
-*(.text.__user_path_lookup_open)
-*(.text.thread_return)
-*(.text.tcp_send_delayed_ack)
-*(.text.sock_def_error_report)
-*(.text.shrink_slab)
-*(.text.serial_out)
-*(.text.seq_read)
-*(.text.secure_ip_id)
-*(.text.search_binary_handler)
-*(.text.proc_pid_unhash)
-*(.text.pagevec_lookup)
-*(.text.new_inode)
-*(.text.memcpy_toiovec)
-*(.text.locks_free_lock)
-*(.text.__lock_page)
-*(.text.__lock_buffer)
-*(.text.load_module)
-*(.text.is_bad_inode)
-*(.text.invalidate_inode_buffers)
-*(.text.insert_vm_struct)
-*(.text.inode_setattr)
-*(.text.inode_add_bytes)
-*(.text.ide_read_24)
-*(.text.ide_get_error_location)
-*(.text.ide_do_drive_cmd)
-*(.text.get_locked_pte)
-*(.text.get_filesystem_list)
-*(.text.generic_file_open)
-*(.text.follow_down)
-*(.text.find_next_bit)
-*(.text.__find_first_bit)
-*(.text.exit_mm)
-*(.text.exec_keys)
-*(.text.end_buffer_write_sync)
-*(.text.end_bio_bh_io_sync)
-*(.text.dummy_socket_shutdown)
-*(.text.d_rehash)
-*(.text.d_path)
-*(.text.do_ioctl)
-*(.text.dget_locked)
-*(.text.copy_thread_group_keys)
-*(.text.cdrom_end_request)
-*(.text.cap_bprm_apply_creds)
-*(.text.blk_rq_bio_prep)
-*(.text.__bitmap_intersects)
-*(.text.bio_phys_segments)
-*(.text.bio_free)
-*(.text.arch_get_unmapped_area_topdown)
-*(.text.writeback_in_progress)
-*(.text.vfs_follow_link)
-*(.text.tcp_rcv_state_process)
-*(.text.tcp_check_space)
-*(.text.sys_stat)
-*(.text.sys_rt_sigreturn)
-*(.text.sys_rt_sigaction)
-*(.text.sys_remap_file_pages)
-*(.text.sys_pwrite64)
-*(.text.sys_fchownat)
-*(.text.sys_fchmodat)
-*(.text.strncat)
-*(.text.strlcat)
-*(.text.strcmp)
-*(.text.steal_locks)
-*(.text.sock_create)
-*(.text.sk_stream_rfree)
-*(.text.sk_stream_mem_schedule)
-*(.text.skip_atoi)
-*(.text.sk_alloc)
-*(.text.show_stat)
-*(.text.set_fs_pwd)
-*(.text.set_binfmt)
-*(.text.pty_unthrottle)
-*(.text.proc_symlink)
-*(.text.pipe_release)
-*(.text.pageout)
-*(.text.n_tty_write_wakeup)
-*(.text.n_tty_ioctl)
-*(.text.nr_free_zone_pages)
-*(.text.migration_thread)
-*(.text.mempool_free_slab)
-*(.text.meminfo_read_proc)
-*(.text.max_sane_readahead)
-*(.text.lru_cache_add)
-*(.text.kill_fasync)
-*(.text.kernel_read)
-*(.text.invalidate_mapping_pages)
-*(.text.inode_has_buffers)
-*(.text.init_once)
-*(.text.inet_sendmsg)
-*(.text.idedisk_issue_flush)
-*(.text.generic_file_write)
-*(.text.free_more_memory)
-*(.text.__free_fdtable)
-*(.text.filp_dtor)
-*(.text.exit_sem)
-*(.text.exit_itimers)
-*(.text.error_interrupt)
-*(.text.end_buffer_async_write)
-*(.text.eligible_child)
-*(.text.elf_map)
-*(.text.dump_task_regs)
-*(.text.dummy_task_setscheduler)
-*(.text.dummy_socket_accept)
-*(.text.dummy_file_free_security)
-*(.text.__down_read)
-*(.text.do_sock_read)
-*(.text.do_sigaltstack)
-*(.text.do_mremap)
-*(.text.current_io_context)
-*(.text.cpu_swap_callback)
-*(.text.copy_vma)
-*(.text.cap_bprm_set_security)
-*(.text.blk_insert_request)
-*(.text.bio_map_kern_endio)
-*(.text.bio_hw_segments)
-*(.text.bictcp_cong_avoid)
-*(.text.add_interrupt_randomness)
-*(.text.wait_for_completion)
-*(.text.version_read_proc)
-*(.text.unix_write_space)
-*(.text.tty_ldisc_ref_wait)
-*(.text.tty_ldisc_put)
-*(.text.try_to_wake_up)
-*(.text.tcp_v4_tw_remember_stamp)
-*(.text.tcp_try_undo_dsack)
-*(.text.tcp_may_send_now)
-*(.text.sys_waitid)
-*(.text.sys_sched_getparam)
-*(.text.sys_getppid)
-*(.text.sys_getcwd)
-*(.text.sys_dup2)
-*(.text.sys_chmod)
-*(.text.sys_chdir)
-*(.text.sprintf)
-*(.text.sock_wfree)
-*(.text.sock_aio_write)
-*(.text.skb_drop_fraglist)
-*(.text.skb_dequeue)
-*(.text.set_close_on_exec)
-*(.text.set_brk)
-*(.text.seq_puts)
-*(.text.SELECT_DRIVE)
-*(.text.sched_exec)
-*(.text.return_EIO)
-*(.text.remove_from_page_cache)
-*(.text.rcu_start_batch)
-*(.text.__put_task_struct)
-*(.text.proc_pid_readdir)
-*(.text.proc_get_inode)
-*(.text.prepare_to_wait_exclusive)
-*(.text.pipe_wait)
-*(.text.pipe_new)
-*(.text.pdflush_operation)
-*(.text.__pagevec_release)
-*(.text.pagevec_lookup_tag)
-*(.text.packet_rcv)
-*(.text.n_tty_set_room)
-*(.text.nr_free_pages)
-*(.text.__net_timestamp)
-*(.text.mpage_end_io_read)
-*(.text.mod_timer)
-*(.text.__memcpy)
-*(.text.mb_cache_shrink_fn)
-*(.text.lock_rename)
-*(.text.kstrdup)
-*(.text.is_ignored)
-*(.text.int_very_careful)
-*(.text.inotify_inode_is_dead)
-*(.text.inotify_get_cookie)
-*(.text.inode_get_bytes)
-*(.text.init_timer)
-*(.text.init_dev)
-*(.text.inet_getname)
-*(.text.ide_map_sg)
-*(.text.__ide_dma_end)
-*(.text.hrtimer_get_remaining)
-*(.text.get_task_mm)
-*(.text.get_random_int)
-*(.text.free_pipe_info)
-*(.text.filemap_write_and_wait_range)
-*(.text.exit_thread)
-*(.text.enter_idle)
-*(.text.end_that_request_first)
-*(.text.end_8259A_irq)
-*(.text.dummy_file_alloc_security)
-*(.text.do_group_exit)
-*(.text.debug_mutex_init)
-*(.text.cpuset_exit)
-*(.text.cpu_idle)
-*(.text.copy_semundo)
-*(.text.copy_files)
-*(.text.chrdev_open)
-*(.text.cdrom_transfer_packet_command)
-*(.text.cdrom_mode_sense)
-*(.text.blk_phys_contig_segment)
-*(.text.blk_get_queue)
-*(.text.bio_split)
-*(.text.audit_alloc)
-*(.text.anon_pipe_buf_release)
-*(.text.add_wait_queue_exclusive)
-*(.text.add_wait_queue)
-*(.text.acct_process)
-*(.text.account)
-*(.text.zeromap_page_range)
-*(.text.yield)
-*(.text.writeback_acquire)
-*(.text.worker_thread)
-*(.text.wait_on_page_writeback_range)
-*(.text.__wait_on_buffer)
-*(.text.vscnprintf)
-*(.text.vmalloc_to_pfn)
-*(.text.vgacon_save_screen)
-*(.text.vfs_unlink)
-*(.text.vfs_rmdir)
-*(.text.unregister_md_personality)
-*(.text.unlock_new_inode)
-*(.text.unix_stream_sendmsg)
-*(.text.unix_stream_recvmsg)
-*(.text.unhash_process)
-*(.text.udp_v4_lookup_longway)
-*(.text.tty_ldisc_flush)
-*(.text.tty_ldisc_enable)
-*(.text.tty_hung_up_p)
-*(.text.tty_buffer_free_all)
-*(.text.tso_fragment)
-*(.text.try_to_del_timer_sync)
-*(.text.tcp_v4_err)
-*(.text.tcp_unhash)
-*(.text.tcp_seq_next)
-*(.text.tcp_select_initial_window)
-*(.text.tcp_sacktag_write_queue)
-*(.text.tcp_cwnd_validate)
-*(.text.sys_vhangup)
-*(.text.sys_uselib)
-*(.text.sys_symlink)
-*(.text.sys_signal)
-*(.text.sys_poll)
-*(.text.sys_mount)
-*(.text.sys_kill)
-*(.text.sys_ioctl)
-*(.text.sys_inotify_add_watch)
-*(.text.sys_getuid)
-*(.text.sys_getrlimit)
-*(.text.sys_getitimer)
-*(.text.sys_getgroups)
-*(.text.sys_ftruncate)
-*(.text.sysfs_lookup)
-*(.text.sys_exit_group)
-*(.text.stub_fork)
-*(.text.sscanf)
-*(.text.sock_map_fd)
-*(.text.sock_get_timestamp)
-*(.text.__sock_create)
-*(.text.smp_call_function_single)
-*(.text.sk_stop_timer)
-*(.text.skb_copy_and_csum_datagram)
-*(.text.__skb_checksum_complete)
-*(.text.single_next)
-*(.text.sigqueue_alloc)
-*(.text.shrink_dcache_parent)
-*(.text.select_idle_routine)
-*(.text.run_workqueue)
-*(.text.run_local_timers)
-*(.text.remove_inode_hash)
-*(.text.remove_dquot_ref)
-*(.text.register_binfmt)
-*(.text.read_cache_pages)
-*(.text.rb_last)
-*(.text.pty_open)
-*(.text.proc_root_readdir)
-*(.text.proc_pid_flush)
-*(.text.proc_pident_lookup)
-*(.text.proc_fill_super)
-*(.text.proc_exe_link)
-*(.text.posix_locks_deadlock)
-*(.text.pipe_iov_copy_from_user)
-*(.text.opost)
-*(.text.nf_register_hook)
-*(.text.netif_rx_ni)
-*(.text.m_start)
-*(.text.mpage_writepage)
-*(.text.mm_alloc)
-*(.text.memory_open)
-*(.text.mark_buffer_async_write)
-*(.text.lru_add_drain_all)
-*(.text.locks_init_lock)
-*(.text.locks_delete_lock)
-*(.text.lock_hrtimer_base)
-*(.text.load_script)
-*(.text.__kill_fasync)
-*(.text.ip_mc_sf_allow)
-*(.text.__ioremap)
-*(.text.int_with_check)
-*(.text.int_sqrt)
-*(.text.install_thread_keyring)
-*(.text.init_page_buffers)
-*(.text.inet_sock_destruct)
-*(.text.idle_notifier_register)
-*(.text.ide_execute_command)
-*(.text.ide_end_drive_cmd)
-*(.text.__ide_dma_host_on)
-*(.text.hrtimer_run_queues)
-*(.text.hpet_mask_rtc_irq_bit)
-*(.text.__get_zone_counts)
-*(.text.get_zone_counts)
-*(.text.get_write_access)
-*(.text.get_fs_struct)
-*(.text.get_dirty_limits)
-*(.text.generic_readlink)
-*(.text.free_hot_page)
-*(.text.finish_wait)
-*(.text.find_inode)
-*(.text.find_first_bit)
-*(.text.__filemap_fdatawrite_range)
-*(.text.__filemap_copy_from_user_iovec)
-*(.text.exit_aio)
-*(.text.elv_set_request)
-*(.text.elv_former_request)
-*(.text.dup_namespace)
-*(.text.dupfd)
-*(.text.dummy_socket_getsockopt)
-*(.text.dummy_sb_post_mountroot)
-*(.text.dummy_quotactl)
-*(.text.dummy_inode_rename)
-*(.text.__do_SAK)
-*(.text.do_pipe)
-*(.text.do_fsync)
-*(.text.d_instantiate_unique)
-*(.text.d_find_alias)
-*(.text.deny_write_access)
-*(.text.dentry_unhash)
-*(.text.d_delete)
-*(.text.datagram_poll)
-*(.text.cpuset_fork)
-*(.text.cpuid_read)
-*(.text.copy_namespace)
-*(.text.cond_resched)
-*(.text.check_version)
-*(.text.__change_page_attr)
-*(.text.cfq_slab_kill)
-*(.text.cfq_completed_request)
-*(.text.cdrom_pc_intr)
-*(.text.cdrom_decode_status)
-*(.text.cap_capset_check)
-*(.text.blk_put_request)
-*(.text.bio_fs_destructor)
-*(.text.bictcp_min_cwnd)
-*(.text.alloc_chrdev_region)
-*(.text.add_element)
-*(.text.acct_update_integrals)
-*(.text.write_boundary_block)
-*(.text.writeback_release)
-*(.text.writeback_inodes)
-*(.text.wake_up_state)
-*(.text.__wake_up_locked)
-*(.text.wake_futex)
-*(.text.wait_task_inactive)
-*(.text.__wait_on_freeing_inode)
-*(.text.wait_noreap_copyout)
-*(.text.vmstat_start)
-*(.text.vgacon_do_font_op)
-*(.text.vfs_readv)
-*(.text.vfs_quota_sync)
-*(.text.update_queue)
-*(.text.unshare_files)
-*(.text.unmap_vm_area)
-*(.text.unix_socketpair)
-*(.text.unix_release_sock)
-*(.text.unix_detach_fds)
-*(.text.unix_create1)
-*(.text.unix_bind)
-*(.text.udp_sendmsg)
-*(.text.udp_rcv)
-*(.text.udp_queue_rcv_skb)
-*(.text.uart_write)
-*(.text.uart_startup)
-*(.text.uart_open)
-*(.text.tty_vhangup)
-*(.text.tty_termios_baud_rate)
-*(.text.tty_release)
-*(.text.tty_ldisc_ref)
-*(.text.throttle_vm_writeout)
-*(.text.058)
-*(.text.tcp_xmit_probe_skb)
-*(.text.tcp_v4_send_check)
-*(.text.tcp_v4_destroy_sock)
-*(.text.tcp_sync_mss)
-*(.text.tcp_snd_test)
-*(.text.tcp_slow_start)
-*(.text.tcp_send_fin)
-*(.text.tcp_rtt_estimator)
-*(.text.tcp_parse_options)
-*(.text.tcp_ioctl)
-*(.text.tcp_init_tso_segs)
-*(.text.tcp_init_cwnd)
-*(.text.tcp_getsockopt)
-*(.text.tcp_fin)
-*(.text.tcp_connect)
-*(.text.tcp_cong_avoid)
-*(.text.__tcp_checksum_complete_user)
-*(.text.task_dumpable)
-*(.text.sys_wait4)
-*(.text.sys_utimes)
-*(.text.sys_symlinkat)
-*(.text.sys_socketpair)
-*(.text.sys_rmdir)
-*(.text.sys_readahead)
-*(.text.sys_nanosleep)
-*(.text.sys_linkat)
-*(.text.sys_fstat)
-*(.text.sysfs_readdir)
-*(.text.sys_execve)
-*(.text.sysenter_tracesys)
-*(.text.sys_chown)
-*(.text.stub_clone)
-*(.text.strrchr)
-*(.text.strncpy)
-*(.text.stopmachine_set_state)
-*(.text.sock_sendmsg)
-*(.text.sock_release)
-*(.text.sock_fasync)
-*(.text.sock_close)
-*(.text.sk_stream_write_space)
-*(.text.sk_reset_timer)
-*(.text.skb_split)
-*(.text.skb_recv_datagram)
-*(.text.skb_queue_tail)
-*(.text.sk_attach_filter)
-*(.text.si_swapinfo)
-*(.text.simple_strtoll)
-*(.text.set_termios)
-*(.text.set_task_comm)
-*(.text.set_shrinker)
-*(.text.set_normalized_timespec)
-*(.text.set_brk)
-*(.text.serial_in)
-*(.text.seq_printf)
-*(.text.secure_dccp_sequence_number)
-*(.text.rwlock_bug)
-*(.text.rt_hash_code)
-*(.text.__rta_fill)
-*(.text.__request_resource)
-*(.text.relocate_new_kernel)
-*(.text.release_thread)
-*(.text.release_mem)
-*(.text.rb_prev)
-*(.text.rb_first)
-*(.text.random_poll)
-*(.text.__put_super_and_need_restart)
-*(.text.pty_write)
-*(.text.ptrace_stop)
-*(.text.proc_self_readlink)
-*(.text.proc_root_lookup)
-*(.text.proc_root_link)
-*(.text.proc_pid_make_inode)
-*(.text.proc_pid_attr_write)
-*(.text.proc_lookupfd)
-*(.text.proc_delete_inode)
-*(.text.posix_same_owner)
-*(.text.posix_block_lock)
-*(.text.poll_initwait)
-*(.text.pipe_write)
-*(.text.pipe_read_fasync)
-*(.text.pipe_ioctl)
-*(.text.pdflush)
-*(.text.pci_user_read_config_dword)
-*(.text.page_readlink)
-*(.text.null_lseek)
-*(.text.nf_hook_slow)
-*(.text.netlink_sock_destruct)
-*(.text.netlink_broadcast)
-*(.text.neigh_resolve_output)
-*(.text.name_to_int)
-*(.text.mwait_idle)
-*(.text.mutex_trylock)
-*(.text.mutex_debug_check_no_locks_held)
-*(.text.m_stop)
-*(.text.mpage_end_io_write)
-*(.text.mpage_alloc)
-*(.text.move_page_tables)
-*(.text.mounts_open)
-*(.text.__memset)
-*(.text.memcpy_fromiovec)
-*(.text.make_8259A_irq)
-*(.text.lookup_user_key_possessed)
-*(.text.lookup_create)
-*(.text.locks_insert_lock)
-*(.text.locks_alloc_lock)
-*(.text.kthread_should_stop)
-*(.text.kswapd)
-*(.text.kobject_uevent)
-*(.text.kobject_get_path)
-*(.text.kobject_get)
-*(.text.klist_children_put)
-*(.text.__ip_route_output_key)
-*(.text.ip_flush_pending_frames)
-*(.text.ip_compute_csum)
-*(.text.ip_append_data)
-*(.text.ioc_set_batching)
-*(.text.invalidate_inode_pages)
-*(.text.__invalidate_device)
-*(.text.install_arg_page)
-*(.text.in_sched_functions)
-*(.text.inotify_unmount_inodes)
-*(.text.init_once)
-*(.text.init_cdrom_command)
-*(.text.inet_stream_connect)
-*(.text.inet_sk_rebuild_header)
-*(.text.inet_csk_addr2sockaddr)
-*(.text.inet_create)
-*(.text.ifind)
-*(.text.ide_setup_dma)
-*(.text.ide_outsw)
-*(.text.ide_fixstring)
-*(.text.ide_dma_setup)
-*(.text.ide_cdrom_packet)
-*(.text.ide_cd_put)
-*(.text.ide_build_sglist)
-*(.text.i8259A_shutdown)
-*(.text.hung_up_tty_ioctl)
-*(.text.hrtimer_nanosleep)
-*(.text.hrtimer_init)
-*(.text.hrtimer_cancel)
-*(.text.hash_futex)
-*(.text.group_send_sig_info)
-*(.text.grab_cache_page_nowait)
-*(.text.get_wchan)
-*(.text.get_stack)
-*(.text.get_page_state)
-*(.text.getnstimeofday)
-*(.text.get_node)
-*(.text.get_kprobe)
-*(.text.generic_unplug_device)
-*(.text.free_task)
-*(.text.frag_show)
-*(.text.find_next_zero_string)
-*(.text.filp_open)
-*(.text.fillonedir)
-*(.text.exit_io_context)
-*(.text.exit_idle)
-*(.text.exact_lock)
-*(.text.eth_header)
-*(.text.dummy_unregister_security)
-*(.text.dummy_socket_post_create)
-*(.text.dummy_socket_listen)
-*(.text.dummy_quota_on)
-*(.text.dummy_inode_follow_link)
-*(.text.dummy_file_receive)
-*(.text.dummy_file_mprotect)
-*(.text.dummy_file_lock)
-*(.text.dummy_file_ioctl)
-*(.text.dummy_bprm_post_apply_creds)
-*(.text.do_writepages)
-*(.text.__down_interruptible)
-*(.text.do_notify_resume)
-*(.text.do_acct_process)
-*(.text.del_timer_sync)
-*(.text.default_rebuild_header)
-*(.text.d_callback)
-*(.text.dcache_readdir)
-*(.text.ctrl_dumpfamily)
-*(.text.cpuset_rmdir)
-*(.text.copy_strings_kernel)
-*(.text.con_write_room)
-*(.text.complete_all)
-*(.text.collect_sigign_sigcatch)
-*(.text.clear_user)
-*(.text.check_unthrottle)
-*(.text.cdrom_release)
-*(.text.cdrom_newpc_intr)
-*(.text.cdrom_ioctl)
-*(.text.cdrom_check_status)
-*(.text.cdev_put)
-*(.text.cdev_add)
-*(.text.cap_ptrace)
-*(.text.cap_bprm_secureexec)
-*(.text.cache_alloc_refill)
-*(.text.bmap)
-*(.text.blk_run_queue)
-*(.text.blk_queue_dma_alignment)
-*(.text.blk_ordered_req_seq)
-*(.text.blk_backing_dev_unplug)
-*(.text.__bitmap_subset)
-*(.text.__bitmap_and)
-*(.text.bio_unmap_user)
-*(.text.__bforget)
-*(.text.bd_forget)
-*(.text.bad_pipe_w)
-*(.text.bad_get_user)
-*(.text.audit_free)
-*(.text.anon_vma_ctor)
-*(.text.anon_pipe_buf_map)
-*(.text.alloc_sock_iocb)
-*(.text.alloc_fdset)
-*(.text.aio_kick_handler)
-*(.text.__add_entropy_words)
-*(.text.add_disk_randomness)
Index: linux/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux/arch/x86_64/kernel/vmlinux.lds.S
@@ -30,9 +30,6 @@ SECTIONS
/* First the code that has to be first for bootstrapping */
*(.bootstrap.text)
_stext = .;
- /* Then all the functions that are "hot" in profiles, to group them
- onto the same hugetlb entry */
- #include "functionlist"
/* Then the rest */
*(.text)
SCHED_TEXT
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [23/30] x86_64: Share identical video.S between i386 and x86-64
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (21 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [22/30] x86_64: Remove CONFIG_REORDER Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems Andi Kleen
` (6 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/boot/setup.S | 2
arch/x86_64/boot/video.S | 2043 -----------------------------------------------
2 files changed, 1 insertion(+), 2044 deletions(-)
Index: linux/arch/x86_64/boot/setup.S
===================================================================
--- linux.orig/arch/x86_64/boot/setup.S
+++ linux/arch/x86_64/boot/setup.S
@@ -807,7 +807,7 @@ gdt_48:
# Include video setup & detection code
-#include "video.S"
+#include "../../i386/boot/video.S"
# Setup signature -- must be last
setup_sig1: .word SIG1
Index: linux/arch/x86_64/boot/video.S
===================================================================
--- linux.orig/arch/x86_64/boot/video.S
+++ /dev/null
@@ -1,2043 +0,0 @@
-/* video.S
- *
- * Display adapter & video mode setup, version 2.13 (14-May-99)
- *
- * Copyright (C) 1995 -- 1998 Martin Mares <mj@ucw.cz>
- * Based on the original setup.S code (C) Linus Torvalds and Mats Anderson
- *
- * Rewritten to use GNU 'as' by Chris Noe <stiker@northlink.com> May 1999
- *
- * For further information, look at Documentation/svga.txt.
- *
- */
-
-/* Enable autodetection of SVGA adapters and modes. */
-#undef CONFIG_VIDEO_SVGA
-
-/* Enable autodetection of VESA modes */
-#define CONFIG_VIDEO_VESA
-
-/* Enable compacting of mode table */
-#define CONFIG_VIDEO_COMPACT
-
-/* Retain screen contents when switching modes */
-#define CONFIG_VIDEO_RETAIN
-
-/* Enable local mode list */
-#undef CONFIG_VIDEO_LOCAL
-
-/* Force 400 scan lines for standard modes (hack to fix bad BIOS behaviour */
-#undef CONFIG_VIDEO_400_HACK
-
-/* Hack that lets you force specific BIOS mode ID and specific dimensions */
-#undef CONFIG_VIDEO_GFX_HACK
-#define VIDEO_GFX_BIOS_AX 0x4f02 /* 800x600 on ThinkPad */
-#define VIDEO_GFX_BIOS_BX 0x0102
-#define VIDEO_GFX_DUMMY_RESOLUTION 0x6425 /* 100x37 */
-
-/* This code uses an extended set of video mode numbers. These include:
- * Aliases for standard modes
- * NORMAL_VGA (-1)
- * EXTENDED_VGA (-2)
- * ASK_VGA (-3)
- * Video modes numbered by menu position -- NOT RECOMMENDED because of lack
- * of compatibility when extending the table. These are between 0x00 and 0xff.
- */
-#define VIDEO_FIRST_MENU 0x0000
-
-/* Standard BIOS video modes (BIOS number + 0x0100) */
-#define VIDEO_FIRST_BIOS 0x0100
-
-/* VESA BIOS video modes (VESA number + 0x0200) */
-#define VIDEO_FIRST_VESA 0x0200
-
-/* Video7 special modes (BIOS number + 0x0900) */
-#define VIDEO_FIRST_V7 0x0900
-
-/* Special video modes */
-#define VIDEO_FIRST_SPECIAL 0x0f00
-#define VIDEO_80x25 0x0f00
-#define VIDEO_8POINT 0x0f01
-#define VIDEO_80x43 0x0f02
-#define VIDEO_80x28 0x0f03
-#define VIDEO_CURRENT_MODE 0x0f04
-#define VIDEO_80x30 0x0f05
-#define VIDEO_80x34 0x0f06
-#define VIDEO_80x60 0x0f07
-#define VIDEO_GFX_HACK 0x0f08
-#define VIDEO_LAST_SPECIAL 0x0f09
-
-/* Video modes given by resolution */
-#define VIDEO_FIRST_RESOLUTION 0x1000
-
-/* The "recalculate timings" flag */
-#define VIDEO_RECALC 0x8000
-
-/* Positions of various video parameters passed to the kernel */
-/* (see also include/linux/tty.h) */
-#define PARAM_CURSOR_POS 0x00
-#define PARAM_VIDEO_PAGE 0x04
-#define PARAM_VIDEO_MODE 0x06
-#define PARAM_VIDEO_COLS 0x07
-#define PARAM_VIDEO_EGA_BX 0x0a
-#define PARAM_VIDEO_LINES 0x0e
-#define PARAM_HAVE_VGA 0x0f
-#define PARAM_FONT_POINTS 0x10
-
-#define PARAM_LFB_WIDTH 0x12
-#define PARAM_LFB_HEIGHT 0x14
-#define PARAM_LFB_DEPTH 0x16
-#define PARAM_LFB_BASE 0x18
-#define PARAM_LFB_SIZE 0x1c
-#define PARAM_LFB_LINELENGTH 0x24
-#define PARAM_LFB_COLORS 0x26
-#define PARAM_VESAPM_SEG 0x2e
-#define PARAM_VESAPM_OFF 0x30
-#define PARAM_LFB_PAGES 0x32
-#define PARAM_VESA_ATTRIB 0x34
-#define PARAM_CAPABILITIES 0x36
-
-/* Define DO_STORE according to CONFIG_VIDEO_RETAIN */
-#ifdef CONFIG_VIDEO_RETAIN
-#define DO_STORE call store_screen
-#else
-#define DO_STORE
-#endif /* CONFIG_VIDEO_RETAIN */
-
-# This is the main entry point called by setup.S
-# %ds *must* be pointing to the bootsector
-video: pushw %ds # We use different segments
- pushw %ds # FS contains original DS
- popw %fs
- pushw %cs # DS is equal to CS
- popw %ds
- pushw %cs # ES is equal to CS
- popw %es
- xorw %ax, %ax
- movw %ax, %gs # GS is zero
- cld
- call basic_detect # Basic adapter type testing (EGA/VGA/MDA/CGA)
-#ifdef CONFIG_VIDEO_SELECT
- movw %fs:(0x01fa), %ax # User selected video mode
- cmpw $ASK_VGA, %ax # Bring up the menu
- jz vid2
-
- call mode_set # Set the mode
- jc vid1
-
- leaw badmdt, %si # Invalid mode ID
- call prtstr
-vid2: call mode_menu
-vid1:
-#ifdef CONFIG_VIDEO_RETAIN
- call restore_screen # Restore screen contents
-#endif /* CONFIG_VIDEO_RETAIN */
- call store_edid
-#endif /* CONFIG_VIDEO_SELECT */
- call mode_params # Store mode parameters
- popw %ds # Restore original DS
- ret
-
-# Detect if we have CGA, MDA, EGA or VGA and pass it to the kernel.
-basic_detect:
- movb $0, %fs:(PARAM_HAVE_VGA)
- movb $0x12, %ah # Check EGA/VGA
- movb $0x10, %bl
- int $0x10
- movw %bx, %fs:(PARAM_VIDEO_EGA_BX) # Identifies EGA to the kernel
- cmpb $0x10, %bl # No, it's a CGA/MDA/HGA card.
- je basret
-
- incb adapter
- movw $0x1a00, %ax # Check EGA or VGA?
- int $0x10
- cmpb $0x1a, %al # 1a means VGA...
- jne basret # anything else is EGA.
-
- incb %fs:(PARAM_HAVE_VGA) # We've detected a VGA
- incb adapter
-basret: ret
-
-# Store the video mode parameters for later usage by the kernel.
-# This is done by asking the BIOS except for the rows/columns
-# parameters in the default 80x25 mode -- these are set directly,
-# because some very obscure BIOSes supply insane values.
-mode_params:
-#ifdef CONFIG_VIDEO_SELECT
- cmpb $0, graphic_mode
- jnz mopar_gr
-#endif
- movb $0x03, %ah # Read cursor position
- xorb %bh, %bh
- int $0x10
- movw %dx, %fs:(PARAM_CURSOR_POS)
- movb $0x0f, %ah # Read page/mode/width
- int $0x10
- movw %bx, %fs:(PARAM_VIDEO_PAGE)
- movw %ax, %fs:(PARAM_VIDEO_MODE) # Video mode and screen width
- cmpb $0x7, %al # MDA/HGA => segment differs
- jnz mopar0
-
- movw $0xb000, video_segment
-mopar0: movw %gs:(0x485), %ax # Font size
- movw %ax, %fs:(PARAM_FONT_POINTS) # (valid only on EGA/VGA)
- movw force_size, %ax # Forced size?
- orw %ax, %ax
- jz mopar1
-
- movb %ah, %fs:(PARAM_VIDEO_COLS)
- movb %al, %fs:(PARAM_VIDEO_LINES)
- ret
-
-mopar1: movb $25, %al
- cmpb $0, adapter # If we are on CGA/MDA/HGA, the
- jz mopar2 # screen must have 25 lines.
-
- movb %gs:(0x484), %al # On EGA/VGA, use the EGA+ BIOS
- incb %al # location of max lines.
-mopar2: movb %al, %fs:(PARAM_VIDEO_LINES)
- ret
-
-#ifdef CONFIG_VIDEO_SELECT
-# Fetching of VESA frame buffer parameters
-mopar_gr:
- leaw modelist+1024, %di
- movb $0x23, %fs:(PARAM_HAVE_VGA)
- movw 16(%di), %ax
- movw %ax, %fs:(PARAM_LFB_LINELENGTH)
- movw 18(%di), %ax
- movw %ax, %fs:(PARAM_LFB_WIDTH)
- movw 20(%di), %ax
- movw %ax, %fs:(PARAM_LFB_HEIGHT)
- movb 25(%di), %al
- movb $0, %ah
- movw %ax, %fs:(PARAM_LFB_DEPTH)
- movb 29(%di), %al
- movb $0, %ah
- movw %ax, %fs:(PARAM_LFB_PAGES)
- movl 40(%di), %eax
- movl %eax, %fs:(PARAM_LFB_BASE)
- movl 31(%di), %eax
- movl %eax, %fs:(PARAM_LFB_COLORS)
- movl 35(%di), %eax
- movl %eax, %fs:(PARAM_LFB_COLORS+4)
- movw 0(%di), %ax
- movw %ax, %fs:(PARAM_VESA_ATTRIB)
-
-# get video mem size
- leaw modelist+1024, %di
- movw $0x4f00, %ax
- int $0x10
- xorl %eax, %eax
- movw 18(%di), %ax
- movl %eax, %fs:(PARAM_LFB_SIZE)
-
-# store mode capabilities
- movl 10(%di), %eax
- movl %eax, %fs:(PARAM_CAPABILITIES)
-
-# switching the DAC to 8-bit is for <= 8 bpp only
- movw %fs:(PARAM_LFB_DEPTH), %ax
- cmpw $8, %ax
- jg dac_done
-
-# get DAC switching capability
- xorl %eax, %eax
- movb 10(%di), %al
- testb $1, %al
- jz dac_set
-
-# attempt to switch DAC to 8-bit
- movw $0x4f08, %ax
- movw $0x0800, %bx
- int $0x10
- cmpw $0x004f, %ax
- jne dac_set
- movb %bh, dac_size # store actual DAC size
-
-dac_set:
-# set color size to DAC size
- movb dac_size, %al
- movb %al, %fs:(PARAM_LFB_COLORS+0)
- movb %al, %fs:(PARAM_LFB_COLORS+2)
- movb %al, %fs:(PARAM_LFB_COLORS+4)
- movb %al, %fs:(PARAM_LFB_COLORS+6)
-
-# set color offsets to 0
- movb $0, %fs:(PARAM_LFB_COLORS+1)
- movb $0, %fs:(PARAM_LFB_COLORS+3)
- movb $0, %fs:(PARAM_LFB_COLORS+5)
- movb $0, %fs:(PARAM_LFB_COLORS+7)
-
-dac_done:
-# get protected mode interface informations
- movw $0x4f0a, %ax
- xorw %bx, %bx
- xorw %di, %di
- int $0x10
- cmp $0x004f, %ax
- jnz no_pm
-
- movw %es, %fs:(PARAM_VESAPM_SEG)
- movw %di, %fs:(PARAM_VESAPM_OFF)
-no_pm: ret
-
-# The video mode menu
-mode_menu:
- leaw keymsg, %si # "Return/Space/Timeout" message
- call prtstr
- call flush
-nokey: call getkt
-
- cmpb $0x0d, %al # ENTER ?
- je listm # yes - manual mode selection
-
- cmpb $0x20, %al # SPACE ?
- je defmd1 # no - repeat
-
- call beep
- jmp nokey
-
-defmd1: ret # No mode chosen? Default 80x25
-
-listm: call mode_table # List mode table
-listm0: leaw name_bann, %si # Print adapter name
- call prtstr
- movw card_name, %si
- orw %si, %si
- jnz an2
-
- movb adapter, %al
- leaw old_name, %si
- orb %al, %al
- jz an1
-
- leaw ega_name, %si
- decb %al
- jz an1
-
- leaw vga_name, %si
- jmp an1
-
-an2: call prtstr
- leaw svga_name, %si
-an1: call prtstr
- leaw listhdr, %si # Table header
- call prtstr
- movb $0x30, %dl # DL holds mode number
- leaw modelist, %si
-lm1: cmpw $ASK_VGA, (%si) # End?
- jz lm2
-
- movb %dl, %al # Menu selection number
- call prtchr
- call prtsp2
- lodsw
- call prthw # Mode ID
- call prtsp2
- movb 0x1(%si), %al
- call prtdec # Rows
- movb $0x78, %al # the letter 'x'
- call prtchr
- lodsw
- call prtdec # Columns
- movb $0x0d, %al # New line
- call prtchr
- movb $0x0a, %al
- call prtchr
- incb %dl # Next character
- cmpb $0x3a, %dl
- jnz lm1
-
- movb $0x61, %dl
- jmp lm1
-
-lm2: leaw prompt, %si # Mode prompt
- call prtstr
- leaw edit_buf, %di # Editor buffer
-lm3: call getkey
- cmpb $0x0d, %al # Enter?
- jz lment
-
- cmpb $0x08, %al # Backspace?
- jz lmbs
-
- cmpb $0x20, %al # Printable?
- jc lm3
-
- cmpw $edit_buf+4, %di # Enough space?
- jz lm3
-
- stosb
- call prtchr
- jmp lm3
-
-lmbs: cmpw $edit_buf, %di # Backspace
- jz lm3
-
- decw %di
- movb $0x08, %al
- call prtchr
- call prtspc
- movb $0x08, %al
- call prtchr
- jmp lm3
-
-lment: movb $0, (%di)
- leaw crlft, %si
- call prtstr
- leaw edit_buf, %si
- cmpb $0, (%si) # Empty string = default mode
- jz lmdef
-
- cmpb $0, 1(%si) # One character = menu selection
- jz mnusel
-
- cmpw $0x6373, (%si) # "scan" => mode scanning
- jnz lmhx
-
- cmpw $0x6e61, 2(%si)
- jz lmscan
-
-lmhx: xorw %bx, %bx # Else => mode ID in hex
-lmhex: lodsb
- orb %al, %al
- jz lmuse1
-
- subb $0x30, %al
- jc lmbad
-
- cmpb $10, %al
- jc lmhx1
-
- subb $7, %al
- andb $0xdf, %al
- cmpb $10, %al
- jc lmbad
-
- cmpb $16, %al
- jnc lmbad
-
-lmhx1: shlw $4, %bx
- orb %al, %bl
- jmp lmhex
-
-lmuse1: movw %bx, %ax
- jmp lmuse
-
-mnusel: lodsb # Menu selection
- xorb %ah, %ah
- subb $0x30, %al
- jc lmbad
-
- cmpb $10, %al
- jc lmuse
-
- cmpb $0x61-0x30, %al
- jc lmbad
-
- subb $0x61-0x30-10, %al
- cmpb $36, %al
- jnc lmbad
-
-lmuse: call mode_set
- jc lmdef
-
-lmbad: leaw unknt, %si
- call prtstr
- jmp lm2
-lmscan: cmpb $0, adapter # Scanning only on EGA/VGA
- jz lmbad
-
- movw $0, mt_end # Scanning of modes is
- movb $1, scanning # done as new autodetection.
- call mode_table
- jmp listm0
-lmdef: ret
-
-# Additional parts of mode_set... (relative jumps, you know)
-setv7: # Video7 extended modes
- DO_STORE
- subb $VIDEO_FIRST_V7>>8, %bh
- movw $0x6f05, %ax
- int $0x10
- stc
- ret
-
-_setrec: jmp setrec # Ugly...
-_set_80x25: jmp set_80x25
-
-# Aliases for backward compatibility.
-setalias:
- movw $VIDEO_80x25, %ax
- incw %bx
- jz mode_set
-
- movb $VIDEO_8POINT-VIDEO_FIRST_SPECIAL, %al
- incw %bx
- jnz setbad # Fall-through!
-
-# Setting of user mode (AX=mode ID) => CF=success
-mode_set:
- movw %ax, %fs:(0x01fa) # Store mode for use in acpi_wakeup.S
- movw %ax, %bx
- cmpb $0xff, %ah
- jz setalias
-
- testb $VIDEO_RECALC>>8, %ah
- jnz _setrec
-
- cmpb $VIDEO_FIRST_RESOLUTION>>8, %ah
- jnc setres
-
- cmpb $VIDEO_FIRST_SPECIAL>>8, %ah
- jz setspc
-
- cmpb $VIDEO_FIRST_V7>>8, %ah
- jz setv7
-
- cmpb $VIDEO_FIRST_VESA>>8, %ah
- jnc check_vesa
-
- orb %ah, %ah
- jz setmenu
-
- decb %ah
- jz setbios
-
-setbad: clc
- movb $0, do_restore # The screen needn't be restored
- ret
-
-setvesa:
- DO_STORE
- subb $VIDEO_FIRST_VESA>>8, %bh
- movw $0x4f02, %ax # VESA BIOS mode set call
- int $0x10
- cmpw $0x004f, %ax # AL=4f if implemented
- jnz setbad # AH=0 if OK
-
- stc
- ret
-
-setbios:
- DO_STORE
- int $0x10 # Standard BIOS mode set call
- pushw %bx
- movb $0x0f, %ah # Check if really set
- int $0x10
- popw %bx
- cmpb %bl, %al
- jnz setbad
-
- stc
- ret
-
-setspc: xorb %bh, %bh # Set special mode
- cmpb $VIDEO_LAST_SPECIAL-VIDEO_FIRST_SPECIAL, %bl
- jnc setbad
-
- addw %bx, %bx
- jmp *spec_inits(%bx)
-
-setmenu:
- orb %al, %al # 80x25 is an exception
- jz _set_80x25
-
- pushw %bx # Set mode chosen from menu
- call mode_table # Build the mode table
- popw %ax
- shlw $2, %ax
- addw %ax, %si
- cmpw %di, %si
- jnc setbad
-
- movw (%si), %ax # Fetch mode ID
-_m_s: jmp mode_set
-
-setres: pushw %bx # Set mode chosen by resolution
- call mode_table
- popw %bx
- xchgb %bl, %bh
-setr1: lodsw
- cmpw $ASK_VGA, %ax # End of the list?
- jz setbad
-
- lodsw
- cmpw %bx, %ax
- jnz setr1
-
- movw -4(%si), %ax # Fetch mode ID
- jmp _m_s
-
-check_vesa:
-#ifdef CONFIG_FIRMWARE_EDID
- leaw modelist+1024, %di
- movw $0x4f00, %ax
- int $0x10
- cmpw $0x004f, %ax
- jnz setbad
-
- movw 4(%di), %ax
- movw %ax, vbe_version
-#endif
- leaw modelist+1024, %di
- subb $VIDEO_FIRST_VESA>>8, %bh
- movw %bx, %cx # Get mode information structure
- movw $0x4f01, %ax
- int $0x10
- addb $VIDEO_FIRST_VESA>>8, %bh
- cmpw $0x004f, %ax
- jnz setbad
-
- movb (%di), %al # Check capabilities.
- andb $0x19, %al
- cmpb $0x09, %al
- jz setvesa # This is a text mode
-
- movb (%di), %al # Check capabilities.
- andb $0x99, %al
- cmpb $0x99, %al
- jnz _setbad # Doh! No linear frame buffer.
-
- subb $VIDEO_FIRST_VESA>>8, %bh
- orw $0x4000, %bx # Use linear frame buffer
- movw $0x4f02, %ax # VESA BIOS mode set call
- int $0x10
- cmpw $0x004f, %ax # AL=4f if implemented
- jnz _setbad # AH=0 if OK
-
- movb $1, graphic_mode # flag graphic mode
- movb $0, do_restore # no screen restore
- stc
- ret
-
-_setbad: jmp setbad # Ugly...
-
-# Recalculate vertical display end registers -- this fixes various
-# inconsistencies of extended modes on many adapters. Called when
-# the VIDEO_RECALC flag is set in the mode ID.
-
-setrec: subb $VIDEO_RECALC>>8, %ah # Set the base mode
- call mode_set
- jnc rct3
-
- movw %gs:(0x485), %ax # Font size in pixels
- movb %gs:(0x484), %bl # Number of rows
- incb %bl
- mulb %bl # Number of visible
- decw %ax # scan lines - 1
- movw $0x3d4, %dx
- movw %ax, %bx
- movb $0x12, %al # Lower 8 bits
- movb %bl, %ah
- outw %ax, %dx
- movb $0x07, %al # Bits 8 and 9 in the overflow register
- call inidx
- xchgb %al, %ah
- andb $0xbd, %ah
- shrb %bh
- jnc rct1
- orb $0x02, %ah
-rct1: shrb %bh
- jnc rct2
- orb $0x40, %ah
-rct2: movb $0x07, %al
- outw %ax, %dx
- stc
-rct3: ret
-
-# Table of routines for setting of the special modes.
-spec_inits:
- .word set_80x25
- .word set_8pixel
- .word set_80x43
- .word set_80x28
- .word set_current
- .word set_80x30
- .word set_80x34
- .word set_80x60
- .word set_gfx
-
-# Set the 80x25 mode. If already set, do nothing.
-set_80x25:
- movw $0x5019, force_size # Override possibly broken BIOS
-use_80x25:
-#ifdef CONFIG_VIDEO_400_HACK
- movw $0x1202, %ax # Force 400 scan lines
- movb $0x30, %bl
- int $0x10
-#else
- movb $0x0f, %ah # Get current mode ID
- int $0x10
- cmpw $0x5007, %ax # Mode 7 (80x25 mono) is the only one available
- jz st80 # on CGA/MDA/HGA and is also available on EGAM
-
- cmpw $0x5003, %ax # Unknown mode, force 80x25 color
- jnz force3
-
-st80: cmpb $0, adapter # CGA/MDA/HGA => mode 3/7 is always 80x25
- jz set80
-
- movb %gs:(0x0484), %al # This is EGA+ -- beware of 80x50 etc.
- orb %al, %al # Some buggy BIOS'es set 0 rows
- jz set80
-
- cmpb $24, %al # It's hopefully correct
- jz set80
-#endif /* CONFIG_VIDEO_400_HACK */
-force3: DO_STORE
- movw $0x0003, %ax # Forced set
- int $0x10
-set80: stc
- ret
-
-# Set the 80x50/80x43 8-pixel mode. Simple BIOS calls.
-set_8pixel:
- DO_STORE
- call use_80x25 # The base is 80x25
-set_8pt:
- movw $0x1112, %ax # Use 8x8 font
- xorb %bl, %bl
- int $0x10
- movw $0x1200, %ax # Use alternate print screen
- movb $0x20, %bl
- int $0x10
- movw $0x1201, %ax # Turn off cursor emulation
- movb $0x34, %bl
- int $0x10
- movb $0x01, %ah # Define cursor scan lines 6-7
- movw $0x0607, %cx
- int $0x10
-set_current:
- stc
- ret
-
-# Set the 80x28 mode. This mode works on all VGA's, because it's a standard
-# 80x25 mode with 14-point fonts instead of 16-point.
-set_80x28:
- DO_STORE
- call use_80x25 # The base is 80x25
-set14: movw $0x1111, %ax # Use 9x14 font
- xorb %bl, %bl
- int $0x10
- movb $0x01, %ah # Define cursor scan lines 11-12
- movw $0x0b0c, %cx
- int $0x10
- stc
- ret
-
-# Set the 80x43 mode. This mode is works on all VGA's.
-# It's a 350-scanline mode with 8-pixel font.
-set_80x43:
- DO_STORE
- movw $0x1201, %ax # Set 350 scans
- movb $0x30, %bl
- int $0x10
- movw $0x0003, %ax # Reset video mode
- int $0x10
- jmp set_8pt # Use 8-pixel font
-
-# Set the 80x30 mode (all VGA's). 480 scanlines, 16-pixel font.
-set_80x30:
- call use_80x25 # Start with real 80x25
- DO_STORE
- movw $0x3cc, %dx # Get CRTC port
- inb %dx, %al
- movb $0xd4, %dl
- rorb %al # Mono or color?
- jc set48a
-
- movb $0xb4, %dl
-set48a: movw $0x0c11, %ax # Vertical sync end (also unlocks CR0-7)
- call outidx
- movw $0x0b06, %ax # Vertical total
- call outidx
- movw $0x3e07, %ax # (Vertical) overflow
- call outidx
- movw $0xea10, %ax # Vertical sync start
- call outidx
- movw $0xdf12, %ax # Vertical display end
- call outidx
- movw $0xe715, %ax # Vertical blank start
- call outidx
- movw $0x0416, %ax # Vertical blank end
- call outidx
- pushw %dx
- movb $0xcc, %dl # Misc output register (read)
- inb %dx, %al
- movb $0xc2, %dl # (write)
- andb $0x0d, %al # Preserve clock select bits and color bit
- orb $0xe2, %al # Set correct sync polarity
- outb %al, %dx
- popw %dx
- movw $0x501e, force_size
- stc # That's all.
- ret
-
-# Set the 80x34 mode (all VGA's). 480 scans, 14-pixel font.
-set_80x34:
- call set_80x30 # Set 480 scans
- call set14 # And 14-pt font
- movw $0xdb12, %ax # VGA vertical display end
- movw $0x5022, force_size
-setvde: call outidx
- stc
- ret
-
-# Set the 80x60 mode (all VGA's). 480 scans, 8-pixel font.
-set_80x60:
- call set_80x30 # Set 480 scans
- call set_8pt # And 8-pt font
- movw $0xdf12, %ax # VGA vertical display end
- movw $0x503c, force_size
- jmp setvde
-
-# Special hack for ThinkPad graphics
-set_gfx:
-#ifdef CONFIG_VIDEO_GFX_HACK
- movw $VIDEO_GFX_BIOS_AX, %ax
- movw $VIDEO_GFX_BIOS_BX, %bx
- int $0x10
- movw $VIDEO_GFX_DUMMY_RESOLUTION, force_size
- stc
-#endif
- ret
-
-#ifdef CONFIG_VIDEO_RETAIN
-
-# Store screen contents to temporary buffer.
-store_screen:
- cmpb $0, do_restore # Already stored?
- jnz stsr
-
- testb $CAN_USE_HEAP, loadflags # Have we space for storing?
- jz stsr
-
- pushw %ax
- pushw %bx
- pushw force_size # Don't force specific size
- movw $0, force_size
- call mode_params # Obtain params of current mode
- popw force_size
- movb %fs:(PARAM_VIDEO_LINES), %ah
- movb %fs:(PARAM_VIDEO_COLS), %al
- movw %ax, %bx # BX=dimensions
- mulb %ah
- movw %ax, %cx # CX=number of characters
- addw %ax, %ax # Calculate image size
- addw $modelist+1024+4, %ax
- cmpw heap_end_ptr, %ax
- jnc sts1 # Unfortunately, out of memory
-
- movw %fs:(PARAM_CURSOR_POS), %ax # Store mode params
- leaw modelist+1024, %di
- stosw
- movw %bx, %ax
- stosw
- pushw %ds # Store the screen
- movw video_segment, %ds
- xorw %si, %si
- rep
- movsw
- popw %ds
- incb do_restore # Screen will be restored later
-sts1: popw %bx
- popw %ax
-stsr: ret
-
-# Restore screen contents from temporary buffer.
-restore_screen:
- cmpb $0, do_restore # Has the screen been stored?
- jz res1
-
- call mode_params # Get parameters of current mode
- movb %fs:(PARAM_VIDEO_LINES), %cl
- movb %fs:(PARAM_VIDEO_COLS), %ch
- leaw modelist+1024, %si # Screen buffer
- lodsw # Set cursor position
- movw %ax, %dx
- cmpb %cl, %dh
- jc res2
-
- movb %cl, %dh
- decb %dh
-res2: cmpb %ch, %dl
- jc res3
-
- movb %ch, %dl
- decb %dl
-res3: movb $0x02, %ah
- movb $0x00, %bh
- int $0x10
- lodsw # Display size
- movb %ah, %dl # DL=number of lines
- movb $0, %ah # BX=phys. length of orig. line
- movw %ax, %bx
- cmpb %cl, %dl # Too many?
- jc res4
-
- pushw %ax
- movb %dl, %al
- subb %cl, %al
- mulb %bl
- addw %ax, %si
- addw %ax, %si
- popw %ax
- movb %cl, %dl
-res4: cmpb %ch, %al # Too wide?
- jc res5
-
- movb %ch, %al # AX=width of src. line
-res5: movb $0, %cl
- xchgb %ch, %cl
- movw %cx, %bp # BP=width of dest. line
- pushw %es
- movw video_segment, %es
- xorw %di, %di # Move the data
- addw %bx, %bx # Convert BX and BP to _bytes_
- addw %bp, %bp
-res6: pushw %si
- pushw %di
- movw %ax, %cx
- rep
- movsw
- popw %di
- popw %si
- addw %bp, %di
- addw %bx, %si
- decb %dl
- jnz res6
-
- popw %es # Done
-res1: ret
-#endif /* CONFIG_VIDEO_RETAIN */
-
-# Write to indexed VGA register (AL=index, AH=data, DX=index reg. port)
-outidx: outb %al, %dx
- pushw %ax
- movb %ah, %al
- incw %dx
- outb %al, %dx
- decw %dx
- popw %ax
- ret
-
-# Build the table of video modes (stored after the setup.S code at the
-# `modelist' label. Each video mode record looks like:
-# .word MODE-ID (our special mode ID (see above))
-# .byte rows (number of rows)
-# .byte columns (number of columns)
-# Returns address of the end of the table in DI, the end is marked
-# with a ASK_VGA ID.
-mode_table:
- movw mt_end, %di # Already filled?
- orw %di, %di
- jnz mtab1x
-
- leaw modelist, %di # Store standard modes:
- movl $VIDEO_80x25 + 0x50190000, %eax # The 80x25 mode (ALL)
- stosl
- movb adapter, %al # CGA/MDA/HGA -- no more modes
- orb %al, %al
- jz mtabe
-
- decb %al
- jnz mtabv
-
- movl $VIDEO_8POINT + 0x502b0000, %eax # The 80x43 EGA mode
- stosl
- jmp mtabe
-
-mtab1x: jmp mtab1
-
-mtabv: leaw vga_modes, %si # All modes for std VGA
- movw $vga_modes_end-vga_modes, %cx
- rep # I'm unable to use movsw as I don't know how to store a half
- movsb # of the expression above to cx without using explicit shr.
-
- cmpb $0, scanning # Mode scan requested?
- jz mscan1
-
- call mode_scan
-mscan1:
-
-#ifdef CONFIG_VIDEO_LOCAL
- call local_modes
-#endif /* CONFIG_VIDEO_LOCAL */
-
-#ifdef CONFIG_VIDEO_VESA
- call vesa_modes # Detect VESA VGA modes
-#endif /* CONFIG_VIDEO_VESA */
-
-#ifdef CONFIG_VIDEO_SVGA
- cmpb $0, scanning # Bypass when scanning
- jnz mscan2
-
- call svga_modes # Detect SVGA cards & modes
-mscan2:
-#endif /* CONFIG_VIDEO_SVGA */
-
-mtabe:
-
-#ifdef CONFIG_VIDEO_COMPACT
- leaw modelist, %si
- movw %di, %dx
- movw %si, %di
-cmt1: cmpw %dx, %si # Scan all modes
- jz cmt2
-
- leaw modelist, %bx # Find in previous entries
- movw 2(%si), %cx
-cmt3: cmpw %bx, %si
- jz cmt4
-
- cmpw 2(%bx), %cx # Found => don't copy this entry
- jz cmt5
-
- addw $4, %bx
- jmp cmt3
-
-cmt4: movsl # Copy entry
- jmp cmt1
-
-cmt5: addw $4, %si # Skip entry
- jmp cmt1
-
-cmt2:
-#endif /* CONFIG_VIDEO_COMPACT */
-
- movw $ASK_VGA, (%di) # End marker
- movw %di, mt_end
-mtab1: leaw modelist, %si # SI=mode list, DI=list end
-ret0: ret
-
-# Modes usable on all standard VGAs
-vga_modes:
- .word VIDEO_8POINT
- .word 0x5032 # 80x50
- .word VIDEO_80x43
- .word 0x502b # 80x43
- .word VIDEO_80x28
- .word 0x501c # 80x28
- .word VIDEO_80x30
- .word 0x501e # 80x30
- .word VIDEO_80x34
- .word 0x5022 # 80x34
- .word VIDEO_80x60
- .word 0x503c # 80x60
-#ifdef CONFIG_VIDEO_GFX_HACK
- .word VIDEO_GFX_HACK
- .word VIDEO_GFX_DUMMY_RESOLUTION
-#endif
-
-vga_modes_end:
-# Detect VESA modes.
-
-#ifdef CONFIG_VIDEO_VESA
-vesa_modes:
- cmpb $2, adapter # VGA only
- jnz ret0
-
- movw %di, %bp # BP=original mode table end
- addw $0x200, %di # Buffer space
- movw $0x4f00, %ax # VESA Get card info call
- int $0x10
- movw %bp, %di
- cmpw $0x004f, %ax # Successful?
- jnz ret0
-
- cmpw $0x4556, 0x200(%di)
- jnz ret0
-
- cmpw $0x4153, 0x202(%di)
- jnz ret0
-
- movw $vesa_name, card_name # Set name to "VESA VGA"
- pushw %gs
- lgsw 0x20e(%di), %si # GS:SI=mode list
- movw $128, %cx # Iteration limit
-vesa1:
-# gas version 2.9.1, using BFD version 2.9.1.0.23 buggers the next inst.
-# XXX: lodsw %gs:(%si), %ax # Get next mode in the list
- gs; lodsw
- cmpw $0xffff, %ax # End of the table?
- jz vesar
-
- cmpw $0x0080, %ax # Check validity of mode ID
- jc vesa2
-
- orb %ah, %ah # Valid IDs: 0x0000-0x007f/0x0100-0x07ff
- jz vesan # Certain BIOSes report 0x80-0xff!
-
- cmpw $0x0800, %ax
- jnc vesae
-
-vesa2: pushw %cx
- movw %ax, %cx # Get mode information structure
- movw $0x4f01, %ax
- int $0x10
- movw %cx, %bx # BX=mode number
- addb $VIDEO_FIRST_VESA>>8, %bh
- popw %cx
- cmpw $0x004f, %ax
- jnz vesan # Don't report errors (buggy BIOSES)
-
- movb (%di), %al # Check capabilities. We require
- andb $0x19, %al # a color text mode.
- cmpb $0x09, %al
- jnz vesan
-
- cmpw $0xb800, 8(%di) # Standard video memory address required
- jnz vesan
-
- testb $2, (%di) # Mode characteristics supplied?
- movw %bx, (%di) # Store mode number
- jz vesa3
-
- xorw %dx, %dx
- movw 0x12(%di), %bx # Width
- orb %bh, %bh
- jnz vesan
-
- movb %bl, 0x3(%di)
- movw 0x14(%di), %ax # Height
- orb %ah, %ah
- jnz vesan
-
- movb %al, 2(%di)
- mulb %bl
- cmpw $8193, %ax # Small enough for Linux console driver?
- jnc vesan
-
- jmp vesaok
-
-vesa3: subw $0x8108, %bx # This mode has no detailed info specified,
- jc vesan # so it must be a standard VESA mode.
-
- cmpw $5, %bx
- jnc vesan
-
- movw vesa_text_mode_table(%bx), %ax
- movw %ax, 2(%di)
-vesaok: addw $4, %di # The mode is valid. Store it.
-vesan: loop vesa1 # Next mode. Limit exceeded => error
-vesae: leaw vesaer, %si
- call prtstr
- movw %bp, %di # Discard already found modes.
-vesar: popw %gs
- ret
-
-# Dimensions of standard VESA text modes
-vesa_text_mode_table:
- .byte 60, 80 # 0108
- .byte 25, 132 # 0109
- .byte 43, 132 # 010A
- .byte 50, 132 # 010B
- .byte 60, 132 # 010C
-#endif /* CONFIG_VIDEO_VESA */
-
-# Scan for video modes. A bit dirty, but should work.
-mode_scan:
- movw $0x0100, %cx # Start with mode 0
-scm1: movb $0, %ah # Test the mode
- movb %cl, %al
- int $0x10
- movb $0x0f, %ah
- int $0x10
- cmpb %cl, %al
- jnz scm2 # Mode not set
-
- movw $0x3c0, %dx # Test if it's a text mode
- movb $0x10, %al # Mode bits
- call inidx
- andb $0x03, %al
- jnz scm2
-
- movb $0xce, %dl # Another set of mode bits
- movb $0x06, %al
- call inidx
- shrb %al
- jc scm2
-
- movb $0xd4, %dl # Cursor location
- movb $0x0f, %al
- call inidx
- orb %al, %al
- jnz scm2
-
- movw %cx, %ax # Ok, store the mode
- stosw
- movb %gs:(0x484), %al # Number of rows
- incb %al
- stosb
- movw %gs:(0x44a), %ax # Number of columns
- stosb
-scm2: incb %cl
- jns scm1
-
- movw $0x0003, %ax # Return back to mode 3
- int $0x10
- ret
-
-tstidx: outw %ax, %dx # OUT DX,AX and inidx
-inidx: outb %al, %dx # Read from indexed VGA register
- incw %dx # AL=index, DX=index reg port -> AL=data
- inb %dx, %al
- decw %dx
- ret
-
-# Try to detect type of SVGA card and supply (usually approximate) video
-# mode table for it.
-
-#ifdef CONFIG_VIDEO_SVGA
-svga_modes:
- leaw svga_table, %si # Test all known SVGA adapters
-dosvga: lodsw
- movw %ax, %bp # Default mode table
- orw %ax, %ax
- jz didsv1
-
- lodsw # Pointer to test routine
- pushw %si
- pushw %di
- pushw %es
- movw $0xc000, %bx
- movw %bx, %es
- call *%ax # Call test routine
- popw %es
- popw %di
- popw %si
- orw %bp, %bp
- jz dosvga
-
- movw %bp, %si # Found, copy the modes
- movb svga_prefix, %ah
-cpsvga: lodsb
- orb %al, %al
- jz didsv
-
- stosw
- movsw
- jmp cpsvga
-
-didsv: movw %si, card_name # Store pointer to card name
-didsv1: ret
-
-# Table of all known SVGA cards. For each card, we store a pointer to
-# a table of video modes supported by the card and a pointer to a routine
-# used for testing of presence of the card. The video mode table is always
-# followed by the name of the card or the chipset.
-svga_table:
- .word ati_md, ati_test
- .word oak_md, oak_test
- .word paradise_md, paradise_test
- .word realtek_md, realtek_test
- .word s3_md, s3_test
- .word chips_md, chips_test
- .word video7_md, video7_test
- .word cirrus5_md, cirrus5_test
- .word cirrus6_md, cirrus6_test
- .word cirrus1_md, cirrus1_test
- .word ahead_md, ahead_test
- .word everex_md, everex_test
- .word genoa_md, genoa_test
- .word trident_md, trident_test
- .word tseng_md, tseng_test
- .word 0
-
-# Test routines and mode tables:
-
-# S3 - The test algorithm was taken from the SuperProbe package
-# for XFree86 1.2.1. Report bugs to Christoph.Niemann@linux.org
-s3_test:
- movw $0x0f35, %cx # we store some constants in cl/ch
- movw $0x03d4, %dx
- movb $0x38, %al
- call inidx
- movb %al, %bh # store current CRT-register 0x38
- movw $0x0038, %ax
- call outidx # disable writing to special regs
- movb %cl, %al # check whether we can write special reg 0x35
- call inidx
- movb %al, %bl # save the current value of CRT reg 0x35
- andb $0xf0, %al # clear bits 0-3
- movb %al, %ah
- movb %cl, %al # and write it to CRT reg 0x35
- call outidx
- call inidx # now read it back
- andb %ch, %al # clear the upper 4 bits
- jz s3_2 # the first test failed. But we have a
-
- movb %bl, %ah # second chance
- movb %cl, %al
- call outidx
- jmp s3_1 # do the other tests
-
-s3_2: movw %cx, %ax # load ah with 0xf and al with 0x35
- orb %bl, %ah # set the upper 4 bits of ah with the orig value
- call outidx # write ...
- call inidx # ... and reread
- andb %cl, %al # turn off the upper 4 bits
- pushw %ax
- movb %bl, %ah # restore old value in register 0x35
- movb %cl, %al
- call outidx
- popw %ax
- cmpb %ch, %al # setting lower 4 bits was successful => bad
- je no_s3 # writing is allowed => this is not an S3
-
-s3_1: movw $0x4838, %ax # allow writing to special regs by putting
- call outidx # magic number into CRT-register 0x38
- movb %cl, %al # check whether we can write special reg 0x35
- call inidx
- movb %al, %bl
- andb $0xf0, %al
- movb %al, %ah
- movb %cl, %al
- call outidx
- call inidx
- andb %ch, %al
- jnz no_s3 # no, we can't write => no S3
-
- movw %cx, %ax
- orb %bl, %ah
- call outidx
- call inidx
- andb %ch, %al
- pushw %ax
- movb %bl, %ah # restore old value in register 0x35
- movb %cl, %al
- call outidx
- popw %ax
- cmpb %ch, %al
- jne no_s31 # writing not possible => no S3
- movb $0x30, %al
- call inidx # now get the S3 id ...
- leaw idS3, %di
- movw $0x10, %cx
- repne
- scasb
- je no_s31
-
- movb %bh, %ah
- movb $0x38, %al
- jmp s3rest
-
-no_s3: movb $0x35, %al # restore CRT register 0x35
- movb %bl, %ah
- call outidx
-no_s31: xorw %bp, %bp # Detection failed
-s3rest: movb %bh, %ah
- movb $0x38, %al # restore old value of CRT register 0x38
- jmp outidx
-
-idS3: .byte 0x81, 0x82, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95
- .byte 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa8, 0xb0
-
-s3_md: .byte 0x54, 0x2b, 0x84
- .byte 0x55, 0x19, 0x84
- .byte 0
- .ascii "S3"
- .byte 0
-
-# ATI cards.
-ati_test:
- leaw idati, %si
- movw $0x31, %di
- movw $0x09, %cx
- repe
- cmpsb
- je atiok
-
- xorw %bp, %bp
-atiok: ret
-
-idati: .ascii "761295520"
-
-ati_md: .byte 0x23, 0x19, 0x84
- .byte 0x33, 0x2c, 0x84
- .byte 0x22, 0x1e, 0x64
- .byte 0x21, 0x19, 0x64
- .byte 0x58, 0x21, 0x50
- .byte 0x5b, 0x1e, 0x50
- .byte 0
- .ascii "ATI"
- .byte 0
-
-# AHEAD
-ahead_test:
- movw $0x200f, %ax
- movw $0x3ce, %dx
- outw %ax, %dx
- incw %dx
- inb %dx, %al
- cmpb $0x20, %al
- je isahed
-
- cmpb $0x21, %al
- je isahed
-
- xorw %bp, %bp
-isahed: ret
-
-ahead_md:
- .byte 0x22, 0x2c, 0x84
- .byte 0x23, 0x19, 0x84
- .byte 0x24, 0x1c, 0x84
- .byte 0x2f, 0x32, 0xa0
- .byte 0x32, 0x22, 0x50
- .byte 0x34, 0x42, 0x50
- .byte 0
- .ascii "Ahead"
- .byte 0
-
-# Chips & Tech.
-chips_test:
- movw $0x3c3, %dx
- inb %dx, %al
- orb $0x10, %al
- outb %al, %dx
- movw $0x104, %dx
- inb %dx, %al
- movb %al, %bl
- movw $0x3c3, %dx
- inb %dx, %al
- andb $0xef, %al
- outb %al, %dx
- cmpb $0xa5, %bl
- je cantok
-
- xorw %bp, %bp
-cantok: ret
-
-chips_md:
- .byte 0x60, 0x19, 0x84
- .byte 0x61, 0x32, 0x84
- .byte 0
- .ascii "Chips & Technologies"
- .byte 0
-
-# Cirrus Logic 5X0
-cirrus1_test:
- movw $0x3d4, %dx
- movb $0x0c, %al
- outb %al, %dx
- incw %dx
- inb %dx, %al
- movb %al, %bl
- xorb %al, %al
- outb %al, %dx
- decw %dx
- movb $0x1f, %al
- outb %al, %dx
- incw %dx
- inb %dx, %al
- movb %al, %bh
- xorb %ah, %ah
- shlb $4, %al
- movw %ax, %cx
- movb %bh, %al
- shrb $4, %al
- addw %ax, %cx
- shlw $8, %cx
- addw $6, %cx
- movw %cx, %ax
- movw $0x3c4, %dx
- outw %ax, %dx
- incw %dx
- inb %dx, %al
- andb %al, %al
- jnz nocirr
-
- movb %bh, %al
- outb %al, %dx
- inb %dx, %al
- cmpb $0x01, %al
- je iscirr
-
-nocirr: xorw %bp, %bp
-iscirr: movw $0x3d4, %dx
- movb %bl, %al
- xorb %ah, %ah
- shlw $8, %ax
- addw $0x0c, %ax
- outw %ax, %dx
- ret
-
-cirrus1_md:
- .byte 0x1f, 0x19, 0x84
- .byte 0x20, 0x2c, 0x84
- .byte 0x22, 0x1e, 0x84
- .byte 0x31, 0x25, 0x64
- .byte 0
- .ascii "Cirrus Logic 5X0"
- .byte 0
-
-# Cirrus Logic 54XX
-cirrus5_test:
- movw $0x3c4, %dx
- movb $6, %al
- call inidx
- movb %al, %bl # BL=backup
- movw $6, %ax
- call tstidx
- cmpb $0x0f, %al
- jne c5fail
-
- movw $0x1206, %ax
- call tstidx
- cmpb $0x12, %al
- jne c5fail
-
- movb $0x1e, %al
- call inidx
- movb %al, %bh
- movb %bh, %ah
- andb $0xc0, %ah
- movb $0x1e, %al
- call tstidx
- andb $0x3f, %al
- jne c5xx
-
- movb $0x1e, %al
- movb %bh, %ah
- orb $0x3f, %ah
- call tstidx
- xorb $0x3f, %al
- andb $0x3f, %al
-c5xx: pushf
- movb $0x1e, %al
- movb %bh, %ah
- outw %ax, %dx
- popf
- je c5done
-
-c5fail: xorw %bp, %bp
-c5done: movb $6, %al
- movb %bl, %ah
- outw %ax, %dx
- ret
-
-cirrus5_md:
- .byte 0x14, 0x19, 0x84
- .byte 0x54, 0x2b, 0x84
- .byte 0
- .ascii "Cirrus Logic 54XX"
- .byte 0
-
-# Cirrus Logic 64XX -- no known extra modes, but must be identified, because
-# it's misidentified by the Ahead test.
-cirrus6_test:
- movw $0x3ce, %dx
- movb $0x0a, %al
- call inidx
- movb %al, %bl # BL=backup
- movw $0xce0a, %ax
- call tstidx
- orb %al, %al
- jne c2fail
-
- movw $0xec0a, %ax
- call tstidx
- cmpb $0x01, %al
- jne c2fail
-
- movb $0xaa, %al
- call inidx # 4X, 5X, 7X and 8X are valid 64XX chip ID's.
- shrb $4, %al
- subb $4, %al
- jz c6done
-
- decb %al
- jz c6done
-
- subb $2, %al
- jz c6done
-
- decb %al
- jz c6done
-
-c2fail: xorw %bp, %bp
-c6done: movb $0x0a, %al
- movb %bl, %ah
- outw %ax, %dx
- ret
-
-cirrus6_md:
- .byte 0
- .ascii "Cirrus Logic 64XX"
- .byte 0
-
-# Everex / Trident
-everex_test:
- movw $0x7000, %ax
- xorw %bx, %bx
- int $0x10
- cmpb $0x70, %al
- jne noevrx
-
- shrw $4, %dx
- cmpw $0x678, %dx
- je evtrid
-
- cmpw $0x236, %dx
- jne evrxok
-
-evtrid: leaw trident_md, %bp
-evrxok: ret
-
-noevrx: xorw %bp, %bp
- ret
-
-everex_md:
- .byte 0x03, 0x22, 0x50
- .byte 0x04, 0x3c, 0x50
- .byte 0x07, 0x2b, 0x64
- .byte 0x08, 0x4b, 0x64
- .byte 0x0a, 0x19, 0x84
- .byte 0x0b, 0x2c, 0x84
- .byte 0x16, 0x1e, 0x50
- .byte 0x18, 0x1b, 0x64
- .byte 0x21, 0x40, 0xa0
- .byte 0x40, 0x1e, 0x84
- .byte 0
- .ascii "Everex/Trident"
- .byte 0
-
-# Genoa.
-genoa_test:
- leaw idgenoa, %si # Check Genoa 'clues'
- xorw %ax, %ax
- movb %es:(0x37), %al
- movw %ax, %di
- movw $0x04, %cx
- decw %si
- decw %di
-l1: incw %si
- incw %di
- movb (%si), %al
- testb %al, %al
- jz l2
-
- cmpb %es:(%di), %al
-l2: loope l1
- orw %cx, %cx
- je isgen
-
- xorw %bp, %bp
-isgen: ret
-
-idgenoa: .byte 0x77, 0x00, 0x99, 0x66
-
-genoa_md:
- .byte 0x58, 0x20, 0x50
- .byte 0x5a, 0x2a, 0x64
- .byte 0x60, 0x19, 0x84
- .byte 0x61, 0x1d, 0x84
- .byte 0x62, 0x20, 0x84
- .byte 0x63, 0x2c, 0x84
- .byte 0x64, 0x3c, 0x84
- .byte 0x6b, 0x4f, 0x64
- .byte 0x72, 0x3c, 0x50
- .byte 0x74, 0x42, 0x50
- .byte 0x78, 0x4b, 0x64
- .byte 0
- .ascii "Genoa"
- .byte 0
-
-# OAK
-oak_test:
- leaw idoakvga, %si
- movw $0x08, %di
- movw $0x08, %cx
- repe
- cmpsb
- je isoak
-
- xorw %bp, %bp
-isoak: ret
-
-idoakvga: .ascii "OAK VGA "
-
-oak_md: .byte 0x4e, 0x3c, 0x50
- .byte 0x4f, 0x3c, 0x84
- .byte 0x50, 0x19, 0x84
- .byte 0x51, 0x2b, 0x84
- .byte 0
- .ascii "OAK"
- .byte 0
-
-# WD Paradise.
-paradise_test:
- leaw idparadise, %si
- movw $0x7d, %di
- movw $0x04, %cx
- repe
- cmpsb
- je ispara
-
- xorw %bp, %bp
-ispara: ret
-
-idparadise: .ascii "VGA="
-
-paradise_md:
- .byte 0x41, 0x22, 0x50
- .byte 0x47, 0x1c, 0x84
- .byte 0x55, 0x19, 0x84
- .byte 0x54, 0x2c, 0x84
- .byte 0
- .ascii "Paradise"
- .byte 0
-
-# Trident.
-trident_test:
- movw $0x3c4, %dx
- movb $0x0e, %al
- outb %al, %dx
- incw %dx
- inb %dx, %al
- xchgb %al, %ah
- xorb %al, %al
- outb %al, %dx
- inb %dx, %al
- xchgb %ah, %al
- movb %al, %bl # Strange thing ... in the book this wasn't
- andb $0x02, %bl # necessary but it worked on my card which
- jz setb2 # is a trident. Without it the screen goes
- # blurred ...
- andb $0xfd, %al
- jmp clrb2
-
-setb2: orb $0x02, %al
-clrb2: outb %al, %dx
- andb $0x0f, %ah
- cmpb $0x02, %ah
- je istrid
-
- xorw %bp, %bp
-istrid: ret
-
-trident_md:
- .byte 0x50, 0x1e, 0x50
- .byte 0x51, 0x2b, 0x50
- .byte 0x52, 0x3c, 0x50
- .byte 0x57, 0x19, 0x84
- .byte 0x58, 0x1e, 0x84
- .byte 0x59, 0x2b, 0x84
- .byte 0x5a, 0x3c, 0x84
- .byte 0
- .ascii "Trident"
- .byte 0
-
-# Tseng.
-tseng_test:
- movw $0x3cd, %dx
- inb %dx, %al # Could things be this simple ! :-)
- movb %al, %bl
- movb $0x55, %al
- outb %al, %dx
- inb %dx, %al
- movb %al, %ah
- movb %bl, %al
- outb %al, %dx
- cmpb $0x55, %ah
- je istsen
-
-isnot: xorw %bp, %bp
-istsen: ret
-
-tseng_md:
- .byte 0x26, 0x3c, 0x50
- .byte 0x2a, 0x28, 0x64
- .byte 0x23, 0x19, 0x84
- .byte 0x24, 0x1c, 0x84
- .byte 0x22, 0x2c, 0x84
- .byte 0x21, 0x3c, 0x84
- .byte 0
- .ascii "Tseng"
- .byte 0
-
-# Video7.
-video7_test:
- movw $0x3cc, %dx
- inb %dx, %al
- movw $0x3b4, %dx
- andb $0x01, %al
- jz even7
-
- movw $0x3d4, %dx
-even7: movb $0x0c, %al
- outb %al, %dx
- incw %dx
- inb %dx, %al
- movb %al, %bl
- movb $0x55, %al
- outb %al, %dx
- inb %dx, %al
- decw %dx
- movb $0x1f, %al
- outb %al, %dx
- incw %dx
- inb %dx, %al
- movb %al, %bh
- decw %dx
- movb $0x0c, %al
- outb %al, %dx
- incw %dx
- movb %bl, %al
- outb %al, %dx
- movb $0x55, %al
- xorb $0xea, %al
- cmpb %bh, %al
- jne isnot
-
- movb $VIDEO_FIRST_V7>>8, svga_prefix # Use special mode switching
- ret
-
-video7_md:
- .byte 0x40, 0x2b, 0x50
- .byte 0x43, 0x3c, 0x50
- .byte 0x44, 0x3c, 0x64
- .byte 0x41, 0x19, 0x84
- .byte 0x42, 0x2c, 0x84
- .byte 0x45, 0x1c, 0x84
- .byte 0
- .ascii "Video 7"
- .byte 0
-
-# Realtek VGA
-realtek_test:
- leaw idrtvga, %si
- movw $0x45, %di
- movw $0x0b, %cx
- repe
- cmpsb
- je isrt
-
- xorw %bp, %bp
-isrt: ret
-
-idrtvga: .ascii "REALTEK VGA"
-
-realtek_md:
- .byte 0x1a, 0x3c, 0x50
- .byte 0x1b, 0x19, 0x84
- .byte 0x1c, 0x1e, 0x84
- .byte 0x1d, 0x2b, 0x84
- .byte 0x1e, 0x3c, 0x84
- .byte 0
- .ascii "REALTEK"
- .byte 0
-
-#endif /* CONFIG_VIDEO_SVGA */
-
-# User-defined local mode table (VGA only)
-#ifdef CONFIG_VIDEO_LOCAL
-local_modes:
- leaw local_mode_table, %si
-locm1: lodsw
- orw %ax, %ax
- jz locm2
-
- stosw
- movsw
- jmp locm1
-
-locm2: ret
-
-# This is the table of local video modes which can be supplied manually
-# by the user. Each entry consists of mode ID (word) and dimensions
-# (byte for column count and another byte for row count). These modes
-# are placed before all SVGA and VESA modes and override them if table
-# compacting is enabled. The table must end with a zero word followed
-# by NUL-terminated video adapter name.
-local_mode_table:
- .word 0x0100 # Example: 40x25
- .byte 25,40
- .word 0
- .ascii "Local"
- .byte 0
-#endif /* CONFIG_VIDEO_LOCAL */
-
-# Read a key and return the ASCII code in al, scan code in ah
-getkey: xorb %ah, %ah
- int $0x16
- ret
-
-# Read a key with a timeout of 30 seconds.
-# The hardware clock is used to get the time.
-getkt: call gettime
- addb $30, %al # Wait 30 seconds
- cmpb $60, %al
- jl lminute
-
- subb $60, %al
-lminute:
- movb %al, %cl
-again: movb $0x01, %ah
- int $0x16
- jnz getkey # key pressed, so get it
-
- call gettime
- cmpb %cl, %al
- jne again
-
- movb $0x20, %al # timeout, return `space'
- ret
-
-# Flush the keyboard buffer
-flush: movb $0x01, %ah
- int $0x16
- jz empty
-
- xorb %ah, %ah
- int $0x16
- jmp flush
-
-empty: ret
-
-# Print hexadecimal number.
-prthw: pushw %ax
- movb %ah, %al
- call prthb
- popw %ax
-prthb: pushw %ax
- shrb $4, %al
- call prthn
- popw %ax
- andb $0x0f, %al
-prthn: cmpb $0x0a, %al
- jc prth1
-
- addb $0x07, %al
-prth1: addb $0x30, %al
- jmp prtchr
-
-# Print decimal number in al
-prtdec: pushw %ax
- pushw %cx
- xorb %ah, %ah
- movb $0x0a, %cl
- idivb %cl
- cmpb $0x09, %al
- jbe lt100
-
- call prtdec
- jmp skip10
-
-lt100: addb $0x30, %al
- call prtchr
-skip10: movb %ah, %al
- addb $0x30, %al
- call prtchr
- popw %cx
- popw %ax
- ret
-
-store_edid:
-#ifdef CONFIG_FIRMWARE_EDID
- pushw %es # just save all registers
- pushw %ax
- pushw %bx
- pushw %cx
- pushw %dx
- pushw %di
-
- pushw %fs
- popw %es
-
- movl $0x13131313, %eax # memset block with 0x13
- movw $32, %cx
- movw $0x140, %di
- cld
- rep
- stosl
-
- cmpw $0x0200, vbe_version # only do EDID on >= VBE2.0
- jl no_edid
-
- pushw %es # save ES
- xorw %di, %di # Report Capability
- pushw %di
- popw %es # ES:DI must be 0:0
- movw $0x4f15, %ax
- xorw %bx, %bx
- xorw %cx, %cx
- int $0x10
- popw %es # restore ES
-
- cmpb $0x00, %ah # call successful
- jne no_edid
-
- cmpb $0x4f, %al # function supported
- jne no_edid
-
- movw $0x4f15, %ax # do VBE/DDC
- movw $0x01, %bx
- movw $0x00, %cx
- movw $0x00, %dx
- movw $0x140, %di
- int $0x10
-
-no_edid:
- popw %di # restore all registers
- popw %dx
- popw %cx
- popw %bx
- popw %ax
- popw %es
-#endif
- ret
-
-# VIDEO_SELECT-only variables
-mt_end: .word 0 # End of video mode table if built
-edit_buf: .space 6 # Line editor buffer
-card_name: .word 0 # Pointer to adapter name
-scanning: .byte 0 # Performing mode scan
-do_restore: .byte 0 # Screen contents altered during mode change
-svga_prefix: .byte VIDEO_FIRST_BIOS>>8 # Default prefix for BIOS modes
-graphic_mode: .byte 0 # Graphic mode with a linear frame buffer
-dac_size: .byte 6 # DAC bit depth
-vbe_version: .word 0 # VBE bios version
-
-# Status messages
-keymsg: .ascii "Press <RETURN> to see video modes available, "
- .ascii "<SPACE> to continue or wait 30 secs"
- .byte 0x0d, 0x0a, 0
-
-listhdr: .byte 0x0d, 0x0a
- .ascii "Mode: COLSxROWS:"
-
-crlft: .byte 0x0d, 0x0a, 0
-
-prompt: .byte 0x0d, 0x0a
- .asciz "Enter mode number or `scan': "
-
-unknt: .asciz "Unknown mode ID. Try again."
-
-badmdt: .ascii "You passed an undefined mode number."
- .byte 0x0d, 0x0a, 0
-
-vesaer: .ascii "Error: Scanning of VESA modes failed. Please "
- .ascii "report to <mj@ucw.cz>."
- .byte 0x0d, 0x0a, 0
-
-old_name: .asciz "CGA/MDA/HGA"
-
-ega_name: .asciz "EGA"
-
-svga_name: .ascii " "
-
-vga_name: .asciz "VGA"
-
-vesa_name: .asciz "VESA"
-
-name_bann: .asciz "Video adapter: "
-#endif /* CONFIG_VIDEO_SELECT */
-
-# Other variables:
-adapter: .byte 0 # Video adapter: 0=CGA/MDA/HGA,1=EGA,2=VGA
-video_segment: .word 0xb800 # Video memory segment
-force_size: .word 0 # Use this size instead of the one in BIOS vars
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (22 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [23/30] x86_64: Share identical video.S between i386 and x86-64 Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 15:45 ` Chuck Ebbert
2007-05-01 3:58 ` [PATCH] [25/30] x86_64: Fix allnoconfig error in genapic_flat.c Andi Kleen
` (5 subsequent siblings)
29 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
vfat implements compat handlers for these ioctls, but when they
were executed on other file systems the kernel would still complain
about an unknown compat ioctl. Just declare them as compatible
and let them be rejected when not needed by the normal path.
This makes wine runs a lot quieter
Signed-off-by: Andi Kleen <ak@suse.de>
---
fs/compat_ioctl.c | 9 +++++++++
1 file changed, 9 insertions(+)
Index: linux/fs/compat_ioctl.c
===================================================================
--- linux.orig/fs/compat_ioctl.c
+++ linux/fs/compat_ioctl.c
@@ -2627,6 +2627,15 @@ COMPATIBLE_IOCTL(LPRESET)
/*LPGETSTATS not implemented, but no kernels seem to compile it in anyways*/
COMPATIBLE_IOCTL(LPGETFLAGS)
HANDLE_IOCTL(LPSETTIMEOUT, lp_timeout_trans)
+
+/* fat 'r' ioctls. These are handled by fat with ->compat_ioctl,
+ but we don't want warnings on other file systems. So declare
+ them as compatible here. */
+#define VFAT_IOCTL_READDIR_BOTH32 _IOR('r', 1, struct compat_dirent[2])
+#define VFAT_IOCTL_READDIR_SHORT32 _IOR('r', 2, struct compat_dirent[2])
+
+IGNORE_IOCTL(VFAT_IOCTL_READDIR_BOTH32)
+IGNORE_IOCTL(VFAT_IOCTL_READDIR_SHORT32)
};
int ioctl_table_size = ARRAY_SIZE(ioctl_start);
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems
2007-05-01 3:58 ` [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems Andi Kleen
@ 2007-05-01 15:45 ` Chuck Ebbert
2007-05-02 10:46 ` Andi Kleen
0 siblings, 1 reply; 52+ messages in thread
From: Chuck Ebbert @ 2007-05-01 15:45 UTC (permalink / raw)
To: Andi Kleen; +Cc: patches, linux-kernel
Andi Kleen wrote:
> vfat implements compat handlers for these ioctls, but when they
> were executed on other file systems the kernel would still complain
> about an unknown compat ioctl. Just declare them as compatible
> and let them be rejected when not needed by the normal path.
>
> This makes wine runs a lot quieter
Does this also restore the original -ENOTTY return code?
The change that made it return -EINVAL broke a few Wine apps.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems
2007-05-01 15:45 ` Chuck Ebbert
@ 2007-05-02 10:46 ` Andi Kleen
0 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-02 10:46 UTC (permalink / raw)
To: Chuck Ebbert; +Cc: patches, linux-kernel
On Tuesday 01 May 2007 17:45:52 Chuck Ebbert wrote:
> Andi Kleen wrote:
> > vfat implements compat handlers for these ioctls, but when they
> > were executed on other file systems the kernel would still complain
> > about an unknown compat ioctl. Just declare them as compatible
> > and let them be rejected when not needed by the normal path.
> >
> > This makes wine runs a lot quieter
>
> Does this also restore the original -ENOTTY return code?
>
> The change that made it return -EINVAL broke a few Wine apps.
No it's still EINVAL as before. I just eliminated the printks.
I guess one could add a IGNORE_IOCTL_ENOTTY() or somesuch though if it's
really a problem.
-Andi
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] [25/30] x86_64: Fix allnoconfig error in genapic_flat.c
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (23 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [24/30] x86_64: Shut up warnings for vfat compat ioctls on other file systems Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [26/30] i386: Drop noisy e820 debugging printks Andi Kleen
` (4 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]
Fix:
In file included from include2/asm/apic.h:5,
from include2/asm/smp.h:15,
from linux/arch/x86_64/kernel/genapic_flat.c:18:
linux/include/linux/pm.h: In function ‘call_platform_enable_wakeup’:
linux/include/linux/pm.h:331: error: ‘EIO’ undeclared (first use in this function)
linux/include/linux/pm.h:331: error: (Each undeclared identifier is reported only once
linux/include/linux/pm.h:331: error: for each function it appears in.)
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86_64/kernel/genapic_flat.c | 1 +
1 file changed, 1 insertion(+)
Index: linux/arch/x86_64/kernel/genapic_flat.c
===================================================================
--- linux.orig/arch/x86_64/kernel/genapic_flat.c
+++ linux/arch/x86_64/kernel/genapic_flat.c
@@ -8,6 +8,7 @@
* Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and
* James Cleverdon.
*/
+#include <linux/errno.h>
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/string.h>
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [26/30] i386: Drop noisy e820 debugging printks
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (24 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [25/30] x86_64: Fix allnoconfig error in genapic_flat.c Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [27/30] i386: white space fixes in i387.h Andi Kleen
` (3 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/i386/kernel/e820.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)
Index: linux/arch/i386/kernel/e820.c
===================================================================
--- linux.orig/arch/i386/kernel/e820.c
+++ linux/arch/i386/kernel/e820.c
@@ -393,10 +393,8 @@ int __init sanitize_e820_map(struct e820
____________________33__
______________________4_
*/
- printk("sanitize start\n");
/* if there's only one memory region, don't bother */
if (*pnr_map < 2) {
- printk("sanitize bail 0\n");
return -1;
}
@@ -405,7 +403,6 @@ int __init sanitize_e820_map(struct e820
/* bail out if we find any unreasonable addresses in bios map */
for (i=0; i<old_nr; i++)
if (biosmap[i].addr + biosmap[i].size < biosmap[i].addr) {
- printk("sanitize bail 1\n");
return -1;
}
@@ -501,7 +498,6 @@ int __init sanitize_e820_map(struct e820
memcpy(biosmap, new_bios, new_nr*sizeof(struct e820entry));
*pnr_map = new_nr;
- printk("sanitize end\n");
return 0;
}
@@ -532,7 +528,6 @@ int __init copy_e820_map(struct e820entr
unsigned long long size = biosmap->size;
unsigned long long end = start + size;
unsigned long type = biosmap->type;
- printk("copy_e820_map() start: %016Lx size: %016Lx end: %016Lx type: %ld\n", start, size, end, type);
/* Overflow in 64 bits? Ignore the memory map. */
if (start > end)
@@ -543,17 +538,11 @@ int __init copy_e820_map(struct e820entr
* Not right. Fix it up.
*/
if (type == E820_RAM) {
- printk("copy_e820_map() type is E820_RAM\n");
if (start < 0x100000ULL && end > 0xA0000ULL) {
- printk("copy_e820_map() lies in range...\n");
- if (start < 0xA0000ULL) {
- printk("copy_e820_map() start < 0xA0000ULL\n");
+ if (start < 0xA0000ULL)
add_memory_region(start, 0xA0000ULL-start, type);
- }
- if (end <= 0x100000ULL) {
- printk("copy_e820_map() end <= 0x100000ULL\n");
+ if (end <= 0x100000ULL)
continue;
- }
start = 0x100000ULL;
size = end - start;
}
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [27/30] i386: white space fixes in i387.h
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (25 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [26/30] i386: Drop noisy e820 debugging printks Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [28/30] i386: avoid redundant preempt_disable in __unlazy_fpu Andi Kleen
` (2 subsequent siblings)
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: jan.kiszka, patches, linux-kernel
From: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-i386/i387.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Index: linux/include/asm-i386/i387.h
===================================================================
--- linux.orig/include/asm-i386/i387.h
+++ linux/include/asm-i386/i387.h
@@ -83,8 +83,8 @@ static inline void __save_init_fpu( stru
#define __clear_fpu( tsk ) \
do { \
- if (task_thread_info(tsk)->status & TS_USEDFPU) { \
- asm volatile("fnclex ; fwait"); \
+ if (task_thread_info(tsk)->status & TS_USEDFPU) { \
+ asm volatile("fnclex ; fwait"); \
task_thread_info(tsk)->status &= ~TS_USEDFPU; \
stts(); \
} \
@@ -113,7 +113,7 @@ static inline void save_init_fpu( struct
__clear_fpu( tsk ); \
preempt_enable(); \
} while (0)
- \
+
/*
* FPU state interaction...
*/
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [28/30] i386: avoid redundant preempt_disable in __unlazy_fpu
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (26 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [27/30] i386: white space fixes in i387.h Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [29/30] x86_64: Don't exclude asm-offsets.c in Documentation/dontdiff Andi Kleen
2007-05-01 3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split Andi Kleen
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: jan.kiszka, patches, linux-kernel
From: Jan Kiszka <jan.kiszka@web.de>
There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and
none of them appear to require additional preempt_disable/enable here.
Let's open-code save_init_fpu in __unlazy_fpu to save a few ops.
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-i386/i387.h | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
Index: linux/include/asm-i386/i387.h
===================================================================
--- linux.orig/include/asm-i386/i387.h
+++ linux/include/asm-i386/i387.h
@@ -74,11 +74,12 @@ static inline void __save_init_fpu( stru
task_thread_info(tsk)->status &= ~TS_USEDFPU;
}
-#define __unlazy_fpu( tsk ) do { \
- if (task_thread_info(tsk)->status & TS_USEDFPU) \
- save_init_fpu( tsk ); \
- else \
- tsk->fpu_counter = 0; \
+#define __unlazy_fpu( tsk ) do { \
+ if (task_thread_info(tsk)->status & TS_USEDFPU) { \
+ __save_init_fpu(tsk); \
+ stts(); \
+ } else \
+ tsk->fpu_counter = 0; \
} while (0)
#define __clear_fpu( tsk ) \
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [29/30] x86_64: Don't exclude asm-offsets.c in Documentation/dontdiff
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (27 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [28/30] i386: avoid redundant preempt_disable in __unlazy_fpu Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split Andi Kleen
29 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: patches, linux-kernel
asm-offsets.c is valid source code and needs to be diffed.
Signed-off-by: Andi Kleen <ak@suse.de>
---
Documentation/dontdiff | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux/Documentation/dontdiff
===================================================================
--- linux.orig/Documentation/dontdiff
+++ linux/Documentation/dontdiff
@@ -55,8 +55,8 @@ aic7*seq.h*
aicasm
aicdb.h*
asm
-asm-offsets.*
-asm_offsets.*
+asm-offsets.h
+asm_offsets.h
autoconf.h*
bbootsect
bin2c
^ permalink raw reply [flat|nested] 52+ messages in thread* [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 3:57 [PATCH] [0/30] x86 candidate patches for review VII: VDSO, CPUID, NMI watchdog, MCE, misc Andi Kleen
` (28 preceding siblings ...)
2007-05-01 3:58 ` [PATCH] [29/30] x86_64: Don't exclude asm-offsets.c in Documentation/dontdiff Andi Kleen
@ 2007-05-01 3:58 ` Andi Kleen
2007-05-01 4:26 ` Eric Dumazet
2007-05-01 4:37 ` William Lee Irwin III
29 siblings, 2 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 3:58 UTC (permalink / raw)
To: ebiederm, patches, linux-kernel
From: ebiederm@xmission.com
When in PAE mode we require that the user kernel divide to be
on a 1G boundary. The 2G/2G split does not have that property
so require !X86_PAE
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
arch/i386/Kconfig | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 1a94a73..80003de 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -570,6 +570,7 @@ choice
depends on !HIGHMEM
bool "3G/1G user/kernel split (for full 1G low memory)"
config VMSPLIT_2G
+ depends on !X86_PAE
bool "2G/2G user/kernel split"
config VMSPLIT_1G
bool "1G/3G user/kernel split"
--
1.5.1.1.181.g2de0
^ permalink raw reply related [flat|nested] 52+ messages in thread* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split Andi Kleen
@ 2007-05-01 4:26 ` Eric Dumazet
2007-05-01 6:21 ` Andi Kleen
2007-05-01 4:37 ` William Lee Irwin III
1 sibling, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01 4:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: ebiederm, patches, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1241 bytes --]
Andi Kleen a écrit :
> From: ebiederm@xmission.com
>
> When in PAE mode we require that the user kernel divide to be
> on a 1G boundary. The 2G/2G split does not have that property
> so require !X86_PAE
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> arch/i386/Kconfig | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> index 1a94a73..80003de 100644
> --- a/arch/i386/Kconfig
> +++ b/arch/i386/Kconfig
> @@ -570,6 +570,7 @@ choice
> depends on !HIGHMEM
> bool "3G/1G user/kernel split (for full 1G low memory)"
> config VMSPLIT_2G
> + depends on !X86_PAE
> bool "2G/2G user/kernel split"
> config VMSPLIT_1G
> bool "1G/3G user/kernel split"
Hum... We lose a usefull 2G/2G split. Should'nt we use a patch to change
PAGE_OFFSET to 0x8000000 instead of 0x78000000 and keep 2G/2G split ?
Maybe the following patch is better ?
[PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE
When in PAE mode we require that the user kernel divide to be
on a 1G boundary. We must therefore make sure PAGE_OFFSET is correctlty
defined in the 2G/2G split and PAE mode.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
[-- Attachment #2: i386_CONFIG_PAGE_OFFSET.patch --]
[-- Type: text/plain, Size: 414 bytes --]
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 53d6237..32356f2 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -578,7 +578,8 @@ endchoice
config PAGE_OFFSET
hex
default 0xB0000000 if VMSPLIT_3G_OPT
- default 0x78000000 if VMSPLIT_2G
+ default 0x78000000 if (VMSPLIT_2G && !X86_PAE)
+ default 0x80000000 if (VMSPLIT_2G && X86_PAE)
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 4:26 ` Eric Dumazet
@ 2007-05-01 6:21 ` Andi Kleen
2007-05-01 13:01 ` Bill Irwin
0 siblings, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2007-05-01 6:21 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andi Kleen, ebiederm, patches, linux-kernel, bill.irwin
On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
> Andi Kleen a ?crit :
> >From: ebiederm@xmission.com
> >
> >When in PAE mode we require that the user kernel divide to be
> >on a 1G boundary. The 2G/2G split does not have that property
> >so require !X86_PAE
> >
> >Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> >---
> > arch/i386/Kconfig | 1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
> >index 1a94a73..80003de 100644
> >--- a/arch/i386/Kconfig
> >+++ b/arch/i386/Kconfig
> >@@ -570,6 +570,7 @@ choice
> > depends on !HIGHMEM
> > bool "3G/1G user/kernel split (for full 1G low memory)"
> > config VMSPLIT_2G
> >+ depends on !X86_PAE
> > bool "2G/2G user/kernel split"
> > config VMSPLIT_1G
> > bool "1G/3G user/kernel split"
>
> Hum... We lose a usefull 2G/2G split. Should'nt we use a patch to change
> PAGE_OFFSET to 0x8000000 instead of 0x78000000 and keep 2G/2G split ?
I dropped the patch for now.
> [PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE
>
> When in PAE mode we require that the user kernel divide to be
> on a 1G boundary. We must therefore make sure PAGE_OFFSET is correctlty
> defined in the 2G/2G split and PAE mode.
Looks reasonable. Did you test both cases? wli, ok for you too?
-Andi
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 6:21 ` Andi Kleen
@ 2007-05-01 13:01 ` Bill Irwin
2007-05-01 13:49 ` Mark Lord
2007-05-01 15:51 ` Eric Dumazet
0 siblings, 2 replies; 52+ messages in thread
From: Bill Irwin @ 2007-05-01 13:01 UTC (permalink / raw)
To: Andi Kleen; +Cc: Eric Dumazet, ebiederm, patches, linux-kernel, bill.irwin
On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>> Hum... We lose a usefull 2G/2G split. Should'nt we use a patch to change
>> PAGE_OFFSET to 0x8000000 instead of 0x78000000 and keep 2G/2G split ?
On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
> I dropped the patch for now.
I'm more miffed about what it's cleaning up after than the patch itself.
On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>> [PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary. We must therefore make sure PAGE_OFFSET is correctlty
>> defined in the 2G/2G split and PAE mode.
On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
> Looks reasonable. Did you test both cases? wli, ok for you too?
Sorry about the delay in replying.
I don't mind so long as we're not letting doorstop configs through. I'd
probably do something more like
Index: sched/arch/i386/Kconfig
===================================================================
--- sched.orig/arch/i386/Kconfig 2007-05-01 04:35:47.065162310 -0700
+++ sched/arch/i386/Kconfig 2007-05-01 04:36:50.100754504 -0700
@@ -571,6 +571,9 @@
bool "3G/1G user/kernel split (for full 1G low memory)"
config VMSPLIT_2G
bool "2G/2G user/kernel split"
+ config VMSPLIT_2G_OPT
+ depends on !HIGHMEM
+ bool "2G/2G user/kernel split (for full 2G low memory)"
config VMSPLIT_1G
bool "1G/3G user/kernel split"
endchoice
@@ -578,7 +581,8 @@
config PAGE_OFFSET
hex
default 0xB0000000 if VMSPLIT_3G_OPT
- default 0x78000000 if VMSPLIT_2G
+ default 0x80000000 if VMSPLIT_2G
+ default 0x78000000 if VMSPLIT_2G_OPT
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
as a stopgap measure, but I'm not all that interested in grabbing patch
credits where others could do it easily enough. Either of the config
alterations is fine by me as they now stand; maybe Eric Dumazet might
care to do something like my suggestion at some point.
My interest here is in approaches that aren't really centered around
config options. Those are pmd handling for 1GB-unaligned PAGE_OFFSET
in PAE and dynamic vmallocspace reservation, the latter of which is
more complex than the first. I'd probably only do the pmd handling as
it's much easier than dynamic vmallocspace, which does substantial
violence to the core in order to reserve chunks of ZONE_NORMAL's
virtualspace so that no boot-time virtualspace reservations need to be
made for vmalloc(). Basically it would make vmalloc() proper use
physically contiguous memory and vmap() fiddle with ZONE_NORMAL
pagetables while reserving physical memory underlying the virtualspace
reserved for vmap() so that it's no longer necessary to carve
vmallocspace out of userspace to avoid highmem. That would occur at the
cost of runtime memory footprint of the rarely-called vmap() and
ZONE_NORMAL mapping updates. It would also alleviate pressure on
vmallocspace in configurations where it would be severely limited.
I'm doing a bit of thinking about this laptop-avoiding-highmem problem.
I've not come up with any better ideas than the dynamic vmallocspace
approach to avoid ABI damage while avoiding both highmem and sacrificing
memory for 1GB laptops, and slightly mitigating ABI damage for 2GB
laptops' highmem avoidance efforts. I'm thinking the applicability
isn't broad enough to merit the effort of dynamic vmallocspace. The pmd
fixup for 1GB-unaligned splits is not such a big deal in comparison.
-- wli
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 13:01 ` Bill Irwin
@ 2007-05-01 13:49 ` Mark Lord
2007-05-01 15:51 ` Eric Dumazet
1 sibling, 0 replies; 52+ messages in thread
From: Mark Lord @ 2007-05-01 13:49 UTC (permalink / raw)
To: Bill Irwin, Andi Kleen, Eric Dumazet, ebiederm, patches,
linux-kernel, Jens Axboe
Bill Irwin wrote:
>
> I don't mind so long as we're not letting doorstop configs through. I'd
> probably do something more like
>
> Index: sched/arch/i386/Kconfig
> ===================================================================
> --- sched.orig/arch/i386/Kconfig 2007-05-01 04:35:47.065162310 -0700
> +++ sched/arch/i386/Kconfig 2007-05-01 04:36:50.100754504 -0700
> @@ -571,6 +571,9 @@
> bool "3G/1G user/kernel split (for full 1G low memory)"
> config VMSPLIT_2G
> bool "2G/2G user/kernel split"
> + config VMSPLIT_2G_OPT
> + depends on !HIGHMEM
> + bool "2G/2G user/kernel split (for full 2G low memory)"
> config VMSPLIT_1G
> bool "1G/3G user/kernel split"
> endchoice
> @@ -578,7 +581,8 @@
> config PAGE_OFFSET
> hex
> default 0xB0000000 if VMSPLIT_3G_OPT
> - default 0x78000000 if VMSPLIT_2G
> + default 0x80000000 if VMSPLIT_2G
> + default 0x78000000 if VMSPLIT_2G_OPT
> default 0x40000000 if VMSPLIT_1G
> default 0xC0000000
>
> as a stopgap measure, but I'm not all that interested in grabbing patch
..
Yup, I second that one. The idea of the original 2G split
was to avoid the need for HIGHMEM entirely, reducing overhead
on slower machines.
Having both kinds of splits is fine, but probably just the original one
with the !PAE is okay too.
-ml
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 13:01 ` Bill Irwin
2007-05-01 13:49 ` Mark Lord
@ 2007-05-01 15:51 ` Eric Dumazet
2007-05-01 17:00 ` Bill Irwin
1 sibling, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01 15:51 UTC (permalink / raw)
To: Bill Irwin, Andi Kleen, Eric Dumazet, ebiederm, patches,
linux-kernel
Bill Irwin a écrit :
> On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>>> Hum... We lose a usefull 2G/2G split. Should'nt we use a patch to change
>>> PAGE_OFFSET to 0x8000000 instead of 0x78000000 and keep 2G/2G split ?
>
> On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
>> I dropped the patch for now.
>
> I'm more miffed about what it's cleaning up after than the patch itself.
>
>
> On Tue, May 01, 2007 at 06:26:23AM +0200, Eric Dumazet wrote:
>>> [PATCH] i386 : Adjust CONFIG_PAGE_OFFSET in case of 2G/2G split and X86_PAE
>>> When in PAE mode we require that the user kernel divide to be
>>> on a 1G boundary. We must therefore make sure PAGE_OFFSET is correctlty
>>> defined in the 2G/2G split and PAE mode.
>
> On Tue, May 01, 2007 at 08:21:32AM +0200, Andi Kleen wrote:
>> Looks reasonable. Did you test both cases? wli, ok for you too?
>
> Sorry about the delay in replying.
>
> I don't mind so long as we're not letting doorstop configs through. I'd
> probably do something more like
>
> Index: sched/arch/i386/Kconfig
> ===================================================================
> --- sched.orig/arch/i386/Kconfig 2007-05-01 04:35:47.065162310 -0700
> +++ sched/arch/i386/Kconfig 2007-05-01 04:36:50.100754504 -0700
> @@ -571,6 +571,9 @@
> bool "3G/1G user/kernel split (for full 1G low memory)"
> config VMSPLIT_2G
> bool "2G/2G user/kernel split"
> + config VMSPLIT_2G_OPT
> + depends on !HIGHMEM
> + bool "2G/2G user/kernel split (for full 2G low memory)"
> config VMSPLIT_1G
> bool "1G/3G user/kernel split"
> endchoice
> @@ -578,7 +581,8 @@
> config PAGE_OFFSET
> hex
> default 0xB0000000 if VMSPLIT_3G_OPT
> - default 0x78000000 if VMSPLIT_2G
> + default 0x80000000 if VMSPLIT_2G
> + default 0x78000000 if VMSPLIT_2G_OPT
> default 0x40000000 if VMSPLIT_1G
> default 0xC0000000
>
> as a stopgap measure, but I'm not all that interested in grabbing patch
> credits where others could do it easily enough. Either of the config
> alterations is fine by me as they now stand; maybe Eric Dumazet might
> care to do something like my suggestion at some point.
Your patch is very fine Bill, please resubmit it with proper Signed-off-by
My first patch was a trivial reaction to try to keep alive 2G/2G split, yours
is better for fine tuning.
Thank you
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 15:51 ` Eric Dumazet
@ 2007-05-01 17:00 ` Bill Irwin
2007-05-01 17:17 ` Eric W. Biederman
2007-05-02 9:38 ` Andi Kleen
0 siblings, 2 replies; 52+ messages in thread
From: Bill Irwin @ 2007-05-01 17:00 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Bill Irwin, Andi Kleen, ebiederm, patches, linux-kernel
Bill Irwin a écrit :
>> as a stopgap measure, but I'm not all that interested in grabbing patch
>> credits where others could do it easily enough. Either of the config
>> alterations is fine by me as they now stand; maybe Eric Dumazet might
>> care to do something like my suggestion at some point.
On Tue, May 01, 2007 at 05:51:44PM +0200, Eric Dumazet wrote:
> Your patch is very fine Bill, please resubmit it with proper Signed-off-by
> My first patch was a trivial reaction to try to keep alive 2G/2G split,
> yours is better for fine tuning.
I was hoping you would submit it as an update, but maybe adding your
Signed-off-by: to my own will do.
-- wli
Only 1GB-aligned kernel/user splits are now handled for PAE. The
2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
mapping for physical memory by using an actual split of 1.875/2.125
to accommodate 128MB of vmallocspace out of what would otherwise
be a full 2GB for userspace. That attempt disturbs the alignment
required by PAE for 2GB/2GB splits, and furthermore does not provide
a 2GB/2GB split as advertised.
This patch resolves the issues here in two manners. The first is
by providing a true 2GB/2GB split in addition to the 1.875/2.125
split. The second is by renaming the 1.875/2.125 split to
CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
performs a similar manuever to avoid aliasing vmallocspace with
the 1:1 mapping for physical memory around the 3GB boundary. With
the 1.875/2.125 split properly-named, its config option is then
tagged as depending on !HIGHMEM to express the PAE implementation's
current inability to deal with such unaligned splits.
This patch is essentially a combination of two patches, one written
by Eric Biederman and the other by Eric Dumazet. If they could add
their Signed-off-by: to this, I'd be much obliged.
Signed-off-by: William Irwin <wli@holomorphy.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Mark Lord <lkml@rtr.ca>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Index: sched/arch/i386/Kconfig
===================================================================
--- sched.orig/arch/i386/Kconfig 2007-05-01 04:35:47.065162310 -0700
+++ sched/arch/i386/Kconfig 2007-05-01 04:36:50.100754504 -0700
@@ -571,6 +571,9 @@
bool "3G/1G user/kernel split (for full 1G low memory)"
config VMSPLIT_2G
bool "2G/2G user/kernel split"
+ config VMSPLIT_2G_OPT
+ depends on !HIGHMEM
+ bool "2G/2G user/kernel split (for full 2G low memory)"
config VMSPLIT_1G
bool "1G/3G user/kernel split"
endchoice
@@ -578,7 +581,8 @@
config PAGE_OFFSET
hex
default 0xB0000000 if VMSPLIT_3G_OPT
- default 0x78000000 if VMSPLIT_2G
+ default 0x80000000 if VMSPLIT_2G
+ default 0x78000000 if VMSPLIT_2G_OPT
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 17:00 ` Bill Irwin
@ 2007-05-01 17:17 ` Eric W. Biederman
2007-05-01 20:41 ` Eric Dumazet
2007-05-02 9:38 ` Andi Kleen
1 sibling, 1 reply; 52+ messages in thread
From: Eric W. Biederman @ 2007-05-01 17:17 UTC (permalink / raw)
To: Bill Irwin; +Cc: Andi Kleen, patches, linux-kernel
Bill Irwin <bill.irwin@oracle.com> writes:
>
> Only 1GB-aligned kernel/user splits are now handled for PAE. The
> 2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
> mapping for physical memory by using an actual split of 1.875/2.125
> to accommodate 128MB of vmallocspace out of what would otherwise
> be a full 2GB for userspace. That attempt disturbs the alignment
> required by PAE for 2GB/2GB splits, and furthermore does not provide
> a 2GB/2GB split as advertised.
>
> This patch resolves the issues here in two manners. The first is
> by providing a true 2GB/2GB split in addition to the 1.875/2.125
> split. The second is by renaming the 1.875/2.125 split to
> CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
> performs a similar manuever to avoid aliasing vmallocspace with
> the 1:1 mapping for physical memory around the 3GB boundary. With
> the 1.875/2.125 split properly-named, its config option is then
> tagged as depending on !HIGHMEM to express the PAE implementation's
> current inability to deal with such unaligned splits.
>
> This patch is essentially a combination of two patches, one written
> by Eric Biederman and the other by Eric Dumazet. If they could add
> their Signed-off-by: to this, I'd be much obliged.
>
> Signed-off-by: William Irwin <wli@holomorphy.com>
> Cc: Eric Dumazet <dada1@cosmosbay.com>
> Cc: Mark Lord <lkml@rtr.ca>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Andi Kleen <ak@suse.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
>
> Index: sched/arch/i386/Kconfig
> ===================================================================
> --- sched.orig/arch/i386/Kconfig 2007-05-01 04:35:47.065162310 -0700
> +++ sched/arch/i386/Kconfig 2007-05-01 04:36:50.100754504 -0700
> @@ -571,6 +571,9 @@
> bool "3G/1G user/kernel split (for full 1G low memory)"
> config VMSPLIT_2G
> bool "2G/2G user/kernel split"
> + config VMSPLIT_2G_OPT
> + depends on !HIGHMEM
> + bool "2G/2G user/kernel split (for full 2G low memory)"
> config VMSPLIT_1G
> bool "1G/3G user/kernel split"
> endchoice
> @@ -578,7 +581,8 @@
> config PAGE_OFFSET
> hex
> default 0xB0000000 if VMSPLIT_3G_OPT
> - default 0x78000000 if VMSPLIT_2G
> + default 0x80000000 if VMSPLIT_2G
> + default 0x78000000 if VMSPLIT_2G_OPT
> default 0x40000000 if VMSPLIT_1G
> default 0xC0000000
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 17:17 ` Eric W. Biederman
@ 2007-05-01 20:41 ` Eric Dumazet
0 siblings, 0 replies; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01 20:41 UTC (permalink / raw)
To: Bill Irwin; +Cc: Eric W. Biederman, Andi Kleen, patches, linux-kernel
Eric W. Biederman a écrit :
> Bill Irwin <bill.irwin@oracle.com> writes:
>
>> Only 1GB-aligned kernel/user splits are now handled for PAE. The
>> 2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
>> mapping for physical memory by using an actual split of 1.875/2.125
>> to accommodate 128MB of vmallocspace out of what would otherwise
>> be a full 2GB for userspace. That attempt disturbs the alignment
>> required by PAE for 2GB/2GB splits, and furthermore does not provide
>> a 2GB/2GB split as advertised.
>>
>> This patch resolves the issues here in two manners. The first is
>> by providing a true 2GB/2GB split in addition to the 1.875/2.125
>> split. The second is by renaming the 1.875/2.125 split to
>> CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
>> performs a similar manuever to avoid aliasing vmallocspace with
>> the 1:1 mapping for physical memory around the 3GB boundary. With
>> the 1.875/2.125 split properly-named, its config option is then
>> tagged as depending on !HIGHMEM to express the PAE implementation's
>> current inability to deal with such unaligned splits.
>>
>> This patch is essentially a combination of two patches, one written
>> by Eric Biederman and the other by Eric Dumazet. If they could add
>> their Signed-off-by: to this, I'd be much obliged.
>>
>> Signed-off-by: William Irwin <wli@holomorphy.com>
>> Cc: Eric Dumazet <dada1@cosmosbay.com>
>> Cc: Mark Lord <lkml@rtr.ca>
>> Cc: Eric W. Biederman <ebiederm@xmission.com>
>> Cc: Andi Kleen <ak@suse.de>
>
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 17:00 ` Bill Irwin
2007-05-01 17:17 ` Eric W. Biederman
@ 2007-05-02 9:38 ` Andi Kleen
1 sibling, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2007-05-02 9:38 UTC (permalink / raw)
To: Bill Irwin; +Cc: Eric Dumazet, ebiederm, patches, linux-kernel
On Tuesday 01 May 2007 19:00:46 Bill Irwin wrote:
> Bill Irwin a écrit :
> >> as a stopgap measure, but I'm not all that interested in grabbing patch
> >> credits where others could do it easily enough. Either of the config
> >> alterations is fine by me as they now stand; maybe Eric Dumazet might
> >> care to do something like my suggestion at some point.
>
> On Tue, May 01, 2007 at 05:51:44PM +0200, Eric Dumazet wrote:
> > Your patch is very fine Bill, please resubmit it with proper Signed-off-by
> > My first patch was a trivial reaction to try to keep alive 2G/2G split,
> > yours is better for fine tuning.
>
> I was hoping you would submit it as an update, but maybe adding your
> Signed-off-by: to my own will do.
Added thanks
-Andi
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 3:58 ` [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split Andi Kleen
2007-05-01 4:26 ` Eric Dumazet
@ 2007-05-01 4:37 ` William Lee Irwin III
2007-05-01 4:57 ` Eric Dumazet
` (2 more replies)
1 sibling, 3 replies; 52+ messages in thread
From: William Lee Irwin III @ 2007-05-01 4:37 UTC (permalink / raw)
To: Andi Kleen; +Cc: Mark Lord, ebiederm, patches, linux-kernel
On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
> From: ebiederm@xmission.com
> When in PAE mode we require that the user kernel divide to be
> on a 1G boundary. The 2G/2G split does not have that property
> so require !X86_PAE
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> arch/i386/Kconfig | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
What on earth?
config PAGE_OFFSET
hex
default 0xB0000000 if VMSPLIT_3G_OPT
default 0x78000000 if VMSPLIT_2G
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
This appears to have been introduced by:
commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
Author: Mark Lord <lkml@rtr.ca>
Date: Wed Feb 1 03:06:11 2006 -0800
[PATCH] VMSPLIT config options
There's some sort of insanity going on here. Since when is 0x78000000
a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
nothing else, as it's certainly not 2GB/2GB.
These VMSPLIT config options vs. PAE are foul as they're now done in
any event. If they were done properly, they'd properly set up the pmd
within which the division point between user and kernelspace falls.
This patch, I suppose, stops people from shooting themselves in the
foot, but (IMHO) the VMSPLIT patches shouldn't have been merged
without handling the partial pmd case. 2MB/4MB resolution is enough
granularity for any reasonable purpose, so split ptes aren't worth the
effort, but this nonsense with PAE vs. VMSPLIT is just preposterous.
If you're going to play the VMSPLIT game at all, handle split pmd's.
I'll see what else is pending in the i386 pagetable arena and clear
this up if there aren't other objections (this is where Andi gets to
complain that things are too complex already and preemptively NAK to
save me the effort, if it's not seen to be desirable). Eric, your patch
is a reasonable stop-gap measure for the original deficiency.
-- wli
^ permalink raw reply [flat|nested] 52+ messages in thread* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 4:37 ` William Lee Irwin III
@ 2007-05-01 4:57 ` Eric Dumazet
2007-05-01 5:11 ` William Lee Irwin III
2007-05-01 5:35 ` Eric W. Biederman
2007-05-01 13:32 ` Mark Lord
2 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2007-05-01 4:57 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Andi Kleen, Mark Lord, ebiederm, patches, linux-kernel
William Lee Irwin III a écrit :
> On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
>> From: ebiederm@xmission.com
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary. The 2G/2G split does not have that property
>> so require !X86_PAE
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>> arch/i386/Kconfig | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> What on earth?
>
> config PAGE_OFFSET
> hex
> default 0xB0000000 if VMSPLIT_3G_OPT
> default 0x78000000 if VMSPLIT_2G
> default 0x40000000 if VMSPLIT_1G
> default 0xC0000000
>
> This appears to have been introduced by:
> commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
> Author: Mark Lord <lkml@rtr.ca>
> Date: Wed Feb 1 03:06:11 2006 -0800
> [PATCH] VMSPLIT config options
>
> There's some sort of insanity going on here. Since when is 0x78000000
> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
> nothing else, as it's certainly not 2GB/2GB.
Please could you stop saying others are insane ?
They are like you and can fail sometime. Apparently when the patch came,
nobody (including you) commented.
It's not that difficult to think about VMALLOC space (I might be wrong about
this, but I feel this explains 78000000 vs 80000000)
>
> These VMSPLIT config options vs. PAE are foul as they're now done in
> any event. If they were done properly, they'd properly set up the pmd
> within which the division point between user and kernelspace falls.
>
> This patch, I suppose, stops people from shooting themselves in the
> foot, but (IMHO) the VMSPLIT patches shouldn't have been merged
> without handling the partial pmd case. 2MB/4MB resolution is enough
> granularity for any reasonable purpose, so split ptes aren't worth the
> effort, but this nonsense with PAE vs. VMSPLIT is just preposterous.
> If you're going to play the VMSPLIT game at all, handle split pmd's.
>
> I'll see what else is pending in the i386 pagetable arena and clear
> this up if there aren't other objections (this is where Andi gets to
> complain that things are too complex already and preemptively NAK to
> save me the effort, if it's not seen to be desirable). Eric, your patch
> is a reasonable stop-gap measure for the original deficiency.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 4:57 ` Eric Dumazet
@ 2007-05-01 5:11 ` William Lee Irwin III
0 siblings, 0 replies; 52+ messages in thread
From: William Lee Irwin III @ 2007-05-01 5:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andi Kleen, Mark Lord, ebiederm, patches, linux-kernel
William Lee Irwin III a ?crit :
>> There's some sort of insanity going on here. Since when is 0x78000000
>> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
>> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
>> nothing else, as it's certainly not 2GB/2GB.
On Tue, May 01, 2007 at 06:57:19AM +0200, Eric Dumazet wrote:
> Please could you stop saying others are insane ?
> They are like you and can fail sometime. Apparently when the patch came,
> nobody (including you) commented.
> It's not that difficult to think about VMALLOC space (I might be wrong
> about this, but I feel this explains 78000000 vs 80000000)
I'm obviously aware of vmallocspace. Read carefully:
>> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
>> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
>> nothing else, as it's certainly not 2GB/2GB.
The meaning of "for laptops" is that it's carving out a chunk of
user virtualspace to use for vmallocspace in lieu of carving out a
piece of the 1:1 mapping of physical memory for the same purpose.
-- wli
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 4:37 ` William Lee Irwin III
2007-05-01 4:57 ` Eric Dumazet
@ 2007-05-01 5:35 ` Eric W. Biederman
2007-05-01 13:32 ` Mark Lord
2 siblings, 0 replies; 52+ messages in thread
From: Eric W. Biederman @ 2007-05-01 5:35 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andi Kleen, Mark Lord, patches, linux-kernel
William Lee Irwin III <wli@holomorphy.com> writes:
> On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
>> From: ebiederm@xmission.com
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary. The 2G/2G split does not have that property
>> so require !X86_PAE
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>> arch/i386/Kconfig | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> What on earth?
>
> config PAGE_OFFSET
> hex
> default 0xB0000000 if VMSPLIT_3G_OPT
> default 0x78000000 if VMSPLIT_2G
> default 0x40000000 if VMSPLIT_1G
> default 0xC0000000
>
> This appears to have been introduced by:
> commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
> Author: Mark Lord <lkml@rtr.ca>
> Date: Wed Feb 1 03:06:11 2006 -0800
> [PATCH] VMSPLIT config options
>
> There's some sort of insanity going on here. Since when is 0x78000000
> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
> nothing else, as it's certainly not 2GB/2GB.
It makes a little more sense when you realize all of the options
were originally !X86_PAE. So they were designed with highmem
disabled.
> These VMSPLIT config options vs. PAE are foul as they're now done in
> any event. If they were done properly, they'd properly set up the pmd
> within which the division point between user and kernelspace falls.
They were designed to avoid highmem a totally different design point.
> This patch, I suppose, stops people from shooting themselves in the
> foot, but (IMHO) the VMSPLIT patches shouldn't have been merged
> without handling the partial pmd case. 2MB/4MB resolution is enough
> granularity for any reasonable purpose, so split ptes aren't worth the
> effort, but this nonsense with PAE vs. VMSPLIT is just preposterous.
> If you're going to play the VMSPLIT game at all, handle split pmd's.
What I find telling is that I fixed this based on code review not
based on bug reports.
> I'll see what else is pending in the i386 pagetable arena and clear
> this up if there aren't other objections (this is where Andi gets to
> complain that things are too complex already and preemptively NAK to
> save me the effort, if it's not seen to be desirable). Eric, your patch
> is a reasonable stop-gap measure for the original deficiency.
Frankly rather then putting much effort into this I suspect we should
just delete these options entirely. We are long past the point where
they matter.
Eric
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 4:37 ` William Lee Irwin III
2007-05-01 4:57 ` Eric Dumazet
2007-05-01 5:35 ` Eric W. Biederman
@ 2007-05-01 13:32 ` Mark Lord
2007-05-01 14:17 ` William Lee Irwin III
2 siblings, 1 reply; 52+ messages in thread
From: Mark Lord @ 2007-05-01 13:32 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andi Kleen, ebiederm, patches, linux-kernel
William Lee Irwin III wrote:
> On Tue, May 01, 2007 at 05:58:29AM +0200, Andi Kleen wrote:
>> From: ebiederm@xmission.com
>> When in PAE mode we require that the user kernel divide to be
>> on a 1G boundary. The 2G/2G split does not have that property
>> so require !X86_PAE
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>> arch/i386/Kconfig | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> What on earth?
>
> config PAGE_OFFSET
> hex
> default 0xB0000000 if VMSPLIT_3G_OPT
> default 0x78000000 if VMSPLIT_2G
> default 0x40000000 if VMSPLIT_1G
> default 0xC0000000
>
> This appears to have been introduced by:
> commit 975b3d3d5b983eb60706d35f0d24cd19f6badabf
> Author: Mark Lord <lkml@rtr.ca>
> Date: Wed Feb 1 03:06:11 2006 -0800
> [PATCH] VMSPLIT config options
>
> There's some sort of insanity going on here. Since when is 0x78000000
> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
> nothing else, as it's certainly not 2GB/2GB.
You need to go search the archives and read the *extensive* thread
on this from when it was introduced. Lots of high profile kernel
developers were in on this one.
The idea is really simple: eliminate the need for HIGHMEM
on common machines.
And yes, VMSPLIT_2G really means VMSPLIT_2G_OPT,
in the same way as the (added last) VMSPLIT_3G_OPT flag.
Cheers
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 13:32 ` Mark Lord
@ 2007-05-01 14:17 ` William Lee Irwin III
2007-05-01 14:20 ` Mark Lord
0 siblings, 1 reply; 52+ messages in thread
From: William Lee Irwin III @ 2007-05-01 14:17 UTC (permalink / raw)
To: Mark Lord; +Cc: Andi Kleen, ebiederm, patches, linux-kernel
William Lee Irwin III wrote:
>> There's some sort of insanity going on here. Since when is 0x78000000
>> a 2GB/2GB split? Mark, dare I ask what you were thinking? That should
>> be VMSPLIT_2G_OPT for 2GB laptops analogously to VMSPLIT_3G_OPT, if
>> nothing else, as it's certainly not 2GB/2GB.
On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
> You need to go search the archives and read the *extensive* thread
> on this from when it was introduced. Lots of high profile kernel
> developers were in on this one.
Various good points were raised. The NX bit requiring
3-level pagetables suggests that 3-level pagetables without highmem
would be valuable, at which point the unaligned splits to avoid highmem
with 3-level pagetables gain additional relevance. I'll queue that up
to be written along with split pmd affairs, and check for features
dependent on 4-level pagetables, too.
On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
> The idea is really simple: eliminate the need for HIGHMEM
> on common machines.
Of course. Everything is simple until it's done properly. It's
why everyone forgets the "and no simpler" half of the quote, which
turns the meaning of what everyone tries to quote on its head.
I guess looking over it I don't blame you so much. I should've been
around to clean up these issues at the time it was happening. People
seem to have been trying to use the RAM in their laptops without big
slowdowns, and may not have had the VM "sophistication" to deal with
the sorts of issues I'm raising.
On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
> And yes, VMSPLIT_2G really means VMSPLIT_2G_OPT,
> in the same way as the (added last) VMSPLIT_3G_OPT flag.
This was a large portion of the substance of my complaint.
-- wli
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] [30/30] x86_64: Add missing !X86_PAE dependincy to the 2G/2G split.
2007-05-01 14:17 ` William Lee Irwin III
@ 2007-05-01 14:20 ` Mark Lord
0 siblings, 0 replies; 52+ messages in thread
From: Mark Lord @ 2007-05-01 14:20 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andi Kleen, ebiederm, patches, linux-kernel
William Lee Irwin III wrote:
>..
> On Tue, May 01, 2007 at 09:32:33AM -0400, Mark Lord wrote:
>> You need to go search the archives and read the *extensive* thread
>> on this from when it was introduced. Lots of high profile kernel
>> developers were in on this one.
>
> Various good points were raised. The NX bit requiring
> 3-level pagetables suggests that 3-level pagetables without highmem
> would be valuable, at which point the unaligned splits to avoid highmem
> with 3-level pagetables gain additional relevance. I'll queue that up
> to be written along with split pmd affairs, and check for features
> dependent on 4-level pagetables, too.
Excellent! That would be super to have in there, William.
Cheers!
^ permalink raw reply [flat|nested] 52+ messages in thread