* [patch 01/13] x86: PAT documentation
2008-03-19 0:00 [patch 00/13] x86: PAT support updated - v3 venkatesh.pallipadi
@ 2008-03-19 0:00 ` venkatesh.pallipadi
2008-03-19 0:00 ` [patch 02/13] x86: PAT infrastructure patch venkatesh.pallipadi
` (13 subsequent siblings)
14 siblings, 0 replies; 22+ messages in thread
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: pat_documentation.patch --]
[-- Type: text/plain, Size: 6326 bytes --]
Documentation about PAT related interfaces, intended usage and memory attribute
relationship.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/Documentation/x86/pat.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-x86.git/Documentation/x86/pat.txt 2008-03-17 05:05:53.000000000 -0700
@@ -0,0 +1,100 @@
+
+PAT (Page Attribute Table)
+
+The x86 Page Attribute Table (PAT) allows memory attributes to be set at
+page-level granularity. PAT is complementary to the MTRR settings, which
+allow memory types to be set over physical address ranges. However, PAT is
+more flexible than MTRR, both because it can set attributes at page level
+and because there is no hardware limit on the number of such attribute
+settings. The added flexibility comes with a guideline: avoid memory type
+aliasing, where the same physical memory is mapped at multiple virtual
+addresses with different memory types.
+
+PAT supports several memory types. The most commonly used ones, which are
+the ones supported at this time, are Write-back, Uncached, Write-combined
+and Uncached Minus.
+
+There are many different APIs in the kernel that allow memory attributes
+to be set at the page level. To avoid aliasing, these interfaces should
+be used thoughtfully. Below is a table of the available interfaces, their
+intended usage and their memory attribute relationships. Internally, these
+APIs use a reserve_memtype()/free_memtype() interface on the physical
+address range to avoid any aliasing.
+
+
+-------------------------------------------------------------------
+API                    |   RAM    |  ACPI,...  |  Reserved/Holes  |
+-----------------------|----------|------------|------------------|
+                       |          |            |                  |
+ioremap                |    --    |     UC     |        UC        |
+                       |          |            |                  |
+ioremap_cache          |    --    |     WB     |        WB        |
+                       |          |            |                  |
+ioremap_nocache        |    --    |     UC     |        UC        |
+                       |          |            |                  |
+ioremap_wc             |    --    |     --     |        WC        |
+                       |          |            |                  |
+set_memory_uc          |    UC    |     --     |        --        |
+ set_memory_wb         |          |            |                  |
+                       |          |            |                  |
+set_memory_wc          |    WC    |     --     |        --        |
+ set_memory_wb         |          |            |                  |
+                       |          |            |                  |
+pci sysfs resource     |    --    |     --     |        UC        |
+                       |          |            |                  |
+pci sysfs resource_wc  |    --    |     --     |        WC        |
+ is IORESOURCE_PREFETCH|          |            |                  |
+                       |          |            |                  |
+pci proc               |    --    |     --     |        UC        |
+ !PCIIOC_WRITE_COMBINE |          |            |                  |
+                       |          |            |                  |
+pci proc               |    --    |     --     |        WC        |
+ PCIIOC_WRITE_COMBINE  |          |            |                  |
+                       |          |            |                  |
+/dev/mem               |    --    |     UC     |        UC        |
+ read-write            |          |            |                  |
+                       |          |            |                  |
+/dev/mem               |    --    |     UC     |        UC        |
+ mmap SYNC flag        |          |            |                  |
+                       |          |            |                  |
+/dev/mem               |    --    |  WB/WC/UC  |     WB/WC/UC     |
+ mmap !SYNC flag       |          |(from exist-|   (from exist-   |
+ and                   |          |  ing alias)|     ing alias)   |
+ any alias to this area|          |            |                  |
+                       |          |            |                  |
+/dev/mem               |    --    |     WB     |        WB        |
+ mmap !SYNC flag       |          |            |                  |
+ no alias to this area |          |            |                  |
+ and                   |          |            |                  |
+ MTRR says WB          |          |            |                  |
+                       |          |            |                  |
+/dev/mem               |    --    |     --     |     UC_MINUS     |
+ mmap !SYNC flag       |          |            |                  |
+ no alias to this area |          |            |                  |
+ and                   |          |            |                  |
+ MTRR says !WB         |          |            |                  |
+                       |          |            |                  |
+-------------------------------------------------------------------
+
+Notes:
+
+"--" in the above table means "Not suggested usage for the API". Some of
+the "--" entries are strictly enforced by the kernel. Others are not
+enforced today, but may be enforced in the future.
+
+For ioremap and PCI access through /sys or /proc, the actual type returned
+can be more restrictive if an aliased mapping of that address already exists.
+For example, if there is an existing uncached mapping, a new ioremap_wc can
+return an uncached mapping in place of the write-combined one requested.
+
+set_memory_[uc|wc] and set_memory_wb should be used in pairs: the driver
+first makes a region uc or wc and switches it back to wb after use.
+
+Over time, writes to /proc/mtrr will be deprecated in favor of the PAT-based
+interfaces. Users writing to /proc/mtrr are advised to use the interfaces above.
+
+Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
+types.
+
+Drivers should use set_memory_[uc|wc] to set the access type for RAM ranges.
+
--
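As an illustration only, the "/dev/mem mmap" rows of the table above can be read as a small decision function for a reserved/hole range. The enum and function names are hypothetical, the "SYNC flag" is assumed to mean the file was opened with O_SYNC, and this is a userspace model of the table, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the memory types that appear in the table. */
enum memtype { MT_WB, MT_WC, MT_UC_MINUS, MT_UC };

/*
 * Userspace model of the "/dev/mem mmap" rows for a reserved/hole range.
 *   sync_flag  - assumed: the file was opened with O_SYNC
 *   has_alias  - another mapping of this range already exists
 *   alias_type - type of that existing mapping (ignored if !has_alias)
 *   mtrr_is_wb - the MTRR type covering the range is WB
 */
static enum memtype devmem_mmap_type(bool sync_flag, bool has_alias,
				     enum memtype alias_type, bool mtrr_is_wb)
{
	assert(alias_type <= MT_UC);

	if (sync_flag)
		return MT_UC;          /* "mmap SYNC flag" row */
	if (has_alias)
		return alias_type;     /* "from existing alias" row */
	if (mtrr_is_wb)
		return MT_WB;          /* no alias, MTRR says WB */
	return MT_UC_MINUS;            /* no alias, MTRR says !WB */
}
```

For example, a non-SYNC mmap of a hole that is already mapped WC elsewhere comes back WC, while the same request with no alias and a non-WB MTRR comes back UC_MINUS.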
* [patch 02/13] x86: PAT infrastructure patch
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: pat_init_infrastructure.patch --]
[-- Type: text/plain, Size: 20244 bytes --]
Sets up the pat_init() infrastructure.
The PAT MSR is programmed so that the PTE bits select these types:
    PAT
    |PCD
    ||PWT
    |||
    000 WB   _PAGE_CACHE_WB
    001 WC   _PAGE_CACHE_WC
    010 UC-  _PAGE_CACHE_UC_MINUS
    011 UC   _PAGE_CACHE_UC
This effectively changes the entry selected by PWT (with PCD clear) from its
boot-time setting, WT, to WC. UC_MINUS is used to provide backward
compatibility for existing /dev/mem users (X).
reserve_memtype and free_memtype are new interfaces for maintaining
alias-free mappings. They are currently implemented simply, with an
unoptimized linked list. reserve and free track the effective memory type
resulting from the PAT and MTRR settings, rather than the type actually
requested in the PAT.
pat_init piggybacks on mtrr_init, as the rules for setting both PAT and
MTRR are the same.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
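The PAT MSR value described above can be checked with a small userspace sketch. It reuses the PAT() construction and PAT_* encodings from pat.c below; the 4K-PTE index rule (PAT << 2 | PCD << 1 | PWT) and the power-on default value 0x0007040600070406 come from the Intel SDM rather than from this patch, and pat_entry() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* PAT memory type encodings, as in the enum added to pat.c. */
enum { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6,
       PAT_UC_MINUS = 7 };

/* Same construction as the PAT() macro in pat.c: entry x lives in byte x. */
#define PAT(x, y) ((uint64_t)PAT_##y << ((x) * 8))

/* The value pat_init() writes to MSR_IA32_CR_PAT. */
static const uint64_t linux_pat =
	PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
	PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);

/* Architectural power-on default: WB, WT, UC-, UC, repeated (Intel SDM). */
static const uint64_t power_on_pat = UINT64_C(0x0007040600070406);

/*
 * Hypothetical helper: the memory type a 4K PTE selects from a given
 * PAT MSR value, using index = PAT << 2 | PCD << 1 | PWT.
 */
static unsigned int pat_entry(uint64_t pat_msr, unsigned int pat,
			      unsigned int pcd, unsigned int pwt)
{
	unsigned int idx = (pat << 2) | (pcd << 1) | pwt;

	assert(pat <= 1 && pcd <= 1 && pwt <= 1);
	return (unsigned int)((pat_msr >> (idx * 8)) & 0xff);
}
```

Only the PWT-selected entries (1 and 5) differ between the two values, which is the WT to WC change described in the commit message.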
Index: linux-2.6-x86.git/arch/x86/kernel/cpu/mtrr/generic.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/kernel/cpu/mtrr/generic.c 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/kernel/cpu/mtrr/generic.c 2008-03-18 02:54:04.000000000 -0700
@@ -11,6 +11,7 @@
#include <asm/cpufeature.h>
#include <asm/processor-flags.h>
#include <asm/tlbflush.h>
+#include <asm/pat.h>
#include "mtrr.h"
struct mtrr_state {
@@ -35,6 +36,7 @@
static unsigned long smp_changes_mask;
static struct mtrr_state mtrr_state = {};
+static int mtrr_state_set;
#undef MODULE_PARAM_PREFIX
#define MODULE_PARAM_PREFIX "mtrr."
@@ -42,6 +44,106 @@
static int mtrr_show;
module_param_named(show, mtrr_show, bool, 0);
+/*
+ * Returns the effective MTRR type for the region
+ * Error returns:
+ * - 0xFE - when the range is "not entirely covered" by _any_ var range MTRR
+ * - 0xFF - when MTRR is not enabled
+ */
+u8 mtrr_type_lookup(u64 start, u64 end)
+{
+ int i;
+ u64 base, mask;
+ u8 prev_match, curr_match;
+
+ if (!mtrr_state_set)
+ return 0xFF;
+
+ if (!mtrr_state.enabled)
+ return 0xFF;
+
+ /* Make end inclusive end, instead of exclusive */
+ end--;
+
+ /* Look in fixed ranges. Just return the type as per start */
+ if (mtrr_state.have_fixed && (start < 0x100000)) {
+ int idx;
+
+ if (start < 0x80000) {
+ idx = 0;
+ idx += (start >> 16);
+ return mtrr_state.fixed_ranges[idx];
+ } else if (start < 0xC0000) {
+ idx = 1 * 8;
+ idx += ((start - 0x80000) >> 14);
+ return mtrr_state.fixed_ranges[idx];
+ } else if (start < 0x1000000) {
+ idx = 3 * 8;
+ idx += ((start - 0xC0000) >> 12);
+ return mtrr_state.fixed_ranges[idx];
+ }
+ }
+
+ /*
+ * Look in variable ranges
+ * Look for multiple ranges matching this address and pick type
+ * as per MTRR precedence
+ */
+ if (!(mtrr_state.enabled & 2)) {
+ return mtrr_state.def_type;
+ }
+
+ prev_match = 0xFF;
+ for (i = 0; i < num_var_ranges; ++i) {
+ unsigned short start_state, end_state;
+
+ if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
+ continue;
+
+ base = (((u64)mtrr_state.var_ranges[i].base_hi) << 32) +
+ (mtrr_state.var_ranges[i].base_lo & PAGE_MASK);
+ mask = (((u64)mtrr_state.var_ranges[i].mask_hi) << 32) +
+ (mtrr_state.var_ranges[i].mask_lo & PAGE_MASK);
+
+ start_state = ((start & mask) == (base & mask));
+ end_state = ((end & mask) == (base & mask));
+ if (start_state != end_state)
+ return 0xFE;
+
+ if ((start & mask) != (base & mask)) {
+ continue;
+ }
+
+ curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
+ if (prev_match == 0xFF) {
+ prev_match = curr_match;
+ continue;
+ }
+
+ if (prev_match == MTRR_TYPE_UNCACHABLE ||
+ curr_match == MTRR_TYPE_UNCACHABLE) {
+ return MTRR_TYPE_UNCACHABLE;
+ }
+
+ if ((prev_match == MTRR_TYPE_WRBACK &&
+ curr_match == MTRR_TYPE_WRTHROUGH) ||
+ (prev_match == MTRR_TYPE_WRTHROUGH &&
+ curr_match == MTRR_TYPE_WRBACK)) {
+ prev_match = MTRR_TYPE_WRTHROUGH;
+ curr_match = MTRR_TYPE_WRTHROUGH;
+ }
+
+ if (prev_match != curr_match) {
+ return MTRR_TYPE_UNCACHABLE;
+ }
+ }
+
+ if (prev_match != 0xFF)
+ return prev_match;
+
+ return mtrr_state.def_type;
+}
+
/* Get the MSR pair relating to a var range */
static void
get_mtrr_var_range(unsigned int index, struct mtrr_var_range *vr)
@@ -79,12 +181,16 @@
base, base + step - 1, mtrr_attrib_to_str(*types));
}
+static void prepare_set(void);
+static void post_set(void);
+
/* Grab all of the MTRR state for this CPU into *state */
void __init get_mtrr_state(void)
{
unsigned int i;
struct mtrr_var_range *vrs;
unsigned lo, dummy;
+ unsigned long flags;
vrs = mtrr_state.var_ranges;
@@ -131,6 +237,17 @@
printk(KERN_INFO "MTRR %u disabled\n", i);
}
}
+ mtrr_state_set = 1;
+
+ /* PAT setup for BP. We need to go through sync steps here */
+ local_irq_save(flags);
+ prepare_set();
+
+ pat_init();
+
+ post_set();
+ local_irq_restore(flags);
+
}
/* Some BIOS's are fucked and don't set all MTRRs the same! */
@@ -393,6 +510,9 @@
/* Actually set the state */
mask = set_mtrr_state();
+ /* also set PAT */
+ pat_init();
+
post_set();
local_irq_restore(flags);
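A userspace sketch of two pieces of mtrr_type_lookup() above: the fixed-range index arithmetic and the precedence rule for overlapping variable ranges. The MTRR_* names are hypothetical stand-ins for the MTRR_TYPE_* constants (using the architectural encodings), and both helpers are illustrations, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MTRR type encodings (MTRR_TYPE_* in the kernel). */
enum { MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6 };

/*
 * Index into the 88-entry fixed_ranges[] array for an address below 1MB,
 * using the same arithmetic as mtrr_type_lookup():
 *    8 x 64K ranges for 0x00000-0x7FFFF -> 0..7
 *   16 x 16K ranges for 0x80000-0xBFFFF -> 8..23
 *   64 x  4K ranges for 0xC0000-0xFFFFF -> 24..87
 */
static int fixed_range_index(uint64_t start)
{
	assert(start < 0x100000);
	if (start < 0x80000)
		return (int)(start >> 16);
	if (start < 0xC0000)
		return 8 + (int)((start - 0x80000) >> 14);
	return 24 + (int)((start - 0xC0000) >> 12);
}

/*
 * Combine the types of two matching variable-range MTRRs the way the loop
 * in mtrr_type_lookup() does: UC wins, WB+WT gives WT, and any other
 * mismatch is treated as UC.
 */
static int mtrr_combine(int prev, int curr)
{
	if (prev == MTRR_UC || curr == MTRR_UC)
		return MTRR_UC;
	if ((prev == MTRR_WB && curr == MTRR_WT) ||
	    (prev == MTRR_WT && curr == MTRR_WB))
		return MTRR_WT;
	return prev == curr ? prev : MTRR_UC;
}
```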
Index: linux-2.6-x86.git/arch/x86/mm/pat.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-x86.git/arch/x86/mm/pat.c 2008-03-18 09:20:34.000000000 -0700
@@ -0,0 +1,402 @@
+/*
+ * Handle caching attributes in page tables (PAT)
+ *
+ * Authors: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
+ * Suresh B Siddha <suresh.b.siddha@intel.com>
+ *
+ * Loosely based on earlier PAT patchset from Eric Biederman and Andi Kleen.
+ */
+
+#include <linux/mm.h>
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/fs.h>
+
+#include <asm/msr.h>
+#include <asm/tlbflush.h>
+#include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pat.h>
+#include <asm/e820.h>
+#include <asm/cacheflush.h>
+#include <asm/fcntl.h>
+#include <asm/mtrr.h>
+
+int pat_wc_enabled = 1;
+
+static u64 __read_mostly boot_pat_state;
+
+static int nopat(char *str)
+{
+ pat_wc_enabled = 0;
+ printk(KERN_INFO "x86: PAT support disabled.\n");
+
+ return 0;
+}
+early_param("nopat", nopat);
+
+static int pat_known_cpu(void)
+{
+ if (!pat_wc_enabled)
+ return 0;
+
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+ (boot_cpu_data.x86 == 0xF ||
+ (boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model >= 15))) {
+ if (cpu_has_pat) {
+ return 1;
+ }
+ }
+
+ pat_wc_enabled = 0;
+ printk(KERN_INFO "CPU and/or kernel does not support PAT.\n");
+ return 0;
+}
+
+enum {
+ PAT_UC = 0, /* uncached */
+ PAT_WC = 1, /* Write combining */
+ PAT_WT = 4, /* Write Through */
+ PAT_WP = 5, /* Write Protected */
+ PAT_WB = 6, /* Write Back (default) */
+ PAT_UC_MINUS = 7, /* UC, but can be overridden by MTRR */
+};
+
+#define PAT(x,y) ((u64)PAT_ ## y << ((x)*8))
+
+void pat_init(void)
+{
+ u64 pat;
+
+#ifndef CONFIG_X86_PAT
+ nopat(NULL);
+#endif
+
+ /* Boot CPU enables PAT based on CPU feature */
+ if (!smp_processor_id() && !pat_known_cpu())
+ return;
+
+ /* APs enable PAT iff boot CPU has enabled it before */
+ if (smp_processor_id() && !pat_wc_enabled)
+ return;
+
+ /* Set PWT to Write-Combining. All other bits stay the same */
+ /*
+ * PTE encoding used in Linux:
+ * PAT
+ * |PCD
+ * ||PWT
+ * |||
+ * 000 WB _PAGE_CACHE_WB
+ * 001 WC _PAGE_CACHE_WC
+ * 010 UC- _PAGE_CACHE_UC_MINUS
+ * 011 UC _PAGE_CACHE_UC
+ * PAT bit unused
+ */
+ pat = PAT(0,WB) | PAT(1,WC) | PAT(2,UC_MINUS) | PAT(3,UC) |
+ PAT(4,WB) | PAT(5,WC) | PAT(6,UC_MINUS) | PAT(7,UC);
+
+ /* Boot CPU check */
+ if (!smp_processor_id()) {
+ rdmsrl(MSR_IA32_CR_PAT, boot_pat_state);
+ }
+
+ wrmsrl(MSR_IA32_CR_PAT, pat);
+ printk(KERN_INFO "x86 PAT enabled: cpu %d, old 0x%Lx, new 0x%Lx\n",
+ smp_processor_id(), boot_pat_state, pat);
+}
+
+#undef PAT
+
+static char *cattr_name(unsigned long flags)
+{
+ switch (flags & _PAGE_CACHE_MASK) {
+ case _PAGE_CACHE_UC: return "uncached";
+ case _PAGE_CACHE_UC_MINUS: return "uncached-minus";
+ case _PAGE_CACHE_WB: return "write-back";
+ case _PAGE_CACHE_WC: return "write-combining";
+ default: return "broken";
+ }
+}
+
+/*
+ * The global memtype list keeps track of memory type for specific
+ * physical memory areas. Conflicting memory types in different
+ * mappings can cause CPU cache corruption. To avoid this we keep track.
+ *
+ * The list is sorted based on starting address and can contain multiple
+ * entries for each address (this allows reference counting for overlapping
+ * areas). All the aliases have the same cache attributes of course.
+ * Zero attributes are represented as holes.
+ *
+ * Currently the data structure is a list because the number of mappings
+ * is expected to be relatively small. If this should be a problem
+ * it could be changed to a rbtree or similar.
+ *
+ * memtype_lock protects the whole list.
+ */
+
+struct memtype {
+ u64 start;
+ u64 end;
+ unsigned long type;
+ struct list_head nd;
+};
+
+static LIST_HEAD(memtype_list);
+static DEFINE_SPINLOCK(memtype_lock); /* protects memtype list */
+
+/*
+ * Does intersection of PAT memory type and MTRR memory type and returns
+ * the resulting memory type as PAT understands it.
+ * (PAT and MTRR use different values to encode the same type)
+ * The intersection is based on "Effective Memory Type" tables in IA-32
+ * SDM vol 3a
+ */
+static int pat_x_mtrr_type(u64 start, u64 end, unsigned long prot,
+ unsigned long *ret_prot)
+{
+ unsigned long pat_type;
+ u8 mtrr_type;
+
+ mtrr_type = mtrr_type_lookup(start, end);
+ if (mtrr_type == 0xFF) { /* MTRR not enabled */
+ *ret_prot = prot;
+ return 0;
+ }
+ if (mtrr_type == 0xFE) { /* MTRR match error */
+ *ret_prot = _PAGE_CACHE_UC;
+ return -1;
+ }
+ if (mtrr_type != MTRR_TYPE_UNCACHABLE &&
+ mtrr_type != MTRR_TYPE_WRBACK &&
+ mtrr_type != MTRR_TYPE_WRCOMB) { /* MTRR type unhandled */
+ *ret_prot = _PAGE_CACHE_UC;
+ return -1;
+ }
+
+ pat_type = prot & _PAGE_CACHE_MASK;
+ prot &= (~_PAGE_CACHE_MASK);
+
+ /* Currently doing intersection by hand. Optimize it later. */
+ if (pat_type == _PAGE_CACHE_WC) {
+ *ret_prot = prot | _PAGE_CACHE_WC;
+ } else if (pat_type == _PAGE_CACHE_UC_MINUS) {
+ *ret_prot = prot | _PAGE_CACHE_UC_MINUS;
+ } else if (pat_type == _PAGE_CACHE_UC ||
+ mtrr_type == MTRR_TYPE_UNCACHABLE) {
+ *ret_prot = prot | _PAGE_CACHE_UC;
+ } else if (mtrr_type == MTRR_TYPE_WRCOMB) {
+ *ret_prot = prot | _PAGE_CACHE_WC;
+ } else {
+ *ret_prot = prot | _PAGE_CACHE_WB;
+ }
+
+ return 0;
+}
+
+int reserve_memtype(u64 start, u64 end, unsigned long req_type,
+ unsigned long *ret_type)
+{
+ struct memtype *new_entry = NULL;
+ struct memtype *parse;
+ unsigned long actual_type;
+ int err = 0;
+
+ /* Only track when pat_wc_enabled */
+ if (!pat_wc_enabled) {
+ if (ret_type)
+ *ret_type = req_type;
+
+ return 0;
+ }
+
+ /* Low ISA region is always mapped WB in page table. No need to track */
+ if (start >= ISA_START_ADDRESS && (end - 1) <= ISA_END_ADDRESS) {
+ if (ret_type)
+ *ret_type = _PAGE_CACHE_WB;
+
+ return 0;
+ }
+
+ req_type &= _PAGE_CACHE_MASK;
+ err = pat_x_mtrr_type(start, end, req_type, &actual_type);
+ if (err) {
+ if (ret_type)
+ *ret_type = actual_type;
+
+ return -EINVAL;
+ }
+
+ new_entry = kmalloc(sizeof(struct memtype), GFP_KERNEL);
+ if (!new_entry)
+ return -ENOMEM;
+
+ new_entry->start = start;
+ new_entry->end = end;
+ new_entry->type = actual_type;
+
+ if (ret_type)
+ *ret_type = actual_type;
+
+ spin_lock(&memtype_lock);
+
+ /* Search for existing mapping that overlaps the current range */
+ list_for_each_entry(parse, &memtype_list, nd) {
+ struct memtype *saved_ptr;
+
+ if (parse->start >= end) {
+ list_add(&new_entry->nd, parse->nd.prev);
+ new_entry = NULL;
+ break;
+ }
+
+ if (start <= parse->start && end >= parse->start) {
+ if (actual_type != parse->type && ret_type) {
+ actual_type = parse->type;
+ *ret_type = actual_type;
+ new_entry->type = actual_type;
+ }
+
+ if (actual_type != parse->type) {
+ printk(
+ KERN_INFO "%s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+ current->comm, current->pid,
+ start, end,
+ cattr_name(actual_type),
+ cattr_name(parse->type));
+ err = -EBUSY;
+ break;
+ }
+
+ saved_ptr = parse;
+ /*
+ * Check to see whether the request overlaps more
+ * than one entry in the list
+ */
+ list_for_each_entry_continue(parse, &memtype_list, nd) {
+ if (end <= parse->start) {
+ break;
+ }
+
+ if (actual_type != parse->type) {
+ printk(
+ KERN_INFO "%s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+ current->comm, current->pid,
+ start, end,
+ cattr_name(actual_type),
+ cattr_name(parse->type));
+ err = -EBUSY;
+ break;
+ }
+ }
+
+ if (err) {
+ break;
+ }
+
+ /* No conflict. Go ahead and add this new entry */
+ list_add(&new_entry->nd, saved_ptr->nd.prev);
+ new_entry = NULL;
+ break;
+ }
+
+ if (start < parse->end) {
+ if (actual_type != parse->type && ret_type) {
+ actual_type = parse->type;
+ *ret_type = actual_type;
+ new_entry->type = actual_type;
+ }
+
+ if (actual_type != parse->type) {
+ printk(
+ KERN_INFO "%s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+ current->comm, current->pid,
+ start, end,
+ cattr_name(actual_type),
+ cattr_name(parse->type));
+ err = -EBUSY;
+ break;
+ }
+
+ saved_ptr = parse;
+ /*
+ * Check to see whether the request overlaps more
+ * than one entry in the list
+ */
+ list_for_each_entry_continue(parse, &memtype_list, nd) {
+ if (end <= parse->start) {
+ break;
+ }
+
+ if (actual_type != parse->type) {
+ printk(
+ KERN_INFO "%s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+ current->comm, current->pid,
+ start, end,
+ cattr_name(actual_type),
+ cattr_name(parse->type));
+ err = -EBUSY;
+ break;
+ }
+ }
+
+ if (err) {
+ break;
+ }
+
+ /* No conflict. Go ahead and add this new entry */
+ list_add(&new_entry->nd, &saved_ptr->nd);
+ new_entry = NULL;
+ break;
+ }
+ }
+
+ if (err) {
+ kfree(new_entry);
+ spin_unlock(&memtype_lock);
+ return err;
+ }
+
+ if (new_entry) {
+ /* No conflict. Not yet added to the list. Add to the tail */
+ list_add_tail(&new_entry->nd, &memtype_list);
+ }
+
+ spin_unlock(&memtype_lock);
+ return err;
+}
+
+int free_memtype(u64 start, u64 end)
+{
+ struct memtype *ml;
+ int err = -EINVAL;
+
+ /* Only track when pat_wc_enabled */
+ if (!pat_wc_enabled) {
+ return 0;
+ }
+
+ /* Low ISA region is always mapped WB. No need to track */
+ if (start >= ISA_START_ADDRESS && end <= ISA_END_ADDRESS) {
+ return 0;
+ }
+
+ spin_lock(&memtype_lock);
+ list_for_each_entry(ml, &memtype_list, nd) {
+ if (ml->start == start && ml->end == end) {
+ list_del(&ml->nd);
+ kfree(ml);
+ err = 0;
+ break;
+ }
+ }
+ spin_unlock(&memtype_lock);
+
+ if (err) {
+ printk(KERN_DEBUG "%s:%d freeing invalid memtype %Lx-%Lx\n",
+ current->comm, current->pid, start, end);
+ }
+ return err;
+}
+
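The order of checks in pat_x_mtrr_type() can be sketched as a pure function, assuming the MTRR lookup has already returned UC, WC or WB (the patch returns an error for the other cases). The enum names are hypothetical stand-ins for _PAGE_CACHE_* and MTRR_TYPE_*.

```c
#include <assert.h>

/* Hypothetical stand-ins for the _PAGE_CACHE_* request types. */
enum cache { CACHE_WB, CACHE_WC, CACHE_UC_MINUS, CACHE_UC };

/* Architectural MTRR type encodings handled by pat_x_mtrr_type(). */
enum { MTRR_UC = 0, MTRR_WC = 1, MTRR_WB = 6 };

/*
 * Effective type recorded by reserve_memtype(), following the order of
 * checks in pat_x_mtrr_type():
 *   1. WC and UC- requests are kept as requested
 *   2. a UC request or a UC MTRR gives UC
 *   3. a WC MTRR turns a WB request into WC
 *   4. otherwise the result is WB
 */
static enum cache effective_type(enum cache req, int mtrr_type)
{
	assert(mtrr_type == MTRR_UC || mtrr_type == MTRR_WC ||
	       mtrr_type == MTRR_WB);

	if (req == CACHE_WC)
		return CACHE_WC;
	if (req == CACHE_UC_MINUS)
		return CACHE_UC_MINUS;
	if (req == CACHE_UC || mtrr_type == MTRR_UC)
		return CACHE_UC;
	if (mtrr_type == MTRR_WC)
		return CACHE_WC;
	return CACHE_WB;
}
```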
Index: linux-2.6-x86.git/include/asm-x86/pat.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-x86.git/include/asm-x86/pat.h 2008-03-17 11:06:51.000000000 -0700
@@ -0,0 +1,16 @@
+
+#ifndef _ASM_PAT_H
+#define _ASM_PAT_H 1
+
+#include <linux/types.h>
+
+extern int pat_wc_enabled;
+
+extern void pat_init(void);
+
+extern int reserve_memtype(u64 start, u64 end,
+ unsigned long req_type, unsigned long *ret_type);
+extern int free_memtype(u64 start, u64 end);
+
+#endif
+
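To illustrate the reserve/free contract declared in pat.h, here is a deliberately simplified, single-threaded model: a fixed array instead of the sorted memtype_list, no locking, and no ret_type handling (in the patch, passing a non-NULL ret_type makes reserve_memtype() adopt the existing entry's type instead of failing). All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MEMTYPES 16

/* Hypothetical model of one memtype list entry. */
struct memtype_model {
	uint64_t start, end;   /* half-open range [start, end) */
	int type;
};

static struct memtype_model entries[MAX_MEMTYPES];
static int nr_entries;

/*
 * Returns 0 on success, -1 if an overlapping entry has a different type
 * (the -EBUSY case in reserve_memtype()) or the table is full.
 * Overlaps with the same type are allowed and recorded as another alias.
 */
static int model_reserve(uint64_t start, uint64_t end, int type)
{
	int i;

	assert(start < end);
	for (i = 0; i < nr_entries; i++) {
		if (start < entries[i].end && entries[i].start < end &&
		    entries[i].type != type)
			return -1;
	}
	if (nr_entries == MAX_MEMTYPES)
		return -1;
	entries[nr_entries].start = start;
	entries[nr_entries].end = end;
	entries[nr_entries].type = type;
	nr_entries++;
	return 0;
}

/* Like free_memtype(): only an exact [start, end) match is released. */
static int model_free(uint64_t start, uint64_t end)
{
	int i;

	for (i = 0; i < nr_entries; i++) {
		if (entries[i].start == start && entries[i].end == end) {
			entries[i] = entries[--nr_entries];
			return 0;
		}
	}
	return -1;
}
```

The exact-match rule in free mirrors free_memtype(), which is why every successful reserve must later be freed with the same start and end.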
Index: linux-2.6-x86.git/arch/x86/Kconfig
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/Kconfig 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/Kconfig 2008-03-17 11:06:51.000000000 -0700
@@ -1013,6 +1013,21 @@
See <file:Documentation/mtrr.txt> for more information.
+config X86_PAT
+ def_bool y
+ prompt "x86 PAT support"
+ depends on MTRR && NONPROMISC_DEVMEM
+ help
+ Use PAT attributes to setup page level cache control.
+ ---help---
+ PATs are the modern equivalents of MTRRs and are much more
+ flexible than MTRRs.
+
+ Say N here if you see bootup problems (boot crash, boot hang,
+ spontaneous reboots) or a non-working Xorg.
+
+ If unsure, say Y.
+
config EFI
def_bool n
prompt "EFI runtime service support"
Index: linux-2.6-x86.git/arch/x86/mm/Makefile
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/Makefile 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/Makefile 2008-03-17 11:06:51.000000000 -0700
@@ -1,4 +1,4 @@
-obj-y := init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o
+obj-y := init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o pat.o
obj-$(CONFIG_X86_32) += pgtable_32.o
Index: linux-2.6-x86.git/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/pageattr.c 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/pageattr.c 2008-03-18 09:20:36.000000000 -0700
@@ -774,14 +774,14 @@
int set_memory_uc(unsigned long addr, int numpages)
{
return change_page_attr_set(addr, numpages,
- __pgprot(_PAGE_PCD | _PAGE_PWT));
+ __pgprot(_PAGE_CACHE_UC));
}
EXPORT_SYMBOL(set_memory_uc);
int set_memory_wb(unsigned long addr, int numpages)
{
return change_page_attr_clear(addr, numpages,
- __pgprot(_PAGE_PCD | _PAGE_PWT));
+ __pgprot(_PAGE_CACHE_MASK));
}
EXPORT_SYMBOL(set_memory_wb);
Index: linux-2.6-x86.git/include/asm-x86/mtrr.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/mtrr.h 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/mtrr.h 2008-03-17 11:06:51.000000000 -0700
@@ -84,6 +84,8 @@
#ifdef __KERNEL__
+extern u8 mtrr_type_lookup(u64 addr, u64 end);
+
/* The following functions are for use by other drivers */
# ifdef CONFIG_MTRR
extern void mtrr_save_fixed_ranges(void *);
Index: linux-2.6-x86.git/include/asm-x86/pgtable.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/pgtable.h 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/pgtable.h 2008-03-18 09:20:34.000000000 -0700
@@ -57,6 +57,12 @@
#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT)
+#define _PAGE_CACHE_WB (0)
+#define _PAGE_CACHE_WC (_PAGE_PWT)
+#define _PAGE_CACHE_UC_MINUS (_PAGE_PCD)
+#define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT)
+
#define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
#define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_NX)
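The new _PAGE_CACHE_* values are just combinations of the PCD and PWT bits, so set_memory_uc() and set_memory_wb() reduce to setting and clearing the same two-bit field. The sketch assumes the architectural x86 PTE layout (PWT is bit 3, PCD is bit 4); the model_* helpers are hypothetical and only mimic the effect of change_page_attr_set() and change_page_attr_clear() on a single PTE value.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural 4K PTE bits: PWT is bit 3, PCD is bit 4. */
#define PTE_PWT (UINT64_C(1) << 3)
#define PTE_PCD (UINT64_C(1) << 4)

/* Same compositions as the new _PAGE_CACHE_* definitions in pgtable.h. */
#define CACHE_MASK     (PTE_PCD | PTE_PWT)
#define CACHE_WB       UINT64_C(0)
#define CACHE_WC       (PTE_PWT)
#define CACHE_UC_MINUS (PTE_PCD)
#define CACHE_UC       (PTE_PCD | PTE_PWT)

/* OR-ing the WC and UC- bits together yields the UC encoding. */
_Static_assert((CACHE_WC | CACHE_UC_MINUS) == CACHE_UC, "UC = WC | UC-");

/* Hypothetical model of set_memory_uc() on one PTE: OR in both cache bits. */
static uint64_t model_set_uc(uint64_t pte)
{
	return pte | CACHE_UC;
}

/* Hypothetical model of set_memory_wb() on one PTE: clear the cache field. */
static uint64_t model_set_wb(uint64_t pte)
{
	return pte & ~CACHE_MASK;
}
```

In the tests, 0x63 is an arbitrary present, writable, accessed and dirty PTE value. Because the set path only ORs bits in, applying the WC bit to a PTE that is already UC- yields UC in this model; clearing back to WB first avoids that.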
Index: linux-2.6-x86.git/include/asm-x86/cpufeature.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/cpufeature.h 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/cpufeature.h 2008-03-17 11:06:51.000000000 -0700
@@ -186,6 +186,7 @@
#define cpu_has_bts boot_cpu_has(X86_FEATURE_BTS)
#define cpu_has_gbpages boot_cpu_has(X86_FEATURE_GBPAGES)
#define cpu_has_arch_perfmon boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
+#define cpu_has_pat boot_cpu_has(X86_FEATURE_PAT)
#if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
# define cpu_has_invlpg 1
Index: linux-2.6-x86.git/include/asm-x86/msr-index.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/msr-index.h 2008-03-17 11:06:47.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/msr-index.h 2008-03-17 11:06:51.000000000 -0700
@@ -57,6 +57,8 @@
#define MSR_MTRRfix4K_F8000 0x0000026f
#define MSR_MTRRdefType 0x000002ff
+#define MSR_IA32_CR_PAT 0x00000277
+
#define MSR_IA32_DEBUGCTLMSR 0x000001d9
#define MSR_IA32_LASTBRANCHFROMIP 0x000001db
#define MSR_IA32_LASTBRANCHTOIP 0x000001dc
--
* Re: [patch 02/13] x86: PAT infrastructure patch
From: Randy Dunlap @ 2008-03-19 20:06 UTC (permalink / raw)
To: venkatesh.pallipadi
Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes, linux-kernel, Suresh Siddha
On Tue, 18 Mar 2008 17:00:14 -0700 venkatesh.pallipadi@intel.com wrote:
diffstat would be very nice to have.
> Index: linux-2.6-x86.git/arch/x86/Kconfig
> ===================================================================
> --- linux-2.6-x86.git.orig/arch/x86/Kconfig 2008-03-17 11:06:47.000000000 -0700
> +++ linux-2.6-x86.git/arch/x86/Kconfig 2008-03-17 11:06:51.000000000 -0700
> @@ -1013,6 +1013,21 @@
>
> See <file:Documentation/mtrr.txt> for more information.
>
> +config X86_PAT
> + def_bool y
> + prompt "x86 PAT support"
> + depends on MTRR && NONPROMISC_DEVMEM
> + help
> + Use PAT attributes to setup page level cache control.
> + ---help---
> + PATs are the modern equivalents of MTRRs and are much more
> + flexible than MTRRs.
> +
Do 2 help sections actually work? We don't usually do that.
Just change the second one to a blank line.
> + Say N here if you see bootup problems (boot crash, boot hang,
> + spontaneous reboots) or a non-working Xorg.
Maybe "a non-working Xorg video driver" ?
or even omit Xorg and just say video driver?
> +
> + If unsure, say Y.
> +
> config EFI
> def_bool n
> prompt "EFI runtime service support"
---
~Randy
* Re: [patch 02/13] x86: PAT infrastructure patch
From: Venki Pallipadi @ 2008-03-24 21:22 UTC (permalink / raw)
To: Randy Dunlap
Cc: venkatesh.pallipadi, ak, ebiederm, rdreier, torvalds, gregkh,
airlied, davej, mingo, tglx, hpa, akpm, arjan, jesse.barnes,
linux-kernel, Suresh Siddha
On Wed, Mar 19, 2008 at 01:06:39PM -0700, Randy Dunlap wrote:
> On Tue, 18 Mar 2008 17:00:14 -0700 venkatesh.pallipadi@intel.com wrote:
>
> diffstat would be very nice to have.
I have updated my quilt scripts to auto-generate diffstat for any future
patches.
>
>
> > Index: linux-2.6-x86.git/arch/x86/Kconfig
> > ===================================================================
> > --- linux-2.6-x86.git.orig/arch/x86/Kconfig 2008-03-17 11:06:47.000000000 -0700
> > +++ linux-2.6-x86.git/arch/x86/Kconfig 2008-03-17 11:06:51.000000000 -0700
> > @@ -1013,6 +1013,21 @@
> >
> > See <file:Documentation/mtrr.txt> for more information.
> >
> > +config X86_PAT
> > + def_bool y
> > + prompt "x86 PAT support"
> > + depends on MTRR && NONPROMISC_DEVMEM
> > + help
> > + Use PAT attributes to setup page level cache control.
> > + ---help---
> > + PATs are the modern equivalents of MTRRs and are much more
> > + flexible than MTRRs.
> > +
>
> Do 2 help sections actually work? We don't usually do that.
> Just change the second one to a blank line.
>
> > + Say N here if you see bootup problems (boot crash, boot hang,
> > + spontaneous reboots) or a non-working Xorg.
>
> Maybe "a non-working Xorg video driver" ?
> or even omit Xorg and just say video driver?
>
Done with below patch.
Ingo: Please apply.
Thanks,
Venki
Fix double help section in PAT Kconfig. Thanks to Randy Dunlap for catching
this bug.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---
arch/x86/Kconfig | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6-x86.git/arch/x86/Kconfig
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/Kconfig 2008-03-21 08:28:47.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/Kconfig 2008-03-24 07:19:35.000000000 -0700
@@ -1019,12 +1019,12 @@ config X86_PAT
depends on MTRR && NONPROMISC_DEVMEM
help
Use PAT attributes to setup page level cache control.
- ---help---
+
PATs are the modern equivalents of MTRRs and are much more
flexible than MTRRs.
Say N here if you see bootup problems (boot crash, boot hang,
- spontaneous reboots) or a non-working Xorg.
+ spontaneous reboots) or a non-working video driver.
If unsure, say Y.
* [patch 03/13] x86: PAT Avoid aliasing in /dev/mem read/write
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: x86_xlate_dev_mem_ptr.patch --]
[-- Type: text/plain, Size: 4936 bytes --]
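Add xlate_dev_mem_ptr()/unxlate_dev_mem_ptr() calls around /dev/mem read/write.
For non-RAM pages, this sets up a temporary one-page ioremap() mapping for each
access, so /dev/mem read and write can be done without aliasing worries. The
range_is_allowed() check is also moved ahead of the translation.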
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/ioremap.c 2008-03-18 02:16:08.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/ioremap.c 2008-03-18 09:20:38.000000000 -0700
@@ -284,6 +284,35 @@
}
EXPORT_SYMBOL(iounmap);
+/*
+ * Convert a physical pointer to a virtual kernel pointer for /dev/mem
+ * access
+ */
+void *xlate_dev_mem_ptr(unsigned long phys)
+{
+ void *addr;
+ unsigned long start = phys & PAGE_MASK;
+
+ /* If page is RAM, we can use __va. Otherwise ioremap and unmap. */
+ if (page_is_ram(start >> PAGE_SHIFT))
+ return __va(phys);
+
+ addr = (void *)ioremap(start, PAGE_SIZE);
+ if (addr)
+ addr = (void *)((unsigned long)addr | (phys & ~PAGE_MASK));
+
+ return addr;
+}
+
+void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
+{
+ if (page_is_ram(phys >> PAGE_SHIFT))
+ return;
+
+ iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
+ return;
+}
+
#ifdef CONFIG_X86_32
int __initdata early_ioremap_debug;
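The pointer arithmetic in xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() for a non-RAM page can be checked in isolation. This sketch assumes 4K pages and uses a made-up, page-aligned map_base in place of the value ioremap() would return for a page-aligned start; the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Model of xlate_dev_mem_ptr() for a non-RAM page: the whole page
 * containing phys is mapped at map_base, then the in-page offset of
 * phys is added back.
 */
static uint64_t model_xlate(uint64_t phys, uint64_t map_base)
{
	assert((map_base & ~PAGE_MASK) == 0);   /* mapping is page-aligned */
	return map_base | (phys & ~PAGE_MASK);
}

/* Model of unxlate_dev_mem_ptr(): iounmap() gets the page-aligned base. */
static uint64_t model_unxlate(uint64_t addr)
{
	return addr & PAGE_MASK;
}
```

Since only one page is mapped per call, each translated pointer is valid only up to the end of that page.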
Index: linux-2.6-x86.git/drivers/char/mem.c
===================================================================
--- linux-2.6-x86.git.orig/drivers/char/mem.c 2008-03-18 02:16:08.000000000 -0700
+++ linux-2.6-x86.git/drivers/char/mem.c 2008-03-18 09:20:34.000000000 -0700
@@ -134,6 +134,10 @@
}
#endif
+void __attribute__((weak)) unxlate_dev_mem_ptr(unsigned long phys, void *addr)
+{
+}
+
/*
* This funcion reads the *physical* memory. The f_pos points directly to the
* memory location.
@@ -176,17 +180,25 @@
sz = min_t(unsigned long, sz, count);
+ if (!range_is_allowed(p >> PAGE_SHIFT, count))
+ return -EPERM;
+
/*
* On ia64 if a page has been mapped somewhere as
* uncached, then it must also be accessed uncached
* by the kernel or data corruption may occur
*/
ptr = xlate_dev_mem_ptr(p);
+ if (!ptr)
+ return -EFAULT;
- if (!range_is_allowed(p >> PAGE_SHIFT, count))
- return -EPERM;
- if (copy_to_user(buf, ptr, sz))
+ if (copy_to_user(buf, ptr, sz)) {
+ unxlate_dev_mem_ptr(p, ptr);
return -EFAULT;
+ }
+
+ unxlate_dev_mem_ptr(p, ptr);
+
buf += sz;
p += sz;
count -= sz;
@@ -235,22 +247,32 @@
sz = min_t(unsigned long, sz, count);
+ if (!range_is_allowed(p >> PAGE_SHIFT, sz))
+ return -EPERM;
+
/*
* On ia64 if a page has been mapped somewhere as
* uncached, then it must also be accessed uncached
* by the kernel or data corruption may occur
*/
ptr = xlate_dev_mem_ptr(p);
+ if (!ptr) {
+ if (written)
+ break;
+ return -EFAULT;
+ }
- if (!range_is_allowed(p >> PAGE_SHIFT, sz))
- return -EPERM;
copied = copy_from_user(ptr, buf, sz);
if (copied) {
written += sz - copied;
+ unxlate_dev_mem_ptr(p, ptr);
if (written)
break;
return -EFAULT;
}
+
+ unxlate_dev_mem_ptr(p, ptr);
+
buf += sz;
p += sz;
count -= sz;
Index: linux-2.6-x86.git/include/asm-x86/io_32.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/io_32.h 2008-03-18 02:16:08.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/io_32.h 2008-03-18 03:11:12.000000000 -0700
@@ -49,12 +49,6 @@
#include <linux/vmalloc.h>
/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p) __va(p)
-
-/*
* Convert a virtual cached pointer to an uncached pointer
*/
#define xlate_dev_kmem_ptr(p) p
Index: linux-2.6-x86.git/include/asm-x86/io_64.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/io_64.h 2008-03-18 02:16:08.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/io_64.h 2008-03-18 03:11:12.000000000 -0700
@@ -282,12 +282,6 @@
#define BIO_VMERGE_BOUNDARY iommu_bio_merge
/*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p) __va(p)
-
-/*
* Convert a virtual cached pointer to an uncached pointer
*/
#define xlate_dev_kmem_ptr(p) p
Index: linux-2.6-x86.git/include/asm-x86/io.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/io.h 2008-03-18 02:16:08.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/io.h 2008-03-18 09:20:38.000000000 -0700
@@ -1,5 +1,13 @@
+#ifndef _ASM_X86_IO_H
+#define _ASM_X86_IO_H
+
#ifdef CONFIG_X86_32
# include "io_32.h"
#else
# include "io_64.h"
#endif
+
+extern void *xlate_dev_mem_ptr(unsigned long phys);
+extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);
+
+#endif /* _ASM_X86_IO_H */
--
* [patch 04/13] x86: PAT make ioremap_change_attr non-static
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: make_ioremap_change_attr_nonstatic.patch --]
[-- Type: text/plain, Size: 3184 bytes --]
Make ioremap_change_attr() non-static and pass a prot_val (a _PAGE_CACHE_*
value) in place of enum ioremap_mode. Subsequent PAT patches use this interface.
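With the enum gone, callers hand the cache attribute bits straight to ioremap_change_attr(), and anything that is not write-back falls through to the uncached path. A user-space model of that dispatch; the MODEL_* encodings (PCD = 0x10, PWT = 0x08) are assumptions standing in for the real _PAGE_CACHE_* definitions, and the enum result stands in for calling set_memory_uc()/set_memory_wb():

```c
#include <assert.h>

/* Assumed stand-ins for the x86 page cache attribute bits. */
#define MODEL_PAGE_PWT      0x008UL
#define MODEL_PAGE_PCD      0x010UL
#define MODEL_PAGE_CACHE_WB 0UL
#define MODEL_PAGE_CACHE_UC (MODEL_PAGE_PCD | MODEL_PAGE_PWT)

enum model_action { MODEL_SET_UC, MODEL_SET_WB };

/* Mirrors the switch in ioremap_change_attr(): unknown values take UC. */
static enum model_action model_change_attr(unsigned long prot_val)
{
	switch (prot_val) {
	case MODEL_PAGE_CACHE_WB:
		return MODEL_SET_WB;
	case MODEL_PAGE_CACHE_UC:
	default:
		return MODEL_SET_UC;
	}
}
```

Defaulting to UC keeps an unrecognized request on the conservative side.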
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/ioremap.c 2008-03-18 03:11:12.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/ioremap.c 2008-03-18 09:20:37.000000000 -0700
@@ -21,11 +21,6 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
-enum ioremap_mode {
- IOR_MODE_UNCACHED,
- IOR_MODE_CACHED,
-};
-
#ifdef CONFIG_X86_64
unsigned long __phys_addr(unsigned long x)
@@ -91,18 +86,18 @@
* Fix up the linear direct mapping of the kernel to avoid cache attribute
* conflicts.
*/
-static int ioremap_change_attr(unsigned long vaddr, unsigned long size,
- enum ioremap_mode mode)
+int ioremap_change_attr(unsigned long vaddr, unsigned long size,
+ unsigned long prot_val)
{
unsigned long nrpages = size >> PAGE_SHIFT;
int err;
- switch (mode) {
- case IOR_MODE_UNCACHED:
+ switch (prot_val) {
+ case _PAGE_CACHE_UC:
default:
err = set_memory_uc(vaddr, nrpages);
break;
- case IOR_MODE_CACHED:
+ case _PAGE_CACHE_WB:
err = set_memory_wb(vaddr, nrpages);
break;
}
@@ -120,7 +115,7 @@
* caller shouldn't need to know that small detail.
*/
static void __iomem *__ioremap(unsigned long phys_addr, unsigned long size,
- enum ioremap_mode mode)
+ unsigned long prot_val)
{
unsigned long pfn, offset, last_addr, vaddr;
struct vm_struct *area;
@@ -158,12 +153,12 @@
WARN_ON_ONCE(is_ram);
}
- switch (mode) {
- case IOR_MODE_UNCACHED:
+ switch (prot_val) {
+ case _PAGE_CACHE_UC:
default:
prot = PAGE_KERNEL_NOCACHE;
break;
- case IOR_MODE_CACHED:
+ case _PAGE_CACHE_WB:
prot = PAGE_KERNEL;
break;
}
@@ -188,7 +183,7 @@
return NULL;
}
- if (ioremap_change_attr(vaddr, size, mode) < 0) {
+ if (ioremap_change_attr(vaddr, size, prot_val) < 0) {
vunmap(area->addr);
return NULL;
}
@@ -222,13 +217,13 @@
*/
void __iomem *ioremap_nocache(unsigned long phys_addr, unsigned long size)
{
- return __ioremap(phys_addr, size, IOR_MODE_UNCACHED);
+ return __ioremap(phys_addr, size, _PAGE_CACHE_UC);
}
EXPORT_SYMBOL(ioremap_nocache);
void __iomem *ioremap_cache(unsigned long phys_addr, unsigned long size)
{
- return __ioremap(phys_addr, size, IOR_MODE_CACHED);
+ return __ioremap(phys_addr, size, _PAGE_CACHE_WB);
}
EXPORT_SYMBOL(ioremap_cache);
Index: linux-2.6-x86.git/include/asm-x86/io.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/io.h 2008-03-18 03:11:12.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/io.h 2008-03-18 09:20:25.000000000 -0700
@@ -7,6 +7,9 @@
# include "io_64.h"
#endif
+extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
+ unsigned long prot_val);
+
extern void *xlate_dev_mem_ptr(unsigned long phys);
extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);
--
* [patch 05/13] x86: PAT use reserve free memtype in ioremap and iounmap
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: use_reserve_free_memtype_ioremap.patch --]
[-- Type: text/plain, Size: 2525 bytes --]
Use the reserve_memtype and free_memtype interfaces in ioremap/iounmap to avoid
aliasing.
If there is an existing alias for the region, inherit the memory type from
that alias, except that an uncached request is never allowed to fall back to
write-back. If there are conflicting aliases for the region, ioremap fails.
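The fallback rule in __ioremap() boils down to a small decision: keep the type an existing alias already has, unless that would turn an uncached request into a write-back mapping. A user-space sketch of only that decision; the MODEL_CACHE_* values are arbitrary labels rather than the real _PAGE_CACHE_* bits, and the reservation itself is not modelled:

```c
#include <assert.h>

/* Arbitrary labels for the memory types, not the real PTE encodings. */
#define MODEL_CACHE_WB       0UL
#define MODEL_CACHE_WC       1UL
#define MODEL_CACHE_UC_MINUS 2UL
#define MODEL_CACHE_UC       3UL

/*
 * req is what the caller asked for, existing is the type reported for
 * the region. Returns 0 and stores the type to map with, or -1 when
 * ioremap must fail.
 */
static int model_ioremap_type(unsigned long req, unsigned long existing,
			      unsigned long *use)
{
	/* An uncached request must never silently become write-back. */
	if (req == MODEL_CACHE_UC && existing == MODEL_CACHE_WB)
		return -1;

	*use = existing;	/* otherwise inherit the alias's type */
	return 0;
}
```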
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/ioremap.c 2008-03-18 03:14:09.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/ioremap.c 2008-03-18 09:20:36.000000000 -0700
@@ -20,6 +20,7 @@
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
+#include <asm/pat.h>
#ifdef CONFIG_X86_64
@@ -119,6 +120,7 @@
{
unsigned long pfn, offset, last_addr, vaddr;
struct vm_struct *area;
+ unsigned long new_prot_val;
pgprot_t prot;
void __iomem *ret_addr;
@@ -153,6 +155,28 @@
WARN_ON_ONCE(is_ram);
}
+ /*
+ * Mappings have to be page-aligned
+ */
+ offset = phys_addr & ~PAGE_MASK;
+ phys_addr &= PAGE_MASK;
+ size = PAGE_ALIGN(last_addr+1) - phys_addr;
+
+ if (reserve_memtype(phys_addr, phys_addr + size,
+ prot_val, &new_prot_val)) {
+ /*
+ * Do not fallback to certain memory types with certain
+ * requested type:
+ * - request is uncached, return cannot be write-back
+ */
+ if ((prot_val == _PAGE_CACHE_UC &&
+ new_prot_val == _PAGE_CACHE_WB)) {
+ free_memtype(phys_addr, phys_addr + size);
+ return NULL;
+ }
+ prot_val = new_prot_val;
+ }
+
switch (prot_val) {
case _PAGE_CACHE_UC:
default:
@@ -164,13 +188,6 @@
}
/*
- * Mappings have to be page-aligned
- */
- offset = phys_addr & ~PAGE_MASK;
- phys_addr &= PAGE_MASK;
- size = PAGE_ALIGN(last_addr+1) - phys_addr;
-
- /*
* Ok, go for it..
*/
area = get_vm_area(size, VM_IOREMAP);
@@ -179,11 +196,13 @@
area->phys_addr = phys_addr;
vaddr = (unsigned long) area->addr;
if (ioremap_page_range(vaddr, vaddr + size, phys_addr, prot)) {
+ free_memtype(phys_addr, phys_addr + size);
free_vm_area(area);
return NULL;
}
if (ioremap_change_attr(vaddr, size, prot_val) < 0) {
+ free_memtype(phys_addr, phys_addr + size);
vunmap(area->addr);
return NULL;
}
@@ -272,6 +291,8 @@
return;
}
+ free_memtype(p->phys_addr, p->phys_addr + get_vm_area_size(p));
+
/* Finally remove it */
o = remove_vm_area((void *)addr);
BUG_ON(p != o || o == NULL);
--
* [patch 06/13] x86: PAT use reserve free memtype in set_memory_uc
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: use_reserve_free_memtype_setmemory.patch --]
[-- Type: text/plain, Size: 3131 bytes --]
Use the reserve_memtype and free_memtype interfaces in set_memory_uc and
set_memory_wb to avoid aliasing.
set_memory_uc and set_memory_wb are meant for RAM: users first call
set_memory_uc and, once they are done, call set_memory_wb to reset the
attribute.
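The usage model is a strict pairing: set_memory_uc() reserves the range as UC before changing the attribute, and set_memory_wb() releases that reservation. The toy tracker below is only a sketch of why the pairing matters, assuming a fixed four-slot table, 4 KiB pages, and a "conflict means overlap with a different type" rule; it is not PAT's actual memtype list, and every model_* name is hypothetical:

```c
#include <assert.h>

#define MODEL_MAX 4
enum model_type { MODEL_WB, MODEL_UC };

struct model_res {
	int used;
	unsigned long start, end;	/* [start, end) */
	enum model_type type;
};

static struct model_res model_list[MODEL_MAX];

/* Reserve [start, end) as type; fail on overlap with a different type. */
static int model_reserve(unsigned long start, unsigned long end,
			 enum model_type type)
{
	int i, slot = -1;

	for (i = 0; i < MODEL_MAX; i++) {
		struct model_res *r = &model_list[i];

		if (!r->used) {
			if (slot < 0)
				slot = i;
			continue;
		}
		if (start < r->end && r->start < end && r->type != type)
			return -1;	/* would create a conflicting alias */
	}
	if (slot < 0)
		return -1;		/* toy table is full */
	model_list[slot] = (struct model_res){ 1, start, end, type };
	return 0;
}

/* Drop the reservation that exactly matches [start, end). */
static void model_free(unsigned long start, unsigned long end)
{
	int i;

	for (i = 0; i < MODEL_MAX; i++) {
		if (model_list[i].used && model_list[i].start == start &&
		    model_list[i].end == end) {
			model_list[i].used = 0;
			return;
		}
	}
}

/* Like set_memory_uc(): reserve UC first, then change the attribute. */
static int model_set_memory_uc(unsigned long addr, int numpages)
{
	return model_reserve(addr, addr + numpages * 4096UL, MODEL_UC);
}

/* Like set_memory_wb(): free the UC reservation, then restore WB. */
static void model_set_memory_wb(unsigned long addr, int numpages)
{
	model_free(addr, addr + numpages * 4096UL);
}
```

While the UC reservation is held, a WB request for an overlapping range is refused; after set_memory_wb() it succeeds.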
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/pageattr.c 2008-03-18 02:16:06.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/pageattr.c 2008-03-18 09:20:30.000000000 -0700
@@ -19,6 +19,7 @@
#include <asm/uaccess.h>
#include <asm/pgalloc.h>
#include <asm/proto.h>
+#include <asm/pat.h>
/*
* The current flushing context - we pass it instead of 5 arguments:
@@ -771,18 +772,34 @@
return change_page_attr_set_clr(addr, numpages, __pgprot(0), mask);
}
-int set_memory_uc(unsigned long addr, int numpages)
+int _set_memory_uc(unsigned long addr, int numpages)
{
return change_page_attr_set(addr, numpages,
__pgprot(_PAGE_CACHE_UC));
}
+
+int set_memory_uc(unsigned long addr, int numpages)
+{
+ if (reserve_memtype(addr, addr + numpages * PAGE_SIZE,
+ _PAGE_CACHE_UC, NULL))
+ return -EINVAL;
+
+ return _set_memory_uc(addr, numpages);
+}
EXPORT_SYMBOL(set_memory_uc);
-int set_memory_wb(unsigned long addr, int numpages)
+int _set_memory_wb(unsigned long addr, int numpages)
{
return change_page_attr_clear(addr, numpages,
__pgprot(_PAGE_CACHE_MASK));
}
+
+int set_memory_wb(unsigned long addr, int numpages)
+{
+ free_memtype(addr, addr + numpages * PAGE_SIZE);
+
+ return _set_memory_wb(addr, numpages);
+}
EXPORT_SYMBOL(set_memory_wb);
int set_memory_x(unsigned long addr, int numpages)
Index: linux-2.6-x86.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/ioremap.c 2008-03-18 03:19:44.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/ioremap.c 2008-03-18 09:20:25.000000000 -0700
@@ -96,10 +96,10 @@
switch (prot_val) {
case _PAGE_CACHE_UC:
default:
- err = set_memory_uc(vaddr, nrpages);
+ err = _set_memory_uc(vaddr, nrpages);
break;
case _PAGE_CACHE_WB:
- err = set_memory_wb(vaddr, nrpages);
+ err = _set_memory_wb(vaddr, nrpages);
break;
}
Index: linux-2.6-x86.git/include/asm-x86/cacheflush.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/cacheflush.h 2008-03-18 02:16:06.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/cacheflush.h 2008-03-18 09:20:30.000000000 -0700
@@ -34,6 +34,8 @@
int set_pages_ro(struct page *page, int numpages);
int set_pages_rw(struct page *page, int numpages);
+int _set_memory_uc(unsigned long addr, int numpages);
+int _set_memory_wb(unsigned long addr, int numpages);
int set_memory_uc(unsigned long addr, int numpages);
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_x(unsigned long addr, int numpages);
--
* [patch 07/13] x86: PAT use reserve free memtype in pci_mmap_page_range
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: use_reserve_free_memtype_pcimmap.patch --]
[-- Type: text/plain, Size: 3577 bytes --]
Add reserve_memtype and free_memtype wrappers for pci_mmap_page_range.
free_memtype is called on unmap, but the identity mapping keeps the attribute
set for the pci_mmap_page_range request until the next request for the same
region calls ioremap_change_attr(), which then goes through without conflict.
This is the same approach used in ioremap/iounmap.
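The reservation has to stay balanced across the life of the VMA: the mmap path reserves once itself (->open is not called for the initial mapping), ->open reserves again whenever the VMA is duplicated, for example on fork(), and ->close frees on every unmap. A user-space sketch of that bookkeeping, assuming a counter in place of the real memtype list; struct model_vma and struct model_vm_ops are hypothetical stand-ins for vm_area_struct and vm_operations_struct:

```c
#include <assert.h>

struct model_vma;

struct model_vm_ops {
	void (*open)(struct model_vma *vma);
	void (*close)(struct model_vma *vma);
};

/* Only the fields the callbacks would need. */
struct model_vma {
	unsigned long pgoff, start, end;
	const struct model_vm_ops *ops;
};

static int model_reservations;	/* reservations currently held */

static void model_track(struct model_vma *vma)
{
	(void)vma;
	model_reservations++;	/* like pci_track_mmap_page_range() */
}

static void model_untrack(struct model_vma *vma)
{
	(void)vma;
	model_reservations--;	/* like pci_unmap_page_range() */
}

static const struct model_vm_ops model_pci_mmap_ops = {
	.open  = model_track,
	.close = model_untrack,
};

/* The mmap path reserves once, then installs the ops for later copies. */
static int model_pci_mmap(struct model_vma *vma)
{
	model_reservations++;
	vma->ops = &model_pci_mmap_ops;
	return 0;
}
```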
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/pci/i386.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/pci/i386.c 2008-03-18 02:16:05.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/pci/i386.c 2008-03-18 03:29:39.000000000 -0700
@@ -30,6 +30,9 @@
#include <linux/init.h>
#include <linux/ioport.h>
#include <linux/errno.h>
+#include <linux/bootmem.h>
+
+#include <asm/pat.h>
#include "pci.h"
@@ -297,10 +300,34 @@
pci_write_config_byte(dev, PCI_LATENCY_TIMER, lat);
}
+static void pci_unmap_page_range(struct vm_area_struct *vma)
+{
+ u64 addr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+ free_memtype(addr, addr + vma->vm_end - vma->vm_start);
+}
+
+static void pci_track_mmap_page_range(struct vm_area_struct *vma)
+{
+ u64 addr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+ unsigned long flags = pgprot_val(vma->vm_page_prot)
+ & _PAGE_CACHE_MASK;
+
+ reserve_memtype(addr, addr + vma->vm_end - vma->vm_start, flags, NULL);
+}
+
+static struct vm_operations_struct pci_mmap_ops = {
+ .open = pci_track_mmap_page_range,
+ .close = pci_unmap_page_range,
+};
+
int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
enum pci_mmap_state mmap_state, int write_combine)
{
unsigned long prot;
+ u64 addr = (u64)vma->vm_pgoff << PAGE_SHIFT;
+ unsigned long len = vma->vm_end - vma->vm_start;
+ unsigned long flags;
+ unsigned long new_flags;
/* I/O space cannot be accessed via normal processor loads and
* stores on this platform.
@@ -308,21 +335,46 @@
if (mmap_state == pci_mmap_io)
return -EINVAL;
- /* Leave vm_pgoff as-is, the PCI space address is the physical
- * address on this platform.
- */
prot = pgprot_val(vma->vm_page_prot);
- if (boot_cpu_data.x86 > 3)
- prot |= _PAGE_PCD | _PAGE_PWT;
+ if (pat_wc_enabled && write_combine)
+ prot |= _PAGE_CACHE_WC;
+ else if (boot_cpu_data.x86 > 3)
+ prot |= _PAGE_CACHE_UC;
+
vma->vm_page_prot = __pgprot(prot);
- /* Write-combine setting is ignored, it is changed via the mtrr
- * interfaces on this platform.
- */
+ flags = pgprot_val(vma->vm_page_prot) & _PAGE_CACHE_MASK;
+ if (reserve_memtype(addr, addr + len, flags, &new_flags)) {
+ /*
+ * Do not fallback to certain memory types with certain
+ * requested type:
+ * - request is uncached, return cannot be write-back
+ * - request is uncached, return cannot be write-combine
+ * - request is write-combine, return cannot be write-back
+ */
+ if ((flags == _PAGE_CACHE_UC &&
+ (new_flags == _PAGE_CACHE_WB ||
+ new_flags == _PAGE_CACHE_WC)) ||
+ (flags == _PAGE_CACHE_WC &&
+ new_flags == _PAGE_CACHE_WB)) {
+ free_memtype(addr, addr+len);
+ return -EINVAL;
+ }
+ flags = new_flags;
+ }
+
+ if (vma->vm_pgoff <= max_pfn_mapped &&
+ ioremap_change_attr((unsigned long)__va(addr), len, flags)) {
+ free_memtype(addr, addr + len);
+ return -EINVAL;
+ }
+
if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
vma->vm_end - vma->vm_start,
vma->vm_page_prot))
return -EAGAIN;
+ vma->vm_ops = &pci_mmap_ops;
+
return 0;
}
--
* [patch 08/13] x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: x86_phys_mem_access_prot.patch --]
[-- Type: text/plain, Size: 5386 bytes --]
Introduce phys_mem_access_prot_allowed(), which checks whether a mapping can be
set up without any memory type conflicts and returns success (nonzero) or
failure (zero) accordingly. phys_mem_access_prot() by itself cannot report
failure; PAT needs that ability because a mapping may conflict with an
existing alias.
Set up __HAVE_PHYS_MEM_ACCESS_PROT on x86 and move the x86-specific code out of
/dev/mem into arch-specific code.
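The hook follows the same weak-default pattern as unxlate_dev_mem_ptr(): drivers/char/mem.c carries a weak version that always allows the mapping, an architecture can link in a strong definition, and mmap_mem() turns a zero return into -EINVAL. A user-space sketch of that pattern, assuming GCC or Clang for `__attribute__((weak))`; model_prot_allowed() and model_mmap_mem() are hypothetical names:

```c
#include <assert.h>
#include <errno.h>

/*
 * Weak default: allow every mapping unless a strong definition
 * elsewhere in the link overrides it.
 */
int __attribute__((weak)) model_prot_allowed(unsigned long pfn,
					     unsigned long size)
{
	(void)pfn;
	(void)size;
	return 1;
}

/* Caller side, mirroring the new check added to mmap_mem(). */
static int model_mmap_mem(unsigned long pfn, unsigned long size)
{
	if (!model_prot_allowed(pfn, size))
		return -EINVAL;	/* e.g. a memory type conflict */
	return 0;
}
```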
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/pat.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/pat.c 2008-03-18 03:11:03.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/pat.c 2008-03-18 09:20:33.000000000 -0700
@@ -400,3 +400,42 @@
return err;
}
+
+/* /dev/mem interface. Use the previous mapping */
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot)
+{
+ return vma_prot;
+}
+
+int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t *vma_prot)
+{
+
+ if (file->f_flags & O_SYNC) {
+ *vma_prot = pgprot_noncached(*vma_prot);
+ return 1;
+ }
+
+#ifdef CONFIG_X86_32
+ /*
+ * On the PPro and successors, the MTRRs are used to set
+ * memory types for physical addresses outside main memory,
+ * so blindly setting UC or PWT on those pages is wrong.
+ * For Pentiums and earlier, the surround logic should disable
+ * caching for the high addresses through the KEN pin, but
+ * we maintain the tradition of paranoia in this code.
+ */
+ if (!pat_wc_enabled &&
+ ! ( test_bit(X86_FEATURE_MTRR, boot_cpu_data.x86_capability) ||
+ test_bit(X86_FEATURE_K6_MTRR, boot_cpu_data.x86_capability) ||
+ test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
+ test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability)) &&
+ (pfn << PAGE_SHIFT) >= __pa(high_memory)) {
+ *vma_prot = pgprot_noncached(*vma_prot);
+ return 1;
+ }
+#endif
+
+ return 1;
+}
Index: linux-2.6-x86.git/include/asm-x86/pgtable.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/pgtable.h 2008-03-18 02:16:04.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/pgtable.h 2008-03-18 09:20:25.000000000 -0700
@@ -205,6 +205,15 @@
#define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)
+#ifndef __ASSEMBLY__
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+struct file;
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot);
+int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t *vma_prot);
+#endif
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else /* !CONFIG_PARAVIRT */
Index: linux-2.6-x86.git/drivers/char/mem.c
===================================================================
--- linux-2.6-x86.git.orig/drivers/char/mem.c 2008-03-18 03:11:12.000000000 -0700
+++ linux-2.6-x86.git/drivers/char/mem.c 2008-03-18 09:20:33.000000000 -0700
@@ -41,36 +41,7 @@
*/
static inline int uncached_access(struct file *file, unsigned long addr)
{
-#if defined(__i386__) && !defined(__arch_um__)
- /*
- * On the PPro and successors, the MTRRs are used to set
- * memory types for physical addresses outside main memory,
- * so blindly setting PCD or PWT on those pages is wrong.
- * For Pentiums and earlier, the surround logic should disable
- * caching for the high addresses through the KEN pin, but
- * we maintain the tradition of paranoia in this code.
- */
- if (file->f_flags & O_SYNC)
- return 1;
- return !( test_bit(X86_FEATURE_MTRR, boot_cpu_data.x86_capability) ||
- test_bit(X86_FEATURE_K6_MTRR, boot_cpu_data.x86_capability) ||
- test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
- test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability) )
- && addr >= __pa(high_memory);
-#elif defined(__x86_64__) && !defined(__arch_um__)
- /*
- * This is broken because it can generate memory type aliases,
- * which can cause cache corruptions
- * But it is only available for root and we have to be bug-to-bug
- * compatible with i386.
- */
- if (file->f_flags & O_SYNC)
- return 1;
- /* same behaviour as i386. PAT always set to cached and MTRRs control the
- caching behaviour.
- Hopefully a full PAT implementation will fix that soon. */
- return 0;
-#elif defined(CONFIG_IA64)
+#if defined(CONFIG_IA64)
/*
* On ia64, we ignore O_SYNC because we cannot tolerate memory attribute aliases.
*/
@@ -283,6 +254,12 @@
return written;
}
+int __attribute__((weak)) phys_mem_access_prot_allowed(struct file *file,
+ unsigned long pfn, unsigned long size, pgprot_t *vma_prot)
+{
+ return 1;
+}
+
#ifndef __HAVE_PHYS_MEM_ACCESS_PROT
static pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
@@ -336,6 +313,10 @@
if (!range_is_allowed(vma->vm_pgoff, size))
return -EPERM;
+ if (!phys_mem_access_prot_allowed(file, vma->vm_pgoff, size,
+ &vma->vm_page_prot))
+ return -EINVAL;
+
vma->vm_page_prot = phys_mem_access_prot(file, vma->vm_pgoff,
size,
vma->vm_page_prot);
--
* [patch 09/13] x86: PAT use reserve free memtype in mmap of /dev/mem
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: use_reserve_free_memtype_devmmemmap.patch --]
[-- Type: text/plain, Size: 7189 bytes --]
Use the reserve_memtype and free_memtype wrappers for /dev/mem mmaps. Choosing
the memtype is slightly complicated here because existing X mappings have to
keep working; we fall back to UC_MINUS for those.
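When nothing else conflicts, the type a /dev/mem mmap asks for follows the rules in phys_mem_access_prot_allowed() and the req_type == -1 path of reserve_memtype(): O_SYNC forces UC, otherwise the range gets WB if the MTRRs say it is write-back and UC_MINUS if not. A user-space sketch of just that choice; it deliberately leaves out inheriting from a conflicting mapping and the 32-bit non-MTRR high-memory case, and the model_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum model_cache { MODEL_WB, MODEL_UC_MINUS, MODEL_UC };

/*
 * Requested type for a /dev/mem mmap with no conflicting mapping:
 * O_SYNC means UC; otherwise WB for write-back MTRR ranges, else
 * UC_MINUS, which existing X drivers rely on.
 */
static enum model_cache model_devmem_type(bool o_sync, bool mtrr_is_wb)
{
	if (o_sync)
		return MODEL_UC;
	return mtrr_is_wb ? MODEL_WB : MODEL_UC_MINUS;
}
```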
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/pat.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/pat.c 2008-03-18 03:33:01.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/pat.c 2008-03-18 09:20:21.000000000 -0700
@@ -11,6 +11,7 @@
#include <linux/kernel.h>
#include <linux/gfp.h>
#include <linux/fs.h>
+#include <linux/bootmem.h>
#include <asm/msr.h>
#include <asm/tlbflush.h>
@@ -21,6 +22,7 @@
#include <asm/cacheflush.h>
#include <asm/fcntl.h>
#include <asm/mtrr.h>
+#include <asm/io.h>
int pat_wc_enabled = 1;
@@ -195,6 +197,21 @@
return 0;
}
+/*
+ * req_type typically has one of the following values:
+ * - _PAGE_CACHE_WB
+ * - _PAGE_CACHE_WC
+ * - _PAGE_CACHE_UC_MINUS
+ * - _PAGE_CACHE_UC
+ *
+ * req_type has the special value '-1' when the requester wants to inherit
+ * the memory type from the MTRR (if WB) or an existing PAT mapping, defaulting to UC_MINUS.
+ *
+ * If ret_type is NULL, the function returns an error if it cannot reserve the
+ * region with req_type. If ret_type is non-NULL, the function returns the
+ * available type in ret_type when there is no error. On any error it
+ * returns a negative value.
+ */
int reserve_memtype(u64 start, u64 end, unsigned long req_type,
unsigned long *ret_type)
{
@@ -205,9 +222,14 @@
/* Only track when pat_wc_enabled */
if (!pat_wc_enabled) {
- if (ret_type)
- *ret_type = req_type;
-
+ /* This is identical to page table setting without PAT */
+ if (ret_type) {
+ if (req_type == -1) {
+ *ret_type = _PAGE_CACHE_WB;
+ } else {
+ *ret_type = req_type;
+ }
+ }
return 0;
}
@@ -219,8 +241,29 @@
return 0;
}
- req_type &= _PAGE_CACHE_MASK;
- err = pat_x_mtrr_type(start, end, req_type, &actual_type);
+ if (req_type == -1) {
+ /*
+ * Special case where caller wants to inherit from mtrr or
+ * existing pat mapping, defaulting to UC_MINUS in case of
+ * no match.
+ */
+ u8 mtrr_type = mtrr_type_lookup(start, end);
+ if (mtrr_type == 0xFE) { /* MTRR match error */
+ err = -1;
+ }
+
+ if (mtrr_type == MTRR_TYPE_WRBACK) {
+ req_type = _PAGE_CACHE_WB;
+ actual_type = _PAGE_CACHE_WB;
+ } else {
+ req_type = _PAGE_CACHE_UC_MINUS;
+ actual_type = _PAGE_CACHE_UC_MINUS;
+ }
+ } else {
+ req_type &= _PAGE_CACHE_MASK;
+ err = pat_x_mtrr_type(start, end, req_type, &actual_type);
+ }
+
if (err) {
if (ret_type)
*ret_type = actual_type;
@@ -401,7 +444,14 @@
}
-/* /dev/mem interface. Use the previous mapping */
+/*
+ * /dev/mem mmap interface. The memtype used for mapping varies:
+ * - Use UC for mappings with O_SYNC flag
+ * - Without O_SYNC flag, if there is any conflict in reserve_memtype,
+ * inherit the memtype from existing mapping.
+ * - Else use UC_MINUS memtype (for backward compatibility with existing
+ * X drivers).
+ */
pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
{
@@ -411,10 +461,13 @@
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot)
{
+ u64 offset = ((u64) pfn) << PAGE_SHIFT;
+ unsigned long flags = _PAGE_CACHE_UC_MINUS;
+ unsigned long ret_flags;
+ int retval;
if (file->f_flags & O_SYNC) {
- *vma_prot = pgprot_noncached(*vma_prot);
- return 1;
+ flags = _PAGE_CACHE_UC;
}
#ifdef CONFIG_X86_32
@@ -432,10 +485,65 @@
test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
test_bit(X86_FEATURE_CENTAUR_MCR, boot_cpu_data.x86_capability)) &&
(pfn << PAGE_SHIFT) >= __pa(high_memory)) {
- *vma_prot = pgprot_noncached(*vma_prot);
- return 1;
+ flags = _PAGE_CACHE_UC;
}
#endif
+ /*
+ * With O_SYNC, we can only take UC mapping. Fail if we cannot.
+ * Without O_SYNC, we want to get
+ * - WB for WB-able memory and no other conflicting mappings
+ * - UC_MINUS for non-WB-able memory with no other conflicting mappings
+ * - Inherit from conflicting mappings otherwise
+ */
+ if (flags != _PAGE_CACHE_UC_MINUS) {
+ retval = reserve_memtype(offset, offset + size, flags, NULL);
+ } else {
+ /* ret_flags is only filled in on this path */
+ retval = reserve_memtype(offset, offset + size, -1, &ret_flags);
+ flags = ret_flags;
+ }
+
+ if (retval < 0)
+ return 0;
+
+ if (pfn <= max_pfn_mapped &&
+ ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) {
+ free_memtype(offset, offset + size);
+ printk(KERN_DEBUG
+ "%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n",
+ current->comm, current->pid,
+ cattr_name(flags),
+ offset, offset + size);
+ return 0;
+ }
+
+ *vma_prot = __pgprot((pgprot_val(*vma_prot) & ~_PAGE_CACHE_MASK) |
+ flags);
return 1;
}
+
+void map_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
+{
+ u64 addr = (u64)pfn << PAGE_SHIFT;
+ unsigned long flags;
+ unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
+
+ reserve_memtype(addr, addr + size, want_flags, &flags);
+ if (flags != want_flags) {
+ printk(KERN_DEBUG
+ "%s:%d /dev/mem expected mapping type %s for %Lx-%Lx, got %s\n",
+ current->comm, current->pid,
+ cattr_name(want_flags),
+ addr, addr + size,
+ cattr_name(flags));
+ }
+}
+
+void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
+{
+ u64 addr = (u64)pfn << PAGE_SHIFT;
+
+ free_memtype(addr, addr + size);
+}
+
Index: linux-2.6-x86.git/drivers/char/mem.c
===================================================================
--- linux-2.6-x86.git.orig/drivers/char/mem.c 2008-03-18 03:33:01.000000000 -0700
+++ linux-2.6-x86.git/drivers/char/mem.c 2008-03-18 03:38:05.000000000 -0700
@@ -300,6 +300,35 @@
}
#endif
+void __attribute__((weak))
+map_devmem(unsigned long pfn, unsigned long len, pgprot_t prot)
+{
+ /* nothing. architectures can override. */
+}
+
+void __attribute__((weak))
+unmap_devmem(unsigned long pfn, unsigned long len, pgprot_t prot)
+{
+ /* nothing. architectures can override. */
+}
+
+static void mmap_mem_open(struct vm_area_struct *vma)
+{
+ map_devmem(vma->vm_pgoff, vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
+}
+
+static void mmap_mem_close(struct vm_area_struct *vma)
+{
+ unmap_devmem(vma->vm_pgoff, vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
+}
+
+static struct vm_operations_struct mmap_mem_ops = {
+ .open = mmap_mem_open,
+ .close = mmap_mem_close
+};
+
static int mmap_mem(struct file * file, struct vm_area_struct * vma)
{
size_t size = vma->vm_end - vma->vm_start;
@@ -321,13 +350,17 @@
size,
vma->vm_page_prot);
+ vma->vm_ops = &mmap_mem_ops;
+
/* Remap-pfn-range will mark the range VM_IO and VM_RESERVED */
if (remap_pfn_range(vma,
vma->vm_start,
vma->vm_pgoff,
size,
- vma->vm_page_prot))
+ vma->vm_page_prot)) {
+ unmap_devmem(vma->vm_pgoff, size, vma->vm_page_prot);
return -EAGAIN;
+ }
return 0;
}
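The reserve/free pairing in this patch can be illustrated with a small userspace toy model. It is a sketch, not kernel code: a counter stands in for the memtype list, and all toy_* names are hypothetical. It assumes, as the unmap_devmem() call on the remap_pfn_range() failure path implies, that the initial reservation is taken before remap_pfn_range() runs. Each VMA copy (for example after fork) takes its own reservation through ->open, and every ->close drops one, so the count returns to zero once all mappings are gone.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not kernel code): a counter stands in for the
 * memtype list that reserve_memtype()/free_memtype() maintain.
 */
static int live_reservations;

static void toy_reserve(void)
{
	live_reservations++;
}

static void toy_free(void)
{
	assert(live_reservations > 0);	/* a free without a reserve is a bug */
	live_reservations--;
}

/* A duplicated VMA (fork, split) gets ->open, which takes its own reservation. */
static void toy_vma_open(void)
{
	toy_reserve();
}

/* Every VMA teardown gets ->close, which drops one reservation. */
static void toy_vma_close(void)
{
	toy_free();
}

/*
 * Initial mmap: the range is reserved up front, so a failed remap must
 * release it, just as mmap_mem() calls unmap_devmem() before -EAGAIN.
 */
static int toy_mmap(int remap_fails)
{
	toy_reserve();
	if (remap_fails) {
		toy_free();
		return -1;
	}
	return 0;
}
```

For example, a successful toy_mmap() followed by one toy_vma_open() (a fork) needs two toy_vma_close() calls before the count is back to zero, while a failed toy_mmap() leaves nothing reserved.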
--
* [patch 10/13] x86: PAT export resource_wc in pci sysfs
2008-03-19 0:00 [patch 00/13] x86: PAT support updated - v3 venkatesh.pallipadi
` (8 preceding siblings ...)
2008-03-19 0:00 ` [patch 09/13] x86: PAT use reserve free memtype in mmap of /dev/mem venkatesh.pallipadi
@ 2008-03-19 0:00 ` venkatesh.pallipadi
2008-03-19 0:00 ` [patch 11/13] x86: PAT Add set_memory_wc() interface venkatesh.pallipadi
` (4 subsequent siblings)
14 siblings, 0 replies; 22+ messages in thread
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: pci_sysfs_writecombine.patch --]
[-- Type: text/plain, Size: 5827 bytes --]
For ranges with IORESOURCE_PREFETCH set, export a new resourceN_wc file in
PCI sysfs alongside the existing (uncached) resourceN file.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6-x86.git.orig/drivers/pci/pci-sysfs.c 2008-03-18 02:16:03.000000000 -0700
+++ linux-2.6-x86.git/drivers/pci/pci-sysfs.c 2008-03-18 03:46:08.000000000 -0700
@@ -420,13 +420,14 @@
* @kobj: kobject for mapping
* @attr: struct bin_attribute for the file being mapped
* @vma: struct vm_area_struct passed into the mmap
+ * @write_combine: 1 for write_combine mapping
*
* Use the regular PCI mapping routines to map a PCI resource into userspace.
* FIXME: write combining? maybe automatic for prefetchable regions?
*/
static int
pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
- struct vm_area_struct *vma)
+ struct vm_area_struct *vma, int write_combine)
{
struct pci_dev *pdev = to_pci_dev(container_of(kobj,
struct device, kobj));
@@ -449,7 +450,21 @@
vma->vm_pgoff += start >> PAGE_SHIFT;
mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
- return pci_mmap_page_range(pdev, vma, mmap_type, 0);
+ return pci_mmap_page_range(pdev, vma, mmap_type, write_combine);
+}
+
+static int
+pci_mmap_resource_uc(struct kobject *kobj, struct bin_attribute *attr,
+ struct vm_area_struct *vma)
+{
+ return pci_mmap_resource(kobj, attr, vma, 0);
+}
+
+static int
+pci_mmap_resource_wc(struct kobject *kobj, struct bin_attribute *attr,
+ struct vm_area_struct *vma)
+{
+ return pci_mmap_resource(kobj, attr, vma, 1);
}
/**
@@ -472,9 +487,46 @@
sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
kfree(res_attr);
}
+
+ res_attr = pdev->res_attr_wc[i];
+ if (res_attr) {
+ sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
+ kfree(res_attr);
+ }
}
}
+static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
+{
+ /* allocate attribute structure, piggyback attribute name */
+ int name_len = write_combine ? 13 : 10;
+ struct bin_attribute *res_attr;
+ int retval;
+
+ res_attr = kzalloc(sizeof(*res_attr) + name_len, GFP_ATOMIC);
+ if (res_attr) {
+ char *res_attr_name = (char *)(res_attr + 1);
+
+ if (write_combine) {
+ pdev->res_attr_wc[num] = res_attr;
+ sprintf(res_attr_name, "resource%d_wc", num);
+ res_attr->mmap = pci_mmap_resource_wc;
+ } else {
+ pdev->res_attr[num] = res_attr;
+ sprintf(res_attr_name, "resource%d", num);
+ res_attr->mmap = pci_mmap_resource_uc;
+ }
+ res_attr->attr.name = res_attr_name;
+ res_attr->attr.mode = S_IRUSR | S_IWUSR;
+ res_attr->size = pci_resource_len(pdev, num);
+ res_attr->private = &pdev->resource[num];
+ retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
+ } else
+ retval = -ENOMEM;
+
+ return retval;
+}
+
/**
* pci_create_resource_files - create resource files in sysfs for @dev
* @dev: dev in question
@@ -488,31 +540,19 @@
/* Expose the PCI resources from this device as files */
for (i = 0; i < PCI_ROM_RESOURCE; i++) {
- struct bin_attribute *res_attr;
/* skip empty resources */
if (!pci_resource_len(pdev, i))
continue;
- /* allocate attribute structure, piggyback attribute name */
- res_attr = kzalloc(sizeof(*res_attr) + 10, GFP_ATOMIC);
- if (res_attr) {
- char *res_attr_name = (char *)(res_attr + 1);
+ retval = pci_create_attr(pdev, i, 0);
+ /* for prefetchable resources, create a WC mappable file */
+ if (!retval && pdev->resource[i].flags & IORESOURCE_PREFETCH)
+ retval = pci_create_attr(pdev, i, 1);
- pdev->res_attr[i] = res_attr;
- sprintf(res_attr_name, "resource%d", i);
- res_attr->attr.name = res_attr_name;
- res_attr->attr.mode = S_IRUSR | S_IWUSR;
- res_attr->size = pci_resource_len(pdev, i);
- res_attr->mmap = pci_mmap_resource;
- res_attr->private = &pdev->resource[i];
- retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
- if (retval) {
- pci_remove_resource_files(pdev);
- return retval;
- }
- } else {
- return -ENOMEM;
+ if (retval) {
+ pci_remove_resource_files(pdev);
+ return retval;
}
}
return 0;
Index: linux-2.6-x86.git/include/linux/pci.h
===================================================================
--- linux-2.6-x86.git.orig/include/linux/pci.h 2008-03-18 02:16:03.000000000 -0700
+++ linux-2.6-x86.git/include/linux/pci.h 2008-03-18 03:46:08.000000000 -0700
@@ -198,6 +198,7 @@
struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */
int rom_attr_enabled; /* has display of the rom attribute been enabled? */
struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
+ struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
#ifdef CONFIG_PCI_MSI
struct list_head msi_list;
#endif
Index: linux-2.6-x86.git/Documentation/filesystems/sysfs-pci.txt
===================================================================
--- linux-2.6-x86.git.orig/Documentation/filesystems/sysfs-pci.txt 2008-03-18 02:16:03.000000000 -0700
+++ linux-2.6-x86.git/Documentation/filesystems/sysfs-pci.txt 2008-03-18 03:46:08.000000000 -0700
@@ -36,6 +36,7 @@
local_cpus nearby CPU mask (cpumask, ro)
resource PCI resource host addresses (ascii, ro)
resource0..N PCI resource N, if present (binary, mmap)
+ resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
rom PCI ROM resource, if present (binary, ro)
subsystem_device PCI subsystem device (ascii, ro)
subsystem_vendor PCI subsystem vendor (ascii, ro)
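pci_create_attr() allocates the file name inline after the bin_attribute, using 10 bytes for "resourceN" and 13 for "resourceN_wc". Those sizes hold only because the loop in pci_create_resource_files() stops before PCI_ROM_RESOURCE (6), so N is always a single digit. The sketch below, with a hypothetical helper name, builds the same names with snprintf() and reports the bytes needed including the terminating NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the sysfs name the way pci_create_attr() does
 * and return the bytes needed including the NUL. snprintf() is used here so
 * an undersized buffer truncates instead of overflowing.
 */
static size_t toy_res_attr_name(char *buf, size_t len, int num, int write_combine)
{
	int n;

	/* The fixed 10/13 byte sizes in the patch assume a single-digit index. */
	assert(num >= 0 && num <= 9);

	if (write_combine)
		n = snprintf(buf, len, "resource%d_wc", num);
	else
		n = snprintf(buf, len, "resource%d", num);

	return (size_t)n + 1;
}
```

A two-digit index would need one more byte in each case, which is why the fixed sizes are tied to the PCI_ROM_RESOURCE bound.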
--
* [patch 11/13] x86: PAT Add set_memory_wc() interface
2008-03-19 0:00 [patch 00/13] x86: PAT support updated - v3 venkatesh.pallipadi
` (9 preceding siblings ...)
2008-03-19 0:00 ` [patch 10/13] x86: PAT export resource_wc in pci sysfs venkatesh.pallipadi
@ 2008-03-19 0:00 ` venkatesh.pallipadi
2008-03-19 0:00 ` [patch 12/13] x86: PAT Add ioremap_wc() interface venkatesh.pallipadi
` (3 subsequent siblings)
14 siblings, 0 replies; 22+ messages in thread
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: set_memory_wc.patch --]
[-- Type: text/plain, Size: 1992 bytes --]
Add a set_memory_wc() interface, similar to set_memory_uc(). Callers have to
use these calls in pairs: set_memory_uc() with set_memory_wb(), and
set_memory_wc() with set_memory_wb().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/pageattr.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/pageattr.c 2008-03-18 03:23:49.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/pageattr.c 2008-03-18 03:53:45.000000000 -0700
@@ -788,6 +788,25 @@
}
EXPORT_SYMBOL(set_memory_uc);
+int _set_memory_wc(unsigned long addr, int numpages)
+{
+ return change_page_attr_set(addr, numpages,
+ __pgprot(_PAGE_CACHE_WC));
+}
+
+int set_memory_wc(unsigned long addr, int numpages)
+{
+ if (!pat_wc_enabled)
+ return set_memory_uc(addr, numpages);
+
+ if (reserve_memtype(addr, addr + numpages * PAGE_SIZE,
+ _PAGE_CACHE_WC, NULL))
+ return -EINVAL;
+
+ return _set_memory_wc(addr, numpages);
+}
+EXPORT_SYMBOL(set_memory_wc);
+
int _set_memory_wb(unsigned long addr, int numpages)
{
return change_page_attr_clear(addr, numpages,
Index: linux-2.6-x86.git/include/asm-x86/cacheflush.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/cacheflush.h 2008-03-18 03:21:25.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/cacheflush.h 2008-03-18 03:53:45.000000000 -0700
@@ -35,8 +35,10 @@
int set_pages_rw(struct page *page, int numpages);
int _set_memory_uc(unsigned long addr, int numpages);
+int _set_memory_wc(unsigned long addr, int numpages);
int _set_memory_wb(unsigned long addr, int numpages);
int set_memory_uc(unsigned long addr, int numpages);
+int set_memory_wc(unsigned long addr, int numpages);
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_x(unsigned long addr, int numpages);
int set_memory_nx(unsigned long addr, int numpages);
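The control flow of set_memory_wc() and the pairing rule above can be modelled in userspace. This is a toy sketch with hypothetical toy_* names. It assumes, as the pairing rule suggests, that set_memory_uc() also takes a reservation and that set_memory_wb() is the call that releases it; neither of those bodies appears in this patch. As in the patch, a request falls back to UC when PAT WC is unavailable, and a reservation conflict is reported as -EINVAL. The `conflict` parameter is a hypothetical knob that simulates reserve_memtype() failing.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for _PAGE_CACHE_WB/_WC/_UC. */
enum toy_cache { TOY_WB, TOY_WC, TOY_UC };

/* Toy state for one range: cache type and whether a reservation is held. */
struct toy_range {
	enum toy_cache type;
	int reserved;
};

/* Assumption: set_memory_uc() also reserves the range. */
static int toy_set_memory_uc(struct toy_range *r)
{
	assert(!r->reserved);	/* must be paired with toy_set_memory_wb() */
	r->reserved = 1;
	r->type = TOY_UC;
	return 0;
}

static int toy_set_memory_wc(struct toy_range *r, int pat_wc_enabled, int conflict)
{
	if (!pat_wc_enabled)
		return toy_set_memory_uc(r);	/* same fallback as set_memory_wc() */
	if (conflict)
		return -EINVAL;	/* reserve_memtype() refused WC for this range */
	assert(!r->reserved);
	r->reserved = 1;
	r->type = TOY_WC;
	return 0;
}

/* Assumption: set_memory_wb() frees the reservation taken by the uc/wc call. */
static int toy_set_memory_wb(struct toy_range *r)
{
	r->reserved = 0;
	r->type = TOY_WB;
	return 0;
}
```

Calling toy_set_memory_wc() twice without an intervening toy_set_memory_wb() trips the assertion, which is the toy equivalent of leaking a memtype reservation.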
--
* [patch 12/13] x86: PAT Add ioremap_wc() interface
2008-03-19 0:00 [patch 00/13] x86: PAT support updated - v3 venkatesh.pallipadi
` (10 preceding siblings ...)
2008-03-19 0:00 ` [patch 11/13] x86: PAT Add set_memory_wc() interface venkatesh.pallipadi
@ 2008-03-19 0:00 ` venkatesh.pallipadi
2008-03-19 0:00 ` [patch 13/13] x86: PAT Patch to add PAT related debug prints venkatesh.pallipadi
` (2 subsequent siblings)
14 siblings, 0 replies; 22+ messages in thread
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: ioremap_wc.patch --]
[-- Type: text/plain, Size: 4942 bytes --]
Introduce ioremap_wc() for write-combining (WC) remaps. On architectures that
do not define ARCH_HAS_IOREMAP_WC, a generic ioremap_wc is aliased to
ioremap_nocache, so that drivers can use this interface transparently.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/ioremap.c 2008-03-18 03:21:25.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/ioremap.c 2008-03-18 09:20:21.000000000 -0700
@@ -98,6 +98,9 @@
default:
err = _set_memory_uc(vaddr, nrpages);
break;
+ case _PAGE_CACHE_WC:
+ err = _set_memory_wc(vaddr, nrpages);
+ break;
case _PAGE_CACHE_WB:
err = _set_memory_wb(vaddr, nrpages);
break;
@@ -168,8 +171,13 @@
* Do not fallback to certain memory types with certain
* requested type:
* - request is uncached, return cannot be write-back
+ * - request is uncached, return cannot be write-combine
+ * - request is write-combine, return cannot be write-back
*/
if ((prot_val == _PAGE_CACHE_UC &&
+ (new_prot_val == _PAGE_CACHE_WB ||
+ new_prot_val == _PAGE_CACHE_WC)) ||
+ (prot_val == _PAGE_CACHE_WC &&
new_prot_val == _PAGE_CACHE_WB)) {
free_memtype(phys_addr, phys_addr + size);
return NULL;
@@ -182,6 +190,9 @@
default:
prot = PAGE_KERNEL_NOCACHE;
break;
+ case _PAGE_CACHE_WC:
+ prot = PAGE_KERNEL_WC;
+ break;
case _PAGE_CACHE_WB:
prot = PAGE_KERNEL;
break;
@@ -240,6 +251,25 @@
}
EXPORT_SYMBOL(ioremap_nocache);
+/**
+ * ioremap_wc - map memory into CPU space write combined
+ * @offset: bus address of the memory
+ * @size: size of the resource to map
+ *
+ * This version of ioremap ensures that the memory is marked write combining.
+ * Write combining allows faster writes to some hardware devices.
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+{
+ if (pat_wc_enabled)
+ return __ioremap(phys_addr, size, _PAGE_CACHE_WC);
+ else
+ return ioremap_nocache(phys_addr, size);
+}
+EXPORT_SYMBOL(ioremap_wc);
+
void __iomem *ioremap_cache(unsigned long phys_addr, unsigned long size)
{
return __ioremap(phys_addr, size, _PAGE_CACHE_WB);
Index: linux-2.6-x86.git/include/asm-generic/iomap.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-generic/iomap.h 2008-03-18 02:16:03.000000000 -0700
+++ linux-2.6-x86.git/include/asm-generic/iomap.h 2008-03-18 03:54:39.000000000 -0700
@@ -60,6 +60,10 @@
extern void __iomem *ioport_map(unsigned long port, unsigned int nr);
extern void ioport_unmap(void __iomem *);
+#ifndef ARCH_HAS_IOREMAP_WC
+#define ioremap_wc ioremap_nocache
+#endif
+
/* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
struct pci_dev;
extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
Index: linux-2.6-x86.git/include/asm-x86/io.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/io.h 2008-03-18 03:14:09.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/io.h 2008-03-18 03:54:39.000000000 -0700
@@ -1,6 +1,8 @@
#ifndef _ASM_X86_IO_H
#define _ASM_X86_IO_H
+#define ARCH_HAS_IOREMAP_WC
+
#ifdef CONFIG_X86_32
# include "io_32.h"
#else
@@ -9,6 +11,7 @@
extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
unsigned long prot_val);
+extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
extern void *xlate_dev_mem_ptr(unsigned long phys);
extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);
Index: linux-2.6-x86.git/include/asm-x86/pgtable.h
===================================================================
--- linux-2.6-x86.git.orig/include/asm-x86/pgtable.h 2008-03-18 03:33:01.000000000 -0700
+++ linux-2.6-x86.git/include/asm-x86/pgtable.h 2008-03-18 03:54:39.000000000 -0700
@@ -90,6 +90,7 @@
#define __PAGE_KERNEL_RO (__PAGE_KERNEL & ~_PAGE_RW)
#define __PAGE_KERNEL_RX (__PAGE_KERNEL_EXEC & ~_PAGE_RW)
#define __PAGE_KERNEL_EXEC_NOCACHE (__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_WC (__PAGE_KERNEL | _PAGE_CACHE_WC)
#define __PAGE_KERNEL_NOCACHE (__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
#define __PAGE_KERNEL_VSYSCALL (__PAGE_KERNEL_RX | _PAGE_USER)
#define __PAGE_KERNEL_VSYSCALL_NOCACHE (__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
@@ -106,6 +107,7 @@
#define PAGE_KERNEL_RO MAKE_GLOBAL(__PAGE_KERNEL_RO)
#define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC)
#define PAGE_KERNEL_RX MAKE_GLOBAL(__PAGE_KERNEL_RX)
+#define PAGE_KERNEL_WC MAKE_GLOBAL(__PAGE_KERNEL_WC)
#define PAGE_KERNEL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE)
#define PAGE_KERNEL_EXEC_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_EXEC_NOCACHE)
#define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE)
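The fallback rules added to __ioremap() reduce to a small predicate on the requested type and the type reserve_memtype() actually granted. A minimal sketch, with a hypothetical enum standing in for the _PAGE_CACHE_* values:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the _PAGE_CACHE_* values. */
enum toy_cache { TOY_WB, TOY_WC, TOY_UC_MINUS, TOY_UC };

/* Returns true if __ioremap() would accept the granted type for the request. */
static bool fallback_allowed(enum toy_cache req, enum toy_cache got)
{
	assert(req <= TOY_UC && got <= TOY_UC);

	/* Uncached request: result must not be write-back or write-combine. */
	if (req == TOY_UC && (got == TOY_WB || got == TOY_WC))
		return false;
	/* Write-combine request: result must not be write-back. */
	if (req == TOY_WC && got == TOY_WB)
		return false;
	return true;
}
```

Any pair the predicate rejects corresponds to the path where __ioremap() calls free_memtype() and returns NULL; a WB request that is granted UC, for instance, is still accepted.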
--
* [patch 13/13] x86: PAT Patch to add PAT related debug prints
2008-03-19 0:00 [patch 00/13] x86: PAT support updated - v3 venkatesh.pallipadi
` (11 preceding siblings ...)
2008-03-19 0:00 ` [patch 12/13] x86: PAT Add ioremap_wc() interface venkatesh.pallipadi
@ 2008-03-19 0:00 ` venkatesh.pallipadi
2008-03-21 13:24 ` [patch 00/13] x86: PAT support updated - v3 Ingo Molnar
2008-03-21 13:29 ` H. Peter Anvin
14 siblings, 0 replies; 22+ messages in thread
From: venkatesh.pallipadi @ 2008-03-19 0:00 UTC (permalink / raw)
To: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, hpa, akpm, arjan, jesse.barnes
Cc: linux-kernel, Venkatesh Pallipadi, Suresh Siddha
[-- Attachment #1: debug_reserve_free_memtype.patch --]
[-- Type: text/plain, Size: 2920 bytes --]
Add debug prints at critical points in the code. They log enough information
in dmesg to allow an effective first round of analysis of any issues caused
by the PAT patch series.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Index: linux-2.6-x86.git/arch/x86/mm/pat.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/pat.c 2008-03-18 03:40:47.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/pat.c 2008-03-18 03:56:52.000000000 -0700
@@ -289,6 +289,7 @@
struct memtype *saved_ptr;
if (parse->start >= end) {
+ printk("New Entry\n");
list_add(&new_entry->nd, parse->nd.prev);
new_entry = NULL;
break;
@@ -338,6 +339,8 @@
break;
}
+ printk("Overlap at 0x%Lx-0x%Lx\n",
+ saved_ptr->start, saved_ptr->end);
/* No conflict. Go ahead and add this new entry */
list_add(&new_entry->nd, saved_ptr->nd.prev);
new_entry = NULL;
@@ -388,6 +391,8 @@
break;
}
+ printk("Overlap at 0x%Lx-0x%Lx\n",
+ saved_ptr->start, saved_ptr->end);
/* No conflict. Go ahead and add this new entry */
list_add(&new_entry->nd, &saved_ptr->nd);
new_entry = NULL;
@@ -396,6 +401,10 @@
}
if (err) {
+ printk(
+ "reserve_memtype failed 0x%Lx-0x%Lx, track %s, req %s\n",
+ start, end, cattr_name(new_entry->type),
+ cattr_name(req_type));
kfree(new_entry);
spin_unlock(&memtype_lock);
return err;
@@ -404,6 +413,19 @@
if (new_entry) {
/* No conflict. Not yet added to the list. Add to the tail */
list_add_tail(&new_entry->nd, &memtype_list);
+ printk("New Entry\n");
+ }
+
+ if (ret_type) {
+ printk(
+ "reserve_memtype added 0x%Lx-0x%Lx, track %s, req %s, ret %s\n",
+ start, end, cattr_name(actual_type),
+ cattr_name(req_type), cattr_name(*ret_type));
+ } else {
+ printk(
+ "reserve_memtype added 0x%Lx-0x%Lx, track %s, req %s\n",
+ start, end, cattr_name(actual_type),
+ cattr_name(req_type));
}
spin_unlock(&memtype_lock);
@@ -440,6 +462,8 @@
printk(KERN_DEBUG "%s:%d freeing invalid memtype %Lx-%Lx\n",
current->comm, current->pid, start, end);
}
+
+ printk( "free_memtype request 0x%Lx-0x%Lx\n", start, end);
return err;
}
Index: linux-2.6-x86.git/arch/x86/mm/ioremap.c
===================================================================
--- linux-2.6-x86.git.orig/arch/x86/mm/ioremap.c 2008-03-18 03:54:39.000000000 -0700
+++ linux-2.6-x86.git/arch/x86/mm/ioremap.c 2008-03-18 03:56:52.000000000 -0700
@@ -179,6 +179,10 @@
new_prot_val == _PAGE_CACHE_WC)) ||
(prot_val == _PAGE_CACHE_WC &&
new_prot_val == _PAGE_CACHE_WB)) {
+ printk(
+ "ioremap error for 0x%lx-0x%lx, requested 0x%lx, got 0x%lx\n",
+ phys_addr, phys_addr + size,
+ prot_val, new_prot_val);
free_memtype(phys_addr, phys_addr + size);
return NULL;
}
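The "New Entry" and "Overlap at" messages come from reserve_memtype()'s walk over the sorted memtype list. That walk treats each entry as a half-open range [start, end): the insertion test parse->start >= end means a range ending exactly where another begins does not overlap it. A minimal sketch of that overlap test, with a hypothetical struct standing in for struct memtype:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct memtype: a half-open range [start, end). */
struct toy_memtype {
	uint64_t start;
	uint64_t end;
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool ranges_overlap(const struct toy_memtype *a, const struct toy_memtype *b)
{
	assert(a->start < a->end && b->start < b->end);	/* non-empty ranges */
	return a->start < b->end && b->start < a->end;
}
```

Only overlapping entries need their types compared; when the types agree the new entry is simply added next to the overlapping one, which is the "No conflict" path that prints "Overlap at".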
--
* Re: [patch 00/13] x86: PAT support updated - v3
2008-03-19 0:00 [patch 00/13] x86: PAT support updated - v3 venkatesh.pallipadi
` (12 preceding siblings ...)
2008-03-19 0:00 ` [patch 13/13] x86: PAT Patch to add PAT related debug prints venkatesh.pallipadi
@ 2008-03-21 13:24 ` Ingo Molnar
2008-03-21 14:55 ` Ingo Molnar
2008-03-21 19:26 ` Venki Pallipadi
2008-03-21 13:29 ` H. Peter Anvin
14 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2008-03-21 13:24 UTC (permalink / raw)
To: venkatesh.pallipadi
Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, tglx,
hpa, akpm, arjan, jesse.barnes, linux-kernel
* venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> wrote:
> Follow up on earlier PAT patch series here:
> http://lkml.org/lkml/2008/1/10/312
>
> This patch series adds Page Attribute Table (PAT) support on x86.
> There have been a few changes based on comments on the earlier patches and
> on issues that were seen while the earlier patchset was in mm. The
> main changes include:
>
> * Unlike earlier patchset, there are no changes to identity mapping of
> reserved regions.
> * Unlike earlier patches, there are no changes to early ioremap.
> * We look at MTRR setting and PAT request and track the resultant type
> to avoid aliasing.
> * UC_MINUS in PAT to provide backward compatibility to /dev/mem mmap users.
>
> In general, we have tried to make the patches simpler and cleaner.
> The hope is to cause less disruption along the way. The changes/cleanups
> that went into x86/mm (specifically pageattr.c) have helped us along
> the way.
>
> The patchset is against x86 testing from a couple of days back.
thanks Venki, i've queued this up so that we can see how well it goes.
It now looks a lot less dangerous and more compatible than it did before
- but i'm sure there'll be issues nevertheless :-/
> There are two issues that we are leaving out at the moment to make the patch
> simple. We will be addressing them with incremental patches soon:
> * FB/DRM drivers using pgprot_val and changing protection on their own
> without using any proper APIs like ioremap. There are a few such usages and
> each one will be addressed separately.
> * To change attributes from WC to WB in a "perfect way", one has to follow
> a certain sequence, such as making the page non-present.
hm, until this is done correctly i guess we should disallow WC to WB
transitions? A good number of errata apply, i suspect :-/
Ingo
* Re: [patch 00/13] x86: PAT support updated - v3
2008-03-21 13:24 ` [patch 00/13] x86: PAT support updated - v3 Ingo Molnar
@ 2008-03-21 14:55 ` Ingo Molnar
2008-03-21 19:26 ` Venki Pallipadi
1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2008-03-21 14:55 UTC (permalink / raw)
To: venkatesh.pallipadi
Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, tglx,
hpa, akpm, arjan, jesse.barnes, linux-kernel
* Ingo Molnar <mingo@elte.hu> wrote:
> > The patchset is against x86 testing from a couple of days back.
>
> thanks Venki, i've queued this up so that we can see how well it goes.
> It now looks a lot less dangerous and more compatible than it did
> before - but i'm sure there'll be issues nevertheless :-/
no big issues so far, just a simple build fix for the !MTRR case below.
Ingo
----------------->
Subject: x86: PAT fix
From: Ingo Molnar <mingo@elte.hu>
Date: Fri Mar 21 15:42:28 CET 2008
build fix for !CONFIG_MTRR.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
include/asm-x86/mtrr.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
Index: linux-x86.q/include/asm-x86/mtrr.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/mtrr.h
+++ linux-x86.q/include/asm-x86/mtrr.h
@@ -84,10 +84,9 @@ struct mtrr_gentry
#ifdef __KERNEL__
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
-
/* The following functions are for use by other drivers */
# ifdef CONFIG_MTRR
+extern u8 mtrr_type_lookup(u64 addr, u64 end);
extern void mtrr_save_fixed_ranges(void *);
extern void mtrr_save_state(void);
extern int mtrr_add (unsigned long base, unsigned long size,
@@ -101,6 +100,13 @@ extern void mtrr_ap_init(void);
extern void mtrr_bp_init(void);
extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
# else
+static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+{
+ /*
+ * Return no-MTRRs:
+ */
+ return 0xff;
+}
#define mtrr_save_fixed_ranges(arg) do {} while (0)
#define mtrr_save_state() do {} while (0)
static __inline__ int mtrr_add (unsigned long base, unsigned long size,
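The build failure comes from the declaration of mtrr_type_lookup() being visible in every configuration while its definition is only built with CONFIG_MTRR, so !CONFIG_MTRR callers presumably had nothing to link against. The fix uses the usual pattern of a real declaration under the config option and a static inline stub otherwise. A minimal userspace sketch of the same pattern, with a hypothetical TOY_CONFIG_MTRR switch left undefined to model a !CONFIG_MTRR build and a hypothetical caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Leave TOY_CONFIG_MTRR undefined to model a !CONFIG_MTRR build. */
#ifdef TOY_CONFIG_MTRR
extern uint8_t toy_mtrr_type_lookup(uint64_t addr, uint64_t end);
#else
static inline uint8_t toy_mtrr_type_lookup(uint64_t addr, uint64_t end)
{
	(void)addr;
	(void)end;
	/* Same sentinel as the stub in the fix: "no MTRRs". */
	return 0xff;
}
#endif

/* Hypothetical caller: does the MTRR layer report a type for this range? */
static bool toy_have_mtrr_type(uint64_t addr, uint64_t end)
{
	assert(addr < end);
	return toy_mtrr_type_lookup(addr, end) != 0xff;
}
```

Callers compile unchanged in both configurations; they only need to treat 0xff as "no MTRR information".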
* Re: [patch 00/13] x86: PAT support updated - v3
2008-03-21 13:24 ` [patch 00/13] x86: PAT support updated - v3 Ingo Molnar
2008-03-21 14:55 ` Ingo Molnar
@ 2008-03-21 19:26 ` Venki Pallipadi
1 sibling, 0 replies; 22+ messages in thread
From: Venki Pallipadi @ 2008-03-21 19:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: venkatesh.pallipadi, ak, ebiederm, rdreier, torvalds, gregkh,
airlied, davej, tglx, hpa, akpm, arjan, jesse.barnes,
linux-kernel, suresh.b.siddha
On Fri, Mar 21, 2008 at 02:24:17PM +0100, Ingo Molnar wrote:
>
> * venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> wrote:
>
> > Follow up on earlier PAT patch series here:
> > http://lkml.org/lkml/2008/1/10/312
> >
> > This patch series adds Page Attribute Table (PAT) support on x86.
> > There have been a few changes based on comments on the earlier patches and
> > on issues that were seen while the earlier patchset was in mm. The
> > main changes include:
> >
> > * Unlike earlier patchset, there are no changes to identity mapping of
> > reserved regions.
> > * Unlike earlier patches, there are no changes to early ioremap.
> > * We look at MTRR setting and PAT request and track the resultant type
> > to avoid aliasing.
> > * UC_MINUS in PAT to provide backward compatibility to /dev/mem mmap users.
> >
> > In general, we have tried to make the patches simpler and cleaner.
> > The hope is to cause less disruption along the way. The changes/cleanups
> > that went into x86/mm (specifically pageattr.c) have helped us along
> > the way.
> >
> > The patchset is against x86 testing from a couple of days back.
>
> thanks Venki, i've queued this up so that we can see how well it goes.
> It now looks a lot less dangerous and more compatible than it did before
> - but i'm sure there'll be issues nevertheless :-/
I am keeping my fingers crossed. :-)
> > There are two issues that we are leaving out at the moment to make the patch
> > simple. We will be addressing them with incremental patches soon:
> > * FB/DRM drivers using pgprot_val and changing protection on their own
> > without using any proper APIs like ioremap. There are a few such usages and
> > each one will be addressed separately.
> > * To change attributes from WC to WB in a "perfect way", one has to follow
> > a certain sequence, such as making the page non-present.
>
> hm, until this is done correctly i guess we should disallow WC to WB
> transitions? A good number of errata apply, i suspect :-/
>
I think we can support WC in the meantime. Currently we follow the existing TLB
and cache flushing logic, which is an OK solution. I mean, I don't think there
will be nasty hangs etc. because of this. We do keep track of the usage of these
mappings, and there should not be anything other than speculative accesses from
other CPUs at the time we change the attribute. So, we are trying to double
check whether the SDM approach is really needed for our usage model.

BTW, the solution suggested in the SDM is to first make the page "Not Present",
flush the TLBs, then change the attribute and make the page present again. This
will potentially involve some changes in the page fault handler as well.
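The ordering described above can be illustrated with a toy model in which each CPU caches a single translation. It is only a sketch with hypothetical toy_* names; it does not model data caches, speculation, or the real page fault handler. Without the flush, a CPU would keep using its stale translation and therefore the old attribute. While the page is not present, any access faults, which is why the fault handler would need to cope with that window.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_cache { TOY_WB, TOY_WC };

/* Toy PTE: present bit plus a cache attribute. */
struct toy_pte {
	bool present;
	enum toy_cache type;
};

/* Toy CPU holding at most one cached translation (its "TLB"). */
struct toy_cpu {
	bool tlb_valid;
	struct toy_pte tlb;
};

/* Access through one CPU; returns false on a (toy) page fault. */
static bool toy_access(struct toy_cpu *cpu, const struct toy_pte *pte,
		       enum toy_cache *used)
{
	if (!cpu->tlb_valid) {
		if (!pte->present)
			return false;
		cpu->tlb = *pte;	/* page walk fills the TLB */
		cpu->tlb_valid = true;
	}
	*used = cpu->tlb.type;
	return true;
}

/* Not present, flush every CPU's TLB, change the attribute, present again. */
static void toy_change_attr(struct toy_pte *pte, struct toy_cpu *cpus,
			    int ncpus, enum toy_cache new_type)
{
	int i;

	assert(ncpus > 0);
	pte->present = false;
	for (i = 0; i < ncpus; i++)
		cpus[i].tlb_valid = false;	/* TLB flush on every CPU */
	pte->type = new_type;
	pte->present = true;
}
```

Skipping the flush loop in toy_change_attr() would let a CPU that had already cached the WC translation keep using WC after the PTE says WB, which is the aliasing the SDM sequence is meant to prevent.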
Thanks,
Venki
* Re: [patch 00/13] x86: PAT support updated - v3
2008-03-19 0:00 [patch 00/13] x86: PAT support updated - v3 venkatesh.pallipadi
` (13 preceding siblings ...)
2008-03-21 13:24 ` [patch 00/13] x86: PAT support updated - v3 Ingo Molnar
@ 2008-03-21 13:29 ` H. Peter Anvin
2008-03-21 19:19 ` Venki Pallipadi
14 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2008-03-21 13:29 UTC (permalink / raw)
To: venkatesh.pallipadi
Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, akpm, arjan, jesse.barnes, linux-kernel
venkatesh.pallipadi@intel.com wrote:
> * UC_MINUS in PAT to provide backward compatibility to /dev/mem mmap users.
I have to say I think this looks like a good patchset. However, I'd
like a bit more clarification with regard to the above point.
-hpa
* Re: [patch 00/13] x86: PAT support updated - v3
2008-03-21 13:29 ` H. Peter Anvin
@ 2008-03-21 19:19 ` Venki Pallipadi
2008-03-21 19:59 ` H. Peter Anvin
0 siblings, 1 reply; 22+ messages in thread
From: Venki Pallipadi @ 2008-03-21 19:19 UTC (permalink / raw)
To: H. Peter Anvin
Cc: venkatesh.pallipadi, ak, ebiederm, rdreier, torvalds, gregkh,
airlied, davej, mingo, tglx, akpm, arjan, jesse.barnes,
linux-kernel, suresh.b.siddha
On Fri, Mar 21, 2008 at 06:29:48AM -0700, H. Peter Anvin wrote:
> venkatesh.pallipadi@intel.com wrote:
> >* UC_MINUS in PAT to provide backward compatibility to /dev/mem mmap users.
>
> I have to say I think this looks like a good patchset. However, I'd
> like a bit more clarification with regard to the above point.
>
X seems to do the following (in this order):
- mmap the range through /dev/mem
- set the MTRR for the range to WC

I see this happening on one of my test systems with a relatively new xorg.

In this case, when mmap does the reserve for this range, if we give it a UC
mapping, then we effectively negate the MTRR WC setting, since the range is
mapped UC. To accommodate this special use case, we give /dev/mem mmap (only
when there are no other existing mappings) a UC_MINUS attribute. With that,
if and when X sets the MTRR, the range will become WC, and until that time it
will be UC. We ensure that all page table mappings use UC_MINUS for that range.

Long term, we want X to switch to /proc or /sys interfaces. But we can also
provide backward compatibility for existing X usage like the above.
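Why UC_MINUS gives this behavior follows from how the processor combines the MTRR type of a range with the PAT type of the page. A minimal sketch, limited to the types discussed in this thread and based on the effective memory type table in the Intel SDM (worth rechecking there before relying on it): with PAT UC_MINUS the result is WC once the MTRR says WC and UC otherwise, whereas strict PAT UC stays UC and overrides the MTRR. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for the memory types discussed in the thread. */
enum toy_type { T_UC, T_UC_MINUS, T_WC, T_WB };

/* Effective type for a PAT type combined with an MTRR type (UC, WC or WB). */
static enum toy_type effective_type(enum toy_type pat, enum toy_type mtrr)
{
	assert(mtrr != T_UC_MINUS);	/* UC- exists only as a PAT type */

	switch (pat) {
	case T_UC:
		return T_UC;	/* strict UC always wins */
	case T_UC_MINUS:
		return mtrr == T_WC ? T_WC : T_UC;	/* MTRR WC can upgrade UC- */
	case T_WC:
		return T_WC;
	case T_WB:
	default:
		return mtrr;	/* PAT WB defers to the MTRR type */
	}
}
```

So a /dev/mem mapping made with UC_MINUS before X programs the MTRR behaves as UC, and becomes WC as soon as the MTRR for the range is set to WC, without touching the page tables.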
Thanks,
Venki
* Re: [patch 00/13] x86: PAT support updated - v3
2008-03-21 19:19 ` Venki Pallipadi
@ 2008-03-21 19:59 ` H. Peter Anvin
0 siblings, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2008-03-21 19:59 UTC (permalink / raw)
To: Venki Pallipadi
Cc: ak, ebiederm, rdreier, torvalds, gregkh, airlied, davej, mingo,
tglx, akpm, arjan, jesse.barnes, linux-kernel, suresh.b.siddha
Venki Pallipadi wrote:
> On Fri, Mar 21, 2008 at 06:29:48AM -0700, H. Peter Anvin wrote:
>> venkatesh.pallipadi@intel.com wrote:
>>> * UC_MINUS in PAT to provide backward compatibility to /dev/mem mmap users.
>> I have to say I think this looks like a good patchset. However, I'd
>> like a bit more clarification with regard to the above point.
>>
>
> X seems to do the following (in this order):
> - mmap the range through /dev/mem
> - set the MTRR for the range to WC
>
> I see this happening on one of my test systems with a relatively new xorg.
>
> In this case, when mmap does the reserve for this range, if we give it a UC
> mapping, then we effectively negate the MTRR WC setting, since the range is
> mapped UC. To accommodate this special use case, we give /dev/mem mmap (only
> when there are no other existing mappings) a UC_MINUS attribute. With that,
> if and when X sets the MTRR, the range will become WC, and until that time it
> will be UC. We ensure that all page table mappings use UC_MINUS for that range.
>
> Long term, we want X to switch to /proc or /sys interfaces. But we can also
> provide backward compatibility for existing X usage like the above.
>
Makes total sense. Eventually I think we want to do /proc/mtrr
emulation, but for now, this is probably the best option.
-hpa