* [git pull] x86 updates for v2.6.28, phase #2 - PAT updates
@ 2008-10-09 23:49 Ingo Molnar
2008-10-10 16:27 ` Linus Torvalds
0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2008-10-09 23:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin
Linus,
Please pull the latest x86-v28-for-linus-phase2 git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-v28-for-linus-phase2
these are the x86/pat optimizations and updates - with impact on AGP
drivers:
out-of-topic modifications in x86-v28-for-linus-phase2:
-------------------------------------------------------
drivers/char/agp/agp.h # bd07928: agp: add agp_generic_destroy_page
# 37acee1: agp: generic_alloc_pages()
# 9326d61: Revert "reduce tlb/cache flush ti
# 466ae83: reduce tlb/cache flush times of a
drivers/char/agp/alpha-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/amd-k7-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/amd64-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/ati-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/efficeon-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/generic.c # bd07928: agp: add agp_generic_destroy_page
# 37acee1: agp: generic_alloc_pages()
# 9326d61: Revert "reduce tlb/cache flush ti
# 466ae83: reduce tlb/cache flush times of a
drivers/char/agp/hp-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/i460-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/intel-agp.c # bd07928: agp: add agp_generic_destroy_page
# 37acee1: agp: generic_alloc_pages()
drivers/char/agp/nvidia-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/parisc-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/sis-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/sworks-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/uninorth-agp.c # c09ff7e: linux-next: fix x86 tree build fa
# 5f310b6: agp: enable optimized agp_alloc_p
drivers/char/agp/via-agp.c # 5f310b6: agp: enable optimized agp_alloc_p
mm/highmem.c # 5843d9a: x86, pat: avoid highmem cache att
Thanks,
Ingo
------------------>
Andreas Herrmann (1):
x86: compile pat debugfs interface only if CONFIG_X86_PAT is set
Ingo Molnar (12):
Revert "reduce tlb/cache flush times of agpgart memory allocation"
Revert "introduce two APIs for page attribute"
Revert "x86: handle error returns in set_memory_*()"
Revert "x86: track memtype for RAM in page struct"
Revert "x86, cpa: global flush tlb after splitting large page and before doing cpa"
Revert "x86, cpa: remove cpa pool code"
Revert "x86, cpa: fix taking the pgd_lock with interrupts off"
Revert "x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC"
Revert "x86, cpa: make the kernel physical mapping initialization a two pass sequence"
Revert "x86, cpa: remove USER permission from the very early identity mapping attribute"
Revert "x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot"
x86, pat: cleanups
Jeremy Fitzhardinge (1):
x86: make sure the CPA test code's use of _PAGE_UNUSED1 is obvious
Nick Piggin (1):
x86, pat: avoid highmem cache attribute aliasing
Rene Herman (3):
x86: {reverve,free}_memtype() take a physical address
x86: have set_memory_array_{uc,wb} coalesce memtypes.
agp: enable optimized agp_alloc_pages methods
Shaohua Li (6):
introduce two APIs for page attribute
reduce tlb/cache flush times of agpgart memory allocation
x86, pageattr: introduce APIs to change pageattr of a page array
agp: generic_alloc_pages()
agp: add agp_generic_destroy_pages()
x86: fix pageattr-test
Stephen Rothwell (1):
linux-next: fix x86 tree build failure
Suresh Siddha (17):
x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot
x86, cpa: remove USER permission from the very early identity mapping attribute
x86, cpa: make the kernel physical mapping initialization a two pass sequence
x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
x86, cpa: fix taking the pgd_lock with interrupts off
x86, cpa: remove cpa pool code
x86, cpa: global flush tlb after splitting large page and before doing cpa
x86: track memtype for RAM in page struct
x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot
x86, cpa: remove USER permission from the very early identity mapping attribute
x86, cpa: make the kernel physical mapping initialization a two pass sequence
x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
x86, cpa: no need to check alias for __set_pages_p/__set_pages_np
x86, cpa: remove cpa pool code
x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa
x86: track memtype for RAM in page struct
x86: fix pagetable init 64-bit breakage
Venkatesh Pallipadi (1):
x86: handle error returns in set_memory_*()
Venki Pallipadi (1):
x86: have set_memory_array_{uc,wb} coalesce memtypes, fix
arch/x86/kernel/head_32.S | 34 ++--
arch/x86/kernel/head_64.S | 4 +-
arch/x86/mm/init_32.c | 84 +++++++-
arch/x86/mm/init_64.c | 127 +++++++++--
arch/x86/mm/ioremap.c | 19 ++
arch/x86/mm/pageattr-test.c | 9 +-
arch/x86/mm/pageattr.c | 459 +++++++++++++++++++++------------------
arch/x86/mm/pat.c | 132 +++++++++---
drivers/char/agp/agp.h | 5 +
drivers/char/agp/alpha-agp.c | 2 +
drivers/char/agp/amd-k7-agp.c | 2 +
drivers/char/agp/amd64-agp.c | 2 +
drivers/char/agp/ati-agp.c | 2 +
drivers/char/agp/efficeon-agp.c | 2 +
drivers/char/agp/generic.c | 97 ++++++++-
drivers/char/agp/hp-agp.c | 2 +
drivers/char/agp/i460-agp.c | 2 +
drivers/char/agp/intel-agp.c | 28 +++
drivers/char/agp/nvidia-agp.c | 2 +
drivers/char/agp/parisc-agp.c | 2 +
drivers/char/agp/sis-agp.c | 2 +
drivers/char/agp/sworks-agp.c | 2 +
drivers/char/agp/uninorth-agp.c | 4 +
drivers/char/agp/via-agp.c | 4 +
include/asm-x86/cacheflush.h | 7 +-
include/asm-x86/page.h | 1 +
include/asm-x86/pgtable.h | 13 +
mm/highmem.c | 5 +-
28 files changed, 752 insertions(+), 302 deletions(-)
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index a7010c3..e835b4e 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -172,10 +172,6 @@ num_subarch_entries = (. - subarch_entries) / 4
*
* Note that the stack is not yet set up!
*/
-#define PTE_ATTR 0x007 /* PRESENT+RW+USER */
-#define PDE_ATTR 0x067 /* PRESENT+RW+USER+DIRTY+ACCESSED */
-#define PGD_ATTR 0x001 /* PRESENT (no other attributes) */
-
default_entry:
#ifdef CONFIG_X86_PAE
@@ -196,9 +192,9 @@ default_entry:
movl $pa(pg0), %edi
movl %edi, pa(init_pg_tables_start)
movl $pa(swapper_pg_pmd), %edx
- movl $PTE_ATTR, %eax
+ movl $PTE_IDENT_ATTR, %eax
10:
- leal PDE_ATTR(%edi),%ecx /* Create PMD entry */
+ leal PDE_IDENT_ATTR(%edi),%ecx /* Create PMD entry */
movl %ecx,(%edx) /* Store PMD entry */
/* Upper half already zero */
addl $8,%edx
@@ -215,7 +211,7 @@ default_entry:
* End condition: we must map up to and including INIT_MAP_BEYOND_END
* bytes beyond the end of our own page tables.
*/
- leal (INIT_MAP_BEYOND_END+PTE_ATTR)(%edi),%ebp
+ leal (INIT_MAP_BEYOND_END+PTE_IDENT_ATTR)(%edi),%ebp
cmpl %ebp,%eax
jb 10b
1:
@@ -224,7 +220,7 @@ default_entry:
movl %eax, pa(max_pfn_mapped)
/* Do early initialization of the fixmap area */
- movl $pa(swapper_pg_fixmap)+PDE_ATTR,%eax
+ movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8)
#else /* Not PAE */
@@ -233,9 +229,9 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
movl $pa(pg0), %edi
movl %edi, pa(init_pg_tables_start)
movl $pa(swapper_pg_dir), %edx
- movl $PTE_ATTR, %eax
+ movl $PTE_IDENT_ATTR, %eax
10:
- leal PDE_ATTR(%edi),%ecx /* Create PDE entry */
+ leal PDE_IDENT_ATTR(%edi),%ecx /* Create PDE entry */
movl %ecx,(%edx) /* Store identity PDE entry */
movl %ecx,page_pde_offset(%edx) /* Store kernel PDE entry */
addl $4,%edx
@@ -249,7 +245,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
* bytes beyond the end of our own page tables; the +0x007 is
* the attribute bits
*/
- leal (INIT_MAP_BEYOND_END+PTE_ATTR)(%edi),%ebp
+ leal (INIT_MAP_BEYOND_END+PTE_IDENT_ATTR)(%edi),%ebp
cmpl %ebp,%eax
jb 10b
movl %edi,pa(init_pg_tables_end)
@@ -257,7 +253,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
movl %eax, pa(max_pfn_mapped)
/* Do early initialization of the fixmap area */
- movl $pa(swapper_pg_fixmap)+PDE_ATTR,%eax
+ movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax
movl %eax,pa(swapper_pg_dir+0xffc)
#endif
jmp 3f
@@ -634,19 +630,19 @@ ENTRY(empty_zero_page)
/* Page-aligned for the benefit of paravirt? */
.align PAGE_SIZE_asm
ENTRY(swapper_pg_dir)
- .long pa(swapper_pg_pmd+PGD_ATTR),0 /* low identity map */
+ .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */
# if KPMDS == 3
- .long pa(swapper_pg_pmd+PGD_ATTR),0
- .long pa(swapper_pg_pmd+PGD_ATTR+0x1000),0
- .long pa(swapper_pg_pmd+PGD_ATTR+0x2000),0
+ .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
+ .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
+ .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0
# elif KPMDS == 2
.long 0,0
- .long pa(swapper_pg_pmd+PGD_ATTR),0
- .long pa(swapper_pg_pmd+PGD_ATTR+0x1000),0
+ .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
+ .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0
# elif KPMDS == 1
.long 0,0
.long 0,0
- .long pa(swapper_pg_pmd+PGD_ATTR),0
+ .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0
# else
# error "Kernel PMDs should be 1, 2 or 3"
# endif
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index db3280a..26cfdc1 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -110,7 +110,7 @@ startup_64:
movq %rdi, %rax
shrq $PMD_SHIFT, %rax
andq $(PTRS_PER_PMD - 1), %rax
- leaq __PAGE_KERNEL_LARGE_EXEC(%rdi), %rdx
+ leaq __PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
leaq level2_spare_pgt(%rip), %rbx
movq %rdx, 0(%rbx, %rax, 8)
ident_complete:
@@ -374,7 +374,7 @@ NEXT_PAGE(level2_ident_pgt)
/* Since I easily can, map the first 1G.
* Don't set NX because code runs from these pages.
*/
- PMDS(0, __PAGE_KERNEL_LARGE_EXEC, PTRS_PER_PMD)
+ PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
NEXT_PAGE(level2_kernel_pgt)
/*
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 6b9a935..c3789bb 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -195,11 +195,30 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base,
pgd_t *pgd;
pmd_t *pmd;
pte_t *pte;
- unsigned pages_2m = 0, pages_4k = 0;
+ unsigned pages_2m, pages_4k;
+ int mapping_iter;
+
+ /*
+ * First iteration will setup identity mapping using large/small pages
+ * based on use_pse, with other attributes same as set by
+ * the early code in head_32.S
+ *
+ * Second iteration will setup the appropriate attributes (NX, GLOBAL..)
+ * as desired for the kernel identity mapping.
+ *
+ * This two pass mechanism conforms to the TLB app note which says:
+ *
+ * "Software should not write to a paging-structure entry in a way
+ * that would change, for any linear address, both the page size
+ * and either the page frame or attributes."
+ */
+ mapping_iter = 1;
if (!cpu_has_pse)
use_pse = 0;
+repeat:
+ pages_2m = pages_4k = 0;
pfn = start_pfn;
pgd_idx = pgd_index((pfn<<PAGE_SHIFT) + PAGE_OFFSET);
pgd = pgd_base + pgd_idx;
@@ -225,6 +244,13 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base,
if (use_pse) {
unsigned int addr2;
pgprot_t prot = PAGE_KERNEL_LARGE;
+ /*
+ * first pass will use the same initial
+ * identity mapping attribute + _PAGE_PSE.
+ */
+ pgprot_t init_prot =
+ __pgprot(PTE_IDENT_ATTR |
+ _PAGE_PSE);
addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE +
PAGE_OFFSET + PAGE_SIZE-1;
@@ -234,7 +260,10 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base,
prot = PAGE_KERNEL_LARGE_EXEC;
pages_2m++;
- set_pmd(pmd, pfn_pmd(pfn, prot));
+ if (mapping_iter == 1)
+ set_pmd(pmd, pfn_pmd(pfn, init_prot));
+ else
+ set_pmd(pmd, pfn_pmd(pfn, prot));
pfn += PTRS_PER_PTE;
continue;
@@ -246,17 +275,43 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base,
for (; pte_ofs < PTRS_PER_PTE && pfn < end_pfn;
pte++, pfn++, pte_ofs++, addr += PAGE_SIZE) {
pgprot_t prot = PAGE_KERNEL;
+ /*
+ * first pass will use the same initial
+ * identity mapping attribute.
+ */
+ pgprot_t init_prot = __pgprot(PTE_IDENT_ATTR);
if (is_kernel_text(addr))
prot = PAGE_KERNEL_EXEC;
pages_4k++;
- set_pte(pte, pfn_pte(pfn, prot));
+ if (mapping_iter == 1)
+ set_pte(pte, pfn_pte(pfn, init_prot));
+ else
+ set_pte(pte, pfn_pte(pfn, prot));
}
}
}
- update_page_count(PG_LEVEL_2M, pages_2m);
- update_page_count(PG_LEVEL_4K, pages_4k);
+ if (mapping_iter == 1) {
+ /*
+ * update direct mapping page count only in the first
+ * iteration.
+ */
+ update_page_count(PG_LEVEL_2M, pages_2m);
+ update_page_count(PG_LEVEL_4K, pages_4k);
+
+ /*
+ * local global flush tlb, which will flush the previous
+ * mappings present in both small and large page TLB's.
+ */
+ __flush_tlb_all();
+
+ /*
+ * Second iteration will set the actual desired PTE attributes.
+ */
+ mapping_iter = 2;
+ goto repeat;
+ }
}
/*
@@ -719,7 +774,7 @@ void __init setup_bootmem_allocator(void)
after_init_bootmem = 1;
}
-static void __init find_early_table_space(unsigned long end)
+static void __init find_early_table_space(unsigned long end, int use_pse)
{
unsigned long puds, pmds, ptes, tables, start;
@@ -729,7 +784,7 @@ static void __init find_early_table_space(unsigned long end)
pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
tables += PAGE_ALIGN(pmds * sizeof(pmd_t));
- if (cpu_has_pse) {
+ if (use_pse) {
unsigned long extra;
extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
@@ -769,12 +824,22 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
pgd_t *pgd_base = swapper_pg_dir;
unsigned long start_pfn, end_pfn;
unsigned long big_page_start;
+#ifdef CONFIG_DEBUG_PAGEALLOC
+ /*
+ * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
+ * This will simplify cpa(), which otherwise needs to support splitting
+ * large pages into small in interrupt context, etc.
+ */
+ int use_pse = 0;
+#else
+ int use_pse = cpu_has_pse;
+#endif
/*
* Find space for the kernel direct mapping tables.
*/
if (!after_init_bootmem)
- find_early_table_space(end);
+ find_early_table_space(end, use_pse);
#ifdef CONFIG_X86_PAE
set_nx();
@@ -820,7 +885,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
end_pfn = (end>>PMD_SHIFT) << (PMD_SHIFT - PAGE_SHIFT);
if (start_pfn < end_pfn)
kernel_physical_mapping_init(pgd_base, start_pfn, end_pfn,
- cpu_has_pse);
+ use_pse);
/* tail is not big page alignment ? */
start_pfn = end_pfn;
@@ -983,7 +1048,6 @@ void __init mem_init(void)
if (boot_cpu_data.wp_works_ok < 0)
test_wp_bit();
- cpa_init();
save_pg_dir();
zap_low_mappings();
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 770536e..0152221 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -270,6 +270,8 @@ static __ref void unmap_low_page(void *adr)
early_iounmap(adr, PAGE_SIZE);
}
+static int physical_mapping_iter;
+
static unsigned long __meminit
phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end)
{
@@ -290,16 +292,19 @@ phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end)
}
if (pte_val(*pte))
- continue;
+ goto repeat_set_pte;
if (0)
printk(" pte=%p addr=%lx pte=%016lx\n",
pte, addr, pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL).pte);
+ pages++;
+repeat_set_pte:
set_pte(pte, pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL));
last_map_addr = (addr & PAGE_MASK) + PAGE_SIZE;
- pages++;
}
- update_page_count(PG_LEVEL_4K, pages);
+
+ if (physical_mapping_iter == 1)
+ update_page_count(PG_LEVEL_4K, pages);
return last_map_addr;
}
@@ -318,7 +323,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
{
unsigned long pages = 0;
unsigned long last_map_addr = end;
- unsigned long start = address;
int i = pmd_index(address);
@@ -341,15 +345,14 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
last_map_addr = phys_pte_update(pmd, address,
end);
spin_unlock(&init_mm.page_table_lock);
+ continue;
}
- /* Count entries we're using from level2_ident_pgt */
- if (start == 0)
- pages++;
- continue;
+ goto repeat_set_pte;
}
if (page_size_mask & (1<<PG_LEVEL_2M)) {
pages++;
+repeat_set_pte:
spin_lock(&init_mm.page_table_lock);
set_pte((pte_t *)pmd,
pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
@@ -366,7 +369,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
spin_unlock(&init_mm.page_table_lock);
}
- update_page_count(PG_LEVEL_2M, pages);
+ if (physical_mapping_iter == 1)
+ update_page_count(PG_LEVEL_2M, pages);
return last_map_addr;
}
@@ -405,14 +409,18 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
}
if (pud_val(*pud)) {
- if (!pud_large(*pud))
+ if (!pud_large(*pud)) {
last_map_addr = phys_pmd_update(pud, addr, end,
page_size_mask);
- continue;
+ continue;
+ }
+
+ goto repeat_set_pte;
}
if (page_size_mask & (1<<PG_LEVEL_1G)) {
pages++;
+repeat_set_pte:
spin_lock(&init_mm.page_table_lock);
set_pte((pte_t *)pud,
pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
@@ -430,7 +438,9 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
spin_unlock(&init_mm.page_table_lock);
}
__flush_tlb_all();
- update_page_count(PG_LEVEL_1G, pages);
+
+ if (physical_mapping_iter == 1)
+ update_page_count(PG_LEVEL_1G, pages);
return last_map_addr;
}
@@ -446,13 +456,14 @@ phys_pud_update(pgd_t *pgd, unsigned long addr, unsigned long end,
return phys_pud_init(pud, addr, end, page_size_mask);
}
-static void __init find_early_table_space(unsigned long end)
+static void __init find_early_table_space(unsigned long end, int use_pse,
+ int use_gbpages)
{
unsigned long puds, pmds, ptes, tables, start;
puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
- if (direct_gbpages) {
+ if (use_gbpages) {
unsigned long extra;
extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
@@ -460,7 +471,7 @@ static void __init find_early_table_space(unsigned long end)
pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
- if (cpu_has_pse) {
+ if (use_pse) {
unsigned long extra;
extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -494,15 +505,54 @@ static void __init init_gbpages(void)
direct_gbpages = 0;
}
+static int is_kernel(unsigned long pfn)
+{
+ unsigned long pg_addresss = pfn << PAGE_SHIFT;
+
+ if (pg_addresss >= (unsigned long) __pa(_text) &&
+ pg_addresss < (unsigned long) __pa(_end))
+ return 1;
+
+ return 0;
+}
+
static unsigned long __init kernel_physical_mapping_init(unsigned long start,
unsigned long end,
unsigned long page_size_mask)
{
- unsigned long next, last_map_addr = end;
+ unsigned long next, last_map_addr;
+ u64 cached_supported_pte_mask = __supported_pte_mask;
+ unsigned long cache_start = start;
+ unsigned long cache_end = end;
+
+ /*
+ * First iteration will setup identity mapping using large/small pages
+ * based on page_size_mask, with other attributes same as set by
+ * the early code in head_64.S
+ *
+ * Second iteration will setup the appropriate attributes
+ * as desired for the kernel identity mapping.
+ *
+ * This two pass mechanism conforms to the TLB app note which says:
+ *
+ * "Software should not write to a paging-structure entry in a way
+ * that would change, for any linear address, both the page size
+ * and either the page frame or attributes."
+ *
+ * For now, only difference between very early PTE attributes used in
+ * head_64.S and here is _PAGE_NX.
+ */
+ BUILD_BUG_ON((__PAGE_KERNEL_LARGE & ~__PAGE_KERNEL_IDENT_LARGE_EXEC)
+ != _PAGE_NX);
+ __supported_pte_mask &= ~(_PAGE_NX);
+ physical_mapping_iter = 1;
+
+repeat:
+ last_map_addr = cache_end;
- start = (unsigned long)__va(start);
- end = (unsigned long)__va(end);
+ start = (unsigned long)__va(cache_start);
+ end = (unsigned long)__va(cache_end);
for (; start < end; start = next) {
pgd_t *pgd = pgd_offset_k(start);
@@ -514,11 +564,21 @@ static unsigned long __init kernel_physical_mapping_init(unsigned long start,
next = end;
if (pgd_val(*pgd)) {
+ /*
+ * Static identity mappings will be overwritten
+ * with run-time mappings. For example, this allows
+ * the static 0-1GB identity mapping to be mapped
+ * non-executable with this.
+ */
+ if (is_kernel(pte_pfn(*((pte_t *) pgd))))
+ goto realloc;
+
last_map_addr = phys_pud_update(pgd, __pa(start),
__pa(end), page_size_mask);
continue;
}
+realloc:
pud = alloc_low_page(&pud_phys);
last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
page_size_mask);
@@ -528,6 +588,16 @@ static unsigned long __init kernel_physical_mapping_init(unsigned long start,
pgd_populate(&init_mm, pgd, __va(pud_phys));
spin_unlock(&init_mm.page_table_lock);
}
+ __flush_tlb_all();
+
+ if (physical_mapping_iter == 1) {
+ physical_mapping_iter = 2;
+ /*
+ * Second iteration will set the actual desired PTE attributes.
+ */
+ __supported_pte_mask = cached_supported_pte_mask;
+ goto repeat;
+ }
return last_map_addr;
}
@@ -571,6 +641,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
struct map_range mr[NR_RANGE_MR];
int nr_range, i;
+ int use_pse, use_gbpages;
printk(KERN_INFO "init_memory_mapping\n");
@@ -584,9 +655,21 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
if (!after_bootmem)
init_gbpages();
- if (direct_gbpages)
+#ifdef CONFIG_DEBUG_PAGEALLOC
+ /*
+ * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages.
+ * This will simplify cpa(), which otherwise needs to support splitting
+ * large pages into small in interrupt context, etc.
+ */
+ use_pse = use_gbpages = 0;
+#else
+ use_pse = cpu_has_pse;
+ use_gbpages = direct_gbpages;
+#endif
+
+ if (use_gbpages)
page_size_mask |= 1 << PG_LEVEL_1G;
- if (cpu_has_pse)
+ if (use_pse)
page_size_mask |= 1 << PG_LEVEL_2M;
memset(mr, 0, sizeof(mr));
@@ -647,7 +730,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
(mr[i].page_size_mask & (1<<PG_LEVEL_2M))?"2M":"4k"));
if (!after_bootmem)
- find_early_table_space(end);
+ find_early_table_space(end, use_pse, use_gbpages);
for (i = 0; i < nr_range; i++)
last_map_addr = kernel_physical_mapping_init(
@@ -806,8 +889,6 @@ void __init mem_init(void)
reservedpages << (PAGE_SHIFT-10),
datasize >> 10,
initsize >> 10);
-
- cpa_init();
}
void free_init_pages(char *what, unsigned long begin, unsigned long end)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index cac6da5..6ab3196 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -83,6 +83,25 @@ int page_is_ram(unsigned long pagenr)
return 0;
}
+int pagerange_is_ram(unsigned long start, unsigned long end)
+{
+ int ram_page = 0, not_rampage = 0;
+ unsigned long page_nr;
+
+ for (page_nr = (start >> PAGE_SHIFT); page_nr < (end >> PAGE_SHIFT);
+ ++page_nr) {
+ if (page_is_ram(page_nr))
+ ram_page = 1;
+ else
+ not_rampage = 1;
+
+ if (ram_page == not_rampage)
+ return -1;
+ }
+
+ return ram_page;
+}
+
/*
* Fix up the linear direct mapping of the kernel to avoid cache attribute
* conflicts.
diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c
index d4aa503..e1d1069 100644
--- a/arch/x86/mm/pageattr-test.c
+++ b/arch/x86/mm/pageattr-test.c
@@ -32,7 +32,7 @@ enum {
GPS = (1<<30)
};
-#define PAGE_TESTBIT __pgprot(_PAGE_UNUSED1)
+#define PAGE_CPA_TEST __pgprot(_PAGE_CPA_TEST)
static int pte_testbit(pte_t pte)
{
@@ -118,6 +118,7 @@ static int pageattr_test(void)
unsigned int level;
int i, k;
int err;
+ unsigned long test_addr;
if (print)
printk(KERN_INFO "CPA self-test:\n");
@@ -172,7 +173,8 @@ static int pageattr_test(void)
continue;
}
- err = change_page_attr_set(addr[i], len[i], PAGE_TESTBIT);
+ test_addr = addr[i];
+ err = change_page_attr_set(&test_addr, len[i], PAGE_CPA_TEST, 0);
if (err < 0) {
printk(KERN_ERR "CPA %d failed %d\n", i, err);
failed++;
@@ -204,7 +206,8 @@ static int pageattr_test(void)
failed++;
continue;
}
- err = change_page_attr_clear(addr[i], len[i], PAGE_TESTBIT);
+ test_addr = addr[i];
+ err = change_page_attr_clear(&test_addr, len[i], PAGE_CPA_TEST, 0);
if (err < 0) {
printk(KERN_ERR "CPA reverting failed: %d\n", err);
failed++;
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 898fad6..a9ec89c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -25,15 +25,27 @@
* The current flushing context - we pass it instead of 5 arguments:
*/
struct cpa_data {
- unsigned long vaddr;
+ unsigned long *vaddr;
pgprot_t mask_set;
pgprot_t mask_clr;
int numpages;
- int flushtlb;
+ int flags;
unsigned long pfn;
unsigned force_split : 1;
+ int curpage;
};
+/*
+ * Serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity mappings)
+ * using cpa_lock. So that we don't allow any other cpu, with stale large tlb
+ * entries change the page attribute in parallel to some other cpu
+ * splitting a large page entry along with changing the attribute.
+ */
+static DEFINE_SPINLOCK(cpa_lock);
+
+#define CPA_FLUSHTLB 1
+#define CPA_ARRAY 2
+
#ifdef CONFIG_PROC_FS
static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -190,6 +202,41 @@ static void cpa_flush_range(unsigned long start, int numpages, int cache)
}
}
+static void cpa_flush_array(unsigned long *start, int numpages, int cache)
+{
+ unsigned int i, level;
+ unsigned long *addr;
+
+ BUG_ON(irqs_disabled());
+
+ on_each_cpu(__cpa_flush_range, NULL, 1);
+
+ if (!cache)
+ return;
+
+ /* 4M threshold */
+ if (numpages >= 1024) {
+ if (boot_cpu_data.x86_model >= 4)
+ wbinvd();
+ return;
+ }
+ /*
+ * We only need to flush on one CPU,
+ * clflush is a MESI-coherent instruction that
+ * will cause all other CPUs to flush the same
+ * cachelines:
+ */
+ for (i = 0, addr = start; i < numpages; i++, addr++) {
+ pte_t *pte = lookup_address(*addr, &level);
+
+ /*
+ * Only flush present addresses:
+ */
+ if (pte && (pte_val(*pte) & _PAGE_PRESENT))
+ clflush_cache_range((void *) *addr, PAGE_SIZE);
+ }
+}
+
/*
* Certain areas of memory on x86 require very specific protection flags,
* for example the BIOS area or kernel text. Callers don't always get this
@@ -398,7 +445,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
*/
new_pte = pfn_pte(pte_pfn(old_pte), canon_pgprot(new_prot));
__set_pmd_pte(kpte, address, new_pte);
- cpa->flushtlb = 1;
+ cpa->flags |= CPA_FLUSHTLB;
do_split = 0;
}
@@ -408,84 +455,6 @@ out_unlock:
return do_split;
}
-static LIST_HEAD(page_pool);
-static unsigned long pool_size, pool_pages, pool_low;
-static unsigned long pool_used, pool_failed;
-
-static void cpa_fill_pool(struct page **ret)
-{
- gfp_t gfp = GFP_KERNEL;
- unsigned long flags;
- struct page *p;
-
- /*
- * Avoid recursion (on debug-pagealloc) and also signal
- * our priority to get to these pagetables:
- */
- if (current->flags & PF_MEMALLOC)
- return;
- current->flags |= PF_MEMALLOC;
-
- /*
- * Allocate atomically from atomic contexts:
- */
- if (in_atomic() || irqs_disabled() || debug_pagealloc)
- gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
-
- while (pool_pages < pool_size || (ret && !*ret)) {
- p = alloc_pages(gfp, 0);
- if (!p) {
- pool_failed++;
- break;
- }
- /*
- * If the call site needs a page right now, provide it:
- */
- if (ret && !*ret) {
- *ret = p;
- continue;
- }
- spin_lock_irqsave(&pgd_lock, flags);
- list_add(&p->lru, &page_pool);
- pool_pages++;
- spin_unlock_irqrestore(&pgd_lock, flags);
- }
-
- current->flags &= ~PF_MEMALLOC;
-}
-
-#define SHIFT_MB (20 - PAGE_SHIFT)
-#define ROUND_MB_GB ((1 << 10) - 1)
-#define SHIFT_MB_GB 10
-#define POOL_PAGES_PER_GB 16
-
-void __init cpa_init(void)
-{
- struct sysinfo si;
- unsigned long gb;
-
- si_meminfo(&si);
- /*
- * Calculate the number of pool pages:
- *
- * Convert totalram (nr of pages) to MiB and round to the next
- * GiB. Shift MiB to Gib and multiply the result by
- * POOL_PAGES_PER_GB:
- */
- if (debug_pagealloc) {
- gb = ((si.totalram >> SHIFT_MB) + ROUND_MB_GB) >> SHIFT_MB_GB;
- pool_size = POOL_PAGES_PER_GB * gb;
- } else {
- pool_size = 1;
- }
- pool_low = pool_size;
-
- cpa_fill_pool(NULL);
- printk(KERN_DEBUG
- "CPA: page pool initialized %lu of %lu pages preallocated\n",
- pool_pages, pool_size);
-}
-
static int split_large_page(pte_t *kpte, unsigned long address)
{
unsigned long flags, pfn, pfninc = 1;
@@ -494,28 +463,15 @@ static int split_large_page(pte_t *kpte, unsigned long address)
pgprot_t ref_prot;
struct page *base;
- /*
- * Get a page from the pool. The pool list is protected by the
- * pgd_lock, which we have to take anyway for the split
- * operation:
- */
- spin_lock_irqsave(&pgd_lock, flags);
- if (list_empty(&page_pool)) {
- spin_unlock_irqrestore(&pgd_lock, flags);
- base = NULL;
- cpa_fill_pool(&base);
- if (!base)
- return -ENOMEM;
- spin_lock_irqsave(&pgd_lock, flags);
- } else {
- base = list_first_entry(&page_pool, struct page, lru);
- list_del(&base->lru);
- pool_pages--;
-
- if (pool_pages < pool_low)
- pool_low = pool_pages;
- }
+ if (!debug_pagealloc)
+ spin_unlock(&cpa_lock);
+ base = alloc_pages(GFP_KERNEL, 0);
+ if (!debug_pagealloc)
+ spin_lock(&cpa_lock);
+ if (!base)
+ return -ENOMEM;
+ spin_lock_irqsave(&pgd_lock, flags);
/*
* Check for races, another CPU might have split this page
* up for us already:
@@ -572,11 +528,8 @@ out_unlock:
* If we dropped out via the lookup_address check under
* pgd_lock then stick the page back into the pool:
*/
- if (base) {
- list_add(&base->lru, &page_pool);
- pool_pages++;
- } else
- pool_used++;
+ if (base)
+ __free_page(base);
spin_unlock_irqrestore(&pgd_lock, flags);
return 0;
@@ -584,11 +537,16 @@ out_unlock:
static int __change_page_attr(struct cpa_data *cpa, int primary)
{
- unsigned long address = cpa->vaddr;
+ unsigned long address;
int do_split, err;
unsigned int level;
pte_t *kpte, old_pte;
+ if (cpa->flags & CPA_ARRAY)
+ address = cpa->vaddr[cpa->curpage];
+ else
+ address = *cpa->vaddr;
+
repeat:
kpte = lookup_address(address, &level);
if (!kpte)
@@ -600,7 +558,7 @@ repeat:
return 0;
WARN(1, KERN_WARNING "CPA: called for zero pte. "
"vaddr = %lx cpa->vaddr = %lx\n", address,
- cpa->vaddr);
+ *cpa->vaddr);
return -EINVAL;
}
@@ -626,7 +584,7 @@ repeat:
*/
if (pte_val(old_pte) != pte_val(new_pte)) {
set_pte_atomic(kpte, new_pte);
- cpa->flushtlb = 1;
+ cpa->flags |= CPA_FLUSHTLB;
}
cpa->numpages = 1;
return 0;
@@ -650,7 +608,25 @@ repeat:
*/
err = split_large_page(kpte, address);
if (!err) {
- cpa->flushtlb = 1;
+ /*
+ * Do a global flush tlb after splitting the large page
+ * and before we do the actual change page attribute in the PTE.
+ *
+ * With out this, we violate the TLB application note, that says
+ * "The TLBs may contain both ordinary and large-page
+ * translations for a 4-KByte range of linear addresses. This
+ * may occur if software modifies the paging structures so that
+ * the page size used for the address range changes. If the two
+ * translations differ with respect to page frame or attributes
+ * (e.g., permissions), processor behavior is undefined and may
+ * be implementation-specific."
+ *
+ * We do this global tlb flush inside the cpa_lock, so that we
+ * don't allow any other cpu, with stale tlb entries change the
+ * page attribute in parallel, that also falls into the
+ * just split large page entry.
+ */
+ flush_tlb_all();
goto repeat;
}
@@ -663,6 +639,7 @@ static int cpa_process_alias(struct cpa_data *cpa)
{
struct cpa_data alias_cpa;
int ret = 0;
+ unsigned long temp_cpa_vaddr, vaddr;
if (cpa->pfn >= max_pfn_mapped)
return 0;
@@ -675,16 +652,24 @@ static int cpa_process_alias(struct cpa_data *cpa)
* No need to redo, when the primary call touched the direct
* mapping already:
*/
- if (!(within(cpa->vaddr, PAGE_OFFSET,
+ if (cpa->flags & CPA_ARRAY)
+ vaddr = cpa->vaddr[cpa->curpage];
+ else
+ vaddr = *cpa->vaddr;
+
+ if (!(within(vaddr, PAGE_OFFSET,
PAGE_OFFSET + (max_low_pfn_mapped << PAGE_SHIFT))
#ifdef CONFIG_X86_64
- || within(cpa->vaddr, PAGE_OFFSET + (1UL<<32),
+ || within(vaddr, PAGE_OFFSET + (1UL<<32),
PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT))
#endif
)) {
alias_cpa = *cpa;
- alias_cpa.vaddr = (unsigned long) __va(cpa->pfn << PAGE_SHIFT);
+ temp_cpa_vaddr = (unsigned long) __va(cpa->pfn << PAGE_SHIFT);
+ alias_cpa.vaddr = &temp_cpa_vaddr;
+ alias_cpa.flags &= ~CPA_ARRAY;
+
ret = __change_page_attr_set_clr(&alias_cpa, 0);
}
@@ -696,7 +681,7 @@ static int cpa_process_alias(struct cpa_data *cpa)
* No need to redo, when the primary call touched the high
* mapping already:
*/
- if (within(cpa->vaddr, (unsigned long) _text, (unsigned long) _end))
+ if (within(vaddr, (unsigned long) _text, (unsigned long) _end))
return 0;
/*
@@ -707,8 +692,9 @@ static int cpa_process_alias(struct cpa_data *cpa)
return 0;
alias_cpa = *cpa;
- alias_cpa.vaddr =
- (cpa->pfn << PAGE_SHIFT) + __START_KERNEL_map - phys_base;
+ temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) + __START_KERNEL_map - phys_base;
+ alias_cpa.vaddr = &temp_cpa_vaddr;
+ alias_cpa.flags &= ~CPA_ARRAY;
/*
* The high mapping range is imprecise, so ignore the return value.
@@ -728,8 +714,15 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
* preservation check.
*/
cpa->numpages = numpages;
+ /* for array changes, we can't use large page */
+ if (cpa->flags & CPA_ARRAY)
+ cpa->numpages = 1;
+ if (!debug_pagealloc)
+ spin_lock(&cpa_lock);
ret = __change_page_attr(cpa, checkalias);
+ if (!debug_pagealloc)
+ spin_unlock(&cpa_lock);
if (ret)
return ret;
@@ -746,7 +739,11 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
*/
BUG_ON(cpa->numpages > numpages);
numpages -= cpa->numpages;
- cpa->vaddr += cpa->numpages * PAGE_SIZE;
+ if (cpa->flags & CPA_ARRAY)
+ cpa->curpage++;
+ else
+ *cpa->vaddr += cpa->numpages * PAGE_SIZE;
+
}
return 0;
}
@@ -757,9 +754,9 @@ static inline int cache_attr(pgprot_t attr)
(_PAGE_PAT | _PAGE_PAT_LARGE | _PAGE_PWT | _PAGE_PCD);
}
-static int change_page_attr_set_clr(unsigned long addr, int numpages,
+static int change_page_attr_set_clr(unsigned long *addr, int numpages,
pgprot_t mask_set, pgprot_t mask_clr,
- int force_split)
+ int force_split, int array)
{
struct cpa_data cpa;
int ret, cache, checkalias;
@@ -774,21 +771,38 @@ static int change_page_attr_set_clr(unsigned long addr, int numpages,
return 0;
/* Ensure we are PAGE_SIZE aligned */
- if (addr & ~PAGE_MASK) {
- addr &= PAGE_MASK;
- /*
- * People should not be passing in unaligned addresses:
- */
- WARN_ON_ONCE(1);
+ if (!array) {
+ if (*addr & ~PAGE_MASK) {
+ *addr &= PAGE_MASK;
+ /*
+ * People should not be passing in unaligned addresses:
+ */
+ WARN_ON_ONCE(1);
+ }
+ } else {
+ int i;
+ for (i = 0; i < numpages; i++) {
+ if (addr[i] & ~PAGE_MASK) {
+ addr[i] &= PAGE_MASK;
+ WARN_ON_ONCE(1);
+ }
+ }
}
+ /* Must avoid aliasing mappings in the highmem code */
+ kmap_flush_unused();
+
cpa.vaddr = addr;
cpa.numpages = numpages;
cpa.mask_set = mask_set;
cpa.mask_clr = mask_clr;
- cpa.flushtlb = 0;
+ cpa.flags = 0;
+ cpa.curpage = 0;
cpa.force_split = force_split;
+ if (array)
+ cpa.flags |= CPA_ARRAY;
+
/* No alias checking for _NX bit modifications */
checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
@@ -797,7 +811,7 @@ static int change_page_attr_set_clr(unsigned long addr, int numpages,
/*
* Check whether we really changed something:
*/
- if (!cpa.flushtlb)
+ if (!(cpa.flags & CPA_FLUSHTLB))
goto out;
/*
@@ -812,27 +826,30 @@ static int change_page_attr_set_clr(unsigned long addr, int numpages,
* error case we fall back to cpa_flush_all (which uses
* wbindv):
*/
- if (!ret && cpu_has_clflush)
- cpa_flush_range(addr, numpages, cache);
- else
+ if (!ret && cpu_has_clflush) {
+ if (cpa.flags & CPA_ARRAY)
+ cpa_flush_array(addr, numpages, cache);
+ else
+ cpa_flush_range(*addr, numpages, cache);
+ } else
cpa_flush_all(cache);
out:
- cpa_fill_pool(NULL);
-
return ret;
}
-static inline int change_page_attr_set(unsigned long addr, int numpages,
- pgprot_t mask)
+static inline int change_page_attr_set(unsigned long *addr, int numpages,
+ pgprot_t mask, int array)
{
- return change_page_attr_set_clr(addr, numpages, mask, __pgprot(0), 0);
+ return change_page_attr_set_clr(addr, numpages, mask, __pgprot(0), 0,
+ array);
}
-static inline int change_page_attr_clear(unsigned long addr, int numpages,
- pgprot_t mask)
+static inline int change_page_attr_clear(unsigned long *addr, int numpages,
+ pgprot_t mask, int array)
{
- return change_page_attr_set_clr(addr, numpages, __pgprot(0), mask, 0);
+ return change_page_attr_set_clr(addr, numpages, __pgprot(0), mask, 0,
+ array);
}
int _set_memory_uc(unsigned long addr, int numpages)
@@ -840,8 +857,8 @@ int _set_memory_uc(unsigned long addr, int numpages)
/*
* for now UC MINUS. see comments in ioremap_nocache()
*/
- return change_page_attr_set(addr, numpages,
- __pgprot(_PAGE_CACHE_UC_MINUS));
+ return change_page_attr_set(&addr, numpages,
+ __pgprot(_PAGE_CACHE_UC_MINUS), 0);
}
int set_memory_uc(unsigned long addr, int numpages)
@@ -857,10 +874,48 @@ int set_memory_uc(unsigned long addr, int numpages)
}
EXPORT_SYMBOL(set_memory_uc);
+int set_memory_array_uc(unsigned long *addr, int addrinarray)
+{
+ unsigned long start;
+ unsigned long end;
+ int i;
+ /*
+ * for now UC MINUS. see comments in ioremap_nocache()
+ */
+ for (i = 0; i < addrinarray; i++) {
+ start = __pa(addr[i]);
+ for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ if (reserve_memtype(start, end, _PAGE_CACHE_UC_MINUS, NULL))
+ goto out;
+ }
+
+ return change_page_attr_set(addr, addrinarray,
+ __pgprot(_PAGE_CACHE_UC_MINUS), 1);
+out:
+ for (i = 0; i < addrinarray; i++) {
+ unsigned long tmp = __pa(addr[i]);
+
+ if (tmp == start)
+ break;
+ for (end = tmp + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ free_memtype(tmp, end);
+ }
+ return -EINVAL;
+}
+EXPORT_SYMBOL(set_memory_array_uc);
+
int _set_memory_wc(unsigned long addr, int numpages)
{
- return change_page_attr_set(addr, numpages,
- __pgprot(_PAGE_CACHE_WC));
+ return change_page_attr_set(&addr, numpages,
+ __pgprot(_PAGE_CACHE_WC), 0);
}
int set_memory_wc(unsigned long addr, int numpages)
@@ -878,8 +933,8 @@ EXPORT_SYMBOL(set_memory_wc);
int _set_memory_wb(unsigned long addr, int numpages)
{
- return change_page_attr_clear(addr, numpages,
- __pgprot(_PAGE_CACHE_MASK));
+ return change_page_attr_clear(&addr, numpages,
+ __pgprot(_PAGE_CACHE_MASK), 0);
}
int set_memory_wb(unsigned long addr, int numpages)
@@ -890,39 +945,59 @@ int set_memory_wb(unsigned long addr, int numpages)
}
EXPORT_SYMBOL(set_memory_wb);
+int set_memory_array_wb(unsigned long *addr, int addrinarray)
+{
+ int i;
+
+ for (i = 0; i < addrinarray; i++) {
+ unsigned long start = __pa(addr[i]);
+ unsigned long end;
+
+ for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) {
+ if (end != __pa(addr[i + 1]))
+ break;
+ i++;
+ }
+ free_memtype(start, end);
+ }
+ return change_page_attr_clear(addr, addrinarray,
+ __pgprot(_PAGE_CACHE_MASK), 1);
+}
+EXPORT_SYMBOL(set_memory_array_wb);
+
int set_memory_x(unsigned long addr, int numpages)
{
- return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_NX));
+ return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0);
}
EXPORT_SYMBOL(set_memory_x);
int set_memory_nx(unsigned long addr, int numpages)
{
- return change_page_attr_set(addr, numpages, __pgprot(_PAGE_NX));
+ return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX), 0);
}
EXPORT_SYMBOL(set_memory_nx);
int set_memory_ro(unsigned long addr, int numpages)
{
- return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_RW));
+ return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
}
EXPORT_SYMBOL_GPL(set_memory_ro);
int set_memory_rw(unsigned long addr, int numpages)
{
- return change_page_attr_set(addr, numpages, __pgprot(_PAGE_RW));
+ return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0);
}
EXPORT_SYMBOL_GPL(set_memory_rw);
int set_memory_np(unsigned long addr, int numpages)
{
- return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_PRESENT));
+ return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
}
int set_memory_4k(unsigned long addr, int numpages)
{
- return change_page_attr_set_clr(addr, numpages, __pgprot(0),
- __pgprot(0), 1);
+ return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
+ __pgprot(0), 1, 0);
}
int set_pages_uc(struct page *page, int numpages)
@@ -975,22 +1050,38 @@ int set_pages_rw(struct page *page, int numpages)
static int __set_pages_p(struct page *page, int numpages)
{
- struct cpa_data cpa = { .vaddr = (unsigned long) page_address(page),
+ unsigned long tempaddr = (unsigned long) page_address(page);
+ struct cpa_data cpa = { .vaddr = &tempaddr,
.numpages = numpages,
.mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW),
- .mask_clr = __pgprot(0)};
+ .mask_clr = __pgprot(0),
+ .flags = 0};
- return __change_page_attr_set_clr(&cpa, 1);
+ /*
+ * No alias checking needed for setting present flag. otherwise,
+ * we may need to break large pages for 64-bit kernel text
+ * mappings (this adds to complexity if we want to do this from
+ * atomic context especially). Let's keep it simple!
+ */
+ return __change_page_attr_set_clr(&cpa, 0);
}
static int __set_pages_np(struct page *page, int numpages)
{
- struct cpa_data cpa = { .vaddr = (unsigned long) page_address(page),
+ unsigned long tempaddr = (unsigned long) page_address(page);
+ struct cpa_data cpa = { .vaddr = &tempaddr,
.numpages = numpages,
.mask_set = __pgprot(0),
- .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW)};
+ .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+ .flags = 0};
- return __change_page_attr_set_clr(&cpa, 1);
+ /*
+ * No alias checking needed for setting not present flag. otherwise,
+ * we may need to break large pages for 64-bit kernel text
+ * mappings (this adds to complexity if we want to do this from
+ * atomic context especially). Let's keep it simple!
+ */
+ return __change_page_attr_set_clr(&cpa, 0);
}
void kernel_map_pages(struct page *page, int numpages, int enable)
@@ -1010,11 +1101,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
/*
* The return value is ignored as the calls cannot fail.
- * Large pages are kept enabled at boot time, and are
- * split up quickly with DEBUG_PAGEALLOC. If a splitup
- * fails here (due to temporary memory shortage) no damage
- * is done because we just keep the largepage intact up
- * to the next attempt when it will likely be split up:
+ * Large pages for identity mappings are not used at boot time
+ * and hence no memory allocations during large page split.
*/
if (enable)
__set_pages_p(page, numpages);
@@ -1026,53 +1114,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
* but that can deadlock->flush only current cpu:
*/
__flush_tlb_all();
-
- /*
- * Try to refill the page pool here. We can do this only after
- * the tlb flush.
- */
- cpa_fill_pool(NULL);
}
-#ifdef CONFIG_DEBUG_FS
-static int dpa_show(struct seq_file *m, void *v)
-{
- seq_puts(m, "DEBUG_PAGEALLOC\n");
- seq_printf(m, "pool_size : %lu\n", pool_size);
- seq_printf(m, "pool_pages : %lu\n", pool_pages);
- seq_printf(m, "pool_low : %lu\n", pool_low);
- seq_printf(m, "pool_used : %lu\n", pool_used);
- seq_printf(m, "pool_failed : %lu\n", pool_failed);
-
- return 0;
-}
-
-static int dpa_open(struct inode *inode, struct file *filp)
-{
- return single_open(filp, dpa_show, NULL);
-}
-
-static const struct file_operations dpa_fops = {
- .open = dpa_open,
- .read = seq_read,
- .llseek = seq_lseek,
- .release = single_release,
-};
-
-static int __init debug_pagealloc_proc_init(void)
-{
- struct dentry *de;
-
- de = debugfs_create_file("debug_pagealloc", 0600, NULL, NULL,
- &dpa_fops);
- if (!de)
- return -ENOMEM;
-
- return 0;
-}
-__initcall(debug_pagealloc_proc_init);
-#endif
-
#ifdef CONFIG_HIBERNATION
bool kernel_page_present(struct page *page)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 2a50e0f..738fd0f 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -7,24 +7,24 @@
* Loosely based on earlier PAT patchset from Eric Biederman and Andi Kleen.
*/
-#include <linux/mm.h>
+#include <linux/seq_file.h>
+#include <linux/bootmem.h>
+#include <linux/debugfs.h>
#include <linux/kernel.h>
#include <linux/gfp.h>
+#include <linux/mm.h>
#include <linux/fs.h>
-#include <linux/bootmem.h>
-#include <linux/debugfs.h>
-#include <linux/seq_file.h>
-#include <asm/msr.h>
-#include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
#include <asm/processor.h>
-#include <asm/page.h>
+#include <asm/tlbflush.h>
#include <asm/pgtable.h>
-#include <asm/pat.h>
-#include <asm/e820.h>
-#include <asm/cacheflush.h>
#include <asm/fcntl.h>
+#include <asm/e820.h>
#include <asm/mtrr.h>
+#include <asm/page.h>
+#include <asm/msr.h>
+#include <asm/pat.h>
#include <asm/io.h>
#ifdef CONFIG_X86_PAT
@@ -46,6 +46,7 @@ early_param("nopat", nopat);
static int debug_enable;
+
static int __init pat_debug_setup(char *str)
{
debug_enable = 1;
@@ -145,14 +146,14 @@ static char *cattr_name(unsigned long flags)
*/
struct memtype {
- u64 start;
- u64 end;
- unsigned long type;
- struct list_head nd;
+ u64 start;
+ u64 end;
+ unsigned long type;
+ struct list_head nd;
};
static LIST_HEAD(memtype_list);
-static DEFINE_SPINLOCK(memtype_lock); /* protects memtype list */
+static DEFINE_SPINLOCK(memtype_lock); /* protects memtype list */
/*
* Does intersection of PAT memory type and MTRR memory type and returns
@@ -180,8 +181,8 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end, unsigned long req_type)
return req_type;
}
-static int chk_conflict(struct memtype *new, struct memtype *entry,
- unsigned long *type)
+static int
+chk_conflict(struct memtype *new, struct memtype *entry, unsigned long *type)
{
if (new->type != entry->type) {
if (type) {
@@ -211,6 +212,66 @@ static struct memtype *cached_entry;
static u64 cached_start;
/*
+ * For RAM pages, mark the pages as non WB memory type using
+ * PageNonWB (PG_arch_1). We allow only one set_memory_uc() or
+ * set_memory_wc() on a RAM page at a time before marking it as WB again.
+ * This is ok, because only one driver will be owning the page and
+ * doing set_memory_*() calls.
+ *
+ * For now, we use PageNonWB to track that the RAM page is being mapped
+ * as non WB. In future, we will have to use one more flag
+ * (or some other mechanism in page_struct) to distinguish between
+ * UC and WC mapping.
+ */
+static int reserve_ram_pages_type(u64 start, u64 end, unsigned long req_type,
+ unsigned long *new_type)
+{
+ struct page *page;
+ u64 pfn, end_pfn;
+
+ for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
+ page = pfn_to_page(pfn);
+ if (page_mapped(page) || PageNonWB(page))
+ goto out;
+
+ SetPageNonWB(page);
+ }
+ return 0;
+
+out:
+ end_pfn = pfn;
+ for (pfn = (start >> PAGE_SHIFT); pfn < end_pfn; ++pfn) {
+ page = pfn_to_page(pfn);
+ ClearPageNonWB(page);
+ }
+
+ return -EINVAL;
+}
+
+static int free_ram_pages_type(u64 start, u64 end)
+{
+ struct page *page;
+ u64 pfn, end_pfn;
+
+ for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
+ page = pfn_to_page(pfn);
+ if (page_mapped(page) || !PageNonWB(page))
+ goto out;
+
+ ClearPageNonWB(page);
+ }
+ return 0;
+
+out:
+ end_pfn = pfn;
+ for (pfn = (start >> PAGE_SHIFT); pfn < end_pfn; ++pfn) {
+ page = pfn_to_page(pfn);
+ SetPageNonWB(page);
+ }
+ return -EINVAL;
+}
+
+/*
* req_type typically has one of the:
* - _PAGE_CACHE_WB
* - _PAGE_CACHE_WC
@@ -226,14 +287,15 @@ static u64 cached_start;
* it will return a negative return value.
*/
int reserve_memtype(u64 start, u64 end, unsigned long req_type,
- unsigned long *new_type)
+ unsigned long *new_type)
{
struct memtype *new, *entry;
unsigned long actual_type;
struct list_head *where;
+ int is_range_ram;
int err = 0;
- BUG_ON(start >= end); /* end is exclusive */
+ BUG_ON(start >= end); /* end is exclusive */
if (!pat_enabled) {
/* This is identical to page table setting without PAT */
@@ -266,17 +328,24 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
actual_type = _PAGE_CACHE_WB;
else
actual_type = _PAGE_CACHE_UC_MINUS;
- } else
+ } else {
actual_type = pat_x_mtrr_type(start, end,
req_type & _PAGE_CACHE_MASK);
+ }
+
+ is_range_ram = pagerange_is_ram(start, end);
+ if (is_range_ram == 1)
+ return reserve_ram_pages_type(start, end, req_type, new_type);
+ else if (is_range_ram < 0)
+ return -EINVAL;
new = kmalloc(sizeof(struct memtype), GFP_KERNEL);
if (!new)
return -ENOMEM;
- new->start = start;
- new->end = end;
- new->type = actual_type;
+ new->start = start;
+ new->end = end;
+ new->type = actual_type;
if (new_type)
*new_type = actual_type;
@@ -335,6 +404,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
start, end, cattr_name(new->type), cattr_name(req_type));
kfree(new);
spin_unlock(&memtype_lock);
+
return err;
}
@@ -358,6 +428,7 @@ int free_memtype(u64 start, u64 end)
{
struct memtype *entry;
int err = -EINVAL;
+ int is_range_ram;
if (!pat_enabled)
return 0;
@@ -366,6 +437,12 @@ int free_memtype(u64 start, u64 end)
if (is_ISA_range(start, end - 1))
return 0;
+ is_range_ram = pagerange_is_ram(start, end);
+ if (is_range_ram == 1)
+ return free_ram_pages_type(start, end);
+ else if (is_range_ram < 0)
+ return -EINVAL;
+
spin_lock(&memtype_lock);
list_for_each_entry(entry, &memtype_list, nd) {
if (entry->start == start && entry->end == end) {
@@ -386,6 +463,7 @@ int free_memtype(u64 start, u64 end)
}
dprintk("free_memtype request 0x%Lx-0x%Lx\n", start, end);
+
return err;
}
@@ -492,9 +570,9 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
void map_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
{
+ unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
u64 addr = (u64)pfn << PAGE_SHIFT;
unsigned long flags;
- unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK);
reserve_memtype(addr, addr + size, want_flags, &flags);
if (flags != want_flags) {
@@ -514,7 +592,7 @@ void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
free_memtype(addr, addr + size);
}
-#if defined(CONFIG_DEBUG_FS)
+#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
/* get Nth element of the linked list */
static struct memtype *memtype_get_idx(loff_t pos)
@@ -537,6 +615,7 @@ static struct memtype *memtype_get_idx(loff_t pos)
}
spin_unlock(&memtype_lock);
kfree(print_entry);
+
return NULL;
}
@@ -567,6 +646,7 @@ static int memtype_seq_show(struct seq_file *seq, void *v)
seq_printf(seq, "%s @ 0x%Lx-0x%Lx\n", cattr_name(print_entry->type),
print_entry->start, print_entry->end);
kfree(print_entry);
+
return 0;
}
@@ -598,4 +678,4 @@ static int __init pat_memtype_list_init(void)
late_initcall(pat_memtype_list_init);
-#endif /* CONFIG_DEBUG_FS */
+#endif /* CONFIG_DEBUG_FS && CONFIG_X86_PAT */
diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index 4bada0e..46f5075 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -116,7 +116,9 @@ struct agp_bridge_driver {
struct agp_memory *(*alloc_by_type) (size_t, int);
void (*free_by_type)(struct agp_memory *);
void *(*agp_alloc_page)(struct agp_bridge_data *);
+ int (*agp_alloc_pages)(struct agp_bridge_data *, struct agp_memory *, size_t);
void (*agp_destroy_page)(void *, int flags);
+ void (*agp_destroy_pages)(struct agp_memory *);
int (*agp_type_to_mask_type) (struct agp_bridge_data *, int);
void (*chipset_flush)(struct agp_bridge_data *);
};
@@ -277,7 +279,10 @@ int agp_generic_remove_memory(struct agp_memory *mem, off_t pg_start, int type);
struct agp_memory *agp_generic_alloc_by_type(size_t page_count, int type);
void agp_generic_free_by_type(struct agp_memory *curr);
void *agp_generic_alloc_page(struct agp_bridge_data *bridge);
+int agp_generic_alloc_pages(struct agp_bridge_data *agp_bridge,
+ struct agp_memory *memory, size_t page_count);
void agp_generic_destroy_page(void *addr, int flags);
+void agp_generic_destroy_pages(struct agp_memory *memory);
void agp_free_key(int key);
int agp_num_entries(void);
u32 agp_collect_device_status(struct agp_bridge_data *bridge, u32 mode, u32 command);
diff --git a/drivers/char/agp/alpha-agp.c b/drivers/char/agp/alpha-agp.c
index 5da89f6..5ea4da8 100644
--- a/drivers/char/agp/alpha-agp.c
+++ b/drivers/char/agp/alpha-agp.c
@@ -143,7 +143,9 @@ struct agp_bridge_driver alpha_core_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c
index e280531..603a986 100644
--- a/drivers/char/agp/amd-k7-agp.c
+++ b/drivers/char/agp/amd-k7-agp.c
@@ -386,7 +386,9 @@ static const struct agp_bridge_driver amd_irongate_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
index 7495c52..2812ee2 100644
--- a/drivers/char/agp/amd64-agp.c
+++ b/drivers/char/agp/amd64-agp.c
@@ -224,7 +224,9 @@ static const struct agp_bridge_driver amd_8151_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/ati-agp.c b/drivers/char/agp/ati-agp.c
index 6ecbcaf..ae2791b 100644
--- a/drivers/char/agp/ati-agp.c
+++ b/drivers/char/agp/ati-agp.c
@@ -418,7 +418,9 @@ static const struct agp_bridge_driver ati_generic_bridge = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/efficeon-agp.c b/drivers/char/agp/efficeon-agp.c
index 8ca6f26..453543a 100644
--- a/drivers/char/agp/efficeon-agp.c
+++ b/drivers/char/agp/efficeon-agp.c
@@ -335,7 +335,9 @@ static const struct agp_bridge_driver efficeon_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 118dbde..10d6cbd 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -201,14 +201,22 @@ void agp_free_memory(struct agp_memory *curr)
return;
}
if (curr->page_count != 0) {
- for (i = 0; i < curr->page_count; i++) {
- curr->memory[i] = (unsigned long)gart_to_virt(curr->memory[i]);
- curr->bridge->driver->agp_destroy_page((void *)curr->memory[i],
- AGP_PAGE_DESTROY_UNMAP);
- }
- for (i = 0; i < curr->page_count; i++) {
- curr->bridge->driver->agp_destroy_page((void *)curr->memory[i],
- AGP_PAGE_DESTROY_FREE);
+ if (curr->bridge->driver->agp_destroy_pages) {
+ curr->bridge->driver->agp_destroy_pages(curr);
+ } else {
+
+ for (i = 0; i < curr->page_count; i++) {
+ curr->memory[i] = (unsigned long)gart_to_virt(
+ curr->memory[i]);
+ curr->bridge->driver->agp_destroy_page(
+ (void *)curr->memory[i],
+ AGP_PAGE_DESTROY_UNMAP);
+ }
+ for (i = 0; i < curr->page_count; i++) {
+ curr->bridge->driver->agp_destroy_page(
+ (void *)curr->memory[i],
+ AGP_PAGE_DESTROY_FREE);
+ }
}
}
agp_free_key(curr->key);
@@ -264,6 +272,15 @@ struct agp_memory *agp_allocate_memory(struct agp_bridge_data *bridge,
if (new == NULL)
return NULL;
+ if (bridge->driver->agp_alloc_pages) {
+ if (bridge->driver->agp_alloc_pages(bridge, new, page_count)) {
+ agp_free_memory(new);
+ return NULL;
+ }
+ new->bridge = bridge;
+ return new;
+ }
+
for (i = 0; i < page_count; i++) {
void *addr = bridge->driver->agp_alloc_page(bridge);
@@ -1203,6 +1220,39 @@ EXPORT_SYMBOL(agp_generic_alloc_user);
* against a maximum value.
*/
+int agp_generic_alloc_pages(struct agp_bridge_data *bridge, struct agp_memory *mem, size_t num_pages)
+{
+ struct page * page;
+ int i, ret = -ENOMEM;
+
+ for (i = 0; i < num_pages; i++) {
+ page = alloc_page(GFP_KERNEL | GFP_DMA32);
+ /* agp_free_memory() needs gart address */
+ if (page == NULL)
+ goto out;
+
+#ifndef CONFIG_X86
+ map_page_into_agp(page);
+#endif
+ get_page(page);
+ atomic_inc(&agp_bridge->current_memory_agp);
+
+ /* set_memory_array_uc() needs virtual address */
+ mem->memory[i] = (unsigned long)page_address(page);
+ mem->page_count++;
+ }
+
+#ifdef CONFIG_X86
+ set_memory_array_uc(mem->memory, num_pages);
+#endif
+ ret = 0;
+out:
+ for (i = 0; i < mem->page_count; i++)
+ mem->memory[i] = virt_to_gart((void *)mem->memory[i]);
+ return ret;
+}
+EXPORT_SYMBOL(agp_generic_alloc_pages);
+
void *agp_generic_alloc_page(struct agp_bridge_data *bridge)
{
struct page * page;
@@ -1219,6 +1269,37 @@ void *agp_generic_alloc_page(struct agp_bridge_data *bridge)
}
EXPORT_SYMBOL(agp_generic_alloc_page);
+void agp_generic_destroy_pages(struct agp_memory *mem)
+{
+ int i;
+ void *addr;
+ struct page *page;
+
+ if (!mem)
+ return;
+
+ for (i = 0; i < mem->page_count; i++)
+ mem->memory[i] = (unsigned long)gart_to_virt(mem->memory[i]);
+
+#ifdef CONFIG_X86
+ set_memory_array_wb(mem->memory, mem->page_count);
+#endif
+
+ for (i = 0; i < mem->page_count; i++) {
+ addr = (void *)mem->memory[i];
+ page = virt_to_page(addr);
+
+#ifndef CONFIG_X86
+ unmap_page_from_agp(page);
+#endif
+
+ put_page(page);
+ free_page((unsigned long)addr);
+ atomic_dec(&agp_bridge->current_memory_agp);
+ mem->memory[i] = 0;
+ }
+}
+EXPORT_SYMBOL(agp_generic_destroy_pages);
void agp_generic_destroy_page(void *addr, int flags)
{
diff --git a/drivers/char/agp/hp-agp.c b/drivers/char/agp/hp-agp.c
index 80d7317..183ac3f 100644
--- a/drivers/char/agp/hp-agp.c
+++ b/drivers/char/agp/hp-agp.c
@@ -435,7 +435,9 @@ const struct agp_bridge_driver hp_zx1_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
};
diff --git a/drivers/char/agp/i460-agp.c b/drivers/char/agp/i460-agp.c
index e587eeb..10da687 100644
--- a/drivers/char/agp/i460-agp.c
+++ b/drivers/char/agp/i460-agp.c
@@ -575,7 +575,9 @@ const struct agp_bridge_driver intel_i460_driver = {
.insert_memory = i460_insert_memory_small_io_page,
.remove_memory = i460_remove_memory_small_io_page,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
#endif
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index 016fdf0..043e366 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -1711,7 +1711,9 @@ static const struct agp_bridge_driver intel_generic_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1736,7 +1738,9 @@ static const struct agp_bridge_driver intel_810_driver = {
.alloc_by_type = intel_i810_alloc_by_type,
.free_by_type = intel_i810_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1760,7 +1764,9 @@ static const struct agp_bridge_driver intel_815_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1785,7 +1791,9 @@ static const struct agp_bridge_driver intel_830_driver = {
.alloc_by_type = intel_i830_alloc_by_type,
.free_by_type = intel_i810_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = intel_i830_type_to_mask_type,
.chipset_flush = intel_i830_chipset_flush,
};
@@ -1810,7 +1818,9 @@ static const struct agp_bridge_driver intel_820_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1834,7 +1844,9 @@ static const struct agp_bridge_driver intel_830mp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1858,7 +1870,9 @@ static const struct agp_bridge_driver intel_840_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1882,7 +1896,9 @@ static const struct agp_bridge_driver intel_845_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.chipset_flush = intel_i830_chipset_flush,
};
@@ -1907,7 +1923,9 @@ static const struct agp_bridge_driver intel_850_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1931,7 +1949,9 @@ static const struct agp_bridge_driver intel_860_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -1956,7 +1976,9 @@ static const struct agp_bridge_driver intel_915_driver = {
.alloc_by_type = intel_i830_alloc_by_type,
.free_by_type = intel_i810_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = intel_i830_type_to_mask_type,
.chipset_flush = intel_i915_chipset_flush,
};
@@ -1982,7 +2004,9 @@ static const struct agp_bridge_driver intel_i965_driver = {
.alloc_by_type = intel_i830_alloc_by_type,
.free_by_type = intel_i810_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = intel_i830_type_to_mask_type,
.chipset_flush = intel_i915_chipset_flush,
};
@@ -2007,7 +2031,9 @@ static const struct agp_bridge_driver intel_7505_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -2032,7 +2058,9 @@ static const struct agp_bridge_driver intel_g33_driver = {
.alloc_by_type = intel_i830_alloc_by_type,
.free_by_type = intel_i810_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = intel_i830_type_to_mask_type,
.chipset_flush = intel_i915_chipset_flush,
};
diff --git a/drivers/char/agp/nvidia-agp.c b/drivers/char/agp/nvidia-agp.c
index eaceb61..dc70d37 100644
--- a/drivers/char/agp/nvidia-agp.c
+++ b/drivers/char/agp/nvidia-agp.c
@@ -312,7 +312,9 @@ static const struct agp_bridge_driver nvidia_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/parisc-agp.c b/drivers/char/agp/parisc-agp.c
index 8c42dcc..f2492ec 100644
--- a/drivers/char/agp/parisc-agp.c
+++ b/drivers/char/agp/parisc-agp.c
@@ -224,7 +224,9 @@ static const struct agp_bridge_driver parisc_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
};
diff --git a/drivers/char/agp/sis-agp.c b/drivers/char/agp/sis-agp.c
index 2587ef9..6c3837a 100644
--- a/drivers/char/agp/sis-agp.c
+++ b/drivers/char/agp/sis-agp.c
@@ -140,7 +140,9 @@ static struct agp_bridge_driver sis_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/sworks-agp.c b/drivers/char/agp/sworks-agp.c
index 2fb27fe..6224df8 100644
--- a/drivers/char/agp/sworks-agp.c
+++ b/drivers/char/agp/sworks-agp.c
@@ -437,7 +437,9 @@ static const struct agp_bridge_driver sworks_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/drivers/char/agp/uninorth-agp.c b/drivers/char/agp/uninorth-agp.c
index eef7270..0f004b6 100644
--- a/drivers/char/agp/uninorth-agp.c
+++ b/drivers/char/agp/uninorth-agp.c
@@ -509,7 +509,9 @@ const struct agp_bridge_driver uninorth_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
};
@@ -534,7 +536,9 @@ const struct agp_bridge_driver u3_agp_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
.cant_use_aperture = true,
.needs_scratch_page = true,
diff --git a/drivers/char/agp/via-agp.c b/drivers/char/agp/via-agp.c
index 7b36476..9f4d49e 100644
--- a/drivers/char/agp/via-agp.c
+++ b/drivers/char/agp/via-agp.c
@@ -190,7 +190,9 @@ static const struct agp_bridge_driver via_agp3_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
@@ -214,7 +216,9 @@ static const struct agp_bridge_driver via_driver = {
.alloc_by_type = agp_generic_alloc_by_type,
.free_by_type = agp_generic_free_by_type,
.agp_alloc_page = agp_generic_alloc_page,
+ .agp_alloc_pages = agp_generic_alloc_pages,
.agp_destroy_page = agp_generic_destroy_page,
+ .agp_destroy_pages = agp_generic_destroy_pages,
.agp_type_to_mask_type = agp_generic_type_to_mask_type,
};
diff --git a/include/asm-x86/cacheflush.h b/include/asm-x86/cacheflush.h
index 59859cb..68840ef 100644
--- a/include/asm-x86/cacheflush.h
+++ b/include/asm-x86/cacheflush.h
@@ -24,6 +24,8 @@
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy((dst), (src), (len))
+#define PG_non_WB PG_arch_1
+PAGEFLAG(NonWB, non_WB)
/*
* The set_memory_* API can be used to change various attributes of a virtual
@@ -66,6 +68,9 @@ int set_memory_rw(unsigned long addr, int numpages);
int set_memory_np(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_array_uc(unsigned long *addr, int addrinarray);
+int set_memory_array_wb(unsigned long *addr, int addrinarray);
+
/*
* For legacy compatibility with the old APIs, a few functions
* are provided that work on a "struct page".
@@ -96,8 +101,6 @@ int set_pages_rw(struct page *page, int numpages);
void clflush_cache_range(void *addr, unsigned int size);
-void cpa_init(void);
-
#ifdef CONFIG_DEBUG_RODATA
void mark_rodata_ro(void);
extern const int rodata_test_data;
diff --git a/include/asm-x86/page.h b/include/asm-x86/page.h
index 79544e6..c915747 100644
--- a/include/asm-x86/page.h
+++ b/include/asm-x86/page.h
@@ -57,6 +57,7 @@ typedef struct { pgdval_t pgd; } pgd_t;
typedef struct { pgprotval_t pgprot; } pgprot_t;
extern int page_is_ram(unsigned long pagenr);
+extern int pagerange_is_ram(unsigned long start, unsigned long end);
extern int devmem_is_allowed(unsigned long pagenr);
extern void map_devmem(unsigned long pfn, unsigned long size,
pgprot_t vma_prot);
diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
index 888add7..ed93245 100644
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -19,6 +19,7 @@
#define _PAGE_BIT_UNUSED3 11
#define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */
#define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1
+#define _PAGE_BIT_CPA_TEST _PAGE_BIT_UNUSED1
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
#define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
@@ -36,6 +37,7 @@
#define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT)
#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
#define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
+#define _PAGE_CPA_TEST (_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST)
#define __HAVE_ARCH_PTE_SPECIAL
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
@@ -130,6 +132,17 @@
#define __S110 PAGE_SHARED_EXEC
#define __S111 PAGE_SHARED_EXEC
+/*
+ * early identity mapping pte attrib macros.
+ */
+#ifdef CONFIG_X86_64
+#define __PAGE_KERNEL_IDENT_LARGE_EXEC __PAGE_KERNEL_LARGE_EXEC
+#else
+#define PTE_IDENT_ATTR 0x003 /* PRESENT+RW */
+#define PDE_IDENT_ATTR 0x063 /* PRESENT+RW+DIRTY+ACCESSED */
+#define PGD_IDENT_ATTR 0x001 /* PRESENT (no other attributes) */
+#endif
+
#ifndef __ASSEMBLY__
/*
diff --git a/mm/highmem.c b/mm/highmem.c
index e16e152..b36b83b 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -70,6 +70,7 @@ static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);
static void flush_all_zero_pkmaps(void)
{
int i;
+ int need_flush = 0;
flush_cache_kmaps();
@@ -101,8 +102,10 @@ static void flush_all_zero_pkmaps(void)
&pkmap_page_table[i]);
set_page_address(page, NULL);
+ need_flush = 1;
}
- flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+ if (need_flush)
+ flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
}
/**
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [git pull] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-09 23:49 [git pull] x86 updates for v2.6.28, phase #2 - PAT updates Ingo Molnar @ 2008-10-10 16:27 ` Linus Torvalds 2008-10-10 16:45 ` Ingo Molnar 0 siblings, 1 reply; 11+ messages in thread From: Linus Torvalds @ 2008-10-10 16:27 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin On Fri, 10 Oct 2008, Ingo Molnar wrote: > > Ingo Molnar (12): > Revert "reduce tlb/cache flush times of agpgart memory allocation" > Revert "introduce two APIs for page attribute" > Revert "x86: handle error returns in set_memory_*()" > Revert "x86: track memtype for RAM in page struct" > Revert "x86, cpa: global flush tlb after splitting large page and before doing cpa" > Revert "x86, cpa: remove cpa pool code" > Revert "x86, cpa: fix taking the pgd_lock with interrupts off" > Revert "x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC" > Revert "x86, cpa: make the kernel physical mapping initialization a two pass sequence" > Revert "x86, cpa: remove USER permission from the very early identity mapping attribute" > Revert "x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot" > x86, pat: cleanups So half of the commits by Suresh were reverted. Not only that, they were reverted WITH ABSOLUTELY NO EXPLANATIONS OF WHY THEY WERE CLEARLY BUGGY PILES OF CRUD. The revert messages are just things like This reverts commit <sha1>. which makes both the original commit _and_ the revert just totally pointless, because we didn't learn anything. So tell me exactly why I should pull this series again? Why should I ever pull _any_ tree that cannot explain its commits? Next time you revert, explain _why_. And if a tree has this many reverts (and by "this many" I mean "dammit, there are three times as many commits as there should be, because a third of the commits are crap, a third of the commits are reverts of that crap, and a third of the commits actually remain"), why should I pull it again? In other words, why shouldn't I just think that this whole branch is a total failure, which is what I think right now? Linus ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [git pull] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 16:27 ` Linus Torvalds @ 2008-10-10 16:45 ` Ingo Molnar 2008-10-10 16:56 ` Ingo Molnar 2008-10-10 17:04 ` Ingo Molnar 0 siblings, 2 replies; 11+ messages in thread From: Ingo Molnar @ 2008-10-10 16:45 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin * Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, 10 Oct 2008, Ingo Molnar wrote: > > > > Ingo Molnar (12): > > Revert "reduce tlb/cache flush times of agpgart memory allocation" > > Revert "introduce two APIs for page attribute" > > Revert "x86: handle error returns in set_memory_*()" > > Revert "x86: track memtype for RAM in page struct" > > Revert "x86, cpa: global flush tlb after splitting large page and before doing cpa" > > Revert "x86, cpa: remove cpa pool code" > > Revert "x86, cpa: fix taking the pgd_lock with interrupts off" > > Revert "x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC" > > Revert "x86, cpa: make the kernel physical mapping initialization a two pass sequence" > > Revert "x86, cpa: remove USER permission from the very early identity mapping attribute" > > Revert "x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot" > > x86, pat: cleanups > > So half of the commits by Suresh were reverted. > > Not only that, they were reverted WITH ABSOLUTELY NO EXPLANATIONS OF > WHY THEY WERE CLEARLY BUGGY PILES OF CRUD. The revert messages are > just things like > > This reverts commit <sha1>. > > which makes both the original commit _and_ the revert just totally > pointless, because we didn't learn anything. hm, those reverts werent supposed to survive. Again, my bad. I'll clean it out. Here is how the screwup happened: a series was sent, i applied it, found a test failure with it and reported it: http://www.uwsg.iu.edu/hypermail/linux/kernel/0809.1/2388.html but to be able to continue testing i temporarily reverted those bits manually in reverse order, because it took some down to pin down the breakage and these bits got intermixed with other commits - and i did not want to rebase commits that came after the broken series. Then the corrected v2 series arrived (not a delta fix) and i applied those. The idea was to create a delta patch against the first series so that i can see the changes. v2 worked well in testing: http://www.uwsg.iu.edu/hypermail/linux/kernel/0809.1/2375.html So it's basically a v1 -> v2 sequence, with artificial reverts surviving unintentionally. That's why there were no revert messages either: i never intended them to become public. I even rember having taken a good look at "git diff 6b5b551..6e3e492", which the v1->v2 delta was. What i havent done was to squash all these artificial commits together and create the delta commit - this is one of the few cases where rebasing of that temporary tail would have been the right thing to do, before pushing it out. And i should also have noticed this weird sequence of commit logs when doing the pull request. Sorry :-( Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [git pull] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 16:45 ` Ingo Molnar @ 2008-10-10 16:56 ` Ingo Molnar 2008-10-10 17:23 ` Linus Torvalds 2008-10-10 17:04 ` Ingo Molnar 1 sibling, 1 reply; 11+ messages in thread From: Ingo Molnar @ 2008-10-10 16:56 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin * Ingo Molnar <mingo@elte.hu> wrote: > > which makes both the original commit _and_ the revert just totally > > pointless, because we didn't learn anything. > > hm, those reverts werent supposed to survive. Again, my bad. I'll > clean it out. > > Here is how the screwup happened: a series was sent, i applied it, > found a test failure with it and reported it: > > http://www.uwsg.iu.edu/hypermail/linux/kernel/0809.1/2388.html > > but to be able to continue testing i temporarily reverted those bits > manually in reverse order, because it took some down to pin down the > breakage and these bits got intermixed with other commits - and i did > not want to rebase commits that came after the broken series. Correction: i did the reverts shortly before applying the v2 series - so it was fully technical. I wanted to do a line by line review of the fix (and did it), because the lockup took quite some time to pin down. To avoid this mistake i could also have hard-reset the full v1 series and could have redone the whole topic. What's the best Git way to avoid such mishaps? I try to avoid resets and rebases of already pushed out bits as much as possible, because that is both risky because information can be lost and asocial because ongoing work of contributors can be disturbed. Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [git pull] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 16:56 ` Ingo Molnar @ 2008-10-10 17:23 ` Linus Torvalds 0 siblings, 0 replies; 11+ messages in thread From: Linus Torvalds @ 2008-10-10 17:23 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin On Fri, 10 Oct 2008, Ingo Molnar wrote: > > Correction: i did the reverts shortly before applying the v2 series - so > it was fully technical. I wanted to do a line by line review of the fix > (and did it), because the lockup took quite some time to pin down. To > avoid this mistake i could also have hard-reset the full v1 series and > could have redone the whole topic. > > What's the best Git way to avoid such mishaps? I try to avoid resets and > rebases of already pushed out bits as much as possible, because that is > both risky because information can be lost and asocial because ongoing > work of contributors can be disturbed. There's a couple of things you could have done. The simplest approach is to kill the branch. Of course, you could do that with just a "git reset", but if you had already pushed the branch out and it wasn't one of the "gets rebased" branches, that's not very nice either. But rather than doing "git reset" on an old branch, what I think is nicer is to just create a new branch instead, and call it "PAT-v2" or something like that - leaving the old one around, but simply never _using_ it for anything (and obviously not merging it). That seems to match how you work otherwise. Just opening another branch has the advantage that it also allows you to easily do cross-branch compares. It's actually how I work a lot personally when I work with the git repo and do some new thing: I start a topic branch for it, but it's inevitably going to be ugly in the end because I'm finding problems as I go, so I just start another branch and basically re-create a cleaner version in that. (I don't end up merging with Junio with git - I just send him patches, so all those branches end up being just private things, but it doesn't change the basic workflow). And the nice thing with starting a new branch is that you can do your cleanups etc in it, and then things like git diff broken..new-branch work fine to check the differences _between_ branches. In other words, this is also a good way to just take a "v2" of a patch-series, and then you basically get a inter-patch-series diff for free that shows what the differences were. Of cource, once you do that, you can also decide that you're not actually going to use the new branch, you're just going to append the fixes from v2 to the old branch instead. So you may decide that your new branch (that you never pushed out) is not worth making a new branch for, and that it's actually going to be easier for people involved if you just turn the second patch-series into a single (or multiple, for that matter) fix-commit on top of the old patch-series. Finally, one option that I do not recommend in this case (simply because all of the commits got rewritten) is to actually keep both branches, but do a merge between them and just pick the correct end result in the merge. That can be useful as a way to show both histories, but I'd generally suggest that only if there is some big reason why you'd really want to have both sets of patches around later. A reason for the last case could be two different approaches that were both valid, and actually ended up both with a very similar end result, and where the way they got to it was interesting and/or relevant exactly because they were different. Or even if they weren't that technically different, maybe they were done by different people, and keeping both ends up showing the real development history _and_ gives credit to both. There are other ways to do this too, but they all boil down to the "make a new branch that starts at the common point". For example, instead of creating a diff with "git diff broken..fixed" (whether to just see what the breakage was, or to actually fix up the broken branch with), if the series are very similar in structure and in many of the sub-commits, you can end up using "git rebase" to rebase the fixed sequence on top of the broken branch. That last "git rebase" approach will generally cause a lot of conflicts that you have to fix up by hand (since by definition the two branches will obviously be touching the same area, and in slightly different ways), but sometimes that can actually be exactly what you _want_, since the merge conflicts are going to be exactly the things that changed, and if the series was clean and didn't have lots of patches where one patch in the series changed something that an earlier patch already modified, then the rebase can be a nice way to get the incremental fixes on top of the old broken branch from the fixed one. So there are certainly different things you can do. Linus ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [git pull] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 16:45 ` Ingo Molnar 2008-10-10 16:56 ` Ingo Molnar @ 2008-10-10 17:04 ` Ingo Molnar 2008-10-10 17:34 ` Linus Torvalds 2008-10-10 17:41 ` [git pull, take 2] " Ingo Molnar 1 sibling, 2 replies; 11+ messages in thread From: Ingo Molnar @ 2008-10-10 17:04 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin * Ingo Molnar <mingo@elte.hu> wrote: > v2 worked well in testing: > > http://www.uwsg.iu.edu/hypermail/linux/kernel/0809.1/2375.html > > So it's basically a v1 -> v2 sequence, with artificial reverts > surviving unintentionally. That's why there were no revert messages > either: i never intended them to become public. I even rember having > taken a good look at "git diff 6b5b551..6e3e492", which the v1->v2 > delta was. here's the linearized history of the commits (older commits go last): 6e3e492: x86: track memtype for RAM in page struct 09543b6d: x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa eaaa6c4: x86, cpa: remove cpa pool code ef91bf0: x86, cpa: no need to check alias for __set_pages_p/__set_pages_np 8e1701f: x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC 065ae73: x86, cpa: make the kernel physical mapping initialization a two pass sequence eb60b73: x86, cpa: remove USER permission from the very early identity mapping attribute fabf99f: x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot abfae7f6: Revert "x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot" f279399: Revert "x86, cpa: remove USER permission from the very early identity mapping attribute" 153eb23: Revert "x86, cpa: make the kernel physical mapping initialization a two pass sequence" f5c3489: Revert "x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC" c613e65: Revert "x86, cpa: fix taking the pgd_lock with interrupts off" e64b1f1: Revert "x86, cpa: remove cpa pool code" 94448e3: Revert "x86, cpa: global flush tlb after splitting large page and before doing cpa" aca9b5f: Revert "x86: track memtype for RAM in page struct" 6375b67: Revert "x86: handle error returns in set_memory_*()" 6b5b551: x86: handle error returns in set_memory_*() 5f25f5b: x86: track memtype for RAM in page struct 3196625: x86, cpa: global flush tlb after splitting large page and before doing cpa 79abc89: x86, cpa: remove cpa pool code e96d59b: x86, cpa: fix taking the pgd_lock with interrupts off 888fdd9: x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC e579af6: x86, cpa: make the kernel physical mapping initialization a two pass sequence c86eefc: x86, cpa: remove USER permission from the very early identity mapping attribute e8729a5: x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot f81b691: Merge commit 'v2.6.27-rc6' into x86/pat e8729a5..6b5b551 was v1, fabf99f..6e3e492 is v2. The reverts in 6375b67..abfae7f6 are a mirror image of v1 and were supposed to be squashed together into a delta commit. the time gap between v1 and v2 was rather large, 10 days: CommitDate: Sun Sep 14 17:50:41 2008 +0200 CommitDate: Wed Sep 24 10:07:52 2008 +0200 because it took time to find and track down the lockup, and it also took time to come up with the fix. So i definitely wanted a delta fix (we handle another 150 append-only commits in a 10 day timespan so a delta fix is the right approach), but messed up its generation. Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [git pull] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 17:04 ` Ingo Molnar @ 2008-10-10 17:34 ` Linus Torvalds 2008-10-10 17:41 ` [git pull, take 2] " Ingo Molnar 1 sibling, 0 replies; 11+ messages in thread From: Linus Torvalds @ 2008-10-10 17:34 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin On Fri, 10 Oct 2008, Ingo Molnar wrote: > > the time gap between v1 and v2 was rather large, 10 days: > > CommitDate: Sun Sep 14 17:50:41 2008 +0200 > CommitDate: Wed Sep 24 10:07:52 2008 +0200 > > because it took time to find and track down the lockup, and it also took > time to come up with the fix. So i definitely wanted a delta fix (we > handle another 150 append-only commits in a 10 day timespan so a delta > fix is the right approach), but messed up its generation. So because you wanted a delta fix, and if you want to keep the old series, then yes, creating a new (local-to-your-repository-only, throw-away) branch with the new "v2" series of patches, and then just applying them on top of the old broken series as incremental fixes would probably be the right thing to do. With the reverts, there's not only all that useless noise in the logs, the problem is that all that extra crud is actually _hiding_ what the real fix was. So instead of seeing the fix, we just see a revert and a totally new version. That's pretty pessimal - the whole sequence isn't bisectable anyway. So what I suggest are good alternatives would be: - Make things bisectable by just using a totally new branch (and dropping the old branch). This just throws out the buggy version, since it never gets merged upstream, and as such nobody will ever really care what the bug was. This obviously depends on _others_ not having based their work on the broken crud. - Make nice incremental fix patches, rather than "revert and redo". This isn't bisectable, but on the other hand it can certainly be educational, and "git blame" etc will give reasonable results, and people seeing the history can at least see what the bug was. I tried to outline a couple of ways to have git help you do the second version in the previous email. It's still going to be somewhat manual, but especially if the difference between patch-series wasn't _that_ big, a simple sequence of git checkout broken git diff broken..fixed | git apply .. see how you want to commit the fix - maybe as one thing, maybe as a series of independent fixes depending on the size of it .. will get you there with fairly little pain. Of course, if the branch had other commits to it than just the broken series, you will need to cherry-pick those other commits to the "fixed" branch too, so that the difference between broken and fixed is literally just the difference between the two patch-series. Hmm? Linus ^ permalink raw reply [flat|nested] 11+ messages in thread
* [git pull, take 2] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 17:04 ` Ingo Molnar 2008-10-10 17:34 ` Linus Torvalds @ 2008-10-10 17:41 ` Ingo Molnar 2008-10-10 18:15 ` Linus Torvalds 2008-10-10 18:17 ` Ingo Molnar 1 sibling, 2 replies; 11+ messages in thread From: Ingo Molnar @ 2008-10-10 17:41 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin in other words, this linear commit range: abfae7f6: Revert "x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot" f279399: Revert "x86, cpa: remove USER permission from the very early identity mapping attribute" 153eb23: Revert "x86, cpa: make the kernel physical mapping initialization a two pass sequence" f5c3489: Revert "x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC" c613e65: Revert "x86, cpa: fix taking the pgd_lock with interrupts off" e64b1f1: Revert "x86, cpa: remove cpa pool code" 94448e3: Revert "x86, cpa: global flush tlb after splitting large page and before doing cpa" aca9b5f: Revert "x86: track memtype for RAM in page struct" 6375b67: Revert "x86: handle error returns in set_memory_*()" 6b5b551: x86: handle error returns in set_memory_*() 5f25f5b: x86: track memtype for RAM in page struct 3196625: x86, cpa: global flush tlb after splitting large page and before doing cpa 79abc89: x86, cpa: remove cpa pool code e96d59b: x86, cpa: fix taking the pgd_lock with interrupts off 888fdd9: x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC e579af6: x86, cpa: make the kernel physical mapping initialization a two pass sequence c86eefc: x86, cpa: remove USER permission from the very early identity mapping attribute e8729a5: x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot is a NOP, "git diff f81b691..abfae7f6" comes up empty. So it carries nothing other than the information that this series was broken. (and even that is not very apparent) So ... i've created a new branch, tip/x86/pat2 for the cleaned up history and we'll retire the contents of tip/x86/pat later on once we are sure it has no value. (other than for it to be a permanent embarrassment for me, which i'm totally fine with to destroy) i merged your latest tree to tip/x86/pat and compared that to pat2, and the difference is zero: earth4:~/tip> git diff x86/pat x86/pat2 earth4:~/tip> which is the expected result. i'll re-roll the seven x86-v28-for-linus-phase3...phase10 trees as well. Find below the updated pull request. Another question i was thinking about: the x86-v28-for-linus-phase2-B kind of branch names sound a bit artificial to pull. But i definitely needed them to be cleanly enumerated to be able to test through every step (we did serious boot testing of every phase) and to be able to keep track of everything. Ingo --------------> Linus, Please pull the latest x86-v28-for-linus-phase2-B git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git x86-v28-for-linus-phase2-B out-of-topic modifications in x86-v28-for-linus-phase2-B: --------------------------------------------------------- drivers/char/agp/agp.h # bd07928: agp: add agp_generic_destroy_page # 37acee1: agp: generic_alloc_pages() # 9326d61: Revert "reduce tlb/cache flush ti # 466ae83: reduce tlb/cache flush times of a drivers/char/agp/alpha-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/amd-k7-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/amd64-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/ati-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/efficeon-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/generic.c # bd07928: agp: add agp_generic_destroy_page # 37acee1: agp: generic_alloc_pages() # 9326d61: Revert "reduce tlb/cache flush ti # 466ae83: reduce tlb/cache flush times of a drivers/char/agp/hp-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/i460-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/intel-agp.c # bd07928: agp: add agp_generic_destroy_page # 37acee1: agp: generic_alloc_pages() drivers/char/agp/nvidia-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/parisc-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/sis-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/sworks-agp.c # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/uninorth-agp.c # c09ff7e: linux-next: fix x86 tree build fa # 5f310b6: agp: enable optimized agp_alloc_p drivers/char/agp/via-agp.c # 5f310b6: agp: enable optimized agp_alloc_p mm/highmem.c # 5843d9a: x86, pat: avoid highmem cache att Thanks, Ingo ------------------> Andreas Herrmann (1): x86: compile pat debugfs interface only if CONFIG_X86_PAT is set Ingo Molnar (3): Revert "reduce tlb/cache flush times of agpgart memory allocation" Revert "introduce two APIs for page attribute" x86, pat: cleanups Jeremy Fitzhardinge (1): x86: make sure the CPA test code's use of _PAGE_UNUSED1 is obvious Nick Piggin (1): x86, pat: avoid highmem cache attribute aliasing Rene Herman (3): x86: {reverve,free}_memtype() take a physical address x86: have set_memory_array_{uc,wb} coalesce memtypes. agp: enable optimized agp_alloc_pages methods Shaohua Li (6): introduce two APIs for page attribute reduce tlb/cache flush times of agpgart memory allocation x86, pageattr: introduce APIs to change pageattr of a page array agp: generic_alloc_pages() agp: add agp_generic_destroy_pages() x86: fix pageattr-test Stephen Rothwell (1): linux-next: fix x86 tree build failure Suresh Siddha (10): x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot x86, cpa: remove USER permission from the very early identity mapping attribute x86, cpa: make the kernel physical mapping initialization a two pass sequence x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC x86, cpa: no need to check alias for __set_pages_p/__set_pages_np x86, cpa: remove cpa pool code x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa x86: track memtype for RAM in page struct x86: fix pagetable init 64-bit breakage x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix Venki Pallipadi (1): x86: have set_memory_array_{uc,wb} coalesce memtypes, fix arch/x86/kernel/head_32.S | 34 ++-- arch/x86/kernel/head_64.S | 4 +- arch/x86/mm/init_32.c | 84 +++++++- arch/x86/mm/init_64.c | 110 +++++++--- arch/x86/mm/ioremap.c | 19 ++ arch/x86/mm/pageattr-test.c | 9 +- arch/x86/mm/pageattr.c | 459 +++++++++++++++++++++------------------ arch/x86/mm/pat.c | 132 +++++++++--- drivers/char/agp/agp.h | 5 + drivers/char/agp/alpha-agp.c | 2 + drivers/char/agp/amd-k7-agp.c | 2 + drivers/char/agp/amd64-agp.c | 2 + drivers/char/agp/ati-agp.c | 2 + drivers/char/agp/efficeon-agp.c | 2 + drivers/char/agp/generic.c | 97 ++++++++- drivers/char/agp/hp-agp.c | 2 + drivers/char/agp/i460-agp.c | 2 + drivers/char/agp/intel-agp.c | 28 +++ drivers/char/agp/nvidia-agp.c | 2 + drivers/char/agp/parisc-agp.c | 2 + drivers/char/agp/sis-agp.c | 2 + drivers/char/agp/sworks-agp.c | 2 + drivers/char/agp/uninorth-agp.c | 4 + drivers/char/agp/via-agp.c | 4 + include/asm-x86/cacheflush.h | 7 +- include/asm-x86/page.h | 1 + include/asm-x86/pgtable.h | 13 + mm/highmem.c | 5 +- 28 files changed, 730 insertions(+), 307 deletions(-) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index a7010c3..e835b4e 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -172,10 +172,6 @@ num_subarch_entries = (. - subarch_entries) / 4 * * Note that the stack is not yet set up! */ -#define PTE_ATTR 0x007 /* PRESENT+RW+USER */ -#define PDE_ATTR 0x067 /* PRESENT+RW+USER+DIRTY+ACCESSED */ -#define PGD_ATTR 0x001 /* PRESENT (no other attributes) */ - default_entry: #ifdef CONFIG_X86_PAE @@ -196,9 +192,9 @@ default_entry: movl $pa(pg0), %edi movl %edi, pa(init_pg_tables_start) movl $pa(swapper_pg_pmd), %edx - movl $PTE_ATTR, %eax + movl $PTE_IDENT_ATTR, %eax 10: - leal PDE_ATTR(%edi),%ecx /* Create PMD entry */ + leal PDE_IDENT_ATTR(%edi),%ecx /* Create PMD entry */ movl %ecx,(%edx) /* Store PMD entry */ /* Upper half already zero */ addl $8,%edx @@ -215,7 +211,7 @@ default_entry: * End condition: we must map up to and including INIT_MAP_BEYOND_END * bytes beyond the end of our own page tables. */ - leal (INIT_MAP_BEYOND_END+PTE_ATTR)(%edi),%ebp + leal (INIT_MAP_BEYOND_END+PTE_IDENT_ATTR)(%edi),%ebp cmpl %ebp,%eax jb 10b 1: @@ -224,7 +220,7 @@ default_entry: movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_ATTR,%eax + movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax movl %eax,pa(swapper_pg_pmd+0x1000*KPMDS-8) #else /* Not PAE */ @@ -233,9 +229,9 @@ page_pde_offset = (__PAGE_OFFSET >> 20); movl $pa(pg0), %edi movl %edi, pa(init_pg_tables_start) movl $pa(swapper_pg_dir), %edx - movl $PTE_ATTR, %eax + movl $PTE_IDENT_ATTR, %eax 10: - leal PDE_ATTR(%edi),%ecx /* Create PDE entry */ + leal PDE_IDENT_ATTR(%edi),%ecx /* Create PDE entry */ movl %ecx,(%edx) /* Store identity PDE entry */ movl %ecx,page_pde_offset(%edx) /* Store kernel PDE entry */ addl $4,%edx @@ -249,7 +245,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20); * bytes beyond the end of our own page tables; the +0x007 is * the attribute bits */ - leal (INIT_MAP_BEYOND_END+PTE_ATTR)(%edi),%ebp + leal (INIT_MAP_BEYOND_END+PTE_IDENT_ATTR)(%edi),%ebp cmpl %ebp,%eax jb 10b movl %edi,pa(init_pg_tables_end) @@ -257,7 +253,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20); movl %eax, pa(max_pfn_mapped) /* Do early initialization of the fixmap area */ - movl $pa(swapper_pg_fixmap)+PDE_ATTR,%eax + movl $pa(swapper_pg_fixmap)+PDE_IDENT_ATTR,%eax movl %eax,pa(swapper_pg_dir+0xffc) #endif jmp 3f @@ -634,19 +630,19 @@ ENTRY(empty_zero_page) /* Page-aligned for the benefit of paravirt? */ .align PAGE_SIZE_asm ENTRY(swapper_pg_dir) - .long pa(swapper_pg_pmd+PGD_ATTR),0 /* low identity map */ + .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 /* low identity map */ # if KPMDS == 3 - .long pa(swapper_pg_pmd+PGD_ATTR),0 - .long pa(swapper_pg_pmd+PGD_ATTR+0x1000),0 - .long pa(swapper_pg_pmd+PGD_ATTR+0x2000),0 + .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 + .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x2000),0 # elif KPMDS == 2 .long 0,0 - .long pa(swapper_pg_pmd+PGD_ATTR),0 - .long pa(swapper_pg_pmd+PGD_ATTR+0x1000),0 + .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 + .long pa(swapper_pg_pmd+PGD_IDENT_ATTR+0x1000),0 # elif KPMDS == 1 .long 0,0 .long 0,0 - .long pa(swapper_pg_pmd+PGD_ATTR),0 + .long pa(swapper_pg_pmd+PGD_IDENT_ATTR),0 # else # error "Kernel PMDs should be 1, 2 or 3" # endif diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index db3280a..26cfdc1 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -110,7 +110,7 @@ startup_64: movq %rdi, %rax shrq $PMD_SHIFT, %rax andq $(PTRS_PER_PMD - 1), %rax - leaq __PAGE_KERNEL_LARGE_EXEC(%rdi), %rdx + leaq __PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx leaq level2_spare_pgt(%rip), %rbx movq %rdx, 0(%rbx, %rax, 8) ident_complete: @@ -374,7 +374,7 @@ NEXT_PAGE(level2_ident_pgt) /* Since I easily can, map the first 1G. * Don't set NX because code runs from these pages. */ - PMDS(0, __PAGE_KERNEL_LARGE_EXEC, PTRS_PER_PMD) + PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD) NEXT_PAGE(level2_kernel_pgt) /* diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 6b9a935..c3789bb 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -195,11 +195,30 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base, pgd_t *pgd; pmd_t *pmd; pte_t *pte; - unsigned pages_2m = 0, pages_4k = 0; + unsigned pages_2m, pages_4k; + int mapping_iter; + + /* + * First iteration will setup identity mapping using large/small pages + * based on use_pse, with other attributes same as set by + * the early code in head_32.S + * + * Second iteration will setup the appropriate attributes (NX, GLOBAL..) + * as desired for the kernel identity mapping. + * + * This two pass mechanism conforms to the TLB app note which says: + * + * "Software should not write to a paging-structure entry in a way + * that would change, for any linear address, both the page size + * and either the page frame or attributes." + */ + mapping_iter = 1; if (!cpu_has_pse) use_pse = 0; +repeat: + pages_2m = pages_4k = 0; pfn = start_pfn; pgd_idx = pgd_index((pfn<<PAGE_SHIFT) + PAGE_OFFSET); pgd = pgd_base + pgd_idx; @@ -225,6 +244,13 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base, if (use_pse) { unsigned int addr2; pgprot_t prot = PAGE_KERNEL_LARGE; + /* + * first pass will use the same initial + * identity mapping attribute + _PAGE_PSE. + */ + pgprot_t init_prot = + __pgprot(PTE_IDENT_ATTR | + _PAGE_PSE); addr2 = (pfn + PTRS_PER_PTE-1) * PAGE_SIZE + PAGE_OFFSET + PAGE_SIZE-1; @@ -234,7 +260,10 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base, prot = PAGE_KERNEL_LARGE_EXEC; pages_2m++; - set_pmd(pmd, pfn_pmd(pfn, prot)); + if (mapping_iter == 1) + set_pmd(pmd, pfn_pmd(pfn, init_prot)); + else + set_pmd(pmd, pfn_pmd(pfn, prot)); pfn += PTRS_PER_PTE; continue; @@ -246,17 +275,43 @@ static void __init kernel_physical_mapping_init(pgd_t *pgd_base, for (; pte_ofs < PTRS_PER_PTE && pfn < end_pfn; pte++, pfn++, pte_ofs++, addr += PAGE_SIZE) { pgprot_t prot = PAGE_KERNEL; + /* + * first pass will use the same initial + * identity mapping attribute. + */ + pgprot_t init_prot = __pgprot(PTE_IDENT_ATTR); if (is_kernel_text(addr)) prot = PAGE_KERNEL_EXEC; pages_4k++; - set_pte(pte, pfn_pte(pfn, prot)); + if (mapping_iter == 1) + set_pte(pte, pfn_pte(pfn, init_prot)); + else + set_pte(pte, pfn_pte(pfn, prot)); } } } - update_page_count(PG_LEVEL_2M, pages_2m); - update_page_count(PG_LEVEL_4K, pages_4k); + if (mapping_iter == 1) { + /* + * update direct mapping page count only in the first + * iteration. + */ + update_page_count(PG_LEVEL_2M, pages_2m); + update_page_count(PG_LEVEL_4K, pages_4k); + + /* + * local global flush tlb, which will flush the previous + * mappings present in both small and large page TLB's. + */ + __flush_tlb_all(); + + /* + * Second iteration will set the actual desired PTE attributes. + */ + mapping_iter = 2; + goto repeat; + } } /* @@ -719,7 +774,7 @@ void __init setup_bootmem_allocator(void) after_init_bootmem = 1; } -static void __init find_early_table_space(unsigned long end) +static void __init find_early_table_space(unsigned long end, int use_pse) { unsigned long puds, pmds, ptes, tables, start; @@ -729,7 +784,7 @@ static void __init find_early_table_space(unsigned long end) pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT; tables += PAGE_ALIGN(pmds * sizeof(pmd_t)); - if (cpu_has_pse) { + if (use_pse) { unsigned long extra; extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT); @@ -769,12 +824,22 @@ unsigned long __init_refok init_memory_mapping(unsigned long start, pgd_t *pgd_base = swapper_pg_dir; unsigned long start_pfn, end_pfn; unsigned long big_page_start; +#ifdef CONFIG_DEBUG_PAGEALLOC + /* + * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. + * This will simplify cpa(), which otherwise needs to support splitting + * large pages into small in interrupt context, etc. + */ + int use_pse = 0; +#else + int use_pse = cpu_has_pse; +#endif /* * Find space for the kernel direct mapping tables. */ if (!after_init_bootmem) - find_early_table_space(end); + find_early_table_space(end, use_pse); #ifdef CONFIG_X86_PAE set_nx(); @@ -820,7 +885,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start, end_pfn = (end>>PMD_SHIFT) << (PMD_SHIFT - PAGE_SHIFT); if (start_pfn < end_pfn) kernel_physical_mapping_init(pgd_base, start_pfn, end_pfn, - cpu_has_pse); + use_pse); /* tail is not big page alignment ? */ start_pfn = end_pfn; @@ -983,7 +1048,6 @@ void __init mem_init(void) if (boot_cpu_data.wp_works_ok < 0) test_wp_bit(); - cpa_init(); save_pg_dir(); zap_low_mappings(); } diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 770536e..fb30486 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -271,7 +271,8 @@ static __ref void unmap_low_page(void *adr) } static unsigned long __meminit -phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end) +phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end, + pgprot_t prot) { unsigned pages = 0; unsigned long last_map_addr = end; @@ -289,36 +290,43 @@ phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end) break; } + /* + * We will re-use the existing mapping. + * Xen for example has some special requirements, like mapping + * pagetable pages as RO. So assume someone who pre-setup + * these mappings are more intelligent. + */ if (pte_val(*pte)) continue; if (0) printk(" pte=%p addr=%lx pte=%016lx\n", pte, addr, pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL).pte); - set_pte(pte, pfn_pte(addr >> PAGE_SHIFT, PAGE_KERNEL)); - last_map_addr = (addr & PAGE_MASK) + PAGE_SIZE; pages++; + set_pte(pte, pfn_pte(addr >> PAGE_SHIFT, prot)); + last_map_addr = (addr & PAGE_MASK) + PAGE_SIZE; } + update_page_count(PG_LEVEL_4K, pages); return last_map_addr; } static unsigned long __meminit -phys_pte_update(pmd_t *pmd, unsigned long address, unsigned long end) +phys_pte_update(pmd_t *pmd, unsigned long address, unsigned long end, + pgprot_t prot) { pte_t *pte = (pte_t *)pmd_page_vaddr(*pmd); - return phys_pte_init(pte, address, end); + return phys_pte_init(pte, address, end, prot); } static unsigned long __meminit phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end, - unsigned long page_size_mask) + unsigned long page_size_mask, pgprot_t prot) { unsigned long pages = 0; unsigned long last_map_addr = end; - unsigned long start = address; int i = pmd_index(address); @@ -326,6 +334,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end, unsigned long pte_phys; pmd_t *pmd = pmd_page + pmd_index(address); pte_t *pte; + pgprot_t new_prot = prot; if (address >= end) { if (!after_bootmem) { @@ -339,27 +348,40 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end, if (!pmd_large(*pmd)) { spin_lock(&init_mm.page_table_lock); last_map_addr = phys_pte_update(pmd, address, - end); + end, prot); spin_unlock(&init_mm.page_table_lock); + continue; } - /* Count entries we're using from level2_ident_pgt */ - if (start == 0) - pages++; - continue; + /* + * If we are ok with PG_LEVEL_2M mapping, then we will + * use the existing mapping, + * + * Otherwise, we will split the large page mapping but + * use the same existing protection bits except for + * large page, so that we don't violate Intel's TLB + * Application note (317080) which says, while changing + * the page sizes, new and old translations should + * not differ with respect to page frame and + * attributes. + */ + if (page_size_mask & (1 << PG_LEVEL_2M)) + continue; + new_prot = pte_pgprot(pte_clrhuge(*(pte_t *)pmd)); } if (page_size_mask & (1<<PG_LEVEL_2M)) { pages++; spin_lock(&init_mm.page_table_lock); set_pte((pte_t *)pmd, - pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL_LARGE)); + pfn_pte(address >> PAGE_SHIFT, + __pgprot(pgprot_val(prot) | _PAGE_PSE))); spin_unlock(&init_mm.page_table_lock); last_map_addr = (address & PMD_MASK) + PMD_SIZE; continue; } pte = alloc_low_page(&pte_phys); - last_map_addr = phys_pte_init(pte, address, end); + last_map_addr = phys_pte_init(pte, address, end, new_prot); unmap_low_page(pte); spin_lock(&init_mm.page_table_lock); @@ -372,12 +394,12 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end, static unsigned long __meminit phys_pmd_update(pud_t *pud, unsigned long address, unsigned long end, - unsigned long page_size_mask) + unsigned long page_size_mask, pgprot_t prot) { pmd_t *pmd = pmd_offset(pud, 0); unsigned long last_map_addr; - last_map_addr = phys_pmd_init(pmd, address, end, page_size_mask); + last_map_addr = phys_pmd_init(pmd, address, end, page_size_mask, prot); __flush_tlb_all(); return last_map_addr; } @@ -394,6 +416,7 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end, unsigned long pmd_phys; pud_t *pud = pud_page + pud_index(addr); pmd_t *pmd; + pgprot_t prot = PAGE_KERNEL; if (addr >= end) break; @@ -405,10 +428,26 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end, } if (pud_val(*pud)) { - if (!pud_large(*pud)) + if (!pud_large(*pud)) { last_map_addr = phys_pmd_update(pud, addr, end, - page_size_mask); - continue; + page_size_mask, prot); + continue; + } + /* + * If we are ok with PG_LEVEL_1G mapping, then we will + * use the existing mapping. + * + * Otherwise, we will split the gbpage mapping but use + * the same existing protection bits except for large + * page, so that we don't violate Intel's TLB + * Application note (317080) which says, while changing + * the page sizes, new and old translations should + * not differ with respect to page frame and + * attributes. + */ + if (page_size_mask & (1 << PG_LEVEL_1G)) + continue; + prot = pte_pgprot(pte_clrhuge(*(pte_t *)pud)); } if (page_size_mask & (1<<PG_LEVEL_1G)) { @@ -422,7 +461,8 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end, } pmd = alloc_low_page(&pmd_phys); - last_map_addr = phys_pmd_init(pmd, addr, end, page_size_mask); + last_map_addr = phys_pmd_init(pmd, addr, end, page_size_mask, + prot); unmap_low_page(pmd); spin_lock(&init_mm.page_table_lock); @@ -430,6 +470,7 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end, spin_unlock(&init_mm.page_table_lock); } __flush_tlb_all(); + update_page_count(PG_LEVEL_1G, pages); return last_map_addr; @@ -446,13 +487,14 @@ phys_pud_update(pgd_t *pgd, unsigned long addr, unsigned long end, return phys_pud_init(pud, addr, end, page_size_mask); } -static void __init find_early_table_space(unsigned long end) +static void __init find_early_table_space(unsigned long end, int use_pse, + int use_gbpages) { unsigned long puds, pmds, ptes, tables, start; puds = (end + PUD_SIZE - 1) >> PUD_SHIFT; tables = roundup(puds * sizeof(pud_t), PAGE_SIZE); - if (direct_gbpages) { + if (use_gbpages) { unsigned long extra; extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT); pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT; @@ -460,7 +502,7 @@ static void __init find_early_table_space(unsigned long end) pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT; tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE); - if (cpu_has_pse) { + if (use_pse) { unsigned long extra; extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT); ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT; @@ -528,6 +570,7 @@ static unsigned long __init kernel_physical_mapping_init(unsigned long start, pgd_populate(&init_mm, pgd, __va(pud_phys)); spin_unlock(&init_mm.page_table_lock); } + __flush_tlb_all(); return last_map_addr; } @@ -571,6 +614,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start, struct map_range mr[NR_RANGE_MR]; int nr_range, i; + int use_pse, use_gbpages; printk(KERN_INFO "init_memory_mapping\n"); @@ -584,9 +628,21 @@ unsigned long __init_refok init_memory_mapping(unsigned long start, if (!after_bootmem) init_gbpages(); - if (direct_gbpages) +#ifdef CONFIG_DEBUG_PAGEALLOC + /* + * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. + * This will simplify cpa(), which otherwise needs to support splitting + * large pages into small in interrupt context, etc. + */ + use_pse = use_gbpages = 0; +#else + use_pse = cpu_has_pse; + use_gbpages = direct_gbpages; +#endif + + if (use_gbpages) page_size_mask |= 1 << PG_LEVEL_1G; - if (cpu_has_pse) + if (use_pse) page_size_mask |= 1 << PG_LEVEL_2M; memset(mr, 0, sizeof(mr)); @@ -647,7 +703,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start, (mr[i].page_size_mask & (1<<PG_LEVEL_2M))?"2M":"4k")); if (!after_bootmem) - find_early_table_space(end); + find_early_table_space(end, use_pse, use_gbpages); for (i = 0; i < nr_range; i++) last_map_addr = kernel_physical_mapping_init( @@ -806,8 +862,6 @@ void __init mem_init(void) reservedpages << (PAGE_SHIFT-10), datasize >> 10, initsize >> 10); - - cpa_init(); } void free_init_pages(char *what, unsigned long begin, unsigned long end) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index cac6da5..6ab3196 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -83,6 +83,25 @@ int page_is_ram(unsigned long pagenr) return 0; } +int pagerange_is_ram(unsigned long start, unsigned long end) +{ + int ram_page = 0, not_rampage = 0; + unsigned long page_nr; + + for (page_nr = (start >> PAGE_SHIFT); page_nr < (end >> PAGE_SHIFT); + ++page_nr) { + if (page_is_ram(page_nr)) + ram_page = 1; + else + not_rampage = 1; + + if (ram_page == not_rampage) + return -1; + } + + return ram_page; +} + /* * Fix up the linear direct mapping of the kernel to avoid cache attribute * conflicts. diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c index d4aa503..e1d1069 100644 --- a/arch/x86/mm/pageattr-test.c +++ b/arch/x86/mm/pageattr-test.c @@ -32,7 +32,7 @@ enum { GPS = (1<<30) }; -#define PAGE_TESTBIT __pgprot(_PAGE_UNUSED1) +#define PAGE_CPA_TEST __pgprot(_PAGE_CPA_TEST) static int pte_testbit(pte_t pte) { @@ -118,6 +118,7 @@ static int pageattr_test(void) unsigned int level; int i, k; int err; + unsigned long test_addr; if (print) printk(KERN_INFO "CPA self-test:\n"); @@ -172,7 +173,8 @@ static int pageattr_test(void) continue; } - err = change_page_attr_set(addr[i], len[i], PAGE_TESTBIT); + test_addr = addr[i]; + err = change_page_attr_set(&test_addr, len[i], PAGE_CPA_TEST, 0); if (err < 0) { printk(KERN_ERR "CPA %d failed %d\n", i, err); failed++; @@ -204,7 +206,8 @@ static int pageattr_test(void) failed++; continue; } - err = change_page_attr_clear(addr[i], len[i], PAGE_TESTBIT); + test_addr = addr[i]; + err = change_page_attr_clear(&test_addr, len[i], PAGE_CPA_TEST, 0); if (err < 0) { printk(KERN_ERR "CPA reverting failed: %d\n", err); failed++; diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 898fad6..a9ec89c 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -25,15 +25,27 @@ * The current flushing context - we pass it instead of 5 arguments: */ struct cpa_data { - unsigned long vaddr; + unsigned long *vaddr; pgprot_t mask_set; pgprot_t mask_clr; int numpages; - int flushtlb; + int flags; unsigned long pfn; unsigned force_split : 1; + int curpage; }; +/* + * Serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity mappings) + * using cpa_lock. So that we don't allow any other cpu, with stale large tlb + * entries change the page attribute in parallel to some other cpu + * splitting a large page entry along with changing the attribute. + */ +static DEFINE_SPINLOCK(cpa_lock); + +#define CPA_FLUSHTLB 1 +#define CPA_ARRAY 2 + #ifdef CONFIG_PROC_FS static unsigned long direct_pages_count[PG_LEVEL_NUM]; @@ -190,6 +202,41 @@ static void cpa_flush_range(unsigned long start, int numpages, int cache) } } +static void cpa_flush_array(unsigned long *start, int numpages, int cache) +{ + unsigned int i, level; + unsigned long *addr; + + BUG_ON(irqs_disabled()); + + on_each_cpu(__cpa_flush_range, NULL, 1); + + if (!cache) + return; + + /* 4M threshold */ + if (numpages >= 1024) { + if (boot_cpu_data.x86_model >= 4) + wbinvd(); + return; + } + /* + * We only need to flush on one CPU, + * clflush is a MESI-coherent instruction that + * will cause all other CPUs to flush the same + * cachelines: + */ + for (i = 0, addr = start; i < numpages; i++, addr++) { + pte_t *pte = lookup_address(*addr, &level); + + /* + * Only flush present addresses: + */ + if (pte && (pte_val(*pte) & _PAGE_PRESENT)) + clflush_cache_range((void *) *addr, PAGE_SIZE); + } +} + /* * Certain areas of memory on x86 require very specific protection flags, * for example the BIOS area or kernel text. Callers don't always get this @@ -398,7 +445,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, */ new_pte = pfn_pte(pte_pfn(old_pte), canon_pgprot(new_prot)); __set_pmd_pte(kpte, address, new_pte); - cpa->flushtlb = 1; + cpa->flags |= CPA_FLUSHTLB; do_split = 0; } @@ -408,84 +455,6 @@ out_unlock: return do_split; } -static LIST_HEAD(page_pool); -static unsigned long pool_size, pool_pages, pool_low; -static unsigned long pool_used, pool_failed; - -static void cpa_fill_pool(struct page **ret) -{ - gfp_t gfp = GFP_KERNEL; - unsigned long flags; - struct page *p; - - /* - * Avoid recursion (on debug-pagealloc) and also signal - * our priority to get to these pagetables: - */ - if (current->flags & PF_MEMALLOC) - return; - current->flags |= PF_MEMALLOC; - - /* - * Allocate atomically from atomic contexts: - */ - if (in_atomic() || irqs_disabled() || debug_pagealloc) - gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN; - - while (pool_pages < pool_size || (ret && !*ret)) { - p = alloc_pages(gfp, 0); - if (!p) { - pool_failed++; - break; - } - /* - * If the call site needs a page right now, provide it: - */ - if (ret && !*ret) { - *ret = p; - continue; - } - spin_lock_irqsave(&pgd_lock, flags); - list_add(&p->lru, &page_pool); - pool_pages++; - spin_unlock_irqrestore(&pgd_lock, flags); - } - - current->flags &= ~PF_MEMALLOC; -} - -#define SHIFT_MB (20 - PAGE_SHIFT) -#define ROUND_MB_GB ((1 << 10) - 1) -#define SHIFT_MB_GB 10 -#define POOL_PAGES_PER_GB 16 - -void __init cpa_init(void) -{ - struct sysinfo si; - unsigned long gb; - - si_meminfo(&si); - /* - * Calculate the number of pool pages: - * - * Convert totalram (nr of pages) to MiB and round to the next - * GiB. Shift MiB to Gib and multiply the result by - * POOL_PAGES_PER_GB: - */ - if (debug_pagealloc) { - gb = ((si.totalram >> SHIFT_MB) + ROUND_MB_GB) >> SHIFT_MB_GB; - pool_size = POOL_PAGES_PER_GB * gb; - } else { - pool_size = 1; - } - pool_low = pool_size; - - cpa_fill_pool(NULL); - printk(KERN_DEBUG - "CPA: page pool initialized %lu of %lu pages preallocated\n", - pool_pages, pool_size); -} - static int split_large_page(pte_t *kpte, unsigned long address) { unsigned long flags, pfn, pfninc = 1; @@ -494,28 +463,15 @@ static int split_large_page(pte_t *kpte, unsigned long address) pgprot_t ref_prot; struct page *base; - /* - * Get a page from the pool. The pool list is protected by the - * pgd_lock, which we have to take anyway for the split - * operation: - */ - spin_lock_irqsave(&pgd_lock, flags); - if (list_empty(&page_pool)) { - spin_unlock_irqrestore(&pgd_lock, flags); - base = NULL; - cpa_fill_pool(&base); - if (!base) - return -ENOMEM; - spin_lock_irqsave(&pgd_lock, flags); - } else { - base = list_first_entry(&page_pool, struct page, lru); - list_del(&base->lru); - pool_pages--; - - if (pool_pages < pool_low) - pool_low = pool_pages; - } + if (!debug_pagealloc) + spin_unlock(&cpa_lock); + base = alloc_pages(GFP_KERNEL, 0); + if (!debug_pagealloc) + spin_lock(&cpa_lock); + if (!base) + return -ENOMEM; + spin_lock_irqsave(&pgd_lock, flags); /* * Check for races, another CPU might have split this page * up for us already: @@ -572,11 +528,8 @@ out_unlock: * If we dropped out via the lookup_address check under * pgd_lock then stick the page back into the pool: */ - if (base) { - list_add(&base->lru, &page_pool); - pool_pages++; - } else - pool_used++; + if (base) + __free_page(base); spin_unlock_irqrestore(&pgd_lock, flags); return 0; @@ -584,11 +537,16 @@ out_unlock: static int __change_page_attr(struct cpa_data *cpa, int primary) { - unsigned long address = cpa->vaddr; + unsigned long address; int do_split, err; unsigned int level; pte_t *kpte, old_pte; + if (cpa->flags & CPA_ARRAY) + address = cpa->vaddr[cpa->curpage]; + else + address = *cpa->vaddr; + repeat: kpte = lookup_address(address, &level); if (!kpte) @@ -600,7 +558,7 @@ repeat: return 0; WARN(1, KERN_WARNING "CPA: called for zero pte. " "vaddr = %lx cpa->vaddr = %lx\n", address, - cpa->vaddr); + *cpa->vaddr); return -EINVAL; } @@ -626,7 +584,7 @@ repeat: */ if (pte_val(old_pte) != pte_val(new_pte)) { set_pte_atomic(kpte, new_pte); - cpa->flushtlb = 1; + cpa->flags |= CPA_FLUSHTLB; } cpa->numpages = 1; return 0; @@ -650,7 +608,25 @@ repeat: */ err = split_large_page(kpte, address); if (!err) { - cpa->flushtlb = 1; + /* + * Do a global flush tlb after splitting the large page + * and before we do the actual change page attribute in the PTE. + * + * With out this, we violate the TLB application note, that says + * "The TLBs may contain both ordinary and large-page + * translations for a 4-KByte range of linear addresses. This + * may occur if software modifies the paging structures so that + * the page size used for the address range changes. If the two + * translations differ with respect to page frame or attributes + * (e.g., permissions), processor behavior is undefined and may + * be implementation-specific." + * + * We do this global tlb flush inside the cpa_lock, so that we + * don't allow any other cpu, with stale tlb entries change the + * page attribute in parallel, that also falls into the + * just split large page entry. + */ + flush_tlb_all(); goto repeat; } @@ -663,6 +639,7 @@ static int cpa_process_alias(struct cpa_data *cpa) { struct cpa_data alias_cpa; int ret = 0; + unsigned long temp_cpa_vaddr, vaddr; if (cpa->pfn >= max_pfn_mapped) return 0; @@ -675,16 +652,24 @@ static int cpa_process_alias(struct cpa_data *cpa) * No need to redo, when the primary call touched the direct * mapping already: */ - if (!(within(cpa->vaddr, PAGE_OFFSET, + if (cpa->flags & CPA_ARRAY) + vaddr = cpa->vaddr[cpa->curpage]; + else + vaddr = *cpa->vaddr; + + if (!(within(vaddr, PAGE_OFFSET, PAGE_OFFSET + (max_low_pfn_mapped << PAGE_SHIFT)) #ifdef CONFIG_X86_64 - || within(cpa->vaddr, PAGE_OFFSET + (1UL<<32), + || within(vaddr, PAGE_OFFSET + (1UL<<32), PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT)) #endif )) { alias_cpa = *cpa; - alias_cpa.vaddr = (unsigned long) __va(cpa->pfn << PAGE_SHIFT); + temp_cpa_vaddr = (unsigned long) __va(cpa->pfn << PAGE_SHIFT); + alias_cpa.vaddr = &temp_cpa_vaddr; + alias_cpa.flags &= ~CPA_ARRAY; + ret = __change_page_attr_set_clr(&alias_cpa, 0); } @@ -696,7 +681,7 @@ static int cpa_process_alias(struct cpa_data *cpa) * No need to redo, when the primary call touched the high * mapping already: */ - if (within(cpa->vaddr, (unsigned long) _text, (unsigned long) _end)) + if (within(vaddr, (unsigned long) _text, (unsigned long) _end)) return 0; /* @@ -707,8 +692,9 @@ static int cpa_process_alias(struct cpa_data *cpa) return 0; alias_cpa = *cpa; - alias_cpa.vaddr = - (cpa->pfn << PAGE_SHIFT) + __START_KERNEL_map - phys_base; + temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) + __START_KERNEL_map - phys_base; + alias_cpa.vaddr = &temp_cpa_vaddr; + alias_cpa.flags &= ~CPA_ARRAY; /* * The high mapping range is imprecise, so ignore the return value. @@ -728,8 +714,15 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias) * preservation check. */ cpa->numpages = numpages; + /* for array changes, we can't use large page */ + if (cpa->flags & CPA_ARRAY) + cpa->numpages = 1; + if (!debug_pagealloc) + spin_lock(&cpa_lock); ret = __change_page_attr(cpa, checkalias); + if (!debug_pagealloc) + spin_unlock(&cpa_lock); if (ret) return ret; @@ -746,7 +739,11 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias) */ BUG_ON(cpa->numpages > numpages); numpages -= cpa->numpages; - cpa->vaddr += cpa->numpages * PAGE_SIZE; + if (cpa->flags & CPA_ARRAY) + cpa->curpage++; + else + *cpa->vaddr += cpa->numpages * PAGE_SIZE; + } return 0; } @@ -757,9 +754,9 @@ static inline int cache_attr(pgprot_t attr) (_PAGE_PAT | _PAGE_PAT_LARGE | _PAGE_PWT | _PAGE_PCD); } -static int change_page_attr_set_clr(unsigned long addr, int numpages, +static int change_page_attr_set_clr(unsigned long *addr, int numpages, pgprot_t mask_set, pgprot_t mask_clr, - int force_split) + int force_split, int array) { struct cpa_data cpa; int ret, cache, checkalias; @@ -774,21 +771,38 @@ static int change_page_attr_set_clr(unsigned long addr, int numpages, return 0; /* Ensure we are PAGE_SIZE aligned */ - if (addr & ~PAGE_MASK) { - addr &= PAGE_MASK; - /* - * People should not be passing in unaligned addresses: - */ - WARN_ON_ONCE(1); + if (!array) { + if (*addr & ~PAGE_MASK) { + *addr &= PAGE_MASK; + /* + * People should not be passing in unaligned addresses: + */ + WARN_ON_ONCE(1); + } + } else { + int i; + for (i = 0; i < numpages; i++) { + if (addr[i] & ~PAGE_MASK) { + addr[i] &= PAGE_MASK; + WARN_ON_ONCE(1); + } + } } + /* Must avoid aliasing mappings in the highmem code */ + kmap_flush_unused(); + cpa.vaddr = addr; cpa.numpages = numpages; cpa.mask_set = mask_set; cpa.mask_clr = mask_clr; - cpa.flushtlb = 0; + cpa.flags = 0; + cpa.curpage = 0; cpa.force_split = force_split; + if (array) + cpa.flags |= CPA_ARRAY; + /* No alias checking for _NX bit modifications */ checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX; @@ -797,7 +811,7 @@ static int change_page_attr_set_clr(unsigned long addr, int numpages, /* * Check whether we really changed something: */ - if (!cpa.flushtlb) + if (!(cpa.flags & CPA_FLUSHTLB)) goto out; /* @@ -812,27 +826,30 @@ static int change_page_attr_set_clr(unsigned long addr, int numpages, * error case we fall back to cpa_flush_all (which uses * wbindv): */ - if (!ret && cpu_has_clflush) - cpa_flush_range(addr, numpages, cache); - else + if (!ret && cpu_has_clflush) { + if (cpa.flags & CPA_ARRAY) + cpa_flush_array(addr, numpages, cache); + else + cpa_flush_range(*addr, numpages, cache); + } else cpa_flush_all(cache); out: - cpa_fill_pool(NULL); - return ret; } -static inline int change_page_attr_set(unsigned long addr, int numpages, - pgprot_t mask) +static inline int change_page_attr_set(unsigned long *addr, int numpages, + pgprot_t mask, int array) { - return change_page_attr_set_clr(addr, numpages, mask, __pgprot(0), 0); + return change_page_attr_set_clr(addr, numpages, mask, __pgprot(0), 0, + array); } -static inline int change_page_attr_clear(unsigned long addr, int numpages, - pgprot_t mask) +static inline int change_page_attr_clear(unsigned long *addr, int numpages, + pgprot_t mask, int array) { - return change_page_attr_set_clr(addr, numpages, __pgprot(0), mask, 0); + return change_page_attr_set_clr(addr, numpages, __pgprot(0), mask, 0, + array); } int _set_memory_uc(unsigned long addr, int numpages) @@ -840,8 +857,8 @@ int _set_memory_uc(unsigned long addr, int numpages) /* * for now UC MINUS. see comments in ioremap_nocache() */ - return change_page_attr_set(addr, numpages, - __pgprot(_PAGE_CACHE_UC_MINUS)); + return change_page_attr_set(&addr, numpages, + __pgprot(_PAGE_CACHE_UC_MINUS), 0); } int set_memory_uc(unsigned long addr, int numpages) @@ -857,10 +874,48 @@ int set_memory_uc(unsigned long addr, int numpages) } EXPORT_SYMBOL(set_memory_uc); +int set_memory_array_uc(unsigned long *addr, int addrinarray) +{ + unsigned long start; + unsigned long end; + int i; + /* + * for now UC MINUS. see comments in ioremap_nocache() + */ + for (i = 0; i < addrinarray; i++) { + start = __pa(addr[i]); + for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) { + if (end != __pa(addr[i + 1])) + break; + i++; + } + if (reserve_memtype(start, end, _PAGE_CACHE_UC_MINUS, NULL)) + goto out; + } + + return change_page_attr_set(addr, addrinarray, + __pgprot(_PAGE_CACHE_UC_MINUS), 1); +out: + for (i = 0; i < addrinarray; i++) { + unsigned long tmp = __pa(addr[i]); + + if (tmp == start) + break; + for (end = tmp + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) { + if (end != __pa(addr[i + 1])) + break; + i++; + } + free_memtype(tmp, end); + } + return -EINVAL; +} +EXPORT_SYMBOL(set_memory_array_uc); + int _set_memory_wc(unsigned long addr, int numpages) { - return change_page_attr_set(addr, numpages, - __pgprot(_PAGE_CACHE_WC)); + return change_page_attr_set(&addr, numpages, + __pgprot(_PAGE_CACHE_WC), 0); } int set_memory_wc(unsigned long addr, int numpages) @@ -878,8 +933,8 @@ EXPORT_SYMBOL(set_memory_wc); int _set_memory_wb(unsigned long addr, int numpages) { - return change_page_attr_clear(addr, numpages, - __pgprot(_PAGE_CACHE_MASK)); + return change_page_attr_clear(&addr, numpages, + __pgprot(_PAGE_CACHE_MASK), 0); } int set_memory_wb(unsigned long addr, int numpages) @@ -890,39 +945,59 @@ int set_memory_wb(unsigned long addr, int numpages) } EXPORT_SYMBOL(set_memory_wb); +int set_memory_array_wb(unsigned long *addr, int addrinarray) +{ + int i; + + for (i = 0; i < addrinarray; i++) { + unsigned long start = __pa(addr[i]); + unsigned long end; + + for (end = start + PAGE_SIZE; i < addrinarray - 1; end += PAGE_SIZE) { + if (end != __pa(addr[i + 1])) + break; + i++; + } + free_memtype(start, end); + } + return change_page_attr_clear(addr, addrinarray, + __pgprot(_PAGE_CACHE_MASK), 1); +} +EXPORT_SYMBOL(set_memory_array_wb); + int set_memory_x(unsigned long addr, int numpages) { - return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_NX)); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0); } EXPORT_SYMBOL(set_memory_x); int set_memory_nx(unsigned long addr, int numpages) { - return change_page_attr_set(addr, numpages, __pgprot(_PAGE_NX)); + return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX), 0); } EXPORT_SYMBOL(set_memory_nx); int set_memory_ro(unsigned long addr, int numpages) { - return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_RW)); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0); } EXPORT_SYMBOL_GPL(set_memory_ro); int set_memory_rw(unsigned long addr, int numpages) { - return change_page_attr_set(addr, numpages, __pgprot(_PAGE_RW)); + return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0); } EXPORT_SYMBOL_GPL(set_memory_rw); int set_memory_np(unsigned long addr, int numpages) { - return change_page_attr_clear(addr, numpages, __pgprot(_PAGE_PRESENT)); + return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0); } int set_memory_4k(unsigned long addr, int numpages) { - return change_page_attr_set_clr(addr, numpages, __pgprot(0), - __pgprot(0), 1); + return change_page_attr_set_clr(&addr, numpages, __pgprot(0), + __pgprot(0), 1, 0); } int set_pages_uc(struct page *page, int numpages) @@ -975,22 +1050,38 @@ int set_pages_rw(struct page *page, int numpages) static int __set_pages_p(struct page *page, int numpages) { - struct cpa_data cpa = { .vaddr = (unsigned long) page_address(page), + unsigned long tempaddr = (unsigned long) page_address(page); + struct cpa_data cpa = { .vaddr = &tempaddr, .numpages = numpages, .mask_set = __pgprot(_PAGE_PRESENT | _PAGE_RW), - .mask_clr = __pgprot(0)}; + .mask_clr = __pgprot(0), + .flags = 0}; - return __change_page_attr_set_clr(&cpa, 1); + /* + * No alias checking needed for setting present flag. otherwise, + * we may need to break large pages for 64-bit kernel text + * mappings (this adds to complexity if we want to do this from + * atomic context especially). Let's keep it simple! + */ + return __change_page_attr_set_clr(&cpa, 0); } static int __set_pages_np(struct page *page, int numpages) { - struct cpa_data cpa = { .vaddr = (unsigned long) page_address(page), + unsigned long tempaddr = (unsigned long) page_address(page); + struct cpa_data cpa = { .vaddr = &tempaddr, .numpages = numpages, .mask_set = __pgprot(0), - .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW)}; + .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW), + .flags = 0}; - return __change_page_attr_set_clr(&cpa, 1); + /* + * No alias checking needed for setting not present flag. otherwise, + * we may need to break large pages for 64-bit kernel text + * mappings (this adds to complexity if we want to do this from + * atomic context especially). Let's keep it simple! + */ + return __change_page_attr_set_clr(&cpa, 0); } void kernel_map_pages(struct page *page, int numpages, int enable) @@ -1010,11 +1101,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable) /* * The return value is ignored as the calls cannot fail. - * Large pages are kept enabled at boot time, and are - * split up quickly with DEBUG_PAGEALLOC. If a splitup - * fails here (due to temporary memory shortage) no damage - * is done because we just keep the largepage intact up - * to the next attempt when it will likely be split up: + * Large pages for identity mappings are not used at boot time + * and hence no memory allocations during large page split. */ if (enable) __set_pages_p(page, numpages); @@ -1026,53 +1114,8 @@ void kernel_map_pages(struct page *page, int numpages, int enable) * but that can deadlock->flush only current cpu: */ __flush_tlb_all(); - - /* - * Try to refill the page pool here. We can do this only after - * the tlb flush. - */ - cpa_fill_pool(NULL); } -#ifdef CONFIG_DEBUG_FS -static int dpa_show(struct seq_file *m, void *v) -{ - seq_puts(m, "DEBUG_PAGEALLOC\n"); - seq_printf(m, "pool_size : %lu\n", pool_size); - seq_printf(m, "pool_pages : %lu\n", pool_pages); - seq_printf(m, "pool_low : %lu\n", pool_low); - seq_printf(m, "pool_used : %lu\n", pool_used); - seq_printf(m, "pool_failed : %lu\n", pool_failed); - - return 0; -} - -static int dpa_open(struct inode *inode, struct file *filp) -{ - return single_open(filp, dpa_show, NULL); -} - -static const struct file_operations dpa_fops = { - .open = dpa_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int __init debug_pagealloc_proc_init(void) -{ - struct dentry *de; - - de = debugfs_create_file("debug_pagealloc", 0600, NULL, NULL, - &dpa_fops); - if (!de) - return -ENOMEM; - - return 0; -} -__initcall(debug_pagealloc_proc_init); -#endif - #ifdef CONFIG_HIBERNATION bool kernel_page_present(struct page *page) diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index 2a50e0f..738fd0f 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -7,24 +7,24 @@ * Loosely based on earlier PAT patchset from Eric Biederman and Andi Kleen. */ -#include <linux/mm.h> +#include <linux/seq_file.h> +#include <linux/bootmem.h> +#include <linux/debugfs.h> #include <linux/kernel.h> #include <linux/gfp.h> +#include <linux/mm.h> #include <linux/fs.h> -#include <linux/bootmem.h> -#include <linux/debugfs.h> -#include <linux/seq_file.h> -#include <asm/msr.h> -#include <asm/tlbflush.h> +#include <asm/cacheflush.h> #include <asm/processor.h> -#include <asm/page.h> +#include <asm/tlbflush.h> #include <asm/pgtable.h> -#include <asm/pat.h> -#include <asm/e820.h> -#include <asm/cacheflush.h> #include <asm/fcntl.h> +#include <asm/e820.h> #include <asm/mtrr.h> +#include <asm/page.h> +#include <asm/msr.h> +#include <asm/pat.h> #include <asm/io.h> #ifdef CONFIG_X86_PAT @@ -46,6 +46,7 @@ early_param("nopat", nopat); static int debug_enable; + static int __init pat_debug_setup(char *str) { debug_enable = 1; @@ -145,14 +146,14 @@ static char *cattr_name(unsigned long flags) */ struct memtype { - u64 start; - u64 end; - unsigned long type; - struct list_head nd; + u64 start; + u64 end; + unsigned long type; + struct list_head nd; }; static LIST_HEAD(memtype_list); -static DEFINE_SPINLOCK(memtype_lock); /* protects memtype list */ +static DEFINE_SPINLOCK(memtype_lock); /* protects memtype list */ /* * Does intersection of PAT memory type and MTRR memory type and returns @@ -180,8 +181,8 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end, unsigned long req_type) return req_type; } -static int chk_conflict(struct memtype *new, struct memtype *entry, - unsigned long *type) +static int +chk_conflict(struct memtype *new, struct memtype *entry, unsigned long *type) { if (new->type != entry->type) { if (type) { @@ -211,6 +212,66 @@ static struct memtype *cached_entry; static u64 cached_start; /* + * For RAM pages, mark the pages as non WB memory type using + * PageNonWB (PG_arch_1). We allow only one set_memory_uc() or + * set_memory_wc() on a RAM page at a time before marking it as WB again. + * This is ok, because only one driver will be owning the page and + * doing set_memory_*() calls. + * + * For now, we use PageNonWB to track that the RAM page is being mapped + * as non WB. In future, we will have to use one more flag + * (or some other mechanism in page_struct) to distinguish between + * UC and WC mapping. + */ +static int reserve_ram_pages_type(u64 start, u64 end, unsigned long req_type, + unsigned long *new_type) +{ + struct page *page; + u64 pfn, end_pfn; + + for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) { + page = pfn_to_page(pfn); + if (page_mapped(page) || PageNonWB(page)) + goto out; + + SetPageNonWB(page); + } + return 0; + +out: + end_pfn = pfn; + for (pfn = (start >> PAGE_SHIFT); pfn < end_pfn; ++pfn) { + page = pfn_to_page(pfn); + ClearPageNonWB(page); + } + + return -EINVAL; +} + +static int free_ram_pages_type(u64 start, u64 end) +{ + struct page *page; + u64 pfn, end_pfn; + + for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) { + page = pfn_to_page(pfn); + if (page_mapped(page) || !PageNonWB(page)) + goto out; + + ClearPageNonWB(page); + } + return 0; + +out: + end_pfn = pfn; + for (pfn = (start >> PAGE_SHIFT); pfn < end_pfn; ++pfn) { + page = pfn_to_page(pfn); + SetPageNonWB(page); + } + return -EINVAL; +} + +/* * req_type typically has one of the: * - _PAGE_CACHE_WB * - _PAGE_CACHE_WC @@ -226,14 +287,15 @@ static u64 cached_start; * it will return a negative return value. */ int reserve_memtype(u64 start, u64 end, unsigned long req_type, - unsigned long *new_type) + unsigned long *new_type) { struct memtype *new, *entry; unsigned long actual_type; struct list_head *where; + int is_range_ram; int err = 0; - BUG_ON(start >= end); /* end is exclusive */ + BUG_ON(start >= end); /* end is exclusive */ if (!pat_enabled) { /* This is identical to page table setting without PAT */ @@ -266,17 +328,24 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type, actual_type = _PAGE_CACHE_WB; else actual_type = _PAGE_CACHE_UC_MINUS; - } else + } else { actual_type = pat_x_mtrr_type(start, end, req_type & _PAGE_CACHE_MASK); + } + + is_range_ram = pagerange_is_ram(start, end); + if (is_range_ram == 1) + return reserve_ram_pages_type(start, end, req_type, new_type); + else if (is_range_ram < 0) + return -EINVAL; new = kmalloc(sizeof(struct memtype), GFP_KERNEL); if (!new) return -ENOMEM; - new->start = start; - new->end = end; - new->type = actual_type; + new->start = start; + new->end = end; + new->type = actual_type; if (new_type) *new_type = actual_type; @@ -335,6 +404,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type, start, end, cattr_name(new->type), cattr_name(req_type)); kfree(new); spin_unlock(&memtype_lock); + return err; } @@ -358,6 +428,7 @@ int free_memtype(u64 start, u64 end) { struct memtype *entry; int err = -EINVAL; + int is_range_ram; if (!pat_enabled) return 0; @@ -366,6 +437,12 @@ int free_memtype(u64 start, u64 end) if (is_ISA_range(start, end - 1)) return 0; + is_range_ram = pagerange_is_ram(start, end); + if (is_range_ram == 1) + return free_ram_pages_type(start, end); + else if (is_range_ram < 0) + return -EINVAL; + spin_lock(&memtype_lock); list_for_each_entry(entry, &memtype_list, nd) { if (entry->start == start && entry->end == end) { @@ -386,6 +463,7 @@ int free_memtype(u64 start, u64 end) } dprintk("free_memtype request 0x%Lx-0x%Lx\n", start, end); + return err; } @@ -492,9 +570,9 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, void map_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot) { + unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK); u64 addr = (u64)pfn << PAGE_SHIFT; unsigned long flags; - unsigned long want_flags = (pgprot_val(vma_prot) & _PAGE_CACHE_MASK); reserve_memtype(addr, addr + size, want_flags, &flags); if (flags != want_flags) { @@ -514,7 +592,7 @@ void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot) free_memtype(addr, addr + size); } -#if defined(CONFIG_DEBUG_FS) +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT) /* get Nth element of the linked list */ static struct memtype *memtype_get_idx(loff_t pos) @@ -537,6 +615,7 @@ static struct memtype *memtype_get_idx(loff_t pos) } spin_unlock(&memtype_lock); kfree(print_entry); + return NULL; } @@ -567,6 +646,7 @@ static int memtype_seq_show(struct seq_file *seq, void *v) seq_printf(seq, "%s @ 0x%Lx-0x%Lx\n", cattr_name(print_entry->type), print_entry->start, print_entry->end); kfree(print_entry); + return 0; } @@ -598,4 +678,4 @@ static int __init pat_memtype_list_init(void) late_initcall(pat_memtype_list_init); -#endif /* CONFIG_DEBUG_FS */ +#endif /* CONFIG_DEBUG_FS && CONFIG_X86_PAT */ diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h index 4bada0e..46f5075 100644 --- a/drivers/char/agp/agp.h +++ b/drivers/char/agp/agp.h @@ -116,7 +116,9 @@ struct agp_bridge_driver { struct agp_memory *(*alloc_by_type) (size_t, int); void (*free_by_type)(struct agp_memory *); void *(*agp_alloc_page)(struct agp_bridge_data *); + int (*agp_alloc_pages)(struct agp_bridge_data *, struct agp_memory *, size_t); void (*agp_destroy_page)(void *, int flags); + void (*agp_destroy_pages)(struct agp_memory *); int (*agp_type_to_mask_type) (struct agp_bridge_data *, int); void (*chipset_flush)(struct agp_bridge_data *); }; @@ -277,7 +279,10 @@ int agp_generic_remove_memory(struct agp_memory *mem, off_t pg_start, int type); struct agp_memory *agp_generic_alloc_by_type(size_t page_count, int type); void agp_generic_free_by_type(struct agp_memory *curr); void *agp_generic_alloc_page(struct agp_bridge_data *bridge); +int agp_generic_alloc_pages(struct agp_bridge_data *agp_bridge, + struct agp_memory *memory, size_t page_count); void agp_generic_destroy_page(void *addr, int flags); +void agp_generic_destroy_pages(struct agp_memory *memory); void agp_free_key(int key); int agp_num_entries(void); u32 agp_collect_device_status(struct agp_bridge_data *bridge, u32 mode, u32 command); diff --git a/drivers/char/agp/alpha-agp.c b/drivers/char/agp/alpha-agp.c index 5da89f6..5ea4da8 100644 --- a/drivers/char/agp/alpha-agp.c +++ b/drivers/char/agp/alpha-agp.c @@ -143,7 +143,9 @@ struct agp_bridge_driver alpha_core_agp_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c index e280531..603a986 100644 --- a/drivers/char/agp/amd-k7-agp.c +++ b/drivers/char/agp/amd-k7-agp.c @@ -386,7 +386,9 @@ static const struct agp_bridge_driver amd_irongate_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c index 7495c52..2812ee2 100644 --- a/drivers/char/agp/amd64-agp.c +++ b/drivers/char/agp/amd64-agp.c @@ -224,7 +224,9 @@ static const struct agp_bridge_driver amd_8151_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/ati-agp.c b/drivers/char/agp/ati-agp.c index 6ecbcaf..ae2791b 100644 --- a/drivers/char/agp/ati-agp.c +++ b/drivers/char/agp/ati-agp.c @@ -418,7 +418,9 @@ static const struct agp_bridge_driver ati_generic_bridge = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/efficeon-agp.c b/drivers/char/agp/efficeon-agp.c index 8ca6f26..453543a 100644 --- a/drivers/char/agp/efficeon-agp.c +++ b/drivers/char/agp/efficeon-agp.c @@ -335,7 +335,9 @@ static const struct agp_bridge_driver efficeon_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c index 118dbde..10d6cbd 100644 --- a/drivers/char/agp/generic.c +++ b/drivers/char/agp/generic.c @@ -201,14 +201,22 @@ void agp_free_memory(struct agp_memory *curr) return; } if (curr->page_count != 0) { - for (i = 0; i < curr->page_count; i++) { - curr->memory[i] = (unsigned long)gart_to_virt(curr->memory[i]); - curr->bridge->driver->agp_destroy_page((void *)curr->memory[i], - AGP_PAGE_DESTROY_UNMAP); - } - for (i = 0; i < curr->page_count; i++) { - curr->bridge->driver->agp_destroy_page((void *)curr->memory[i], - AGP_PAGE_DESTROY_FREE); + if (curr->bridge->driver->agp_destroy_pages) { + curr->bridge->driver->agp_destroy_pages(curr); + } else { + + for (i = 0; i < curr->page_count; i++) { + curr->memory[i] = (unsigned long)gart_to_virt( + curr->memory[i]); + curr->bridge->driver->agp_destroy_page( + (void *)curr->memory[i], + AGP_PAGE_DESTROY_UNMAP); + } + for (i = 0; i < curr->page_count; i++) { + curr->bridge->driver->agp_destroy_page( + (void *)curr->memory[i], + AGP_PAGE_DESTROY_FREE); + } } } agp_free_key(curr->key); @@ -264,6 +272,15 @@ struct agp_memory *agp_allocate_memory(struct agp_bridge_data *bridge, if (new == NULL) return NULL; + if (bridge->driver->agp_alloc_pages) { + if (bridge->driver->agp_alloc_pages(bridge, new, page_count)) { + agp_free_memory(new); + return NULL; + } + new->bridge = bridge; + return new; + } + for (i = 0; i < page_count; i++) { void *addr = bridge->driver->agp_alloc_page(bridge); @@ -1203,6 +1220,39 @@ EXPORT_SYMBOL(agp_generic_alloc_user); * against a maximum value. */ +int agp_generic_alloc_pages(struct agp_bridge_data *bridge, struct agp_memory *mem, size_t num_pages) +{ + struct page * page; + int i, ret = -ENOMEM; + + for (i = 0; i < num_pages; i++) { + page = alloc_page(GFP_KERNEL | GFP_DMA32); + /* agp_free_memory() needs gart address */ + if (page == NULL) + goto out; + +#ifndef CONFIG_X86 + map_page_into_agp(page); +#endif + get_page(page); + atomic_inc(&agp_bridge->current_memory_agp); + + /* set_memory_array_uc() needs virtual address */ + mem->memory[i] = (unsigned long)page_address(page); + mem->page_count++; + } + +#ifdef CONFIG_X86 + set_memory_array_uc(mem->memory, num_pages); +#endif + ret = 0; +out: + for (i = 0; i < mem->page_count; i++) + mem->memory[i] = virt_to_gart((void *)mem->memory[i]); + return ret; +} +EXPORT_SYMBOL(agp_generic_alloc_pages); + void *agp_generic_alloc_page(struct agp_bridge_data *bridge) { struct page * page; @@ -1219,6 +1269,37 @@ void *agp_generic_alloc_page(struct agp_bridge_data *bridge) } EXPORT_SYMBOL(agp_generic_alloc_page); +void agp_generic_destroy_pages(struct agp_memory *mem) +{ + int i; + void *addr; + struct page *page; + + if (!mem) + return; + + for (i = 0; i < mem->page_count; i++) + mem->memory[i] = (unsigned long)gart_to_virt(mem->memory[i]); + +#ifdef CONFIG_X86 + set_memory_array_wb(mem->memory, mem->page_count); +#endif + + for (i = 0; i < mem->page_count; i++) { + addr = (void *)mem->memory[i]; + page = virt_to_page(addr); + +#ifndef CONFIG_X86 + unmap_page_from_agp(page); +#endif + + put_page(page); + free_page((unsigned long)addr); + atomic_dec(&agp_bridge->current_memory_agp); + mem->memory[i] = 0; + } +} +EXPORT_SYMBOL(agp_generic_destroy_pages); void agp_generic_destroy_page(void *addr, int flags) { diff --git a/drivers/char/agp/hp-agp.c b/drivers/char/agp/hp-agp.c index 80d7317..183ac3f 100644 --- a/drivers/char/agp/hp-agp.c +++ b/drivers/char/agp/hp-agp.c @@ -435,7 +435,9 @@ const struct agp_bridge_driver hp_zx1_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, .cant_use_aperture = true, }; diff --git a/drivers/char/agp/i460-agp.c b/drivers/char/agp/i460-agp.c index e587eeb..10da687 100644 --- a/drivers/char/agp/i460-agp.c +++ b/drivers/char/agp/i460-agp.c @@ -575,7 +575,9 @@ const struct agp_bridge_driver intel_i460_driver = { .insert_memory = i460_insert_memory_small_io_page, .remove_memory = i460_remove_memory_small_io_page, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, #endif .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c index 016fdf0..043e366 100644 --- a/drivers/char/agp/intel-agp.c +++ b/drivers/char/agp/intel-agp.c @@ -1711,7 +1711,9 @@ static const struct agp_bridge_driver intel_generic_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1736,7 +1738,9 @@ static const struct agp_bridge_driver intel_810_driver = { .alloc_by_type = intel_i810_alloc_by_type, .free_by_type = intel_i810_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1760,7 +1764,9 @@ static const struct agp_bridge_driver intel_815_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1785,7 +1791,9 @@ static const struct agp_bridge_driver intel_830_driver = { .alloc_by_type = intel_i830_alloc_by_type, .free_by_type = intel_i810_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = intel_i830_type_to_mask_type, .chipset_flush = intel_i830_chipset_flush, }; @@ -1810,7 +1818,9 @@ static const struct agp_bridge_driver intel_820_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1834,7 +1844,9 @@ static const struct agp_bridge_driver intel_830mp_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1858,7 +1870,9 @@ static const struct agp_bridge_driver intel_840_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1882,7 +1896,9 @@ static const struct agp_bridge_driver intel_845_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, .chipset_flush = intel_i830_chipset_flush, }; @@ -1907,7 +1923,9 @@ static const struct agp_bridge_driver intel_850_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1931,7 +1949,9 @@ static const struct agp_bridge_driver intel_860_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -1956,7 +1976,9 @@ static const struct agp_bridge_driver intel_915_driver = { .alloc_by_type = intel_i830_alloc_by_type, .free_by_type = intel_i810_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = intel_i830_type_to_mask_type, .chipset_flush = intel_i915_chipset_flush, }; @@ -1982,7 +2004,9 @@ static const struct agp_bridge_driver intel_i965_driver = { .alloc_by_type = intel_i830_alloc_by_type, .free_by_type = intel_i810_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = intel_i830_type_to_mask_type, .chipset_flush = intel_i915_chipset_flush, }; @@ -2007,7 +2031,9 @@ static const struct agp_bridge_driver intel_7505_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -2032,7 +2058,9 @@ static const struct agp_bridge_driver intel_g33_driver = { .alloc_by_type = intel_i830_alloc_by_type, .free_by_type = intel_i810_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = intel_i830_type_to_mask_type, .chipset_flush = intel_i915_chipset_flush, }; diff --git a/drivers/char/agp/nvidia-agp.c b/drivers/char/agp/nvidia-agp.c index eaceb61..dc70d37 100644 --- a/drivers/char/agp/nvidia-agp.c +++ b/drivers/char/agp/nvidia-agp.c @@ -312,7 +312,9 @@ static const struct agp_bridge_driver nvidia_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/parisc-agp.c b/drivers/char/agp/parisc-agp.c index 8c42dcc..f2492ec 100644 --- a/drivers/char/agp/parisc-agp.c +++ b/drivers/char/agp/parisc-agp.c @@ -224,7 +224,9 @@ static const struct agp_bridge_driver parisc_agp_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, .cant_use_aperture = true, }; diff --git a/drivers/char/agp/sis-agp.c b/drivers/char/agp/sis-agp.c index 2587ef9..6c3837a 100644 --- a/drivers/char/agp/sis-agp.c +++ b/drivers/char/agp/sis-agp.c @@ -140,7 +140,9 @@ static struct agp_bridge_driver sis_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/sworks-agp.c b/drivers/char/agp/sworks-agp.c index 2fb27fe..6224df8 100644 --- a/drivers/char/agp/sworks-agp.c +++ b/drivers/char/agp/sworks-agp.c @@ -437,7 +437,9 @@ static const struct agp_bridge_driver sworks_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/uninorth-agp.c b/drivers/char/agp/uninorth-agp.c index eef7270..0f004b6 100644 --- a/drivers/char/agp/uninorth-agp.c +++ b/drivers/char/agp/uninorth-agp.c @@ -509,7 +509,9 @@ const struct agp_bridge_driver uninorth_agp_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, .cant_use_aperture = true, }; @@ -534,7 +536,9 @@ const struct agp_bridge_driver u3_agp_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, .cant_use_aperture = true, .needs_scratch_page = true, diff --git a/drivers/char/agp/via-agp.c b/drivers/char/agp/via-agp.c index 7b36476..9f4d49e 100644 --- a/drivers/char/agp/via-agp.c +++ b/drivers/char/agp/via-agp.c @@ -190,7 +190,9 @@ static const struct agp_bridge_driver via_agp3_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; @@ -214,7 +216,9 @@ static const struct agp_bridge_driver via_driver = { .alloc_by_type = agp_generic_alloc_by_type, .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, + .agp_alloc_pages = agp_generic_alloc_pages, .agp_destroy_page = agp_generic_destroy_page, + .agp_destroy_pages = agp_generic_destroy_pages, .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/include/asm-x86/cacheflush.h b/include/asm-x86/cacheflush.h index 59859cb..68840ef 100644 --- a/include/asm-x86/cacheflush.h +++ b/include/asm-x86/cacheflush.h @@ -24,6 +24,8 @@ #define copy_from_user_page(vma, page, vaddr, dst, src, len) \ memcpy((dst), (src), (len)) +#define PG_non_WB PG_arch_1 +PAGEFLAG(NonWB, non_WB) /* * The set_memory_* API can be used to change various attributes of a virtual @@ -66,6 +68,9 @@ int set_memory_rw(unsigned long addr, int numpages); int set_memory_np(unsigned long addr, int numpages); int set_memory_4k(unsigned long addr, int numpages); +int set_memory_array_uc(unsigned long *addr, int addrinarray); +int set_memory_array_wb(unsigned long *addr, int addrinarray); + /* * For legacy compatibility with the old APIs, a few functions * are provided that work on a "struct page". @@ -96,8 +101,6 @@ int set_pages_rw(struct page *page, int numpages); void clflush_cache_range(void *addr, unsigned int size); -void cpa_init(void); - #ifdef CONFIG_DEBUG_RODATA void mark_rodata_ro(void); extern const int rodata_test_data; diff --git a/include/asm-x86/page.h b/include/asm-x86/page.h index 79544e6..c915747 100644 --- a/include/asm-x86/page.h +++ b/include/asm-x86/page.h @@ -57,6 +57,7 @@ typedef struct { pgdval_t pgd; } pgd_t; typedef struct { pgprotval_t pgprot; } pgprot_t; extern int page_is_ram(unsigned long pagenr); +extern int pagerange_is_ram(unsigned long start, unsigned long end); extern int devmem_is_allowed(unsigned long pagenr); extern void map_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot); diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h index 888add7..ed93245 100644 --- a/include/asm-x86/pgtable.h +++ b/include/asm-x86/pgtable.h @@ -19,6 +19,7 @@ #define _PAGE_BIT_UNUSED3 11 #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ #define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1 +#define _PAGE_BIT_CPA_TEST _PAGE_BIT_UNUSED1 #define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */ #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) @@ -36,6 +37,7 @@ #define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT) #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE) #define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL) +#define _PAGE_CPA_TEST (_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST) #define __HAVE_ARCH_PTE_SPECIAL #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) @@ -130,6 +132,17 @@ #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC +/* + * early identity mapping pte attrib macros. + */ +#ifdef CONFIG_X86_64 +#define __PAGE_KERNEL_IDENT_LARGE_EXEC __PAGE_KERNEL_LARGE_EXEC +#else +#define PTE_IDENT_ATTR 0x003 /* PRESENT+RW */ +#define PDE_IDENT_ATTR 0x063 /* PRESENT+RW+DIRTY+ACCESSED */ +#define PGD_IDENT_ATTR 0x001 /* PRESENT (no other attributes) */ +#endif + #ifndef __ASSEMBLY__ /* diff --git a/mm/highmem.c b/mm/highmem.c index e16e152..b36b83b 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -70,6 +70,7 @@ static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait); static void flush_all_zero_pkmaps(void) { int i; + int need_flush = 0; flush_cache_kmaps(); @@ -101,8 +102,10 @@ static void flush_all_zero_pkmaps(void) &pkmap_page_table[i]); set_page_address(page, NULL); + need_flush = 1; } - flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)); + if (need_flush) + flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP)); } /** ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [git pull, take 2] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 17:41 ` [git pull, take 2] " Ingo Molnar @ 2008-10-10 18:15 ` Linus Torvalds 2008-10-10 18:33 ` Ingo Molnar 2008-10-10 18:17 ` Ingo Molnar 1 sibling, 1 reply; 11+ messages in thread From: Linus Torvalds @ 2008-10-10 18:15 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin On Fri, 10 Oct 2008, Ingo Molnar wrote: > > i'll re-roll the seven x86-v28-for-linus-phase3...phase10 trees as well. Just checking: the scheduler/rcu trees you did pull requests for yesterday are all independent and I can pull those as-is, right? Linus ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [git pull, take 2] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 18:15 ` Linus Torvalds @ 2008-10-10 18:33 ` Ingo Molnar 0 siblings, 0 replies; 11+ messages in thread From: Ingo Molnar @ 2008-10-10 18:33 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin * Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, 10 Oct 2008, Ingo Molnar wrote: > > > > i'll re-roll the seven x86-v28-for-linus-phase3...phase10 trees as well. > > Just checking: the scheduler/rcu trees you did pull requests for > yesterday are all independent and I can pull those as-is, right? Correct, scheduler, RCU and fastboot are all independent. To make sure i just checked the merging aspects of all of them: sched-v28-for-linus: OK 2c10c22: Merge branch 'linus' into sched/devel 63e5c39: Merge branches 'sched/urgent' and 'sched/rt' into sched/devel 09b22a2: Merge commit 'v2.6.27-rc6' into sched/devel 7f79d85: Merge branch 'linus' into sched/devel 3cf430b: Merge branch 'linus' into sched/devel rcu-v28-for-linus: OK cdbb92b: Merge branch 'linus' into core/rcu b5259d9: Merge commit 'v2.6.27-rc8' into core/rcu 429b022: Merge commit 'v2.6.27-rc6' into core/rcu c4c0c56: Merge branch 'linus' into core/rcu fastboot-v28-for-linus: OK 1562542: Merge branch 'linus' into fastboot 3588ed2: Merge branch 'linus' into fastboot f793691: Merge commit 'v2.6.27-rc6' into fastboot b676303: Merge branch 'linus' into fastboot bf015f7: Merge branch 'linus' into fastboot the x86-v28-phase*-linus branches were all interdependent but are easy to re-propagate because all the x86 topics are -git based and while they merged the old x86/pat, they merged it before it grew that ugliness. (and the pat2 rebase did not touch the old commits that were fine) So it should be fine. I'm also doing rolling tests of the new branches. In general, this is where the "extreme topical" setup rocks IMO: had this mishap happened in a linear tree i'd have had to rebase about 300 followup commits to get rid of it - that would have been quite a mess to validate. But with the topical setup it was at the end of x86/pat and only a single followup commit had to be rebased. And bisectability bugs easily slip into extreme rebasing excercises: crossing merges cause conflicts and manual steps that are easy to mess up. I had to do such a rebase once a couple of months ago and it was very stressful and very hard to get it right. ... we are not totally immune to it though, because sometimes we still do cross-merges between topics when the conflicts just get too ugly. So this was pretty much close to a worst-case scenario: phase2 was unacceptable due to this really stupid v1->v2 series and i had to redo the whole followup integration - but still i did not have to do _one single_ manual step in the code space itself (only on branches), so while there are new-looking merge commits with conflicts, they are all cached git-rerere entries and thus the testing status should still all be valid. btw., that might be a Git feature request: it would be _really_ nice if instead of: Merge branch 'linus' into x86/pat2 Conflicts: arch/x86/mm/init_64.c git-rerere facilities created this default commit message automatically: Merge branch 'linus' into x86/pat2 Conflicts: arch/x86/mm/init_64.c, cached 14 days ago that way it's a lot easier to judge and trust the quality of recent merge commits. (and reduce/manage risks associated with merge mistakes) Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [git pull, take 2] x86 updates for v2.6.28, phase #2 - PAT updates 2008-10-10 17:41 ` [git pull, take 2] " Ingo Molnar 2008-10-10 18:15 ` Linus Torvalds @ 2008-10-10 18:17 ` Ingo Molnar 1 sibling, 0 replies; 11+ messages in thread From: Ingo Molnar @ 2008-10-10 18:17 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin * Ingo Molnar <mingo@elte.hu> wrote: > i merged your latest tree to tip/x86/pat and compared that to pat2, > and the difference is zero: > > earth4:~/tip> git diff x86/pat x86/pat2 > earth4:~/tip> > > which is the expected result. Note: one detail i just noticed, x86/pat grew a new commit today, to fix a bug that affected Xen: b27a43c: x86, cpa: make the kernel physical mapping initialization a two pass se this is in pat2 (and hence included in the pull request as well), but was not included in my first phase2 pull request ~17 hours ago. So if you diffed old phase2 against new phase2 that commit might stick out as a weird delta. Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2008-10-10 18:33 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-10-09 23:49 [git pull] x86 updates for v2.6.28, phase #2 - PAT updates Ingo Molnar 2008-10-10 16:27 ` Linus Torvalds 2008-10-10 16:45 ` Ingo Molnar 2008-10-10 16:56 ` Ingo Molnar 2008-10-10 17:23 ` Linus Torvalds 2008-10-10 17:04 ` Ingo Molnar 2008-10-10 17:34 ` Linus Torvalds 2008-10-10 17:41 ` [git pull, take 2] " Ingo Molnar 2008-10-10 18:15 ` Linus Torvalds 2008-10-10 18:33 ` Ingo Molnar 2008-10-10 18:17 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox