Re: another new version of pageattr caching conflict fix for 2.4

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-14 17:31                 ` Andrea Arcangeli
@ 2002-06-14 18:05                   ` Andi Kleen
  2002-06-16 10:08                     ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2002-06-14 18:05 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Benjamin LaHaise, linux-kernel, Richard Brunner,
	mark.langsdorf

On Fri, Jun 14, 2002 at 12:31:33PM -0500, Andrea Arcangeli wrote:
> On Fri, Jun 14, 2002 at 06:13:28PM +0200, Andi Kleen wrote:
> > +#ifdef CONFIG_HIGHMEM
> > +	/* Hopefully not be mapped anywhere else. */
> > +	if (page >= highmem_start_page) 
> > +		return 0;
> > +#endif
> 
> there's no hope here. If you don't want to code it right because nobody
> is exercising such path and flush both any per-cpu kmap-atomic slot and
> the page->virtual, please place a BUG() there or any more graceful
> failure notification.

Ok done. 

I also removed the DRM change because it was buggy. I'm not 100% yet
it is even a problem. Needs more investigation.

BTW there is another corner case that was overlooked:  
mmap of /dev/kmem, /proc/kcore, /dev/mem. To handle this correctly 
the mmap functions would need to always walk the kernel page table
and force the correct caching attribute in the user mapping.
But what to do when there is already an existing mapping to user space?


> > +int change_page_attr(struct page *page, int numpages, pgprot_t prot)
> > +{
> 
> this API not the best, again I would recommend something on these lines:

Hmm: i would really prefer to do the allocation in the caller.
If you want your API you could do  just do it as a simple wrapper to
the existing function (with the new numpages it would be even 
efficient) 

New version appended.
Also available at ftp://ftp.suse.com/pub/people/ak/v2.4/pageattr-2.4.19pre9-6

-Andi


diff -burpN -X ../KDIFX -x *-o -x *.[ch]-* linux-2.4.19pre9/arch/i386/mm/Makefile linux-2.4.19pre9-work/arch/i386/mm/Makefile
--- linux-2.4.19pre9/arch/i386/mm/Makefile	Fri Dec 29 23:07:20 2000
+++ linux-2.4.19pre9-work/arch/i386/mm/Makefile	Wed Jun  5 02:35:11 2002
@@ -9,6 +9,7 @@
 
 O_TARGET := mm.o
 
-obj-y	 := init.o fault.o ioremap.o extable.o
+obj-y	 := init.o fault.o ioremap.o extable.o pageattr.o
+export-objs := pageattr.o
 
 include $(TOPDIR)/Rules.make
diff -burpN -X ../KDIFX -x *-o -x *.[ch]-* linux-2.4.19pre9/arch/i386/mm/pageattr.c linux-2.4.19pre9-work/arch/i386/mm/pageattr.c
--- linux-2.4.19pre9/arch/i386/mm/pageattr.c	Thu Jan  1 01:00:00 1970
+++ linux-2.4.19pre9-work/arch/i386/mm/pageattr.c	Fri Jun 14 20:00:04 2002
@@ -0,0 +1,167 @@
+/* 
+ * Copyright 2002 Andi Kleen, SuSE Labs. 
+ * Thanks to Ben LaHaise for precious feedback.
+ */ 
+
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <asm/uaccess.h>
+#include <asm/processor.h>
+
+/* Should move most of this stuff into the appropiate includes */
+#define PAGE_LARGE (_PAGE_PSE|_PAGE_PRESENT) 
+#define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
+#define LARGE_PAGE_SIZE (1UL << PMD_SHIFT)
+
+static inline pte_t *lookup_address(unsigned long address) 
+{ 
+	pmd_t *pmd;	
+	pgd_t *pgd = pgd_offset(&init_mm, address); 
+
+	if ((pgd_val(*pgd) & PAGE_LARGE) == PAGE_LARGE)
+		return (pte_t *)pgd; 
+	pmd = pmd_offset(pgd, address); 	       
+	if ((pmd_val(*pmd) & PAGE_LARGE) == PAGE_LARGE)
+		return (pte_t *)pmd; 
+
+        return pte_offset(pmd, address);
+} 
+
+static struct page *split_large_page(unsigned long address, pgprot_t prot)
+{ 
+	int i; 
+	unsigned long addr;
+	struct page *base = alloc_pages(GFP_KERNEL, 0);
+	pte_t *pbase;
+	if (!base) 
+		return NULL;
+	address = __pa(address);
+	addr = address & LARGE_PAGE_MASK; 
+	pbase = (pte_t *)page_address(base);
+	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
+		pbase[i] = mk_pte_phys(addr, 
+				      addr == address ? prot : PAGE_KERNEL);
+	}
+	return base;
+} 
+
+static void flush_kernel_map(void * address) 
+{ 
+	/* Could use CLFLUSH here if the CPU supports it (Hammer,P4) */
+	if (boot_cpu_data.x86_model >= 4) 
+		asm volatile("wbinvd":::"memory"); 
+	__flush_tlb_all(); 	
+}
+
+static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte) 
+{ 
+	set_pte_atomic(kpte, pte); 	/* change init_mm */
+#ifndef CONFIG_X86_PAE
+	{
+		struct list_head *l;
+		spin_lock(&mmlist_lock);
+		list_for_each(l, &init_mm.mmlist) { 
+			struct mm_struct *mm = list_entry(l, struct mm_struct, mmlist);
+			pmd_t *pmd = pmd_offset(pgd_offset(mm, address), address);
+			set_pte_atomic((pte_t *)pmd, pte);
+		} 
+		spin_unlock(&mmlist_lock);
+	}
+#endif
+}
+
+/* no more special protections in this 2/4MB area - revert to a
+   large page again. */
+static inline void revert_page(struct page *kpte_page, unsigned long address)
+{
+	pte_t *linear = (pte_t *) 
+		pmd_offset(pgd_offset(&init_mm, address), address);
+	set_pmd_pte(linear,  address,
+		mk_pte_phys(__pa(address & LARGE_PAGE_MASK),
+			    __pgprot(_KERNPG_TABLE|_PAGE_PSE)));
+}	
+ 
+/*
+ * Change the page attributes of an page in the linear mapping.
+ *
+ * This should be used when a page is mapped with a different caching policy
+ * than write-back somewhere - some CPUs do not like it when mappings with
+ * different caching policies exist. This changes the page attributes of the
+ * in kernel linear mapping too.
+ * 
+ * The caller needs to ensure that there are no conflicting mappings elsewhere.
+ * This function only deals with the kernel linear map.
+ * When page is in highmem it must never be kmap'ed.
+ */
+static int 
+__change_page_attr(struct page *page, pgprot_t prot, struct page **oldpage) 
+{ 
+	pte_t *kpte; 
+	unsigned long address;
+	struct page *kpte_page;
+
+#ifdef CONFIG_HIGHMEM
+	if (page >= highmem_start_page) 
+		BUG(); 
+#endif
+	address = (unsigned long)page_address(page);
+
+	kpte = lookup_address(address);
+	kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
+	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) { 
+		if ((pte_val(*kpte) & _PAGE_PSE) == 0) { 
+			pte_t old = *kpte;
+			pte_t standard = mk_pte(page, PAGE_KERNEL); 
+
+			set_pte_atomic(kpte, mk_pte(page, prot)); 
+			if (pte_same(old,standard))
+				atomic_inc(&kpte_page->count);
+		} else {
+			struct page *split = split_large_page(address, prot); 
+			if (!split)
+				return -ENOMEM;
+			set_pmd_pte(kpte,address,mk_pte(split, PAGE_KERNEL));
+		}	
+	} else if ((pte_val(*kpte) & _PAGE_PSE) == 0) { 
+		set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
+		atomic_dec(&kpte_page->count); 
+	}
+
+	if (test_bit(X86_FEATURE_PSE, &boot_cpu_data.x86_capability) && 
+	    (atomic_read(&kpte_page->count) == 1)) { 
+		*oldpage = kpte_page;
+		revert_page(kpte_page, address);
+	} 
+	return 0;
+} 
+
+int change_page_attr(struct page *page, int numpages, pgprot_t prot)
+{
+	int err = 0; 
+	struct page *fpage; 
+	int i; 
+
+	down_write(&init_mm.mmap_sem);
+	for (i = 0; i < numpages; i++, page++) { 
+		fpage = NULL;
+		err = __change_page_attr(page, prot, &fpage); 
+		if (err) 
+			break; 
+		if (fpage || i == numpages-1) { 
+			void *address = page_address(page);
+#ifdef CONFIG_SMP 
+			smp_call_function(flush_kernel_map, address, 1, 1);
+#endif	
+			flush_kernel_map(address);
+			if (fpage)
+				__free_page(fpage);
+		} 
+	} 	
+	up_write(&init_mm.mmap_sem); 
+	return err;
+}
+
+EXPORT_SYMBOL(change_page_attr);
diff -burpN -X ../KDIFX -x *-o -x *.[ch]-* linux-2.4.19pre9/drivers/char/agp/agpgart_be.c linux-2.4.19pre9-work/drivers/char/agp/agpgart_be.c
--- linux-2.4.19pre9/drivers/char/agp/agpgart_be.c	Sun Jun  2 20:24:00 2002
+++ linux-2.4.19pre9-work/drivers/char/agp/agpgart_be.c	Fri Jun 14 17:37:46 2002
@@ -397,7 +397,7 @@ int agp_unbind_memory(agp_memory * curr)
 static void agp_generic_agp_enable(u32 mode)
 {
 	struct pci_dev *device = NULL;
-	u32 command, scratch, cap_id;
+	u32 command, scratch;
 	u8 cap_ptr;
 
 	pci_read_config_dword(agp_bridge.dev,
@@ -561,6 +561,7 @@ static int agp_generic_create_gatt_table
 					    agp_bridge.current_size;
 					break;
 				}
+				temp = agp_bridge.current_size;
 			} else {
 				agp_bridge.aperture_size_idx = i;
 			}
@@ -578,23 +579,21 @@ static int agp_generic_create_gatt_table
 	}
 	table_end = table + ((PAGE_SIZE * (1 << page_order)) - 1);
 
-	for (page = virt_to_page(table); page <= virt_to_page(table_end); page++)
+	page = virt_to_page(table);
+
+#ifdef __i386__
+	if (change_page_attr(page, 1<<page_order, PAGE_KERNEL_NOCACHE) < 0) 
+		return -ENOMEM;
+#endif
+	for (; page <= virt_to_page(table_end); page++) { 
 		SetPageReserved(page);
+	}
 
 	agp_bridge.gatt_table_real = (unsigned long *) table;
 	CACHE_FLUSH();
-	agp_bridge.gatt_table = ioremap_nocache(virt_to_phys(table),
-					(PAGE_SIZE * (1 << page_order)));
+	agp_bridge.gatt_table = agp_bridge.gatt_table_real;
 	CACHE_FLUSH();
 
-	if (agp_bridge.gatt_table == NULL) {
-		for (page = virt_to_page(table); page <= virt_to_page(table_end); page++)
-			ClearPageReserved(page);
-
-		free_pages((unsigned long) table, page_order);
-
-		return -ENOMEM;
-	}
 	agp_bridge.gatt_bus_addr = virt_to_phys(agp_bridge.gatt_table_real);
 
 	for (i = 0; i < num_entries; i++) {
@@ -651,7 +650,6 @@ static int agp_generic_free_gatt_table(v
 	 * from the table.
 	 */
 
-	iounmap(agp_bridge.gatt_table);
 	table = (char *) agp_bridge.gatt_table_real;
 	table_end = table + ((PAGE_SIZE * (1 << page_order)) - 1);
 
@@ -769,6 +767,13 @@ static unsigned long agp_generic_alloc_p
 	if (page == NULL) {
 		return 0;
 	}
+#ifdef __i386__
+	if (change_page_attr(page, 1, PAGE_KERNEL_NOCACHE) < 0) { 
+	        __free_page(page); 			   
+		return 0; 
+	}
+#endif
+
 	get_page(page);
 	LockPage(page);
 	atomic_inc(&agp_bridge.current_memory_agp);
@@ -785,6 +790,11 @@ static void agp_generic_destroy_page(uns
 	}
 	
 	page = virt_to_page(pt);
+#ifdef __i386__
+	change_page_attr(page, 1, PAGE_KERNEL);
+#endif
+	
+
 	put_page(page);
 	UnlockPage(page);
 	free_page((unsigned long) pt);
@@ -2241,17 +2251,15 @@ static int amd_create_page_map(amd_page_
 	if (page_map->real == NULL) {
 		return -ENOMEM;
 	}
-	SetPageReserved(virt_to_page(page_map->real));
-	CACHE_FLUSH();
-	page_map->remapped = ioremap_nocache(virt_to_phys(page_map->real), 
-					    PAGE_SIZE);
-	if (page_map->remapped == NULL) {
-		ClearPageReserved(virt_to_page(page_map->real));
-		free_page((unsigned long) page_map->real);
-		page_map->real = NULL;
+#ifdef __i386__
+	if (change_page_attr(virt_to_page(page_map->real), 
+			     1, PAGE_KERNEL_NOCACHE)) {
+		free_page(page_map->real); 
 		return -ENOMEM;
 	}
-	CACHE_FLUSH();
+#endif
+	SetPageReserved(virt_to_page(page_map->real));
+	page_map->remapped = page_map->real; 
 
 	for(i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) {
 		page_map->remapped[i] = agp_bridge.scratch_page;
@@ -2262,7 +2270,6 @@ static int amd_create_page_map(amd_page_
 
 static void amd_free_page_map(amd_page_map *page_map)
 {
-	iounmap(page_map->remapped);
 	ClearPageReserved(virt_to_page(page_map->real));
 	free_page((unsigned long) page_map->real);
 }
@@ -2744,27 +2751,22 @@ static void ali_cache_flush(void)
 
 static unsigned long ali_alloc_page(void)
 {
-	struct page *page;
-	u32 temp;
-
-	page = alloc_page(GFP_KERNEL);
-	if (page == NULL)
+	unsigned long p = agp_generic_alloc_page(); 
+	if (!p) 
 		return 0;
 
-	get_page(page);
-	LockPage(page);
-	atomic_inc(&agp_bridge.current_memory_agp);
-
+	/* probably not needed anymore */
 	global_cache_flush();
 
 	if (agp_bridge.type == ALI_M1541) {
+		u32 temp;
 		pci_read_config_dword(agp_bridge.dev, ALI_CACHE_FLUSH_CTRL, &temp);
 		pci_write_config_dword(agp_bridge.dev, ALI_CACHE_FLUSH_CTRL,
 				(((temp & ALI_CACHE_FLUSH_ADDR_MASK) |
-				  virt_to_phys(page_address(page))) |
+				  virt_to_phys(p)) |
 				    ALI_CACHE_FLUSH_EN ));
 	}
-	return (unsigned long)page_address(page);
+	return p;
 }
 
 static void ali_destroy_page(unsigned long addr)
@@ -2786,11 +2788,7 @@ static void ali_destroy_page(unsigned lo
 				    ALI_CACHE_FLUSH_EN));
 	}
 
-	page = virt_to_page(pt);
-	put_page(page);
-	UnlockPage(page);
-	free_page((unsigned long) pt);
-	atomic_dec(&agp_bridge.current_memory_agp);
+	agp_generic_destroy_page(pt);
 }
 
 /* Setup function */
@@ -2870,17 +2868,15 @@ static int serverworks_create_page_map(s
 	if (page_map->real == NULL) {
 		return -ENOMEM;
 	}
-	SetPageReserved(virt_to_page(page_map->real));
-	CACHE_FLUSH();
-	page_map->remapped = ioremap_nocache(virt_to_phys(page_map->real), 
-					    PAGE_SIZE);
-	if (page_map->remapped == NULL) {
-		ClearPageReserved(virt_to_page(page_map->real));
-		free_page((unsigned long) page_map->real);
-		page_map->real = NULL;
+#ifdef __i386__
+	if (change_page_attr(virt_to_page(page_map->real), 1, 
+			     PAGE_KERNEL_NOCACHE) < 0) {
+		free_page(page_map->real);
 		return -ENOMEM;
 	}
-	CACHE_FLUSH();
+#endif
+	SetPageReserved(virt_to_page(page_map->real));
+	page_map->remapped = page_map->real;
 
 	for(i = 0; i < PAGE_SIZE / sizeof(unsigned long); i++) {
 		page_map->remapped[i] = agp_bridge.scratch_page;
@@ -2891,7 +2887,6 @@ static int serverworks_create_page_map(s
 
 static void serverworks_free_page_map(serverworks_page_map *page_map)
 {
-	iounmap(page_map->remapped);
 	ClearPageReserved(virt_to_page(page_map->real));
 	free_page((unsigned long) page_map->real);
 }
diff -burpN -X ../KDIFX -x *-o -x *.[ch]-* linux-2.4.19pre9/include/asm-i386/pgtable-2level.h linux-2.4.19pre9-work/include/asm-i386/pgtable-2level.h
--- linux-2.4.19pre9/include/asm-i386/pgtable-2level.h	Thu Jul 26 22:40:32 2001
+++ linux-2.4.19pre9-work/include/asm-i386/pgtable-2level.h	Thu Jun 13 16:52:53 2002
@@ -40,6 +40,8 @@ static inline int pgd_present(pgd_t pgd)
  * hook is made available.
  */
 #define set_pte(pteptr, pteval) (*(pteptr) = pteval)
+#define set_pte_atomic(pteptr, pteval) (*(pteptr) = pteval)
+
 /*
  * (pmds are folded into pgds so this doesnt get actually called,
  * but the define is needed for a generic inline function.)
diff -burpN -X ../KDIFX -x *-o -x *.[ch]-* linux-2.4.19pre9/include/asm-i386/pgtable-3level.h linux-2.4.19pre9-work/include/asm-i386/pgtable-3level.h
--- linux-2.4.19pre9/include/asm-i386/pgtable-3level.h	Thu Jul 26 22:40:32 2001
+++ linux-2.4.19pre9-work/include/asm-i386/pgtable-3level.h	Thu Jun 13 17:15:14 2002
@@ -53,6 +53,9 @@ static inline void set_pte(pte_t *ptep, 
 		set_64bit((unsigned long long *)(pmdptr),pmd_val(pmdval))
 #define set_pgd(pgdptr,pgdval) \
 		set_64bit((unsigned long long *)(pgdptr),pgd_val(pgdval))
+#define set_pte_atomic(pteptr,pteval) \
+		set_64bit((unsigned long long *)(pteptr),pte_val(pteval))
+
 
 /*
  * Pentium-II erratum A13: in PAE mode we explicitly have to flush
diff -burpN -X ../KDIFX -x *-o -x *.[ch]-* linux-2.4.19pre9/include/asm-i386/pgtable.h linux-2.4.19pre9-work/include/asm-i386/pgtable.h
--- linux-2.4.19pre9/include/asm-i386/pgtable.h	Mon Jun  3 21:15:18 2002
+++ linux-2.4.19pre9-work/include/asm-i386/pgtable.h	Fri Jun 14 17:45:29 2002
@@ -347,6 +347,9 @@ static inline pte_t pte_modify(pte_t pte
 #define pte_to_swp_entry(pte)		((swp_entry_t) { (pte).pte_low })
 #define swp_entry_to_pte(x)		((pte_t) { (x).val })
 
+struct page;
+int change_page_attr(struct page *, int, pgprot_t prot);
+
 #endif /* !__ASSEMBLY__ */
 
 /* Needs to be defined here and not in linux/mm.h, as it is arch dependent */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-14 18:05                   ` another new " Andi Kleen
@ 2002-06-16 10:08                     ` Eric W. Biederman
  2002-06-16 16:48                       ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2002-06-16 10:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrea Arcangeli, Benjamin LaHaise, linux-kernel, Richard Brunner,
	mark.langsdorf

Andi Kleen <ak@suse.de> writes:

> On Fri, Jun 14, 2002 at 12:31:33PM -0500, Andrea Arcangeli wrote:
> > On Fri, Jun 14, 2002 at 06:13:28PM +0200, Andi Kleen wrote:
> > > +#ifdef CONFIG_HIGHMEM
> > > +	/* Hopefully not be mapped anywhere else. */
> > > +	if (page >= highmem_start_page) 
> > > +		return 0;
> > > +#endif
> > 
> > there's no hope here. If you don't want to code it right because nobody
> > is exercising such path and flush both any per-cpu kmap-atomic slot and
> > the page->virtual, please place a BUG() there or any more graceful
> > failure notification.
> 
> Ok done. 
> 
> I also removed the DRM change because it was buggy. I'm not 100% yet
> it is even a problem. Needs more investigation.
> 
> BTW there is another corner case that was overlooked:  
> mmap of /dev/kmem, /proc/kcore, /dev/mem. To handle this correctly 
> the mmap functions would need to always walk the kernel page table
> and force the correct caching attribute in the user mapping.
> But what to do when there is already an existing mapping to user space?

Don't allow the change_page_attr if page->count > 1 is an easy solution,
and it probably suffices.  Beyond that anything mmaped can be found
by walking into the backing address space, and then through the
vm_area_structs found with i_mmap && i_mmap_shared.  Of course the
vm_area_structs may possibly need to break because of the multiple
page protections.

> > > +int change_page_attr(struct page *page, int numpages, pgprot_t prot)
> > > +{
> > 
> > this API not the best, again I would recommend something on these lines:
> 
> Hmm: i would really prefer to do the allocation in the caller.
> If you want your API you could do  just do it as a simple wrapper to
> the existing function (with the new numpages it would be even 
> efficient) 

Using pgprot_t to convey the cacheablity of a page appears to be an
abuse.  At the very least we need a PAGE_CACHE_MASK, to find just
the cacheability attributes.

And we should really consider using the other cacheability page
attributes on x86, not just cache enable/disable.  Using just mtrr's
is limiting in that you don't always have enough of them, and that
sometimes valid ram is mapped uncacheable, because of the mtrr
alignment restrictions. 

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 10:08                     ` Eric W. Biederman
@ 2002-06-16 16:48                       ` Andi Kleen
  2002-06-16 17:50                         ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2002-06-16 16:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andi Kleen, Andrea Arcangeli, Benjamin LaHaise, linux-kernel,
	Richard Brunner, mark.langsdorf

On Sun, Jun 16, 2002 at 04:08:49AM -0600, Eric W. Biederman wrote:
> Andi Kleen <ak@suse.de> writes:
> 
> > On Fri, Jun 14, 2002 at 12:31:33PM -0500, Andrea Arcangeli wrote:
> > > On Fri, Jun 14, 2002 at 06:13:28PM +0200, Andi Kleen wrote:
> > > > +#ifdef CONFIG_HIGHMEM
> > > > +	/* Hopefully not be mapped anywhere else. */
> > > > +	if (page >= highmem_start_page) 
> > > > +		return 0;
> > > > +#endif
> > > 
> > > there's no hope here. If you don't want to code it right because nobody
> > > is exercising such path and flush both any per-cpu kmap-atomic slot and
> > > the page->virtual, please place a BUG() there or any more graceful
> > > failure notification.
> > 
> > Ok done. 
> > 
> > I also removed the DRM change because it was buggy. I'm not 100% yet
> > it is even a problem. Needs more investigation.
> > 
> > BTW there is another corner case that was overlooked:  
> > mmap of /dev/kmem, /proc/kcore, /dev/mem. To handle this correctly 
> > the mmap functions would need to always walk the kernel page table
> > and force the correct caching attribute in the user mapping.
> > But what to do when there is already an existing mapping to user space?
> 
> Don't allow the change_page_attr if page->count > 1 is an easy solution,
> and it probably suffices.  Beyond that anything mmaped can be found

Erm, what should it do then? Fail the AGP map ?

> by walking into the backing address space, and then through the
> vm_area_structs found with i_mmap && i_mmap_shared.  Of course the
> vm_area_structs may possibly need to break because of the multiple
> page protections.

I know that, but it seems rather racy expensive and complicated.

> 
> > > > +int change_page_attr(struct page *page, int numpages, pgprot_t prot)
> > > > +{
> > > 
> > > this API not the best, again I would recommend something on these lines:
> > 
> > Hmm: i would really prefer to do the allocation in the caller.
> > If you want your API you could do  just do it as a simple wrapper to
> > the existing function (with the new numpages it would be even 
> > efficient) 
> 
> Using pgprot_t to convey the cacheablity of a page appears to be an
> abuse.  At the very least we need a PAGE_CACHE_MASK, to find just
> the cacheability attributes.

change_page_attr is more than just cachability. You can use it e.g.
to write protect kernel pages not (if you wanted to do that for some
reason) 

The current one is fine with PAGE_KERNEL_NOCACHE, you could eventually
define PAGE_KERNEL_WRITECOMBINING too if you wanted.


> And we should really consider using the other cacheability page
> attributes on x86, not just cache enable/disable.  Using just mtrr's
> is limiting in that you don't always have enough of them, and that
> sometimes valid ram is mapped uncacheable, because of the mtrr
> alignment restrictions. 

Exposing it to user space in a more general way is tricky because
you need to avoid aliases with different caching attributes and also
change the kernel map as needed (would be surely an interesting 
project, but definitely requires more work) 

You can already use WT. DRM does that already in fact. 
Newer Intel/AMD CPUs allow to set up a more general PAT table with some
more modis, but to be honest I don't see the point in most of them,
except perhaps WC. Unfortunately there is no 'weak ordering like alpha/sparc64'
mode that could be used in the kernel :-)

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 16:48                       ` Andi Kleen
@ 2002-06-16 17:50                         ` Eric W. Biederman
  2002-06-16 18:43                           ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2002-06-16 17:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrea Arcangeli, Benjamin LaHaise, linux-kernel, Richard Brunner,
	mark.langsdorf

Andi Kleen <ak@suse.de> writes:

> On Sun, Jun 16, 2002 at 04:08:49AM -0600, Eric W. Biederman wrote:
> > 
> > Don't allow the change_page_attr if page->count > 1 is an easy solution,
> > and it probably suffices.  Beyond that anything mmaped can be found
> 
> Erm, what should it do then? Fail the AGP map ?

Why not.  If user-space has already mapped the memory one way, turning
around and using it another way is a problem.  If the memory is
dynamically allocated for AGP I don't even see how this case
could occur.

But the mmap case still has to apply the general cache ability of the
page from the kernel page tables (or where ever it is cached), if you
do the operation in the opposite order.

> > by walking into the backing address space, and then through the
> > vm_area_structs found with i_mmap && i_mmap_shared.  Of course the
> > vm_area_structs may possibly need to break because of the multiple
> > page protections.
> 
> I know that, but it seems rather racy expensive and complicated.
> 
> > 
> > > > > +int change_page_attr(struct page *page, int numpages, pgprot_t prot)
> > > > > +{
> > > > 
> > > > this API not the best, again I would recommend something on these lines:
> > > 
> > > Hmm: i would really prefer to do the allocation in the caller.
> > > If you want your API you could do  just do it as a simple wrapper to
> > > the existing function (with the new numpages it would be even 
> > > efficient) 
> > 
> > Using pgprot_t to convey the cacheablity of a page appears to be an
> > abuse.  At the very least we need a PAGE_CACHE_MASK, to find just
> > the cacheability attributes.
> 
> change_page_attr is more than just cachability. You can use it e.g.
> to write protect kernel pages not (if you wanted to do that for some
> reason) 
> The current one is fine with PAGE_KERNEL_NOCACHE, you could eventually
> define PAGE_KERNEL_WRITECOMBINING too if you wanted.
> 

Combining different operations like modifying cacheability and the
page protections worries me.  Unless we name the operations something
like change_kernel_page_attr() in which case we are only worrying
about a subset of the problem.  Which it appears that we are.

> > And we should really consider using the other cacheability page
> > attributes on x86, not just cache enable/disable.  Using just mtrr's
> > is limiting in that you don't always have enough of them, and that
> > sometimes valid ram is mapped uncacheable, because of the mtrr
> > alignment restrictions. 
> 
> Exposing it to user space in a more general way is tricky because
> you need to avoid aliases with different caching attributes and also
> change the kernel map as needed (would be surely an interesting 
> project, but definitely requires more work) 

The kernel already exposes through /proc/mtrr the ability to
arbitrarily change the caching of pages.  And since this is a case
of physical aliasing we need to make certain the mtrr's don't
conflict.  So much of this is already exposed to user space already.

> You can already use WT. DRM does that already in fact. 
> Newer Intel/AMD CPUs allow to set up a more general PAT table with some
> more modis, but to be honest I don't see the point in most of them,
> except perhaps WC. Unfortunately there is no 'weak ordering like alpha/sparc64'
> mode that could be used in the kernel :-)

With the PAT table only write-back, write-combining, uncached interest
me.  Given the number of BIOS's that don't set all of RAM to
write-back and the major performance penalty of running on uncached
RAM having the kernel fix it, would reduce a lot of headaches long
term. 

Primarily I'm brining up the connections because it may be worth
considering them.  For 2.4.19 unless the implementation is trivial it
probably isn't wise to try for more that what is needed.  For later
kernels 2.4.x, 2.4.20+ it is almost certainly worth doing something.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 17:50                         ` Eric W. Biederman
@ 2002-06-16 18:43                           ` Andi Kleen
  2002-06-16 19:56                             ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2002-06-16 18:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andi Kleen, Andrea Arcangeli, Benjamin LaHaise, linux-kernel,
	Richard Brunner, mark.langsdorf

On Sun, Jun 16, 2002 at 11:50:51AM -0600, Eric W. Biederman wrote:
> Andi Kleen <ak@suse.de> writes:
> 
> > On Sun, Jun 16, 2002 at 04:08:49AM -0600, Eric W. Biederman wrote:
> > > 
> > > Don't allow the change_page_attr if page->count > 1 is an easy solution,
> > > and it probably suffices.  Beyond that anything mmaped can be found
> > 
> > Erm, what should it do then? Fail the AGP map ?
> 
> Why not.  If user-space has already mapped the memory one way, turning
> around and using it another way is a problem.  If the memory is
> dynamically allocated for AGP I don't even see how this case
> could occur.

It's a random error. AGP randomly gets some page that is already mapped
by /dev/mem somewhere. There is nothing that it can do to avoid this.
As /dev/mem is root only and root can already cause much damage it is
probably not worth avoiding the particular breakage.

Now if someone used change_page_attr as a more general call and used
it on random memory that could be already mapped to user space in a more
legitimate way then he or she has to do much more work anyways (avoiding
alias etc.). Part of that more work would be to check the page count.
I don't think it should be done by the low level routine.


> 
> The kernel already exposes through /proc/mtrr the ability to
> arbitrarily change the caching of pages.  And since this is a case
> of physical aliasing we need to make certain the mtrr's don't
> conflict.  So much of this is already exposed to user space already.

MTRRs work on physical, not virtual memory, so they have no aliasing
issues.

> > You can already use WT. DRM does that already in fact. 
> > Newer Intel/AMD CPUs allow to set up a more general PAT table with some
> > more modis, but to be honest I don't see the point in most of them,
> > except perhaps WC. Unfortunately there is no 'weak ordering like alpha/sparc64'
> > mode that could be used in the kernel :-)
> 
> With the PAT table only write-back, write-combining, uncached interest
> me.  Given the number of BIOS's that don't set all of RAM to
> write-back and the major performance penalty of running on uncached
> RAM having the kernel fix it, would reduce a lot of headaches long
> term. 

Fixing the MTRRs is fine, but it is really outside the scope of my patch.
Just changing the kernel map wouldn't be enough to fix wrong MTRRs,
because it wouldn't cover highmem. 

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 18:43                           ` Andi Kleen
@ 2002-06-16 19:56                             ` Eric W. Biederman
  2002-06-16 23:37                               ` Andi Kleen
  2002-06-17  6:53                               ` Andrea Arcangeli
  0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2002-06-16 19:56 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrea Arcangeli, Benjamin LaHaise, linux-kernel, Richard Brunner,
	mark.langsdorf

Andi Kleen <ak@suse.de> writes:

> On Sun, Jun 16, 2002 at 11:50:51AM -0600, Eric W. Biederman wrote:
> > Andi Kleen <ak@suse.de> writes:
> > 
> > > On Sun, Jun 16, 2002 at 04:08:49AM -0600, Eric W. Biederman wrote:
> > > > 
> > > > Don't allow the change_page_attr if page->count > 1 is an easy solution,
> > > > and it probably suffices.  Beyond that anything mmaped can be found
> > > 
> > > Erm, what should it do then? Fail the AGP map ?
> > 
> > Why not.  If user-space has already mapped the memory one way, turning
> > around and using it another way is a problem.  If the memory is
> > dynamically allocated for AGP I don't even see how this case
> > could occur.
> 
> It's a random error. AGP randomly gets some page that is already mapped
> by /dev/mem somewhere. There is nothing that it can do to avoid this.
> As /dev/mem is root only and root can already cause much damage it is
> probably not worth avoiding the particular breakage.

Sorry my confusion I thought mmapping /dev/mem increased the page count,
in which case it would be easy to detect, and avoid.  I just looked
and it doesn't so there appears no trivial way to detect or handle
that case.
 
> Now if someone used change_page_attr as a more general call and used
> it on random memory that could be already mapped to user space in a more
> legitimate way then he or she has to do much more work anyways (avoiding
> alias etc.). Part of that more work would be to check the page count.
> I don't think it should be done by the low level routine.
> 
> 
> > 
> > The kernel already exposes through /proc/mtrr the ability to
> > arbitrarily change the caching of pages.  And since this is a case
> > of physical aliasing we need to make certain the mtrr's don't
> > conflict.  So much of this is already exposed to user space already.
> 
> MTRRs work on physical, not virtual memory, so they have no aliasing
> issues.

Doesn't the AGP aperture cause a physical alias?  Leading to strange
the same problems if the agp aperture was marked write-back, and the
memory was marked uncacheable.  My gut impression is to just make the
agp aperture write-back cacheable, and then we don't have to change
the kernel page table at all.  Unfortunately I don't expect the host
bridge with the memory and agp controllers to like that mode,
especially as there are physical aliasing issues.

> > > You can already use WT. DRM does that already in fact. 
> > > Newer Intel/AMD CPUs allow to set up a more general PAT table with some
> > > more modis, but to be honest I don't see the point in most of them,
> > > except perhaps WC. Unfortunately there is no 'weak ordering like
> alpha/sparc64'
> 
> > > mode that could be used in the kernel :-)
> > 
> > With the PAT table only write-back, write-combining, uncached interest
> > me.  Given the number of BIOS's that don't set all of RAM to
> > write-back and the major performance penalty of running on uncached
> > RAM having the kernel fix it, would reduce a lot of headaches long
> > term. 
> 
> Fixing the MTRRs is fine, but it is really outside the scope of my patch.
> Just changing the kernel map wouldn't be enough to fix wrong MTRRs,
> because it wouldn't cover highmem. 

My preferred fix is to use PAT, to override the buggy mtrrs.  Which
brings up the same aliasing issues.  Which makes it related but
outside the scope of the problem.

Eric


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 19:56                             ` Eric W. Biederman
@ 2002-06-16 23:37                               ` Andi Kleen
  2002-06-17  0:08                                 ` Albert D. Cahalan
  2002-06-17  4:06                                 ` Eric W. Biederman
  2002-06-17  6:53                               ` Andrea Arcangeli
  1 sibling, 2 replies; 15+ messages in thread
From: Andi Kleen @ 2002-06-16 23:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andi Kleen, Andrea Arcangeli, Benjamin LaHaise, linux-kernel,
	Richard Brunner, mark.langsdorf

> > MTRRs work on physical, not virtual memory, so they have no aliasing
> > issues.
> 
> Doesn't the AGP aperture cause a physical alias?  Leading to strange

Yes. That's what this patch is all about.

> the same problems if the agp aperture was marked write-back, and the

AGP aperture is uncacheable, not write-back.

> memory was marked uncacheable.  My gut impression is to just make the
> agp aperture write-back cacheable, and then we don't have to change
> the kernel page table at all.  Unfortunately I don't expect the host

That would violate the AGP specification.

> bridge with the memory and agp controllers to like that mode,
> especially as there are physical aliasing issues.

exactly.

> 
> > Fixing the MTRRs is fine, but it is really outside the scope of my patch.
> > Just changing the kernel map wouldn't be enough to fix wrong MTRRs,
> > because it wouldn't cover highmem. 
> 
> My preferred fix is to use PAT, to override the buggy mtrrs.  Which
> brings up the same aliasing issues.  Which makes it related but
> outside the scope of the problem.

I don't follow you here. IMHO it is much easier to fix the MTRRs in the
MTRR driver for those rare buggy BIOS (if they exist - I've never seen one)
than to hack up all of memory management just to get the right bits set.
I see no disadvantage of using the MTRRs and it is lot simpler than
PAT and pte bits.


-Andi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 23:37                               ` Andi Kleen
@ 2002-06-17  0:08                                 ` Albert D. Cahalan
  2002-06-17  4:06                                 ` Eric W. Biederman
  1 sibling, 0 replies; 15+ messages in thread
From: Albert D. Cahalan @ 2002-06-17  0:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric W. Biederman, Andi Kleen, Andrea Arcangeli, Benjamin LaHaise,
	linux-kernel, Richard Brunner, mark.langsdorf

Andi Kleen writes:

>> the same problems if the agp aperture was marked write-back, and the
>
> AGP aperture is uncacheable, not write-back.
>
>> memory was marked uncacheable.  My gut impression is to just make the
>> agp aperture write-back cacheable, and then we don't have to change
>> the kernel page table at all.  Unfortunately I don't expect the host
>
> That would violate the AGP specification.
>
>> bridge with the memory and agp controllers to like that mode,
>> especially as there are physical aliasing issues.
>
> exactly.

You can do whatever you want, as long as...

1. you have cache control instructions and use them
2. the bridge ignores the coherency protocol (no machine check)

Most likely you should make the AGP memory write-back
cacheable. This requires some care regarding cache lines,
but ought to be faster.

>>> Fixing the MTRRs is fine, but it is really outside the scope of my patch.
>>> Just changing the kernel map wouldn't be enough to fix wrong MTRRs,
>>> because it wouldn't cover highmem.
>>
>> My preferred fix is to use PAT, to override the buggy mtrrs.  Which
>> brings up the same aliasing issues.  Which makes it related but
>> outside the scope of the problem.
>
> I don't follow you here. IMHO it is much easier to fix the MTRRs in the
> MTRR driver for those rare buggy BIOS (if they exist - I've never seen one)
> than to hack up all of memory management just to get the right bits set.
> I see no disadvantage of using the MTRRs and it is lot simpler than
> PAT and pte bits.

For non-x86 one must "hack up all of memory management" anyway.

Example: There aren't any MTRRs on the PowerPC, but every page
has 4 memory type bits. It's not OK to map something more than
one way at the same time. Large "pages" (256 MB each) are used
to cover all of non-highmem physical memory.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 23:37                               ` Andi Kleen
  2002-06-17  0:08                                 ` Albert D. Cahalan
@ 2002-06-17  4:06                                 ` Eric W. Biederman
  1 sibling, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2002-06-17  4:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrea Arcangeli, Benjamin LaHaise, linux-kernel, Richard Brunner,
	mark.langsdorf

Andi Kleen <ak@suse.de> writes:

> > > MTRRs work on physical, not virtual memory, so they have no aliasing
> > > issues.
> > 
> > Doesn't the AGP aperture cause a physical alias?  Leading to strange
> 
> Yes. That's what this patch is all about.
> 
> > the same problems if the agp aperture was marked write-back, and the
> 
> AGP aperture is uncacheable, not write-back.
> 
> > memory was marked uncacheable.  My gut impression is to just make the
> > agp aperture write-back cacheable, and then we don't have to change
> > the kernel page table at all.  Unfortunately I don't expect the host
> 
> That would violate the AGP specification.
> 
> > bridge with the memory and agp controllers to like that mode,
> > especially as there are physical aliasing issues.
> 
> exactly.

All of which is an AGP design bug, if you want performance you don't
cripple your caches, but we have to live with it so no use tilting at
windmills.

> > > Fixing the MTRRs is fine, but it is really outside the scope of my patch.
> > > Just changing the kernel map wouldn't be enough to fix wrong MTRRs,
> > > because it wouldn't cover highmem. 
> > 
> > My preferred fix is to use PAT, to override the buggy mtrrs.  Which
> > brings up the same aliasing issues.  Which makes it related but
> > outside the scope of the problem.
> 
> I don't follow you here. IMHO it is much easier to fix the MTRRs in the
> MTRR driver for those rare buggy BIOS (if they exist - I've never seen one)

I've heard of several and dealt with one.  The problem was essentially they
ran out of mtrrs, the edges of free memory were to rough.

> than to hack up all of memory management just to get the right bits set.
> I see no disadvantage of using the MTRRs and it is lot simpler than
> PAT and pte bits.

There are not enough MTRRs.  And using the PAT bits to say memory is
write-back can be a constant.  It just takes a little work to get in
place.  Plus the weird assortment of consistency issues.

For most purposes PAT makes memory easier to deal with because you
can be as fine grained as you like.  The difficulty is that you must
have consistent attributes across all of your virtual mappings.  

The other case PAT should help is when a machine has multiple cards
that can benefit from write-combining.  Currently running out of mtrrs
is a problem.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-16 19:56                             ` Eric W. Biederman
  2002-06-16 23:37                               ` Andi Kleen
@ 2002-06-17  6:53                               ` Andrea Arcangeli
  2002-06-17 15:46                                 ` Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Andrea Arcangeli @ 2002-06-17  6:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andi Kleen, Benjamin LaHaise, linux-kernel, Richard Brunner,
	mark.langsdorf

On Sun, Jun 16, 2002 at 01:56:51PM -0600, Eric W. Biederman wrote:
> Andi Kleen <ak@suse.de> writes:
> 
> > On Sun, Jun 16, 2002 at 11:50:51AM -0600, Eric W. Biederman wrote:
> > > Andi Kleen <ak@suse.de> writes:
> > > 
> > > > On Sun, Jun 16, 2002 at 04:08:49AM -0600, Eric W. Biederman wrote:
> > > > > 
> > > > > Don't allow the change_page_attr if page->count > 1 is an easy solution,
> > > > > and it probably suffices.  Beyond that anything mmaped can be found
> > > > 
> > > > Erm, what should it do then? Fail the AGP map ?
> > > 
> > > Why not.  If user-space has already mapped the memory one way, turning
> > > around and using it another way is a problem.  If the memory is
> > > dynamically allocated for AGP I don't even see how this case
> > > could occur.
> > 
> > It's a random error. AGP randomly gets some page that is already mapped
> > by /dev/mem somewhere. There is nothing that it can do to avoid this.
> > As /dev/mem is root only and root can already cause much damage it is
> > probably not worth avoiding the particular breakage.
> 
> Sorry my confusion I thought mmapping /dev/mem increased the page count,
> in which case it would be easy to detect, and avoid.  I just looked
> and it doesn't so there appears no trivial way to detect or handle
> that case.

yep, because remap_page_range is used to work on things that doesn't
have a page structure in the first place too. We cannot trap /dev/mem,
unless additional infrastructure is developed for it. I would ignore it,
/dev/mem should be necessary only to non-ram pages, and so it should
never be a problem. If it's a problem it falls into Al's category of the
"doctor it hurts".

> My preferred fix is to use PAT, to override the buggy mtrrs.  Which
> brings up the same aliasing issues.  Which makes it related but
> outside the scope of the problem.

I don't see why you keep wondering about cacheability attributes. We
simply need to avoid the aliasing, then you don't care about the cache
attributes. It is a matter of the agp code if it wants write-back
write-combining or uncached then. the corruption happens because of the
physical aliases, if you remove the physical aliases in the cache then
you don't need to care about the cache-type.

So my suggestion, besides changing the API to the device drivers so that
the arch returns an array of pages ready to handle physical aliasing (so
moving the allocation/freeing in the arch/ section), is to mark the pte
of the aliased 4k pages as invalid. No need to care about cacheability
then. Just mark the pte invalid, flush the tlb, and flush the caches.
When you drop pages from the aperture, drop the agp mapping, flush the
tlb for the agp mapping, flush caches and reeanble the original
pagetable possibly making it again a largepage. No need to care about
mtrr or pte cacheability attributes this way, and even better this way
we'll oops any bogus user that is trying to access the page via the
kernel direct mapping while there's a physical alias, such guy should
use the agp virtual mapping instead, not the kernel direct mapping.
Marking the pte invalid is's even simpler than marking it uncacheable,
so it's very simple to change Andi's patch that way.

Andrea

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-17  6:53                               ` Andrea Arcangeli
@ 2002-06-17 15:46                                 ` Eric W. Biederman
  0 siblings, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2002-06-17 15:46 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Benjamin LaHaise, linux-kernel, Richard Brunner,
	mark.langsdorf

Andrea Arcangeli <andrea@suse.de> writes:

> On Sun, Jun 16, 2002 at 01:56:51PM -0600, Eric W. Biederman wrote:
> > Andi Kleen <ak@suse.de> writes:
> > 
> > > On Sun, Jun 16, 2002 at 11:50:51AM -0600, Eric W. Biederman wrote:
> > > > Andi Kleen <ak@suse.de> writes:
> > > > 
> > > > > On Sun, Jun 16, 2002 at 04:08:49AM -0600, Eric W. Biederman wrote:
> > > > > > 
> > > > > > Don't allow the change_page_attr if page->count > 1 is an easy
> solution,
> 
> > > > > > and it probably suffices.  Beyond that anything mmaped can be found
> > > > > 
> > > > > Erm, what should it do then? Fail the AGP map ?
> > > > 
> > > > Why not.  If user-space has already mapped the memory one way, turning
> > > > around and using it another way is a problem.  If the memory is
> > > > dynamically allocated for AGP I don't even see how this case
> > > > could occur.
> > > 
> > > It's a random error. AGP randomly gets some page that is already mapped
> > > by /dev/mem somewhere. There is nothing that it can do to avoid this.
> > > As /dev/mem is root only and root can already cause much damage it is
> > > probably not worth avoiding the particular breakage.
> > 
> > Sorry my confusion I thought mmapping /dev/mem increased the page count,
> > in which case it would be easy to detect, and avoid.  I just looked
> > and it doesn't so there appears no trivial way to detect or handle
> > that case.
> 
> yep, because remap_page_range is used to work on things that doesn't
> have a page structure in the first place too. We cannot trap /dev/mem,
> unless additional infrastructure is developed for it. I would ignore it,
> /dev/mem should be necessary only to non-ram pages, and so it should
> never be a problem. If it's a problem it falls into Al's category of the
> "doctor it hurts".

Which is fine as long as we recognize it will hurt if we use it.  
 
> > My preferred fix is to use PAT, to override the buggy mtrrs.  Which
> > brings up the same aliasing issues.  Which makes it related but
> > outside the scope of the problem.
> 
> I don't see why you keep wondering about cacheability attributes. 

Because this patch is nipping on the edge of a more general problem,
and the practical purpose I can serve is to stir up ideas.  

If the mtrrs we a good flexible general purpose solution we could just
mark all of the affected memory as uncacheable, in the mtrrs, and not
even modify the page tables.  As it is this case is one of several
proofs that the mtrrs are significantly flexibility impaired.

Beyond that having cacheability attributes in the page table is not
limited to x86, and we are coming close to building the infrastructure 
needed to handle this in general.  Not that I advocate we do anything
except consider the case at this juncture, but having had the
conversation now it lays the foundation for developing code that
handles the cache of virtual aliases having the same cacheability attributes.

> We
> simply need to avoid the aliasing, then you don't care about the cache
> attributes. 

That we should avoid physical aliasing I agree.  That is the real
problem even though the summary provided by AMD a little while ago
got this wrong.

> It is a matter of the agp code if it wants write-back
> write-combining or uncached then. the corruption happens because of the
> physical aliases, if you remove the physical aliases in the cache then
> you don't need to care about the cache-type.

Potentially.  If you start caching the data it will only work if the
agp controller participates in the cache-consistency protocol.
 
> So my suggestion, besides changing the API to the device drivers so that
> the arch returns an array of pages ready to handle physical aliasing (so
> moving the allocation/freeing in the arch/ section), is to mark the pte
> of the aliased 4k pages as invalid. No need to care about cacheability
> then. Just mark the pte invalid, flush the tlb, and flush the caches.
> When you drop pages from the aperture, drop the agp mapping, flush the
> tlb for the agp mapping, flush caches and reeanble the original
> pagetable possibly making it again a largepage. No need to care about
> mtrr or pte cacheability attributes this way, and even better this way
> we'll oops any bogus user that is trying to access the page via the
> kernel direct mapping while there's a physical alias, such guy should
> use the agp virtual mapping instead, not the kernel direct mapping.
> Marking the pte invalid is's even simpler than marking it uncacheable,
> so it's very simple to change Andi's patch that way.

A relatively simple hook into the /dev/mem infrastructure would
probably be worth it as well.  If you are going to oops the kernel,
giving user space an error is polite.

Eric




^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: another new version of pageattr caching conflict fix for 2.4
@ 2002-06-17 21:07 richard.brunner
  2002-06-18 17:34 ` Eric W. Biederman
  2002-06-26  1:37 ` another new version of pageattr caching conflict fix for Albert D. Cahalan
  0 siblings, 2 replies; 15+ messages in thread
From: richard.brunner @ 2002-06-17 21:07 UTC (permalink / raw)
  Cc: linux-kernel

Making the AGP Aperture write-back cacheable is not good from
a performance perspective. (I can't comment on which is better 
from a Linux Kernel perspective).

An Aperture Page can be made
cache-coherent depending on the implementation
and the AGP 3.0 spec provides an
architectural way of specifying and controlling these as
well.  But, by default the area is not made cache-coherent
due to the performance loss and the lack of software to take
advantage of it -- the two play off against each
other. 

Making it cache-coherent causes every AGP access to
snoop processor caches and this can be quite a hit in
performance when you consider the predominant AGP software
model. Most software that takes advantage of AGP is still
using the old Intel model of uncacheable, the majority of
data placed in the Aperture are read-only structures for the
AGP device -- such as vertex lists, locked vertex arrays,
and texture data. For the most part this fits the current
paradigm of throwing textures and vertices at the graphics
device. The only graphics area found so far that could
benefit from a coherent aperture is video capture data which
streams in from the graphics device and requires CPU
post-processing.



-Rich ...
[richard.brunner@amd.com    -- (360)-867-0654]
[Senior Member, Technical Staff, SW R&D @ AMD]

> -----Original Message-----
> From: Albert D. Cahalan [mailto:acahalan@cs.uml.edu]
> Sent: Sunday, June 16, 2002 5:09 PM
> To: ak@suse.de
> Cc: ebiederm@xmission.com; ak@suse.de; andrea@suse.de; 
> bcrl@redhat.com;
> linux-kernel@vger.kernel.org; Brunner, Richard; Langsdorf, Mark
> Subject: Re: another new version of pageattr caching conflict fix for
> 2.4
> 
> 
> Andi Kleen writes:
> 
> >> the same problems if the agp aperture was marked 
> write-back, and the
> >
> > AGP aperture is uncacheable, not write-back.
> >
> >> memory was marked uncacheable.  My gut impression is to 
> just make the
> >> agp aperture write-back cacheable, and then we don't have to change
> >> the kernel page table at all.  Unfortunately I don't 
> expect the host
> >
> > That would violate the AGP specification.
> >
> >> bridge with the memory and agp controllers to like that mode,
> >> especially as there are physical aliasing issues.
> >
> > exactly.
> 
> You can do whatever you want, as long as...
> 
> 1. you have cache control instructions and use them
> 2. the bridge ignores the coherency protocol (no machine check)
> 
> Most likely you should make the AGP memory write-back
> cacheable. This requires some care regarding cache lines,
> but ought to be faster.
> 
> >>> Fixing the MTRRs is fine, but it is really outside the 
> scope of my patch.
> >>> Just changing the kernel map wouldn't be enough to fix 
> wrong MTRRs,
> >>> because it wouldn't cover highmem.
> >>
> >> My preferred fix is to use PAT, to override the buggy mtrrs.  Which
> >> brings up the same aliasing issues.  Which makes it related but
> >> outside the scope of the problem.
> >
> > I don't follow you here. IMHO it is much easier to fix the 
> MTRRs in the
> > MTRR driver for those rare buggy BIOS (if they exist - I've 
> never seen one)
> > than to hack up all of memory management just to get the 
> right bits set.
> > I see no disadvantage of using the MTRRs and it is lot simpler than
> > PAT and pte bits.
> 
> For non-x86 one must "hack up all of memory management" anyway.
> 
> Example: There aren't any MTRRs on the PowerPC, but every page
> has 4 memory type bits. It's not OK to map something more than
> one way at the same time. Large "pages" (256 MB each) are used
> to cover all of non-highmem physical memory.
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: another new version of pageattr caching conflict fix for 2.4
@ 2002-06-17 21:13 richard.brunner
  0 siblings, 0 replies; 15+ messages in thread
From: richard.brunner @ 2002-06-17 21:13 UTC (permalink / raw)
  To: ebiederm, ak; +Cc: andrea, bcrl, linux-kernel

I have to agree with Eric, that from an architectural perspective,
PAT was invented to address the coarse granularity of the MTRRs
in assigning memory cacheability types.

The power-of-2 sizes and alignments for MTRRs 
make them clumsy for page-based
or region-based assignment.

Running out of MTRRs was a real porblem before PAT. MTRRS are best used
to set the "default" for all of memory and then use PAT
to override it on a page-by-page basis as needed.

Using MTRRdef for the actual Aperture is generally ok, because
it is mapped above the top-of-DRAM.


-Rich ...
[richard.brunner@amd.com    --               ]
[Senior Member, Technical Staff, SW R&D @ AMD]

> -----Original Message-----
> From: ebiederm@xmission.com [mailto:ebiederm@xmission.com]
> Sent: Sunday, June 16, 2002 9:06 PM
> To: Andi Kleen
> Cc: Andrea Arcangeli; Benjamin LaHaise; linux-kernel@vger.kernel.org;
> Brunner, Richard; Langsdorf, Mark
> Subject: Re: another new version of pageattr caching conflict fix for
> 2.4
> 
> 
> Andi Kleen <ak@suse.de> writes:
> 
> > > > MTRRs work on physical, not virtual memory, so they 
> have no aliasing
> > > > issues.
> > > 
> > > Doesn't the AGP aperture cause a physical alias?  Leading 
> to strange
> > 
> > Yes. That's what this patch is all about.
> > 
> > > the same problems if the agp aperture was marked 
> write-back, and the
> > 
> > AGP aperture is uncacheable, not write-back.
> > 
> > > memory was marked uncacheable.  My gut impression is to 
> just make the
> > > agp aperture write-back cacheable, and then we don't have 
> to change
> > > the kernel page table at all.  Unfortunately I don't 
> expect the host
> > 
> > That would violate the AGP specification.
> > 
> > > bridge with the memory and agp controllers to like that mode,
> > > especially as there are physical aliasing issues.
> > 
> > exactly.
> 
> All of which is an AGP design bug, if you want performance you don't
> cripple your caches, but we have to live with it so no use tilting at
> windmills.
> 
> > > > Fixing the MTRRs is fine, but it is really outside the 
> scope of my patch.
> > > > Just changing the kernel map wouldn't be enough to fix 
> wrong MTRRs,
> > > > because it wouldn't cover highmem. 
> > > 
> > > My preferred fix is to use PAT, to override the buggy 
> mtrrs.  Which
> > > brings up the same aliasing issues.  Which makes it related but
> > > outside the scope of the problem.
> > 
> > I don't follow you here. IMHO it is much easier to fix the 
> MTRRs in the
> > MTRR driver for those rare buggy BIOS (if they exist - I've 
> never seen one)
> 
> I've heard of several and dealt with one.  The problem was 
> essentially they
> ran out of mtrrs, the edges of free memory were to rough.
> 
> > than to hack up all of memory management just to get the 
> right bits set.
> > I see no disadvantage of using the MTRRs and it is lot simpler than
> > PAT and pte bits.
> 
> There are not enough MTRRs.  And using the PAT bits to say memory is
> write-back can be a constant.  It just takes a little work to get in
> place.  Plus the weird assortment of consistency issues.
> 
> For most purposes PAT makes memory easier to deal with because you
> can be as fine grained as you like.  The difficulty is that you must
> have consistent attributes across all of your virtual mappings.  
> 
> The other case PAT should help is when a machine has multiple cards
> that can benefit from write-combining.  Currently running out of mtrrs
> is a problem.
> 
> Eric
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for 2.4
  2002-06-17 21:07 another new version of pageattr caching conflict fix for 2.4 richard.brunner
@ 2002-06-18 17:34 ` Eric W. Biederman
  2002-06-26  1:37 ` another new version of pageattr caching conflict fix for Albert D. Cahalan
  1 sibling, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2002-06-18 17:34 UTC (permalink / raw)
  To: richard.brunner; +Cc: linux-kernel

richard.brunner@amd.com writes:

> Making the AGP Aperture write-back cacheable is not good from
> a performance perspective. (I can't comment on which is better 
> from a Linux Kernel perspective).

Mainly I was arguing from.  
- Make the common case fast.
- The common case is write-back.
- AGP is not the common case.
- AGP has performance limitations.  

>From the kernel side, the caching attributes don't particularly matter
because physical aliasing is introduced, with the AGP aperture.  So
the cache coherency protocols cannot make our lives simpler.

> An Aperture Page can be made
> cache-coherent depending on the implementation
> and the AGP 3.0 spec provides an
> architectural way of specifying and controlling these as
> well.  But, by default the area is not made cache-coherent
> due to the performance loss and the lack of software to take
> advantage of it -- the two play off against each
> other. 

Cache coherency is tricky, so there is some argument there.

> Making it cache-coherent causes every AGP access to
> snoop processor caches and this can be quite a hit in
> performance when you consider the predominant AGP software
> model. Most software that takes advantage of AGP is still
> using the old Intel model of uncacheable, the majority of
> data placed in the Aperture are read-only structures for the
> AGP device -- such as vertex lists, locked vertex arrays,
> and texture data. For the most part this fits the current
> paradigm of throwing textures and vertices at the graphics
> device. The only graphics area found so far that could
> benefit from a coherent aperture is video capture data which
> streams in from the graphics device and requires CPU
> post-processing.

I hadn't thought of the snooping from the AGP side, but even then given
that the AGP aperture is a fixed region it would probably work to just
have a fixed snoop on the AGP region, and only do something when AGP
traffic comes in.  Though I will buy the argument it may not be
possible to do it at full performance unless the AGP card knows
something about cache coherency.  Though mostly I suspect it is a cost
tradeoff issue.

If the area is purely uncacheable, then writing to that area cannot go
at full memory speed.  So we should at the very least mark the region
as write-combining.  This should be get the cpu putting data in there
at about 1400MB/s with PC2100, and moving data there just short of
2100MB/s.  This doesn't help directly AGP performance, but it does
allow the cpu to spend it's cycles on more important things, much
sooner.

I don't believe there is a memory caching attribute that would get the
data copy from the AGP aperture sped up except write-back.  Which is
where I guess video capture comes in.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: another new version of pageattr caching conflict fix for
  2002-06-17 21:07 another new version of pageattr caching conflict fix for 2.4 richard.brunner
  2002-06-18 17:34 ` Eric W. Biederman
@ 2002-06-26  1:37 ` Albert D. Cahalan
  1 sibling, 0 replies; 15+ messages in thread
From: Albert D. Cahalan @ 2002-06-26  1:37 UTC (permalink / raw)
  To: richard.brunner; +Cc: linux-kernel

richard.brunner writes:
> [Albert Cahalan]

>> You can do whatever you want, as long as...
>>
>> 1. you have cache control instructions and use them
>> 2. the bridge ignores the coherency protocol (no machine check)
>>
>> Most likely you should make the AGP memory write-back
>> cacheable. This requires some care regarding cache lines,
>> but ought to be faster.

> Making the AGP Aperture write-back cacheable is not good from
> a performance perspective. (I can't comment on which is better 
> from a Linux Kernel perspective).
> 
> An Aperture Page can be made
> cache-coherent depending on the implementation
> and the AGP 3.0 spec provides an
> architectural way of specifying and controlling these as
> well.  But, by default the area is not made cache-coherent
> due to the performance loss and the lack of software to take
> advantage of it -- the two play off against each
> other. 
> 
> Making it cache-coherent causes every AGP access to
> snoop processor caches and this can be quite a hit in
> performance when you consider the predominant AGP software
> model. Most software that takes advantage of AGP is still
> using the old Intel model of uncacheable, the majority of
> data placed in the Aperture are read-only structures for the
> AGP device -- such as vertex lists, locked vertex arrays,
> and texture data. For the most part this fits the current
> paradigm of throwing textures and vertices at the graphics
> device. The only graphics area found so far that could
> benefit from a coherent aperture is video capture data which
> streams in from the graphics device and requires CPU
> post-processing.

I didn't suggest enabling coherency.

You can cache your _incoherent_ memory as long as the CPU
has instructions that manipulate cache lines. This gives
you write-combining without AGP snooping overhead. If you
can have the CPU be incoherent too, you should do so.

I'm used to working with PowerPC, so maybe you'll tell me
that x86 is too lame to handle this. Hopefully AMD supports
most of these useful operations:

a. mark a cache line valid (with junk data)
b. cause immediate write-back
c. mark a cache line invalid (discard data)
d. prefetch for load
e. prefetch for store (leave clean)
f. create a zero-filled dirty cache line
g. write-back, then invalidate
h. mark some memory as "cached, but NOT coherent"

So you can work like this:

1. mark a cache line valid (with junk data)
2. modify the data the regular way
3. write-back, then invalidate
4. tell the video card to read the data

For data coming the other way:

1. ensure that the cache line isn't dirty
2. tell the video card to write data
3. ensure that the cache line isn't valid
4. prefetch for read
5. see what the video card had to say

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2002-06-26  1:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-06-17 21:07 another new version of pageattr caching conflict fix for 2.4 richard.brunner
2002-06-18 17:34 ` Eric W. Biederman
2002-06-26  1:37 ` another new version of pageattr caching conflict fix for Albert D. Cahalan
  -- strict thread matches above, loose matches on Subject: below --
2002-06-17 21:13 another new version of pageattr caching conflict fix for 2.4 richard.brunner
2002-06-13 20:15 New " Andi Kleen
2002-06-14  1:03 ` Benjamin LaHaise
2002-06-14  1:24   ` Andi Kleen
2002-06-14  1:37     ` Benjamin LaHaise
2002-06-14  4:00       ` Andrea Arcangeli
2002-06-14  4:17         ` Benjamin LaHaise
2002-06-14  4:27           ` Andi Kleen
2002-06-14 15:28             ` Benjamin LaHaise
2002-06-14 16:13               ` Andi Kleen
2002-06-14 17:31                 ` Andrea Arcangeli
2002-06-14 18:05                   ` another new " Andi Kleen
2002-06-16 10:08                     ` Eric W. Biederman
2002-06-16 16:48                       ` Andi Kleen
2002-06-16 17:50                         ` Eric W. Biederman
2002-06-16 18:43                           ` Andi Kleen
2002-06-16 19:56                             ` Eric W. Biederman
2002-06-16 23:37                               ` Andi Kleen
2002-06-17  0:08                                 ` Albert D. Cahalan
2002-06-17  4:06                                 ` Eric W. Biederman
2002-06-17  6:53                               ` Andrea Arcangeli
2002-06-17 15:46                                 ` Eric W. Biederman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox