* [rfc][patch] mm: use a pte bit to flag normal pages
[not found] ` <OFEC52C590.33A28896-ONC12573B8.0069F07E-C12573B8.006B1A41@de.ibm.com>
@ 2008-01-07 4:43 ` Nick Piggin
2008-01-07 10:30 ` Russell King
2008-01-10 13:33 ` Carsten Otte
0 siblings, 2 replies; 16+ messages in thread
From: Nick Piggin @ 2008-01-07 4:43 UTC
To: Martin Schwidefsky
Cc: carsteno, Heiko Carstens, Jared Hulbert,
Linux Memory Management List, linux-arch
On Fri, Dec 21, 2007 at 08:29:50PM +0100, Martin Schwidefsky wrote:
> Nick Piggin <npiggin@suse.de> wrote on 12/21/2007 11:47:01 AM:
> > On Fri, Dec 21, 2007 at 11:35:02AM +0100, Carsten Otte wrote:
> > > Nick Piggin wrote:
> > > >But it doesn't still retain sparsemem sections behind that? Ie. so that
> > > >pfn_valid could be used? (I admittedly don't know enough about the memory
> > > >model code).
> > > Not as far as I know. But arch/s390/mm/vmem.c has:
> > >
> > > struct memory_segment {
> > > struct list_head list;
> > > unsigned long start;
> > > unsigned long size;
> > > };
> > >
> > > static LIST_HEAD(mem_segs);
> > >
> > > This is maintained every time we map a segment/unmap a segment. And we
> > > could add a bit to struct memory_segment meaning "refcount this one".
> > > This way, we could tell core mm whether or not a pfn should be
> > > refcounted.
> >
> > Right, this should work.
> >
> > BTW. having a per-arch function sounds reasonable for a start. I'd just give
> > it a long name, so that people don't start using it for weird things ;)
> > mixedmap_refcount_pfn() or something.
>
> Hmm, I would prefer to have a pte bit, it seems much more natural to me.
> We know that this is a special pte when it gets mapped, but we "forgot"
> that fact when the pte is picked up again in vm_normal_page. To search a
> list when a simple bit in the pte gets the job done just feels wrong.
> By the way, for s390 the lower 8 bits of the pte are OS defined. The lowest
> two bits are used in addition to the hardware invalid and the hardware
> read-only bit to define the pte type. For valid ptes the remaining 6 bits
> are unused. Pick one, e.g. 2**2 for the bit that says
> "don't-refcount-this-pte".
This would be nice if we can do it, although I would prefer to make everything
work without any pte bits first, in order to make sure all architectures have a
chance at implementing it (although I guess for s390 specific memory map stuff,
it is reasonable for you to do your own thing there...).
We initially wanted to do the whole vm_normal_page thing this way, with another
pte bit, but we thought there were one or two archs with no spare bits. BTW. I
also need this bit in order to implement my lockless get_user_pages, so I do hope
to get it in. I'd like to know what architectures cannot spare a software bit in
their pte_present ptes...
---
Rather than play interesting games with vmas to work out whether the mapped page
should be refcounted or not, use a new bit in the "present" pte to distinguish
such pages.
This allows a much simpler "vm_normal_page" implementation, and more flexible rules
for COW pages in pfn mappings (eg. our proposed VM_MIXEDMAP mode would become a no-op).
It also provides one of the required pieces for the lockless get_user_pages.
Unfortunately, maybe not all architectures can spare a bit in the pte for this.
So we probably have to end up with some ifdefs (if we even want to add this
approach at all). For this reason, I would prefer for now to avoid using a new pte
bit to implement any of this stuff, and get VM_MIXEDMAP and its callers working
nicely on all architectures first.
Thanks,
Nick
---
Index: linux-2.6/include/asm-powerpc/pgtable-ppc64.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/pgtable-ppc64.h
+++ linux-2.6/include/asm-powerpc/pgtable-ppc64.h
@@ -93,6 +93,7 @@
#define _PAGE_RW 0x0200 /* software: user write access allowed */
#define _PAGE_HASHPTE 0x0400 /* software: pte has an associated HPTE */
#define _PAGE_BUSY 0x0800 /* software: PTE & hash are busy */
+#define _PAGE_SPECIAL 0x1000 /* software: pte associated with special page */
#define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_COHERENT)
@@ -233,12 +234,13 @@ static inline pte_t pfn_pte(unsigned lon
/*
* The following only work if pte_present() is true.
- * Undefined behaviour if not..
+ * Undefined behaviour if not.. (XXX: comment wrong eg. for pte_file())
*/
static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW;}
static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;}
static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;}
static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;}
+static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL; }
static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; }
static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; }
@@ -257,6 +259,8 @@ static inline pte_t pte_mkyoung(pte_t pt
pte_val(pte) |= _PAGE_ACCESSED; return pte; }
static inline pte_t pte_mkhuge(pte_t pte) {
return pte; }
+static inline pte_t pte_mkspecial(pte_t pte) {
+ pte_val(pte) |= _PAGE_SPECIAL; return pte; }
/* Atomic PTE updates */
static inline unsigned long pte_update(struct mm_struct *mm,
Index: linux-2.6/include/asm-um/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-um/pgtable.h
+++ linux-2.6/include/asm-um/pgtable.h
@@ -21,6 +21,7 @@
#define _PAGE_USER 0x040
#define _PAGE_ACCESSED 0x080
#define _PAGE_DIRTY 0x100
+#define _PAGE_SPECIAL 0x200
/* If _PAGE_PRESENT is clear, we use these: */
#define _PAGE_FILE 0x008 /* nonlinear file mapping, saved PTE; unset:swap */
#define _PAGE_PROTNONE 0x010 /* if the user mapped it with PROT_NONE;
@@ -220,6 +221,11 @@ static inline int pte_newprot(pte_t pte)
return(pte_present(pte) && (pte_get_bits(pte, _PAGE_NEWPROT)));
}
+static inline int pte_special(pte_t pte)
+{
+ return pte_get_bits(pte, _PAGE_SPECIAL);
+}
+
/*
* =================================
* Flags setting section.
@@ -288,6 +294,12 @@ static inline pte_t pte_mknewpage(pte_t
return(pte);
}
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+ pte_set_bits(pte, _PAGE_SPECIAL);
+ return(pte);
+}
+
static inline void set_pte(pte_t *pteptr, pte_t pteval)
{
pte_copy(*pteptr, pteval);
Index: linux-2.6/include/asm-x86/pgtable_32.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_32.h
+++ linux-2.6/include/asm-x86/pgtable_32.h
@@ -102,6 +102,7 @@ void paging_init(void);
#define _PAGE_BIT_UNUSED2 10
#define _PAGE_BIT_UNUSED3 11
#define _PAGE_BIT_NX 63
+#define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1
#define _PAGE_PRESENT 0x001
#define _PAGE_RW 0x002
@@ -115,6 +116,7 @@ void paging_init(void);
#define _PAGE_UNUSED1 0x200 /* available for programmer */
#define _PAGE_UNUSED2 0x400
#define _PAGE_UNUSED3 0x800
+#define _PAGE_SPECIAL _PAGE_UNUSED1
/* If _PAGE_PRESENT is clear, we use these: */
#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */
@@ -219,6 +221,7 @@ static inline int pte_dirty(pte_t pte)
static inline int pte_young(pte_t pte) { return (pte).pte_low & _PAGE_ACCESSED; }
static inline int pte_write(pte_t pte) { return (pte).pte_low & _PAGE_RW; }
static inline int pte_huge(pte_t pte) { return (pte).pte_low & _PAGE_PSE; }
+static inline int pte_special(pte_t pte) { return (pte).pte_low & _PAGE_SPECIAL; }
/*
* The following only works if pte_present() is not true.
@@ -232,6 +235,7 @@ static inline pte_t pte_mkdirty(pte_t pt
static inline pte_t pte_mkyoung(pte_t pte) { (pte).pte_low |= _PAGE_ACCESSED; return pte; }
static inline pte_t pte_mkwrite(pte_t pte) { (pte).pte_low |= _PAGE_RW; return pte; }
static inline pte_t pte_mkhuge(pte_t pte) { (pte).pte_low |= _PAGE_PSE; return pte; }
+static inline pte_t pte_mkspecial(pte_t pte) { (pte).pte_low |= _PAGE_SPECIAL; return pte; }
#ifdef CONFIG_X86_PAE
# include <asm/pgtable-3level.h>
Index: linux-2.6/include/asm-x86/pgtable_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_64.h
+++ linux-2.6/include/asm-x86/pgtable_64.h
@@ -151,6 +151,7 @@ static inline pte_t ptep_get_and_clear_f
#define _PAGE_BIT_DIRTY 6
#define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */
#define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */
+#define _PAGE_BIT_SPECIAL 9
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
#define _PAGE_PRESENT 0x001
@@ -163,6 +164,7 @@ static inline pte_t ptep_get_and_clear_f
#define _PAGE_PSE 0x080 /* 2MB page */
#define _PAGE_FILE 0x040 /* nonlinear file mapping, saved PTE; unset:swap */
#define _PAGE_GLOBAL 0x100 /* Global TLB entry */
+#define _PAGE_SPECIAL 0x200
#define _PAGE_PROTNONE 0x080 /* If not present */
#define _PAGE_NX (_AC(1,UL)<<_PAGE_BIT_NX)
@@ -272,6 +274,7 @@ static inline int pte_young(pte_t pte)
static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW; }
static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; }
static inline int pte_huge(pte_t pte) { return pte_val(pte) & _PAGE_PSE; }
+static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL; }
static inline pte_t pte_mkclean(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_DIRTY)); return pte; }
static inline pte_t pte_mkold(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_ACCESSED)); return pte; }
@@ -282,6 +285,7 @@ static inline pte_t pte_mkyoung(pte_t pt
static inline pte_t pte_mkwrite(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_RW)); return pte; }
static inline pte_t pte_mkhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_PSE)); return pte; }
static inline pte_t pte_clrhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_PSE)); return pte; }
+static inline pte_t pte_mkspecial(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_SPECIAL)); return pte; }
struct vm_area_struct;
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -698,7 +698,20 @@ struct zap_details {
unsigned long truncate_count; /* Compare vm_truncate_count */
};
-struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
+/*
+ * This function gets the "struct page" associated with a pte.
+ *
+ * "Special" mappings do not wish to be associated with a "struct page" (either
+ * it doesn't exist, or it exists but they don't want to touch it). In this
+ * case, NULL is returned here.
+ */
+static inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
+{
+ if (likely(!pte_special(pte)))
+ return pte_page(pte);
+ return NULL;
+}
+
unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *);
unsigned long unmap_vmas(struct mmu_gather **tlb,
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -361,64 +361,10 @@ static inline int is_cow_mapping(unsigne
}
/*
- * This function gets the "struct page" associated with a pte.
- *
- * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
- * will have each page table entry just pointing to a raw page frame
- * number, and as far as the VM layer is concerned, those do not have
- * pages associated with them - even if the PFN might point to memory
- * that otherwise is perfectly fine and has a "struct page".
- *
- * The way we recognize those mappings is through the rules set up
- * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
- * and the vm_pgoff will point to the first PFN mapped: thus every
- * page that is a raw mapping will always honor the rule
- *
- * pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
- *
- * and if that isn't true, the page has been COW'ed (in which case it
- * _does_ have a "struct page" associated with it even if it is in a
- * VM_PFNMAP range).
- */
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
-{
- unsigned long pfn = pte_pfn(pte);
-
- if (unlikely(vma->vm_flags & VM_PFNMAP)) {
- unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
- if (pfn == vma->vm_pgoff + off)
- return NULL;
- if (!is_cow_mapping(vma->vm_flags))
- return NULL;
- }
-
- /*
- * Add some anal sanity checks for now. Eventually,
- * we should just do "return pfn_to_page(pfn)", but
- * in the meantime we check that we get a valid pfn,
- * and that the resulting page looks ok.
- */
- if (unlikely(!pfn_valid(pfn))) {
- print_bad_pte(vma, pte, addr);
- return NULL;
- }
-
- /*
- * NOTE! We still have PageReserved() pages in the page
- * tables.
- *
- * The PAGE_ZERO() pages and various VDSO mappings can
- * cause them to exist.
- */
- return pfn_to_page(pfn);
-}
-
-/*
* copy one vm_area from one task to the other. Assumes the page tables
* already present in the new task to be cleared in the whole range
* covered by this vma.
*/
-
static inline void
copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
@@ -1212,7 +1158,6 @@ int vm_insert_pfn(struct vm_area_struct
spinlock_t *ptl;
BUG_ON(!(vma->vm_flags & VM_PFNMAP));
- BUG_ON(is_cow_mapping(vma->vm_flags));
retval = -ENOMEM;
pte = get_locked_pte(mm, addr, &ptl);
@@ -1223,7 +1168,7 @@ int vm_insert_pfn(struct vm_area_struct
goto out_unlock;
/* Ok, finally just insert the thing.. */
- entry = pfn_pte(pfn, vma->vm_page_prot);
+ entry = pte_mkspecial(pfn_pte(pfn, vma->vm_page_prot));
set_pte_at(mm, addr, pte, entry);
update_mmu_cache(vma, addr, entry);
@@ -1254,7 +1199,7 @@ static int remap_pte_range(struct mm_str
arch_enter_lazy_mmu_mode();
do {
BUG_ON(!pte_none(*pte));
- set_pte_at(mm, addr, pte, pfn_pte(pfn, prot));
+ set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
pfn++;
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
@@ -1321,30 +1266,6 @@ int remap_pfn_range(struct vm_area_struc
struct mm_struct *mm = vma->vm_mm;
int err;
- /*
- * Physically remapped pages are special. Tell the
- * rest of the world about it:
- * VM_IO tells people not to look at these pages
- * (accesses can have side effects).
- * VM_RESERVED is specified all over the place, because
- * in 2.4 it kept swapout's vma scan off this vma; but
- * in 2.6 the LRU scan won't even find its pages, so this
- * flag means no more than count its pages in reserved_vm,
- * and omit it from core dump, even when VM_IO turned off.
- * VM_PFNMAP tells the core MM that the base pages are just
- * raw PFN mappings, and do not have a "struct page" associated
- * with them.
- *
- * There's a horrible special case to handle copy-on-write
- * behaviour that some programs depend on. We mark the "original"
- * un-COW'ed pages by matching them up with "vma->vm_pgoff".
- */
- if (is_cow_mapping(vma->vm_flags)) {
- if (addr != vma->vm_start || end != vma->vm_end)
- return -EINVAL;
- vma->vm_pgoff = pfn;
- }
-
vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;
BUG_ON(addr >= end);
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 4:43 ` [rfc][patch] mm: use a pte bit to flag normal pages Nick Piggin
@ 2008-01-07 10:30 ` Russell King
2008-01-07 11:14 ` Nick Piggin
2008-01-07 18:49 ` Jared Hulbert
2008-01-10 13:33 ` Carsten Otte
1 sibling, 2 replies; 16+ messages in thread
From: Russell King @ 2008-01-07 10:30 UTC
To: Nick Piggin
Cc: Martin Schwidefsky, carsteno, Heiko Carstens, Jared Hulbert,
Linux Memory Management List, linux-arch
On Mon, Jan 07, 2008 at 05:43:55AM +0100, Nick Piggin wrote:
> We initially wanted to do the whole vm_normal_page thing this way, with
> another pte bit, but we thought there were one or two archs with no spare
> bits. BTW. I also need this bit in order to implement my lockless
> get_user_pages, so I do hope to get it in. I'd like to know what
> architectures cannot spare a software bit in their pte_present ptes...
ARM is going to have to use the three remaining bits we have in the PTE
to store the memory type to resolve bugs on later platforms. Once they're
used, ARM will no longer have any room for any further PTE expansion.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 10:30 ` Russell King
@ 2008-01-07 11:14 ` Nick Piggin
2008-01-07 18:49 ` Jared Hulbert
1 sibling, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2008-01-07 11:14 UTC
To: Martin Schwidefsky, carsteno, Heiko Carstens, Jared Hulbert,
Linux Memory Management List, linux-arch
On Mon, Jan 07, 2008 at 10:30:29AM +0000, Russell King wrote:
> On Mon, Jan 07, 2008 at 05:43:55AM +0100, Nick Piggin wrote:
> > We initially wanted to do the whole vm_normal_page thing this way, with
> > another pte bit, but we thought there were one or two archs with no spare
> > bits. BTW. I also need this bit in order to implement my lockless
> > get_user_pages, so I do hope to get it in. I'd like to know what
> > architectures cannot spare a software bit in their pte_present ptes...
>
> ARM is going to have to use the three remaining bits we have in the PTE
> to store the memory type to resolve bugs on later platforms. Once they're
> used, ARM will no longer have any room for any further PTE expansion.
OK, it is good to have a negative confirmed. So I think we should definitely
get the non-pte-bit based mapping schemes working and tested on all platforms
before using a pte bit mapping...
FWIW, it might be possible for platforms to implement lockless get_user_pages
in other ways too. But that's getting ahead of myself.
Thanks.
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 10:30 ` Russell King
2008-01-07 11:14 ` Nick Piggin
@ 2008-01-07 18:49 ` Jared Hulbert
2008-01-07 19:45 ` Russell King
1 sibling, 1 reply; 16+ messages in thread
From: Jared Hulbert @ 2008-01-07 18:49 UTC
To: Nick Piggin, Martin Schwidefsky, carsteno, Heiko Carstens,
Jared Hulbert, Linux Memory Management List, linux-arch
> ARM is going to have to use the three remaining bits we have in the PTE
> to store the memory type to resolve bugs on later platforms. Once they're
> used, ARM will no longer have any room for any further PTE expansion.
Russell,
Can you explain this a little more?
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 18:49 ` Jared Hulbert
@ 2008-01-07 19:45 ` Russell King
2008-01-07 22:52 ` Jared Hulbert
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Russell King @ 2008-01-07 19:45 UTC
To: Jared Hulbert
Cc: Nick Piggin, Martin Schwidefsky, carsteno, Heiko Carstens,
Linux Memory Management List, linux-arch
On Mon, Jan 07, 2008 at 10:49:57AM -0800, Jared Hulbert wrote:
> > ARM is going to have to use the three remaining bits we have in the PTE
> > to store the memory type to resolve bugs on later platforms. Once they're
> > used, ARM will no longer have any room for any further PTE expansion.
>
> Russell,
>
> Can you explain this a little more.
In old ARM CPUs, there were two bits that defined the characteristics of
the mapping - the C and B bits (C = cacheable, B = bufferable)
Some ARMv5 (particularly Xscale-based) and all ARMv6 CPUs extend this to
five bits and introduce "memory types" - 3 bits of TEX, and C and B.
Between these bits, it defines:
- strongly ordered
- bufferable only *
- device, sharable *
- device, unsharable
- memory, bufferable and cacheable, write through, no write allocate
- memory, bufferable and cacheable, write back, no write allocate
- memory, bufferable and cacheable, write back, write allocate
- implementation defined combinations (eg, selecting "minicache")
- and a set of 16 states to allow the policy of inner and outer levels
of cache to be defined (two bits per level).
Of course, not all CPUs support all the above - for example, if write
back caches aren't supported then the result is a write through cache.
The write allocation setting is a "hint" - if the hardware doesn't
support write allocate, it'll just be read allocate.
There are now CPUs out there where the old combinations (TEX=0) are
broken - and causes nasty effects like writes to bypass the write
protection under certain circumstances, or the data cache to hang if
you're using a strongly ordered mapping.
The "workaround" for these is to avoid the problematical mapping mode -
which is CPU specific, and depends on knowledge of what's being mapped.
For instance, you might use a sharable device mapping instead of
strongly ordered for devices. However, you might want to use an
outer cacheable but inner uncacheable mapping instead of strongly
ordered for memory.
Now, couple this with the fix for shared mmaps - where we normally turn
a cacheable mapping into a bufferable mapping, or if the write buffer has
visible side effects, a strongly ordered mapping, or if strongly ordered
mappings are buggy... etc.
Also note that there are devices (typically "unshared" devices) on some
ARM CPUs that you can only access if you set the TEX bits correctly.
Currently, Linux is able to setup mappings in kernel space to cover
any combination of settings. However, userspace is much more limited
because we don't carry the additional bits around in the Linux version
of the PTE - and as such shared mmaps on some systems can end up locking
the CPU.
A few attempts have been made at solving these without using the
additional PTE bits, but they've been less than robust.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 19:45 ` Russell King
@ 2008-01-07 22:52 ` Jared Hulbert
2008-01-08 2:37 ` Andi Kleen
2008-01-08 10:11 ` Catalin Marinas
2 siblings, 0 replies; 16+ messages in thread
From: Jared Hulbert @ 2008-01-07 22:52 UTC
To: Jared Hulbert, Nick Piggin, Martin Schwidefsky, carsteno,
Heiko Carstens, Linux Memory Management List, linux-arch
> Currently, Linux is able to setup mappings in kernel space to cover
> any combination of settings. However, userspace is much more limited
> because we don't carry the additional bits around in the Linux version
> of the PTE - and as such shared mmaps on some systems can end up locking
> the CPU.
>
> A few attempts have been made at solving these without using the
> additional PTE bits, but they've been less than robust.
Do these new ARM implementations use more bits than most archs?
Most ARM implementations can spare a PTE bit for this, right? Is the
use of these 3 extra bits to cover a few buggy processors or is this
caused by consolidating the needs of widely differing architectures?
I just can't get over the idea that you _have_ to use up all available
bits. Oh well.
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 19:45 ` Russell King
2008-01-07 22:52 ` Jared Hulbert
@ 2008-01-08 2:37 ` Andi Kleen
2008-01-08 2:49 ` Nick Piggin
2008-01-08 10:11 ` Catalin Marinas
2 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2008-01-08 2:37 UTC
To: Jared Hulbert, Nick Piggin, Martin Schwidefsky, carsteno,
Heiko Carstens, Linux Memory Management List, linux-arch
> - strongly ordered
> - bufferable only *
> - device, sharable *
> - device, unsharable
> - memory, bufferable and cacheable, write through, no write allocate
> - memory, bufferable and cacheable, write back, no write allocate
> - memory, bufferable and cacheable, write back, write allocate
> - implementation defined combinations (eg, selecting "minicache")
> - and a set of 16 states to allow the policy of inner and outer levels
> of cache to be defined (two bits per level).
Do you need all of those in user space? Perhaps you could give
the bits different meanings depending on user or kernel space.
I think Nick et al. just need the bits for user space; they won't
care about kernel mappings.
-Andi
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-08 2:37 ` Andi Kleen
@ 2008-01-08 2:49 ` Nick Piggin
2008-01-08 3:31 ` Andi Kleen
0 siblings, 1 reply; 16+ messages in thread
From: Nick Piggin @ 2008-01-08 2:49 UTC
To: Andi Kleen
Cc: Jared Hulbert, Martin Schwidefsky, carsteno, Heiko Carstens,
Linux Memory Management List, linux-arch
On Tue, Jan 08, 2008 at 03:37:46AM +0100, Andi Kleen wrote:
> > - strongly ordered
> > - bufferable only *
> > - device, sharable *
> > - device, unsharable
> > - memory, bufferable and cacheable, write through, no write allocate
> > - memory, bufferable and cacheable, write back, no write allocate
> > - memory, bufferable and cacheable, write back, write allocate
> > - implementation defined combinations (eg, selecting "minicache")
> > - and a set of 16 states to allow the policy of inner and outer levels
> > of cache to be defined (two bits per level).
>
> Do you need all of those in user space? Perhaps you could give
> the bits different meanings depending on user or kernel space.
> I think Nick et al. just need the bits for user space; they won't
> care about kernel mappings.
Yes correct -- they are only for userspace mappings. Though that includes mmaps
of /dev/mem and device drivers etc.
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-08 2:49 ` Nick Piggin
@ 2008-01-08 3:31 ` Andi Kleen
2008-01-08 3:52 ` Nick Piggin
0 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2008-01-08 3:31 UTC
To: Nick Piggin
Cc: Andi Kleen, Jared Hulbert, Martin Schwidefsky, carsteno,
Heiko Carstens, Linux Memory Management List, linux-arch
On Tue, Jan 08, 2008 at 03:49:07AM +0100, Nick Piggin wrote:
> On Tue, Jan 08, 2008 at 03:37:46AM +0100, Andi Kleen wrote:
> > > - strongly ordered
> > > - bufferable only *
> > > - device, sharable *
> > > - device, unsharable
> > > - memory, bufferable and cacheable, write through, no write allocate
> > > - memory, bufferable and cacheable, write back, no write allocate
> > > - memory, bufferable and cacheable, write back, write allocate
> > > - implementation defined combinations (eg, selecting "minicache")
> > > - and a set of 16 states to allow the policy of inner and outer levels
> > > of cache to be defined (two bits per level).
> >
> > Do you need all of those in user space? Perhaps you could give
> > the bits different meanings depending on user or kernel space.
> > I think Nick et al. just need the bits for user space; they won't
> > care about kernel mappings.
>
> Yes correct -- they are only for userspace mappings. Though that includes mmaps
> of /dev/mem and device drivers etc.
/dev/mem can always be special-cased by checking the VMA flags, can't it?
-Andi
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-08 3:31 ` Andi Kleen
@ 2008-01-08 3:52 ` Nick Piggin
0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2008-01-08 3:52 UTC
To: Andi Kleen
Cc: Jared Hulbert, Martin Schwidefsky, carsteno, Heiko Carstens,
Linux Memory Management List, linux-arch
On Tue, Jan 08, 2008 at 04:31:03AM +0100, Andi Kleen wrote:
> On Tue, Jan 08, 2008 at 03:49:07AM +0100, Nick Piggin wrote:
> > On Tue, Jan 08, 2008 at 03:37:46AM +0100, Andi Kleen wrote:
> > > > - strongly ordered
> > > > - bufferable only *
> > > > - device, sharable *
> > > > - device, unsharable
> > > > - memory, bufferable and cacheable, write through, no write allocate
> > > > - memory, bufferable and cacheable, write back, no write allocate
> > > > - memory, bufferable and cacheable, write back, write allocate
> > > > - implementation defined combinations (eg, selecting "minicache")
> > > > - and a set of 16 states to allow the policy of inner and outer levels
> > > > of cache to be defined (two bits per level).
> > >
> > > Do you need all of those in user space? Perhaps you could give
> > > the bits different meanings depending on user or kernel space.
> > > I think Nick et al. just need the bits for user space; they won't
> > > care about kernel mappings.
> >
> > Yes correct -- they are only for userspace mappings. Though that includes mmaps
> > of /dev/mem and device drivers etc.
>
> /dev/mem can always be special-cased by checking the VMA flags, can't it?
That's basically what we do today with COW support for VM_PFNMAP. Once you have
that, I don't think there is a huge reason to _also_ use the pte bit for other
mappings (because you need to have the VM_PFNMAP support there anyway).
For lockless get_user_pages, I don't take mmap_sem, look up any vmas, or even
take any page table locks, so it doesn't help there either. (though in the case
of lockless gup, architectures that cannot support it can simply revert to the
regular gup).
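To illustrate what such a lockless fast path could look like, here is a minimal
sketch in the style of kernel code of the period; the function shape, the
helper usage and the fall-back-to-slow-path behaviour are assumptions for
illustration, not the posted implementation:

/*
 * Sketch of a lockless get_user_pages fast path.  The page tables are
 * walked with no mmap_sem and no vma in hand, so pte_special() is the
 * only way to spot "don't refcount this" ptes; anything unusual makes
 * us bail out to the regular, locked get_user_pages.
 */
static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
			 int write, struct page **pages, int *nr)
{
	pte_t *start, *ptep;
	int ret = 1;

	start = ptep = pte_offset_map(&pmd, addr);
	do {
		pte_t pte = *ptep;
		struct page *page;

		/* Not present, not writable, or special: use the slow path. */
		if (!pte_present(pte) || (write && !pte_write(pte)) ||
		    pte_special(pte)) {
			ret = 0;
			break;
		}
		page = pte_page(pte);
		get_page(page);
		pages[(*nr)++] = page;
	} while (ptep++, addr += PAGE_SIZE, addr != end);
	pte_unmap(start);

	return ret;
}

The caller is assumed to disable interrupts around the walk so the page tables
cannot be freed (and their memory reused) while it runs.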
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 19:45 ` Russell King
2008-01-07 22:52 ` Jared Hulbert
2008-01-08 2:37 ` Andi Kleen
@ 2008-01-08 10:11 ` Catalin Marinas
2008-01-08 10:52 ` Russell King
2 siblings, 1 reply; 16+ messages in thread
From: Catalin Marinas @ 2008-01-08 10:11 UTC
To: Russell King
Cc: Jared Hulbert, Nick Piggin, Martin Schwidefsky, carsteno,
Heiko Carstens, Linux Memory Management List, linux-arch
On Mon, 2008-01-07 at 19:45 +0000, Russell King wrote:
> In old ARM CPUs, there were two bits that defined the characteristics of
> the mapping - the C and B bits (C = cacheable, B = bufferable)
>
> Some ARMv5 (particularly Xscale-based) and all ARMv6 CPUs extend this to
> five bits and introduce "memory types" - 3 bits of TEX, and C and B.
>
> Between these bits, it defines:
>
> - strongly ordered
> - bufferable only *
> - device, sharable *
> - device, unsharable
> - memory, bufferable and cacheable, write through, no write allocate
> - memory, bufferable and cacheable, write back, no write allocate
> - memory, bufferable and cacheable, write back, write allocate
> - implementation defined combinations (eg, selecting "minicache")
> - and a set of 16 states to allow the policy of inner and outer levels
> of cache to be defined (two bits per level).
Can we not restrict these to a maximum of 8 base types at run-time? If
yes, we can only use 3 bits for encoding and also benefit from the
automatic remapping in later ARM CPUs. For those not familiar with ARM,
8 combinations of the TEX, C, B and S (shared) bits can be specified in
separate registers and the pte would only use 3 bits to refer to those.
Even older cores would benefit from this as I think it is faster to read
the encoding from an array in set_pte than doing all the bit comparisons
to calculate the hardware pte in the current implementation.
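To make the 3-bit indirection concrete, the Linux-level field could be laid
out roughly as in the sketch below; the L_PTE_MT_* names, the bit position and
the assignment of types to indices are illustrative assumptions, not from any
posted patch:

/*
 * Illustrative 3-bit memory-type index in the Linux pte (bits 2..4
 * here).  The TEX/C/B (and S) hardware encoding for each index would be
 * programmed once at boot - into the remap registers on newer cores, or
 * into a lookup table consulted by set_pte on older ones.
 */
#define L_PTE_MT_UNCACHED	(0 << 2)	/* strongly ordered        */
#define L_PTE_MT_BUFFERABLE	(1 << 2)	/* normal, uncached        */
#define L_PTE_MT_WRITETHROUGH	(2 << 2)	/* write through           */
#define L_PTE_MT_WRITEBACK	(3 << 2)	/* write back              */
#define L_PTE_MT_DEV_SHARED	(4 << 2)	/* shared device           */
#define L_PTE_MT_DEV_NONSHARED	(5 << 2)	/* non-shared device       */
#define L_PTE_MT_MINICACHE	(6 << 2)	/* implementation defined  */
#define L_PTE_MT_WRITEALLOC	(7 << 2)	/* write back, write alloc */
#define L_PTE_MT_MASK		(7 << 2)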
--
Catalin
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-08 10:11 ` Catalin Marinas
@ 2008-01-08 10:52 ` Russell King
2008-01-08 13:54 ` Catalin Marinas
0 siblings, 1 reply; 16+ messages in thread
From: Russell King @ 2008-01-08 10:52 UTC
To: Catalin Marinas
Cc: Jared Hulbert, Nick Piggin, Martin Schwidefsky, carsteno,
Heiko Carstens, Linux Memory Management List, linux-arch
On Tue, Jan 08, 2008 at 10:11:15AM +0000, Catalin Marinas wrote:
> On Mon, 2008-01-07 at 19:45 +0000, Russell King wrote:
> > In old ARM CPUs, there were two bits that defined the characteristics of
> > the mapping - the C and B bits (C = cacheable, B = bufferable)
> >
> > Some ARMv5 (particularly Xscale-based) and all ARMv6 CPUs extend this to
> > five bits and introduce "memory types" - 3 bits of TEX, and C and B.
> >
> > Between these bits, it defines:
> >
> > - strongly ordered
> > - bufferable only *
> > - device, sharable *
> > - device, unsharable
> > - memory, bufferable and cacheable, write through, no write allocate
> > - memory, bufferable and cacheable, write back, no write allocate
> > - memory, bufferable and cacheable, write back, write allocate
> > - implementation defined combinations (eg, selecting "minicache")
> > - and a set of 16 states to allow the policy of inner and outer levels
> > of cache to be defined (two bits per level).
>
> Can we not restrict these to a maximum of 8 base types at run-time? If
> yes, we can only use 3 bits for encoding and also benefit from the
> automatic remapping in later ARM CPUs. For those not familiar with ARM,
> 8 combinations of the TEX, C, B and S (shared) bits can be specified in
> separate registers and the pte would only use 3 bits to refer to those.
> Even older cores would benefit from this as I think it is faster to read
> the encoding from an array in set_pte than doing all the bit comparisons
> to calculate the hardware pte in the current implementation.
So basically that gives us the following combinations:
TEXCB
00000 - /dev/mem and device uncachable mappings (strongly ordered)
00001 - frame buffers
00010 - write through mappings (selectable via kernel command line)
and also work-around for user read-only write-back mappings
on PXA2.
00011 - normal write back mappings
00101 - Xscale3 "shared device" work-around for strongly ordered mappings
00110 - PXA3 mini-cache or other "implementation defined features"
00111 - write back write allocate mappings
01000 - non-shared device (will be required to map some devices to userspace)
and also Xscale3 work-around for strongly ordered mappings
10111 - Xscale3 L2 cache-enabled mappings
It's unclear at present what circumstances you'd use each of the two
Xscale3 work-around bit combinations - or indeed whether there's a
printing error in the documentation concerning TEXCB=00101.
It's also unclear how to squeeze these down into a bit pattern in such
a way that we avoid picking out bits from the Linux PTE, and recombining
them so we can look them up in a table or whatever - especially given
that set_pte is a fast path and extra cycles there have a VERY noticeable
impact on overall system performance.
However, until we get around to sorting out the implementation of the
Xscale3 strongly ordered work-around which seems to be the highest
priority (and hardest to resolve) I don't think there's much more to
discuss; we don't have a clear way ahead on these issues at the moment.
All we currently have is the errata entry, and we know people are seeing
data corruption on Xscale3 platforms.
And no, I don't think we can keep it contained within the Xscale3 support
file - the set_pte method isn't passed sufficient information for that.
Conversely, setting the TEX bits behind set_pte's back by using set_pte_ext
results in loss of that information when the page is aged - again resulting
in data corruption.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-08 10:52 ` Russell King
@ 2008-01-08 13:54 ` Catalin Marinas
2008-01-08 14:08 ` Russell King
0 siblings, 1 reply; 16+ messages in thread
From: Catalin Marinas @ 2008-01-08 13:54 UTC
To: Russell King
Cc: Jared Hulbert, Nick Piggin, Martin Schwidefsky, carsteno,
Heiko Carstens, Linux Memory Management List, linux-arch
On Tue, 2008-01-08 at 10:52 +0000, Russell King wrote:
> On Tue, Jan 08, 2008 at 10:11:15AM +0000, Catalin Marinas wrote:
> > Can we not restrict these to a maximum of 8 base types at run-time? If
> > yes, we can only use 3 bits for encoding and also benefit from the
> > automatic remapping in later ARM CPUs. For those not familiar with ARM,
> > 8 combinations of the TEX, C, B and S (shared) bits can be specified in
> > separate registers and the pte would only use 3 bits to refer to those.
> > Even older cores would benefit from this as I think it is faster to read
> > the encoding from an array in set_pte than doing all the bit comparisons
> > to calculate the hardware pte in the current implementation.
>
> So basically that gives us the following combinations:
I reordered them a bit for easier commenting.
> TEXCB
> 00010 - write through mappings (selectable via kernel command line)
> and also work-around for user read-only write-back mappings
> on PXA2.
> 00011 - normal write back mappings
> 00111 - write back write allocate mappings
Do you need to use all of the above at the same time? We could have only
one type, "normal memory", and configure the desired TEX encoding at
boot time.
> 00000 - /dev/mem and device uncachable mappings (strongly ordered)
> 00101 - Xscale3 "shared device" work-around for strongly ordered mappings
> 01000 - non-shared device (will be required to map some devices to
> userspace)
> and also Xscale3 work-around for strongly ordered mappings
I don't know the details of the Xscale3 bug but would you need all of
these encodings at run-time? Do you need both "strongly ordered" and the
workaround? We could only have the "strongly ordered" type and configure
the TEX bits at boot time to be "shared device" if the workaround is
needed.
For the last one, we could have the "non-shared device" type.
> 00001 - frame buffers
This would be "shared device" on newer CPUs.
> 00110 - PXA3 mini-cache or other "implementation defined features"
> 10111 - Xscale3 L2 cache-enabled mappings
It depends on how many of these you would need at run-time. If the base
types are "normal", "strongly ordered", "shared device", "non-shared
device", you still have 4 more left (or 3 on ARMv6 with TEX remapping
enabled since one encoding is implementation defined).
> It's unclear at present what circumstances you'd use each of the two
> Xscale3 work-around bit combinations - or indeed whether there's a
> printing error in the documentation concerning TEXCB=00101.
As I said, I don't know the details of this bug and can't comment.
> It's also unclear how to squeeze these down into a bit pattern in such
> a way that we avoid picking out bits from the Linux PTE, and recombining
> them so we can look them up in a table or whatever - especially given
> that set_pte is a fast path and extra cycles there have a VERY noticeable
> impact on overall system performance.
As with the automatic remapping on ARMv6, we could use TEX[0], C and B
to form the 3-bit index into the table. For pre-ARMv6 hardware, we need a
bit of shifting and masking before looking up in the eight-entry table of 32-bit words
but, for subsequent calls to set_pte, it is likely that the table would
be in cache anyway. There is also the option of choosing 3 consecutive
bits to avoid shifting on pre-ARMv6.
I agree there would be a delay on pre-ARMv6 CPUs but the impact might
not be that big since the current set_pte implementations still do
additional bit shifting/comparison for the access permissions. The
advantage is that we free 2 bits from the TEXCB encoding.
I haven't run any benchmarks and I can't say how big the impact is but,
based on some past discussions, 3-4 more cycles in set_pte might go
unnoticed because of other, bigger overheads.
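A rough sketch of the table lookup described above, assuming a hypothetical
3-bit memory-type field at bits 2..4 of the Linux pte; real ARM set_pte
implementations are assembler and also fold in access permissions, so this
shows the memory-type step only:

/* Hardware TEX/C/B bits for each of the 8 Linux memory types,
 * filled in once at boot for the CPU we are running on. */
static u32 mem_types[8];

static inline u32 pte_to_hw_memtype(pte_t pte)
{
	/* Shift and mask out the 3-bit index, then one table load. */
	return mem_types[(pte_val(pte) >> 2) & 7];
}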
--
Catalin
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-08 13:54 ` Catalin Marinas
@ 2008-01-08 14:08 ` Russell King
0 siblings, 0 replies; 16+ messages in thread
From: Russell King @ 2008-01-08 14:08 UTC
To: Catalin Marinas
Cc: Jared Hulbert, Nick Piggin, Martin Schwidefsky, carsteno,
Heiko Carstens, Linux Memory Management List, linux-arch
On Tue, Jan 08, 2008 at 01:54:15PM +0000, Catalin Marinas wrote:
> On Tue, 2008-01-08 at 10:52 +0000, Russell King wrote:
> > It's unclear at present what circumstances you'd use each of the two
> > Xscale3 work-around bit combinations - or indeed whether there's a
> > printing error in the documentation concerning TEXCB=00101.
>
> As I said, I don't know the details of this bug and can't comment.
As I said I don't think there's anything further that can be usefully
added to this discussion until we're further down the road with this.
Even though you don't know the details of the bug report, I've mentioned
as much as I know about it at present - and that is with access to
Marvell's spec update document. When I'm further down the line with PXA3
work maybe I'll know more, but my priority at the moment on PXA3 is
suspend/resume support.
> I haven't run any benchmarks and I can't say how big the impact is but,
> based on some past discussions, 3-4 more cycles in set_pte might go
> unnoticed because of other, bigger overheads.
Except when you're clearing out page tables - for instance when a
thread exits. It's very noticeable and shows up rather well in
fork+exit tests - even shell scripts.
This was certainly the case with 2.2 kernels. Whether 2.6 kernels
are so heavyweight that it's been swapped into non-existence I
don't know.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-07 4:43 ` [rfc][patch] mm: use a pte bit to flag normal pages Nick Piggin
2008-01-07 10:30 ` Russell King
@ 2008-01-10 13:33 ` Carsten Otte
2008-01-10 23:18 ` Nick Piggin
1 sibling, 1 reply; 16+ messages in thread
From: Carsten Otte @ 2008-01-10 13:33 UTC
To: Nick Piggin
Cc: Martin Schwidefsky, carsteno, Heiko Carstens, Jared Hulbert,
Linux Memory Management List, linux-arch
Nick Piggin wrote:
> We initially wanted to do the whole vm_normal_page thing this way, with another
> pte bit, but we thought there were one or two archs with no spare bits. BTW. I
> also need this bit in order to implement my lockless get_user_pages, so I do hope
> to get it in. I'd like to know what architectures cannot spare a software bit in
> their pte_present ptes...
I've been playing with the original PAGE_SPECIAL patch a little bit, and
you can find the corresponding s390 definition below that you might want
to add to your patch queue.
It is a little unclear to me how you'd like to proceed from here:
- with PTE_SPECIAL, do we still have VM_MIXEDMAP or similar flag to
distinguish our new type of mapping from VM_PFNMAP? Which vma flags are
we supposed to use for xip mappings?
- does VM_PFNMAP work as before, or do you intend to replace it?
- what about vm_normal_page? Do you intend to have one per arch? The one
proposed by this patch breaks Jared's pfn_valid() thing and VM_PFNMAP
for archs that don't have PAGE_SPECIAL as far as I can tell.
---
Index: linux-2.6/include/asm-s390/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-s390/pgtable.h
+++ linux-2.6/include/asm-s390/pgtable.h
@@ -228,6 +228,7 @@ extern unsigned long vmalloc_end;
/* Software bits in the page table entry */
#define _PAGE_SWT 0x001 /* SW pte type bit t */
#define _PAGE_SWX 0x002 /* SW pte type bit x */
+#define _PAGE_SPECIAL 0x004 /* SW associated with special page */
/* Six different types of pages. */
#define _PAGE_TYPE_EMPTY 0x400
@@ -504,6 +505,12 @@ static inline int pte_file(pte_t pte)
return (pte_val(pte) & mask) == _PAGE_TYPE_FILE;
}
+static inline int pte_special(pte_t pte)
+{
+ BUG_ON(!pte_present(pte));
+ return (pte_val(pte) & _PAGE_SPECIAL);
+}
+
#define __HAVE_ARCH_PTE_SAME
#define pte_same(a,b) (pte_val(a) == pte_val(b))
@@ -654,6 +661,13 @@ static inline pte_t pte_mkyoung(pte_t pt
return pte;
}
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+ BUG_ON(!pte_present(pte));
+ pte_val(pte) |= _PAGE_SPECIAL;
+ return pte;
+}
+
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
* Re: [rfc][patch] mm: use a pte bit to flag normal pages
2008-01-10 13:33 ` Carsten Otte
@ 2008-01-10 23:18 ` Nick Piggin
0 siblings, 0 replies; 16+ messages in thread
From: Nick Piggin @ 2008-01-10 23:18 UTC
To: Carsten Otte
Cc: Martin Schwidefsky, carsteno, Heiko Carstens, Jared Hulbert,
Linux Memory Management List, linux-arch
On Thu, Jan 10, 2008 at 02:33:27PM +0100, Carsten Otte wrote:
> Nick Piggin wrote:
> > We initially wanted to do the whole vm_normal_page thing this way, with another
> > pte bit, but we thought there were one or two archs with no spare bits. BTW. I
> > also need this bit in order to implement my lockless get_user_pages, so I do hope
> > to get it in. I'd like to know what architectures cannot spare a software bit in
> > their pte_present ptes...
> I've been playing with the original PAGE_SPECIAL patch a little bit, and
> you can find the corresponding s390 definition below that you might want
> to add to your patch queue.
> It is a little unclear to me how you'd like to proceed from here:
> - with PTE_SPECIAL, do we still have VM_MIXEDMAP or similar flag to
> distinguish our new type of mapping from VM_PFNMAP? Which vma flags are
> we supposed to use for xip mappings?
We should not need anything in the VMA, because the vm can get all the
required information from the pte. However, we still need to keep the
MIXEMAP and PFNMAP stuff around for architectures that don't provide a
pte_special.
> - does VM_PFNMAP work as before, or do you intend to replace it?
PFNMAP can be replaced with pte_special as well. They are all schemes
used to exempt a pte from having its struct page refcounted... if we
use a bit per pte, then we need nothing else.
> - what about vm_normal_page? Do you intend to have one per arch? The one
> proposed by this patch breaks Jared's pfn_valid() thing and VM_PFNMAP
> for archs that don't have PAGE_SPECIAL as far as I can tell.
I think we'll just have two in the core code, switched by ifdef. I'll work on
a more polished patch for that.
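As a sketch of what two core implementations switched by ifdef could look
like - the __HAVE_ARCH_PTE_SPECIAL opt-in macro and the exact
VM_MIXEDMAP/VM_PFNMAP fallback rules here are assumptions, not part of the
posted patch:

#ifdef __HAVE_ARCH_PTE_SPECIAL
/* The pte itself tells us whether to refcount the page. */
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
			    pte_t pte)
{
	if (unlikely(pte_special(pte)))
		return NULL;
	return pfn_to_page(pte_pfn(pte));
}
#else
/* No spare pte bit: deduce it from the vma flags, as today. */
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
			    pte_t pte)
{
	unsigned long pfn = pte_pfn(pte);

	if (unlikely(vma->vm_flags & VM_MIXEDMAP)) {
		if (!pfn_valid(pfn))
			return NULL;
	} else if (unlikely(vma->vm_flags & VM_PFNMAP)) {
		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;

		if (pfn == vma->vm_pgoff + off)
			return NULL;
		if (!is_cow_mapping(vma->vm_flags))
			return NULL;
	}
	return pfn_to_page(pfn);
}
#endif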
>
> ---
> Index: linux-2.6/include/asm-s390/pgtable.h
> ===================================================================
> --- linux-2.6.orig/include/asm-s390/pgtable.h
> +++ linux-2.6/include/asm-s390/pgtable.h
> @@ -228,6 +228,7 @@ extern unsigned long vmalloc_end;
> /* Software bits in the page table entry */
> #define _PAGE_SWT 0x001 /* SW pte type bit t */
> #define _PAGE_SWX 0x002 /* SW pte type bit x */
> +#define _PAGE_SPECIAL 0x004 /* SW associated with special page */
>
> /* Six different types of pages. */
> #define _PAGE_TYPE_EMPTY 0x400
> @@ -504,6 +505,12 @@ static inline int pte_file(pte_t pte)
> return (pte_val(pte) & mask) == _PAGE_TYPE_FILE;
> }
>
> +static inline int pte_special(pte_t pte)
> +{
> + BUG_ON(!pte_present(pte));
> + return (pte_val(pte) & _PAGE_SPECIAL);
> +}
> +
> #define __HAVE_ARCH_PTE_SAME
> #define pte_same(a,b) (pte_val(a) == pte_val(b))
>
> @@ -654,6 +661,13 @@ static inline pte_t pte_mkyoung(pte_t pt
> return pte;
> }
>
> +static inline pte_t pte_mkspecial(pte_t pte)
> +{
> + BUG_ON(!pte_present(pte));
> + pte_val(pte) |= _PAGE_SPECIAL;
> + return pte;
> +}
> +
> #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
>
Thread overview: 16+ messages
[not found] <20071221104701.GE28484@wotan.suse.de>
[not found] ` <OFEC52C590.33A28896-ONC12573B8.0069F07E-C12573B8.006B1A41@de.ibm.com>
2008-01-07 4:43 ` [rfc][patch] mm: use a pte bit to flag normal pages Nick Piggin
2008-01-07 10:30 ` Russell King
2008-01-07 11:14 ` Nick Piggin
2008-01-07 18:49 ` Jared Hulbert
2008-01-07 19:45 ` Russell King
2008-01-07 22:52 ` Jared Hulbert
2008-01-08 2:37 ` Andi Kleen
2008-01-08 2:49 ` Nick Piggin
2008-01-08 3:31 ` Andi Kleen
2008-01-08 3:52 ` Nick Piggin
2008-01-08 10:11 ` Catalin Marinas
2008-01-08 10:52 ` Russell King
2008-01-08 13:54 ` Catalin Marinas
2008-01-08 14:08 ` Russell King
2008-01-10 13:33 ` Carsten Otte
2008-01-10 23:18 ` Nick Piggin