* [patch 1/3] x86: a new API for drivers/etc to control cache and other page attributes
@ 2008-01-25 22:49 Arjan van de Ven
2008-01-28 14:56 ` Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: Arjan van de Ven @ 2008-01-25 22:49 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, tglx, hpa
Subject: x86: a new API for drivers/etc to control cache and other page attributes
From: Arjan van de Ven <arjan@linux.intel.com>
Right now, if drivers or other code want to change, say, a cache attribute of a
page, the only API they have is change_page_attr(). c-p-a is a really bad API
for this, because it forces the caller to know *ALL* the attributes he wants
for the page, not just the 1 thing he wants to change. So code that wants to
set a page uncachable, needs to be aware of the NX status as well etc etc etc.
This patch introduces a set of new APIs for this, set_pages_<attr> and
set_memory_<attr>, that offer a logical change to the user, and leave all
attributes not implied by the requested logical change alone.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
arch/x86/mm/pageattr.c | 197 +++++++++++++++++++++++++++++++++++++++++++
include/asm-x86/cacheflush.h | 15 +++
2 files changed, 212 insertions(+)
Index: linux.trees.git/arch/x86/mm/pageattr.c
===================================================================
--- linux.trees.git.orig/arch/x86/mm/pageattr.c
+++ linux.trees.git/arch/x86/mm/pageattr.c
@@ -215,6 +215,8 @@ repeat:
* mem_map entry (pfn_valid() is false).
*
* See change_page_attr() documentation for more details.
+ *
+ * Modules and drivers should use the set_memory_* APIs instead.
*/
int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
@@ -277,6 +279,8 @@ int change_page_attr_addr(unsigned long
* (e.g. in user space) * This function only deals with the kernel linear map.
*
* For MMIO areas without mem_map use change_page_attr_addr() instead.
+ *
+ * Modules and drivers should use the set_pages_* APIs instead.
*/
int change_page_attr(struct page *page, int numpages, pgprot_t prot)
{
@@ -286,6 +290,199 @@ int change_page_attr(struct page *page,
}
EXPORT_SYMBOL(change_page_attr);
+/**
+ * change_page_attr_set - Change page table attributes in the linear mapping.
+ * @addr: Virtual address in linear mapping.
+ * @numpages: Number of pages to change
+ * @prot: Protection/caching type bits to set (PAGE_*)
+ *
+ * Returns 0 on success, otherwise a negated errno.
+ *
+ * This should be used when a page is mapped with a different caching policy
+ * than write-back somewhere - some CPUs do not like it when mappings with
+ * different caching policies exist. This changes the page attributes of the
+ * in kernel linear mapping too.
+ *
+ * Caller must call global_flush_tlb() later to make the changes active.
+ *
+ * The caller needs to ensure that there are no conflicting mappings elsewhere
+ * (e.g. in user space) * This function only deals with the kernel linear map.
+ *
+ * This function is different from change_page_attr() in that only selected bits
+ * are impacted, all other bits remain as is.
+ */
+int change_page_attr_set(unsigned long addr, int numpages, pgprot_t prot)
+{
+ pgprot_t current_prot;
+ int level;
+ pte_t *pte;
+
+ pte = lookup_address(addr, &level);
+ if (pte)
+ current_prot = pte_pgprot(*pte);
+ else
+ pgprot_val(current_prot) = 0;
+
+ pgprot_val(prot) = pgprot_val(current_prot) | pgprot_val(prot);
+
+ return change_page_attr_addr(addr, numpages, prot);
+}
+
+/**
+ * change_page_attr_clear - Change page table attributes in the linear mapping.
+ * @addr: Virtual address in linear mapping.
+ * @numpages: Number of pages to change
+ * @prot: Protection/caching type bits to clear (PAGE_*)
+ *
+ * Returns 0 on success, otherwise a negated errno.
+ *
+ * This should be used when a page is mapped with a different caching policy
+ * than write-back somewhere - some CPUs do not like it when mappings with
+ * different caching policies exist. This changes the page attributes of the
+ * in kernel linear mapping too.
+ *
+ * Caller must call global_flush_tlb() later to make the changes active.
+ *
+ * The caller needs to ensure that there are no conflicting mappings elsewhere
+ * (e.g. in user space) * This function only deals with the kernel linear map.
+ *
+ * This function is different from change_page_attr() in that only selected bits
+ * are impacted, all other bits remain as is.
+ */
+int change_page_attr_clear(unsigned long addr, int numpages, pgprot_t prot)
+{
+ pgprot_t current_prot;
+ int level;
+ pte_t *pte;
+
+ pte = lookup_address(addr, &level);
+ if (pte)
+ current_prot = pte_pgprot(*pte);
+ else
+ pgprot_val(current_prot) = 0;
+
+ pgprot_val(prot) = pgprot_val(current_prot) & ~pgprot_val(prot);
+
+ return change_page_attr_addr(addr, numpages, prot);
+}
+
+
+
+int set_memory_uc(unsigned long addr, int numpages)
+{
+ pgprot_t uncached;
+
+ pgprot_val(uncached) = _PAGE_PCD | _PAGE_PWT;
+ return change_page_attr_set(addr, numpages, uncached);
+}
+EXPORT_SYMBOL(set_memory_uc);
+
+int set_memory_cached(unsigned long addr, int numpages)
+{
+ pgprot_t uncached;
+
+ pgprot_val(uncached) = _PAGE_PCD | _PAGE_PWT;
+ return change_page_attr_clear(addr, numpages, uncached);
+}
+EXPORT_SYMBOL(set_memory_cached);
+
+int set_memory_x(unsigned long addr, int numpages)
+{
+ pgprot_t nx;
+
+ pgprot_val(nx) = _PAGE_NX;
+ return change_page_attr_clear(addr, numpages, nx);
+}
+EXPORT_SYMBOL(set_memory_x);
+
+int set_memory_nx(unsigned long addr, int numpages)
+{
+ pgprot_t nx;
+
+ pgprot_val(nx) = _PAGE_NX;
+ return change_page_attr_set(addr, numpages, nx);
+}
+EXPORT_SYMBOL(set_memory_nx);
+
+int set_memory_ro(unsigned long addr, int numpages)
+{
+ pgprot_t rw;
+
+ pgprot_val(rw) = _PAGE_RW;
+ return change_page_attr_clear(addr, numpages, rw);
+}
+EXPORT_SYMBOL(set_memory_ro);
+
+int set_memory_rw(unsigned long addr, int numpages)
+{
+ pgprot_t rw;
+
+ pgprot_val(rw) = _PAGE_RW;
+ return change_page_attr_set(addr, numpages, rw);
+}
+EXPORT_SYMBOL(set_memory_rw);
+
+int set_pages_uc(struct page *page, int numpages)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgprot_t uncached;
+
+ pgprot_val(uncached) = _PAGE_PCD | _PAGE_PWT;
+ return change_page_attr_set(addr, numpages, uncached);
+}
+EXPORT_SYMBOL(set_pages_uc);
+
+int set_pages_cached(struct page *page, int numpages)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgprot_t uncached;
+
+ pgprot_val(uncached) = _PAGE_PCD | _PAGE_PWT;
+ return change_page_attr_clear(addr, numpages, uncached);
+}
+EXPORT_SYMBOL(set_pages_cached);
+
+int set_pages_x(struct page *page, int numpages)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgprot_t nx;
+
+ pgprot_val(nx) = _PAGE_NX;
+ return change_page_attr_clear(addr, numpages, nx);
+}
+EXPORT_SYMBOL(set_pages_x);
+
+int set_pages_nx(struct page *page, int numpages)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgprot_t nx;
+
+ pgprot_val(nx) = _PAGE_NX;
+ return change_page_attr_set(addr, numpages, nx);
+}
+EXPORT_SYMBOL(set_pages_nx);
+
+int set_pages_ro(struct page *page, int numpages)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgprot_t rw;
+
+ pgprot_val(rw) = _PAGE_RW;
+ return change_page_attr_clear(addr, numpages, rw);
+}
+EXPORT_SYMBOL(set_pages_ro);
+
+int set_pages_rw(struct page *page, int numpages)
+{
+ unsigned long addr = (unsigned long)page_address(page);
+ pgprot_t rw;
+
+ pgprot_val(rw) = _PAGE_RW;
+ return change_page_attr_set(addr, numpages, rw);
+}
+EXPORT_SYMBOL(set_pages_rw);
+
+
void clflush_cache_range(void *addr, int size)
{
int i;
Index: linux.trees.git/include/asm-x86/cacheflush.h
===================================================================
--- linux.trees.git.orig/include/asm-x86/cacheflush.h
+++ linux.trees.git/include/asm-x86/cacheflush.h
@@ -27,6 +27,21 @@
void global_flush_tlb(void);
int change_page_attr(struct page *page, int numpages, pgprot_t prot);
int change_page_attr_addr(unsigned long addr, int numpages, pgprot_t prot);
+
+int set_pages_uc(struct page *page, int numpages);
+int set_pages_cached(struct page *page, int numpages);
+int set_pages_x(struct page *page, int numpages);
+int set_pages_nx(struct page *page, int numpages);
+int set_pages_ro(struct page *page, int numpages);
+int set_pages_rw(struct page *page, int numpages);
+
+int set_memory_uc(unsigned long addr, int numpages);
+int set_memory_cached(unsigned long addr, int numpages);
+int set_memory_x(unsigned long addr, int numpages);
+int set_memory_nx(unsigned long addr, int numpages);
+int set_memory_ro(unsigned long addr, int numpages);
+int set_memory_rw(unsigned long addr, int numpages);
+
void clflush_cache_range(void *addr, int size);
#ifdef CONFIG_DEBUG_RODATA
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [patch 1/3] x86: a new API for drivers/etc to control cache and other page attributes
2008-01-25 22:49 [patch 1/3] x86: a new API for drivers/etc to control cache and other page attributes Arjan van de Ven
@ 2008-01-28 14:56 ` Andi Kleen
2008-01-28 15:55 ` Arjan van de Ven
0 siblings, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2008-01-28 14:56 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: linux-kernel, mingo, tglx, hpa
Arjan van de Ven <arjan@infradead.org> writes:
> Right now, if drivers or other code want to change, say, a cache attribute of a
> page, the only API they have is change_page_attr(). c-p-a is a really bad API
> for this, because it forces the caller to know *ALL* the attributes he wants
> for the page, not just the 1 thing he wants to change. So code that wants to
> set a page uncachable, needs to be aware of the NX status as well etc etc etc.
Please think clearly through the various cases.
NX for areas which can be legitimately non NX (very few) is 100%
transparently handled in c_p_a() no matter what the caller passes in.
For DEBUG_PAGEALLOC it is similar. While it can make kernel pages
ro these should be only free pages and what business has anybody changing
attributes on arbitary free pages? No it is just a bug.
So if you look closely at the various cases there is no legitimate
reason to ever use anything other than the standard PAGE_KERNEL_*
defines with change_page_attr()
The only exception I know of is the cpa selftest which can change
attributes of arbitrary pages, but that one does a lookup_address on
its own anyways.
You basically solved a non-issue here.
-Andi
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [patch 1/3] x86: a new API for drivers/etc to control cache and other page attributes
2008-01-28 14:56 ` Andi Kleen
@ 2008-01-28 15:55 ` Arjan van de Ven
2008-01-28 17:28 ` Andi Kleen
0 siblings, 1 reply; 4+ messages in thread
From: Arjan van de Ven @ 2008-01-28 15:55 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, mingo, tglx, hpa
On Mon, 28 Jan 2008 15:56:21 +0100
Andi Kleen <andi@firstfloor.org> wrote:
> Arjan van de Ven <arjan@infradead.org> writes:
>
> > Right now, if drivers or other code want to change, say, a cache
> > attribute of a page, the only API they have is change_page_attr().
> > c-p-a is a really bad API for this, because it forces the caller to
> > know *ALL* the attributes he wants for the page, not just the 1
> > thing he wants to change. So code that wants to set a page
> > uncachable, needs to be aware of the NX status as well etc etc etc.
>
> Please think clearly through the various cases.
>
> NX for areas which can be legitimately non NX (very few) is 100%
> transparently handled in c_p_a() no matter what the caller passes in.
1) that didn't used to be the case upto and including 2.6.24
2) even for the various arch/x86 pieces there were.. interesting issues there
The new API is a lot simpler, and it is INTENT driven.
This means that PAT (for 2.6.26) no longer has to second guess various things
and capabilities, it just gets a set_memory_uc() or set_memory_wc() call and
it can do the right thing for the hw/sw combination at hand.
c_p_a() doesn't give you intent, it gives you a whole range of bits, and it's
not clear which ones the caller cares about.
> So if you look closely at the various cases there is no legitimate
> reason to ever use anything other than the standard PAGE_KERNEL_*
> defines with change_page_attr()
I looked carefully at all the cases, and there are basically 2 classes
1) Drivers needed memory to be uncached (or in 2.6.26, wc)
2) Core x86 code needing to change 1 attribute only for various reasons
The new API caters to both in a very natural way, while allowing the implementation
to deal with things like PAT etc in a much more natural way because the intent
is given, and only the intent is changed in behavior.
> The only exception I know of is the cpa selftest which can change
> attributes of arbitrary pages, but that one does a lookup_address on
> its own anyways.
yeah the selftest was quite buggy; it would look at the first page of a range,
and then mark the entire range with that attribute. This explodes nicely if the
first page is in the read only .rodata section, but the range extends well beyond that...
it made the .data section and more read only. That works great... for about 100 nanoseconds.
> You basically solved a non-issue here.
Given the amount of issues in this area of code (now and in the future with PAT), including
int the cpa-selftest that you wrote and was buggy in using cpa, I would have to disagree with you.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [patch 1/3] x86: a new API for drivers/etc to control cache and other page attributes
2008-01-28 15:55 ` Arjan van de Ven
@ 2008-01-28 17:28 ` Andi Kleen
0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2008-01-28 17:28 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Andi Kleen, linux-kernel, mingo, tglx, hpa
> The new API is a lot simpler, and it is INTENT driven.
> This means that PAT (for 2.6.26) no longer has to second guess various things
> and capabilities, it just gets a set_memory_uc() or set_memory_wc() call and
AFAIK there is no valid use case where you would ever change PAT
bits on a page you don't own. And on pages you own you should really
know the full capability bits anyways.
> c_p_a() doesn't give you intent, it gives you a whole range of bits, and it's
> not clear which ones the caller cares about.
It needs to care about all of them if it wants to do anything
useful with the page.
>
>
>
> > So if you look closely at the various cases there is no legitimate
> > reason to ever use anything other than the standard PAGE_KERNEL_*
> > defines with change_page_attr()
>
> I looked carefully at all the cases, and there are basically 2 classes
> 1) Drivers needed memory to be uncached (or in 2.6.26, wc)
Only on their own memory they control. And it better no be funny
in any other way. e.g. toggling RW there is obviously not a good idea,
unless the driver is aware of that.
> 2) Core x86 code needing to change 1 attribute only for various reasons
The two cases here are NX (transparent) and setting RO (only on freed pages
nobody else should mess with and freed pages should not have any
PAT etc. attributes either).
I maintain my earlier point that there is no valid use case for
changing bits on pages without you're being aware of all bits
(modulo NX which is transparent and self test)
>
> > The only exception I know of is the cpa selftest which can change
> > attributes of arbitrary pages, but that one does a lookup_address on
> > its own anyways.
>
> yeah the selftest was quite buggy; it would look at the first page of a range,
> and then mark the entire range with that attribute. This explodes nicely if the
Yes that was a bug I had fixed here, unfortunately that fix never made
it in because I was forced to drop all my changes with the great rewrite.
> Given the amount of issues in this area of code (now and in the future with PAT), including
> int the cpa-selftest that you wrote and was buggy in using cpa, I would have to disagree with you.
The original code I wrote was correct but then I later decided to add
in variable length changes too which was not quite correct on the first
try.
Anyways using the selftest for arguing general API changes seems somewhat
bogus to me because it is really an extremly specialized special case
that I don't expect to apply anywhere else. What it actually does is
quite bogus.
-Andi
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2008-01-28 16:54 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-25 22:49 [patch 1/3] x86: a new API for drivers/etc to control cache and other page attributes Arjan van de Ven
2008-01-28 14:56 ` Andi Kleen
2008-01-28 15:55 ` Arjan van de Ven
2008-01-28 17:28 ` Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox