* [PATCH] Allow drivers to map individual 4k pages to userspace
From: Paul Mackerras @ 2007-04-03 11:24 UTC
To: linuxppc-dev; +Cc: Joachim Fenkes
Some drivers have resources that they want to be able to map into
userspace that are 4k in size. On a kernel configured with 64k pages
we currently end up mapping the 4k we want plus another 60k of
physical address space, which could contain anything. This can
introduce security problems, for example in the case of an infiniband
adaptor where the other 60k could contain registers that some other
program is using for its communications.
This patch adds a new function, remap_4k_pfn, which drivers can use to
map a single 4k page to userspace regardless of whether the kernel is
using a 4k or a 64k page size. Like remap_pfn_range, it would
typically be called in a driver's mmap function. It only maps a
single 4k page, which on a 64k page kernel appears replicated 16 times
throughout a 64k page. On a 4k page kernel it reduces to a call to
remap_pfn_range.
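As a rough illustration of the two variants described above, here is a userspace mock in C (it is not kernel code). The names `mock_remap_pfn_range`, `struct mock_mapping`, `PAGE_4K_PFN_BIT`, and the `_on_4k`/`_on_64k` suffixes are hypothetical and exist only for this sketch; the bit value 0x20000000 and the general shape of the two wrappers are taken from the macros in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value of _PAGE_4K_PFN from the patch, renamed for this userspace mock. */
#define PAGE_4K_PFN_BIT UINT64_C(0x20000000)

/* Hypothetical record of what a remap call asked for. */
struct mock_mapping {
    uint64_t pfn;   /* page frame number passed by the driver */
    uint64_t size;  /* bytes of virtual address space covered */
    uint64_t prot;  /* protection bits, possibly with PAGE_4K_PFN_BIT */
};

/* Stand-in for remap_pfn_range(): it only records its arguments. */
int mock_remap_pfn_range(struct mock_mapping *out, uint64_t pfn,
                         uint64_t size, uint64_t prot)
{
    assert(out != NULL);
    out->pfn = pfn;
    out->size = size;
    out->prot = prot;
    return 0;
}

/* 4k-page kernel: PAGE_SIZE is 4k, so this is a plain one-page remap. */
int remap_4k_pfn_on_4k(struct mock_mapping *out, uint64_t pfn, uint64_t prot)
{
    return mock_remap_pfn_range(out, pfn, UINT64_C(4096), prot);
}

/* 64k-page kernel: one 64k Linux page is mapped, and the extra bit marks
 * pfn as naming a single 4k hardware page rather than a 64k page. */
int remap_4k_pfn_on_64k(struct mock_mapping *out, uint64_t pfn, uint64_t prot)
{
    return mock_remap_pfn_range(out, pfn, UINT64_C(65536),
                                prot | PAGE_4K_PFN_BIT);
}
```

In both cases the driver passes the same 4k frame number; only the covered size and the extra protection bit differ between the two configurations.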
The way this works on a 64k kernel is that a new bit, _PAGE_4K_PFN,
gets set on the linux PTE. This alters the way that __hash_page_4K
computes the real address to put in the HPTE. The RPN field of the
linux PTE becomes the 4k RPN directly rather than being interpreted as
a 64k RPN. Since the RPN field is 32 bits, this means that physical
addresses being mapped with remap_4k_pfn have to be below 2^44,
i.e. 0x100000000000.
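The address computation described above can be modelled in userspace C roughly as follows. This is a sketch under stated assumptions, not the kernel's code: `PTE_RPN_SHIFT` is assumed to be 32 (consistent with a 32-bit RPN field in a 64-bit PTE), `real_addr` and `PAGE_4K_PFN_BIT` are hypothetical names, and `subpage` stands for the 4k index within the 64k page that the assembly adds in for ordinary PTEs.

```c
#include <assert.h>
#include <stdint.h>

#define HW_PAGE_SHIFT   12                      /* 4k hardware page */
#define PAGE_SHIFT      16                      /* 64k Linux page */
#define PTE_RPN_SHIFT   32                      /* assumed for this sketch */
#define PAGE_4K_PFN_BIT UINT64_C(0x20000000)    /* _PAGE_4K_PFN in the patch */

/*
 * Model of the real-address calculation in htab_insert_pte.
 * pte:     Linux PTE value
 * subpage: index (0..15) of the 4k sub-page being hashed
 */
uint64_t real_addr(uint64_t pte, unsigned subpage)
{
    uint64_t rpn = pte >> PTE_RPN_SHIFT;

    assert(subpage < 16);
    if (!(pte & PAGE_4K_PFN_BIT)) {
        /* Ordinary PTE: RPN counts 64k pages, so convert it to a
         * 4k page number and select the requested sub-page. */
        rpn <<= PAGE_SHIFT - HW_PAGE_SHIFT;
        rpn += subpage;
    }
    /* Special PTE: RPN is already a 4k page number, and the sub-page
     * index is ignored, which is why the page appears 16 times. */
    return rpn << HW_PAGE_SHIFT;
}
```

Under these assumptions, a PTE carrying the bit yields the same real address for all 16 sub-page indices, and the largest possible RPN (0xffffffff) describes a 4k page that ends exactly at 2^44, which matches the limit stated above.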
The patch also factors out the code in arch/powerpc/mm/hash_utils_64.c
that deals with demoting a process to use 4k pages into one function
that gets called in the various different places where we need to do
that. There were some discrepancies between exactly what was done in
the various places, such as a call to spu_flush_all_slbs in one case
but not in others.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
diff --git a/arch/powerpc/mm/hash_low_64.S b/arch/powerpc/mm/hash_low_64.S
index 9bc0a9c..e64ce3e 100644
--- a/arch/powerpc/mm/hash_low_64.S
+++ b/arch/powerpc/mm/hash_low_64.S
@@ -445,9 +445,12 @@ END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE)
htab_insert_pte:
/* real page number in r5, PTE RPN value + index */
- rldicl r5,r31,64-PTE_RPN_SHIFT,PTE_RPN_SHIFT
+ andis. r0,r31,_PAGE_4K_PFN@h
+ srdi r5,r31,PTE_RPN_SHIFT
+ bne- htab_special_pfn
sldi r5,r5,PAGE_SHIFT-HW_PAGE_SHIFT
add r5,r5,r25
+htab_special_pfn:
sldi r5,r5,HW_PAGE_SHIFT
/* Calculate primary group hash */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 3c7fe2c..aae0853 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -573,6 +573,27 @@ unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
return pp;
}
+/*
+ * Demote a segment to using 4k pages.
+ * For now this makes the whole process use 4k pages.
+ */
+void demote_segment_4k(struct mm_struct *mm, unsigned long addr)
+{
+#ifdef CONFIG_PPC_64K_PAGES
+ if (mm->context.user_psize == MMU_PAGE_4K)
+ return;
+ mm->context.user_psize = MMU_PAGE_4K;
+ mm->context.sllp = SLB_VSID_USER | mmu_psize_defs[MMU_PAGE_4K].sllp;
+ get_paca()->context = mm->context;
+ slb_flush_and_rebolt();
+#ifdef CONFIG_SPE_BASE
+ spu_flush_all_slbs(mm);
+#endif
+#endif
+}
+
+EXPORT_SYMBOL_GPL(demote_segment_4k);
+
/* Result code is:
* 0 - handled
* 1 - normal page fault
@@ -665,15 +686,19 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
#ifndef CONFIG_PPC_64K_PAGES
rc = __hash_page_4K(ea, access, vsid, ptep, trap, local);
#else
+ /* If _PAGE_4K_PFN is set, make sure this is a 4k segment */
+ if (pte_val(*ptep) & _PAGE_4K_PFN) {
+ demote_segment_4k(mm, ea);
+ psize = MMU_PAGE_4K;
+ }
+
if (mmu_ci_restrictions) {
/* If this PTE is non-cacheable, switch to 4k */
if (psize == MMU_PAGE_64K &&
(pte_val(*ptep) & _PAGE_NO_CACHE)) {
if (user_region) {
+ demote_segment_4k(mm, ea);
psize = MMU_PAGE_4K;
- mm->context.user_psize = MMU_PAGE_4K;
- mm->context.sllp = SLB_VSID_USER |
- mmu_psize_defs[MMU_PAGE_4K].sllp;
} else if (ea < VMALLOC_END) {
/*
* some driver did a non-cacheable mapping
@@ -756,16 +781,8 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
if (mmu_ci_restrictions) {
/* If this PTE is non-cacheable, switch to 4k */
if (mm->context.user_psize == MMU_PAGE_64K &&
- (pte_val(*ptep) & _PAGE_NO_CACHE)) {
- mm->context.user_psize = MMU_PAGE_4K;
- mm->context.sllp = SLB_VSID_USER |
- mmu_psize_defs[MMU_PAGE_4K].sllp;
- get_paca()->context = mm->context;
- slb_flush_and_rebolt();
-#ifdef CONFIG_SPE_BASE
- spu_flush_all_slbs(mm);
-#endif
- }
+ (pte_val(*ptep) & _PAGE_NO_CACHE))
+ demote_segment_4k(mm, ea);
}
if (mm->context.user_psize == MMU_PAGE_64K)
__hash_page_64K(ea, access, vsid, ptep, trap, local);
diff --git a/include/asm-powerpc/pgtable-4k.h b/include/asm-powerpc/pgtable-4k.h
index 345d9b0..a28fa8b 100644
--- a/include/asm-powerpc/pgtable-4k.h
+++ b/include/asm-powerpc/pgtable-4k.h
@@ -97,3 +97,6 @@
#define pud_ERROR(e) \
printk("%s:%d: bad pud %08lx.\n", __FILE__, __LINE__, pud_val(e))
+
+#define remap_4k_pfn(vma, addr, pfn, prot) \
+ remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, (prot))
diff --git a/include/asm-powerpc/pgtable-64k.h b/include/asm-powerpc/pgtable-64k.h
index 4b7126c..5e84f07 100644
--- a/include/asm-powerpc/pgtable-64k.h
+++ b/include/asm-powerpc/pgtable-64k.h
@@ -35,6 +35,7 @@
#define _PAGE_HPTE_SUB 0x0ffff000 /* combo only: sub pages HPTE bits */
#define _PAGE_HPTE_SUB0 0x08000000 /* combo only: first sub page */
#define _PAGE_COMBO 0x10000000 /* this is a combo 4k page */
+#define _PAGE_4K_PFN 0x20000000 /* PFN is for a single 4k page */
#define _PAGE_F_SECOND 0x00008000 /* full page: hidx bits */
#define _PAGE_F_GIX 0x00007000 /* full page: hidx bits */
@@ -93,6 +94,10 @@
#define pte_pagesize_index(pte) \
(((pte) & _PAGE_COMBO)? MMU_PAGE_4K: MMU_PAGE_64K)
+#define remap_4k_pfn(vma, addr, pfn, prot) \
+ remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, \
+ __pgprot(pgprot_val((prot)) | _PAGE_4K_PFN))
+
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_PGTABLE_64K_H */
* Re: [PATCH] Allow drivers to map individual 4k pages to userspace
From: Roland Dreier @ 2007-04-04 2:34 UTC
To: Paul Mackerras; +Cc: linuxppc-dev, Joachim Fenkes
Warning: I am somewhat of a PowerPC ignoramus, so this reply may not
make sense.
> Some drivers have resources that they want to be able to map into
> userspace that are 4k in size. On a kernel configured with 64k pages
> we currently end up mapping the 4k we want plus another 60k of
> physical address space, which could contain anything. This can
> introduce security problems, for example in the case of an infiniband
> adaptor where the other 60k could contain registers that some other
> program is using for its communications.
I assume this is an eHCA-specific problem. Mellanox adapters (which
I'm much more familiar with) allow the driver to pass in the system
page size at initialization time, and make the register pages of size
equal to the system size.
Another approach is simply not to enable the other 4K pages that are
exposed when mapping a 64K page into userspace - ie only use 1/16th of
the available contexts. Although perhaps eHCA has such a limited # of
contexts that this is not practical.
> This patch adds a new function, remap_4k_pfn, which drivers can use to
> map a single 4k page to userspace regardless of whether the kernel is
> using a 4k or a 64k page size. Like remap_pfn_range, it would
> typically be called in a driver's mmap function. It only maps a
> single 4k page, which on a 64k page kernel appears replicated 16 times
> throughout a 64k page. On a 4k page kernel it reduces to a call to
> remap_pfn_range.
The problem with this approach is that remap_4k_pfn is
powerpc-specific, right? For example, I don't believe that an ia64
kernel running with 16K pages could implement this. Which means that
any driver that calls remap_4k_pfn is now powerpc-specific (or has an
#ifdef to work around this).
In fact my impression was that the powerpc MMU is not part of the
architecture, in the sense that a new implementation could come along
that supported 64K pages without the ability to do this 4K aliasing
trick. Which would make multiplatform kernels very painful, since
remap_4k_pfn might work for some platforms the kernel could boot on.
Or is this not a problem?
- R.
* Re: [PATCH] Allow drivers to map individual 4k pages to userspace
From: Benjamin Herrenschmidt @ 2007-04-04 3:23 UTC
To: Roland Dreier; +Cc: linuxppc-dev, Joachim Fenkes, Paul Mackerras
> The problem with this approach is that remap_4k_pfn is
> powerpc-specific, right? For example, I don't believe that an ia64
> kernel running with 16K pages could implement this. Which means that
> any driver that calls remap_4k_pfn is now powerpc-specific (or has an
> #ifdef to work around this).
>
> In fact my impression was that the powerpc MMU is not part of the
> architecture, in the sense that a new implementation could come along
> that supported 64K pages without the ability to do this 4K aliasing
> trick. Which would make multiplatform kernels very painful, since
> remap_4k_pfn might work for some platforms the kernel could boot on.
> Or is this not a problem?
It's somewhat architected. I doubt there will ever be a processor that
can have an eHCA and doesn't support that trick. The thing is, eHCA is
platform specific, so the remap_4k_pfn would have to be called by driver
specific code, but that's not a problem since that driver will only ever
be used on those platforms that support that call.
Ben.
* Re: [PATCH] Allow drivers to map individual 4k pages to userspace
From: Roland Dreier @ 2007-04-04 4:07 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Joachim Fenkes, Paul Mackerras
> It's somewhat architected. I doubt there will ever be a processor that
> can have an eHCA and doesn't support that trick. The thing is, eHCA is
> platform specific, so the remap_4k_pfn would have to be called by driver
> specific code, but that's not a problem since that driver will only ever
> be used on those platforms that support that call.
If I'm going off on something irrelevant, just tell me. But is there
a chance that you would want to build a kernel that can boot on both a
platform that has eHCA, and also on some other platform that cannot
support remap_4k_pfn? If so does this approach cause problems?
- R.
* Re: [PATCH] Allow drivers to map individual 4k pages to userspace
From: Benjamin Herrenschmidt @ 2007-04-04 4:43 UTC
To: Roland Dreier; +Cc: linuxppc-dev, Joachim Fenkes, Paul Mackerras
On Tue, 2007-04-03 at 21:07 -0700, Roland Dreier wrote:
> > It's somewhat architected. I doubt there will ever be a processor that
> > can have an eHCA and doesn't support that trick. The thing is, eHCA is
> > platform specific, so the remap_4k_pfn would have to be called by driver
> > specific code, but that's not a problem since that driver will only ever
> > be used on those platforms that support that call.
>
> If I'm going off on something irrelevant, just tell me. But is there
> a chance that you would want to build a kernel that can boot on both a
> platform that has eHCA, and also on some other platform that cannot
> support remap_4k_pfn? If so does this approach cause problems?
As long as remap_4k_pfn() is only called on the platform that supports
it, it's fine... and if we ever end up having such platforms (we don't
at the moment, remap_4k_pfn() is basically a forced 4K fallback which is
always possible in a machine running a 64K base page kernel), we'll make
it fail at runtime. Again, as long as it's called only by the eHCA
driver, we are pretty certain it will only ever be called on platforms
where it makes sense.
Ben.
* Re: [PATCH] Allow drivers to map individual 4k pages to userspace
From: Paul Mackerras @ 2007-04-04 5:14 UTC
To: Roland Dreier; +Cc: Joachim Fenkes, linuxppc-dev
Roland Dreier writes:
> If I'm going off on something irrelevant, just tell me. But is there
> a chance that you would want to build a kernel that can boot on both a
> platform that has eHCA, and also on some other platform that cannot
> support remap_4k_pfn?
In a word, no. :)
Only 64-bit machines have eHCA. A 64-bit powerpc kernel only supports
64-bit machines, and all 64-bit powerpc machines can support
remap_4k_pfn. I don't see any difficulty arising with future 64-bit
embedded processors either.
Paul.
* Re: [PATCH] Allow drivers to map individual 4k pages to userspace
From: Paul Mackerras @ 2007-04-04 4:41 UTC
To: Roland Dreier; +Cc: linuxppc-dev, Joachim Fenkes
Roland Dreier writes:
> I assume this is an eHCA-specific problem.
Guilty as charged. :)
> Another approach is simply not to enable the other 4K pages that are
> exposed when mapping a 64K page into userspace - ie only use 1/16th of
> the available contexts. Although perhaps eHCA has such a limited # of
> contexts that this is not practical.
I believe the hypervisor controls the allocation of contexts to
partitions, so this wouldn't be practical.
> The problem with this approach is that remap_4k_pfn is
> powerpc-specific, right?
It is powerpc-specific, but so is the eHCA driver...
> For example, I don't believe that an ia64
> kernel running with 16K pages could implement this. Which means that
> any driver that calls remap_4k_pfn is now powerpc-specific (or has an
> #ifdef to work around this).
I am a complete ia64-ignoramus, so I couldn't say whether ia64 could
do it or not.
> In fact my impression was that the powerpc MMU is not part of the
> architecture, in the sense that a new implementation could come along
> that supported 64K pages without the ability to do this 4K aliasing
> trick. Which would make multiplatform kernels very painful, since
> remap_4k_pfn might work for some platforms the kernel could boot on.
> Or is this not a problem?
The PowerPC architecture distinguishes between server and embedded
processors. The MMU is part of the architecture for server
processors, and is specified reasonably tightly. For embedded there
is also a specification of the MMU but it is very loose. In any case,
all PowerPC processors can do 4k pages, so I don't see any problem.
Paul.