* parisc kernel cache flush issue with mprotect()
@ 2025-06-25 20:56 Helge Deller
2025-07-03 19:22 ` Dave Anglin
0 siblings, 1 reply; 4+ messages in thread
From: Helge Deller @ 2025-06-25 20:56 UTC (permalink / raw)
To: Dave Anglin, linux-parisc
[-- Attachment #1: Type: text/plain, Size: 548 bytes --]
While analyzing testcase failures in the libunwind package,
I found a parisc Linux kernel issue.
The attached testcase initilizes a memory region, then
calls mprotect(PROT_NONE) on that region and thus effectively
should prevent userspace and kernel to access that memory.
But on parisc it seems the CPU cache isn't flushed when
changing the protection of a memory region with mprotect().
So, the kernel will wrongly not fault when accessing this memory.
I think we are missing to purge the data cache when changing
the protection.
Ideas?
Helge
[-- Attachment #2: mprotect-testcase.c --]
[-- Type: text/x-csrc, Size: 1021 bytes --]
/* Testcase based on Ltest-mem-validate.c from libunwind */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
int main(int argc, char **argv)
{
unsigned long page_size = sysconf(_SC_PAGESIZE);
char *p = malloc(3 * page_size);
char *p_aligned;
/* initialize memory region. If not initialized, write syscall below will correctly return EFAULT. */
if (1)
memset(p, 'X', 3 * page_size);
p_aligned = (char *) ((((uintptr_t) p) + (2*page_size - 1)) & ~(page_size - 1));
/* Drop PROT_READ protection. Kernel and userspace should fault when accessing that memory region */
mprotect(p_aligned, page_size, PROT_NONE);
/* the following write() should return EFAULT, since PROT_READ was dropped by previous mprotect() */
int ret = write(2, p_aligned, 1);
if (!ret || errno != EFAULT)
printf("\n FAILURE: write() did not returned expected EFAULT value\n");
return 0;
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: parisc kernel cache flush issue with mprotect()
2025-06-25 20:56 parisc kernel cache flush issue with mprotect() Helge Deller
@ 2025-07-03 19:22 ` Dave Anglin
2025-07-04 19:49 ` Dave Anglin
0 siblings, 1 reply; 4+ messages in thread
From: Dave Anglin @ 2025-07-03 19:22 UTC (permalink / raw)
To: Helge Deller, linux-parisc
[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]
On 2025-06-25 4:56 p.m., Helge Deller wrote:
> While analyzing testcase failures in the libunwind package,
> I found a parisc Linux kernel issue.
>
> The attached testcase initilizes a memory region, then
> calls mprotect(PROT_NONE) on that region and thus effectively
> should prevent userspace and kernel to access that memory.
>
> But on parisc it seems the CPU cache isn't flushed when
> changing the protection of a memory region with mprotect().
> So, the kernel will wrongly not fault when accessing this memory.
>
> I think we are missing to purge the data cache when changing
> the protection.
> Ideas?
The attached patch fixes the test. It also
After a lot of study regarding our TLB handling, I realized that user pages can always be
read by the kernel even when _PAGE_READ is dropped. We need to use the probe instruction
to determine whether a page is accessible in user mode or not.
Probably, get_user needs a similar fix.
I've wondered if _PAGE_WRITE pages should always be writable by kernel? We could put
_PAGE_WRITE in the most significant bit of PL2. Access mode 0 is nominally for public pages
as protection checking is disabled for it. Maybe shouldn't use it? Does _PAGE_WRITE imply
_PAGE_READ on Linux?
Dave
--
John David Anglin dave@parisc-linux.org
[-- Attachment #2: check-user-read-access.txt --]
[-- Type: text/plain, Size: 1412 bytes --]
diff --git a/arch/parisc/include/asm/special_insns.h b/arch/parisc/include/asm/special_insns.h
index 51f40eaf7780..3bec9bc2075a 100644
--- a/arch/parisc/include/asm/special_insns.h
+++ b/arch/parisc/include/asm/special_insns.h
@@ -32,6 +32,21 @@
pa; \
})
+#define prober_user(va) ({ \
+ unsigned long pa; \
+ __asm__ __volatile__( \
+ "copy %%r0,%0\n" \
+ "8:\tprobei,r (%%sr3,%1),3,%0\n" \
+ "9:\n" \
+ ASM_EXCEPTIONTABLE_ENTRY(8b, 9b, \
+ "or %%r0,%%r0,%%r0") \
+ : "=&r" (pa) \
+ : "r" (va) \
+ : "memory" \
+ ); \
+ pa; \
+})
+
#define CR_EIEM 15 /* External Interrupt Enable Mask */
#define CR_CR16 16 /* CR16 Interval Timer */
#define CR_EIRR 23 /* External Interrupt Request Register */
diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index 5fc0c852c84c..2116b60d4072 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -32,6 +32,15 @@ EXPORT_SYMBOL(raw_copy_to_user);
unsigned long raw_copy_from_user(void *dst, const void __user *src,
unsigned long len)
{
+ unsigned long start = (unsigned long) src;
+ unsigned long end = start + len;
+
+ /* Check region is user accessible */
+ while (start < end) {
+ if (!prober_user(src))
+ return len;
+ start += PAGE_SIZE;
+ }
mtsp(get_user_space(), SR_TEMP1);
mtsp(get_kernel_space(), SR_TEMP2);
return pa_memcpy(dst, (void __force *)src, len);
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: parisc kernel cache flush issue with mprotect()
2025-07-03 19:22 ` Dave Anglin
@ 2025-07-04 19:49 ` Dave Anglin
2025-07-05 8:06 ` Jeroen Roovers
0 siblings, 1 reply; 4+ messages in thread
From: Dave Anglin @ 2025-07-04 19:49 UTC (permalink / raw)
To: Helge Deller, linux-parisc
[-- Attachment #1: Type: text/plain, Size: 1655 bytes --]
On 2025-07-03 3:22 p.m., Dave Anglin wrote:
> On 2025-06-25 4:56 p.m., Helge Deller wrote:
>> While analyzing testcase failures in the libunwind package,
>> I found a parisc Linux kernel issue.
>>
>> The attached testcase initilizes a memory region, then
>> calls mprotect(PROT_NONE) on that region and thus effectively
>> should prevent userspace and kernel to access that memory.
>>
>> But on parisc it seems the CPU cache isn't flushed when
>> changing the protection of a memory region with mprotect().
>> So, the kernel will wrongly not fault when accessing this memory.
>>
>> I think we are missing to purge the data cache when changing
>> the protection.
>> Ideas?
> The attached patch fixes the test. It also
>
> After a lot of study regarding our TLB handling, I realized that user pages can always be
> read by the kernel even when _PAGE_READ is dropped. We need to use the probe instruction
> to determine whether a page is accessible in user mode or not.
>
> Probably, get_user needs a similar fix.
>
> I've wondered if _PAGE_WRITE pages should always be writable by kernel? We could put
> _PAGE_WRITE in the most significant bit of PL2. Access mode 0 is nominally for public pages
> as protection checking is disabled for it. Maybe shouldn't use it? Does _PAGE_WRITE imply
> _PAGE_READ on Linux?
Attached is the current patch that I'm testing on mx3210. I believe it fixes everything that I noticed
debugging the mprotect-testcase. It also has your command line patch and some old radeon changes.
Dave
--
John David Anglin dave@parisc-linux.org
[-- Attachment #2: linux-6.16.0-rc4+-20250704.txt --]
[-- Type: text/plain, Size: 11711 bytes --]
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index 1a86a4370b29..2c139a4dbf4b 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -276,7 +276,7 @@ extern unsigned long *empty_zero_page;
#define pte_none(x) (pte_val(x) == 0)
#define pte_present(x) (pte_val(x) & _PAGE_PRESENT)
#define pte_user(x) (pte_val(x) & _PAGE_USER)
-#define pte_clear(mm, addr, xp) set_pte(xp, __pte(0))
+#define pte_clear(mm, addr, xp) set_pte_at((mm), (addr), (xp), __pte(0))
#define pmd_flag(x) (pmd_val(x) & PxD_FLAG_MASK)
#define pmd_address(x) ((unsigned long)(pmd_val(x) &~ PxD_FLAG_MASK) << PxD_VALUE_SHIFT)
@@ -392,6 +392,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
}
}
#define set_ptes set_ptes
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
/* Used for deferring calls to flush_dcache_page() */
@@ -456,7 +457,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned
if (!pte_young(pte)) {
return 0;
}
- set_pte(ptep, pte_mkold(pte));
+ set_pte_at(vma->vm_mm, addr, ptep, pte_mkold(pte));
return 1;
}
@@ -466,7 +467,7 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *pt
struct mm_struct;
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
- set_pte(ptep, pte_wrprotect(*ptep));
+ set_pte_at(mm, addr, ptep, pte_wrprotect(*ptep));
}
#define pte_same(A,B) (pte_val(A) == pte_val(B))
diff --git a/arch/parisc/include/asm/special_insns.h b/arch/parisc/include/asm/special_insns.h
index 51f40eaf7780..3bec9bc2075a 100644
--- a/arch/parisc/include/asm/special_insns.h
+++ b/arch/parisc/include/asm/special_insns.h
@@ -32,6 +32,21 @@
pa; \
})
+#define prober_user(va) ({ \
+ unsigned long pa; \
+ __asm__ __volatile__( \
+ "copy %%r0,%0\n" \
+ "8:\tprobei,r (%%sr3,%1),3,%0\n" \
+ "9:\n" \
+ ASM_EXCEPTIONTABLE_ENTRY(8b, 9b, \
+ "or %%r0,%%r0,%%r0") \
+ : "=&r" (pa) \
+ : "r" (va) \
+ : "memory" \
+ ); \
+ pa; \
+})
+
#define CR_EIEM 15 /* External Interrupt Enable Mask */
#define CR_CR16 16 /* CR16 Interval Timer */
#define CR_EIRR 23 /* External Interrupt Request Register */
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index db531e58d70e..37ca484cc495 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -429,7 +429,7 @@ static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
return ptep;
}
-static inline bool pte_needs_flush(pte_t pte)
+static inline bool pte_needs_cache_flush(pte_t pte)
{
return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_NO_CACHE))
== (_PAGE_PRESENT | _PAGE_ACCESSED);
@@ -630,7 +630,7 @@ static void flush_cache_page_if_present(struct vm_area_struct *vma,
ptep = get_ptep(vma->vm_mm, vmaddr);
if (ptep) {
pte = ptep_get(ptep);
- needs_flush = pte_needs_flush(pte);
+ needs_flush = pte_needs_cache_flush(pte);
pte_unmap(ptep);
}
if (needs_flush)
@@ -841,7 +841,7 @@ void flush_cache_vmap(unsigned long start, unsigned long end)
}
vm = find_vm_area((void *)start);
- if (WARN_ON_ONCE(!vm)) {
+ if (!vm) {
flush_cache_all();
return;
}
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index ea57bcc21dc5..c9c7e46c4a25 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -499,6 +499,12 @@
* this happens is quite subtle, read below */
.macro make_insert_tlb spc,pte,prot,tmp
space_to_prot \spc \prot /* create prot id from space */
+
+#if _PAGE_SPECIAL_BIT == _PAGE_DMB_BIT
+ /* need to drop DMB bit, as it's used as SPECIAL flag */
+ depi 0,_PAGE_SPECIAL_BIT,1,\pte
+#endif
+
/* The following is the real subtlety. This is depositing
* T <-> _PAGE_REFTRAP
* D <-> _PAGE_DIRTY
@@ -511,17 +517,18 @@
* Finally, _PAGE_READ goes in the top bit of PL1 (so we
* trigger an access rights trap in user space if the user
* tries to read an unreadable page */
-#if _PAGE_SPECIAL_BIT == _PAGE_DMB_BIT
- /* need to drop DMB bit, as it's used as SPECIAL flag */
- depi 0,_PAGE_SPECIAL_BIT,1,\pte
-#endif
depd \pte,8,7,\prot
/* PAGE_USER indicates the page can be read with user privileges,
* so deposit X1|11 to PL1|PL2 (remember the upper bit of PL1
- * contains _PAGE_READ) */
+ * contains _PAGE_READ). While the kernel can't directly write
+ * user pages which have _PAGE_WRITE zero, it can read pages
+ * pages which have _PAGE_READ zero (PL <= PL1). Thus, the kernel
+ * exception fault handler doesn't trigger when reading pages
+ * that aren't user read accessible */
extrd,u,*= \pte,_PAGE_USER_BIT+32,1,%r0
depdi 7,11,3,\prot
+
/* If we're a gateway page, drop PL2 back to zero for promotion
* to kernel privilege (so we can execute the page as kernel).
* Any privilege promotion page always denys read and write */
diff --git a/arch/parisc/lib/memcpy.c b/arch/parisc/lib/memcpy.c
index 5fc0c852c84c..2116b60d4072 100644
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -32,6 +32,15 @@ EXPORT_SYMBOL(raw_copy_to_user);
unsigned long raw_copy_from_user(void *dst, const void __user *src,
unsigned long len)
{
+ unsigned long start = (unsigned long) src;
+ unsigned long end = start + len;
+
+ /* Check region is user accessible */
+ while (start < end) {
+ if (!prober_user(src))
+ return len;
+ start += PAGE_SIZE;
+ }
mtsp(get_user_space(), SR_TEMP1);
mtsp(get_kernel_space(), SR_TEMP2);
return pa_memcpy(dst, (void __force *)src, len);
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index c39de84e98b0..bb41a2d2028d 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -19,6 +19,7 @@
#include <linux/uaccess.h>
#include <linux/hugetlb.h>
#include <linux/perf_event.h>
+#include <linux/sched/mm.h>
#include <asm/traps.h>
@@ -234,6 +235,146 @@ const char *trap_name(unsigned long code)
return t ? t : "Unknown trap";
}
+/*
+ * If the user used setproctitle(), we just get the string from
+ * user space at arg_start, and limit it to a maximum of one page.
+ */
+static ssize_t get_mm_proctitle(struct mm_struct *mm,
+ size_t count, unsigned long pos,
+ unsigned long arg_start)
+{
+ char *page;
+ int ret, got;
+
+ if (pos >= PAGE_SIZE)
+ return 0;
+
+ page = (char *)__get_free_page(GFP_KERNEL);
+ if (!page)
+ return -ENOMEM;
+
+ ret = 0;
+ got = access_remote_vm(mm, arg_start, page, PAGE_SIZE, FOLL_ANON);
+ if (got > 0) {
+ int len = strnlen(page, got);
+
+ /* Include the NUL character if it was found */
+ if (0 && len < got)
+ len++;
+
+ if (len > pos) {
+ len -= pos;
+ if (len > count)
+ len = count;
+ pr_cont("%.*s", len, page+pos);
+ ret = len;
+ }
+ }
+ free_page((unsigned long)page);
+ return ret;
+}
+
+static ssize_t get_mm_cmdline(struct mm_struct *mm, size_t count)
+{
+ unsigned long arg_start, arg_end, env_start, env_end;
+ unsigned long pos, len;
+ char *page, c;
+
+ /* Check if process spawned far enough to have cmdline. */
+ if (!mm->env_end)
+ return 0;
+
+ spin_lock(&mm->arg_lock);
+ arg_start = mm->arg_start;
+ arg_end = mm->arg_end;
+ env_start = mm->env_start;
+ env_end = mm->env_end;
+ spin_unlock(&mm->arg_lock);
+
+ if (arg_start >= arg_end)
+ return 0;
+
+ /*
+ * We allow setproctitle() to overwrite the argument
+ * strings, and overflow past the original end. But
+ * only when it overflows into the environment area.
+ */
+ if (env_start != arg_end || env_end < env_start)
+ env_start = env_end = arg_end;
+ len = env_end - arg_start;
+
+ /* We're not going to care if "*ppos" has high bits set */
+ pos = 0;
+ if (pos >= len)
+ return 0;
+ if (count > len - pos)
+ count = len - pos;
+ if (!count)
+ return 0;
+
+ /*
+ * Magical special case: if the argv[] end byte is not
+ * zero, the user has overwritten it with setproctitle(3).
+ *
+ * Possible future enhancement: do this only once when
+ * pos is 0, and set a flag in the 'struct file'.
+ */
+ if (access_remote_vm(mm, arg_end-1, &c, 1, FOLL_ANON) == 1 && c)
+ return get_mm_proctitle(mm, count, pos, arg_start);
+
+ /*
+ * For the non-setproctitle() case we limit things strictly
+ * to the [arg_start, arg_end[ range.
+ */
+ pos += arg_start;
+ if (pos < arg_start || pos >= arg_end)
+ return 0;
+ if (count > arg_end - pos)
+ count = arg_end - pos;
+
+ page = (char *)__get_free_page(GFP_KERNEL);
+ if (!page)
+ return -ENOMEM;
+
+ // limit to one page: (HELGE)
+ count = min_t(size_t, PAGE_SIZE, count);
+
+ len = 0;
+ while (count) {
+ int got;
+ int i;
+ size_t size = min_t(size_t, PAGE_SIZE, count);
+
+ got = access_remote_vm(mm, pos, page, size, FOLL_ANON);
+ if (got <= 0)
+ break;
+ for (i = got - 1; i >= 0; i--)
+ if (page[i] == 0)
+ page[i] = ' ';
+ pr_cont("%s", page);
+ pos += got;
+ len += got;
+ count -= got;
+ }
+
+ free_page((unsigned long)page);
+ return len;
+}
+
+static void print_task_cmdline(struct task_struct *tsk)
+{
+ struct mm_struct *mm;
+
+ mm = get_task_mm(tsk);
+ if (!mm)
+ return;
+
+ pr_warn("command line: ");
+ get_mm_cmdline(mm, PAGE_SIZE);
+ pr_cont("\n");
+ mmput(mm);
+}
+
/*
* Print out info about fatal segfaults, if the show_unhandled_signals
* sysctl is set:
@@ -261,6 +402,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long code,
pr_cont(" vm_start = 0x%08lx, vm_end = 0x%08lx\n",
vma->vm_start, vma->vm_end);
+ print_task_cmdline(tsk);
+
show_regs(regs);
}
@@ -363,6 +506,10 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
mmap_read_unlock(mm);
bad_area_nosemaphore:
+ if (!user_mode(regs) && fixup_exception(regs)) {
+ return;
+ }
+
if (user_mode(regs)) {
int signo, si_code;
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 7051c9c909c2..a84eba2fb429 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -174,6 +174,8 @@ drm_clflush_virt_range(void *addr, unsigned long length)
if (wbinvd_on_all_cpus())
pr_err("Timed out waiting for cache flush\n");
+#elif defined(CONFIG_PARISC)
+ flush_kernel_dcache_range((unsigned long) addr, length);
#else
WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
#endif
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 80703417d8a1..6325341d7ed0 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -643,6 +643,10 @@ void r100_hpd_fini(struct radeon_device *rdev)
*/
void r100_pci_gart_tlb_flush(struct radeon_device *rdev)
{
+ /* flush gtt[] gart table entries from r100_pci_gart_set_page() */
+ dma_sync_single_for_device(&rdev->pdev->dev, rdev->gart.table_addr,
+ rdev->gart.table_size, DMA_TO_DEVICE);
+
/* TODO: can we do somethings here ? */
/* It seems hw only cache one entry so we should discard this
* entry otherwise if first GPU GART read hit this entry it
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 581ae20c46e4..1e19ceedc59a 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -31,6 +31,7 @@
#include <drm/drm_device.h>
#include <drm/drm_file.h>
+#include <drm/drm_cache.h>
#include "radeon.h"
@@ -179,6 +180,9 @@ void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring,
radeon_ring_write(ring, ring->nop);
}
mb();
+ if (IS_ENABLED(CONFIG_PARISC))
+ drm_clflush_virt_range((void *)&ring->ring[0], ring->wptr * sizeof(uint32_t));
+
/* If we are emitting the HDP flush via MMIO, we need to do it after
* all CPU writes to VRAM finished.
*/
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: parisc kernel cache flush issue with mprotect()
2025-07-04 19:49 ` Dave Anglin
@ 2025-07-05 8:06 ` Jeroen Roovers
0 siblings, 0 replies; 4+ messages in thread
From: Jeroen Roovers @ 2025-07-05 8:06 UTC (permalink / raw)
To: Dave Anglin; +Cc: Helge Deller, linux-parisc
On Fri, 4 Jul 2025 15:49:07 -0400
Dave Anglin <dave@parisc-linux.org> wrote:
> Attached is the current patch that I'm testing on mx3210. I believe
> it fixes everything that I noticed debugging the mprotect-testcase.
> It also has your command line patch and some old radeon changes.
>
> Dave
>
While the kernel can't directly write user pages which have
_PAGE_WRITE zero, it can read pages pages which have [...]
^^^^^ ^^^^^
Kind regards,
jer
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-07-05 8:07 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-25 20:56 parisc kernel cache flush issue with mprotect() Helge Deller
2025-07-03 19:22 ` Dave Anglin
2025-07-04 19:49 ` Dave Anglin
2025-07-05 8:06 ` Jeroen Roovers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).