* [PATCH v2 0/8] mm/mprotect: Fix dax puds
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Rik van Riel, Nicholas Piggin,
Borislav Petkov, Kirill A . Shutemov, Dan Williams,
Vlastimil Babka, Oscar Salvador, Mel Gorman, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
[Based on mm-unstable, commit 31334cf98dbd, July 2nd]
v2:
- Added tags
- Fix wrong pmd helper used in powerpc
- Added patch "mm/x86: arch_check_zapped_pud()" [Rick]
- Do proper dirty bit shifts for shadow stack on puds [Dave]
- Add missing page_table_check hooks in pudp_establish() [Dave]
v1: https://lore.kernel.org/r/20240621142504.1940209-1-peterx@redhat.com
Dax has supported pud pages for a while, but mprotect() on puds has been
missing since the start. This series fixes that by providing pud handling
in mprotect(). The longer-term goal is to add more types of pud mappings,
like hugetlb or pfnmaps; this series paves the way by fixing the known pud
entries first.
Considering that nobody reported this until I looked at those other types
of pud mappings, I don't think this needs to be a fix for stable or be
backported. I would guess whoever cares about mprotect() doesn't care
about 1G dax puds yet, and vice versa. I hope fixing it in new kernels
only is fine, but I'm open to suggestions.
A few small things need to change to teach mprotect() to work on PUDs.
E.g., it starts with dropping NUMA_HUGE_PTE_UPDATES, which stops making
sense once there can be more than one type of huge pte. We'll also need
to push the mmu notifiers from the pmd to the pud layer, which might need
some attention, but so far I think it's safe. For such details, please
refer to each patch's commit message.
The mprotect() pud path should be straightforward, as I kept it as simple
as possible. There's no NUMA handling, as dax simply doesn't support
that. There's also no userfault involvement: file memory (even when it
works with userfault-wp async mode) needs to split a pud first, so pud
entries don't yet need to know about userfault's existence (hugetlb
entries will, but that's for later).
Tests
=====
What I did test:
- cross-build tests that I normally cover [1]
- smoke tested on x86_64 with the simplest program [2] doing mprotect()
  on a dev_dax 1G PUD, using QEMU's nvdimm emulation [3] and ndctl to
  create namespaces with proper alignment. This used to throw "bad pud"
  but now runs through fine. I also checked that SIGBUS is raised on
  illegal access to protected puds. A minimal sketch of such a test is
  included below, after the test notes.
What I didn't test:
- fsdax: I wanted to give it a shot too, but only then noticed that it
  doesn't seem to be supported (according to dax_iomap_fault(), which
  always falls back on PUD_ORDER). I do remember it being supported
  before; I could be missing something important there.. please shout if
  so.
- userfault wp-async: I also wanted to test that userfault-wp async can
  split huge puds (here it's simply a pud clear.. though), but it won't
  work for devdax anyway, since faults smaller than 1G aren't allowed in
  this case. So this was skipped too.
- Power, as I have no hardware on hand.
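For reference, below is a minimal sketch of the dev_dax smoke test
described above. It is only an illustration: the device path and the
pre-created namespace (e.g. via something like "ndctl create-namespace
-m devdax -a 1G") are assumptions about the local setup; [2] is the real
program.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_SIZE (1UL << 30)	/* 1G, so faults install a PUD */

    int main(void)
    {
            int fd = open("/dev/dax0.0", O_RDWR);	/* assumed device */

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            char *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            p[0] = 1;	/* fault in the 1G pud mapping */
            /* This used to trigger "bad pud" in dmesg */
            if (mprotect(p, MAP_SIZE, PROT_READ)) {
                    perror("mprotect");
                    return 1;
            }
            /* A write to p here should now raise SIGBUS/SIGSEGV */
            munmap(p, MAP_SIZE);
            close(fd);
            return 0;
    }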
Thanks,
[1] https://gitlab.com/peterx/lkb-harness/-/blob/main/config.json
[2] https://github.com/xzpeter/clibs/blob/master/misc/dax.c
[3] https://github.com/qemu/qemu/blob/master/docs/nvdimm.txt
Peter Xu (8):
mm/dax: Dump start address in fault handler
mm/mprotect: Remove NUMA_HUGE_PTE_UPDATES
mm/mprotect: Push mmu notifier to PUDs
mm/powerpc: Add missing pud helpers
mm/x86: Make pud_leaf() only care about PSE bit
mm/x86: arch_check_zapped_pud()
mm/x86: Add missing pud helpers
mm/mprotect: fix dax pud handling
arch/powerpc/include/asm/book3s/64/pgtable.h | 3 +
arch/powerpc/mm/book3s64/pgtable.c | 20 ++++++
arch/x86/include/asm/pgtable.h | 68 +++++++++++++++---
arch/x86/mm/pgtable.c | 18 +++++
drivers/dax/device.c | 6 +-
include/linux/huge_mm.h | 24 +++++++
include/linux/pgtable.h | 7 ++
include/linux/vm_event_item.h | 1 -
mm/huge_memory.c | 56 ++++++++++++++-
mm/mprotect.c | 74 ++++++++++++--------
mm/vmstat.c | 1 -
11 files changed, 233 insertions(+), 45 deletions(-)
--
2.45.0
* [PATCH v2 1/8] mm/dax: Dump start address in fault handler
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Rik van Riel, Nicholas Piggin,
Borislav Petkov, Kirill A . Shutemov, Dan Williams,
Vlastimil Babka, Oscar Salvador, Mel Gorman, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
Currently the dax fault handler dumps the vma range when dynamic debugging
is enabled. That's mostly not useful. Dump the (aligned) faulting address
instead, together with the order info.
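For example (a made-up fault address, assuming x86_64 with 4K base
pages), a 2M (order 9) fault would be reported with its 2M-aligned
address:

    /* order == 9, PAGE_SHIFT == 12 */
    unsigned long addr = 0x7f1234567890UL;
    unsigned long aligned = addr & ~((1UL << (9 + 12)) - 1);
    /* aligned == 0x7f1234400000UL */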
Signed-off-by: Peter Xu <peterx@redhat.com>
---
drivers/dax/device.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index eb61598247a9..714174844ca5 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -235,9 +235,9 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf, unsigned int order)
int id;
struct dev_dax *dev_dax = filp->private_data;
- dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) order:%d\n", current->comm,
- (vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
- vmf->vma->vm_start, vmf->vma->vm_end, order);
+ dev_dbg(&dev_dax->dev, "%s: op=%s addr=%#lx order=%d\n", current->comm,
+ (vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
+ vmf->address & ~((1UL << (order + PAGE_SHIFT)) - 1), order);
id = dax_read_lock();
if (order == 0)
--
2.45.0
* [PATCH v2 2/8] mm/mprotect: Remove NUMA_HUGE_PTE_UPDATES
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Alex Thorlton, Rik van Riel,
Nicholas Piggin, Borislav Petkov, Kirill A . Shutemov,
Dan Williams, Vlastimil Babka, Oscar Salvador, Mel Gorman,
Andrew Morton, Rick P Edgecombe, linuxppc-dev
In 2013, commit 72403b4a0fbd ("mm: numa: return the number of base pages
altered by protection changes") introduced the "numa_huge_pte_updates"
vmstat entry, trying to capture how many huge ptes (in reality, PMD THPs
at that time) are marked by NUMA balancing.
This patch removes it, for a few reasons.
First, the name is misleading. Nowadays there can be more than one kind
of "huge pte", and that is also the major goal of this series: paving the
way for PUD handling in the change-protection code paths. PUDs are
coming not only for dax (which has already arrived, yet broken..), but
also for pfnmaps and hugetlb pages. The name simply stops making sense
once PUDs get involved in the mprotect() world.
It also wouldn't be reasonable to bump the counter for both pmds and
puds. In short, the current accounting won't be right when PUDs come, so
the scheme was only suitable at a point in time when PUDs weren't even
possible.
Second, the accounting was simply never right, because it was also
affected by call sites other than NUMA: mprotect() is one, and
userfaultfd-wp also leverages the change-protection path to modify
pgtables. Doing it right would require checking the caller, which was
never done; mprotect() at least was already there in 2013.
This gives me the impression that nobody is seriously using this field,
and it's also impossible to use it seriously.
We may want to do it properly if any NUMA developers would like it to
exist, but that should resolve all of the above: covering PUDs as well as
accounting correctly. It can be done on top when there's a real need.
Cc: Huang Ying <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/linux/vm_event_item.h | 1 -
mm/mprotect.c | 8 +-------
mm/vmstat.c | 1 -
3 files changed, 1 insertion(+), 9 deletions(-)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 747943bc8cc2..2a3797fb6742 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -59,7 +59,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
OOM_KILL,
#ifdef CONFIG_NUMA_BALANCING
NUMA_PTE_UPDATES,
- NUMA_HUGE_PTE_UPDATES,
NUMA_HINT_FAULTS,
NUMA_HINT_FAULTS_LOCAL,
NUMA_PAGE_MIGRATE,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 222ab434da54..21172272695e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,7 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
pmd_t *pmd;
unsigned long next;
long pages = 0;
- unsigned long nr_huge_updates = 0;
struct mmu_notifier_range range;
range.start = 0;
@@ -411,11 +410,8 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
ret = change_huge_pmd(tlb, vma, pmd,
addr, newprot, cp_flags);
if (ret) {
- if (ret == HPAGE_PMD_NR) {
+ if (ret == HPAGE_PMD_NR)
pages += HPAGE_PMD_NR;
- nr_huge_updates++;
- }
-
/* huge pmd was handled */
goto next;
}
@@ -435,8 +431,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
if (range.start)
mmu_notifier_invalidate_range_end(&range);
- if (nr_huge_updates)
- count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
return pages;
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 73d791d1caad..53656227f70d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1313,7 +1313,6 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_NUMA_BALANCING
"numa_pte_updates",
- "numa_huge_pte_updates",
"numa_hint_faults",
"numa_hint_faults_local",
"numa_pages_migrated",
--
2.45.0
* [PATCH v2 3/8] mm/mprotect: Push mmu notifier to PUDs
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: kvm, Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
David Rientjes, Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins,
Matthew Wilcox, Ingo Molnar, Huang Ying, Rik van Riel,
Sean Christopherson, Nicholas Piggin, Borislav Petkov,
Kirill A . Shutemov, Dan Williams, Vlastimil Babka,
Oscar Salvador, Mel Gorman, Paolo Bonzini, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
mprotect() invokes mmu notifiers at the PMD level. It has been that way
since the 2014 commit a5338093bfb4 ("mm: move mmu notifier call from
change_protection to change_pmd_range").
At that time, the issue was that NUMA balancing could be applied over a
huge range of VM memory even when nothing was populated. The
notification can be avoided in that case if no valid pmd is detected,
where valid means either a THP or a PTE pgtable page.
Now, to pave the way for PUD handling, this isn't enough: we need to
generate mmu notifications properly on PUD entries too. mprotect() is
currently broken on PUDs (e.g., one can already easily trigger a kernel
error with dax 1G mappings), and this is the start of fixing it.
To fix that, this patch pushes such notifications up to the PUD layer.
There is a risk of regressing the problem Rik wanted to resolve, but I
don't think it should really happen, and I still chose this solution for
a few reasons:
1) Consider a large VM that contains many GBs of memory: if nothing is
populated, it's highly likely that the PUDs are also none. In this case
there will be no regression.
2) KVM has evolved a lot over the years to get rid of rmap walks, which
were probably the major cause of the previous soft lockups. At least the
TDP MMU already got rid of rmap as long as it's not nested (which should
be the major use case, IIUC); the TDP MMU pgtable walker will then simply
see an empty VM pgtable (e.g. EPT on x86), so invalidating a fully empty
region should in most cases be pretty fast now compared to 2014.
3) KVM now has explicit code paths to give way to mmu notifiers just like
this one, e.g. commit d02c357e5bfa ("KVM: x86/mmu: Retry fault before
acquiring mmu_lock if mapping is changing"). That also avoids contention
that may contribute to a soft lockup.
4) Sticking with the PMD layer simply doesn't work when PUDs are there...
We need one way or another to fix PUD mappings in mprotect().
Pushing it to the PUD layer should be the safest approach as of now;
e.g., there's no sign yet of huge P4Ds coming on any known arch.
Cc: kvm@vger.kernel.org
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/mprotect.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 21172272695e..fb8bf3ff7cd9 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,9 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
pmd_t *pmd;
unsigned long next;
long pages = 0;
- struct mmu_notifier_range range;
-
- range.start = 0;
pmd = pmd_offset(pud, addr);
do {
@@ -383,14 +380,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
if (pmd_none(*pmd))
goto next;
- /* invoke the mmu notifier if the pmd is populated */
- if (!range.start) {
- mmu_notifier_range_init(&range,
- MMU_NOTIFY_PROTECTION_VMA, 0,
- vma->vm_mm, addr, end);
- mmu_notifier_invalidate_range_start(&range);
- }
-
_pmd = pmdp_get_lockless(pmd);
if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
if ((next - addr != HPAGE_PMD_SIZE) ||
@@ -428,9 +417,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
cond_resched();
} while (pmd++, addr = next, addr != end);
- if (range.start)
- mmu_notifier_invalidate_range_end(&range);
-
return pages;
}
@@ -438,10 +424,13 @@ static inline long change_pud_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t newprot, unsigned long cp_flags)
{
+ struct mmu_notifier_range range;
pud_t *pud;
unsigned long next;
long pages = 0, ret;
+ range.start = 0;
+
pud = pud_offset(p4d, addr);
do {
next = pud_addr_end(addr, end);
@@ -450,10 +439,19 @@ static inline long change_pud_range(struct mmu_gather *tlb,
return ret;
if (pud_none_or_clear_bad(pud))
continue;
+ if (!range.start) {
+ mmu_notifier_range_init(&range,
+ MMU_NOTIFY_PROTECTION_VMA, 0,
+ vma->vm_mm, addr, end);
+ mmu_notifier_invalidate_range_start(&range);
+ }
pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
cp_flags);
} while (pud++, addr = next, addr != end);
+ if (range.start)
+ mmu_notifier_invalidate_range_end(&range);
+
return pages;
}
--
2.45.0
* [PATCH v2 4/8] mm/powerpc: Add missing pud helpers
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Rik van Riel, Nicholas Piggin,
Borislav Petkov, Kirill A . Shutemov, Dan Williams,
Vlastimil Babka, Oscar Salvador, Mel Gorman, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
These new helpers will be needed for pud entry updates soon. Introduce
them by referencing the pmd ones. Namely:
- pudp_invalidate()
- pud_modify()
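For context, here is a rough sketch of how the generic mprotect path is
expected to use them (this mirrors the change_huge_pud() added later in
this series; it is not new API beyond that):

    oldpud = pudp_invalidate(vma, addr, pudp);	/* invalidate + TLB flush */
    entry = pud_modify(oldpud, newprot);	/* apply the new protection */
    set_pud_at(mm, addr, pudp, entry);		/* install the updated pud */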
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/powerpc/include/asm/book3s/64/pgtable.h | 3 +++
arch/powerpc/mm/book3s64/pgtable.c | 20 ++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 519b1743a0f4..5da92ba68a45 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1124,6 +1124,7 @@ extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
+extern pud_t pud_modify(pud_t pud, pgprot_t newprot);
extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
extern void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -1384,6 +1385,8 @@ static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
#define __HAVE_ARCH_PMDP_INVALIDATE
extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp);
+extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pud_t *pudp);
#define pmd_move_must_withdraw pmd_move_must_withdraw
struct spinlock;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index f4d8d3c40e5c..5a4a75369043 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -176,6 +176,17 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
return __pmd(old_pmd);
}
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pud_t *pudp)
+{
+ unsigned long old_pud;
+
+ VM_WARN_ON_ONCE(!pud_present(*pudp));
+ old_pud = pud_hugepage_update(vma->vm_mm, address, pudp, _PAGE_PRESENT, _PAGE_INVALID);
+ flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+ return __pud(old_pud);
+}
+
pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp, int full)
{
@@ -259,6 +270,15 @@ pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
pmdv &= _HPAGE_CHG_MASK;
return pmd_set_protbits(__pmd(pmdv), newprot);
}
+
+pud_t pud_modify(pud_t pud, pgprot_t newprot)
+{
+ unsigned long pudv;
+
+ pudv = pud_val(pud);
+ pudv &= _HPAGE_CHG_MASK;
+ return pud_set_protbits(__pud(pudv), newprot);
+}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
/* For use by kexec, called with MMU off */
--
2.45.0
* [PATCH v2 5/8] mm/x86: Make pud_leaf() only care about PSE bit
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Rik van Riel, Nicholas Piggin,
Borislav Petkov, Kirill A . Shutemov, Dan Williams,
Vlastimil Babka, Oscar Salvador, Mel Gorman, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
An entry should be reported as a PUD leaf even if it's PROT_NONE, in
which case the PRESENT bit isn't set. Without this I hit "bad pud" when
testing dax 1G mappings while zapping a PROT_NONE PUD.
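To illustrate (a sketch of the relevant x86 bits; a PROT_NONE entry keeps
_PAGE_PSE but trades _PAGE_PRESENT for _PAGE_PROTNONE):

    /*
     * Present 1G leaf:    _PAGE_PSE | _PAGE_PRESENT | ...
     * PROT_NONE 1G leaf:  _PAGE_PSE | _PAGE_PROTNONE (PRESENT clear)
     *
     * Requiring PSE|PRESENT misses the second case, so a PROT_NONE PUD
     * was treated as a non-leaf and reported as "bad pud" on zap;
     * checking PSE alone covers both.
     */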
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/x86/include/asm/pgtable.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 65b8e5bb902c..25fc6d809572 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1073,8 +1073,7 @@ static inline pmd_t *pud_pgtable(pud_t pud)
#define pud_leaf pud_leaf
static inline bool pud_leaf(pud_t pud)
{
- return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
- (_PAGE_PSE | _PAGE_PRESENT);
+ return pud_val(pud) & _PAGE_PSE;
}
static inline int pud_bad(pud_t pud)
--
2.45.0
* [PATCH v2 6/8] mm/x86: arch_check_zapped_pud()
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Rik van Riel, Nicholas Piggin,
Borislav Petkov, Kirill A . Shutemov, Dan Williams,
Vlastimil Babka, Oscar Salvador, Mel Gorman, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
Introduce arch_check_zapped_pud() to sanity check shadow stack on PUD
zaps. It has the same logic as the PMD helper.
One thing worth mentioning: it might be a good idea to use
page_table_check in the future for trapping wrong setups of shadow stack
pgtable entries [1]. That is left for the future as a separate effort.
[1] https://lore.kernel.org/all/59d518698f664e07c036a5098833d7b56b953305.camel@intel.com
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/x86/include/asm/pgtable.h | 10 ++++++++++
arch/x86/mm/pgtable.c | 7 +++++++
include/linux/pgtable.h | 7 +++++++
mm/huge_memory.c | 4 +++-
4 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 25fc6d809572..cdf044c2ad6e 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -169,6 +169,13 @@ static inline int pud_young(pud_t pud)
return pud_flags(pud) & _PAGE_ACCESSED;
}
+static inline bool pud_shstk(pud_t pud)
+{
+ return cpu_feature_enabled(X86_FEATURE_SHSTK) &&
+ (pud_flags(pud) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) ==
+ (_PAGE_DIRTY | _PAGE_PSE);
+}
+
static inline int pte_write(pte_t pte)
{
/*
@@ -1662,6 +1669,9 @@ void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte);
#define arch_check_zapped_pmd arch_check_zapped_pmd
void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd);
+#define arch_check_zapped_pud arch_check_zapped_pud
+void arch_check_zapped_pud(struct vm_area_struct *vma, pud_t pud);
+
#ifdef CONFIG_XEN_PV
#define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young
static inline bool arch_has_hw_nonleaf_pmd_young(void)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 93e54ba91fbf..564b8945951e 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -926,3 +926,10 @@ void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd)
VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
pmd_shstk(pmd));
}
+
+void arch_check_zapped_pud(struct vm_area_struct *vma, pud_t pud)
+{
+ /* See note in arch_check_zapped_pte() */
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
+ pud_shstk(pud));
+}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2a6a3cccfc36..2289e9f7aa1b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -447,6 +447,13 @@ static inline void arch_check_zapped_pmd(struct vm_area_struct *vma,
}
#endif
+#ifndef arch_check_zapped_pud
+static inline void arch_check_zapped_pud(struct vm_area_struct *vma,
+ pud_t pud)
+{
+}
+#endif
+
#ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
unsigned long address,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 251d6932130f..017377920d18 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2278,12 +2278,14 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
pud_t *pud, unsigned long addr)
{
spinlock_t *ptl;
+ pud_t orig_pud;
ptl = __pud_trans_huge_lock(pud, vma);
if (!ptl)
return 0;
- pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
+ orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
+ arch_check_zapped_pud(vma, orig_pud);
tlb_remove_pud_tlb_entry(tlb, pud, addr);
if (vma_is_special_huge(vma)) {
spin_unlock(ptl);
--
2.45.0
* [PATCH v2 7/8] mm/x86: Add missing pud helpers
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Rik van Riel, Nicholas Piggin,
Borislav Petkov, Kirill A . Shutemov, Dan Williams,
Vlastimil Babka, Oscar Salvador, Mel Gorman, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
These new helpers will be needed for pud entry updates soon. Introduce
them by referencing the pmd ones. Namely:
- pudp_invalidate()
- pud_modify()
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
arch/x86/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++++-----
arch/x86/mm/pgtable.c | 11 +++++++
2 files changed, 58 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index cdf044c2ad6e..701593c53f3b 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -782,6 +782,12 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)
__pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
}
+static inline pud_t pud_mkinvalid(pud_t pud)
+{
+ return pfn_pud(pud_pfn(pud),
+ __pgprot(pud_flags(pud) & ~(_PAGE_PRESENT|_PAGE_PROTNONE)));
+}
+
static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
@@ -829,14 +835,8 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
pmd_result = __pmd(val);
/*
- * To avoid creating Write=0,Dirty=1 PMDs, pte_modify() needs to avoid:
- * 1. Marking Write=0 PMDs Dirty=1
- * 2. Marking Dirty=1 PMDs Write=0
- *
- * The first case cannot happen because the _PAGE_CHG_MASK will filter
- * out any Dirty bit passed in newprot. Handle the second case by
- * going through the mksaveddirty exercise. Only do this if the old
- * value was Write=1 to avoid doing this on Shadow Stack PTEs.
+ * Avoid creating shadow stack PMD by accident. See comment in
+ * pte_modify().
*/
if (oldval & _PAGE_RW)
pmd_result = pmd_mksaveddirty(pmd_result);
@@ -846,6 +846,29 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
return pmd_result;
}
+static inline pud_t pud_modify(pud_t pud, pgprot_t newprot)
+{
+ pudval_t val = pud_val(pud), oldval = val;
+ pud_t pud_result;
+
+ val &= _HPAGE_CHG_MASK;
+ val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+ val = flip_protnone_guard(oldval, val, PHYSICAL_PUD_PAGE_MASK);
+
+ pud_result = __pud(val);
+
+ /*
+ * Avoid creating shadow stack PUD by accident. See comment in
+ * pte_modify().
+ */
+ if (oldval & _PAGE_RW)
+ pud_result = pud_mksaveddirty(pud_result);
+ else
+ pud_result = pud_clear_saveddirty(pud_result);
+
+ return pud_result;
+}
+
/*
* mprotect needs to preserve PAT and encryption bits when updating
* vm_page_prot
@@ -1384,10 +1407,26 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
}
#endif
+static inline pud_t pudp_establish(struct vm_area_struct *vma,
+ unsigned long address, pud_t *pudp, pud_t pud)
+{
+ page_table_check_pud_set(vma->vm_mm, pudp, pud);
+ if (IS_ENABLED(CONFIG_SMP)) {
+ return xchg(pudp, pud);
+ } else {
+ pud_t old = *pudp;
+ WRITE_ONCE(*pudp, pud);
+ return old;
+ }
+}
+
#define __HAVE_ARCH_PMDP_INVALIDATE_AD
extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp);
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pud_t *pudp);
+
/*
* Page table pages are page-aligned. The lower half of the top
* level is used for userspace and the top half for the kernel.
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 564b8945951e..9d97a1b99160 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -641,6 +641,17 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
}
#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+ pud_t *pudp)
+{
+ VM_WARN_ON_ONCE(!pud_present(*pudp));
+ pud_t old = pudp_establish(vma, address, pudp, pud_mkinvalid(*pudp));
+ flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+ return old;
+}
+#endif
+
/**
* reserve_top_address - reserves a hole in the top of kernel address space
* @reserve - size of hole to reserve
--
2.45.0
* [PATCH v2 8/8] mm/mprotect: fix dax pud handling
From: Peter Xu @ 2024-07-03 21:29 UTC
To: linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Huang Ying, Rik van Riel, Nicholas Piggin,
Borislav Petkov, Kirill A . Shutemov, Dan Williams,
Vlastimil Babka, Oscar Salvador, Mel Gorman, Andrew Morton,
Rick P Edgecombe, linuxppc-dev
This is only relevant to the two archs that support PUD dax, i.e., x86_64
and ppc64. PUD THPs do not yet exist elsewhere, and hugetlb PUDs do not
count in this case.
DAX has had PUD mappings for years, but the change-protection path never
worked. When the path is triggered in any form (a simple test program
would be: call mprotect() on a 1G dev_dax mapping), the kernel will
report "bad pud". This patch should fix that.
The new change_huge_pud() tries to keep everything simple. For example,
it doesn't optimize the write bit, as that would need even more PUD
helpers. It's not too bad anyway to take one more write fault in the
worst case once per 1G range; it might be a bigger deal if it happened
per PAGE_SIZE, though. Neither does it support userfault-wp bits, as no
such PUD mapping is supported; file mappings always need a split there.
The same goes for TLB shootdown: the pmd path (which was x86 only) has
the trick of using the _ad() version of pmdp_invalidate*(), which can
avoid one redundant TLB flush, but let's also leave that for later.
Again, the larger the mapping, the smaller such an effect.
Another thing worth mentioning is that this path needs to be careful
about handling the "retry" case for change_huge_pud() (where it can
return 0): it isn't like change_huge_pmd(), where the pmd version is safe
with all conditions handled later in change_pte_range(), thanks to Hugh's
new pte_offset_map_lock(). In short, change_pte_range() is simply
smarter than change_pmd_range() now, after the shmem thp collapse rework.
For that reason, change_pud_range() needs a proper retry if it races with
something else while a huge PUD changed from under us.
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Fixes: 27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/linux/huge_mm.h | 24 +++++++++++++++++++
mm/huge_memory.c | 52 +++++++++++++++++++++++++++++++++++++++++
mm/mprotect.c | 40 ++++++++++++++++++++++++-------
3 files changed, 108 insertions(+), 8 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index cee3c5da8f0e..2ebebadfbe47 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -348,6 +348,17 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
unsigned long address);
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ pud_t *pudp, unsigned long addr, pgprot_t newprot,
+ unsigned long cp_flags);
+#else
+static inline int
+change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ pud_t *pudp, unsigned long addr, pgprot_t newprot,
+ unsigned long cp_flags) { return 0; }
+#endif
+
#define split_huge_pud(__vma, __pud, __address) \
do { \
pud_t *____pud = (__pud); \
@@ -591,6 +602,19 @@ static inline int next_order(unsigned long *orders, int prev)
{
return 0;
}
+
+static inline void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
+ unsigned long address)
+{
+}
+
+static inline int change_huge_pud(struct mmu_gather *tlb,
+ struct vm_area_struct *vma, pud_t *pudp,
+ unsigned long addr, pgprot_t newprot,
+ unsigned long cp_flags)
+{
+ return 0;
+}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline int split_folio_to_list_to_order(struct folio *folio,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 017377920d18..4998452f7c49 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2099,6 +2099,53 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
return ret;
}
+/*
+ * Returns:
+ *
+ * - 0: if pud leaf changed from under us
+ * - 1: if pud can be skipped
+ * - HPAGE_PUD_NR: if pud was successfully processed
+ */
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ pud_t *pudp, unsigned long addr, pgprot_t newprot,
+ unsigned long cp_flags)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ pud_t oldpud, entry;
+ spinlock_t *ptl;
+
+ tlb_change_page_size(tlb, HPAGE_PUD_SIZE);
+
+ /* NUMA balancing doesn't apply to dax */
+ if (cp_flags & MM_CP_PROT_NUMA)
+ return 1;
+
+ /*
+ * Huge entries on userfault-wp only works with anonymous, while we
+ * don't have anonymous PUDs yet.
+ */
+ if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL))
+ return 1;
+
+ ptl = __pud_trans_huge_lock(pudp, vma);
+ if (!ptl)
+ return 0;
+
+ /*
+ * Can't clear PUD or it can race with concurrent zapping. See
+ * change_huge_pmd().
+ */
+ oldpud = pudp_invalidate(vma, addr, pudp);
+ entry = pud_modify(oldpud, newprot);
+ set_pud_at(mm, addr, pudp, entry);
+ tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE);
+
+ spin_unlock(ptl);
+ return HPAGE_PUD_NR;
+}
+#endif
+
#ifdef CONFIG_USERFAULTFD
/*
* The PT lock for src_pmd and dst_vma/src_vma (for reading) are locked by
@@ -2329,6 +2376,11 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
spin_unlock(ptl);
mmu_notifier_invalidate_range_end(&range);
}
+#else
+void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
+ unsigned long address)
+{
+}
#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index fb8bf3ff7cd9..694f13b83864 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -425,29 +425,53 @@ static inline long change_pud_range(struct mmu_gather *tlb,
unsigned long end, pgprot_t newprot, unsigned long cp_flags)
{
struct mmu_notifier_range range;
- pud_t *pud;
+ pud_t *pudp, pud;
unsigned long next;
long pages = 0, ret;
range.start = 0;
- pud = pud_offset(p4d, addr);
+ pudp = pud_offset(p4d, addr);
do {
+again:
next = pud_addr_end(addr, end);
- ret = change_prepare(vma, pud, pmd, addr, cp_flags);
- if (ret)
- return ret;
- if (pud_none_or_clear_bad(pud))
+ ret = change_prepare(vma, pudp, pmd, addr, cp_flags);
+ if (ret) {
+ pages = ret;
+ break;
+ }
+
+ pud = READ_ONCE(*pudp);
+ if (pud_none(pud))
continue;
+
if (!range.start) {
mmu_notifier_range_init(&range,
MMU_NOTIFY_PROTECTION_VMA, 0,
vma->vm_mm, addr, end);
mmu_notifier_invalidate_range_start(&range);
}
- pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
+
+ if (pud_leaf(pud)) {
+ if ((next - addr != PUD_SIZE) ||
+ pgtable_split_needed(vma, cp_flags)) {
+ __split_huge_pud(vma, pudp, addr);
+ goto again;
+ } else {
+ ret = change_huge_pud(tlb, vma, pudp,
+ addr, newprot, cp_flags);
+ if (ret == 0)
+ goto again;
+ /* huge pud was handled */
+ if (ret == HPAGE_PUD_NR)
+ pages += HPAGE_PUD_NR;
+ continue;
+ }
+ }
+
+ pages += change_pmd_range(tlb, vma, pudp, addr, next, newprot,
cp_flags);
- } while (pud++, addr = next, addr != end);
+ } while (pudp++, addr = next, addr != end);
if (range.start)
mmu_notifier_invalidate_range_end(&range);
--
2.45.0
* Re: [PATCH v2 7/8] mm/x86: Add missing pud helpers
From: kernel test robot @ 2024-07-06 9:16 UTC
To: Peter Xu, linux-kernel, linux-mm
Cc: Dave Hansen, peterx, Linux Memory Management List,
Christophe Leroy, Thomas Gleixner, Dave Jiang, Aneesh Kumar K . V,
x86, Hugh Dickins, Matthew Wilcox, Ingo Molnar, Mel Gorman,
Huang Ying, Rik van Riel, Nicholas Piggin, Borislav Petkov,
oe-kbuild-all, Kirill A . Shutemov, Dan Williams, Vlastimil Babka,
Oscar Salvador, Andrew Morton, Rick P Edgecombe, linuxppc-dev
Hi Peter,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-dax-Dump-start-address-in-fault-handler/20240705-013812
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20240703212918.2417843-8-peterx%40redhat.com
patch subject: [PATCH v2 7/8] mm/x86: Add missing pud helpers
config: i386-randconfig-011-20240706 (https://download.01.org/0day-ci/archive/20240706/202407061716.WH5NMiL2-lkp@intel.com/config)
compiler: gcc-11 (Ubuntu 11.4.0-4ubuntu1) 11.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240706/202407061716.WH5NMiL2-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407061716.WH5NMiL2-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/x86/include/asm/atomic.h:8,
from include/linux/atomic.h:7,
from include/linux/jump_label.h:256,
from include/linux/static_key.h:1,
from arch/x86/include/asm/nospec-branch.h:6,
from arch/x86/include/asm/irqflags.h:9,
from include/linux/irqflags.h:18,
from include/linux/spinlock.h:59,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:7,
from include/linux/mm.h:7,
from arch/x86/mm/pgtable.c:2:
In function 'pudp_establish',
inlined from 'pudp_invalidate' at arch/x86/mm/pgtable.c:649:14:
>> arch/x86/include/asm/cmpxchg.h:67:25: error: call to '__xchg_wrong_size' declared with attribute error: Bad argument size for xchg
67 | __ ## op ## _wrong_size(); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/cmpxchg.h:78:33: note: in expansion of macro '__xchg_op'
78 | #define arch_xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")
| ^~~~~~~~~
include/linux/atomic/atomic-arch-fallback.h:12:18: note: in expansion of macro 'arch_xchg'
12 | #define raw_xchg arch_xchg
| ^~~~~~~~~
include/linux/atomic/atomic-instrumented.h:4758:9: note: in expansion of macro 'raw_xchg'
4758 | raw_xchg(__ai_ptr, __VA_ARGS__); \
| ^~~~~~~~
arch/x86/include/asm/pgtable.h:1415:24: note: in expansion of macro 'xchg'
1415 | return xchg(pudp, pud);
| ^~~~
vim +/__xchg_wrong_size +67 arch/x86/include/asm/cmpxchg.h
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 37
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 38 /*
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 39 * An exchange-type operation, which takes a value and a pointer, and
7f5281ae8a8e7f8 Li Zhong 2013-04-25 40 * returns the old value.
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 41 */
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 42 #define __xchg_op(ptr, arg, op, lock) \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 43 ({ \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 44 __typeof__ (*(ptr)) __ret = (arg); \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 45 switch (sizeof(*(ptr))) { \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 46 case __X86_CASE_B: \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 47 asm volatile (lock #op "b %b0, %1\n" \
2ca052a3710fac2 Jeremy Fitzhardinge 2012-04-02 48 : "+q" (__ret), "+m" (*(ptr)) \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 49 : : "memory", "cc"); \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 50 break; \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 51 case __X86_CASE_W: \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 52 asm volatile (lock #op "w %w0, %1\n" \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 53 : "+r" (__ret), "+m" (*(ptr)) \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 54 : : "memory", "cc"); \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 55 break; \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 56 case __X86_CASE_L: \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 57 asm volatile (lock #op "l %0, %1\n" \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 58 : "+r" (__ret), "+m" (*(ptr)) \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 59 : : "memory", "cc"); \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 60 break; \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 61 case __X86_CASE_Q: \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 62 asm volatile (lock #op "q %q0, %1\n" \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 63 : "+r" (__ret), "+m" (*(ptr)) \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 64 : : "memory", "cc"); \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 65 break; \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 66 default: \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 @67 __ ## op ## _wrong_size(); \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 68 } \
31a8394e069e47d Jeremy Fitzhardinge 2011-09-30 69 __ret; \
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 70 })
e9826380d83d1bd Jeremy Fitzhardinge 2011-08-18 71
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 7/8] mm/x86: Add missing pud helpers
From: Peter Xu @ 2024-07-09 19:44 UTC
To: kernel test robot
Cc: Dave Hansen, linux-mm, Christophe Leroy, Thomas Gleixner,
Dave Jiang, Aneesh Kumar K . V, x86, Hugh Dickins, Matthew Wilcox,
Ingo Molnar, Mel Gorman, Huang Ying, Rik van Riel,
Nicholas Piggin, Borislav Petkov, oe-kbuild-all,
Kirill A . Shutemov, Dan Williams, Vlastimil Babka,
Oscar Salvador, linux-kernel, Andrew Morton, Rick P Edgecombe,
linuxppc-dev
On Sat, Jul 06, 2024 at 05:16:15PM +0800, kernel test robot wrote:
> Hi Peter,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on akpm-mm/mm-everything]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-dax-Dump-start-address-in-fault-handler/20240705-013812
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20240703212918.2417843-8-peterx%40redhat.com
> patch subject: [PATCH v2 7/8] mm/x86: Add missing pud helpers
> config: i386-randconfig-011-20240706 (https://download.01.org/0day-ci/archive/20240706/202407061716.WH5NMiL2-lkp@intel.com/config)
> compiler: gcc-11 (Ubuntu 11.4.0-4ubuntu1) 11.4.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240706/202407061716.WH5NMiL2-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202407061716.WH5NMiL2-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> In file included from arch/x86/include/asm/atomic.h:8,
> from include/linux/atomic.h:7,
> from include/linux/jump_label.h:256,
> from include/linux/static_key.h:1,
> from arch/x86/include/asm/nospec-branch.h:6,
> from arch/x86/include/asm/irqflags.h:9,
> from include/linux/irqflags.h:18,
> from include/linux/spinlock.h:59,
> from include/linux/mmzone.h:8,
> from include/linux/gfp.h:7,
> from include/linux/mm.h:7,
> from arch/x86/mm/pgtable.c:2:
> In function 'pudp_establish',
> inlined from 'pudp_invalidate' at arch/x86/mm/pgtable.c:649:14:
> >> arch/x86/include/asm/cmpxchg.h:67:25: error: call to '__xchg_wrong_size' declared with attribute error: Bad argument size for xchg
> 67 | __ ## op ## _wrong_size(); \
> | ^~~~~~~~~~~~~~~~~~~~~~~~~
> arch/x86/include/asm/cmpxchg.h:78:33: note: in expansion of macro '__xchg_op'
> 78 | #define arch_xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")
> | ^~~~~~~~~
> include/linux/atomic/atomic-arch-fallback.h:12:18: note: in expansion of macro 'arch_xchg'
> 12 | #define raw_xchg arch_xchg
> | ^~~~~~~~~
> include/linux/atomic/atomic-instrumented.h:4758:9: note: in expansion of macro 'raw_xchg'
> 4758 | raw_xchg(__ai_ptr, __VA_ARGS__); \
> | ^~~~~~~~
> arch/x86/include/asm/pgtable.h:1415:24: note: in expansion of macro 'xchg'
> 1415 | return xchg(pudp, pud);
> | ^~~~
So this is PAE paging on i386, which indeed didn't get covered by my
testsuite: it only covered allno/alldef configs, which are always
2-level. I'll fix it when I repost, and I'll add PAE to my harness too.
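Untested sketch of one possible direction (the final fix may well
differ): since pud_t is 8 bytes under i386 PAE while xchg() there only
handles up to 4 bytes, the helpers could simply be compiled out on
32-bit, where 1G THPs cannot exist anyway; pudp_invalidate() would need a
matching guard:

    #ifdef CONFIG_X86_64
    static inline pud_t pudp_establish(struct vm_area_struct *vma,
    		unsigned long address, pud_t *pudp, pud_t pud)
    {
    	page_table_check_pud_set(vma->vm_mm, pudp, pud);
    	if (IS_ENABLED(CONFIG_SMP)) {
    		return xchg(pudp, pud);
    	} else {
    		pud_t old = *pudp;
    		WRITE_ONCE(*pudp, pud);
    		return old;
    	}
    }
    #endif /* CONFIG_X86_64 */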
--
Peter Xu