* [PATCH v2 01/14] mm: Convert pXd_devmap checks to vma_is_dax
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:19 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 02/14] mm: Filter zone device pages returned from folio_walk_start() Alistair Popple
` (12 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
Currently dax is the only user of pmd and pud mapped ZONE_DEVICE
pages. Therefore page walkers that want to exclude DAX pages can check
pmd_devmap or pud_devmap. However soon dax will no longer set PFN_DEV,
meaning dax pages are mapped as normal pages.
Ensure page walkers that currently use pXd_devmap to skip DAX pages
continue to do so by adding explicit checks of the VMA instead.
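For reference, vma_is_dax() is roughly the following test on the VMA's
backing file (sketch; the in-tree helper may differ in detail):

	static inline bool vma_is_dax(const struct vm_area_struct *vma)
	{
		return vma->vm_file && IS_DAX(file_inode(vma->vm_file));
	}

so the affected walkers now key off the DAX-ness of the mapping rather than
a page table bit.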
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v1:
- Remove vma_is_dax() check from mm/userfaultfd.c as
validate_move_areas() will already skip DAX VMAs on account of them
not being anonymous.
---
fs/userfaultfd.c | 2 +-
mm/hmm.c | 2 +-
mm/userfaultfd.c | 6 ------
3 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index ef054b3..a886750 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -304,7 +304,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
goto out;
ret = false;
- if (!pmd_present(_pmd) || pmd_devmap(_pmd))
+ if (!pmd_present(_pmd) || vma_is_dax(vmf->vma))
goto out;
if (pmd_trans_huge(_pmd)) {
diff --git a/mm/hmm.c b/mm/hmm.c
index feac861..5311753 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -441,7 +441,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
return hmm_vma_walk_hole(start, end, -1, walk);
}
- if (pud_leaf(pud) && pud_devmap(pud)) {
+ if (pud_leaf(pud) && vma_is_dax(walk->vma)) {
unsigned long i, npages, pfn;
unsigned int required_fault;
unsigned long *hmm_pfns;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 58b3ad6..8395db2 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1818,12 +1818,6 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
ptl = pmd_trans_huge_lock(src_pmd, src_vma);
if (ptl) {
- if (pmd_devmap(*src_pmd)) {
- spin_unlock(ptl);
- err = -ENOENT;
- break;
- }
-
/* Check if we can move the pmd without splitting it. */
if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
!pmd_none(dst_pmdval)) {
--
git-series 0.9.1
* Re: [PATCH v2 01/14] mm: Convert pXd_devmap checks to vma_is_dax
2025-06-16 11:58 ` [PATCH v2 01/14] mm: Convert pXd_devmap checks to vma_is_dax Alistair Popple
@ 2025-06-17 9:19 ` David Hildenbrand
2025-06-19 8:40 ` Alistair Popple
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:19 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On 16.06.25 13:58, Alistair Popple wrote:
> Currently dax is the only user of pmd and pud mapped ZONE_DEVICE
> pages. Therefore page walkers that want to exclude DAX pages can check
> pmd_devmap or pud_devmap. However soon dax will no longer set PFN_DEV,
> meaning dax pages are mapped as normal pages.
>
> Ensure page walkers that currently use pXd_devmap to skip DAX pages
> continue to do so by adding explicit checks of the VMA instead.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> ---
>
> Changes from v1:
>
> - Remove vma_is_dax() check from mm/userfaultfd.c as
> validate_move_areas() will already skip DAX VMA's on account of them
> not being anonymous.
This should be documented in the patch description above.
> ---
> fs/userfaultfd.c | 2 +-
> mm/hmm.c | 2 +-
> mm/userfaultfd.c | 6 ------
> 3 files changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index ef054b3..a886750 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -304,7 +304,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
> goto out;
>
> ret = false;
> - if (!pmd_present(_pmd) || pmd_devmap(_pmd))
> + if (!pmd_present(_pmd) || vma_is_dax(vmf->vma))
> goto out;
VMA checks should be done before doing any page table walk.
>
> if (pmd_trans_huge(_pmd)) {
> diff --git a/mm/hmm.c b/mm/hmm.c
> index feac861..5311753 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -441,7 +441,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
> return hmm_vma_walk_hole(start, end, -1, walk);
> }
>
> - if (pud_leaf(pud) && pud_devmap(pud)) {
> + if (pud_leaf(pud) && vma_is_dax(walk->vma)) {
> unsigned long i, npages, pfn;
> unsigned int required_fault;
> unsigned long *hmm_pfns;
Ditto.
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 58b3ad6..8395db2 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -1818,12 +1818,6 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
>
> ptl = pmd_trans_huge_lock(src_pmd, src_vma);
> if (ptl) {
> - if (pmd_devmap(*src_pmd)) {
> - spin_unlock(ptl);
> - err = -ENOENT;
> - break;
> - }
> -
> /* Check if we can move the pmd without splitting it. */
> if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
> !pmd_none(dst_pmdval)) {
--
Cheers,
David / dhildenb
* Re: [PATCH v2 01/14] mm: Convert pXd_devmap checks to vma_is_dax
2025-06-17 9:19 ` David Hildenbrand
@ 2025-06-19 8:40 ` Alistair Popple
0 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-19 8:40 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On Tue, Jun 17, 2025 at 11:19:34AM +0200, David Hildenbrand wrote:
> On 16.06.25 13:58, Alistair Popple wrote:
> > Currently dax is the only user of pmd and pud mapped ZONE_DEVICE
> > pages. Therefore page walkers that want to exclude DAX pages can check
> > pmd_devmap or pud_devmap. However soon dax will no longer set PFN_DEV,
> > meaning dax pages are mapped as normal pages.
> >
> > Ensure page walkers that currently use pXd_devmap to skip DAX pages
> > continue to do so by adding explicit checks of the VMA instead.
> >
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> >
> > ---
> >
> > Changes from v1:
> >
> > - Remove vma_is_dax() check from mm/userfaultfd.c as
> > validate_move_areas() will already skip DAX VMA's on account of them
> > not being anonymous.
>
> This should be documented in the patch description above.
Ok.
> > ---
> > fs/userfaultfd.c | 2 +-
> > mm/hmm.c | 2 +-
> > mm/userfaultfd.c | 6 ------
> > 3 files changed, 2 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index ef054b3..a886750 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -304,7 +304,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
> > goto out;
> > ret = false;
> > - if (!pmd_present(_pmd) || pmd_devmap(_pmd))
> > + if (!pmd_present(_pmd) || vma_is_dax(vmf->vma))
> > goto out;
>
> VMA checks should be done before doing any page table walk.
Actually upon review I think this check was always redundant as well -
vma_can_userfault() checks limit userfaultfd to anon/hugetlb/shmem VMAs. Boy we
sure have a lot of these "normal vma" checks around the place ... at least for
certain definitions of normal.
Anyway, will remove this and add a note to the commit (and apologies for not
catching this last time, as I think you may have already mentioned this or at
least the general concept).
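(For reference, the reason it is redundant - a simplified sketch of the
vma_can_userfault() logic, ignoring the extra uffd flag handling:

	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
	       vma_is_shmem(vma);

so a DAX VMA can never be registered with userfaultfd in the first place and
userfaultfd_must_wait() never sees one.)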
> > if (pmd_trans_huge(_pmd)) {
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index feac861..5311753 100644
> > --- a/mm/hmm.c
> > +++ b/mm/hmm.c
> > @@ -441,7 +441,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
> > return hmm_vma_walk_hole(start, end, -1, walk);
> > }
> > - if (pud_leaf(pud) && pud_devmap(pud)) {
> > + if (pud_leaf(pud) && vma_is_dax(walk->vma)) {
> > unsigned long i, npages, pfn;
> > unsigned int required_fault;
> > unsigned long *hmm_pfns;
>
> Dito.
Actually I see little reason to restrict this only to DAX for HMM ... we don't
end up doing that for the equivalent PMD path so will drop this as well. Thanks!
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 58b3ad6..8395db2 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -1818,12 +1818,6 @@ ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
> > ptl = pmd_trans_huge_lock(src_pmd, src_vma);
> > if (ptl) {
> > - if (pmd_devmap(*src_pmd)) {
> > - spin_unlock(ptl);
> > - err = -ENOENT;
> > - break;
> > - }
> > -
> > /* Check if we can move the pmd without splitting it. */
> > if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
> > !pmd_none(dst_pmdval)) {
>
>
> --
> Cheers,
>
> David / dhildenb
>
>
* [PATCH v2 02/14] mm: Filter zone device pages returned from folio_walk_start()
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
2025-06-16 11:58 ` [PATCH v2 01/14] mm: Convert pXd_devmap checks to vma_is_dax Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:25 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 03/14] mm: Convert vmf_insert_mixed() from using pte_devmap to pte_special Alistair Popple
` (11 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski
Previously dax pages were skipped by the pagewalk code as pud_special() or
vm_normal_page{_pmd}() would be false for DAX pages. Now that dax pages are
refcounted normally that is no longer the case, so the pagewalk code will
start returning them.
Most callers already explicitly filter for DAX or zone device pages so
don't need updating. However some don't, so add checks to those callers.
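The check added to the callers that need it simply treats a zone device
folio the same as finding no folio at all, i.e. a pattern along these lines
(sketch of what the hunks below do):

	folio = folio_walk_start(&fw, vma, addr, 0);
	if (!folio || folio_is_zone_device(folio)) {
		/* bail out exactly as for the !folio case */
	}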
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
Changes since v1:
- Dropped "mm/pagewalk: Skip dax pages in pagewalk" and replaced it
with this new patch for v2
- As suggested by David and Jason we can filter the folios in the
callers instead of doing it in folio_walk_start(). Most callers
already do this (see below).
I audited all callers of folio_walk_start() and found the following:
mm/ksm.c:
break_ksm() - doesn't need to filter zone_device pages because they can
never be KSM pages.
get_mergeable_page() - already filters out zone_device pages.
scan_get_next_rmap_item() - already filters out zone_device pages.
mm/huge_memory.c:
split_huge_pages_pid() - already checks for DAX with
vma_not_suitable_for_thp_split()
mm/rmap.c:
make_device_exclusive() - only works on anonymous pages, although
there'd be no issue with finding a DAX page even if support was extended
to file-backed pages.
mm/migrate.c:
add_folio_for_migration() - already checks the vma with vma_migratable()
do_pages_stat_array() - explicitly checks for zone_device folios
kernel/event/uprobes.c:
uprobe_write_opcode() - only works on anonymous pages, not sure if
zone_device could ever work so add an explicit check
arch/s390/mm/fault.c:
do_secure_storage_access() - not sure so be conservative and add a check
arch/s390/kernel/uv.c:
make_hva_secure() - not sure so be conservative and add a check
---
arch/s390/kernel/uv.c | 2 +-
arch/s390/mm/fault.c | 2 +-
kernel/events/uprobes.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index b99478e..55aa280 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -424,7 +424,7 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
return -EFAULT;
}
folio = folio_walk_start(&fw, vma, hva, 0);
- if (!folio) {
+ if (!folio || folio_is_zone_device(folio)) {
mmap_read_unlock(mm);
return -ENXIO;
}
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index e1ad05b..df1a067 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -449,7 +449,7 @@ void do_secure_storage_access(struct pt_regs *regs)
if (!vma)
return handle_fault_error(regs, SEGV_MAPERR);
folio = folio_walk_start(&fw, vma, addr, 0);
- if (!folio) {
+ if (!folio || folio_is_zone_device(folio)) {
mmap_read_unlock(mm);
return;
}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8a601df..f774367 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -539,7 +539,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
}
ret = 0;
- if (unlikely(!folio_test_anon(folio))) {
+ if (unlikely(!folio_test_anon(folio) || folio_is_zone_device(folio))) {
VM_WARN_ON_ONCE(is_register);
folio_put(folio);
goto out;
--
git-series 0.9.1
* Re: [PATCH v2 02/14] mm: Filter zone device pages returned from folio_walk_start()
2025-06-16 11:58 ` [PATCH v2 02/14] mm: Filter zone device pages returned from folio_walk_start() Alistair Popple
@ 2025-06-17 9:25 ` David Hildenbrand
2025-06-17 9:30 ` David Hildenbrand
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:25 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski
On 16.06.25 13:58, Alistair Popple wrote:
> Previously dax pages were skipped by the pagewalk code as pud_special() or
> vm_normal_page{_pmd}() would be false for DAX pages. Now that dax pages are
> refcounted normally that is no longer the case, so the pagewalk code will
> start returning them.
>
> Most callers already explicitly filter for DAX or zone device pages so
> don't need updating. However some don't, so add checks to those callers.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>
> ---
>
> Changes since v1:
>
> - Dropped "mm/pagewalk: Skip dax pages in pagewalk" and replaced it
> with this new patch for v2
>
> - As suggested by David and Jason we can filter the folios in the
> callers instead of doing it in folio_start_walk(). Most callers
> already do this (see below).
>
> I audited all callers of folio_walk_start() and found the following:
>
> mm/ksm.c:
>
> break_ksm() - doesn't need to filter zone_device pages because the can
> never be KSM pages.
>
> get_mergeable_page() - already filters out zone_device pages.
> scan_get_next_rmap_iterm() - already filters out zone_device_pages.
>
> mm/huge_memory.c:
>
> split_huge_pages_pid() - already checks for DAX with
> vma_not_suitable_for_thp_split()
>
> mm/rmap.c:
>
> make_device_exclusive() - only works on anonymous pages, although
> there'd be no issue with finding a DAX page even if support was extended
> to file-backed pages.
>
> mm/migrate.c:
>
> add_folio_for_migration() - already checks the vma with vma_migratable()
> do_pages_stat_array() - explicitly checks for zone_device folios
>
> kernel/event/uprobes.c:
>
> uprobe_write_opcode() - only works on anonymous pages, not sure if
> zone_device could ever work so add an explicit check
>
> arch/s390/mm/fault.c:
>
> do_secure_storage_access() - not sure so be conservative and add a check
>
> arch/s390/kernel/uv.c:
>
> make_hva_secure() - not sure so be conservative and add a check
> ---
> arch/s390/kernel/uv.c | 2 +-
> arch/s390/mm/fault.c | 2 +-
> kernel/events/uprobes.c | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index b99478e..55aa280 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -424,7 +424,7 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
> return -EFAULT;
> }
> folio = folio_walk_start(&fw, vma, hva, 0);
> - if (!folio) {
> + if (!folio || folio_is_zone_device(folio)) {
> mmap_read_unlock(mm);
> return -ENXIO;
> }
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index e1ad05b..df1a067 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -449,7 +449,7 @@ void do_secure_storage_access(struct pt_regs *regs)
> if (!vma)
> return handle_fault_error(regs, SEGV_MAPERR);
> folio = folio_walk_start(&fw, vma, addr, 0);
> - if (!folio) {
> + if (!folio || folio_is_zone_device(folio)) {
> mmap_read_unlock(mm);
> return;
> }
Curious, does s390 even support ZONE_DEVICE and could trigger this?
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 8a601df..f774367 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -539,7 +539,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> }
>
> ret = 0;
> - if (unlikely(!folio_test_anon(folio))) {
> + if (unlikely(!folio_test_anon(folio) || folio_is_zone_device(folio))) {
> VM_WARN_ON_ONCE(is_register);
> folio_put(folio);
> goto out;
I wonder if __uprobe_write_opcode() would just work with anon device folios?
We only modify page content, and conditionally zap the page. Would there
be a problem with anon device folios?
--
Cheers,
David / dhildenb
* Re: [PATCH v2 02/14] mm: Filter zone device pages returned from folio_walk_start()
2025-06-17 9:25 ` David Hildenbrand
@ 2025-06-17 9:30 ` David Hildenbrand
2025-06-19 0:41 ` Alistair Popple
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:30 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski
On 17.06.25 11:25, David Hildenbrand wrote:
> On 16.06.25 13:58, Alistair Popple wrote:
>> Previously dax pages were skipped by the pagewalk code as pud_special() or
>> vm_normal_page{_pmd}() would be false for DAX pages. Now that dax pages are
>> refcounted normally that is no longer the case, so the pagewalk code will
>> start returning them.
>>
>> Most callers already explicitly filter for DAX or zone device pages so
>> don't need updating. However some don't, so add checks to those callers.
>>
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>
>> ---
>>
>> Changes since v1:
>>
>> - Dropped "mm/pagewalk: Skip dax pages in pagewalk" and replaced it
>> with this new patch for v2
>>
>> - As suggested by David and Jason we can filter the folios in the
>> callers instead of doing it in folio_start_walk(). Most callers
>> already do this (see below).
>>
>> I audited all callers of folio_walk_start() and found the following:
>>
>> mm/ksm.c:
>>
>> break_ksm() - doesn't need to filter zone_device pages because the can
>> never be KSM pages.
>>
>> get_mergeable_page() - already filters out zone_device pages.
>> scan_get_next_rmap_iterm() - already filters out zone_device_pages.
>>
>> mm/huge_memory.c:
>>
>> split_huge_pages_pid() - already checks for DAX with
>> vma_not_suitable_for_thp_split()
>>
>> mm/rmap.c:
>>
>> make_device_exclusive() - only works on anonymous pages, although
>> there'd be no issue with finding a DAX page even if support was extended
>> to file-backed pages.
>>
>> mm/migrate.c:
>>
>> add_folio_for_migration() - already checks the vma with vma_migratable()
>> do_pages_stat_array() - explicitly checks for zone_device folios
>>
>> kernel/event/uprobes.c:
>>
>> uprobe_write_opcode() - only works on anonymous pages, not sure if
>> zone_device could ever work so add an explicit check
>>
>> arch/s390/mm/fault.c:
>>
>> do_secure_storage_access() - not sure so be conservative and add a check
>>
>> arch/s390/kernel/uv.c:
>>
>> make_hva_secure() - not sure so be conservative and add a check
>> ---
>> arch/s390/kernel/uv.c | 2 +-
>> arch/s390/mm/fault.c | 2 +-
>> kernel/events/uprobes.c | 2 +-
>> 3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
>> index b99478e..55aa280 100644
>> --- a/arch/s390/kernel/uv.c
>> +++ b/arch/s390/kernel/uv.c
>> @@ -424,7 +424,7 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
>> return -EFAULT;
>> }
>> folio = folio_walk_start(&fw, vma, hva, 0);
>> - if (!folio) {
>> + if (!folio || folio_is_zone_device(folio)) {
>> mmap_read_unlock(mm);
>> return -ENXIO;
>> }
>> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
>> index e1ad05b..df1a067 100644
>> --- a/arch/s390/mm/fault.c
>> +++ b/arch/s390/mm/fault.c
>> @@ -449,7 +449,7 @@ void do_secure_storage_access(struct pt_regs *regs)
>> if (!vma)
>> return handle_fault_error(regs, SEGV_MAPERR);
>> folio = folio_walk_start(&fw, vma, addr, 0);
>> - if (!folio) {
>> + if (!folio || folio_is_zone_device(folio)) {
>> mmap_read_unlock(mm);
>> return;
>> }
>
> Curious, does s390 even support ZONE_DEVICE and could trigger this?
Ah, I see you raised this above. Even if it could be triggered (which I
don't think), I wonder if there would actually be a problem with
zone_device folios in here?
I think these two can be dropped for now.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 02/14] mm: Filter zone device pages returned from folio_walk_start()
2025-06-17 9:30 ` David Hildenbrand
@ 2025-06-19 0:41 ` Alistair Popple
0 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-19 0:41 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski
On Tue, Jun 17, 2025 at 11:30:20AM +0200, David Hildenbrand wrote:
> On 17.06.25 11:25, David Hildenbrand wrote:
> > On 16.06.25 13:58, Alistair Popple wrote:
> > > Previously dax pages were skipped by the pagewalk code as pud_special() or
> > > vm_normal_page{_pmd}() would be false for DAX pages. Now that dax pages are
> > > refcounted normally that is no longer the case, so the pagewalk code will
> > > start returning them.
> > >
> > > Most callers already explicitly filter for DAX or zone device pages so
> > > don't need updating. However some don't, so add checks to those callers.
> > >
> > > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> > >
> > > ---
> > >
> > > Changes since v1:
> > >
> > > - Dropped "mm/pagewalk: Skip dax pages in pagewalk" and replaced it
> > > with this new patch for v2
> > >
> > > - As suggested by David and Jason we can filter the folios in the
> > > callers instead of doing it in folio_start_walk(). Most callers
> > > already do this (see below).
> > >
> > > I audited all callers of folio_walk_start() and found the following:
> > >
> > > mm/ksm.c:
> > >
> > > break_ksm() - doesn't need to filter zone_device pages because the can
> > > never be KSM pages.
> > >
> > > get_mergeable_page() - already filters out zone_device pages.
> > > scan_get_next_rmap_iterm() - already filters out zone_device_pages.
> > >
> > > mm/huge_memory.c:
> > >
> > > split_huge_pages_pid() - already checks for DAX with
> > > vma_not_suitable_for_thp_split()
> > >
> > > mm/rmap.c:
> > >
> > > make_device_exclusive() - only works on anonymous pages, although
> > > there'd be no issue with finding a DAX page even if support was extended
> > > to file-backed pages.
> > >
> > > mm/migrate.c:
> > >
> > > add_folio_for_migration() - already checks the vma with vma_migratable()
> > > do_pages_stat_array() - explicitly checks for zone_device folios
> > >
> > > kernel/event/uprobes.c:
> > >
> > > uprobe_write_opcode() - only works on anonymous pages, not sure if
> > > zone_device could ever work so add an explicit check
> > >
> > > arch/s390/mm/fault.c:
> > >
> > > do_secure_storage_access() - not sure so be conservative and add a check
> > >
> > > arch/s390/kernel/uv.c:
> > >
> > > make_hva_secure() - not sure so be conservative and add a check
> > > ---
> > > arch/s390/kernel/uv.c | 2 +-
> > > arch/s390/mm/fault.c | 2 +-
> > > kernel/events/uprobes.c | 2 +-
> > > 3 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> > > index b99478e..55aa280 100644
> > > --- a/arch/s390/kernel/uv.c
> > > +++ b/arch/s390/kernel/uv.c
> > > @@ -424,7 +424,7 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
> > > return -EFAULT;
> > > }
> > > folio = folio_walk_start(&fw, vma, hva, 0);
> > > - if (!folio) {
> > > + if (!folio || folio_is_zone_device(folio)) {
> > > mmap_read_unlock(mm);
> > > return -ENXIO;
> > > }
> > > diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> > > index e1ad05b..df1a067 100644
> > > --- a/arch/s390/mm/fault.c
> > > +++ b/arch/s390/mm/fault.c
> > > @@ -449,7 +449,7 @@ void do_secure_storage_access(struct pt_regs *regs)
> > > if (!vma)
> > > return handle_fault_error(regs, SEGV_MAPERR);
> > > folio = folio_walk_start(&fw, vma, addr, 0);
> > > - if (!folio) {
> > > + if (!folio || folio_is_zone_device(folio)) {
> > > mmap_read_unlock(mm);
> > > return;
> > > }
> >
> > Curious, does s390 even support ZONE_DEVICE and could trigger this?
In theory yes. Now that we don't need the DEVMAP PTE bit someone could enable
ZONE_DEVICE on s390 as it supports the rest of the prerequisites AFAICT:
config ZONE_DEVICE
bool "Device memory (pmem, HMM, etc...) hotplug support"
depends on MEMORY_HOTPLUG
depends on MEMORY_HOTREMOVE
depends on SPARSEMEM_VMEMMAP
> Ah, I see you raised this above. Even if it could be triggered (which I
> don't think), I wonder if there would actually be a problem with zone_device
> folios in here?
Yes, I'm not sure either - it seems unlikely but I know nothing about how secure
storage works on s390 so was trying to be conservative.
> I think these two can be dropped for now
Ok.
> > I wonder if __uprobe_write_opcode() would just work with anon device folios?
> >
> > We only modify page content, and conditionally zap the page. Would there
> > be a problem with anon device folios?
The two main types of anon device folios I know of are DEVICE_COHERENT
and DEVICE_PRIVATE. I doubt it would be a problem for the former, but it
would definitely be a problem for the latter as the actual page content is
unaddressable from the CPU.
So we could probably make the check specific to DEVICE_PRIVATE, although it's
hard to imagine anyone caring about uprobes from DEVICE_COHERENT memory.
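(If the coherent case ever mattered the check could presumably be narrowed
to device-private folios only, something like this sketch:

	if (unlikely(!folio_test_anon(folio) ||
		     folio_is_device_private(folio))) {
		VM_WARN_ON_ONCE(is_register);
		folio_put(folio);
		goto out;
	}

but as noted it hardly seems worth it.)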
> --
> Cheers,
>
> David / dhildenb
>
* [PATCH v2 03/14] mm: Convert vmf_insert_mixed() from using pte_devmap to pte_special
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
2025-06-16 11:58 ` [PATCH v2 01/14] mm: Convert pXd_devmap checks to vma_is_dax Alistair Popple
2025-06-16 11:58 ` [PATCH v2 02/14] mm: Filter zone device pages returned from folio_walk_start() Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:49 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 04/14] mm: Remove remaining uses of PFN_DEV Alistair Popple
` (10 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
DAX no longer requires device PTEs as it always has a ZONE_DEVICE page
associated with the PTE that can be reference counted normally. Other users
of pte_devmap are drivers that set PFN_DEV when calling vmf_insert_mixed()
which ensures vm_normal_page() returns NULL for these entries.
There is no reason to distinguish these pte_devmap users so in order to
free up a PTE bit use pte_special instead for entries created with
vmf_insert_mixed(). This will ensure vm_normal_page() will continue to
return NULL for these pages.
Architectures that don't support pte_special also don't support pte_devmap
so those will continue to rely on pfn_valid() to determine if the page can
be mapped.
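For context, the relevant vm_normal_page() logic after this change is
roughly the following (simplified sketch of the CONFIG_ARCH_HAS_PTE_SPECIAL
path, matching the hunk below):

	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
		if (likely(!pte_special(pte)))
			goto check_pfn;
		if (vma->vm_ops && vma->vm_ops->find_special_page)
			return vma->vm_ops->find_special_page(vma, addr);
		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
			return NULL;
		if (is_zero_pfn(pfn))
			return NULL;
		/* the pte_devmap() check that used to sit here is removed */
		print_bad_pte(vma, addr, pte, NULL);
		return NULL;
	}

so entries inserted with pte_mkspecial() keep returning NULL from
vm_normal_page(), just as pte_mkdevmap() entries did.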
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
mm/hmm.c | 3 ---
mm/memory.c | 20 ++------------------
mm/vmscan.c | 2 +-
3 files changed, 3 insertions(+), 22 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index 5311753..1a3489f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -302,13 +302,10 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
goto fault;
/*
- * Bypass devmap pte such as DAX page when all pfn requested
- * flags(pfn_req_flags) are fulfilled.
* Since each architecture defines a struct page for the zero page, just
* fall through and treat it like a normal page.
*/
if (!vm_normal_page(walk->vma, addr, pte) &&
- !pte_devmap(pte) &&
!is_zero_pfn(pte_pfn(pte))) {
if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
pte_unmap(ptep);
diff --git a/mm/memory.c b/mm/memory.c
index b0cda5a..2c6eda1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -598,16 +598,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
return NULL;
if (is_zero_pfn(pfn))
return NULL;
- if (pte_devmap(pte))
- /*
- * NOTE: New users of ZONE_DEVICE will not set pte_devmap()
- * and will have refcounts incremented on their struct pages
- * when they are inserted into PTEs, thus they are safe to
- * return here. Legacy ZONE_DEVICE pages that set pte_devmap()
- * do not have refcounts. Example of legacy ZONE_DEVICE is
- * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers.
- */
- return NULL;
print_bad_pte(vma, addr, pte, NULL);
return NULL;
@@ -2483,10 +2473,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
}
/* Ok, finally just insert the thing.. */
- if (pfn_t_devmap(pfn))
- entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
- else
- entry = pte_mkspecial(pfn_t_pte(pfn, prot));
+ entry = pte_mkspecial(pfn_t_pte(pfn, prot));
if (mkwrite) {
entry = pte_mkyoung(entry);
@@ -2597,8 +2584,6 @@ static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn, bool mkwrite)
/* these checks mirror the abort conditions in vm_normal_page */
if (vma->vm_flags & VM_MIXEDMAP)
return true;
- if (pfn_t_devmap(pfn))
- return true;
if (pfn_t_special(pfn))
return true;
if (is_zero_pfn(pfn_t_to_pfn(pfn)))
@@ -2630,8 +2615,7 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
* than insert_pfn). If a zero_pfn were inserted into a VM_MIXEDMAP
* without pte special, it would there be refcounted as a normal page.
*/
- if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
- !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pfn_t_valid(pfn)) {
struct page *page;
/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a93a1ba..85bf782 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3424,7 +3424,7 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned
if (!pte_present(pte) || is_zero_pfn(pfn))
return -1;
- if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte)))
+ if (WARN_ON_ONCE(pte_special(pte)))
return -1;
if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm))
--
git-series 0.9.1
* Re: [PATCH v2 03/14] mm: Convert vmf_insert_mixed() from using pte_devmap to pte_special
2025-06-16 11:58 ` [PATCH v2 03/14] mm: Convert vmf_insert_mixed() from using pte_devmap to pte_special Alistair Popple
@ 2025-06-17 9:49 ` David Hildenbrand
2025-06-18 9:31 ` David Hildenbrand
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:49 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On 16.06.25 13:58, Alistair Popple wrote:
> DAX no longer requires device PTEs as it always has a ZONE_DEVICE page
> associated with the PTE that can be reference counted normally. Other users
> of pte_devmap are drivers that set PFN_DEV when calling vmf_insert_mixed()
> which ensures vm_normal_page() returns NULL for these entries.
>
> There is no reason to distinguish these pte_devmap users so in order to
> free up a PTE bit use pte_special instead for entries created with
> vmf_insert_mixed(). This will ensure vm_normal_page() will continue to
> return NULL for these pages.
>
> Architectures that don't support pte_special also don't support pte_devmap
> so those will continue to rely on pfn_valid() to determine if the page can
> be mapped.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> ---
> mm/hmm.c | 3 ---
> mm/memory.c | 20 ++------------------
> mm/vmscan.c | 2 +-
> 3 files changed, 3 insertions(+), 22 deletions(-)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 5311753..1a3489f 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -302,13 +302,10 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> goto fault;
>
> /*
> - * Bypass devmap pte such as DAX page when all pfn requested
> - * flags(pfn_req_flags) are fulfilled.
> * Since each architecture defines a struct page for the zero page, just
> * fall through and treat it like a normal page.
> */
> if (!vm_normal_page(walk->vma, addr, pte) &&
> - !pte_devmap(pte) &&
> !is_zero_pfn(pte_pfn(pte))) {
> if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
> pte_unmap(ptep);
> diff --git a/mm/memory.c b/mm/memory.c
> index b0cda5a..2c6eda1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -598,16 +598,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> return NULL;
> if (is_zero_pfn(pfn))
> return NULL;
> - if (pte_devmap(pte))
> - /*
> - * NOTE: New users of ZONE_DEVICE will not set pte_devmap()
> - * and will have refcounts incremented on their struct pages
> - * when they are inserted into PTEs, thus they are safe to
> - * return here. Legacy ZONE_DEVICE pages that set pte_devmap()
> - * do not have refcounts. Example of legacy ZONE_DEVICE is
> - * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers.
> - */
> - return NULL;
>
> print_bad_pte(vma, addr, pte, NULL);
> return NULL;
> @@ -2483,10 +2473,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> }
>
> /* Ok, finally just insert the thing.. */
> - if (pfn_t_devmap(pfn))
> - entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
> - else
> - entry = pte_mkspecial(pfn_t_pte(pfn, prot));
> + entry = pte_mkspecial(pfn_t_pte(pfn, prot));
>
> if (mkwrite) {
> entry = pte_mkyoung(entry);
> @@ -2597,8 +2584,6 @@ static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn, bool mkwrite)
> /* these checks mirror the abort conditions in vm_normal_page */
> if (vma->vm_flags & VM_MIXEDMAP)
> return true;
> - if (pfn_t_devmap(pfn))
> - return true;
> if (pfn_t_special(pfn))
> return true;
> if (is_zero_pfn(pfn_t_to_pfn(pfn)))
> @@ -2630,8 +2615,7 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
> * than insert_pfn). If a zero_pfn were inserted into a VM_MIXEDMAP
> * without pte special, it would there be refcounted as a normal page.
> */
> - if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) &&
> - !pfn_t_devmap(pfn) && pfn_t_valid(pfn)) {
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pfn_t_valid(pfn)) {
> struct page *page;
>
> /*
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a93a1ba..85bf782 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3424,7 +3424,7 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned
> if (!pte_present(pte) || is_zero_pfn(pfn))
> return -1;
>
> - if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte)))
> + if (WARN_ON_ONCE(pte_special(pte)))
> return -1;
>
> if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm))
--
Cheers,
David / dhildenb
* Re: [PATCH v2 03/14] mm: Convert vmf_insert_mixed() from using pte_devmap to pte_special
2025-06-17 9:49 ` David Hildenbrand
@ 2025-06-18 9:31 ` David Hildenbrand
0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-06-18 9:31 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On 17.06.25 11:49, David Hildenbrand wrote:
> On 16.06.25 13:58, Alistair Popple wrote:
>> DAX no longer requires device PTEs as it always has a ZONE_DEVICE page
>> associated with the PTE that can be reference counted normally. Other users
>> of pte_devmap are drivers that set PFN_DEV when calling vmf_insert_mixed()
>> which ensures vm_normal_page() returns NULL for these entries.
>>
>> There is no reason to distinguish these pte_devmap users so in order to
>> free up a PTE bit use pte_special instead for entries created with
>> vmf_insert_mixed(). This will ensure vm_normal_page() will continue to
>> return NULL for these pages.
>>
>> Architectures that don't support pte_special also don't support pte_devmap
>> so those will continue to rely on pfn_valid() to determine if the page can
>> be mapped.
>>
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> ---
As Andrew noted offline, there is no content here. I sent this by mistake
after replying to patch #6 instead.
--
Cheers,
David / dhildenb
* [PATCH v2 04/14] mm: Remove remaining uses of PFN_DEV
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (2 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 03/14] mm: Convert vmf_insert_mixed() from using pte_devmap to pte_special Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-16 11:58 ` [PATCH v2 05/14] mm/gup: Remove pXX_devmap usage from get_user_pages() Alistair Popple
` (9 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
PFN_DEV was used by callers of dax_direct_access() to figure out if the
returned PFN is associated with a page using pfn_t_has_page() or
not. However all DAX PFNs now require an associated ZONE_DEVICE page so
callers can assume a page exists.
Other users of PFN_DEV were setting it before calling
vmf_insert_mixed(). This is unnecessary as it is no longer checked;
pfn_valid() is instead relied on to determine if there is an associated
page or not.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v1:
- Removed usage of pfn_t_devmap() that was removed later in the series
to maintain bisectability as pointed out by Jonathan Cameron.
- Rebased on David's cleanup[1]
[1] https://lore.kernel.org/linux-mm/20250611120654.545963-1-david@redhat.com/
---
drivers/gpu/drm/gma500/fbdev.c | 2 +-
drivers/gpu/drm/omapdrm/omap_gem.c | 5 ++---
drivers/s390/block/dcssblk.c | 3 +--
drivers/vfio/pci/vfio_pci_core.c | 6 ++----
fs/cramfs/inode.c | 2 +-
include/linux/pfn_t.h | 7 ++-----
mm/memory.c | 2 +-
7 files changed, 10 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/gma500/fbdev.c b/drivers/gpu/drm/gma500/fbdev.c
index 8edefea..109efdc 100644
--- a/drivers/gpu/drm/gma500/fbdev.c
+++ b/drivers/gpu/drm/gma500/fbdev.c
@@ -33,7 +33,7 @@ static vm_fault_t psb_fbdev_vm_fault(struct vm_fault *vmf)
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
for (i = 0; i < page_num; ++i) {
- err = vmf_insert_mixed(vma, address, __pfn_to_pfn_t(pfn, PFN_DEV));
+ err = vmf_insert_mixed(vma, address, __pfn_to_pfn_t(pfn, 0));
if (unlikely(err & VM_FAULT_ERROR))
break;
address += PAGE_SIZE;
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index b9c67e4..9df05b2 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -371,8 +371,7 @@ static vm_fault_t omap_gem_fault_1d(struct drm_gem_object *obj,
VERB("Inserting %p pfn %lx, pa %lx", (void *)vmf->address,
pfn, pfn << PAGE_SHIFT);
- return vmf_insert_mixed(vma, vmf->address,
- __pfn_to_pfn_t(pfn, PFN_DEV));
+ return vmf_insert_mixed(vma, vmf->address, __pfn_to_pfn_t(pfn, 0));
}
/* Special handling for the case of faulting in 2d tiled buffers */
@@ -468,7 +467,7 @@ static vm_fault_t omap_gem_fault_2d(struct drm_gem_object *obj,
for (i = n; i > 0; i--) {
ret = vmf_insert_mixed(vma,
- vaddr, __pfn_to_pfn_t(pfn, PFN_DEV));
+ vaddr, __pfn_to_pfn_t(pfn, 0));
if (ret & VM_FAULT_ERROR)
break;
pfn += priv->usergart[fmt].stride_pfn;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index cdc7b2f..249ae40 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -923,8 +923,7 @@ __dcssblk_direct_access(struct dcssblk_dev_info *dev_info, pgoff_t pgoff,
if (kaddr)
*kaddr = __va(dev_info->start + offset);
if (pfn)
- *pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset),
- PFN_DEV);
+ *pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), 0);
return (dev_sz - offset) / PAGE_SIZE;
}
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 6328c3a..3f2ad5f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1669,14 +1669,12 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
break;
#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
case PMD_ORDER:
- ret = vmf_insert_pfn_pmd(vmf,
- __pfn_to_pfn_t(pfn, PFN_DEV), false);
+ ret = vmf_insert_pfn_pmd(vmf, __pfn_to_pfn_t(pfn, 0), false);
break;
#endif
#ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP
case PUD_ORDER:
- ret = vmf_insert_pfn_pud(vmf,
- __pfn_to_pfn_t(pfn, PFN_DEV), false);
+ ret = vmf_insert_pfn_pud(vmf, __pfn_to_pfn_t(pfn, 0), false);
break;
#endif
default:
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index b84d174..820a664 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -412,7 +412,7 @@ static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
for (i = 0; i < pages && !ret; i++) {
vm_fault_t vmf;
unsigned long off = i * PAGE_SIZE;
- pfn_t pfn = phys_to_pfn_t(address + off, PFN_DEV);
+ pfn_t pfn = phys_to_pfn_t(address + off, 0);
vmf = vmf_insert_mixed(vma, vma->vm_start + off, pfn);
if (vmf & VM_FAULT_ERROR)
ret = vm_fault_to_errno(vmf, 0);
diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
index 2d91482..2b0d6e0 100644
--- a/include/linux/pfn_t.h
+++ b/include/linux/pfn_t.h
@@ -7,7 +7,6 @@
* PFN_FLAGS_MASK - mask of all the possible valid pfn_t flags
* PFN_SG_CHAIN - pfn is a pointer to the next scatterlist entry
* PFN_SG_LAST - pfn references a page and is the last scatterlist entry
- * PFN_DEV - pfn is not covered by system memmap by default
* PFN_MAP - pfn has a dynamic page mapping established by a device driver
* PFN_SPECIAL - for CONFIG_FS_DAX_LIMITED builds to allow XIP, but not
* get_user_pages
@@ -15,7 +14,6 @@
#define PFN_FLAGS_MASK (((u64) (~PAGE_MASK)) << (BITS_PER_LONG_LONG - PAGE_SHIFT))
#define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
#define PFN_SG_LAST (1ULL << (BITS_PER_LONG_LONG - 2))
-#define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
#define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4))
#define PFN_SPECIAL (1ULL << (BITS_PER_LONG_LONG - 5))
@@ -23,7 +21,6 @@
{ PFN_SPECIAL, "SPECIAL" }, \
{ PFN_SG_CHAIN, "SG_CHAIN" }, \
{ PFN_SG_LAST, "SG_LAST" }, \
- { PFN_DEV, "DEV" }, \
{ PFN_MAP, "MAP" }
static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
@@ -46,7 +43,7 @@ static inline pfn_t phys_to_pfn_t(phys_addr_t addr, u64 flags)
static inline bool pfn_t_has_page(pfn_t pfn)
{
- return (pfn.val & PFN_MAP) == PFN_MAP || (pfn.val & PFN_DEV) == 0;
+ return (pfn.val & PFN_MAP) == PFN_MAP;
}
static inline unsigned long pfn_t_to_pfn(pfn_t pfn)
@@ -100,7 +97,7 @@ static inline pud_t pfn_t_pud(pfn_t pfn, pgprot_t pgprot)
#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
static inline bool pfn_t_devmap(pfn_t pfn)
{
- const u64 flags = PFN_DEV|PFN_MAP;
+ const u64 flags = PFN_MAP;
return (pfn.val & flags) == flags;
}
diff --git a/mm/memory.c b/mm/memory.c
index 2c6eda1..97aaad9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2544,7 +2544,7 @@ vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
pfnmap_setup_cachemode_pfn(pfn, &pgprot);
- return insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot,
+ return insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, 0), pgprot,
false);
}
EXPORT_SYMBOL(vmf_insert_pfn_prot);
--
git-series 0.9.1
* [PATCH v2 05/14] mm/gup: Remove pXX_devmap usage from get_user_pages()
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (3 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 04/14] mm: Remove remaining uses of PFN_DEV Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-16 11:58 ` [PATCH v2 06/14] mm/huge_memory: Remove pXd_devmap usage from insert_pXd_pfn() Alistair Popple
` (8 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
GUP uses pXX_devmap() calls to see if it needs to get a reference on
the associated pgmap data structure to ensure the pages won't go
away. However it's a driver responsibility to ensure that if pages are
mapped (ie. discoverable by GUP) they are not offlined or removed
from the memmap, so there is no need to hold a reference on the pgmap
data structure to ensure this.
Furthermore mappings with PFN_DEV are no longer created, hence this is
effectively dead code anyway and can be removed.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
include/linux/huge_mm.h | 3 +-
mm/gup.c | 160 +----------------------------------------
mm/huge_memory.c | 40 +----------
3 files changed, 5 insertions(+), 198 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c9..519c3f0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -473,9 +473,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
return folio_order(folio) >= HPAGE_PMD_ORDER;
}
-struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
- pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
-
vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
extern struct folio *huge_zero_folio;
diff --git a/mm/gup.c b/mm/gup.c
index cbe8e4b..6888e87 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -679,31 +679,9 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
return NULL;
pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
-
- if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
- pud_devmap(pud)) {
- /*
- * device mapped pages can only be returned if the caller
- * will manage the page reference count.
- *
- * At least one of FOLL_GET | FOLL_PIN must be set, so
- * assert that here:
- */
- if (!(flags & (FOLL_GET | FOLL_PIN)))
- return ERR_PTR(-EEXIST);
-
- if (flags & FOLL_TOUCH)
- touch_pud(vma, addr, pudp, flags & FOLL_WRITE);
-
- ctx->pgmap = get_dev_pagemap(pfn, ctx->pgmap);
- if (!ctx->pgmap)
- return ERR_PTR(-EFAULT);
- }
-
page = pfn_to_page(pfn);
- if (!pud_devmap(pud) && !pud_write(pud) &&
- gup_must_unshare(vma, flags, page))
+ if (!pud_write(pud) && gup_must_unshare(vma, flags, page))
return ERR_PTR(-EMLINK);
ret = try_grab_folio(page_folio(page), 1, flags);
@@ -857,8 +835,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
page = vm_normal_page(vma, address, pte);
/*
- * We only care about anon pages in can_follow_write_pte() and don't
- * have to worry about pte_devmap() because they are never anon.
+ * We only care about anon pages in can_follow_write_pte().
*/
if ((flags & FOLL_WRITE) &&
!can_follow_write_pte(pte, page, vma, flags)) {
@@ -866,18 +843,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
goto out;
}
- if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
- /*
- * Only return device mapping pages in the FOLL_GET or FOLL_PIN
- * case since they are only valid while holding the pgmap
- * reference.
- */
- *pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
- if (*pgmap)
- page = pte_page(pte);
- else
- goto no_page;
- } else if (unlikely(!page)) {
+ if (unlikely(!page)) {
if (flags & FOLL_DUMP) {
/* Avoid special (like zero) pages in core dumps */
page = ERR_PTR(-EFAULT);
@@ -959,14 +925,6 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
return no_page_table(vma, flags, address);
if (!pmd_present(pmdval))
return no_page_table(vma, flags, address);
- if (pmd_devmap(pmdval)) {
- ptl = pmd_lock(mm, pmd);
- page = follow_devmap_pmd(vma, address, pmd, flags, &ctx->pgmap);
- spin_unlock(ptl);
- if (page)
- return page;
- return no_page_table(vma, flags, address);
- }
if (likely(!pmd_leaf(pmdval)))
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
@@ -2896,7 +2854,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
int *nr)
{
struct dev_pagemap *pgmap = NULL;
- int nr_start = *nr, ret = 0;
+ int ret = 0;
pte_t *ptep, *ptem;
ptem = ptep = pte_offset_map(&pmd, addr);
@@ -2920,16 +2878,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
if (!pte_access_permitted(pte, flags & FOLL_WRITE))
goto pte_unmap;
- if (pte_devmap(pte)) {
- if (unlikely(flags & FOLL_LONGTERM))
- goto pte_unmap;
-
- pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
- if (unlikely(!pgmap)) {
- gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
- goto pte_unmap;
- }
- } else if (pte_special(pte))
+ if (pte_special(pte))
goto pte_unmap;
/* If it's not marked as special it must have a valid memmap. */
@@ -3001,91 +2950,6 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
}
#endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
-#if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
-static int gup_fast_devmap_leaf(unsigned long pfn, unsigned long addr,
- unsigned long end, unsigned int flags, struct page **pages, int *nr)
-{
- int nr_start = *nr;
- struct dev_pagemap *pgmap = NULL;
-
- do {
- struct folio *folio;
- struct page *page = pfn_to_page(pfn);
-
- pgmap = get_dev_pagemap(pfn, pgmap);
- if (unlikely(!pgmap)) {
- gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
- break;
- }
-
- folio = try_grab_folio_fast(page, 1, flags);
- if (!folio) {
- gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
- break;
- }
- folio_set_referenced(folio);
- pages[*nr] = page;
- (*nr)++;
- pfn++;
- } while (addr += PAGE_SIZE, addr != end);
-
- put_dev_pagemap(pgmap);
- return addr == end;
-}
-
-static int gup_fast_devmap_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
- unsigned long end, unsigned int flags, struct page **pages,
- int *nr)
-{
- unsigned long fault_pfn;
- int nr_start = *nr;
-
- fault_pfn = pmd_pfn(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
- if (!gup_fast_devmap_leaf(fault_pfn, addr, end, flags, pages, nr))
- return 0;
-
- if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
- gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
- return 0;
- }
- return 1;
-}
-
-static int gup_fast_devmap_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
- unsigned long end, unsigned int flags, struct page **pages,
- int *nr)
-{
- unsigned long fault_pfn;
- int nr_start = *nr;
-
- fault_pfn = pud_pfn(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
- if (!gup_fast_devmap_leaf(fault_pfn, addr, end, flags, pages, nr))
- return 0;
-
- if (unlikely(pud_val(orig) != pud_val(*pudp))) {
- gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
- return 0;
- }
- return 1;
-}
-#else
-static int gup_fast_devmap_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
- unsigned long end, unsigned int flags, struct page **pages,
- int *nr)
-{
- BUILD_BUG();
- return 0;
-}
-
-static int gup_fast_devmap_pud_leaf(pud_t pud, pud_t *pudp, unsigned long addr,
- unsigned long end, unsigned int flags, struct page **pages,
- int *nr)
-{
- BUILD_BUG();
- return 0;
-}
-#endif
-
static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
unsigned long end, unsigned int flags, struct page **pages,
int *nr)
@@ -3100,13 +2964,6 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
if (pmd_special(orig))
return 0;
- if (pmd_devmap(orig)) {
- if (unlikely(flags & FOLL_LONGTERM))
- return 0;
- return gup_fast_devmap_pmd_leaf(orig, pmdp, addr, end, flags,
- pages, nr);
- }
-
page = pmd_page(orig);
refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
@@ -3147,13 +3004,6 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
if (pud_special(orig))
return 0;
- if (pud_devmap(orig)) {
- if (unlikely(flags & FOLL_LONGTERM))
- return 0;
- return gup_fast_devmap_pud_leaf(orig, pudp, addr, end, flags,
- pages, nr);
- }
-
page = pud_page(orig);
refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbc1dab..b096240 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1672,46 +1672,6 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
update_mmu_cache_pmd(vma, addr, pmd);
}
-struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
- pmd_t *pmd, int flags, struct dev_pagemap **pgmap)
-{
- unsigned long pfn = pmd_pfn(*pmd);
- struct mm_struct *mm = vma->vm_mm;
- struct page *page;
- int ret;
-
- assert_spin_locked(pmd_lockptr(mm, pmd));
-
- if (flags & FOLL_WRITE && !pmd_write(*pmd))
- return NULL;
-
- if (pmd_present(*pmd) && pmd_devmap(*pmd))
- /* pass */;
- else
- return NULL;
-
- if (flags & FOLL_TOUCH)
- touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
-
- /*
- * device mapped pages can only be returned if the
- * caller will manage the page reference count.
- */
- if (!(flags & (FOLL_GET | FOLL_PIN)))
- return ERR_PTR(-EEXIST);
-
- pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT;
- *pgmap = get_dev_pagemap(pfn, *pgmap);
- if (!*pgmap)
- return ERR_PTR(-EFAULT);
- page = pfn_to_page(pfn);
- ret = try_grab_folio(page_folio(page), 1, flags);
- if (ret)
- page = ERR_PTR(ret);
-
- return page;
-}
-
int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 06/14] mm/huge_memory: Remove pXd_devmap usage from insert_pXd_pfn()
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (4 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 05/14] mm/gup: Remove pXX_devmap usage from get_user_pages() Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:49 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 07/14] mm: Remove redundant pXd_devmap calls Alistair Popple
` (7 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski
Nothing uses PFN_DEV anymore so no need to create devmap pXd's when
mapping a PFN. Instead special mappings will be created which ensures
vm_normal_page_pXd() will not return pages which don't have an
associated page. This could change behaviour slightly on architectures
where pXd_devmap() does not imply pXd_special() as the normal page
checks would have fallen through to checking VM_PFNMAP/MIXEDMAP instead,
which in theory at least could have returned a page.
However vm_normal_page_pXd() should never have been returning pages for
pXd_devmap() entries anyway, so anything relying on that would have been
a bug.
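To illustrate the difference, a rough sketch of the lookup order (heavily
simplified from the real vm_normal_page_pmd(); the helper name and the final
VM_PFNMAP/VM_MIXEDMAP step are illustrative assumptions, not code from this
series):
static bool example_resolves_to_page(struct vm_area_struct *vma, pmd_t pmd)
{
	/*
	 * After this patch insert_pmd() always marks PFN mappings special,
	 * so every architecture bails out here.
	 */
	if (pmd_special(pmd))
		return false;
	/*
	 * Before this patch, an architecture where pmd_devmap() did not
	 * imply pmd_special() fell through to the VM_PFNMAP/VM_MIXEDMAP
	 * rules, which in theory could have resolved to a struct page.
	 * The real rules are more involved than this single test.
	 */
	if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
		return false;
	return true;
}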
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
Changes since v1:
- New for v2
---
mm/huge_memory.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b096240..6514e25 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1415,11 +1415,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
} else {
entry = pmd_mkhuge(pfn_t_pmd(fop.pfn, prot));
-
- if (pfn_t_devmap(fop.pfn))
- entry = pmd_mkdevmap(entry);
- else
- entry = pmd_mkspecial(entry);
+ entry = pmd_mkspecial(entry);
}
if (write) {
entry = pmd_mkyoung(pmd_mkdirty(entry));
@@ -1565,11 +1561,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PUD_NR);
} else {
entry = pud_mkhuge(pfn_t_pud(fop.pfn, prot));
-
- if (pfn_t_devmap(fop.pfn))
- entry = pud_mkdevmap(entry);
- else
- entry = pud_mkspecial(entry);
+ entry = pud_mkspecial(entry);
}
if (write) {
entry = pud_mkyoung(pud_mkdirty(entry));
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 06/14] mm/huge_memory: Remove pXd_devmap usage from insert_pXd_pfn()
2025-06-16 11:58 ` [PATCH v2 06/14] mm/huge_memory: Remove pXd_devmap usage from insert_pXd_pfn() Alistair Popple
@ 2025-06-17 9:49 ` David Hildenbrand
2025-06-19 8:52 ` Alistair Popple
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:49 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski
On 16.06.25 13:58, Alistair Popple wrote:
> Nothing uses PFN_DEV anymore so no need to create devmap pXd's when
> mapping a PFN. Instead special mappings will be created which ensures
> vm_normal_page_pXd() will not return pages which don't have an
> associated page. This could change behaviour slightly on architectures
> where pXd_devmap() does not imply pXd_special() as the normal page
> checks would have fallen through to checking VM_PFNMAP/MIXEDMAP instead,
> which in theory at least could have returned a page.
>
> However vm_normal_page_pXd() should never have been returning pages for
> pXd_devmap() entries anyway, so anything relying on that would have been
> a bug.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>
> ---
>
> Changes since v1:
>
> - New for v2
> ---
> mm/huge_memory.c | 12 ++----------
> 1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b096240..6514e25 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1415,11 +1415,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
> add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
> } else {
> entry = pmd_mkhuge(pfn_t_pmd(fop.pfn, prot));
> -
> - if (pfn_t_devmap(fop.pfn))
> - entry = pmd_mkdevmap(entry);
> - else
> - entry = pmd_mkspecial(entry);
> + entry = pmd_mkspecial(entry);
> }
> if (write) {
> entry = pmd_mkyoung(pmd_mkdirty(entry));
> @@ -1565,11 +1561,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
> add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PUD_NR);
> } else {
> entry = pud_mkhuge(pfn_t_pud(fop.pfn, prot));
> -
> - if (pfn_t_devmap(fop.pfn))
> - entry = pud_mkdevmap(entry);
> - else
> - entry = pud_mkspecial(entry);
> + entry = pud_mkspecial(entry);
> }
> if (write) {
> entry = pud_mkyoung(pud_mkdirty(entry));
Why not squash this patch into #3, and remove the pmd_special() check
from vm_normal_page_pmd() in the same go? Seems wrong to handle the
PMD/PUD case separately.
But now I am confused why some pte_devmap() checks are removed in patch
#3, while others are removed in #7.
Why not split it up into (a) stop setting p*_devmap() and (b) remove
p*_devmap().
Logically makes more sense to me ... :)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 06/14] mm/huge_memory: Remove pXd_devmap usage from insert_pXd_pfn()
2025-06-17 9:49 ` David Hildenbrand
@ 2025-06-19 8:52 ` Alistair Popple
0 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-19 8:52 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski
On Tue, Jun 17, 2025 at 11:49:20AM +0200, David Hildenbrand wrote:
> On 16.06.25 13:58, Alistair Popple wrote:
> > Nothing uses PFN_DEV anymore so no need to create devmap pXd's when
> > mapping a PFN. Instead special mappings will be created which ensures
> > vm_normal_page_pXd() will not return pages which don't have an
> > associated page. This could change behaviour slightly on architectures
> > where pXd_devmap() does not imply pXd_special() as the normal page
> > checks would have fallen through to checking VM_PFNMAP/MIXEDMAP instead,
> > which in theory at least could have returned a page.
> >
> > However vm_normal_page_pXd() should never have been returning pages for
> > pXd_devmap() entries anyway, so anything relying on that would have been
> > a bug.
> >
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> >
> > ---
> >
> > Changes since v1:
> >
> > - New for v2
> > ---
> > mm/huge_memory.c | 12 ++----------
> > 1 file changed, 2 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index b096240..6514e25 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1415,11 +1415,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
> > add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
> > } else {
> > entry = pmd_mkhuge(pfn_t_pmd(fop.pfn, prot));
> > -
> > - if (pfn_t_devmap(fop.pfn))
> > - entry = pmd_mkdevmap(entry);
> > - else
> > - entry = pmd_mkspecial(entry);
> > + entry = pmd_mkspecial(entry);
> > }
> > if (write) {
> > entry = pmd_mkyoung(pmd_mkdirty(entry));
> > @@ -1565,11 +1561,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
> > add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PUD_NR);
> > } else {
> > entry = pud_mkhuge(pfn_t_pud(fop.pfn, prot));
> > -
> > - if (pfn_t_devmap(fop.pfn))
> > - entry = pud_mkdevmap(entry);
> > - else
> > - entry = pud_mkspecial(entry);
> > + entry = pud_mkspecial(entry);
> > }
> > if (write) {
> > entry = pud_mkyoung(pud_mkdirty(entry));
>
>
> Why not squash this patch into #3, and remove the pmd_special() check from
> vm_normal_page_pmd() in the same go? Seems wrong to handle the PMD/PUD case
> separately.
Yeah, that was mostly because "someone" (and thank you btw, it was somewhat my
mess) changed all this while I was working on it :-) I wanted to make the rebase
fixups obvious but will squash them for v3.
> But now I am confused why some pte_devmap() checks are removed in patch #3,
> while others are removed in #7.
>
> Why not split it up into (a) stop setting p*_devmap() and (b) remove
> p*_devmap().
>
> Logically makes more sense to me ... :)
Heh. You're right. For various reasons this patch series has gone through a
couple of reorderings, mainly to get rid of unused stuff early in the series but
that didn't work out due to that RISC-V bug. I needed a break from silly rebase
build errors so this was a good checkpoint.
But I've reworked things for v3 to get the ordering a bit more sensible.
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 07/14] mm: Remove redundant pXd_devmap calls
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (5 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 06/14] mm/huge_memory: Remove pXd_devmap usage from insert_pXd_pfn() Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-16 11:58 ` [PATCH v2 08/14] mm/khugepaged: Remove redundant pmd_devmap() check Alistair Popple
` (6 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski
DAX was the only thing that created pmd_devmap and pud_devmap entries;
however, it no longer does so, as DAX pages are now refcounted normally and
pXd_trans_huge() returns true for those. Therefore checking both pXd_devmap()
and pXd_trans_huge() is redundant and the former can be removed without
changing behaviour, as it will always be false.
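For example, pmd_trans_huge_lock() ends up as below; this only restates the
corresponding hunk to show why nothing changes in practice:
static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
		struct vm_area_struct *vma)
{
	/*
	 * DAX huge mappings now satisfy pmd_trans_huge(), so the old
	 * "|| pmd_devmap(*pmd)" term here was always false and is dropped.
	 */
	if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd))
		return __pmd_trans_huge_lock(pmd, vma);
	else
		return NULL;
}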
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
Changes since v1:
- Removed a new pud_devmap() call added to __relocate_anon_folios().
This could never have been hit as only DAX created pud_devmap entries
and never for anonymous VMA's.
---
fs/dax.c | 5 ++---
include/linux/huge_mm.h | 10 ++++------
include/linux/pgtable.h | 2 +-
mm/hmm.c | 4 ++--
mm/huge_memory.c | 23 +++++++++--------------
mm/mapping_dirty_helpers.c | 4 ++--
mm/memory.c | 15 ++++++---------
mm/migrate_device.c | 2 +-
mm/mprotect.c | 2 +-
mm/mremap.c | 9 +++------
mm/page_vma_mapped.c | 5 ++---
mm/pagewalk.c | 8 +++-----
mm/pgtable-generic.c | 7 +++----
mm/userfaultfd.c | 4 ++--
mm/vmscan.c | 3 ---
15 files changed, 41 insertions(+), 62 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index ea0c357..7d4ecb9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1937,7 +1937,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
* the PTE we need to set up. If so just return and the fault will be
* retried.
*/
- if (pmd_trans_huge(*vmf->pmd) || pmd_devmap(*vmf->pmd)) {
+ if (pmd_trans_huge(*vmf->pmd)) {
ret = VM_FAULT_NOPAGE;
goto unlock_entry;
}
@@ -2060,8 +2060,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
* the PMD we need to set up. If so just return and the fault will be
* retried.
*/
- if (!pmd_none(*vmf->pmd) && !pmd_trans_huge(*vmf->pmd) &&
- !pmd_devmap(*vmf->pmd)) {
+ if (!pmd_none(*vmf->pmd) && !pmd_trans_huge(*vmf->pmd)) {
ret = 0;
goto unlock_entry;
}
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 519c3f0..21b3f0b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -400,8 +400,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
#define split_huge_pmd(__vma, __pmd, __address) \
do { \
pmd_t *____pmd = (__pmd); \
- if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd) \
- || pmd_devmap(*____pmd)) \
+ if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd)) \
__split_huge_pmd(__vma, __pmd, __address, \
false); \
} while (0)
@@ -426,8 +425,7 @@ change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
#define split_huge_pud(__vma, __pud, __address) \
do { \
pud_t *____pud = (__pud); \
- if (pud_trans_huge(*____pud) \
- || pud_devmap(*____pud)) \
+ if (pud_trans_huge(*____pud)) \
__split_huge_pud(__vma, __pud, __address); \
} while (0)
@@ -450,7 +448,7 @@ static inline int is_swap_pmd(pmd_t pmd)
static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
struct vm_area_struct *vma)
{
- if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
+ if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd))
return __pmd_trans_huge_lock(pmd, vma);
else
return NULL;
@@ -458,7 +456,7 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
struct vm_area_struct *vma)
{
- if (pud_trans_huge(*pud) || pud_devmap(*pud))
+ if (pud_trans_huge(*pud))
return __pud_trans_huge_lock(pud, vma);
else
return NULL;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e4a3895..0298e55 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1672,7 +1672,7 @@ static inline int pud_trans_unstable(pud_t *pud)
defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
pud_t pudval = READ_ONCE(*pud);
- if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval))
+ if (pud_none(pudval) || pud_trans_huge(pudval))
return 1;
if (unlikely(pud_bad(pudval))) {
pud_clear_bad(pud);
diff --git a/mm/hmm.c b/mm/hmm.c
index 1a3489f..f1b579f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -360,7 +360,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
}
- if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
+ if (pmd_trans_huge(pmd)) {
/*
* No need to take pmd_lock here, even if some other thread
* is splitting the huge pmd we will get that event through
@@ -371,7 +371,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
* values.
*/
pmd = pmdp_get_lockless(pmdp);
- if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
+ if (!pmd_trans_huge(pmd))
goto again;
return hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6514e25..642fd83 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1459,8 +1459,7 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
* but we need to be consistent with PTEs and architectures that
* can't support a 'special' bit.
*/
- BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
- !pfn_t_devmap(pfn));
+ BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
(VM_PFNMAP|VM_MIXEDMAP));
BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
@@ -1596,8 +1595,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write)
* but we need to be consistent with PTEs and architectures that
* can't support a 'special' bit.
*/
- BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
- !pfn_t_devmap(pfn));
+ BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
(VM_PFNMAP|VM_MIXEDMAP));
BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
@@ -1815,7 +1813,7 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
ret = -EAGAIN;
pud = *src_pud;
- if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud)))
+ if (unlikely(!pud_trans_huge(pud)))
goto out_unlock;
/*
@@ -2677,8 +2675,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
{
spinlock_t *ptl;
ptl = pmd_lock(vma->vm_mm, pmd);
- if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) ||
- pmd_devmap(*pmd)))
+ if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)))
return ptl;
spin_unlock(ptl);
return NULL;
@@ -2695,7 +2692,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma)
spinlock_t *ptl;
ptl = pud_lock(vma->vm_mm, pud);
- if (likely(pud_trans_huge(*pud) || pud_devmap(*pud)))
+ if (likely(pud_trans_huge(*pud)))
return ptl;
spin_unlock(ptl);
return NULL;
@@ -2747,7 +2744,7 @@ static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
VM_BUG_ON(haddr & ~HPAGE_PUD_MASK);
VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma);
- VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud));
+ VM_BUG_ON(!pud_trans_huge(*pud));
count_vm_event(THP_SPLIT_PUD);
@@ -2780,7 +2777,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
mmu_notifier_invalidate_range_start(&range);
ptl = pud_lock(vma->vm_mm, pud);
- if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud)))
+ if (unlikely(!pud_trans_huge(*pud)))
goto out;
__split_huge_pud_locked(vma, pud, range.start);
@@ -2853,8 +2850,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
- VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
- && !pmd_devmap(*pmd));
+ VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd));
count_vm_event(THP_SPLIT_PMD);
@@ -3062,8 +3058,7 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, bool freeze)
{
VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
- is_pmd_migration_entry(*pmd))
+ if (pmd_trans_huge(*pmd) || is_pmd_migration_entry(*pmd))
__split_huge_pmd_locked(vma, pmd, address, freeze);
}
diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c
index 2f8829b..208b428 100644
--- a/mm/mapping_dirty_helpers.c
+++ b/mm/mapping_dirty_helpers.c
@@ -129,7 +129,7 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long end,
pmd_t pmdval = pmdp_get_lockless(pmd);
/* Do not split a huge pmd, present or migrated */
- if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) {
+ if (pmd_trans_huge(pmdval)) {
WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval));
walk->action = ACTION_CONTINUE;
}
@@ -152,7 +152,7 @@ static int wp_clean_pud_entry(pud_t *pud, unsigned long addr, unsigned long end,
pud_t pudval = READ_ONCE(*pud);
/* Do not split a huge pud */
- if (pud_trans_huge(pudval) || pud_devmap(pudval)) {
+ if (pud_trans_huge(pudval)) {
WARN_ON(pud_write(pudval) || pud_dirty(pudval));
walk->action = ACTION_CONTINUE;
}
diff --git a/mm/memory.c b/mm/memory.c
index 97aaad9..f69e66d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -675,8 +675,6 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
}
}
- if (pmd_devmap(pmd))
- return NULL;
if (is_huge_zero_pmd(pmd))
return NULL;
if (unlikely(pfn > highest_memmap_pfn))
@@ -1240,8 +1238,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
src_pmd = pmd_offset(src_pud, addr);
do {
next = pmd_addr_end(addr, end);
- if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)
- || pmd_devmap(*src_pmd)) {
+ if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)) {
int err;
VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
@@ -1277,7 +1274,7 @@ copy_pud_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
src_pud = pud_offset(src_p4d, addr);
do {
next = pud_addr_end(addr, end);
- if (pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) {
+ if (pud_trans_huge(*src_pud)) {
int err;
VM_BUG_ON_VMA(next-addr != HPAGE_PUD_SIZE, src_vma);
@@ -1791,7 +1788,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
- if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+ if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE)
__split_huge_pmd(vma, pmd, addr, false);
else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
@@ -1833,7 +1830,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
pud = pud_offset(p4d, addr);
do {
next = pud_addr_end(addr, end);
- if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
+ if (pud_trans_huge(*pud)) {
if (next - addr != HPAGE_PUD_SIZE) {
mmap_assert_locked(tlb->mm);
split_huge_pud(vma, pud, addr);
@@ -6136,7 +6133,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
pud_t orig_pud = *vmf.pud;
barrier();
- if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) {
+ if (pud_trans_huge(orig_pud)) {
/*
* TODO once we support anonymous PUDs: NUMA case and
@@ -6177,7 +6174,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
pmd_migration_entry_wait(mm, vmf.pmd);
return 0;
}
- if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
+ if (pmd_trans_huge(vmf.orig_pmd)) {
if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
return do_huge_pmd_numa_page(&vmf);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 3158afe..e05e14d 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -615,7 +615,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
pmdp = pmd_alloc(mm, pudp, addr);
if (!pmdp)
goto abort;
- if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
+ if (pmd_trans_huge(*pmdp))
goto abort;
if (pte_alloc(mm, pmdp))
goto abort;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 88608d0..00d5989 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -376,7 +376,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
goto next;
_pmd = pmdp_get_lockless(pmd);
- if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
+ if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd)) {
if ((next - addr != HPAGE_PMD_SIZE) ||
pgtable_split_needed(vma, cp_flags)) {
__split_huge_pmd(vma, pmd, addr, false);
diff --git a/mm/mremap.c b/mm/mremap.c
index 7a0657b..6541faa 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1237,8 +1237,6 @@ static bool __relocate_anon_folios(struct pagetable_move_control *pmc, bool undo
/* Otherwise, we split so we can do this with PMDs/PTEs. */
split_huge_pud(pmc->old, pudp, old_addr);
- } else if (pud_devmap(pud)) {
- return false;
}
extent = get_old_extent(NORMAL_PMD, pmc);
@@ -1267,7 +1265,7 @@ static bool __relocate_anon_folios(struct pagetable_move_control *pmc, bool undo
/* Otherwise, we split so we can do this with PTEs. */
split_huge_pmd(pmc->old, pmdp, old_addr);
- } else if (is_swap_pmd(pmd) || pmd_devmap(pmd)) {
+ } else if (is_swap_pmd(pmd)) {
return false;
}
@@ -1333,7 +1331,7 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc)
new_pud = alloc_new_pud(mm, pmc->new_addr);
if (!new_pud)
break;
- if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
+ if (pud_trans_huge(*old_pud)) {
if (extent == HPAGE_PUD_SIZE) {
move_pgt_entry(pmc, HPAGE_PUD, old_pud, new_pud);
/* We ignore and continue on error? */
@@ -1352,8 +1350,7 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc)
if (!new_pmd)
break;
again:
- if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
- pmd_devmap(*old_pmd)) {
+ if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd)) {
if (extent == HPAGE_PMD_SIZE &&
move_pgt_entry(pmc, HPAGE_PMD, old_pmd, new_pmd))
continue;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index e463c3b..e981a1a 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -246,8 +246,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
*/
pmde = pmdp_get_lockless(pvmw->pmd);
- if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
- (pmd_present(pmde) && pmd_devmap(pmde))) {
+ if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
pmde = *pvmw->pmd;
if (!pmd_present(pmde)) {
@@ -262,7 +261,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return not_found(pvmw);
return true;
}
- if (likely(pmd_trans_huge(pmde) || pmd_devmap(pmde))) {
+ if (likely(pmd_trans_huge(pmde))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
if (!check_pmd(pmd_pfn(pmde), pvmw))
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index a214a2b..6480382 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -143,8 +143,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
* We are ONLY installing, so avoid unnecessarily
* splitting a present huge page.
*/
- if (pmd_present(*pmd) &&
- (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
+ if (pmd_present(*pmd) && pmd_trans_huge(*pmd))
continue;
}
@@ -210,8 +209,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
* We are ONLY installing, so avoid unnecessarily
* splitting a present huge page.
*/
- if (pud_present(*pud) &&
- (pud_trans_huge(*pud) || pud_devmap(*pud)))
+ if (pud_present(*pud) && pud_trans_huge(*pud))
continue;
}
@@ -908,7 +906,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
* TODO: FW_MIGRATION support for PUD migration entries
* once there are relevant users.
*/
- if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
+ if (!pud_present(pud) || pud_special(pud)) {
spin_unlock(ptl);
goto not_found;
} else if (!pud_leaf(pud)) {
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 5a882f2..567e2d0 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -139,8 +139,7 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
{
pmd_t pmd;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
- VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
- !pmd_devmap(*pmdp));
+ VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp));
pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return pmd;
@@ -153,7 +152,7 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
pud_t pud;
VM_BUG_ON(address & ~HPAGE_PUD_MASK);
- VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp));
+ VM_BUG_ON(!pud_trans_huge(*pudp));
pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp);
flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
return pud;
@@ -293,7 +292,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
*pmdvalp = pmdval;
if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
goto nomap;
- if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+ if (unlikely(pmd_trans_huge(pmdval)))
goto nomap;
if (unlikely(pmd_bad(pmdval))) {
pmd_clear_bad(pmd);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 8395db2..879505c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -795,8 +795,8 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
* (This includes the case where the PMD used to be THP and
* changed back to none after __pte_alloc().)
*/
- if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
- pmd_devmap(dst_pmdval))) {
+ if (unlikely(!pmd_present(dst_pmdval) ||
+ pmd_trans_huge(dst_pmdval))) {
err = -EEXIST;
break;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 85bf782..b8a2889 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3449,9 +3449,6 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
if (!pmd_present(pmd) || is_huge_zero_pmd(pmd))
return -1;
- if (WARN_ON_ONCE(pmd_devmap(pmd)))
- return -1;
-
if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm))
return -1;
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 08/14] mm/khugepaged: Remove redundant pmd_devmap() check
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (6 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 07/14] mm: Remove redundant pXd_devmap calls Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:44 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 09/14] powerpc: Remove checks for devmap pages and PMDs/PUDs Alistair Popple
` (5 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
The only users of pmd_devmap were device dax and fs dax. The check for
pmd_devmap() in check_pmd_state() is therefore redundant as callers
explicitly check for is_zone_device_page(), so this check can be dropped.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
mm/khugepaged.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 15203ea..d45d08b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -945,8 +945,6 @@ static inline int check_pmd_state(pmd_t *pmd)
return SCAN_PMD_NULL;
if (pmd_trans_huge(pmde))
return SCAN_PMD_MAPPED;
- if (pmd_devmap(pmde))
- return SCAN_PMD_NULL;
if (pmd_bad(pmde))
return SCAN_PMD_NULL;
return SCAN_SUCCEED;
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 08/14] mm/khugepaged: Remove redundant pmd_devmap() check
2025-06-16 11:58 ` [PATCH v2 08/14] mm/khugepaged: Remove redundant pmd_devmap() check Alistair Popple
@ 2025-06-17 9:44 ` David Hildenbrand
2025-06-19 7:56 ` Alistair Popple
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:44 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On 16.06.25 13:58, Alistair Popple wrote:
> The only users of pmd_devmap were device dax and fs dax. The check for
> pmd_devmap() in check_pmd_state() is therefore redundant as callers
> explicitly check for is_zone_device_page(), so this check can be dropped.
>
Looking again, is this true?
If we return "SCAN_SUCCEED", we assume there is a page table there that
we can map and walk.
But I assume we can drop that check because nobody will ever set
pmd_devmap() anymore?
So likely just the description+subject of this patch should be adjusted.
FWIW, I think check_pmd_state() should be changed to work on pmd_leaf()
etc, but that's something for another day.
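Something like the below, perhaps (untested sketch only, with the pmd_none()
handling elided; the return values just reuse the existing SCAN_* ones):
static inline int check_pmd_state(pmd_t *pmd)
{
	pmd_t pmde = pmdp_get_lockless(pmd);
	if (!pmd_present(pmde))
		return SCAN_PMD_NULL;
	if (pmd_leaf(pmde))	/* covers THP and, nowadays, DAX */
		return SCAN_PMD_MAPPED;
	if (pmd_bad(pmde))
		return SCAN_PMD_NULL;
	return SCAN_SUCCEED;
}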
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 08/14] mm/khugepaged: Remove redundant pmd_devmap() check
2025-06-17 9:44 ` David Hildenbrand
@ 2025-06-19 7:56 ` Alistair Popple
0 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-19 7:56 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On Tue, Jun 17, 2025 at 11:44:30AM +0200, David Hildenbrand wrote:
> On 16.06.25 13:58, Alistair Popple wrote:
> > The only users of pmd_devmap were device dax and fs dax. The check for
> > pmd_devmap() in check_pmd_state() is therefore redundant as callers
> > explicitly check for is_zone_device_page(), so this check can be dropped.
> >
>
> Looking again, is this true?
>
> If we return "SCAN_SUCCEED", we assume there is a page table there that we
> can map and walk.
>
> But I assume we can drop that check because nobody will ever set
> pmd_devmap() anymore?
>
> So likely just the description+sibject of this patch should be adjusted.
Ugh. I wish I had documented this better because it's all very "obvious",
but I think the description is still accurate. These devmap checks could never
have been hit and were therefore redundant, because in practice all callers call
thp_vma_allowable_order() either before or after check_pmd_state() (or both).
thp_vma_allowable_order() will eventually (in __thp_vma_allowable_orders())
check for a DAX VMA and bail. But yes, I should call that out in the patch
subject so I don't have to keep relearning this obvious fact :-)
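Roughly, the effective ordering is something like this (a sketch only; the
thp_vma_allowable_order() arguments are simplified and some callers do the
two checks in the other order):
static int example_collapse_checks(struct vm_area_struct *vma, pmd_t *pmd)
{
	if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
		return SCAN_VMA_CHECK;	/* __thp_vma_allowable_orders() refuses DAX VMAs */
	/*
	 * A collapse can only proceed once the VMA is known not to be DAX,
	 * so *pmd can never be a devmap entry here and the dropped
	 * pmd_devmap() check was unreachable.
	 */
	return check_pmd_state(pmd);
}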
> FWIW, I think check_pmd_state() should be changed to work on pmd_leaf() etc,
> but that's something for another day.
Sure, and I do appreciate these kinds of remarks because sometimes I get bored
enough to come back and actually do them. Not very often ... but sometimes.
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 09/14] powerpc: Remove checks for devmap pages and PMDs/PUDs
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (7 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 08/14] mm/khugepaged: Remove redundant pmd_devmap() check Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:37 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 10/14] fs/dax: Remove FS_DAX_LIMITED config option Alistair Popple
` (4 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
PFN_DEV no longer exists. This means no devmap PMDs or PUDs will be
created, so checking for them is redundant. Instead, mappings of pages that
would previously have returned true for pXd_devmap() will return true for
pXd_trans_huge().
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
arch/powerpc/mm/book3s64/hash_hugepage.c | 2 +-
arch/powerpc/mm/book3s64/hash_pgtable.c | 3 +--
arch/powerpc/mm/book3s64/hugetlbpage.c | 2 +-
arch/powerpc/mm/book3s64/pgtable.c | 10 ++++------
arch/powerpc/mm/book3s64/radix_pgtable.c | 5 ++---
arch/powerpc/mm/pgtable.c | 2 +-
6 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_hugepage.c b/arch/powerpc/mm/book3s64/hash_hugepage.c
index 15d6f3e..cdfd4fe 100644
--- a/arch/powerpc/mm/book3s64/hash_hugepage.c
+++ b/arch/powerpc/mm/book3s64/hash_hugepage.c
@@ -54,7 +54,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
/*
* Make sure this is thp or devmap entry
*/
- if (!(old_pmd & (H_PAGE_THP_HUGE | _PAGE_DEVMAP)))
+ if (!(old_pmd & H_PAGE_THP_HUGE))
return 0;
rflags = htab_convert_pte_flags(new_pmd, flags);
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d..82d3117 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -195,7 +195,7 @@ unsigned long hash__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr
unsigned long old;
#ifdef CONFIG_DEBUG_VM
- WARN_ON(!hash__pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+ WARN_ON(!hash__pmd_trans_huge(*pmdp));
assert_spin_locked(pmd_lockptr(mm, pmdp));
#endif
@@ -227,7 +227,6 @@ pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long addres
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
VM_BUG_ON(pmd_trans_huge(*pmdp));
- VM_BUG_ON(pmd_devmap(*pmdp));
pmd = *pmdp;
pmd_clear(pmdp);
diff --git a/arch/powerpc/mm/book3s64/hugetlbpage.c b/arch/powerpc/mm/book3s64/hugetlbpage.c
index 83c3361..2bcbbf9 100644
--- a/arch/powerpc/mm/book3s64/hugetlbpage.c
+++ b/arch/powerpc/mm/book3s64/hugetlbpage.c
@@ -74,7 +74,7 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
} while(!pte_xchg(ptep, __pte(old_pte), __pte(new_pte)));
/* Make sure this is a hugetlb entry */
- if (old_pte & (H_PAGE_THP_HUGE | _PAGE_DEVMAP))
+ if (old_pte & H_PAGE_THP_HUGE)
return 0;
rflags = htab_convert_pte_flags(new_pte, flags);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 0db01e1..b38cd0b 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -62,7 +62,7 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
{
int changed;
#ifdef CONFIG_DEBUG_VM
- WARN_ON(!pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+ WARN_ON(!pmd_trans_huge(*pmdp));
assert_spin_locked(pmd_lockptr(vma->vm_mm, pmdp));
#endif
changed = !pmd_same(*(pmdp), entry);
@@ -82,7 +82,6 @@ int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
{
int changed;
#ifdef CONFIG_DEBUG_VM
- WARN_ON(!pud_devmap(*pudp));
assert_spin_locked(pud_lockptr(vma->vm_mm, pudp));
#endif
changed = !pud_same(*(pudp), entry);
@@ -204,8 +203,8 @@ pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
{
pmd_t pmd;
VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
- VM_BUG_ON((pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
- !pmd_devmap(*pmdp)) || !pmd_present(*pmdp));
+ VM_BUG_ON((pmd_present(*pmdp) && !pmd_trans_huge(*pmdp)) ||
+ !pmd_present(*pmdp));
pmd = pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp);
/*
* if it not a fullmm flush, then we can possibly end up converting
@@ -223,8 +222,7 @@ pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma,
pud_t pud;
VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
- VM_BUG_ON((pud_present(*pudp) && !pud_devmap(*pudp)) ||
- !pud_present(*pudp));
+ VM_BUG_ON(!pud_present(*pudp));
pud = pudp_huge_get_and_clear(vma->vm_mm, addr, pudp);
/*
* if it not a fullmm flush, then we can possibly end up converting
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 9f764bc..877870d 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1426,7 +1426,7 @@ unsigned long radix__pmd_hugepage_update(struct mm_struct *mm, unsigned long add
unsigned long old;
#ifdef CONFIG_DEBUG_VM
- WARN_ON(!radix__pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+ WARN_ON(!radix__pmd_trans_huge(*pmdp));
assert_spin_locked(pmd_lockptr(mm, pmdp));
#endif
@@ -1443,7 +1443,7 @@ unsigned long radix__pud_hugepage_update(struct mm_struct *mm, unsigned long add
unsigned long old;
#ifdef CONFIG_DEBUG_VM
- WARN_ON(!pud_devmap(*pudp));
+ WARN_ON(!pud_trans_huge(*pudp));
assert_spin_locked(pud_lockptr(mm, pudp));
#endif
@@ -1461,7 +1461,6 @@ pmd_t radix__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long addre
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
VM_BUG_ON(radix__pmd_trans_huge(*pmdp));
- VM_BUG_ON(pmd_devmap(*pmdp));
/*
* khugepaged calls this for normal pmd
*/
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 61df5ae..dfaa9fd 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -509,7 +509,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
return NULL;
#endif
- if (pmd_trans_huge(pmd) || pmd_devmap(pmd)) {
+ if (pmd_trans_huge(pmd)) {
if (is_thp)
*is_thp = true;
ret_pte = (pte_t *)pmdp;
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 09/14] powerpc: Remove checks for devmap pages and PMDs/PUDs
2025-06-16 11:58 ` [PATCH v2 09/14] powerpc: Remove checks for devmap pages and PMDs/PUDs Alistair Popple
@ 2025-06-17 9:37 ` David Hildenbrand
0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:37 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On 16.06.25 13:58, Alistair Popple wrote:
> PFN_DEV no longer exists. This means no devmap PMDs or PUDs will be
> created, so checking for them is redundant. Instead mappings of pages that
> would have previously returned true for pXd_devmap() will return true for
> pXd_trans_huge()
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 10/14] fs/dax: Remove FS_DAX_LIMITED config option
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (8 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 09/14] powerpc: Remove checks for devmap pages and PMDs/PUDs Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:37 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 11/14] mm: Remove devmap related functions and page table bits Alistair Popple
` (3 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski
The dcssblk driver was the last user of FS_DAX_LIMITED. That was marked
broken by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED
support") to allow removal of PFN_SPECIAL. However the FS_DAX_LIMITED
config option itself was not removed, so do that now.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
Changes since v1:
- New for v2.
---
fs/Kconfig | 9 +--------
fs/dax.c | 12 ------------
mm/memremap.c | 4 ----
3 files changed, 1 insertion(+), 24 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 44b6cdd..ccdf371 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -59,7 +59,7 @@ endif # BLOCK
config FS_DAX
bool "File system based Direct Access (DAX) support"
depends on MMU
- depends on ZONE_DEVICE || FS_DAX_LIMITED
+ depends on ZONE_DEVICE
select FS_IOMAP
select DAX
help
@@ -95,13 +95,6 @@ config FS_DAX_PMD
depends on ZONE_DEVICE
depends on TRANSPARENT_HUGEPAGE
-# Selected by DAX drivers that do not expect filesystem DAX to support
-# get_user_pages() of DAX mappings. I.e. "limited" indicates no support
-# for fork() of processes with MAP_SHARED mappings or support for
-# direct-I/O to a DAX mapping.
-config FS_DAX_LIMITED
- bool
-
# Posix ACL utility routines
#
# Note: Posix ACLs can be implemented without these helpers. Never use
diff --git a/fs/dax.c b/fs/dax.c
index 7d4ecb9..f4ffb69 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -449,9 +449,6 @@ static void dax_associate_entry(void *entry, struct address_space *mapping,
if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry))
return;
- if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
- return;
-
index = linear_page_index(vma, address & ~(size - 1));
if (shared && (folio->mapping || dax_folio_is_shared(folio))) {
if (folio->mapping)
@@ -474,9 +471,6 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
{
struct folio *folio = dax_to_folio(entry);
- if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
- return;
-
if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry))
return;
@@ -768,12 +762,6 @@ struct page *dax_layout_busy_page_range(struct address_space *mapping,
pgoff_t end_idx;
XA_STATE(xas, &mapping->i_pages, start_idx);
- /*
- * In the 'limited' case get_user_pages() for dax is disabled.
- */
- if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
- return NULL;
-
if (!dax_mapping(mapping))
return NULL;
diff --git a/mm/memremap.c b/mm/memremap.c
index c417c84..c17e0a6 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -332,10 +332,6 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
}
break;
case MEMORY_DEVICE_FS_DAX:
- if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) {
- WARN(1, "File system DAX not supported\n");
- return ERR_PTR(-EINVAL);
- }
params.pgprot = pgprot_decrypted(params.pgprot);
break;
case MEMORY_DEVICE_GENERIC:
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 10/14] fs/dax: Remove FS_DAX_LIMITED config option
2025-06-16 11:58 ` [PATCH v2 10/14] fs/dax: Remove FS_DAX_LIMITED config option Alistair Popple
@ 2025-06-17 9:37 ` David Hildenbrand
0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:37 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski
On 16.06.25 13:58, Alistair Popple wrote:
> The dcssblk driver was the last user of FS_DAX_LIMITED. That was marked
> broken by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED
> support") to allow removal of PFN_SPECIAL. However the FS_DAX_LIMITED
> config option itself was not removed, so do that now.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 11/14] mm: Remove devmap related functions and page table bits
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (9 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 10/14] fs/dax: Remove FS_DAX_LIMITED config option Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:36 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 12/14] mm: Remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST Alistair Popple
` (2 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Will Deacon, Björn Töpel, Jason Gunthorpe
Now that DAX and all other reference counts to ZONE_DEVICE pages are
managed normally there is no need for the special devmap PTE/PMD/PUD
page table bits. So drop all references to these, freeing up a
software defined page table bit on architectures supporting it.
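For example, on arm64 the bit that becomes available again is software bit 57,
per the pgtable-prot.h hunk below:
#define PTE_DIRTY	(_AT(pteval_t, 1) << 55)
#define PTE_SPECIAL	(_AT(pteval_t, 1) << 56)
/* bit 57 (previously PTE_DEVMAP) is free again for software use */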
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Will Deacon <will@kernel.org> # arm64
Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
Documentation/mm/arch_pgtable_helpers.rst | 6 +--
arch/arm64/Kconfig | 1 +-
arch/arm64/include/asm/pgtable-prot.h | 1 +-
arch/arm64/include/asm/pgtable.h | 24 +--------
arch/loongarch/Kconfig | 1 +-
arch/loongarch/include/asm/pgtable-bits.h | 6 +--
arch/loongarch/include/asm/pgtable.h | 19 +------
arch/powerpc/Kconfig | 1 +-
arch/powerpc/include/asm/book3s/64/hash-4k.h | 6 +--
arch/powerpc/include/asm/book3s/64/hash-64k.h | 7 +--
arch/powerpc/include/asm/book3s/64/pgtable.h | 53 +------------------
arch/powerpc/include/asm/book3s/64/radix.h | 14 +-----
arch/riscv/Kconfig | 1 +-
arch/riscv/include/asm/pgtable-64.h | 16 +-----
arch/riscv/include/asm/pgtable-bits.h | 1 +-
arch/riscv/include/asm/pgtable.h | 17 +------
arch/x86/Kconfig | 1 +-
arch/x86/include/asm/pgtable.h | 51 +-----------------
arch/x86/include/asm/pgtable_types.h | 5 +--
include/linux/mm.h | 7 +--
include/linux/pfn_t.h | 20 +-------
include/linux/pgtable.h | 19 +------
mm/Kconfig | 4 +-
mm/debug_vm_pgtable.c | 59 +--------------------
mm/hmm.c | 3 +-
mm/madvise.c | 8 +--
26 files changed, 17 insertions(+), 334 deletions(-)
diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst
index af24516..c88c7fa 100644
--- a/Documentation/mm/arch_pgtable_helpers.rst
+++ b/Documentation/mm/arch_pgtable_helpers.rst
@@ -30,8 +30,6 @@ PTE Page Table Helpers
+---------------------------+--------------------------------------------------+
| pte_protnone | Tests a PROT_NONE PTE |
+---------------------------+--------------------------------------------------+
-| pte_devmap | Tests a ZONE_DEVICE mapped PTE |
-+---------------------------+--------------------------------------------------+
| pte_soft_dirty | Tests a soft dirty PTE |
+---------------------------+--------------------------------------------------+
| pte_swp_soft_dirty | Tests a soft dirty swapped PTE |
@@ -104,8 +102,6 @@ PMD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pmd_protnone | Tests a PROT_NONE PMD |
+---------------------------+--------------------------------------------------+
-| pmd_devmap | Tests a ZONE_DEVICE mapped PMD |
-+---------------------------+--------------------------------------------------+
| pmd_soft_dirty | Tests a soft dirty PMD |
+---------------------------+--------------------------------------------------+
| pmd_swp_soft_dirty | Tests a soft dirty swapped PMD |
@@ -177,8 +173,6 @@ PUD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pud_write | Tests a writable PUD |
+---------------------------+--------------------------------------------------+
-| pud_devmap | Tests a ZONE_DEVICE mapped PUD |
-+---------------------------+--------------------------------------------------+
| pud_mkyoung | Creates a young PUD |
+---------------------------+--------------------------------------------------+
| pud_mkold | Creates an old PUD |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 55fc331..94b48b1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -44,7 +44,6 @@ config ARM64
select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
select ARCH_HAS_PREEMPT_LAZY
select ARCH_HAS_PTDUMP
- select ARCH_HAS_PTE_DEVMAP
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_HW_PTE_YOUNG
select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 7830d03..85dceb1 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -17,7 +17,6 @@
#define PTE_SWP_EXCLUSIVE (_AT(pteval_t, 1) << 2) /* only for swp ptes */
#define PTE_DIRTY (_AT(pteval_t, 1) << 55)
#define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
-#define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
/*
* PTE_PRESENT_INVALID=1 & PTE_VALID=0 indicates that the pte's fields should be
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e511f90..ba63c87 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -190,7 +190,6 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
#define pte_user(pte) (!!(pte_val(pte) & PTE_USER))
#define pte_user_exec(pte) (!(pte_val(pte) & PTE_UXN))
#define pte_cont(pte) (!!(pte_val(pte) & PTE_CONT))
-#define pte_devmap(pte) (!!(pte_val(pte) & PTE_DEVMAP))
#define pte_tagged(pte) ((pte_val(pte) & PTE_ATTRINDX_MASK) == \
PTE_ATTRINDX(MT_NORMAL_TAGGED))
@@ -372,11 +371,6 @@ static inline pmd_t pmd_mkcont(pmd_t pmd)
return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
}
-static inline pte_t pte_mkdevmap(pte_t pte)
-{
- return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
-}
-
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
static inline int pte_uffd_wp(pte_t pte)
{
@@ -653,14 +647,6 @@ static inline pmd_t pmd_mkhuge(pmd_t pmd)
return __pmd((pmd_val(pmd) & ~mask) | val);
}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define pmd_devmap(pmd) pte_devmap(pmd_pte(pmd))
-#endif
-static inline pmd_t pmd_mkdevmap(pmd_t pmd)
-{
- return pte_pmd(set_pte_bit(pmd_pte(pmd), __pgprot(PTE_DEVMAP)));
-}
-
#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
#define pmd_special(pte) (!!((pmd_val(pte) & PTE_SPECIAL)))
static inline pmd_t pmd_mkspecial(pmd_t pmd)
@@ -1302,16 +1288,6 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
return __ptep_set_access_flags(vma, address, (pte_t *)pmdp,
pmd_pte(entry), dirty);
}
-
-static inline int pud_devmap(pud_t pud)
-{
- return 0;
-}
-
-static inline int pgd_devmap(pgd_t pgd)
-{
- return 0;
-}
#endif
#ifdef CONFIG_PAGE_TABLE_CHECK
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 4b19f93..edb3db2 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -25,7 +25,6 @@ config LOONGARCH
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PREEMPT_LAZY
- select ARCH_HAS_PTE_DEVMAP
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_SET_DIRECT_MAP
diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h
index 45bfc65..c8777a9 100644
--- a/arch/loongarch/include/asm/pgtable-bits.h
+++ b/arch/loongarch/include/asm/pgtable-bits.h
@@ -22,7 +22,6 @@
#define _PAGE_PFN_SHIFT 12
#define _PAGE_SWP_EXCLUSIVE_SHIFT 23
#define _PAGE_PFN_END_SHIFT 48
-#define _PAGE_DEVMAP_SHIFT 59
#define _PAGE_PRESENT_INVALID_SHIFT 60
#define _PAGE_NO_READ_SHIFT 61
#define _PAGE_NO_EXEC_SHIFT 62
@@ -36,7 +35,6 @@
#define _PAGE_MODIFIED (_ULCAST_(1) << _PAGE_MODIFIED_SHIFT)
#define _PAGE_PROTNONE (_ULCAST_(1) << _PAGE_PROTNONE_SHIFT)
#define _PAGE_SPECIAL (_ULCAST_(1) << _PAGE_SPECIAL_SHIFT)
-#define _PAGE_DEVMAP (_ULCAST_(1) << _PAGE_DEVMAP_SHIFT)
/* We borrow bit 23 to store the exclusive marker in swap PTEs. */
#define _PAGE_SWP_EXCLUSIVE (_ULCAST_(1) << _PAGE_SWP_EXCLUSIVE_SHIFT)
@@ -76,8 +74,8 @@
#define __READABLE (_PAGE_VALID)
#define __WRITEABLE (_PAGE_DIRTY | _PAGE_WRITE)
-#define _PAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PAGE_DEVMAP | _PFN_MASK | _CACHE_MASK | _PAGE_PLV)
-#define _HPAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PAGE_DEVMAP | _PFN_MASK | _CACHE_MASK | _PAGE_PLV | _PAGE_HUGE)
+#define _PAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PFN_MASK | _CACHE_MASK | _PAGE_PLV)
+#define _HPAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PFN_MASK | _CACHE_MASK | _PAGE_PLV | _PAGE_HUGE)
#define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_NO_READ | \
_PAGE_USER | _CACHE_CC)
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index b301853..40e6e0b 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -409,9 +409,6 @@ static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL;
static inline pte_t pte_mkspecial(pte_t pte) { pte_val(pte) |= _PAGE_SPECIAL; return pte; }
#endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
-static inline int pte_devmap(pte_t pte) { return !!(pte_val(pte) & _PAGE_DEVMAP); }
-static inline pte_t pte_mkdevmap(pte_t pte) { pte_val(pte) |= _PAGE_DEVMAP; return pte; }
-
#define pte_accessible pte_accessible
static inline unsigned long pte_accessible(struct mm_struct *mm, pte_t a)
{
@@ -540,17 +537,6 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
return pmd;
}
-static inline int pmd_devmap(pmd_t pmd)
-{
- return !!(pmd_val(pmd) & _PAGE_DEVMAP);
-}
-
-static inline pmd_t pmd_mkdevmap(pmd_t pmd)
-{
- pmd_val(pmd) |= _PAGE_DEVMAP;
- return pmd;
-}
-
static inline struct page *pmd_page(pmd_t pmd)
{
if (pmd_trans_huge(pmd))
@@ -606,11 +592,6 @@ static inline long pmd_protnone(pmd_t pmd)
#define pmd_leaf(pmd) ((pmd_val(pmd) & _PAGE_HUGE) != 0)
#define pud_leaf(pud) ((pud_val(pud) & _PAGE_HUGE) != 0)
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define pud_devmap(pud) (0)
-#define pgd_devmap(pgd) (0)
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
/*
* We provide our own get_unmapped area to cope with the virtual aliasing
* constraints placed on us by the cache architecture.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c3e0cc8..7a555c1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -149,7 +149,6 @@ config PPC
select ARCH_HAS_PMEM_API
select ARCH_HAS_PREEMPT_LAZY
select ARCH_HAS_PTDUMP
- select ARCH_HAS_PTE_DEVMAP if PPC_BOOK3S_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
select ARCH_HAS_SET_MEMORY
diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index aa90a04..7132392 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -168,12 +168,6 @@ extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
extern int hash__has_transparent_hugepage(void);
#endif
-static inline pmd_t hash__pmd_mkdevmap(pmd_t pmd)
-{
- BUG();
- return pmd;
-}
-
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_64_HASH_4K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0bf6fd0..0fb5b7d 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -259,7 +259,7 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
*/
static inline int hash__pmd_trans_huge(pmd_t pmd)
{
- return !!((pmd_val(pmd) & (_PAGE_PTE | H_PAGE_THP_HUGE | _PAGE_DEVMAP)) ==
+ return !!((pmd_val(pmd) & (_PAGE_PTE | H_PAGE_THP_HUGE)) ==
(_PAGE_PTE | H_PAGE_THP_HUGE));
}
@@ -281,11 +281,6 @@ extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm,
extern int hash__has_transparent_hugepage(void);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline pmd_t hash__pmd_mkdevmap(pmd_t pmd)
-{
- return __pmd(pmd_val(pmd) | (_PAGE_PTE | H_PAGE_THP_HUGE | _PAGE_DEVMAP));
-}
-
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_64_HASH_64K_H */
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index a2ddcbb..c198003 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -88,7 +88,6 @@
#define _PAGE_SOFT_DIRTY _RPAGE_SW3 /* software: software dirty tracking */
#define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
-#define _PAGE_DEVMAP _RPAGE_SW1 /* software: ZONE_DEVICE page */
/*
* Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE
@@ -109,7 +108,7 @@
*/
#define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
_PAGE_ACCESSED | H_PAGE_THP_HUGE | _PAGE_PTE | \
- _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
+ _PAGE_SOFT_DIRTY)
/*
* user access blocked by key
*/
@@ -123,7 +122,7 @@
*/
#define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \
_PAGE_ACCESSED | _PAGE_SPECIAL | _PAGE_PTE | \
- _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
+ _PAGE_SOFT_DIRTY)
/*
* We define 2 sets of base prot bits, one for basic pages (ie,
@@ -609,24 +608,6 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
}
-static inline pte_t pte_mkdevmap(pte_t pte)
-{
- return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SPECIAL | _PAGE_DEVMAP));
-}
-
-/*
- * This is potentially called with a pmd as the argument, in which case it's not
- * safe to check _PAGE_DEVMAP unless we also confirm that _PAGE_PTE is set.
- * That's because the bit we use for _PAGE_DEVMAP is not reserved for software
- * use in page directory entries (ie. non-ptes).
- */
-static inline int pte_devmap(pte_t pte)
-{
- __be64 mask = cpu_to_be64(_PAGE_DEVMAP | _PAGE_PTE);
-
- return (pte_raw(pte) & mask) == mask;
-}
-
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
/* FIXME!! check whether this need to be a conditional */
@@ -1379,36 +1360,6 @@ static inline bool arch_needs_pgtable_deposit(void)
}
extern void serialize_against_pte_lookup(struct mm_struct *mm);
-
-static inline pmd_t pmd_mkdevmap(pmd_t pmd)
-{
- if (radix_enabled())
- return radix__pmd_mkdevmap(pmd);
- return hash__pmd_mkdevmap(pmd);
-}
-
-static inline pud_t pud_mkdevmap(pud_t pud)
-{
- if (radix_enabled())
- return radix__pud_mkdevmap(pud);
- BUG();
- return pud;
-}
-
-static inline int pmd_devmap(pmd_t pmd)
-{
- return pte_devmap(pmd_pte(pmd));
-}
-
-static inline int pud_devmap(pud_t pud)
-{
- return pte_devmap(pud_pte(pud));
-}
-
-static inline int pgd_devmap(pgd_t pgd)
-{
- return 0;
-}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 8f55ff7..df23a82 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -264,7 +264,7 @@ static inline int radix__p4d_bad(p4d_t p4d)
static inline int radix__pmd_trans_huge(pmd_t pmd)
{
- return (pmd_val(pmd) & (_PAGE_PTE | _PAGE_DEVMAP)) == _PAGE_PTE;
+ return (pmd_val(pmd) & _PAGE_PTE) == _PAGE_PTE;
}
static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
@@ -274,7 +274,7 @@ static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
static inline int radix__pud_trans_huge(pud_t pud)
{
- return (pud_val(pud) & (_PAGE_PTE | _PAGE_DEVMAP)) == _PAGE_PTE;
+ return (pud_val(pud) & _PAGE_PTE) == _PAGE_PTE;
}
static inline pud_t radix__pud_mkhuge(pud_t pud)
@@ -315,16 +315,6 @@ static inline int radix__has_transparent_pud_hugepage(void)
}
#endif
-static inline pmd_t radix__pmd_mkdevmap(pmd_t pmd)
-{
- return __pmd(pmd_val(pmd) | (_PAGE_PTE | _PAGE_DEVMAP));
-}
-
-static inline pud_t radix__pud_mkdevmap(pud_t pud)
-{
- return __pud(pud_val(pud) | (_PAGE_PTE | _PAGE_DEVMAP));
-}
-
struct vmem_altmap;
struct dev_pagemap;
extern int __meminit radix__vmemmap_create_mapping(unsigned long start,
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 36061f4..15f5172 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -46,7 +46,6 @@ config RISCV
select ARCH_HAS_PREEMPT_LAZY
select ARCH_HAS_PREPARE_SYNC_CORE_CMD
select ARCH_HAS_PTDUMP if MMU
- select ARCH_HAS_PTE_DEVMAP if 64BIT && MMU
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP if MMU
select ARCH_HAS_SET_MEMORY if MMU
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 7de05db..1018d22 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -397,24 +397,8 @@ static inline struct page *pgd_page(pgd_t pgd)
p4d_t *p4d_offset(pgd_t *pgd, unsigned long address);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static inline int pte_devmap(pte_t pte);
static inline pte_t pmd_pte(pmd_t pmd);
static inline pte_t pud_pte(pud_t pud);
-
-static inline int pmd_devmap(pmd_t pmd)
-{
- return pte_devmap(pmd_pte(pmd));
-}
-
-static inline int pud_devmap(pud_t pud)
-{
- return pte_devmap(pud_pte(pud));
-}
-
-static inline int pgd_devmap(pgd_t pgd)
-{
- return 0;
-}
#endif
#endif /* _ASM_RISCV_PGTABLE_64_H */
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index a8f5205..179bd4a 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -19,7 +19,6 @@
#define _PAGE_SOFT (3 << 8) /* Reserved for software */
#define _PAGE_SPECIAL (1 << 8) /* RSW: 0x1 */
-#define _PAGE_DEVMAP (1 << 9) /* RSW, devmap */
#define _PAGE_TABLE _PAGE_PRESENT
/*
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 438ce7d..f0a8c95 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -409,13 +409,6 @@ static inline int pte_special(pte_t pte)
return pte_val(pte) & _PAGE_SPECIAL;
}
-#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
-static inline int pte_devmap(pte_t pte)
-{
- return pte_val(pte) & _PAGE_DEVMAP;
-}
-#endif
-
/* static inline pte_t pte_rdprotect(pte_t pte) */
static inline pte_t pte_wrprotect(pte_t pte)
@@ -457,11 +450,6 @@ static inline pte_t pte_mkspecial(pte_t pte)
return __pte(pte_val(pte) | _PAGE_SPECIAL);
}
-static inline pte_t pte_mkdevmap(pte_t pte)
-{
- return __pte(pte_val(pte) | _PAGE_DEVMAP);
-}
-
static inline pte_t pte_mkhuge(pte_t pte)
{
return pte;
@@ -790,11 +778,6 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
return pte_pmd(pte_mkdirty(pmd_pte(pmd)));
}
-static inline pmd_t pmd_mkdevmap(pmd_t pmd)
-{
- return pte_pmd(pte_mkdevmap(pmd_pte(pmd)));
-}
-
#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
static inline bool pmd_special(pmd_t pmd)
{
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 340e546..bdf8816 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -101,7 +101,6 @@ config X86
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API if X86_64
select ARCH_HAS_PREEMPT_LAZY
- select ARCH_HAS_PTE_DEVMAP if X86_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_HW_PTE_YOUNG
select ARCH_HAS_NONLEAF_PMD_YOUNG if PGTABLE_LEVELS > 2
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 97954c9..e33df3d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -301,16 +301,15 @@ static inline bool pmd_leaf(pmd_t pte)
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/* NOTE: when predicate huge page, consider also pmd_devmap, or use pmd_leaf */
static inline int pmd_trans_huge(pmd_t pmd)
{
- return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
+ return (pmd_val(pmd) & _PAGE_PSE) == _PAGE_PSE;
}
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static inline int pud_trans_huge(pud_t pud)
{
- return (pud_val(pud) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
+ return (pud_val(pud) & _PAGE_PSE) == _PAGE_PSE;
}
#endif
@@ -320,24 +319,6 @@ static inline int has_transparent_hugepage(void)
return boot_cpu_has(X86_FEATURE_PSE);
}
-#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
-static inline int pmd_devmap(pmd_t pmd)
-{
- return !!(pmd_val(pmd) & _PAGE_DEVMAP);
-}
-
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-static inline int pud_devmap(pud_t pud)
-{
- return !!(pud_val(pud) & _PAGE_DEVMAP);
-}
-#else
-static inline int pud_devmap(pud_t pud)
-{
- return 0;
-}
-#endif
-
#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
static inline bool pmd_special(pmd_t pmd)
{
@@ -361,12 +342,6 @@ static inline pud_t pud_mkspecial(pud_t pud)
return pud_set_flags(pud, _PAGE_SPECIAL);
}
#endif /* CONFIG_ARCH_SUPPORTS_PUD_PFNMAP */
-
-static inline int pgd_devmap(pgd_t pgd)
-{
- return 0;
-}
-#endif
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
@@ -527,11 +502,6 @@ static inline pte_t pte_mkspecial(pte_t pte)
return pte_set_flags(pte, _PAGE_SPECIAL);
}
-static inline pte_t pte_mkdevmap(pte_t pte)
-{
- return pte_set_flags(pte, _PAGE_SPECIAL|_PAGE_DEVMAP);
-}
-
/* See comments above mksaveddirty_shift() */
static inline pmd_t pmd_mksaveddirty(pmd_t pmd)
{
@@ -603,11 +573,6 @@ static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
return pmd_set_flags(pmd, _PAGE_DIRTY);
}
-static inline pmd_t pmd_mkdevmap(pmd_t pmd)
-{
- return pmd_set_flags(pmd, _PAGE_DEVMAP);
-}
-
static inline pmd_t pmd_mkhuge(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_PSE);
@@ -673,11 +638,6 @@ static inline pud_t pud_mkdirty(pud_t pud)
return pud_mksaveddirty(pud);
}
-static inline pud_t pud_mkdevmap(pud_t pud)
-{
- return pud_set_flags(pud, _PAGE_DEVMAP);
-}
-
static inline pud_t pud_mkhuge(pud_t pud)
{
return pud_set_flags(pud, _PAGE_PSE);
@@ -1008,13 +968,6 @@ static inline int pte_present(pte_t a)
return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
}
-#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
-static inline int pte_devmap(pte_t a)
-{
- return (pte_flags(a) & _PAGE_DEVMAP) == _PAGE_DEVMAP;
-}
-#endif
-
#define pte_accessible pte_accessible
static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
{
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b74ec5c..f63ae8d 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -34,7 +34,6 @@
#define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */
#define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */
#define _PAGE_BIT_KERNEL_4K _PAGE_BIT_SOFTW3 /* page must not be converted to large */
-#define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
#ifdef CONFIG_X86_64
#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */
@@ -121,11 +120,9 @@
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
#define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
-#define _PAGE_DEVMAP (_AT(u64, 1) << _PAGE_BIT_DEVMAP)
#define _PAGE_SOFTW4 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW4)
#else
#define _PAGE_NX (_AT(pteval_t, 0))
-#define _PAGE_DEVMAP (_AT(pteval_t, 0))
#define _PAGE_SOFTW4 (_AT(pteval_t, 0))
#endif
@@ -154,7 +151,7 @@
#define _COMMON_PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
_PAGE_SPECIAL | _PAGE_ACCESSED | \
_PAGE_DIRTY_BITS | _PAGE_SOFT_DIRTY | \
- _PAGE_DEVMAP | _PAGE_CC | _PAGE_UFFD_WP)
+ _PAGE_CC | _PAGE_UFFD_WP)
#define _PAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PAT)
#define _HPAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PSE | _PAGE_PAT_LARGE)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 912b6d4..7c2f760 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2709,13 +2709,6 @@ static inline pud_t pud_mkspecial(pud_t pud)
}
#endif /* CONFIG_ARCH_SUPPORTS_PUD_PFNMAP */
-#ifndef CONFIG_ARCH_HAS_PTE_DEVMAP
-static inline int pte_devmap(pte_t pte)
-{
- return 0;
-}
-#endif
-
extern pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr,
spinlock_t **ptl);
static inline pte_t *get_locked_pte(struct mm_struct *mm, unsigned long addr,
diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
index 2b0d6e0..2869095 100644
--- a/include/linux/pfn_t.h
+++ b/include/linux/pfn_t.h
@@ -94,26 +94,6 @@ static inline pud_t pfn_t_pud(pfn_t pfn, pgprot_t pgprot)
#endif
#endif
-#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
-static inline bool pfn_t_devmap(pfn_t pfn)
-{
- const u64 flags = PFN_MAP;
-
- return (pfn.val & flags) == flags;
-}
-#else
-static inline bool pfn_t_devmap(pfn_t pfn)
-{
- return false;
-}
-pte_t pte_mkdevmap(pte_t pte);
-pmd_t pmd_mkdevmap(pmd_t pmd);
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
- defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
-pud_t pud_mkdevmap(pud_t pud);
-#endif
-#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */
-
#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
static inline bool pfn_t_special(pfn_t pfn)
{
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 0298e55..1d44394 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1643,21 +1643,6 @@ static inline int pud_write(pud_t pud)
}
#endif /* pud_write */
-#if !defined(CONFIG_ARCH_HAS_PTE_DEVMAP) || !defined(CONFIG_TRANSPARENT_HUGEPAGE)
-static inline int pmd_devmap(pmd_t pmd)
-{
- return 0;
-}
-static inline int pud_devmap(pud_t pud)
-{
- return 0;
-}
-static inline int pgd_devmap(pgd_t pgd)
-{
- return 0;
-}
-#endif
-
#if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
!defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
static inline int pud_trans_huge(pud_t pud)
@@ -1912,8 +1897,8 @@ typedef unsigned int pgtbl_mod_mask;
* - It should contain a huge PFN, which points to a huge page larger than
* PAGE_SIZE of the platform. The PFN format isn't important here.
*
- * - It should cover all kinds of huge mappings (e.g., pXd_trans_huge(),
- * pXd_devmap(), or hugetlb mappings).
+ * - It should cover all kinds of huge mappings (i.e. pXd_trans_huge()
+ * or hugetlb mappings).
*/
#ifndef pgd_leaf
#define pgd_leaf(x) false
diff --git a/mm/Kconfig b/mm/Kconfig
index 3b2060a..cb6dfea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1110,9 +1110,6 @@ config ARCH_HAS_CURRENT_STACK_POINTER
register alias named "current_stack_pointer", this config can be
selected.
-config ARCH_HAS_PTE_DEVMAP
- bool
-
config ARCH_HAS_ZONE_DMA_SET
bool
@@ -1130,7 +1127,6 @@ config ZONE_DEVICE
depends on MEMORY_HOTPLUG
depends on MEMORY_HOTREMOVE
depends on SPARSEMEM_VMEMMAP
- depends on ARCH_HAS_PTE_DEVMAP
select XARRAY_MULTI
help
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 7731b23..d84d0c4 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -348,12 +348,6 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
vaddr &= HPAGE_PUD_MASK;
pud = pfn_pud(args->pud_pfn, args->page_prot);
- /*
- * Some architectures have debug checks to make sure
- * huge pud mapping are only found with devmap entries
- * For now test with only devmap entries.
- */
- pud = pud_mkdevmap(pud);
set_pud_at(args->mm, vaddr, args->pudp, pud);
flush_dcache_page(page);
pudp_set_wrprotect(args->mm, vaddr, args->pudp);
@@ -366,7 +360,6 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
WARN_ON(!pud_none(pud));
#endif /* __PAGETABLE_PMD_FOLDED */
pud = pfn_pud(args->pud_pfn, args->page_prot);
- pud = pud_mkdevmap(pud);
pud = pud_wrprotect(pud);
pud = pud_mkclean(pud);
set_pud_at(args->mm, vaddr, args->pudp, pud);
@@ -384,7 +377,6 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
#endif /* __PAGETABLE_PMD_FOLDED */
pud = pfn_pud(args->pud_pfn, args->page_prot);
- pud = pud_mkdevmap(pud);
pud = pud_mkyoung(pud);
set_pud_at(args->mm, vaddr, args->pudp, pud);
flush_dcache_page(page);
@@ -693,53 +685,6 @@ static void __init pmd_protnone_tests(struct pgtable_debug_args *args)
static void __init pmd_protnone_tests(struct pgtable_debug_args *args) { }
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
-static void __init pte_devmap_tests(struct pgtable_debug_args *args)
-{
- pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
-
- pr_debug("Validating PTE devmap\n");
- WARN_ON(!pte_devmap(pte_mkdevmap(pte)));
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void __init pmd_devmap_tests(struct pgtable_debug_args *args)
-{
- pmd_t pmd;
-
- if (!has_transparent_hugepage())
- return;
-
- pr_debug("Validating PMD devmap\n");
- pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot);
- WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd)));
-}
-
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-static void __init pud_devmap_tests(struct pgtable_debug_args *args)
-{
- pud_t pud;
-
- if (!has_transparent_pud_hugepage())
- return;
-
- pr_debug("Validating PUD devmap\n");
- pud = pfn_pud(args->fixed_pud_pfn, args->page_prot);
- WARN_ON(!pud_devmap(pud_mkdevmap(pud)));
-}
-#else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
-static void __init pud_devmap_tests(struct pgtable_debug_args *args) { }
-#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static void __init pmd_devmap_tests(struct pgtable_debug_args *args) { }
-static void __init pud_devmap_tests(struct pgtable_debug_args *args) { }
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-#else
-static void __init pte_devmap_tests(struct pgtable_debug_args *args) { }
-static void __init pmd_devmap_tests(struct pgtable_debug_args *args) { }
-static void __init pud_devmap_tests(struct pgtable_debug_args *args) { }
-#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */
-
static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args)
{
pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
@@ -1333,10 +1278,6 @@ static int __init debug_vm_pgtable(void)
pte_protnone_tests(&args);
pmd_protnone_tests(&args);
- pte_devmap_tests(&args);
- pmd_devmap_tests(&args);
- pud_devmap_tests(&args);
-
pte_soft_dirty_tests(&args);
pmd_soft_dirty_tests(&args);
pte_swap_soft_dirty_tests(&args);
diff --git a/mm/hmm.c b/mm/hmm.c
index f1b579f..07d507f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -405,8 +405,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
return 0;
}
-#if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && \
- defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+#if defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
static inline unsigned long pud_to_hmm_pfn_flags(struct hmm_range *range,
pud_t pud)
{
diff --git a/mm/madvise.c b/mm/madvise.c
index 267d8e4..efe5d64 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1069,7 +1069,7 @@ static int guard_install_pud_entry(pud_t *pud, unsigned long addr,
pud_t pudval = pudp_get(pud);
/* If huge return >0 so we abort the operation + zap. */
- return pud_trans_huge(pudval) || pud_devmap(pudval);
+ return pud_trans_huge(pudval);
}
static int guard_install_pmd_entry(pmd_t *pmd, unsigned long addr,
@@ -1078,7 +1078,7 @@ static int guard_install_pmd_entry(pmd_t *pmd, unsigned long addr,
pmd_t pmdval = pmdp_get(pmd);
/* If huge return >0 so we abort the operation + zap. */
- return pmd_trans_huge(pmdval) || pmd_devmap(pmdval);
+ return pmd_trans_huge(pmdval);
}
static int guard_install_pte_entry(pte_t *pte, unsigned long addr,
@@ -1189,7 +1189,7 @@ static int guard_remove_pud_entry(pud_t *pud, unsigned long addr,
pud_t pudval = pudp_get(pud);
/* If huge, cannot have guard pages present, so no-op - skip. */
- if (pud_trans_huge(pudval) || pud_devmap(pudval))
+ if (pud_trans_huge(pudval))
walk->action = ACTION_CONTINUE;
return 0;
@@ -1201,7 +1201,7 @@ static int guard_remove_pmd_entry(pmd_t *pmd, unsigned long addr,
pmd_t pmdval = pmdp_get(pmd);
/* If huge, cannot have guard pages present, so no-op - skip. */
- if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval))
+ if (pmd_trans_huge(pmdval))
walk->action = ACTION_CONTINUE;
return 0;
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 11/14] mm: Remove devmap related functions and page table bits
2025-06-16 11:58 ` [PATCH v2 11/14] mm: Remove devmap related functions and page table bits Alistair Popple
@ 2025-06-17 9:36 ` David Hildenbrand
0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:36 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Will Deacon, Björn Töpel,
Jason Gunthorpe
On 16.06.25 13:58, Alistair Popple wrote:
> Now that DAX and all other reference counts to ZONE_DEVICE pages are
> managed normally there is no need for the special devmap PTE/PMD/PUD
> page table bits. So drop all references to these, freeing up a
> software defined page table bit on architectures supporting it.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Acked-by: Will Deacon <will@kernel.org> # arm64
> Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com>
> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 12/14] mm: Remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (10 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 11/14] mm: Remove devmap related functions and page table bits Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-16 11:58 ` [PATCH v2 13/14] mm: Remove callers of pfn_t functionality Alistair Popple
2025-06-16 11:58 ` [PATCH v2 14/14] mm/memremap: Remove unused devmap_managed_key Alistair Popple
13 siblings, 0 replies; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
The PFN_MAP flag is no longer used for anything, so remove it.
The PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been
used, so remove them as well. The last user of PFN_SPECIAL was removed
by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED
support").
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
Changes since v1:
- Moved this later in series, after PFN_DEV has been removed. This
should solve an issue reported by Marek[1] on RISC-V (and probably
Loongarch and others where pte_devmap() didn't imply pte_special())
[1] - https://lore.kernel.org/linux-mm/957c0d9d-2c37-4d5f-a8b8-8bf90cd0aedb@samsung.com/
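For anyone skimming the removal, here is a stand-alone sketch (ordinary
userspace C, not kernel code) of the encoding these flags used. The bit
positions are copied from the include/linux/pfn_t.h hunk below; a 64-bit
build with 4K pages (PAGE_SHIFT == 12) is assumed. It also shows the old
pfn_t_has_page() test, which this patch reduces to an unconditional true:

/* Stand-alone illustration only -- not kernel code. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_LONG_LONG	64
#define PAGE_SHIFT		12	/* assumption: 4K pages */

/* Flag bits being deleted, as defined in the hunk below. */
#define PFN_SG_CHAIN	(1ULL << (BITS_PER_LONG_LONG - 1))
#define PFN_SG_LAST	(1ULL << (BITS_PER_LONG_LONG - 2))
#define PFN_MAP		(1ULL << (BITS_PER_LONG_LONG - 4))
#define PFN_SPECIAL	(1ULL << (BITS_PER_LONG_LONG - 5))

/* All of the flags live above the pfn bits covered by PFN_FLAGS_MASK. */
#define PFN_FLAGS_MASK	(((uint64_t)0xfffULL) << (BITS_PER_LONG_LONG - PAGE_SHIFT))

typedef struct { uint64_t val; } pfn_t;

/* Old semantics: "backed by a struct page" meant PFN_MAP was set. */
static int old_pfn_t_has_page(pfn_t pfn)
{
	return (pfn.val & PFN_MAP) == PFN_MAP;
}

int main(void)
{
	pfn_t pfn = { .val = 0x1234ULL | PFN_MAP };

	assert((PFN_MAP & PFN_FLAGS_MASK) == PFN_MAP);	/* flag sits inside the mask */
	printf("pfn 0x%llx, old has_page=%d\n",
	       (unsigned long long)(pfn.val & ~PFN_FLAGS_MASK),
	       old_pfn_t_has_page(pfn));
	return 0;
}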
---
include/linux/pfn_t.h | 27 ++-------------------------
mm/memory.c | 2 --
tools/testing/nvdimm/test/iomap.c | 4 ----
3 files changed, 2 insertions(+), 31 deletions(-)
diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
index 2869095..2c00293 100644
--- a/include/linux/pfn_t.h
+++ b/include/linux/pfn_t.h
@@ -5,23 +5,11 @@
/*
* PFN_FLAGS_MASK - mask of all the possible valid pfn_t flags
- * PFN_SG_CHAIN - pfn is a pointer to the next scatterlist entry
- * PFN_SG_LAST - pfn references a page and is the last scatterlist entry
- * PFN_MAP - pfn has a dynamic page mapping established by a device driver
- * PFN_SPECIAL - for CONFIG_FS_DAX_LIMITED builds to allow XIP, but not
- * get_user_pages
*/
#define PFN_FLAGS_MASK (((u64) (~PAGE_MASK)) << (BITS_PER_LONG_LONG - PAGE_SHIFT))
-#define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
-#define PFN_SG_LAST (1ULL << (BITS_PER_LONG_LONG - 2))
-#define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4))
-#define PFN_SPECIAL (1ULL << (BITS_PER_LONG_LONG - 5))
#define PFN_FLAGS_TRACE \
- { PFN_SPECIAL, "SPECIAL" }, \
- { PFN_SG_CHAIN, "SG_CHAIN" }, \
- { PFN_SG_LAST, "SG_LAST" }, \
- { PFN_MAP, "MAP" }
+ { }
static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
{
@@ -43,7 +31,7 @@ static inline pfn_t phys_to_pfn_t(phys_addr_t addr, u64 flags)
static inline bool pfn_t_has_page(pfn_t pfn)
{
- return (pfn.val & PFN_MAP) == PFN_MAP;
+ return true;
}
static inline unsigned long pfn_t_to_pfn(pfn_t pfn)
@@ -94,15 +82,4 @@ static inline pud_t pfn_t_pud(pfn_t pfn, pgprot_t pgprot)
#endif
#endif
-#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-static inline bool pfn_t_special(pfn_t pfn)
-{
- return (pfn.val & PFN_SPECIAL) == PFN_SPECIAL;
-}
-#else
-static inline bool pfn_t_special(pfn_t pfn)
-{
- return false;
-}
-#endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
#endif /* _LINUX_PFN_T_H_ */
diff --git a/mm/memory.c b/mm/memory.c
index f69e66d..f1d81ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2581,8 +2581,6 @@ static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn, bool mkwrite)
/* these checks mirror the abort conditions in vm_normal_page */
if (vma->vm_flags & VM_MIXEDMAP)
return true;
- if (pfn_t_special(pfn))
- return true;
if (is_zero_pfn(pfn_t_to_pfn(pfn)))
return true;
return false;
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index e431372..ddceb04 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -137,10 +137,6 @@ EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
{
- struct nfit_test_resource *nfit_res = get_nfit_res(addr);
-
- if (nfit_res)
- flags &= ~PFN_MAP;
return phys_to_pfn_t(addr, flags);
}
EXPORT_SYMBOL(__wrap_phys_to_pfn_t);
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 13/14] mm: Remove callers of pfn_t functionality
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (11 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 12/14] mm: Remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:35 ` David Hildenbrand
2025-06-16 11:58 ` [PATCH v2 14/14] mm/memremap: Remove unused devmap_managed_key Alistair Popple
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
All PFN_* pfn_t flags have been removed. Therefore there is no longer
a need for the pfn_t type and all uses can be replaced with normal
pfns.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
Changes since v1:
- Rebased on David's cleanup[1]
[1] https://lore.kernel.org/linux-mm/20250611120654.545963-1-david@redhat.com/
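As a quick illustration of what the conversion boils down to (a stand-alone
userspace sketch, not code taken from the patch): with every flag gone,
phys_to_pfn_t(phys, 0) carries exactly the same information as
PHYS_PFN(phys), so the pfn_t wrapper can be replaced by a plain unsigned
long throughout. PAGE_SHIFT of 12 is assumed for the example:

/* Stand-alone illustration only -- not kernel code. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12	/* assumption: 4K pages */

/* The wrapper type this patch deletes: a u64 whose flag bits are now gone. */
typedef struct { uint64_t val; } pfn_t;

static pfn_t phys_to_pfn_t(uint64_t phys, uint64_t flags)
{
	pfn_t pfn = { .val = (phys >> PAGE_SHIFT) | flags };
	return pfn;
}

/* What the callers are converted to use instead. */
#define PHYS_PFN(x)	((unsigned long)((x) >> PAGE_SHIFT))

int main(void)
{
	uint64_t phys = 0x12345000ULL;

	/* With no flags left to pass, the two encodings are identical ... */
	assert(phys_to_pfn_t(phys, 0).val == PHYS_PFN(phys));
	/* ... so a bare unsigned long pfn is all that is needed. */
	printf("pfn = 0x%lx\n", PHYS_PFN(phys));
	return 0;
}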
---
arch/x86/mm/pat/memtype.c | 1 +-
drivers/dax/device.c | 23 +++----
drivers/dax/hmem/hmem.c | 1 +-
drivers/dax/kmem.c | 1 +-
drivers/dax/pmem.c | 1 +-
drivers/dax/super.c | 3 +-
drivers/gpu/drm/exynos/exynos_drm_gem.c | 1 +-
drivers/gpu/drm/gma500/fbdev.c | 3 +-
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 1 +-
drivers/gpu/drm/msm/msm_gem.c | 1 +-
drivers/gpu/drm/omapdrm/omap_gem.c | 6 +--
drivers/gpu/drm/v3d/v3d_bo.c | 1 +-
drivers/hwtracing/intel_th/msu.c | 3 +-
drivers/md/dm-linear.c | 2 +-
drivers/md/dm-log-writes.c | 2 +-
drivers/md/dm-stripe.c | 2 +-
drivers/md/dm-target.c | 2 +-
drivers/md/dm-writecache.c | 11 +--
drivers/md/dm.c | 2 +-
drivers/nvdimm/pmem.c | 8 +--
drivers/nvdimm/pmem.h | 4 +-
drivers/s390/block/dcssblk.c | 9 +--
drivers/vfio/pci/vfio_pci_core.c | 5 +-
fs/cramfs/inode.c | 5 +-
fs/dax.c | 50 +++++++--------
fs/ext4/file.c | 2 +-
fs/fuse/dax.c | 3 +-
fs/fuse/virtio_fs.c | 5 +-
fs/xfs/xfs_file.c | 2 +-
include/linux/dax.h | 9 +--
include/linux/device-mapper.h | 2 +-
include/linux/huge_mm.h | 6 +-
include/linux/mm.h | 4 +-
include/linux/pfn.h | 9 +---
include/linux/pfn_t.h | 85 +-------------------------
mm/debug_vm_pgtable.c | 1 +-
mm/huge_memory.c | 21 +++---
mm/memory.c | 31 ++++-----
mm/memremap.c | 1 +-
mm/migrate.c | 1 +-
tools/testing/nvdimm/pmem-dax.c | 6 +-
tools/testing/nvdimm/test/iomap.c | 7 +--
tools/testing/nvdimm/test/nfit_test.h | 1 +-
43 files changed, 109 insertions(+), 235 deletions(-)
delete mode 100644 include/linux/pfn_t.h
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 2e79238..c092843 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -36,7 +36,6 @@
#include <linux/debugfs.h>
#include <linux/ioport.h>
#include <linux/kernel.h>
-#include <linux/pfn_t.h>
#include <linux/slab.h>
#include <linux/io.h>
#include <linux/mm.h>
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 328231c..2bb40a6 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -4,7 +4,6 @@
#include <linux/pagemap.h>
#include <linux/module.h>
#include <linux/device.h>
-#include <linux/pfn_t.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/dax.h>
@@ -73,7 +72,7 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
return -1;
}
-static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn,
+static void dax_set_mapping(struct vm_fault *vmf, unsigned long pfn,
unsigned long fault_size)
{
unsigned long i, nr_pages = fault_size / PAGE_SIZE;
@@ -89,7 +88,7 @@ static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn,
ALIGN_DOWN(vmf->address, fault_size));
for (i = 0; i < nr_pages; i++) {
- struct folio *folio = pfn_folio(pfn_t_to_pfn(pfn) + i);
+ struct folio *folio = pfn_folio(pfn + i);
if (folio->mapping)
continue;
@@ -104,7 +103,7 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
{
struct device *dev = &dev_dax->dev;
phys_addr_t phys;
- pfn_t pfn;
+ unsigned long pfn;
unsigned int fault_size = PAGE_SIZE;
if (check_vma(dev_dax, vmf->vma, __func__))
@@ -125,11 +124,11 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
return VM_FAULT_SIGBUS;
}
- pfn = phys_to_pfn_t(phys, 0);
+ pfn = PHYS_PFN(phys);
dax_set_mapping(vmf, pfn, fault_size);
- return vmf_insert_page_mkwrite(vmf, pfn_t_to_page(pfn),
+ return vmf_insert_page_mkwrite(vmf, pfn_to_page(pfn),
vmf->flags & FAULT_FLAG_WRITE);
}
@@ -140,7 +139,7 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
struct device *dev = &dev_dax->dev;
phys_addr_t phys;
pgoff_t pgoff;
- pfn_t pfn;
+ unsigned long pfn;
unsigned int fault_size = PMD_SIZE;
if (check_vma(dev_dax, vmf->vma, __func__))
@@ -169,11 +168,11 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
return VM_FAULT_SIGBUS;
}
- pfn = phys_to_pfn_t(phys, 0);
+ pfn = PHYS_PFN(phys);
dax_set_mapping(vmf, pfn, fault_size);
- return vmf_insert_folio_pmd(vmf, page_folio(pfn_t_to_page(pfn)),
+ return vmf_insert_folio_pmd(vmf, page_folio(pfn_to_page(pfn)),
vmf->flags & FAULT_FLAG_WRITE);
}
@@ -185,7 +184,7 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
struct device *dev = &dev_dax->dev;
phys_addr_t phys;
pgoff_t pgoff;
- pfn_t pfn;
+ unsigned long pfn;
unsigned int fault_size = PUD_SIZE;
@@ -215,11 +214,11 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
return VM_FAULT_SIGBUS;
}
- pfn = phys_to_pfn_t(phys, 0);
+ pfn = PHYS_PFN(phys);
dax_set_mapping(vmf, pfn, fault_size);
- return vmf_insert_folio_pud(vmf, page_folio(pfn_t_to_page(pfn)),
+ return vmf_insert_folio_pud(vmf, page_folio(pfn_to_page(pfn)),
vmf->flags & FAULT_FLAG_WRITE);
}
#else
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 5e7c53f..c18451a 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -2,7 +2,6 @@
#include <linux/platform_device.h>
#include <linux/memregion.h>
#include <linux/module.h>
-#include <linux/pfn_t.h>
#include <linux/dax.h>
#include "../bus.h"
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 584c70a..c036e4d 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -5,7 +5,6 @@
#include <linux/memory.h>
#include <linux/module.h>
#include <linux/device.h>
-#include <linux/pfn_t.h>
#include <linux/slab.h>
#include <linux/dax.h>
#include <linux/fs.h>
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index c8ebf4e..bee9306 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -2,7 +2,6 @@
/* Copyright(c) 2016 - 2018 Intel Corporation. All rights reserved. */
#include <linux/memremap.h>
#include <linux/module.h>
-#include <linux/pfn_t.h>
#include "../nvdimm/pfn.h"
#include "../nvdimm/nd.h"
#include "bus.h"
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e16d1d4..54c480e 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -7,7 +7,6 @@
#include <linux/mount.h>
#include <linux/pseudo_fs.h>
#include <linux/magic.h>
-#include <linux/pfn_t.h>
#include <linux/cdev.h>
#include <linux/slab.h>
#include <linux/uio.h>
@@ -148,7 +147,7 @@ enum dax_device_flags {
* pages accessible at the device relative @pgoff.
*/
long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
- enum dax_access_mode mode, void **kaddr, pfn_t *pfn)
+ enum dax_access_mode mode, void **kaddr, unsigned long *pfn)
{
long avail;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index 4787fee..84b2172 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -7,7 +7,6 @@
#include <linux/dma-buf.h>
-#include <linux/pfn_t.h>
#include <linux/shmem_fs.h>
#include <linux/module.h>
diff --git a/drivers/gpu/drm/gma500/fbdev.c b/drivers/gpu/drm/gma500/fbdev.c
index 109efdc..68b825f 100644
--- a/drivers/gpu/drm/gma500/fbdev.c
+++ b/drivers/gpu/drm/gma500/fbdev.c
@@ -6,7 +6,6 @@
**************************************************************************/
#include <linux/fb.h>
-#include <linux/pfn_t.h>
#include <drm/drm_crtc_helper.h>
#include <drm/drm_drv.h>
@@ -33,7 +32,7 @@ static vm_fault_t psb_fbdev_vm_fault(struct vm_fault *vmf)
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
for (i = 0; i < page_num; ++i) {
- err = vmf_insert_mixed(vma, address, __pfn_to_pfn_t(pfn, 0));
+ err = vmf_insert_mixed(vma, address, pfn);
if (unlikely(err & VM_FAULT_ERROR))
break;
address += PAGE_SIZE;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index f6d37df..75f5b0e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -5,7 +5,6 @@
#include <linux/anon_inodes.h>
#include <linux/mman.h>
-#include <linux/pfn_t.h>
#include <linux/sizes.h>
#include <drm/drm_cache.h>
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 2995e80..20bf31f 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -9,7 +9,6 @@
#include <linux/spinlock.h>
#include <linux/shmem_fs.h>
#include <linux/dma-buf.h>
-#include <linux/pfn_t.h>
#include <drm/drm_prime.h>
#include <drm/drm_file.h>
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 9df05b2..381552b 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -8,7 +8,6 @@
#include <linux/seq_file.h>
#include <linux/shmem_fs.h>
#include <linux/spinlock.h>
-#include <linux/pfn_t.h>
#include <linux/vmalloc.h>
#include <drm/drm_prime.h>
@@ -371,7 +370,7 @@ static vm_fault_t omap_gem_fault_1d(struct drm_gem_object *obj,
VERB("Inserting %p pfn %lx, pa %lx", (void *)vmf->address,
pfn, pfn << PAGE_SHIFT);
- return vmf_insert_mixed(vma, vmf->address, __pfn_to_pfn_t(pfn, 0));
+ return vmf_insert_mixed(vma, vmf->address, pfn);
}
/* Special handling for the case of faulting in 2d tiled buffers */
@@ -466,8 +465,7 @@ static vm_fault_t omap_gem_fault_2d(struct drm_gem_object *obj,
pfn, pfn << PAGE_SHIFT);
for (i = n; i > 0; i--) {
- ret = vmf_insert_mixed(vma,
- vaddr, __pfn_to_pfn_t(pfn, 0));
+ ret = vmf_insert_mixed(vma, vaddr, pfn);
if (ret & VM_FAULT_ERROR)
break;
pfn += priv->usergart[fmt].stride_pfn;
diff --git a/drivers/gpu/drm/v3d/v3d_bo.c b/drivers/gpu/drm/v3d/v3d_bo.c
index bb78155..c41476d 100644
--- a/drivers/gpu/drm/v3d/v3d_bo.c
+++ b/drivers/gpu/drm/v3d/v3d_bo.c
@@ -16,7 +16,6 @@
*/
#include <linux/dma-buf.h>
-#include <linux/pfn_t.h>
#include <linux/vmalloc.h>
#include "v3d_drv.h"
diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c
index 7163950..f3a13b3 100644
--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -19,7 +19,6 @@
#include <linux/io.h>
#include <linux/workqueue.h>
#include <linux/dma-mapping.h>
-#include <linux/pfn_t.h>
#ifdef CONFIG_X86
#include <asm/set_memory.h>
@@ -1618,7 +1617,7 @@ static vm_fault_t msc_mmap_fault(struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
get_page(page);
- return vmf_insert_mixed(vmf->vma, vmf->address, page_to_pfn_t(page));
+ return vmf_insert_mixed(vmf->vma, vmf->address, page_to_pfn(page));
}
static const struct vm_operations_struct msc_mmap_ops = {
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 15538ec..73bf290 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -170,7 +170,7 @@ static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index d484e8e..679b07d 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -893,7 +893,7 @@ static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index a7dc04b..366f461 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -316,7 +316,7 @@ static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 652627a..2af5a95 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -255,7 +255,7 @@ static void io_err_io_hints(struct dm_target *ti, struct queue_limits *limits)
static long io_err_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
return -EIO;
}
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index a428e1c..d8de4a3 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -13,7 +13,6 @@
#include <linux/dm-io.h>
#include <linux/dm-kcopyd.h>
#include <linux/dax.h>
-#include <linux/pfn_t.h>
#include <linux/libnvdimm.h>
#include <linux/delay.h>
#include "dm-io-tracker.h"
@@ -256,7 +255,7 @@ static int persistent_memory_claim(struct dm_writecache *wc)
int r;
loff_t s;
long p, da;
- pfn_t pfn;
+ unsigned long pfn;
int id;
struct page **pages;
sector_t offset;
@@ -290,7 +289,7 @@ static int persistent_memory_claim(struct dm_writecache *wc)
r = da;
goto err2;
}
- if (!pfn_t_has_page(pfn)) {
+ if (!pfn_valid(pfn)) {
wc->memory_map = NULL;
r = -EOPNOTSUPP;
goto err2;
@@ -314,13 +313,13 @@ static int persistent_memory_claim(struct dm_writecache *wc)
r = daa ? daa : -EINVAL;
goto err3;
}
- if (!pfn_t_has_page(pfn)) {
+ if (!pfn_valid(pfn)) {
r = -EOPNOTSUPP;
goto err3;
}
while (daa-- && i < p) {
- pages[i++] = pfn_t_to_page(pfn);
- pfn.val++;
+ pages[i++] = pfn_to_page(pfn);
+ pfn++;
if (!(i & 15))
cond_resched();
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1726f0f..4b9415f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1218,7 +1218,7 @@ static struct dm_target *dm_dax_get_live_target(struct mapped_device *md,
static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
struct mapped_device *md = dax_get_private(dax_dev);
sector_t sector = pgoff * PAGE_SECTORS;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index aa50006..05785ff 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -20,7 +20,6 @@
#include <linux/kstrtox.h>
#include <linux/vmalloc.h>
#include <linux/blk-mq.h>
-#include <linux/pfn_t.h>
#include <linux/slab.h>
#include <linux/uio.h>
#include <linux/dax.h>
@@ -242,7 +241,7 @@ static void pmem_submit_bio(struct bio *bio)
/* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
__weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
sector_t sector = PFN_PHYS(pgoff) >> SECTOR_SHIFT;
@@ -254,7 +253,7 @@ __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
if (kaddr)
*kaddr = pmem->virt_addr + offset;
if (pfn)
- *pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+ *pfn = PHYS_PFN(pmem->phys_addr + offset);
if (bb->count &&
badblocks_check(bb, sector, num, &first_bad, &num_bad)) {
@@ -303,7 +302,7 @@ static int pmem_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
static long pmem_dax_direct_access(struct dax_device *dax_dev,
pgoff_t pgoff, long nr_pages, enum dax_access_mode mode,
- void **kaddr, pfn_t *pfn)
+ void **kaddr, unsigned long *pfn)
{
struct pmem_device *pmem = dax_get_private(dax_dev);
@@ -513,7 +512,6 @@ static int pmem_attach_disk(struct device *dev,
pmem->disk = disk;
pmem->pgmap.owner = pmem;
- pmem->pfn_flags = 0;
if (is_nd_pfn(dev)) {
pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
pmem->pgmap.ops = &fsdax_pagemap_ops;
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 392b0b3..a48509f 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -5,7 +5,6 @@
#include <linux/badblocks.h>
#include <linux/memremap.h>
#include <linux/types.h>
-#include <linux/pfn_t.h>
#include <linux/fs.h>
enum dax_access_mode;
@@ -16,7 +15,6 @@ struct pmem_device {
phys_addr_t phys_addr;
/* when non-zero this device is hosting a 'pfn' instance */
phys_addr_t data_offset;
- u64 pfn_flags;
void *virt_addr;
/* immutable base size of the namespace */
size_t size;
@@ -31,7 +29,7 @@ struct pmem_device {
long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn);
+ unsigned long *pfn);
#ifdef CONFIG_MEMORY_FAILURE
static inline bool test_and_clear_pmem_poison(struct page *page)
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 249ae40..94fa5ed 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -17,7 +17,6 @@
#include <linux/blkdev.h>
#include <linux/completion.h>
#include <linux/interrupt.h>
-#include <linux/pfn_t.h>
#include <linux/uio.h>
#include <linux/dax.h>
#include <linux/io.h>
@@ -33,7 +32,7 @@ static void dcssblk_release(struct gendisk *disk);
static void dcssblk_submit_bio(struct bio *bio);
static long dcssblk_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn);
+ unsigned long *pfn);
static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
@@ -914,7 +913,7 @@ dcssblk_submit_bio(struct bio *bio)
static long
__dcssblk_direct_access(struct dcssblk_dev_info *dev_info, pgoff_t pgoff,
- long nr_pages, void **kaddr, pfn_t *pfn)
+ long nr_pages, void **kaddr, unsigned long *pfn)
{
resource_size_t offset = pgoff * PAGE_SIZE;
unsigned long dev_sz;
@@ -923,7 +922,7 @@ __dcssblk_direct_access(struct dcssblk_dev_info *dev_info, pgoff_t pgoff,
if (kaddr)
*kaddr = __va(dev_info->start + offset);
if (pfn)
- *pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), 0);
+ *pfn = PFN_DOWN(dev_info->start + offset);
return (dev_sz - offset) / PAGE_SIZE;
}
@@ -931,7 +930,7 @@ __dcssblk_direct_access(struct dcssblk_dev_info *dev_info, pgoff_t pgoff,
static long
dcssblk_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
struct dcssblk_dev_info *dev_info = dax_get_private(dax_dev);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3f2ad5f..31bdb91 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -20,7 +20,6 @@
#include <linux/mutex.h>
#include <linux/notifier.h>
#include <linux/pci.h>
-#include <linux/pfn_t.h>
#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -1669,12 +1668,12 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
break;
#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
case PMD_ORDER:
- ret = vmf_insert_pfn_pmd(vmf, __pfn_to_pfn_t(pfn, 0), false);
+ ret = vmf_insert_pfn_pmd(vmf, pfn, false);
break;
#endif
#ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP
case PUD_ORDER:
- ret = vmf_insert_pfn_pud(vmf, __pfn_to_pfn_t(pfn, 0), false);
+ ret = vmf_insert_pfn_pud(vmf, pfn, false);
break;
#endif
default:
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 820a664..b002e9b 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -17,7 +17,6 @@
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/pagemap.h>
-#include <linux/pfn_t.h>
#include <linux/ramfs.h>
#include <linux/init.h>
#include <linux/string.h>
@@ -412,8 +411,8 @@ static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
for (i = 0; i < pages && !ret; i++) {
vm_fault_t vmf;
unsigned long off = i * PAGE_SIZE;
- pfn_t pfn = phys_to_pfn_t(address + off, 0);
- vmf = vmf_insert_mixed(vma, vma->vm_start + off, pfn);
+ vmf = vmf_insert_mixed(vma, vma->vm_start + off,
+ address + off);
if (vmf & VM_FAULT_ERROR)
ret = vm_fault_to_errno(vmf, 0);
}
diff --git a/fs/dax.c b/fs/dax.c
index f4ffb69..4229513 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -20,7 +20,6 @@
#include <linux/sched/signal.h>
#include <linux/uio.h>
#include <linux/vmstat.h>
-#include <linux/pfn_t.h>
#include <linux/sizes.h>
#include <linux/mmu_notifier.h>
#include <linux/iomap.h>
@@ -76,9 +75,9 @@ static struct folio *dax_to_folio(void *entry)
return page_folio(pfn_to_page(dax_to_pfn(entry)));
}
-static void *dax_make_entry(pfn_t pfn, unsigned long flags)
+static void *dax_make_entry(unsigned long pfn, unsigned long flags)
{
- return xa_mk_value(flags | (pfn_t_to_pfn(pfn) << DAX_SHIFT));
+ return xa_mk_value(flags | (pfn << DAX_SHIFT));
}
static bool dax_is_locked(void *entry)
@@ -713,7 +712,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
if (order > 0)
flags |= DAX_PMD;
- entry = dax_make_entry(pfn_to_pfn_t(0), flags);
+ entry = dax_make_entry(0, flags);
dax_lock_entry(xas, entry);
if (xas_error(xas))
goto out_unlock;
@@ -1041,7 +1040,7 @@ static bool dax_fault_is_synchronous(const struct iomap_iter *iter,
* appropriate.
*/
static void *dax_insert_entry(struct xa_state *xas, struct vm_fault *vmf,
- const struct iomap_iter *iter, void *entry, pfn_t pfn,
+ const struct iomap_iter *iter, void *entry, unsigned long pfn,
unsigned long flags)
{
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
@@ -1239,7 +1238,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
- size_t size, void **kaddr, pfn_t *pfnp)
+ size_t size, void **kaddr, unsigned long *pfnp)
{
pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
int id, rc = 0;
@@ -1257,7 +1256,7 @@ static int dax_iomap_direct_access(const struct iomap *iomap, loff_t pos,
rc = -EINVAL;
if (PFN_PHYS(length) < size)
goto out;
- if (pfn_t_to_pfn(*pfnp) & (PHYS_PFN(size)-1))
+ if (*pfnp & (PHYS_PFN(size)-1))
goto out;
rc = 0;
@@ -1361,12 +1360,12 @@ static vm_fault_t dax_load_hole(struct xa_state *xas, struct vm_fault *vmf,
{
struct inode *inode = iter->inode;
unsigned long vaddr = vmf->address;
- pfn_t pfn = pfn_to_pfn_t(my_zero_pfn(vaddr));
+ unsigned long pfn = my_zero_pfn(vaddr);
vm_fault_t ret;
*entry = dax_insert_entry(xas, vmf, iter, *entry, pfn, DAX_ZERO_PAGE);
- ret = vmf_insert_page_mkwrite(vmf, pfn_t_to_page(pfn), false);
+ ret = vmf_insert_page_mkwrite(vmf, pfn_to_page(pfn), false);
trace_dax_load_hole(inode, vmf, ret);
return ret;
}
@@ -1383,14 +1382,14 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
struct folio *zero_folio;
spinlock_t *ptl;
pmd_t pmd_entry;
- pfn_t pfn;
+ unsigned long pfn;
zero_folio = mm_get_huge_zero_folio(vmf->vma->vm_mm);
if (unlikely(!zero_folio))
goto fallback;
- pfn = page_to_pfn_t(&zero_folio->page);
+ pfn = page_to_pfn(&zero_folio->page);
*entry = dax_insert_entry(xas, vmf, iter, *entry, pfn,
DAX_PMD | DAX_ZERO_PAGE);
@@ -1779,7 +1778,8 @@ static vm_fault_t dax_fault_return(int error)
* insertion for now and return the pfn so that caller can insert it after the
* fsync is done.
*/
-static vm_fault_t dax_fault_synchronous_pfnp(pfn_t *pfnp, pfn_t pfn)
+static vm_fault_t dax_fault_synchronous_pfnp(unsigned long *pfnp,
+ unsigned long pfn)
{
if (WARN_ON_ONCE(!pfnp))
return VM_FAULT_SIGBUS;
@@ -1827,7 +1827,7 @@ static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
* @pmd: distinguish whether it is a pmd fault
*/
static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
- const struct iomap_iter *iter, pfn_t *pfnp,
+ const struct iomap_iter *iter, unsigned long *pfnp,
struct xa_state *xas, void **entry, bool pmd)
{
const struct iomap *iomap = &iter->iomap;
@@ -1838,7 +1838,7 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
unsigned long entry_flags = pmd ? DAX_PMD : 0;
struct folio *folio;
int ret, err = 0;
- pfn_t pfn;
+ unsigned long pfn;
void *kaddr;
if (!pmd && vmf->cow_page)
@@ -1875,16 +1875,15 @@ static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
folio_ref_inc(folio);
if (pmd)
- ret = vmf_insert_folio_pmd(vmf, pfn_folio(pfn_t_to_pfn(pfn)),
- write);
+ ret = vmf_insert_folio_pmd(vmf, pfn_folio(pfn), write);
else
- ret = vmf_insert_page_mkwrite(vmf, pfn_t_to_page(pfn), write);
+ ret = vmf_insert_page_mkwrite(vmf, pfn_to_page(pfn), write);
folio_put(folio);
return ret;
}
-static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
+static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, unsigned long *pfnp,
int *iomap_errp, const struct iomap_ops *ops)
{
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
@@ -1996,7 +1995,7 @@ static bool dax_fault_check_fallback(struct vm_fault *vmf, struct xa_state *xas,
return false;
}
-static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
+static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, unsigned long *pfnp,
const struct iomap_ops *ops)
{
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
@@ -2077,7 +2076,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
return ret;
}
#else
-static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
+static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, unsigned long *pfnp,
const struct iomap_ops *ops)
{
return VM_FAULT_FALLBACK;
@@ -2098,7 +2097,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
* successfully.
*/
vm_fault_t dax_iomap_fault(struct vm_fault *vmf, unsigned int order,
- pfn_t *pfnp, int *iomap_errp, const struct iomap_ops *ops)
+ unsigned long *pfnp, int *iomap_errp,
+ const struct iomap_ops *ops)
{
if (order == 0)
return dax_iomap_pte_fault(vmf, pfnp, iomap_errp, ops);
@@ -2118,8 +2118,8 @@ EXPORT_SYMBOL_GPL(dax_iomap_fault);
* This function inserts a writeable PTE or PMD entry into the page tables
* for an mmaped DAX file. It also marks the page cache entry as dirty.
*/
-static vm_fault_t
-dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order)
+static vm_fault_t dax_insert_pfn_mkwrite(struct vm_fault *vmf,
+ unsigned long pfn, unsigned int order)
{
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, order);
@@ -2141,7 +2141,7 @@ dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order)
xas_set_mark(&xas, PAGECACHE_TAG_DIRTY);
dax_lock_entry(&xas, entry);
xas_unlock_irq(&xas);
- folio = pfn_folio(pfn_t_to_pfn(pfn));
+ folio = pfn_folio(pfn);
folio_ref_inc(folio);
if (order == 0)
ret = vmf_insert_page_mkwrite(vmf, &folio->page, true);
@@ -2168,7 +2168,7 @@ dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order)
* table entry.
*/
vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, unsigned int order,
- pfn_t pfn)
+ unsigned long pfn)
{
int err;
loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 21df813..e6e9629 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -747,7 +747,7 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf, unsigned int order)
bool write = (vmf->flags & FAULT_FLAG_WRITE) &&
(vmf->vma->vm_flags & VM_SHARED);
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
- pfn_t pfn;
+ unsigned long pfn;
if (write) {
sb_start_pagefault(sb);
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 0502bf3..ac6d4c1 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -10,7 +10,6 @@
#include <linux/dax.h>
#include <linux/uio.h>
#include <linux/pagemap.h>
-#include <linux/pfn_t.h>
#include <linux/iomap.h>
#include <linux/interval_tree.h>
@@ -757,7 +756,7 @@ static vm_fault_t __fuse_dax_fault(struct vm_fault *vmf, unsigned int order,
vm_fault_t ret;
struct inode *inode = file_inode(vmf->vma->vm_file);
struct super_block *sb = inode->i_sb;
- pfn_t pfn;
+ unsigned long pfn;
int error = 0;
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_conn_dax *fcd = fc->dax;
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 53c2626..aac914b 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -9,7 +9,6 @@
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/group_cpus.h>
-#include <linux/pfn_t.h>
#include <linux/memremap.h>
#include <linux/module.h>
#include <linux/virtio.h>
@@ -1008,7 +1007,7 @@ static void virtio_fs_cleanup_vqs(struct virtio_device *vdev)
*/
static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode,
- void **kaddr, pfn_t *pfn)
+ void **kaddr, unsigned long *pfn)
{
struct virtio_fs *fs = dax_get_private(dax_dev);
phys_addr_t offset = PFN_PHYS(pgoff);
@@ -1017,7 +1016,7 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
if (kaddr)
*kaddr = fs->window_kaddr + offset;
if (pfn)
- *pfn = phys_to_pfn_t(fs->window_phys_addr + offset, 0);
+ *pfn = PHYS_PFN(fs->window_phys_addr + offset);
return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
}
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 48254a7..0b2db25 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1729,7 +1729,7 @@ xfs_dax_fault_locked(
bool write_fault)
{
vm_fault_t ret;
- pfn_t pfn;
+ unsigned long pfn;
if (!IS_ENABLED(CONFIG_FS_DAX)) {
ASSERT(0);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index dcc9fcd..29eec75 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -26,7 +26,7 @@ struct dax_operations {
* number of pages available for DAX at that pfn.
*/
long (*direct_access)(struct dax_device *, pgoff_t, long,
- enum dax_access_mode, void **, pfn_t *);
+ enum dax_access_mode, void **, unsigned long *);
/* zero_page_range: required operation. Zero page range */
int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
/*
@@ -241,7 +241,7 @@ static inline void dax_break_layout_final(struct inode *inode)
bool dax_alive(struct dax_device *dax_dev);
void *dax_get_private(struct dax_device *dax_dev);
long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
- enum dax_access_mode mode, void **kaddr, pfn_t *pfn);
+ enum dax_access_mode mode, void **kaddr, unsigned long *pfn);
size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
size_t bytes, struct iov_iter *i);
size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
@@ -255,9 +255,10 @@ void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops);
vm_fault_t dax_iomap_fault(struct vm_fault *vmf, unsigned int order,
- pfn_t *pfnp, int *errp, const struct iomap_ops *ops);
+ unsigned long *pfnp, int *errp,
+ const struct iomap_ops *ops);
vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
- unsigned int order, pfn_t pfn);
+ unsigned int order, unsigned long pfn);
int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
void dax_delete_mapping_range(struct address_space *mapping,
loff_t start, loff_t end);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index cb95951..84fdc3a 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -156,7 +156,7 @@ typedef int (*dm_busy_fn) (struct dm_target *ti);
*/
typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff,
long nr_pages, enum dax_access_mode node, void **kaddr,
- pfn_t *pfn);
+ unsigned long *pfn);
typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
size_t nr_pages);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 21b3f0b..35e34e6 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -37,8 +37,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr, pgprot_t newprot,
unsigned long cp_flags);
-vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
-vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
+vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, unsigned long pfn,
+ bool write);
+vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, unsigned long pfn,
+ bool write);
vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio,
bool write);
vm_fault_t vmf_insert_folio_pud(struct vm_fault *vmf, struct folio *folio,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7c2f760..98a6069 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3527,9 +3527,9 @@ vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, pgprot_t pgprot);
vm_fault_t vmf_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
- pfn_t pfn);
+ unsigned long pfn);
vm_fault_t vmf_insert_mixed_mkwrite(struct vm_area_struct *vma,
- unsigned long addr, pfn_t pfn);
+ unsigned long addr, unsigned long pfn);
int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len);
static inline vm_fault_t vmf_insert_page(struct vm_area_struct *vma,
diff --git a/include/linux/pfn.h b/include/linux/pfn.h
index 14bc053..b90ca0b 100644
--- a/include/linux/pfn.h
+++ b/include/linux/pfn.h
@@ -4,15 +4,6 @@
#ifndef __ASSEMBLY__
#include <linux/types.h>
-
-/*
- * pfn_t: encapsulates a page-frame number that is optionally backed
- * by memmap (struct page). Whether a pfn_t has a 'struct page'
- * backing is indicated by flags in the high bits of the value.
- */
-typedef struct {
- u64 val;
-} pfn_t;
#endif
#define PFN_ALIGN(x) (((unsigned long)(x) + (PAGE_SIZE - 1)) & PAGE_MASK)
diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
deleted file mode 100644
index 2c00293..0000000
--- a/include/linux/pfn_t.h
+++ /dev/null
@@ -1,85 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_PFN_T_H_
-#define _LINUX_PFN_T_H_
-#include <linux/mm.h>
-
-/*
- * PFN_FLAGS_MASK - mask of all the possible valid pfn_t flags
- */
-#define PFN_FLAGS_MASK (((u64) (~PAGE_MASK)) << (BITS_PER_LONG_LONG - PAGE_SHIFT))
-
-#define PFN_FLAGS_TRACE \
- { }
-
-static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
-{
- pfn_t pfn_t = { .val = pfn | (flags & PFN_FLAGS_MASK), };
-
- return pfn_t;
-}
-
-/* a default pfn to pfn_t conversion assumes that @pfn is pfn_valid() */
-static inline pfn_t pfn_to_pfn_t(unsigned long pfn)
-{
- return __pfn_to_pfn_t(pfn, 0);
-}
-
-static inline pfn_t phys_to_pfn_t(phys_addr_t addr, u64 flags)
-{
- return __pfn_to_pfn_t(addr >> PAGE_SHIFT, flags);
-}
-
-static inline bool pfn_t_has_page(pfn_t pfn)
-{
- return true;
-}
-
-static inline unsigned long pfn_t_to_pfn(pfn_t pfn)
-{
- return pfn.val & ~PFN_FLAGS_MASK;
-}
-
-static inline struct page *pfn_t_to_page(pfn_t pfn)
-{
- if (pfn_t_has_page(pfn))
- return pfn_to_page(pfn_t_to_pfn(pfn));
- return NULL;
-}
-
-static inline phys_addr_t pfn_t_to_phys(pfn_t pfn)
-{
- return PFN_PHYS(pfn_t_to_pfn(pfn));
-}
-
-static inline pfn_t page_to_pfn_t(struct page *page)
-{
- return pfn_to_pfn_t(page_to_pfn(page));
-}
-
-static inline int pfn_t_valid(pfn_t pfn)
-{
- return pfn_valid(pfn_t_to_pfn(pfn));
-}
-
-#ifdef CONFIG_MMU
-static inline pte_t pfn_t_pte(pfn_t pfn, pgprot_t pgprot)
-{
- return pfn_pte(pfn_t_to_pfn(pfn), pgprot);
-}
-#endif
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static inline pmd_t pfn_t_pmd(pfn_t pfn, pgprot_t pgprot)
-{
- return pfn_pmd(pfn_t_to_pfn(pfn), pgprot);
-}
-
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-static inline pud_t pfn_t_pud(pfn_t pfn, pgprot_t pgprot)
-{
- return pfn_pud(pfn_t_to_pfn(pfn), pgprot);
-}
-#endif
-#endif
-
-#endif /* _LINUX_PFN_T_H_ */
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index d84d0c4..bd8f931 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -20,7 +20,6 @@
#include <linux/mman.h>
#include <linux/mm_types.h>
#include <linux/module.h>
-#include <linux/pfn_t.h>
#include <linux/printk.h>
#include <linux/pgtable.h>
#include <linux/random.h>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 642fd83..d9a76e7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -22,7 +22,6 @@
#include <linux/mm_types.h>
#include <linux/khugepaged.h>
#include <linux/freezer.h>
-#include <linux/pfn_t.h>
#include <linux/mman.h>
#include <linux/memremap.h>
#include <linux/pagemap.h>
@@ -1375,7 +1374,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
struct folio_or_pfn {
union {
struct folio *folio;
- pfn_t pfn;
+ unsigned long pfn;
};
bool is_folio;
};
@@ -1391,7 +1390,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
if (!pmd_none(*pmd)) {
const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
- pfn_t_to_pfn(fop.pfn);
+ fop.pfn;
if (write) {
if (pmd_pfn(*pmd) != pfn) {
@@ -1414,7 +1413,7 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
folio_add_file_rmap_pmd(fop.folio, &fop.folio->page, vma);
add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
} else {
- entry = pmd_mkhuge(pfn_t_pmd(fop.pfn, prot));
+ entry = pmd_mkhuge(pfn_pmd(fop.pfn, prot));
entry = pmd_mkspecial(entry);
}
if (write) {
@@ -1442,7 +1441,8 @@ static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
*
* Return: vm_fault_t value.
*/
-vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
+vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, unsigned long pfn,
+ bool write)
{
unsigned long addr = vmf->address & PMD_MASK;
struct vm_area_struct *vma = vmf->vma;
@@ -1473,7 +1473,7 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
return VM_FAULT_OOM;
}
- pfnmap_setup_cachemode_pfn(pfn_t_to_pfn(pfn), &pgprot);
+ pfnmap_setup_cachemode_pfn(pfn, &pgprot);
ptl = pmd_lock(vma->vm_mm, vmf->pmd);
error = insert_pmd(vma, addr, vmf->pmd, fop, pgprot, write,
@@ -1539,7 +1539,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
if (!pud_none(*pud)) {
const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
- pfn_t_to_pfn(fop.pfn);
+ fop.pfn;
if (write) {
if (WARN_ON_ONCE(pud_pfn(*pud) != pfn))
@@ -1559,7 +1559,7 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
folio_add_file_rmap_pud(fop.folio, &fop.folio->page, vma);
add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PUD_NR);
} else {
- entry = pud_mkhuge(pfn_t_pud(fop.pfn, prot));
+ entry = pud_mkhuge(pfn_pud(fop.pfn, prot));
entry = pud_mkspecial(entry);
}
if (write) {
@@ -1580,7 +1580,8 @@ static void insert_pud(struct vm_area_struct *vma, unsigned long addr,
*
* Return: vm_fault_t value.
*/
-vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write)
+vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, unsigned long pfn,
+ bool write)
{
unsigned long addr = vmf->address & PUD_MASK;
struct vm_area_struct *vma = vmf->vma;
@@ -1603,7 +1604,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write)
if (addr < vma->vm_start || addr >= vma->vm_end)
return VM_FAULT_SIGBUS;
- pfnmap_setup_cachemode_pfn(pfn_t_to_pfn(pfn), &pgprot);
+ pfnmap_setup_cachemode_pfn(pfn, &pgprot);
ptl = pud_lock(vma->vm_mm, vmf->pud);
insert_pud(vma, addr, vmf->pud, fop, pgprot, write);
diff --git a/mm/memory.c b/mm/memory.c
index f1d81ad..0163d12 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -57,7 +57,6 @@
#include <linux/export.h>
#include <linux/delayacct.h>
#include <linux/init.h>
-#include <linux/pfn_t.h>
#include <linux/writeback.h>
#include <linux/memcontrol.h>
#include <linux/mmu_notifier.h>
@@ -2435,7 +2434,7 @@ int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
EXPORT_SYMBOL(vm_map_pages_zero);
static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
- pfn_t pfn, pgprot_t prot, bool mkwrite)
+ unsigned long pfn, pgprot_t prot, bool mkwrite)
{
struct mm_struct *mm = vma->vm_mm;
pte_t *pte, entry;
@@ -2457,7 +2456,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
* allocation and mapping invalidation so just skip the
* update.
*/
- if (pte_pfn(entry) != pfn_t_to_pfn(pfn)) {
+ if (pte_pfn(entry) != pfn) {
WARN_ON_ONCE(!is_zero_pfn(pte_pfn(entry)));
goto out_unlock;
}
@@ -2470,7 +2469,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
}
/* Ok, finally just insert the thing.. */
- entry = pte_mkspecial(pfn_t_pte(pfn, prot));
+ entry = pte_mkspecial(pfn_pte(pfn, prot));
if (mkwrite) {
entry = pte_mkyoung(entry);
@@ -2541,8 +2540,7 @@ vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
pfnmap_setup_cachemode_pfn(pfn, &pgprot);
- return insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, 0), pgprot,
- false);
+ return insert_pfn(vma, addr, pfn, pgprot, false);
}
EXPORT_SYMBOL(vmf_insert_pfn_prot);
@@ -2573,21 +2571,22 @@ vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
}
EXPORT_SYMBOL(vmf_insert_pfn);
-static bool vm_mixed_ok(struct vm_area_struct *vma, pfn_t pfn, bool mkwrite)
+static bool vm_mixed_ok(struct vm_area_struct *vma, unsigned long pfn,
+ bool mkwrite)
{
- if (unlikely(is_zero_pfn(pfn_t_to_pfn(pfn))) &&
+ if (unlikely(is_zero_pfn(pfn)) &&
(mkwrite || !vm_mixed_zeropage_allowed(vma)))
return false;
/* these checks mirror the abort conditions in vm_normal_page */
if (vma->vm_flags & VM_MIXEDMAP)
return true;
- if (is_zero_pfn(pfn_t_to_pfn(pfn)))
+ if (is_zero_pfn(pfn))
return true;
return false;
}
static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
- unsigned long addr, pfn_t pfn, bool mkwrite)
+ unsigned long addr, unsigned long pfn, bool mkwrite)
{
pgprot_t pgprot = vma->vm_page_prot;
int err;
@@ -2598,9 +2597,9 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
if (addr < vma->vm_start || addr >= vma->vm_end)
return VM_FAULT_SIGBUS;
- pfnmap_setup_cachemode_pfn(pfn_t_to_pfn(pfn), &pgprot);
+ pfnmap_setup_cachemode_pfn(pfn, &pgprot);
- if (!pfn_modify_allowed(pfn_t_to_pfn(pfn), pgprot))
+ if (!pfn_modify_allowed(pfn, pgprot))
return VM_FAULT_SIGBUS;
/*
@@ -2610,7 +2609,7 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
* than insert_pfn). If a zero_pfn were inserted into a VM_MIXEDMAP
* without pte special, it would there be refcounted as a normal page.
*/
- if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pfn_t_valid(pfn)) {
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pfn_valid(pfn)) {
struct page *page;
/*
@@ -2618,7 +2617,7 @@ static vm_fault_t __vm_insert_mixed(struct vm_area_struct *vma,
* regardless of whether the caller specified flags that
* result in pfn_t_has_page() == false.
*/
- page = pfn_to_page(pfn_t_to_pfn(pfn));
+ page = pfn_to_page(pfn);
err = insert_page(vma, addr, page, pgprot, mkwrite);
} else {
return insert_pfn(vma, addr, pfn, pgprot, mkwrite);
@@ -2653,7 +2652,7 @@ vm_fault_t vmf_insert_page_mkwrite(struct vm_fault *vmf, struct page *page,
EXPORT_SYMBOL_GPL(vmf_insert_page_mkwrite);
vm_fault_t vmf_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
- pfn_t pfn)
+ unsigned long pfn)
{
return __vm_insert_mixed(vma, addr, pfn, false);
}
@@ -2665,7 +2664,7 @@ EXPORT_SYMBOL(vmf_insert_mixed);
* the same entry was actually inserted.
*/
vm_fault_t vmf_insert_mixed_mkwrite(struct vm_area_struct *vma,
- unsigned long addr, pfn_t pfn)
+ unsigned long addr, unsigned long pfn)
{
return __vm_insert_mixed(vma, addr, pfn, true);
}
diff --git a/mm/memremap.c b/mm/memremap.c
index c17e0a6..044a455 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -5,7 +5,6 @@
#include <linux/kasan.h>
#include <linux/memory_hotplug.h>
#include <linux/memremap.h>
-#include <linux/pfn_t.h>
#include <linux/swap.h>
#include <linux/mm.h>
#include <linux/mmzone.h>
diff --git a/mm/migrate.c b/mm/migrate.c
index 8cf0f9c..ea8c74d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -35,7 +35,6 @@
#include <linux/compat.h>
#include <linux/hugetlb.h>
#include <linux/gfp.h>
-#include <linux/pfn_t.h>
#include <linux/page_idle.h>
#include <linux/page_owner.h>
#include <linux/sched/mm.h>
diff --git a/tools/testing/nvdimm/pmem-dax.c b/tools/testing/nvdimm/pmem-dax.c
index c1ec099..05e763a 100644
--- a/tools/testing/nvdimm/pmem-dax.c
+++ b/tools/testing/nvdimm/pmem-dax.c
@@ -10,7 +10,7 @@
long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
long nr_pages, enum dax_access_mode mode, void **kaddr,
- pfn_t *pfn)
+ unsigned long *pfn)
{
resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
@@ -29,7 +29,7 @@ long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
*kaddr = pmem->virt_addr + offset;
page = vmalloc_to_page(pmem->virt_addr + offset);
if (pfn)
- *pfn = page_to_pfn_t(page);
+ *pfn = page_to_pfn(page);
pr_debug_ratelimited("%s: pmem: %p pgoff: %#lx pfn: %#lx\n",
__func__, pmem, pgoff, page_to_pfn(page));
@@ -39,7 +39,7 @@ long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
if (kaddr)
*kaddr = pmem->virt_addr + offset;
if (pfn)
- *pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+ *pfn = PHYS_PFN(pmem->phys_addr + offset);
/*
* If badblocks are present, limit known good range to the
diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c
index ddceb04..f7e7bfe 100644
--- a/tools/testing/nvdimm/test/iomap.c
+++ b/tools/testing/nvdimm/test/iomap.c
@@ -8,7 +8,6 @@
#include <linux/ioport.h>
#include <linux/module.h>
#include <linux/types.h>
-#include <linux/pfn_t.h>
#include <linux/acpi.h>
#include <linux/io.h>
#include <linux/mm.h>
@@ -135,12 +134,6 @@ void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
}
EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
-pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
-{
- return phys_to_pfn_t(addr, flags);
-}
-EXPORT_SYMBOL(__wrap_phys_to_pfn_t);
-
void *__wrap_memremap(resource_size_t offset, size_t size,
unsigned long flags)
{
diff --git a/tools/testing/nvdimm/test/nfit_test.h b/tools/testing/nvdimm/test/nfit_test.h
index b00583d..b9047fb 100644
--- a/tools/testing/nvdimm/test/nfit_test.h
+++ b/tools/testing/nvdimm/test/nfit_test.h
@@ -212,7 +212,6 @@ void __iomem *__wrap_devm_ioremap(struct device *dev,
void *__wrap_devm_memremap(struct device *dev, resource_size_t offset,
size_t size, unsigned long flags);
void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
-pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags);
void *__wrap_memremap(resource_size_t offset, size_t size,
unsigned long flags);
void __wrap_devm_memunmap(struct device *dev, void *addr);
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
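As a rough sketch of the conversion pattern the series applies to these callers
(the foo_* identifiers below are hypothetical and not taken from any patch in
the series): a direct_access style callback that used to package a physical
address into a pfn_t via phys_to_pfn_t() now reports a bare unsigned long pfn,
typically derived with PHYS_PFN(), while callers that unpacked the value with
pfn_t_to_pfn() simply use the pfn directly.

	/* Hypothetical driver state: a directly mappable window. */
	struct foo_window {
		phys_addr_t	phys_base;	/* physical start of the window */
		void		*kaddr_base;	/* kernel mapping of the window */
		size_t		nr_pages;	/* window size in pages */
	};

	/* After the series: the callback takes unsigned long *pfn, not pfn_t *pfn. */
	static long foo_direct_access(struct foo_window *win, pgoff_t pgoff,
				      long nr_pages, void **kaddr,
				      unsigned long *pfn)
	{
		phys_addr_t offset = PFN_PHYS(pgoff);	/* page offset -> byte offset */

		if (kaddr)
			*kaddr = win->kaddr_base + offset;
		if (pfn)
			/* was: *pfn = phys_to_pfn_t(win->phys_base + offset, 0); */
			*pfn = PHYS_PFN(win->phys_base + offset);

		return min_t(long, nr_pages, win->nr_pages - pgoff);
	}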
* Re: [PATCH v2 13/14] mm: Remove callers of pfn_t functionality
2025-06-16 11:58 ` [PATCH v2 13/14] mm: Remove callers of pfn_t functionality Alistair Popple
@ 2025-06-17 9:35 ` David Hildenbrand
0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:35 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On 16.06.25 13:58, Alistair Popple wrote:
> All PFN_* pfn_t flags have been removed. Therefore there is no longer
> a need for the pfn_t type and all uses can be replaced with normal
> pfns.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>
> ---
>
> Changes since v1:
>
> - Rebased on David's cleanup[1]
>
> [1] https://lore.kernel.org/linux-mm/20250611120654.545963-1-david@redhat.com/
> ---
> arch/x86/mm/pat/memtype.c | 1 +-
> drivers/dax/device.c | 23 +++----
> drivers/dax/hmem/hmem.c | 1 +-
> drivers/dax/kmem.c | 1 +-
> drivers/dax/pmem.c | 1 +-
> drivers/dax/super.c | 3 +-
> drivers/gpu/drm/exynos/exynos_drm_gem.c | 1 +-
> drivers/gpu/drm/gma500/fbdev.c | 3 +-
> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 1 +-
> drivers/gpu/drm/msm/msm_gem.c | 1 +-
> drivers/gpu/drm/omapdrm/omap_gem.c | 6 +--
> drivers/gpu/drm/v3d/v3d_bo.c | 1 +-
> drivers/hwtracing/intel_th/msu.c | 3 +-
> drivers/md/dm-linear.c | 2 +-
> drivers/md/dm-log-writes.c | 2 +-
> drivers/md/dm-stripe.c | 2 +-
> drivers/md/dm-target.c | 2 +-
> drivers/md/dm-writecache.c | 11 +--
> drivers/md/dm.c | 2 +-
> drivers/nvdimm/pmem.c | 8 +--
> drivers/nvdimm/pmem.h | 4 +-
> drivers/s390/block/dcssblk.c | 9 +--
> drivers/vfio/pci/vfio_pci_core.c | 5 +-
> fs/cramfs/inode.c | 5 +-
> fs/dax.c | 50 +++++++--------
> fs/ext4/file.c | 2 +-
> fs/fuse/dax.c | 3 +-
> fs/fuse/virtio_fs.c | 5 +-
> fs/xfs/xfs_file.c | 2 +-
> include/linux/dax.h | 9 +--
> include/linux/device-mapper.h | 2 +-
> include/linux/huge_mm.h | 6 +-
> include/linux/mm.h | 4 +-
> include/linux/pfn.h | 9 +---
> include/linux/pfn_t.h | 85 +-------------------------
> mm/debug_vm_pgtable.c | 1 +-
> mm/huge_memory.c | 21 +++---
> mm/memory.c | 31 ++++-----
> mm/memremap.c | 1 +-
> mm/migrate.c | 1 +-
> tools/testing/nvdimm/pmem-dax.c | 6 +-
> tools/testing/nvdimm/test/iomap.c | 7 +--
> tools/testing/nvdimm/test/nfit_test.h | 1 +-
> 43 files changed, 109 insertions(+), 235 deletions(-)
> delete mode 100644 include/linux/pfn_t.h
>
Lovely
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 14/14] mm/memremap: Remove unused devmap_managed_key
2025-06-16 11:58 [PATCH v2 00/14] mm: Remove pXX_devmap page table bit and pfn_t type Alistair Popple
` (12 preceding siblings ...)
2025-06-16 11:58 ` [PATCH v2 13/14] mm: Remove callers of pfn_t functionality Alistair Popple
@ 2025-06-16 11:58 ` Alistair Popple
2025-06-17 9:34 ` David Hildenbrand
13 siblings, 1 reply; 31+ messages in thread
From: Alistair Popple @ 2025-06-16 11:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, Alistair Popple, gerald.schaefer, dan.j.williams, jgg,
willy, david, linux-kernel, nvdimm, linux-fsdevel, linux-ext4,
linux-xfs, jhubbard, hch, zhang.lyra, debug, bjorn, balbirs,
lorenzo.stoakes, linux-arm-kernel, loongarch, linuxppc-dev,
linux-riscv, linux-cxl, dri-devel, John, m.szyprowski,
Jason Gunthorpe
It's no longer used, so remove it.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
mm/memremap.c | 27 ---------------------------
1 file changed, 27 deletions(-)
diff --git a/mm/memremap.c b/mm/memremap.c
index 044a455..f75078c 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -38,30 +38,6 @@ unsigned long memremap_compat_align(void)
EXPORT_SYMBOL_GPL(memremap_compat_align);
#endif
-#ifdef CONFIG_FS_DAX
-DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
-EXPORT_SYMBOL(devmap_managed_key);
-
-static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
-{
- if (pgmap->type == MEMORY_DEVICE_FS_DAX)
- static_branch_dec(&devmap_managed_key);
-}
-
-static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
-{
- if (pgmap->type == MEMORY_DEVICE_FS_DAX)
- static_branch_inc(&devmap_managed_key);
-}
-#else
-static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
-{
-}
-static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
-{
-}
-#endif /* CONFIG_FS_DAX */
-
static void pgmap_array_delete(struct range *range)
{
xa_store_range(&pgmap_array, PHYS_PFN(range->start), PHYS_PFN(range->end),
@@ -150,7 +126,6 @@ void memunmap_pages(struct dev_pagemap *pgmap)
percpu_ref_exit(&pgmap->ref);
WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
- devmap_managed_enable_put(pgmap);
}
EXPORT_SYMBOL_GPL(memunmap_pages);
@@ -349,8 +324,6 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
if (error)
return ERR_PTR(error);
- devmap_managed_enable_get(pgmap);
-
/*
* Clear the pgmap nr_range as it will be incremented for each
* successfully processed range. This communicates how many
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
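For context, a minimal sketch of the static-key pattern the removed helpers
implemented (the example_* names are illustrative, not from the kernel): a
key bumped and dropped around FS_DAX pagemap lifetime so that hot paths could
test it cheaply. With no static_branch_unlikely() consumer of the key left in
the tree, the inc/dec bookkeeping had become dead code.

	#include <linux/jump_label.h>

	DEFINE_STATIC_KEY_FALSE(example_managed_key);

	static void example_enable(void)
	{
		static_branch_inc(&example_managed_key);	/* on pagemap creation */
	}

	static void example_disable(void)
	{
		static_branch_dec(&example_managed_key);	/* on pagemap teardown */
	}

	static bool example_hot_path(void)
	{
		/* The cheap test that used to justify the key. */
		return static_branch_unlikely(&example_managed_key);
	}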
* Re: [PATCH v2 14/14] mm/memremap: Remove unused devmap_managed_key
2025-06-16 11:58 ` [PATCH v2 14/14] mm/memremap: Remove unused devmap_managed_key Alistair Popple
@ 2025-06-17 9:34 ` David Hildenbrand
0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-06-17 9:34 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: linux-mm, gerald.schaefer, dan.j.williams, jgg, willy,
linux-kernel, nvdimm, linux-fsdevel, linux-ext4, linux-xfs,
jhubbard, hch, zhang.lyra, debug, bjorn, balbirs, lorenzo.stoakes,
linux-arm-kernel, loongarch, linuxppc-dev, linux-riscv, linux-cxl,
dri-devel, John, m.szyprowski, Jason Gunthorpe
On 16.06.25 13:58, Alistair Popple wrote:
> It's no longer used, so remove it.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread