[PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()

Linux-mm Archive on lore.kernel.org

* [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()
@ 2026-06-25  4:28 Dev Jain
  2026-06-25  4:42 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Dev Jain @ 2026-06-25  4:28 UTC
  To: akpm, david, ljs
  Cc: Dev Jain, riel, liam, vbabka, harry, jannh, kas, linux-mm,
	linux-kernel, ryan.roberts, anshuman.khandual, stable

try_to_unmap_one() handles hugetlb folios when memory failure needs
to replace a poisoned hugetlb mapping with a hwpoison entry. In that
case page_vma_mapped_walk() returns the hugetlb entry in pvmw.pte, but
the code reads it with ptep_get() before decoding the PFN.

That is wrong on architectures where hugetlb entries are not encoded as
regular PTEs. On s390, for example, a raw huge RSTE must be converted
by huge_ptep_get() before helpers such as pte_pfn() can inspect it. A
raw decode can select the wrong subpage, so try_to_unmap_one() can
install a hwpoison entry for the wrong PFN.
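
To make the failure mode concrete, here is a minimal user-space sketch in C. It uses an invented entry layout in which the raw huge entry keeps flag bits inside the field that a generic pte_pfn()-style decode treats as the low PFN bits. The bit positions, the 1 MiB huge page size and all toy_* names are assumptions for illustration only; this is not s390's actual RSTE format and not the kernel's helpers. toy_huge_ptep_get() plays the role of huge_ptep_get(), and toy_subpage_index() mirrors the folio_page(folio, pfn - folio_pfn(folio)) computation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical toy layout, NOT the real s390 encoding:
 * - generic PTE: PFN lives in bits 12 and up
 * - raw huge entry: address of a 1 MiB frame in bits 20 and up, plus
 *   flag bits below bit 20 (two of them sit at bits 12 and 13)
 */
#define TOY_PAGE_SHIFT     12
#define TOY_HUGE_SHIFT     20
#define TOY_HUGE_NR_PAGES  (1UL << (TOY_HUGE_SHIFT - TOY_PAGE_SHIFT)) /* 256 */
#define TOY_HUGE_YOUNG     (1ULL << 12)
#define TOY_HUGE_DIRTY     (1ULL << 13)
#define TOY_HUGE_LOW_MASK  ((1ULL << TOY_HUGE_SHIFT) - 1)

typedef struct { uint64_t val; } toy_pte_t;

/* Generic decode, analogous to pte_pfn(): assumes the regular PTE layout. */
static uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte.val >> TOY_PAGE_SHIFT;
}

/*
 * Stand-in for huge_ptep_get(): convert the raw huge entry into the
 * generic PTE layout. Here that only clears the huge-entry flag bits;
 * a real conversion would also move the flags to their PTE positions.
 */
static toy_pte_t toy_huge_ptep_get(const toy_pte_t *ptep)
{
	toy_pte_t pte = { .val = ptep->val & ~TOY_HUGE_LOW_MASK };

	return pte;
}

/* Mirrors subpage = folio_page(folio, pfn - folio_pfn(folio)). */
static uint64_t toy_subpage_index(toy_pte_t pteval, uint64_t folio_pfn)
{
	uint64_t idx = toy_pte_pfn(pteval) - folio_pfn;

	/* Catches an index outside the folio, including underflow. */
	assert(idx < TOY_HUGE_NR_PAGES);
	return idx;
}
```

For a folio at PFN 0x12300 whose raw entry has both toy flags set (value 0x12303000), decoding the raw entry yields subpage index 3, while decoding the result of toy_huge_ptep_get() yields index 0, the head page. The wrong index stays inside the folio, so nothing crashes; the code just picks the wrong subpage.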

The userspace-visible result is that a later access to the poisoned
hugetlb subpage can miss the expected SIGBUS. With DEBUG_VM, the wrong
subpage can also trip the PageHWPoison check.

Use huge_ptep_get() for hugetlb mappings before decoding the PFN.

Before c7ab0d2fdc84, the bug existed in the form of a plain dereference:
we checked the PFN of the hugetlb head page with pte_pfn(*pte) and
bailed out on a mismatch, so the hwpoison entry did not get installed.
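
The pre-c7ab0d2fdc84 behaviour can be sketched the same way. This is a hypothetical reconstruction of the check described above, not the old kernel code: it assumes an invented layout in which a flag bit of the raw huge entry (TOY_HUGE_DIRTY, an assumption) sits inside the generic PFN field, so the plain decode no longer matches the head page PFN and the function bails out before any hwpoison entry would be installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
/* Hypothetical flag bit that a raw huge entry keeps inside the PTE PFN field. */
#define TOY_HUGE_DIRTY  (1ULL << 13)

/* The whole sketch depends on the toy flag overlapping the PFN field. */
static_assert((TOY_HUGE_DIRTY >> TOY_PAGE_SHIFT) != 0,
	      "toy flag must overlap the generic PFN field");

/* Plain decode of the raw entry, as pte_pfn(*pte) did on a huge entry. */
static uint64_t toy_raw_pfn(uint64_t raw_entry)
{
	return raw_entry >> TOY_PAGE_SHIFT;
}

/*
 * Sketch of the old head-page check: continue only if the entry appears
 * to map the head page; otherwise bail out, leaving the mapping without
 * a hwpoison entry.
 */
static bool toy_old_installs_hwpoison(uint64_t raw_entry, uint64_t head_pfn)
{
	if (toy_raw_pfn(raw_entry) != head_pfn)
		return false;	/* bail out: no hwpoison entry */
	return true;		/* would go on to install the hwpoison entry */
}
```

With head PFN 0x12300, a raw entry of 0x12300000 with no flags set passes the check, but the same entry with TOY_HUGE_DIRTY set decodes to PFN 0x12302 and bails out.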

I am not sure what the procedure is for such old bugs: how far back
should I really go?

Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()")
Cc: stable@vger.kernel.org
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
Applies on mm-unstable (d17fe8a046a2).
Similar old bugs are present in try_to_migrate_one(), check_pte(),
remove_migration_pte() and prot_none_hugetlb_entry().

 mm/rmap.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 1c77d5dc06e9f..aa8a254efaecc 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2095,11 +2095,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_FOLIO(!pvmw.pte, folio);

-		/*
-		 * Handle PFN swap PTEs, such as device-exclusive ones, that
-		 * actually map pages.
-		 */
-		pteval = ptep_get(pvmw.pte);
+		address = pvmw.address;
+		if (folio_test_hugetlb(folio)) {
+			pteval = huge_ptep_get(mm, address, pvmw.pte);
+		} else {
+			/*
+			 * Handle PFN swap PTEs, such as device-exclusive ones,
+			 * that actually map pages.
+			 */
+			pteval = ptep_get(pvmw.pte);
+		}
 		if (likely(pte_present(pteval))) {
 			pfn = pte_pfn(pteval);
 		} else {
@@ -2110,7 +2115,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		}

 		subpage = folio_page(folio, pfn - folio_pfn(folio));
-		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);

-- 
2.43.0


* Re: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()
  2026-06-25  4:28 [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one() Dev Jain
@ 2026-06-25  4:42 ` Andrew Morton
  2026-06-25  5:06   ` Dev Jain
  2026-06-25  5:45 ` kernel test robot
  2026-06-25  5:45 ` kernel test robot
  2 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2026-06-25  4:42 UTC
  To: Dev Jain
  Cc: david, ljs, riel, liam, vbabka, harry, jannh, kas, linux-mm,
	linux-kernel, ryan.roberts, anshuman.khandual, stable

On Thu, 25 Jun 2026 04:28:51 +0000 Dev Jain <dev.jain@arm.com> wrote:

> try_to_unmap_one() handles hugetlb folios when memory failure needs
> to replace a poisoned hugetlb mapping with a hwpoison entry. In that
> case page_vma_mapped_walk() returns the hugetlb entry in pvmw.pte, but
> the code reads it with ptep_get() before decoding the PFN.
> 
> That is wrong on architectures where hugetlb entries are not encoded as
> regular PTEs. On s390, for example, a raw huge RSTE must be converted
> by huge_ptep_get() before helpers such as pte_pfn() can inspect it. A
> raw decode can select the wrong subpage, so try_to_unmap_one() can
> install a hwpoison entry for the wrong PFN.
> 
> The userspace-visible result is that a later access to the poisoned
> hugetlb subpage can miss the expected SIGBUS. With DEBUG_VM, the wrong
> subpage can also trip the PageHWPoison check.
> 
> Use huge_ptep_get() for hugetlb mappings before decoding the PFN.
> 
> Before c7ab0d2fdc84, the bug existed in the form of a plain dereference:
> we checked the PFN of the hugetlb head page with pte_pfn(*pte) and
> bailed out on a mismatch, so the hwpoison entry did not get installed.
> 
> I am not sure what the procedure is for such old bugs: how far back
> should I really go?

I think 9 years is enough ;)

> Similar old bugs are present in try_to_migrate_one(), check_pte(),
> remove_migration_pte() and prot_none_hugetlb_entry().

Why now?  Was there some more recent (s390?) change which exposed this?




* Re: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()
  2026-06-25  4:42 ` Andrew Morton
@ 2026-06-25  5:06   ` Dev Jain
  0 siblings, 0 replies; 5+ messages in thread
From: Dev Jain @ 2026-06-25  5:06 UTC
  To: Andrew Morton
  Cc: david, ljs, riel, liam, vbabka, harry, jannh, kas, linux-mm,
	linux-kernel, ryan.roberts, anshuman.khandual, stable



On 25/06/26 10:12 am, Andrew Morton wrote:
> On Thu, 25 Jun 2026 04:28:51 +0000 Dev Jain <dev.jain@arm.com> wrote:
> 
>> try_to_unmap_one() handles hugetlb folios when memory failure needs
>> to replace a poisoned hugetlb mapping with a hwpoison entry. In that
>> case page_vma_mapped_walk() returns the hugetlb entry in pvmw.pte, but
>> the code reads it with ptep_get() before decoding the PFN.
>>
>> That is wrong on architectures where hugetlb entries are not encoded as
>> regular PTEs. On s390, for example, a raw huge RSTE must be converted
>> by huge_ptep_get() before helpers such as pte_pfn() can inspect it. A
>> raw decode can select the wrong subpage, so try_to_unmap_one() can
>> install a hwpoison entry for the wrong PFN.
>>
>> The userspace-visible result is that a later access to the poisoned
>> hugetlb subpage can miss the expected SIGBUS. With DEBUG_VM, the wrong
>> subpage can also trip the PageHWPoison check.
>>
>> Use huge_ptep_get() for hugetlb mappings before decoding the PFN.
>>
>> Before c7ab0d2fdc84, the bug existed in the form of a plain dereference:
>> we checked the PFN of the hugetlb head page with pte_pfn(*pte) and
>> bailed out on a mismatch, so the hwpoison entry did not get installed.
>>
>> I am not sure what the procedure is for such old bugs: how far back
>> should I really go?
> 
> I think 9 years is enough ;)
> 
>> Similar old bugs are present in try_to_migrate_one(), check_pte(),
>> remove_migration_pte() and prot_none_hugetlb_entry().
> 
> Why now?  Was there some more recent (s390?) change which exposed this?

I was refactoring the hugetlb bits in try_to_unmap_one(), and David caught
the bug in review (which reminds me to add a "Reported-by" tag to this
patch).

I guess running hugetlb-read-hwpoison.c on s390 would catch this. It turns
out that this selftest is in the "destructive tests" category of
run_vmtests.sh, so neither ./run_vmtests.sh nor ./run_vmtests.sh -a runs
it. It has to be run with ./run_vmtests.sh -d, and that option was broken
until a month ago (see 3432cbb291aa). So essentially no one has been
running that test.




* Re: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()
  2026-06-25  4:28 [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one() Dev Jain
  2026-06-25  4:42 ` Andrew Morton
@ 2026-06-25  5:45 ` kernel test robot
  2026-06-25  5:45 ` kernel test robot
  2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2026-06-25  5:45 UTC
  To: Dev Jain, akpm, david, ljs
  Cc: oe-kbuild-all, Dev Jain, riel, liam, vbabka, harry, jannh, kas,
	linux-mm, linux-kernel, ryan.roberts, anshuman.khandual, stable

Hi Dev,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Dev-Jain/mm-rmap-use-huge_ptep_get-in-try_to_unmap_one/20260625-123050
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260625042853.2752898-1-dev.jain%40arm.com
patch subject: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()
config: nios2-allnoconfig (https://download.01.org/0day-ci/archive/20260625/202606251311.CCKYInqf-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260625/202606251311.CCKYInqf-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606251311.CCKYInqf-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/rmap.c: In function 'try_to_unmap_one':
>> mm/rmap.c:2100:34: error: implicit declaration of function 'huge_ptep_get' [-Werror=implicit-function-declaration]
    2100 |                         pteval = huge_ptep_get(mm, address, pvmw.pte);
         |                                  ^~~~~~~~~~~~~
>> mm/rmap.c:2100:34: error: incompatible types when assigning to type 'pte_t' from type 'int'
   cc1: some warnings being treated as errors
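
Both robot reports come from allnoconfig builds on architectures (nios2, hexagon) where hugetlb is presumably compiled out, so huge_ptep_get() is not declared in that configuration. In C the call still needs a declaration even if the hugetlb branch can never be taken there. The sketch below shows one generic way to keep such a call site building: a fallback definition when the config option is off. TOY_CONFIG_HUGETLB_PAGE and all toy_* helpers are hypothetical stand-ins, and this thread does not say how the next version of the patch resolves the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t val; } toy_pte_t;

/* Generic accessor, always available (stand-in for ptep_get()). */
static toy_pte_t toy_ptep_get(const toy_pte_t *ptep)
{
	return *ptep;
}

#ifdef TOY_CONFIG_HUGETLB_PAGE
static bool toy_folio_test_hugetlb(bool is_hugetlb)
{
	return is_hugetlb;
}

/* Arch-aware accessor (stand-in for huge_ptep_get()). */
static toy_pte_t toy_huge_ptep_get(const toy_pte_t *ptep)
{
	/* An architecture-specific conversion would go here. */
	return *ptep;
}
#else
/* Hugetlb compiled out: no folio can be a hugetlb folio. */
static bool toy_folio_test_hugetlb(bool is_hugetlb)
{
	(void)is_hugetlb;
	return false;
}

/*
 * Fallback so the call in toy_read_entry() still has a declaration.
 * It is never reached, because toy_folio_test_hugetlb() returns false.
 */
static toy_pte_t toy_huge_ptep_get(const toy_pte_t *ptep)
{
	return toy_ptep_get(ptep);
}
#endif

/* Mirrors the patched hunk: pick the accessor based on the folio type. */
static toy_pte_t toy_read_entry(bool is_hugetlb, const toy_pte_t *ptep)
{
	if (toy_folio_test_hugetlb(is_hugetlb))
		return toy_huge_ptep_get(ptep);
	return toy_ptep_get(ptep);
}
```

Built without TOY_CONFIG_HUGETLB_PAGE, the file compiles and toy_read_entry() always takes the generic path; removing the #else fallback would trigger the same class of implicit-declaration diagnostic the robot reports. Treat the fallback body as a placeholder: a real stub might prefer to warn if it is ever reached.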


vim +/huge_ptep_get +2100 mm/rmap.c

  1980	
  1981	/*
  1982	 * @arg: enum ttu_flags will be passed to this argument
  1983	 */
  1984	static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
  1985			     unsigned long address, void *arg)
  1986	{
  1987		struct mm_struct *mm = vma->vm_mm;
  1988		DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
  1989		bool anon_exclusive, ret = true;
  1990		pte_t pteval;
  1991		struct page *subpage;
  1992		struct mmu_notifier_range range;
  1993		enum ttu_flags flags = (enum ttu_flags)(long)arg;
  1994		unsigned long nr_pages = 1, end_addr;
  1995		unsigned long pfn;
  1996		unsigned long hsz = 0;
  1997		int ptes = 0;
  1998	
  1999		/*
  2000		 * When racing against e.g. zap_pte_range() on another cpu,
  2001		 * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
  2002		 * try_to_unmap() may return before folio_mapped() has become false,
  2003		 * if page table locking is skipped: use TTU_SYNC to wait for that.
  2004		 */
  2005		if (flags & TTU_SYNC)
  2006			pvmw.flags = PVMW_SYNC;
  2007	
  2008		/*
  2009		 * For THP, we have to assume the worse case ie pmd for invalidation.
  2010		 * For hugetlb, it could be much worse if we need to do pud
  2011		 * invalidation in the case of pmd sharing.
  2012		 *
  2013		 * Note that the folio can not be freed in this function as call of
  2014		 * try_to_unmap() must hold a reference on the folio.
  2015		 */
  2016		range.end = vma_address_end(&pvmw);
  2017		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
  2018					address, range.end);
  2019		if (folio_test_hugetlb(folio)) {
  2020			/*
  2021			 * If sharing is possible, start and end will be adjusted
  2022			 * accordingly.
  2023			 */
  2024			adjust_range_if_pmd_sharing_possible(vma, &range.start,
  2025							     &range.end);
  2026	
  2027			/* We need the huge page size for set_huge_pte_at() */
  2028			hsz = huge_page_size(hstate_vma(vma));
  2029		}
  2030		mmu_notifier_invalidate_range_start(&range);
  2031	
  2032		while (page_vma_mapped_walk(&pvmw)) {
  2033			nr_pages = 1;
  2034	
  2035			/*
  2036			 * If the folio is in an mlock()d vma, we must not swap it out.
  2037			 */
  2038			if (!(flags & TTU_IGNORE_MLOCK) &&
  2039			    (vma->vm_flags & VM_LOCKED)) {
  2040				ptes++;
  2041	
  2042				/*
  2043				 * Set 'ret' to indicate the page cannot be unmapped.
  2044				 *
  2045				 * Do not jump to walk_abort immediately as additional
  2046				 * iteration might be required to detect fully mapped
  2047				 * folio an mlock it.
  2048				 */
  2049				ret = false;
  2050	
  2051				/* Only mlock fully mapped pages */
  2052				if (pvmw.pte && ptes != pvmw.nr_pages)
  2053					continue;
  2054	
  2055				/*
  2056				 * All PTEs must be protected by page table lock in
  2057				 * order to mlock the page.
  2058				 *
  2059				 * If page table boundary has been cross, current ptl
  2060				 * only protect part of ptes.
  2061				 */
  2062				if (pvmw.flags & PVMW_PGTABLE_CROSSED)
  2063					goto walk_done;
  2064	
  2065				/* Restore the mlock which got missed */
  2066				mlock_vma_folio(folio, vma);
  2067				goto walk_done;
  2068			}
  2069	
  2070			if (!pvmw.pte) {
  2071				if (folio_test_lazyfree(folio)) {
  2072					if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
  2073						goto walk_done;
  2074					/*
  2075					 * unmap_huge_pmd_locked has either already marked
  2076					 * the folio as swap-backed or decided to retain it
  2077					 * due to GUP or speculative references.
  2078					 */
  2079					goto walk_abort;
  2080				}
  2081	
  2082				if (flags & TTU_SPLIT_HUGE_PMD) {
  2083					/*
  2084					 * We temporarily have to drop the PTL and
  2085					 * restart so we can process the PTE-mapped THP.
  2086					 */
  2087					split_huge_pmd_locked(vma, pvmw.address,
  2088							      pvmw.pmd, false);
  2089					flags &= ~TTU_SPLIT_HUGE_PMD;
  2090					page_vma_mapped_walk_restart(&pvmw);
  2091					continue;
  2092				}
  2093			}
  2094	
  2095			/* Unexpected PMD-mapped THP? */
  2096			VM_BUG_ON_FOLIO(!pvmw.pte, folio);
  2097	
  2098			address = pvmw.address;
  2099			if (folio_test_hugetlb(folio)) {
> 2100				pteval = huge_ptep_get(mm, address, pvmw.pte);
  2101			} else {
  2102				/*
  2103				 * Handle PFN swap PTEs, such as device-exclusive ones,
  2104				 * that actually map pages.
  2105				 */
  2106				pteval = ptep_get(pvmw.pte);
  2107			}
  2108			if (likely(pte_present(pteval))) {
  2109				pfn = pte_pfn(pteval);
  2110			} else {
  2111				const softleaf_t entry = softleaf_from_pte(pteval);
  2112	
  2113				pfn = softleaf_to_pfn(entry);
  2114				VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
  2115			}
  2116	
  2117			subpage = folio_page(folio, pfn - folio_pfn(folio));
  2118			anon_exclusive = folio_test_anon(folio) &&
  2119					 PageAnonExclusive(subpage);
  2120	
  2121			if (folio_test_hugetlb(folio)) {
  2122				bool anon = folio_test_anon(folio);
  2123	
  2124				/*
  2125				 * The try_to_unmap() is only passed a hugetlb page
  2126				 * in the case where the hugetlb page is poisoned.
  2127				 */
  2128				VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
  2129				/*
  2130				 * huge_pmd_unshare may unmap an entire PMD page.
  2131				 * There is no way of knowing exactly which PMDs may
  2132				 * be cached for this mm, so we must flush them all.
  2133				 * start/end were already adjusted above to cover this
  2134				 * range.
  2135				 */
  2136				flush_cache_range(vma, range.start, range.end);
  2137	
  2138				/*
  2139				 * To call huge_pmd_unshare, i_mmap_rwsem must be
  2140				 * held in write mode.  Caller needs to explicitly
  2141				 * do this outside rmap routines.
  2142				 *
  2143				 * We also must hold hugetlb vma_lock in write mode.
  2144				 * Lock order dictates acquiring vma_lock BEFORE
  2145				 * i_mmap_rwsem.  We can only try lock here and fail
  2146				 * if unsuccessful.
  2147				 */
  2148				if (!anon) {
  2149					struct mmu_gather tlb;
  2150	
  2151					VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
  2152					if (!hugetlb_vma_trylock_write(vma))
  2153						goto walk_abort;
  2154	
  2155					tlb_gather_mmu_vma(&tlb, vma);
  2156					if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
  2157						hugetlb_vma_unlock_write(vma);
  2158						huge_pmd_unshare_flush(&tlb, vma);
  2159						tlb_finish_mmu(&tlb);
  2160						/*
  2161						 * The PMD table was unmapped,
  2162						 * consequently unmapping the folio.
  2163						 */
  2164						goto walk_done;
  2165					}
  2166					hugetlb_vma_unlock_write(vma);
  2167					tlb_finish_mmu(&tlb);
  2168				}
  2169				pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
  2170				if (pte_dirty(pteval))
  2171					folio_mark_dirty(folio);
  2172			} else if (likely(pte_present(pteval))) {
  2173				nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
  2174				end_addr = address + nr_pages * PAGE_SIZE;
  2175				flush_cache_range(vma, address, end_addr);
  2176	
  2177				/* Nuke the page table entry. */
  2178				pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
  2179				/*
  2180				 * We clear the PTE but do not flush so potentially
  2181				 * a remote CPU could still be writing to the folio.
  2182				 * If the entry was previously clean then the
  2183				 * architecture must guarantee that a clear->dirty
  2184				 * transition on a cached TLB entry is written through
  2185				 * and traps if the PTE is unmapped.
  2186				 */
  2187				if (should_defer_flush(mm, flags))
  2188					set_tlb_ubc_flush_pending(mm, pteval, address, end_addr);
  2189				else
  2190					flush_tlb_range(vma, address, end_addr);
  2191				if (pte_dirty(pteval))
  2192					folio_mark_dirty(folio);
  2193			} else {
  2194				pte_clear(mm, address, pvmw.pte);
  2195			}
  2196	
  2197			/*
  2198			 * Now the pte is cleared. If this pte was uffd-wp armed,
  2199			 * we may want to replace a none pte with a marker pte if
  2200			 * it's file-backed, so we don't lose the tracking info.
  2201			 */
  2202			pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
  2203	
  2204			/* Update high watermark before we lower rss */
  2205			update_hiwater_rss(mm);
  2206	
  2207			if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
  2208				pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
  2209				if (folio_test_hugetlb(folio)) {
  2210					hugetlb_count_sub(folio_nr_pages(folio), mm);
  2211					set_huge_pte_at(mm, address, pvmw.pte, pteval,
  2212							hsz);
  2213				} else {
  2214					dec_mm_counter(mm, mm_counter(folio));
  2215					set_pte_at(mm, address, pvmw.pte, pteval);
  2216				}
  2217			} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
  2218				   !userfaultfd_armed(vma)) {
  2219				/*
  2220				 * The guest indicated that the page content is of no
  2221				 * interest anymore. Simply discard the pte, vmscan
  2222				 * will take care of the rest.
  2223				 * A future reference will then fault in a new zero
  2224				 * page. When userfaultfd is active, we must not drop
  2225				 * this page though, as its main user (postcopy
  2226				 * migration) will not expect userfaults on already
  2227				 * copied pages.
  2228				 */
  2229				dec_mm_counter(mm, mm_counter(folio));
  2230			} else if (folio_test_anon(folio)) {
  2231				swp_entry_t entry = page_swap_entry(subpage);
  2232				pte_t swp_pte;
  2233				/*
  2234				 * Store the swap location in the pte.
  2235				 * See handle_pte_fault() ...
  2236				 */
  2237				if (unlikely(folio_test_swapbacked(folio) !=
  2238						folio_test_swapcache(folio))) {
  2239					WARN_ON_ONCE(1);
  2240					goto walk_abort;
  2241				}
  2242	
  2243				/* MADV_FREE page check */
  2244				if (!folio_test_swapbacked(folio)) {
  2245					int ref_count, map_count;
  2246	
  2247					/*
  2248					 * Synchronize with gup_pte_range():
  2249					 * - clear PTE; barrier; read refcount
  2250					 * - inc refcount; barrier; read PTE
  2251					 */
  2252					smp_mb();
  2253	
  2254					ref_count = folio_ref_count(folio);
  2255					map_count = folio_mapcount(folio);
  2256	
  2257					/*
  2258					 * Order reads for page refcount and dirty flag
  2259					 * (see comments in __remove_mapping()).
  2260					 */
  2261					smp_rmb();
  2262	
  2263					if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
  2264						/*
  2265						 * redirtied either using the page table or a previously
  2266						 * obtained GUP reference.
  2267						 */
  2268						set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
  2269						folio_set_swapbacked(folio);
  2270						goto walk_abort;
  2271					} else if (ref_count != 1 + map_count) {
  2272						/*
  2273						 * Additional reference. Could be a GUP reference or any
  2274						 * speculative reference. GUP users must mark the folio
  2275						 * dirty if there was a modification. This folio cannot be
  2276						 * reclaimed right now either way, so act just like nothing
  2277						 * happened.
  2278						 * We'll come back here later and detect if the folio was
  2279						 * dirtied when the additional reference is gone.
  2280						 */
  2281						set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
  2282						goto walk_abort;
  2283					}
  2284					add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
  2285					goto discard;
  2286				}
  2287	
  2288				if (folio_dup_swap(folio, subpage) < 0) {
  2289					set_pte_at(mm, address, pvmw.pte, pteval);
  2290					goto walk_abort;
  2291				}
  2292	
  2293				/*
  2294				 * arch_unmap_one() is expected to be a NOP on
  2295				 * architectures where we could have PFN swap PTEs,
  2296				 * so we'll not check/care.
  2297				 */
  2298				if (arch_unmap_one(mm, vma, address, pteval) < 0) {
  2299					folio_put_swap(folio, subpage);
  2300					set_pte_at(mm, address, pvmw.pte, pteval);
  2301					goto walk_abort;
  2302				}
  2303	
  2304				/* See folio_try_share_anon_rmap(): clear PTE first. */
  2305				if (anon_exclusive &&
  2306				    folio_try_share_anon_rmap_pte(folio, subpage)) {
  2307					folio_put_swap(folio, subpage);
  2308					set_pte_at(mm, address, pvmw.pte, pteval);
  2309					goto walk_abort;
  2310				}
  2311				if (list_empty(&mm->mmlist)) {
  2312					spin_lock(&mmlist_lock);
  2313					if (list_empty(&mm->mmlist))
  2314						list_add(&mm->mmlist, &init_mm.mmlist);
  2315					spin_unlock(&mmlist_lock);
  2316				}
  2317				dec_mm_counter(mm, MM_ANONPAGES);
  2318				inc_mm_counter(mm, MM_SWAPENTS);
  2319				swp_pte = swp_entry_to_pte(entry);
  2320				if (anon_exclusive)
  2321					swp_pte = pte_swp_mkexclusive(swp_pte);
  2322				if (likely(pte_present(pteval))) {
  2323					if (pte_soft_dirty(pteval))
  2324						swp_pte = pte_swp_mksoft_dirty(swp_pte);
  2325					if (pte_uffd_wp(pteval))
  2326						swp_pte = pte_swp_mkuffd_wp(swp_pte);
  2327				} else {
  2328					if (pte_swp_soft_dirty(pteval))
  2329						swp_pte = pte_swp_mksoft_dirty(swp_pte);
  2330					if (pte_swp_uffd_wp(pteval))
  2331						swp_pte = pte_swp_mkuffd_wp(swp_pte);
  2332				}
  2333				set_pte_at(mm, address, pvmw.pte, swp_pte);
  2334			} else {
  2335				/*
  2336				 * This is a locked file-backed folio,
  2337				 * so it cannot be removed from the page
  2338				 * cache and replaced by a new folio before
  2339				 * mmu_notifier_invalidate_range_end, so no
  2340				 * concurrent thread might update its page table
  2341				 * to point at a new folio while a device is
  2342				 * still using this folio.
  2343				 *
  2344				 * See Documentation/mm/mmu_notifier.rst
  2345				 */
  2346				add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
  2347			}
  2348	discard:
  2349			if (unlikely(folio_test_hugetlb(folio))) {
  2350				hugetlb_remove_rmap(folio);
  2351			} else {
  2352				folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
  2353			}
  2354			if (vma->vm_flags & VM_LOCKED)
  2355				mlock_drain_local();
  2356			folio_put_refs(folio, nr_pages);
  2357	
  2358			/*
  2359			 * If we are sure that we batched the entire folio and cleared
  2360			 * all PTEs, we can just optimize and stop right here.
  2361			 */
  2362			if (nr_pages == folio_nr_pages(folio))
  2363				goto walk_done;
  2364			continue;
  2365	walk_abort:
  2366			ret = false;
  2367	walk_done:
  2368			page_vma_mapped_walk_done(&pvmw);
  2369			break;
  2370		}
  2371	
  2372		mmu_notifier_invalidate_range_end(&range);
  2373	
  2374		return ret;
  2375	}
  2376	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()
  2026-06-25  4:28 [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one() Dev Jain
  2026-06-25  4:42 ` Andrew Morton
  2026-06-25  5:45 ` kernel test robot
@ 2026-06-25  5:45 ` kernel test robot
  2 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2026-06-25  5:45 UTC
  To: Dev Jain, akpm, david, ljs
  Cc: llvm, oe-kbuild-all, Dev Jain, riel, liam, vbabka, harry, jannh,
	kas, linux-mm, linux-kernel, ryan.roberts, anshuman.khandual,
	stable

Hi Dev,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Dev-Jain/mm-rmap-use-huge_ptep_get-in-try_to_unmap_one/20260625-123050
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20260625042853.2752898-1-dev.jain%40arm.com
patch subject: [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one()
config: hexagon-allnoconfig (https://download.01.org/0day-ci/archive/20260625/202606251341.jfIr1D7m-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 6cc609bb250b21b47fc7d394b4019101e9983597)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260625/202606251341.jfIr1D7m-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606251341.jfIr1D7m-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/rmap.c:2100:13: error: call to undeclared function 'huge_ptep_get'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    2100 |                         pteval = huge_ptep_get(mm, address, pvmw.pte);
         |                                  ^
>> mm/rmap.c:2100:11: error: assigning to 'pte_t' from incompatible type 'int'
    2100 |                         pteval = huge_ptep_get(mm, address, pvmw.pte);
         |                                ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   2 errors generated.


  2174				end_addr = address + nr_pages * PAGE_SIZE;
  2175				flush_cache_range(vma, address, end_addr);
  2176	
  2177				/* Nuke the page table entry. */
  2178				pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
  2179				/*
  2180				 * We clear the PTE but do not flush so potentially
  2181				 * a remote CPU could still be writing to the folio.
  2182				 * If the entry was previously clean then the
  2183				 * architecture must guarantee that a clear->dirty
  2184				 * transition on a cached TLB entry is written through
  2185				 * and traps if the PTE is unmapped.
  2186				 */
  2187				if (should_defer_flush(mm, flags))
  2188					set_tlb_ubc_flush_pending(mm, pteval, address, end_addr);
  2189				else
  2190					flush_tlb_range(vma, address, end_addr);
  2191				if (pte_dirty(pteval))
  2192					folio_mark_dirty(folio);
  2193			} else {
  2194				pte_clear(mm, address, pvmw.pte);
  2195			}
  2196	
  2197			/*
  2198			 * Now the pte is cleared. If this pte was uffd-wp armed,
  2199			 * we may want to replace a none pte with a marker pte if
  2200			 * it's file-backed, so we don't lose the tracking info.
  2201			 */
  2202			pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
  2203	
  2204			/* Update high watermark before we lower rss */
  2205			update_hiwater_rss(mm);
  2206	
  2207			if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
  2208				pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
  2209				if (folio_test_hugetlb(folio)) {
  2210					hugetlb_count_sub(folio_nr_pages(folio), mm);
  2211					set_huge_pte_at(mm, address, pvmw.pte, pteval,
  2212							hsz);
  2213				} else {
  2214					dec_mm_counter(mm, mm_counter(folio));
  2215					set_pte_at(mm, address, pvmw.pte, pteval);
  2216				}
  2217			} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
  2218				   !userfaultfd_armed(vma)) {
  2219				/*
  2220				 * The guest indicated that the page content is of no
  2221				 * interest anymore. Simply discard the pte, vmscan
  2222				 * will take care of the rest.
  2223				 * A future reference will then fault in a new zero
  2224				 * page. When userfaultfd is active, we must not drop
  2225				 * this page though, as its main user (postcopy
  2226				 * migration) will not expect userfaults on already
  2227				 * copied pages.
  2228				 */
  2229				dec_mm_counter(mm, mm_counter(folio));
  2230			} else if (folio_test_anon(folio)) {
  2231				swp_entry_t entry = page_swap_entry(subpage);
  2232				pte_t swp_pte;
  2233				/*
  2234				 * Store the swap location in the pte.
  2235				 * See handle_pte_fault() ...
  2236				 */
  2237				if (unlikely(folio_test_swapbacked(folio) !=
  2238						folio_test_swapcache(folio))) {
  2239					WARN_ON_ONCE(1);
  2240					goto walk_abort;
  2241				}
  2242	
  2243				/* MADV_FREE page check */
  2244				if (!folio_test_swapbacked(folio)) {
  2245					int ref_count, map_count;
  2246	
  2247					/*
  2248					 * Synchronize with gup_pte_range():
  2249					 * - clear PTE; barrier; read refcount
  2250					 * - inc refcount; barrier; read PTE
  2251					 */
  2252					smp_mb();
  2253	
  2254					ref_count = folio_ref_count(folio);
  2255					map_count = folio_mapcount(folio);
  2256	
  2257					/*
  2258					 * Order reads for page refcount and dirty flag
  2259					 * (see comments in __remove_mapping()).
  2260					 */
  2261					smp_rmb();
  2262	
  2263					if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
  2264						/*
  2265						 * redirtied either using the page table or a previously
  2266						 * obtained GUP reference.
  2267						 */
  2268						set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
  2269						folio_set_swapbacked(folio);
  2270						goto walk_abort;
  2271					} else if (ref_count != 1 + map_count) {
  2272						/*
  2273						 * Additional reference. Could be a GUP reference or any
  2274						 * speculative reference. GUP users must mark the folio
  2275						 * dirty if there was a modification. This folio cannot be
  2276						 * reclaimed right now either way, so act just like nothing
  2277						 * happened.
  2278						 * We'll come back here later and detect if the folio was
  2279						 * dirtied when the additional reference is gone.
  2280						 */
  2281						set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
  2282						goto walk_abort;
  2283					}
  2284					add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
  2285					goto discard;
  2286				}
  2287	
  2288				if (folio_dup_swap(folio, subpage) < 0) {
  2289					set_pte_at(mm, address, pvmw.pte, pteval);
  2290					goto walk_abort;
  2291				}
  2292	
  2293				/*
  2294				 * arch_unmap_one() is expected to be a NOP on
  2295				 * architectures where we could have PFN swap PTEs,
  2296				 * so we'll not check/care.
  2297				 */
  2298				if (arch_unmap_one(mm, vma, address, pteval) < 0) {
  2299					folio_put_swap(folio, subpage);
  2300					set_pte_at(mm, address, pvmw.pte, pteval);
  2301					goto walk_abort;
  2302				}
  2303	
  2304				/* See folio_try_share_anon_rmap(): clear PTE first. */
  2305				if (anon_exclusive &&
  2306				    folio_try_share_anon_rmap_pte(folio, subpage)) {
  2307					folio_put_swap(folio, subpage);
  2308					set_pte_at(mm, address, pvmw.pte, pteval);
  2309					goto walk_abort;
  2310				}
  2311				if (list_empty(&mm->mmlist)) {
  2312					spin_lock(&mmlist_lock);
  2313					if (list_empty(&mm->mmlist))
  2314						list_add(&mm->mmlist, &init_mm.mmlist);
  2315					spin_unlock(&mmlist_lock);
  2316				}
  2317				dec_mm_counter(mm, MM_ANONPAGES);
  2318				inc_mm_counter(mm, MM_SWAPENTS);
  2319				swp_pte = swp_entry_to_pte(entry);
  2320				if (anon_exclusive)
  2321					swp_pte = pte_swp_mkexclusive(swp_pte);
  2322				if (likely(pte_present(pteval))) {
  2323					if (pte_soft_dirty(pteval))
  2324						swp_pte = pte_swp_mksoft_dirty(swp_pte);
  2325					if (pte_uffd_wp(pteval))
  2326						swp_pte = pte_swp_mkuffd_wp(swp_pte);
  2327				} else {
  2328					if (pte_swp_soft_dirty(pteval))
  2329						swp_pte = pte_swp_mksoft_dirty(swp_pte);
  2330					if (pte_swp_uffd_wp(pteval))
  2331						swp_pte = pte_swp_mkuffd_wp(swp_pte);
  2332				}
  2333				set_pte_at(mm, address, pvmw.pte, swp_pte);
  2334			} else {
  2335				/*
  2336				 * This is a locked file-backed folio,
  2337				 * so it cannot be removed from the page
  2338				 * cache and replaced by a new folio before
  2339				 * mmu_notifier_invalidate_range_end, so no
  2340				 * concurrent thread might update its page table
  2341				 * to point at a new folio while a device is
  2342				 * still using this folio.
  2343				 *
  2344				 * See Documentation/mm/mmu_notifier.rst
  2345				 */
  2346				add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
  2347			}
  2348	discard:
  2349			if (unlikely(folio_test_hugetlb(folio))) {
  2350				hugetlb_remove_rmap(folio);
  2351			} else {
  2352				folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
  2353			}
  2354			if (vma->vm_flags & VM_LOCKED)
  2355				mlock_drain_local();
  2356			folio_put_refs(folio, nr_pages);
  2357	
  2358			/*
  2359			 * If we are sure that we batched the entire folio and cleared
  2360			 * all PTEs, we can just optimize and stop right here.
  2361			 */
  2362			if (nr_pages == folio_nr_pages(folio))
  2363				goto walk_done;
  2364			continue;
  2365	walk_abort:
  2366			ret = false;
  2367	walk_done:
  2368			page_vma_mapped_walk_done(&pvmw);
  2369			break;
  2370		}
  2371	
  2372		mmu_notifier_invalidate_range_end(&range);
  2373	
  2374		return ret;
  2375	}
  2376	
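The patch's central claim is about a hugetlb entry. It must go through huge_ptep_get() before pte_pfn() decodes it, because the resulting PFN feeds `subpage = folio_page(folio, pfn - folio_pfn(folio))`. Below is a toy C model of that failure mode, not kernel code. Both entry layouts, the TOY_HUGE_SWBIT flag, and every toy_* helper are hypothetical and invented for illustration. They do not describe the s390 RSTE format or any other real architecture. In the toy huge format, bits 12 to 20 are always zero in a 2 MiB-aligned address, so they carry a huge-only flag. Decoding the raw entry as a regular PTE therefore lands on the wrong subpage.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model only: both entry layouts below are invented for illustration
 * and do not match s390 or any other real architecture.
 */
#define TOY_PAGE_SHIFT   12
#define TOY_HPAGE_SHIFT  21                 /* 2 MiB huge page */
#define TOY_PRESENT      (1ULL << 0)        /* same bit in both toy formats */
#define TOY_HUGE_SWBIT   (1ULL << 13)       /* hypothetical huge-only flag */
/* Bits 12..20: always zero in a 2 MiB-aligned address. */
#define TOY_HUGE_LOWMASK \
	(((1ULL << TOY_HPAGE_SHIFT) - 1) & ~((1ULL << TOY_PAGE_SHIFT) - 1))

typedef uint64_t toy_pte_t;   /* regular-PTE format */
typedef uint64_t toy_huge_t;  /* raw huge-entry format */

/* Build a raw toy huge entry mapping the huge page that starts at head_pfn. */
static toy_huge_t toy_make_huge_entry(uint64_t head_pfn)
{
	/* The toy format needs bits 12..20 free, i.e. a 2 MiB-aligned head. */
	assert((head_pfn & ((1ULL << (TOY_HPAGE_SHIFT - TOY_PAGE_SHIFT)) - 1)) == 0);
	return (head_pfn << TOY_PAGE_SHIFT) | TOY_HUGE_SWBIT | TOY_PRESENT;
}

/* Plays the role of huge_ptep_get(): strip the huge-only bits so the
 * result is a valid regular-format PTE. */
static toy_pte_t toy_huge_ptep_get(toy_huge_t raw)
{
	return raw & ~TOY_HUGE_LOWMASK;
}

/* Plays the role of pte_pfn(): only meaningful for regular-format PTEs. */
static uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte >> TOY_PAGE_SHIFT;
}

/* Mirrors the index computed in folio_page(folio, pfn - folio_pfn(folio)). */
static int64_t toy_subpage_index(uint64_t pfn, uint64_t folio_pfn)
{
	return (int64_t)(pfn - folio_pfn);
}
```

For a head PFN of 0x80000, the converted path gives `toy_subpage_index(toy_pte_pfn(toy_huge_ptep_get(raw)), 0x80000) == 0`, which is the head page. The raw path gives `toy_subpage_index(toy_pte_pfn(raw), 0x80000) == 2`, a different subpage inside the same folio. Both typedefs are the same integer type, so passing the raw entry straight to toy_pte_pfn() compiles silently. The listing has the same property: pvmw.pte is a `pte_t *` for hugetlb and regular mappings alike, and only the choice of accessor separates the two cases.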

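The MADV_FREE branch in the listing (`!folio_test_swapbacked(folio)`) runs after the PTEs have been cleared. It decides whether a lazily freed anonymous folio can simply be dropped. Below is a minimal C sketch of just that decision. The enum and function names are hypothetical. The reading of the "1" in `ref_count != 1 + map_count` as the reclaim caller's own reference is an assumption. The memory barriers the real code needs, smp_mb() against GUP and smp_rmb() before the dirty test, are deliberately not modeled. The sketch only shows which exit is taken for a given snapshot of the folio state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three exits of the MADV_FREE branch. */
enum toy_lazyfree_action {
	TOY_LAZYFREE_DISCARD,    /* drop the mapping: goto discard */
	TOY_LAZYFREE_REDIRTIED,  /* restore PTEs, folio_set_swapbacked(), abort */
	TOY_LAZYFREE_BUSY,       /* restore PTEs and abort; retry later */
};

/*
 * Decision logic only; the smp_mb()/smp_rmb() ordering of the real code
 * is intentionally omitted from this sketch.
 */
static enum toy_lazyfree_action toy_lazyfree_decide(bool folio_dirty,
						    bool vma_droppable,
						    int ref_count, int map_count)
{
	assert(ref_count >= 0 && map_count >= 0);

	/* Redirtied via the page table or an earlier GUP reference. For a
	 * VM_DROPPABLE mapping the dirty state does not block discarding. */
	if (folio_dirty && !vma_droppable)
		return TOY_LAZYFREE_REDIRTIED;

	/* Expected: one reference per mapping plus one assumed to be held by
	 * the caller. Anything more is a GUP or speculative reference. */
	if (ref_count != 1 + map_count)
		return TOY_LAZYFREE_BUSY;

	return TOY_LAZYFREE_DISCARD;
}
```

The order of the checks follows the listing. The dirty test comes first, so a dirtied folio that also has an extra reference is reported as redirtied and marked swap-backed, rather than merely skipped.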
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Thread overview: 5+ messages
2026-06-25  4:28 [PATCH] mm/rmap: use huge_ptep_get() in try_to_unmap_one() Dev Jain
2026-06-25  4:42 ` Andrew Morton
2026-06-25  5:06   ` Dev Jain
2026-06-25  5:45 ` kernel test robot
2026-06-25  5:45 ` kernel test robot
