* [PATCH v6 08/30] mm: alloc_anon_folio: pass raw fault address to vma_alloc_folio
From: Michael S. Tsirkin @ 2026-05-11 8:53 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Liam R. Howlett
Pass vmf->address directly instead of ALIGN_DOWN(vmf->address, ...).
vma_alloc_folio_noprof() now aligns the address internally for NUMA
interleave, and post_alloc_hook() will use the raw address for
cache-friendly zeroing via folio_zero_user().
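For reference, this is the rounding the callsite used to do; a
standalone sketch, with ALIGN_DOWN reimplemented for the demo and
order 4 picked arbitrarily:

	#include <stdio.h>

	#define PAGE_SIZE 4096UL
	/* power-of-two variant; same result as the kernel macro */
	#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

	int main(void)
	{
		unsigned long addr = 0x7f1234567890UL;
		int order = 4;	/* a 64 KiB mTHP block */

		/* prints 0x7f1234560000 */
		printf("%#lx\n", ALIGN_DOWN(addr, PAGE_SIZE << order));
		return 0;
	}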
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
mm/memory.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index ea6568571131..0824441a6ba1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5252,8 +5252,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
/* Try allocating the highest of the remaining orders. */
gfp = vma_thp_gfp_mask(vma);
while (orders) {
- addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
- folio = vma_alloc_folio(gfp, order, vma, addr);
+ folio = vma_alloc_folio(gfp, order, vma, vmf->address);
if (folio) {
if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
--
MST
* [PATCH v6 09/30] mm: alloc_swap_folio: pass raw fault address to vma_alloc_folio
From: Michael S. Tsirkin @ 2026-05-11 8:53 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Liam R. Howlett
Same change as the previous patch but for alloc_swap_folio:
pass vmf->address directly instead of ALIGN_DOWN(vmf->address, ...).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
mm/memory.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 0824441a6ba1..74523bc00d8a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4734,8 +4734,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
/* Try allocating the highest of the remaining orders. */
gfp = vma_thp_gfp_mask(vma);
while (orders) {
- addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
- folio = vma_alloc_folio(gfp, order, vma, addr);
+ folio = vma_alloc_folio(gfp, order, vma, vmf->address);
if (folio) {
if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
gfp, entry))
--
MST
* [PATCH v6 10/30] mm: use __GFP_ZERO in alloc_anon_folio
From: Michael S. Tsirkin @ 2026-05-11 8:53 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Liam R. Howlett
Convert alloc_anon_folio() to pass __GFP_ZERO instead of zeroing
at the callsite. post_alloc_hook uses the fault address passed
through vma_alloc_folio for cache-friendly zeroing.
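The cache-friendliness referred to above, as a simplified standalone
sketch (the real folio_zero_user() differs in detail): zero every page
of the folio but leave the page containing the faulting address for
last, so its cachelines are still hot when the fault returns:

	#include <string.h>

	#define PAGE_SIZE 4096UL

	static void zero_towards_fault(char *base, unsigned long nr_pages,
				       unsigned long fault_idx)
	{
		unsigned long i;

		for (i = 0; i < nr_pages; i++)
			if (i != fault_idx)
				memset(base + i * PAGE_SIZE, 0, PAGE_SIZE);
		/* the faulting page last: freshly zeroed and still hot */
		memset(base + fault_idx * PAGE_SIZE, 0, PAGE_SIZE);
	}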
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
mm/memory.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 74523bc00d8a..f3f1bc66366d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5249,7 +5249,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
goto fallback;
/* Try allocating the highest of the remaining orders. */
- gfp = vma_thp_gfp_mask(vma);
+ gfp = vma_thp_gfp_mask(vma) | __GFP_ZERO;
while (orders) {
folio = vma_alloc_folio(gfp, order, vma, vmf->address);
if (folio) {
@@ -5259,15 +5259,6 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
goto next;
}
folio_throttle_swaprate(folio, gfp);
- /*
- * When a folio is not zeroed during allocation
- * (__GFP_ZERO not used) or user folios require special
- * handling, folio_zero_user() is used to make sure
- * that the page corresponding to the faulting address
- * will be hot in the cache after zeroing.
- */
- if (user_alloc_needs_zeroing())
- folio_zero_user(folio, vmf->address);
return folio;
}
next:
--
MST
* [PATCH v6 11/30] mm: vma_alloc_anon_folio_pmd: pass raw fault address to vma_alloc_folio
From: Michael S. Tsirkin @ 2026-05-11 8:54 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Liam R. Howlett
Now that vma_alloc_folio aligns the address internally, drop the
redundant HPAGE_PMD_MASK alignment at the callsite.
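Assuming the usual 2 MiB PMD size, the two roundings are identical,
e.g.:

	addr                             = 0x7f1234567000
	addr & HPAGE_PMD_MASK            = 0x7f1234400000
	ALIGN_DOWN(addr, HPAGE_PMD_SIZE) = 0x7f1234400000

so only where the masking happens changes, not the result.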
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 970e077019b7..d689e6491ddb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1337,7 +1337,7 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
const int order = HPAGE_PMD_ORDER;
struct folio *folio;
- folio = vma_alloc_folio(gfp, order, vma, addr & HPAGE_PMD_MASK);
+ folio = vma_alloc_folio(gfp, order, vma, addr);
if (unlikely(!folio)) {
count_vm_event(THP_FAULT_FALLBACK);
--
MST
* [PATCH v6 12/30] mm: use __GFP_ZERO in vma_alloc_anon_folio_pmd
From: Michael S. Tsirkin @ 2026-05-11 8:54 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Liam R. Howlett
Convert vma_alloc_anon_folio_pmd() to pass __GFP_ZERO instead of
zeroing at the callsite. post_alloc_hook uses the fault address
passed through vma_alloc_folio for cache-friendly zeroing.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
mm/huge_memory.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d689e6491ddb..9845c920c29c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1333,7 +1333,7 @@ EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
unsigned long addr)
{
- gfp_t gfp = vma_thp_gfp_mask(vma);
+ gfp_t gfp = vma_thp_gfp_mask(vma) | __GFP_ZERO;
const int order = HPAGE_PMD_ORDER;
struct folio *folio;
@@ -1356,14 +1356,6 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
}
folio_throttle_swaprate(folio, gfp);
- /*
- * When a folio is not zeroed during allocation (__GFP_ZERO not used)
- * or user folios require special handling, folio_zero_user() is used to
- * make sure that the page corresponding to the faulting address will be
- * hot in the cache after zeroing.
- */
- if (user_alloc_needs_zeroing())
- folio_zero_user(folio, addr);
/*
* The memory barrier inside __folio_mark_uptodate makes sure that
* folio_zero_user writes become visible before the set_pmd_at()
--
MST
* [PATCH v6 13/30] mm: hugetlb: use __GFP_ZERO and skip zeroing for zeroed pages
From: Michael S. Tsirkin @ 2026-05-11 8:54 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli
Convert the hugetlb fault and fallocate paths to use __GFP_ZERO.
For pages allocated from the buddy allocator, post_alloc_hook()
handles zeroing.
Hugetlb surplus pages need special handling because they can be
pre-allocated into the pool during mmap (by hugetlb_acct_memory)
before any page fault. Pool pages are kept around and may need
zeroing long after buddy allocation, so a buddy-level zeroed
hint (consumed at allocation time) cannot track their state.
Add a bool *zeroed output parameter to alloc_hugetlb_folio()
so callers know whether the page still needs zeroing.
Buddy-allocated pages are always zeroed by post_alloc_hook().
Pool pages use a new HPG_zeroed flag to track whether the page
is known-zero (freshly buddy-allocated, never mapped to
userspace). The flag is set in alloc_surplus_hugetlb_folio()
after buddy allocation and cleared in free_huge_folio() when a
user-mapped page returns to the pool.
Callers that do not need zeroing (fork-time copy, CoW,
userfaultfd copy) pass NULL for zeroed and 0 for gfp.
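A minimal userspace model of the flag protocol described above (names
follow this series; the model is illustrative only):

	#include <stdbool.h>

	struct pool_page {
		bool zeroed;	/* models HPG_zeroed */
	};

	/* alloc_hugetlb_folio(): report and consume the hint */
	static struct pool_page *pool_alloc(struct pool_page *p, bool *zeroed)
	{
		if (zeroed) {
			*zeroed = p->zeroed;	/* caller may skip zeroing */
			p->zeroed = false;	/* hint consumed at allocation */
		}
		return p;
	}

	/* free_huge_folio(): page may have been user-mapped */
	static void pool_free(struct pool_page *p)
	{
		p->zeroed = false;	/* no longer known-zero */
	}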
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: cursor-agent:GPT-5.4-xhigh
---
fs/hugetlbfs/inode.c | 10 ++++++--
include/linux/hugetlb.h | 8 +++++--
mm/hugetlb.c | 52 ++++++++++++++++++++++++++++++-----------
3 files changed, 53 insertions(+), 17 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8b05bec08e04..24e42cb10ade 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -810,14 +810,20 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
* folios in these areas, we need to consume the reserves
* to keep reservation accounting consistent.
*/
- folio = alloc_hugetlb_folio(&pseudo_vma, addr, false);
+ {
+ bool zeroed;
+
+ folio = alloc_hugetlb_folio(&pseudo_vma, addr, false,
+ __GFP_ZERO, &zeroed);
if (IS_ERR(folio)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
error = PTR_ERR(folio);
goto out;
}
- folio_zero_user(folio, addr);
+ if (!zeroed)
+ folio_zero_user(folio, addr);
__folio_mark_uptodate(folio);
+ }
error = hugetlb_add_to_page_cache(folio, mapping, index);
if (unlikely(error)) {
restore_reserve_on_error(h, &pseudo_vma, addr, folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 93418625d3c5..950e1702fbd8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -599,6 +599,7 @@ enum hugetlb_page_flags {
HPG_vmemmap_optimized,
HPG_raw_hwp_unreliable,
HPG_cma,
+ HPG_zeroed,
__NR_HPAGEFLAGS,
};
@@ -659,6 +660,7 @@ HPAGEFLAG(Freed, freed)
HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
HPAGEFLAG(Cma, cma)
+HPAGEFLAG(Zeroed, zeroed)
#ifdef CONFIG_HUGETLB_PAGE
@@ -706,7 +708,8 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
- unsigned long addr, bool cow_from_owner);
+ unsigned long addr, bool cow_from_owner,
+ gfp_t gfp, bool *zeroed);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask,
bool allow_alloc_fallback);
@@ -1131,7 +1134,8 @@ static inline void wait_for_freed_hugetlb_folios(void)
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
- bool cow_from_owner)
+ bool cow_from_owner,
+ gfp_t gfp, bool *zeroed)
{
return NULL;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a999f3ead852..8710366d14b7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1708,6 +1708,9 @@ void free_huge_folio(struct folio *folio)
int nid = folio_nid(folio);
struct hugepage_subpool *spool = hugetlb_folio_subpool(folio);
bool restore_reserve;
unsigned long flags;
+
+ /* Page was mapped to userspace; no longer known-zero */
+ folio_clear_hugetlb_zeroed(folio);
VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
@@ -2110,6 +2113,10 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
if (!folio)
return NULL;
+ /* Mark as known-zero only if __GFP_ZERO was requested */
+ if (gfp_mask & __GFP_ZERO)
+ folio_set_hugetlb_zeroed(folio);
+
spin_lock_irq(&hugetlb_lock);
/*
* nr_huge_pages needs to be adjusted within the same lock cycle
@@ -2173,11 +2180,11 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mas
*/
static
struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
- struct vm_area_struct *vma, unsigned long addr)
+ struct vm_area_struct *vma, unsigned long addr, gfp_t gfp)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
- gfp_t gfp_mask = htlb_alloc_mask(h);
+ gfp_t gfp_mask = htlb_alloc_mask(h) | gfp;
int nid;
nodemask_t *nodemask;
@@ -2874,7 +2881,8 @@ typedef enum {
* When it's set, the allocation will bypass all vma level reservations.
*/
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
- unsigned long addr, bool cow_from_owner)
+ unsigned long addr, bool cow_from_owner,
+ gfp_t gfp, bool *zeroed)
{
struct hugepage_subpool *spool = subpool_vma(vma);
struct hstate *h = hstate_vma(vma);
@@ -2883,7 +2891,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
map_chg_state map_chg;
int ret, idx;
struct hugetlb_cgroup *h_cg = NULL;
- gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+ bool from_pool;
+
+ gfp |= htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
idx = hstate_index(h);
@@ -2951,13 +2961,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
- folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
+ folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr, gfp);
if (!folio)
goto out_uncharge_cgroup;
spin_lock_irq(&hugetlb_lock);
list_add(&folio->lru, &h->hugepage_activelist);
folio_ref_unfreeze(folio, 1);
- /* Fall through */
+ from_pool = false;
+ } else {
+ from_pool = true;
}
/*
@@ -2980,6 +2992,14 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
spin_unlock_irq(&hugetlb_lock);
+ if (zeroed) {
+ if (from_pool)
+ *zeroed = folio_test_hugetlb_zeroed(folio);
+ else
+ *zeroed = true; /* buddy-allocated, zeroed by post_alloc_hook */
+ folio_clear_hugetlb_zeroed(folio);
+ }
+
hugetlb_set_folio_subpool(folio, spool);
if (map_chg != MAP_CHG_ENFORCED) {
@@ -4988,7 +5008,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
/* Do not use reserve as it's private owned */
- new_folio = alloc_hugetlb_folio(dst_vma, addr, false);
+ new_folio = alloc_hugetlb_folio(dst_vma, addr, false, 0, NULL);
if (IS_ERR(new_folio)) {
folio_put(pte_folio);
ret = PTR_ERR(new_folio);
@@ -5517,7 +5537,7 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
* be acquired again before returning to the caller, as expected.
*/
spin_unlock(vmf->ptl);
- new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
+ new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner, 0, NULL);
if (IS_ERR(new_folio)) {
/*
@@ -5711,7 +5731,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
struct vm_fault *vmf)
{
u32 hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
- bool new_folio, new_anon_folio = false;
+ bool new_folio, new_anon_folio = false, zeroed;
struct vm_area_struct *vma = vmf->vma;
struct mm_struct *mm = vma->vm_mm;
struct hstate *h = hstate_vma(vma);
@@ -5777,7 +5797,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
goto out;
}
- folio = alloc_hugetlb_folio(vma, vmf->address, false);
+ folio = alloc_hugetlb_folio(vma, vmf->address, false,
+ __GFP_ZERO, &zeroed);
if (IS_ERR(folio)) {
/*
* Returning error will result in faulting task being
@@ -5797,7 +5818,12 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
ret = 0;
goto out;
}
- folio_zero_user(folio, vmf->real_address);
+ /*
+ * Buddy-allocated pages are zeroed in post_alloc_hook().
+ * Pool pages bypass the allocator, zero them here.
+ */
+ if (!zeroed)
+ folio_zero_user(folio, vmf->real_address);
__folio_mark_uptodate(folio);
new_folio = true;
@@ -6236,7 +6262,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out;
}
- folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
+ folio = alloc_hugetlb_folio(dst_vma, dst_addr, false, 0, NULL);
if (IS_ERR(folio)) {
pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
if (actual_pte) {
@@ -6283,7 +6309,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out;
}
- folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
+ folio = alloc_hugetlb_folio(dst_vma, dst_addr, false, 0, NULL);
if (IS_ERR(folio)) {
folio_put(*foliop);
ret = -ENOMEM;
--
MST
* [PATCH v6 14/30] mm: memfd: skip zeroing for zeroed hugetlb pool pages
From: Michael S. Tsirkin @ 2026-05-11 8:54 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli
gather_surplus_pages() pre-allocates hugetlb pages into the pool
during mmap. Pass __GFP_ZERO so these pages are zeroed by the
buddy allocator, and HPG_zeroed is set by alloc_surplus_hugetlb_folio.
Add bool *zeroed output to alloc_hugetlb_folio_reserve() so
callers can check whether the pool page is known-zero. memfd's
memfd_alloc_folio() uses this to skip the explicit folio_zero_user()
when the page is already zero.
This avoids redundant zeroing for memfd hugetlb pages that were
pre-allocated into the pool and never mapped to userspace.
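Condensed from the memfd diff below, the caller-side pattern this
enables:

	bool zeroed;

	folio = alloc_hugetlb_folio_reserve(h, numa_node_id(), NULL,
					    gfp_mask, &zeroed);
	if (folio && !zeroed)
		folio_zero_user(folio, 0);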
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
include/linux/hugetlb.h | 6 ++++--
mm/hugetlb.c | 11 +++++++++--
mm/memfd.c | 14 ++++++++------
3 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 950e1702fbd8..c4e66a371fce 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -714,7 +714,8 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask,
bool allow_alloc_fallback);
struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask);
+ nodemask_t *nmask, gfp_t gfp_mask,
+ bool *zeroed);
int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
pgoff_t idx);
@@ -1142,7 +1143,8 @@ static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
static inline struct folio *
alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask,
+ bool *zeroed)
{
return NULL;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8710366d14b7..03ad5c1e0655 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2205,7 +2205,7 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
}
struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, bool *zeroed)
{
struct folio *folio;
@@ -2221,6 +2221,12 @@ struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
h->resv_huge_pages--;
spin_unlock_irq(&hugetlb_lock);
+
+ if (zeroed && folio) {
+ *zeroed = folio_test_hugetlb_zeroed(folio);
+ folio_clear_hugetlb_zeroed(folio);
+ }
+
return folio;
}
@@ -2305,7 +2311,8 @@ static int gather_surplus_pages(struct hstate *h, long delta)
* It is okay to use NUMA_NO_NODE because we use numa_mem_id()
* down the road to pick the current node if that is the case.
*/
- folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+ folio = alloc_surplus_hugetlb_folio(h,
+ htlb_alloc_mask(h) | __GFP_ZERO,
NUMA_NO_NODE, &alloc_nodemask,
USER_ADDR_NONE);
if (!folio) {
diff --git a/mm/memfd.c b/mm/memfd.c
index fb425f4e315f..5518f7d2d91f 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -69,6 +69,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
#ifdef CONFIG_HUGETLB_PAGE
struct folio *folio;
gfp_t gfp_mask;
+ bool zeroed;
if (is_file_hugepages(memfd)) {
/*
@@ -93,17 +94,18 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
folio = alloc_hugetlb_folio_reserve(h,
numa_node_id(),
NULL,
- gfp_mask);
+ gfp_mask,
+ &zeroed);
if (folio) {
u32 hash;
/*
- * Zero the folio to prevent information leaks to userspace.
- * Use folio_zero_user() which is optimized for huge/gigantic
- * pages. Pass 0 as addr_hint since this is not a faulting path
- * and we don't have a user virtual address yet.
+ * Zero the folio to prevent information leaks to
+ * userspace. Skip if the pool page is known-zero
+ * (HPG_zeroed set during pool pre-allocation).
*/
- folio_zero_user(folio, 0);
+ if (!zeroed)
+ folio_zero_user(folio, 0);
/*
* Mark the folio uptodate before adding to page cache,
--
MST
* [PATCH v6 15/30] mm: remove arch vma_alloc_zeroed_movable_folio overrides
From: Michael S. Tsirkin @ 2026-05-11 8:54 UTC
To: linux-kernel
Cc: David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Magnus Lindholm,
Greg Ungerer, Geert Uytterhoeven, Richard Henderson, Matt Turner,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-alpha, linux-m68k, linux-s390
Now that the generic vma_alloc_zeroed_movable_folio() uses
__GFP_ZERO, the arch-specific macros on alpha, m68k, s390, and
x86 that did the same thing are redundant. Remove them.
arm64 is not affected: it has a real function override that
handles MTE tag zeroing, not just __GFP_ZERO.
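All four removed macros are identical, so the generic fallback is
presumably the same one-liner (shown here for reference, not part of
this diff):

	#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
		vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)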
Suggested-by: David Hildenbrand <david@kernel.org>
Acked-by: Magnus Lindholm <linmag7@gmail.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
arch/alpha/include/asm/page.h | 3 ---
arch/m68k/include/asm/page_no.h | 3 ---
arch/s390/include/asm/page.h | 3 ---
arch/x86/include/asm/page.h | 3 ---
4 files changed, 12 deletions(-)
diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h
index 59d01f9b77f6..4327029cd660 100644
--- a/arch/alpha/include/asm/page.h
+++ b/arch/alpha/include/asm/page.h
@@ -12,9 +12,6 @@
extern void clear_page(void *page);
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
extern void copy_page(void * _to, void * _from);
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index d2532bc407ef..f511b763a235 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -12,9 +12,6 @@ extern unsigned long memory_end;
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
#define __pa(vaddr) ((unsigned long)(vaddr))
#define __va(paddr) ((void *)((unsigned long)(paddr)))
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 56da819a79e6..e995d2a413f9 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -67,9 +67,6 @@ static inline void copy_page(void *to, void *from)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
#ifdef CONFIG_STRICT_MM_TYPECHECKS
#define STRICT_MM_TYPECHECKS
#endif
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 416dc88e35c1..92fa975b46f3 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -28,9 +28,6 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
copy_page(to, from);
}
-#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
- vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr)
-
#ifndef __pa
#define __pa(x) __phys_addr((unsigned long)(x))
#endif
--
MST