Date: Tue, 21 Apr 2026 18:01:36 -0400
From: "Michael S. Tsirkin"
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka, Brendan Jackman,
	Michal Hocko, Suren Baghdasaryan, Jason Wang, Andrea Arcangeli,
	Gregory Price, linux-mm@kvack.org, virtualization@lists.linux.dev,
	Muchun Song, Oscar Salvador
Subject: [PATCH RFC v3 08/19] mm: hugetlb: use __GFP_ZERO and skip zeroing
 for zeroed pages
Message-ID: <6897aec7727120849077661a33248fa2d58b4fe5.1776808210.git.mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Convert the hugetlb fault and fallocate paths to use __GFP_ZERO. For
pages allocated from the buddy allocator, post_alloc_hook() handles the
zeroing, and skips it when the host has already zeroed the page.

Hugetlb surplus pages need special handling because they can be
pre-allocated into the pool during mmap (by hugetlb_acct_memory())
before any page fault. Pool pages are kept around and may need zeroing
long after buddy allocation, so PG_zeroed (which is consumed at
allocation time) cannot track their state.

Add a bool *zeroed output parameter to alloc_hugetlb_folio() so callers
know whether the page still needs zeroing. Buddy-allocated pages are
always zeroed (by post_alloc_hook()). Pool pages use a new HPG_zeroed
flag to track whether the page is known-zero (freshly buddy-allocated,
never mapped to userspace). The flag is set in
alloc_surplus_hugetlb_folio() after buddy allocation and cleared in
free_huge_folio() when a user-mapped page returns to the pool. Callers
that do not need zeroing (CoW, migration) pass NULL for zeroed and 0
for gfp.
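For reference, the caller-side contract this introduces looks roughly
like the following sketch (not part of the diff below; the surrounding
fault-path context, vma/addr, and error handling are simplified):

	bool zeroed;
	struct folio *folio;

	/* Fault path: request a zeroed page, skip redundant clearing. */
	folio = alloc_hugetlb_folio(vma, addr, false, __GFP_ZERO, &zeroed);
	if (IS_ERR(folio))
		return PTR_ERR(folio);
	if (!zeroed)
		/* Pool page of unknown content: clear it here. */
		folio_zero_user(folio, addr);

	/* CoW/migration path: content is overwritten, skip zeroing. */
	folio = alloc_hugetlb_folio(vma, addr, false, 0, NULL);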
Signed-off-by: Michael S. Tsirkin
Assisted-by: Claude:claude-opus-4-6
---
 fs/hugetlbfs/inode.c    | 10 ++++++--
 include/linux/hugetlb.h |  8 ++++--
 mm/hugetlb.c            | 54 ++++++++++++++++++++++++++++++++---------
 3 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3f70c47981de..d5d570d6eff4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -822,14 +822,20 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * folios in these areas, we need to consume the reserves
 		 * to keep reservation accounting consistent.
 		 */
-		folio = alloc_hugetlb_folio(&pseudo_vma, addr, false);
+		{
+			bool zeroed;
+
+			folio = alloc_hugetlb_folio(&pseudo_vma, addr, false,
+						    __GFP_ZERO, &zeroed);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			error = PTR_ERR(folio);
 			goto out;
 		}
-		folio_zero_user(folio, addr);
+		if (!zeroed)
+			folio_zero_user(folio, addr);
 		__folio_mark_uptodate(folio);
+		}
 		error = hugetlb_add_to_page_cache(folio, mapping, index);
 		if (unlikely(error)) {
 			restore_reserve_on_error(h, &pseudo_vma, addr, folio);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 65910437be1c..094714c607f9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -598,6 +598,7 @@ enum hugetlb_page_flags {
 	HPG_vmemmap_optimized,
 	HPG_raw_hwp_unreliable,
 	HPG_cma,
+	HPG_zeroed,
 	__NR_HPAGEFLAGS,
 };
@@ -658,6 +659,7 @@ HPAGEFLAG(Freed, freed)
 HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
 HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
 HPAGEFLAG(Cma, cma)
+HPAGEFLAG(Zeroed, zeroed)
 
 #ifdef CONFIG_HUGETLB_PAGE
@@ -705,7 +707,8 @@ int isolate_or_dissolve_huge_folio(struct folio *folio,
 			struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
 void wait_for_freed_hugetlb_folios(void);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				unsigned long addr, bool cow_from_owner);
+				unsigned long addr, bool cow_from_owner,
+				gfp_t gfp, bool *zeroed);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask,
 				bool allow_alloc_fallback);
@@ -1117,7 +1120,8 @@ static inline void wait_for_freed_hugetlb_folios(void)
 
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 						unsigned long addr,
-						bool cow_from_owner)
+						bool cow_from_owner,
+						gfp_t gfp, bool *zeroed)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index de8361b503d2..4f0ed01f5b13 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1744,6 +1744,9 @@ void free_huge_folio(struct folio *folio)
 	int nid = folio_nid(folio);
 	struct hugepage_subpool *spool = hugetlb_folio_subpool(folio);
 	bool restore_reserve;
+
+	/* Page was mapped to userspace; no longer known-zero */
+	folio_clear_hugetlb_zeroed(folio);
 	unsigned long flags;
 
 	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
@@ -2146,6 +2149,10 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
 	if (!folio)
 		return NULL;
 
+	/* Mark as known-zero only if __GFP_ZERO was requested */
+	if (gfp_mask & __GFP_ZERO)
+		folio_set_hugetlb_zeroed(folio);
+
 	spin_lock_irq(&hugetlb_lock);
 	/*
 	 * nr_huge_pages needs to be adjusted within the same lock cycle
@@ -2209,11 +2216,11 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mas
  */
 static struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
-		struct vm_area_struct *vma, unsigned long addr)
+		struct vm_area_struct *vma, unsigned long addr, gfp_t gfp)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
-	gfp_t gfp_mask = htlb_alloc_mask(h);
+	gfp_t gfp_mask = htlb_alloc_mask(h) | gfp;
 	int nid;
 	nodemask_t *nodemask;
@@ -2910,7 +2917,8 @@ typedef enum {
  * When it's set, the allocation will bypass all vma level reservations.
  */
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-		unsigned long addr, bool cow_from_owner)
+		unsigned long addr, bool cow_from_owner,
+		gfp_t gfp, bool *zeroed)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
@@ -2919,7 +2927,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
-	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+	bool from_pool;
+
+	gfp |= htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 
 	idx = hstate_index(h);
@@ -2987,13 +2997,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
-		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
+		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr, gfp);
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
 		list_add(&folio->lru, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
-		/* Fall through */
+		from_pool = false;
+	} else {
+		from_pool = true;
 	}
 
 	/*
@@ -3016,6 +3028,14 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	spin_unlock_irq(&hugetlb_lock);
 
+	if (zeroed) {
+		if (from_pool)
+			*zeroed = folio_test_hugetlb_zeroed(folio);
+		else
+			*zeroed = true; /* buddy-allocated, zeroed by post_alloc_hook */
+		folio_clear_hugetlb_zeroed(folio);
+	}
+
 	hugetlb_set_folio_subpool(folio, spool);
 
 	if (map_chg != MAP_CHG_ENFORCED) {
@@ -5004,7 +5024,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, false);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, false, 0, NULL);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);
@@ -5533,7 +5553,7 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(vmf->ptl);
-	new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner);
+	new_folio = alloc_hugetlb_folio(vma, vmf->address, cow_from_owner, 0, NULL);
 
 	if (IS_ERR(new_folio)) {
 		/*
@@ -5793,7 +5813,11 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(vma, vmf->address, false);
+		{
+			bool zeroed;
+
+			folio = alloc_hugetlb_folio(vma, vmf->address, false,
+						    __GFP_ZERO, &zeroed);
 		if (IS_ERR(folio)) {
 			/*
 			 * Returning error will result in faulting task being
@@ -5813,9 +5837,15 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 			ret = 0;
 			goto out;
 		}
-		folio_zero_user(folio, vmf->real_address);
+		/*
+		 * Buddy-allocated pages are zeroed in post_alloc_hook().
+		 * Pool pages bypass the allocator, zero them here.
+		 */
+		if (!zeroed)
+			folio_zero_user(folio, vmf->real_address);
 		__folio_mark_uptodate(folio);
 		new_folio = true;
+		}
 
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err = hugetlb_add_to_page_cache(folio, mapping,
@@ -6252,7 +6282,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false, 0, NULL);
 		if (IS_ERR(folio)) {
 			pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
 			if (actual_pte) {
@@ -6299,7 +6329,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, false, 0, NULL);
 		if (IS_ERR(folio)) {
 			folio_put(*foliop);
 			ret = -ENOMEM;
-- 
MST