public inbox for linux-mm@kvack.org
From: Lance Yang <lance.yang@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>, hev <r@hev.cc>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Barry Song <baohua@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	Dev Jain <dev.jain@arm.com>, Jan Kara <jack@suse.cz>,
	Kees Cook <kees@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Matthew Wilcox <willy@infradead.org>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Zi Yan <ziy@nvidia.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v4 1/2] huge_mm: add stubs for THP-disabled configs
Date: Fri, 13 Mar 2026 17:47:51 +0800
Message-ID: <1a886b5b-319c-4f3e-8db1-6af6696f4d84@linux.dev>
In-Reply-To: <60ba4311-01f8-4ff3-a2df-e1b3fb6db699@kernel.org>



On 2026/3/13 00:29, David Hildenbrand (Arm) wrote:
> On 3/12/26 17:12, hev wrote:
>> Hi David,
>>
>> On Thu, Mar 12, 2026 at 11:57 PM David Hildenbrand (Arm)
>> <david@kernel.org> wrote:
>>>
>>> On 3/12/26 16:53, David Hildenbrand (Arm) wrote:
>>>>
>>>> There are other ways to enable PMD THP. So I don't quite think this is
>>>> the right tool for the job.
>>>
>>> Ah, you care about file THPs ... gah.
>>>
>>> Why can't we simply do the alignment without considering the current
>>> setting?
>>
>> The main motivation of raising the alignment here is to increase the
>> chance of getting PMD-sized THPs for executable mappings.
>>
>> If THP is not in "always" mode, the kernel will not automatically
>> collapse file-backed mappings into THPs, so the increased alignment
>> would not actually improve THP usage.
>>
>> In that case we would only be introducing additional padding in the
>> virtual address layout, which slightly reduces ASLR entropy without
>> providing a practical benefit.
> 
> Well, that parameter can get toggled at runtime later? Also, I think
> that readahead code could end up allocating a PMD THP (I might be
> wrong about that, the code is confusing).

Right. In do_sync_mmap_readahead(), if the VMA has VM_HUGEPAGE,
force_thp_readahead becomes true and ra->order is set to
HPAGE_PMD_ORDER, IIUC.

	/* Use the readahead code, even if readahead is disabled */
	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
	    (vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER)
		force_thp_readahead = true;

That order is then passed down to page_cache_ra_order() and finally to
filemap_alloc_folio().

	if (force_thp_readahead) {
[...]
		ra->async_size = HPAGE_PMD_NR;
		ra->order = HPAGE_PMD_ORDER;
		page_cache_ra_order(&ractl, ra);
		return fpin;
	}


For plain VM_EXEC, the code starts from exec_folio_order(), not
HPAGE_PMD_ORDER.

	if (vm_flags & VM_EXEC) {
[...]
		ra->order = exec_folio_order();
[...]
		ra->async_size = 0;
	}
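
To make the precedence between the two paths concrete, here is a minimal
userspace sketch, not kernel code. The MODEL_* flag bits and model_ra_order()
are hypothetical names, the orders assume 4K base pages with a 2M PMD and
64K exec folios, and the CONFIG_TRANSPARENT_HUGEPAGE and MAX_PAGECACHE_ORDER
checks as well as the elided conditions are deliberately left out:

```c
/* Model-only flag bits; hypothetical, not the kernel's VM_* values. */
#define MODEL_VM_EXEC      (1UL << 0)
#define MODEL_VM_HUGEPAGE  (1UL << 1)

/* Assumptions: 4K base pages, 2M PMD (order 9), 64K exec folios (order 4). */
#define MODEL_HPAGE_PMD_ORDER  9U
#define MODEL_EXEC_FOLIO_ORDER 4U

/*
 * Pick the readahead order the way the quoted snippets suggest:
 * the VM_HUGEPAGE path is handled first and returns early, so it wins
 * over VM_EXEC; anything else keeps the order it already had.
 */
static unsigned int model_ra_order(unsigned long vm_flags,
				   unsigned int cur_order)
{
	if (vm_flags & MODEL_VM_HUGEPAGE)
		return MODEL_HPAGE_PMD_ORDER;
	if (vm_flags & MODEL_VM_EXEC)
		return MODEL_EXEC_FOLIO_ORDER;
	return cur_order;
}
```

Under these assumptions, an executable mapping without VM_HUGEPAGE asks for
order 4, well below the order 9 needed for a PMD-sized folio.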

The default exec_folio_order() is small, and arm64 only overrides it
to 64K.

/*
 * Request exec memory is read into pagecache in at least 64K folios. This size
 * can be contpte-mapped when 4K base pages are in use (16 pages into 1 iTLB
 * entry), and HPA can coalesce it (4 pages into 1 TLB entry) when 16K base
 * pages are in use.
 */

#define exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
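
The arithmetic behind that macro can be checked in userspace. This is only a
sketch: ilog2_u() and exec_folio_order_for() are hypothetical stand-ins for
the kernel's ilog2() and the macro above, with PAGE_SHIFT passed in as a
parameter:

```c
#include <assert.h>

/* Userspace stand-in for the kernel's ilog2(); valid for n >= 1. */
static unsigned int ilog2_u(unsigned long n)
{
	unsigned int log = 0;

	assert(n != 0);
	while (n >>= 1)
		log++;
	return log;
}

/* Hypothetical helper mirroring ilog2(SZ_64K >> PAGE_SHIFT). */
static unsigned int exec_folio_order_for(unsigned int page_shift)
{
	const unsigned long sz_64k = 64UL * 1024;

	assert(page_shift <= 16); /* base page must not exceed 64K */
	return ilog2_u(sz_64k >> page_shift);
}
```

With 4K pages (shift 12) this gives order 4, i.e. 16 pages; with 16K pages
(shift 14) order 2, i.e. 4 pages; with 64K pages (shift 16) order 0. That
matches the page counts in the comment, and is far from a PMD-sized order.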

> 
> Let's take a look at __get_unmapped_area(), where we don't care about
> ASLR entropy for anonymous memory:
> 
> } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && !file
> 	   && !addr /* no hint */
> 	   && IS_ALIGNED(len, PMD_SIZE)) {

Yeah. For anonymous memory, the kernel is willing to do THP-friendly
alignment, but it is constrained, of course :)
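
Spelled out as a standalone predicate, the quoted condition looks roughly
like this. It is a userspace sketch only: anon_thp_align_eligible() and
PMD_SIZE_ASSUMED are hypothetical names, the 2M PMD size is an assumption
(e.g. 4K base pages), and CONFIG_TRANSPARENT_HUGEPAGE is taken as enabled:

```c
#include <stdbool.h>

/* Assumption: 2M PMD size, as with 4K base pages on common configs. */
#define PMD_SIZE_ASSUMED (2UL * 1024 * 1024)

/*
 * Mirrors the three runtime checks in the quoted branch:
 * no backing file, no address hint, and a length that is a
 * multiple of the PMD size.
 */
static bool anon_thp_align_eligible(bool has_file, unsigned long addr_hint,
				    unsigned long len)
{
	return !has_file &&
	       addr_hint == 0 &&
	       (len & (PMD_SIZE_ASSUMED - 1)) == 0;
}
```

So a 4M anonymous mmap() without a hint qualifies, while the same request
with a hint, with a file, or with a length of 2M + 4K does not.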

> Interestingly we had:
> 
> commit 34d7cf637c437d5c2a8a6ef23ea45193bad8a91c
> Author: Kefeng Wang <wangkefeng.wang@huawei.com>
> Date:   Fri Dec 6 15:03:45 2024 +0800
> 
>      mm: don't try THP alignment for FS without get_unmapped_area
>      
>      Commit ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") changes
>      thp_get_unmapped_area() to thp_get_unmapped_area_vmflags() in
>      __get_unmapped_area(), which doesn't initialize local get_area for
>      anonymous mappings.  This leads to us always trying THP alignment even for
>      file_operations which have a NULL ->get_unmapped_area() callback.
>      
>      Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>      boundaries") we only want to enable THP alignment for anonymous mappings,
>      so add a !file check to avoid attempting THP alignment for file mappings.
>      
>      Found issue by code inspection.  THP alignment is used for easy or more
>      pmd mappings, from vma side.  This may cause unnecessary VMA fragmentation
>      and potentially worse performance on filesystems that do not actually
>      support THPs and thus cannot benefit from the alignment.

Looks like this commit does not *ban* file-backed THP-friendly alignment
completely. It only prevents file mappings from getting it accidentally
via the generic fallback path.

Note that some filesystems still explicitly opt in with their own

.get_unmapped_area = thp_get_unmapped_area

for example ext4, xfs, and btrfs.

So explicit filesystem opt-in is still allowed :)
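
A simplified userspace model of that dispatch, as I read it, is below. All
names (file_ops_model, pick_area_path(), the enum, the stub callback) are
hypothetical, and special cases such as shared anonymous mappings are
ignored:

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long (*get_area_fn)(unsigned long len);

/* Stand-in for the one file_operations member that matters here. */
struct file_ops_model {
	get_area_fn get_unmapped_area;
};

enum area_path {
	PATH_FS_CALLBACK,	/* filesystem's own ->get_unmapped_area() */
	PATH_ANON_THP,		/* THP-friendly alignment for anon memory */
	PATH_GENERIC,		/* plain placement, no THP alignment */
};

/* Placeholder callback; a real fs would point at thp_get_unmapped_area. */
static unsigned long stub_thp_get_unmapped_area(unsigned long len)
{
	(void)len;
	return 0;
}

/* fops == NULL models an anonymous mapping. */
static enum area_path pick_area_path(const struct file_ops_model *fops,
				     bool anon_thp_eligible)
{
	if (fops) {
		if (fops->get_unmapped_area)
			return PATH_FS_CALLBACK;
		/* With the !file check: no accidental THP alignment. */
		return PATH_GENERIC;
	}
	return anon_thp_eligible ? PATH_ANON_THP : PATH_GENERIC;
}
```

In this model a file whose callback is NULL always lands on the generic
path, while a filesystem that sets the callback keeps control of alignment.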

> I'm not sure about the "VMA fragmentation" argument, really. We only consider
> stuff that is already multiples of PMD_SIZE.
>
> Filesystem support for THPs is also not really something you would handle, and it's
> a problem that solves itself over time as more filesystems keep adding support for
> large folios.
> 
> So I think we should try limiting it to IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE),
> but not checking the runtime toggle.

Good point! ELF layout is decided once at exec time, while the runtime
THP mode can change later.

