Re: [PATCH v6 2/2] mm: use mapping_max_folio_order() for force_thp_readahead order

Linux-ARM-Kernel Archive on lore.kernel.org

From: Usama Arif <usama.arif@linux.dev>
To: Pedro Falcato <pfalcato@suse.de>, Jan Kara <jack@suse.cz>
Cc: willy@infradead.org, Andrew Morton <akpm@linux-foundation.org>,
	david@kernel.org, ryan.roberts@arm.com, linux-mm@kvack.org,
	r@hev.cc, Andrew Donnellan <andrew+kernel@donnellan.id.au>,
	apopple@nvidia.com, baohua@kernel.org,
	baolin.wang@linux.alibaba.com, brauner@kernel.org,
	catalin.marinas@arm.com, dev.jain@arm.com, kees@kernel.org,
	kevin.brodsky@arm.com, lance.yang@linux.dev,
	"Liam R. Howlett" <liam@infradead.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	ljs@kernel.org, mhocko@suse.com, npache@redhat.com,
	pasha.tatashin@soleen.com, rmclure@linux.ibm.com,
	rppt@kernel.org, surenb@google.com, vbabka@kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	wilts.infradead.org@pedro-suse.lan, ziy@nvidia.com,
	hannes@cmpxchg.org, kas@kernel.org, shakeel.butt@linux.dev,
	kernel-team@meta.com
Subject: Re: [PATCH v6 2/2] mm: use mapping_max_folio_order() for force_thp_readahead order
Date: Wed, 3 Jun 2026 11:10:45 +0100
Message-ID: <68ca2764-ca7c-482e-8e78-8c112ce01f99@linux.dev>
In-Reply-To: <ah8Qcxuugn5tTilK@pedro-suse>



On 02/06/2026 18:35, Pedro Falcato wrote:
> On Sat, May 30, 2026 at 05:16:29PM +0200, Jan Kara wrote:
>> On Fri 29-05-26 15:11:54, Usama Arif wrote:
>>> On 29/05/2026 14:40, Pedro Falcato wrote:
>>>> On Fri, May 29, 2026 at 01:19:03PM +0100, Usama Arif wrote:
>>>>>
>>>>> which means mapping_max_folio_order(mapping) <= MAX_PAGECACHE_ORDER <= HPAGE_PMD_ORDER is always
>>>>> true, and you don't need the min3(..) in your diff.
>>>>>
>>>>> Now the question is, then, why not just do:
>>>>>
>>>>> 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && (vm_flags & VM_HUGEPAGE)) {
>>>>> 		if (mapping_large_folio_support(mapping)) {
>>>>> 			force_thp_readahead = true;
>>>>> 			thp_order = min_t(unsigned int,
>>>>> 					  mapping_max_folio_order(mapping),
>>>>> 					  get_order(SZ_2M));
>>>>> 		}
>>>>> 	}
>>>>>
>>>>>
>>>>> The reason is that this would regress the 16K ARM case, where we already get
>>>>> 32M folios. Someone might upgrade the kernel and start getting 2M folios.
>>>>
>>>> So maybe limit to 32MB? It's still arbitrary but at least you get simpler
>>>> logic. If the architecture does not support 32MiB folios, it will clamp
>>>> the maximum folio order to HPAGE_PMD_ORDER, and you get the same result.
>>>>
>>>> Does this sound correct?
>>>>
>>>
>>> Yes, so if we replace it with SZ_32M, it sounds correct. I just think
>>> the 32M size is too large. But as you pointed out, even 2M can be too large...
>>
>> So AFAIU the practical discussion is about two options:
>>
>> 1) limiting at 2MB with a slightly more complicated logic to keep mapping at
>> PMD order for 16k pagesize on ARM but use 2MB pages for 64k pagesize on ARM
>>
>> or
>>
>> 2) limit at 32MB with simple logic which results in larger (32MB) folios
>> with 16k and 64k pagesize on ARM and thus larger memory overhead.
>>
>> I'd like to maybe offer option 3): limit at 2MB with simple logic. This
>> will reduce folio size on 16k pagesize ARM compared to 1) but do we really
>> care? I.e., is there big enough practical performance impact with contpte
>> and other tricks ARM is playing?
>>
> 
> arm64 16K contpte tops out at 256KB TLB entries. It's quite a lot smaller than
> a PMD entry. Also, something that was discussed at LSFMM was its effectiveness.
> Apparently, most of the gains seem to sit on actually having a larger page size
> (perhaps Dev/Ryan can comment; sadly the slides were not posted anywhere on
> the ML, so I don't have numbers).
> 
> To me, the question is quite clear: do we trust users that say "please give me
> hugepages" enough to unconditionally give them hugepages? I would assume the
> answer lies somewhere between "yes" and "no", but 32MB I would say is not
> particularly excessive. 512MB is... much worse.
> 

I think the other question is: if userspace asks for hugepages, is it asking
for the biggest possible one? I think the answer is yes on 4K base page size,
where the largest is 2M, but that may not be the case for 16K and 64K.
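
To put numbers on that, here is a minimal userspace sketch (not kernel code)
of the simple clamp discussed above: min(max folio order, order of a fixed
size limit). The helper names order_for_size(), pmd_order() and
clamped_thp_order() are hypothetical. It assumes 8-byte page table entries,
and it assumes the mapping allows folios up to PMD order, which may be higher
than what a real mapping_max_folio_order() would return.

```c
#include <assert.h>

#define SZ_2M	(2UL << 20)
#define SZ_32M	(32UL << 20)

/*
 * Userspace stand-in for get_order(): smallest order such that
 * (page size << order) covers @size. @page_shift is log2(page size).
 */
unsigned int order_for_size(unsigned long size, unsigned int page_shift)
{
	unsigned int order = 0;

	/* Only 4K (12), 16K (14) and 64K (16) are discussed here. */
	assert(page_shift >= 12 && page_shift <= 16);

	while (((1UL << page_shift) << order) < size)
		order++;
	return order;
}

/*
 * Assumption: 8-byte page table entries, so one page-table page holds
 * 2^(page_shift - 3) entries and a PMD entry maps that many base pages.
 */
unsigned int pmd_order(unsigned int page_shift)
{
	return page_shift - 3;
}

/* The simple clamp: min(max folio order of the mapping, order of the limit). */
unsigned int clamped_thp_order(unsigned int mapping_max_order,
			       unsigned long limit, unsigned int page_shift)
{
	unsigned int limit_order = order_for_size(limit, page_shift);

	return mapping_max_order < limit_order ? mapping_max_order : limit_order;
}

/* Folio size in bytes for a given order and base page size. */
unsigned long folio_bytes(unsigned int order, unsigned int page_shift)
{
	return (1UL << page_shift) << order;
}
```

Under these assumptions, a 2M limit gives orders 9 / 7 / 5 on 4K / 16K / 64K
base pages (always a 2M folio), while a 32M limit gives 2M / 32M / 32M folios.
The unclamped PMD size would be 2M / 32M / 512M.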

/sys/kernel/mm/transparent_hugepage/hugepages-* is supposed to be used
for anon only, but maybe in the future we could use it to determine the size
of THP to give to the user for file here? For example, we could find the
biggest size that has madvise (or always) set and use that here. It's
probably a much bigger discussion.
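
As a rough illustration of that idea only, here is a userspace sketch that
picks the largest per-size entry whose "enabled" selection is "always" or
"madvise". It works on strings shaped like the hugepages-<size>kB directory
names and the bracketed contents of their "enabled" files. The struct and
function names are hypothetical, "inherit" is treated as not enabled because
resolving it needs the top-level setting, and a kernel implementation would
look at internal state rather than parse sysfs text.

```c
#include <stdio.h>
#include <string.h>

/* One per-size sysfs entry: directory name and contents of its "enabled" file. */
struct thp_size_entry {
	const char *dir_name;		/* e.g. "hugepages-2048kB" */
	const char *enabled_contents;	/* e.g. "always inherit [madvise] never" */
};

/* Parse "hugepages-2048kB" into 2048; returns 0 if the name does not match. */
unsigned long hugepages_dir_size_kb(const char *name)
{
	unsigned long kb = 0;
	char unit[3] = "";

	if (sscanf(name, "hugepages-%lu%2s", &kb, unit) != 2)
		return 0;
	if (strcmp(unit, "kB") != 0)
		return 0;
	return kb;
}

/* Returns 1 if the bracketed selection is "always" or "madvise", else 0. */
int mode_allows_madvise(const char *enabled_contents)
{
	const char *open = strchr(enabled_contents, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return 0;
	open++;
	len = (size_t)(close - open);
	return (len == 6 && strncmp(open, "always", 6) == 0) ||
	       (len == 7 && strncmp(open, "madvise", 7) == 0);
}

/* Largest size in kB among entries that allow madvise; 0 if none do. */
unsigned long largest_enabled_kb(const struct thp_size_entry *entries, size_t n)
{
	unsigned long best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long kb = hugepages_dir_size_kb(entries[i].dir_name);

		if (kb > best && mode_allows_madvise(entries[i].enabled_contents))
			best = kb;
	}
	return best;
}
```

A real tool would fill the entries by listing the sysfs directory and reading
each "enabled" file; that part is left out to keep the sketch short.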
