public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Usama Arif <usama.arif@linux.dev>
To: Jan Kara <jack@suse.cz>, david@kernel.org, ryan.roberts@arm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	willy@infradead.org, linux-mm@kvack.org, r@hev.cc,
	ajd@linux.ibm.com, apopple@nvidia.com, baohua@kernel.org,
	baolin.wang@linux.alibaba.com, brauner@kernel.org,
	catalin.marinas@arm.com, dev.jain@arm.com, kees@kernel.org,
	kevin.brodsky@arm.com, lance.yang@linux.dev,
	Liam.Howlett@oracle.com, linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	lorenzo.stoakes@oracle.com, mhocko@suse.com, npache@redhat.com,
	pasha.tatashin@soleen.com, rmclure@linux.ibm.com,
	rppt@kernel.org, surenb@google.com, vbabka@kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	wilts.infradead.org@quack3, ziy@nvidia.com, hannes@cmpxchg.org,
	kas@kernel.org, shakeel.butt@linux.dev, kernel-team@meta.com
Subject: Re: [PATCH v2 2/4] mm: replace exec_folio_order() with generic preferred_exec_order()
Date: Thu, 26 Mar 2026 08:40:21 -0400	[thread overview]
Message-ID: <0ce07bc5-6365-4c54-90e2-4e56ad2b7465@linux.dev> (raw)
In-Reply-To: <k45xs6btmt62uerbglqe665jozrtkeoklu4rek6odgxjdj63ni@ftw6ef3ug33x>



On 20/03/2026 17:42, Jan Kara wrote:
> On Fri 20-03-26 06:58:52, Usama Arif wrote:
>> Replace the arch-specific exec_folio_order() hook with a generic
>> preferred_exec_order() that dynamically computes the readahead folio
>> order for executable memory. It targets min(PMD_ORDER, 2M) as the
>> maximum, which optimally gives the right answer for contpte (arm64),
>> PMD mapping (x86, arm64 4K), and architectures with smaller PMDs
>> (s390 1M). It adapts at runtime based on:
>>
>> - VMA size: caps the order so folios fit within the mapping
>> - Memory pressure: steps down the order when the local node's free
>>   memory is below the high watermark for the requested order
>>
>> This avoids over-allocating on memory-constrained systems while still
>> requesting the optimal order when memory is plentiful.
>>
>> Since exec_folio_order() is no longer needed, remove the arm64
>> definition and the generic default from pgtable.h.
>>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ...
>> +static unsigned int preferred_exec_order(struct vm_area_struct *vma)
>> +{
>> +	int order;
>> +	unsigned long vma_len = vma_pages(vma);
>> +	struct zone *zone;
>> +	gfp_t gfp;
>> +
>> +	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>> +		return 0;
>> +
>> +	/* Cap at min(PMD_ORDER, 2M) */
>> +	order = min(HPAGE_PMD_ORDER, ilog2(SZ_2M >> PAGE_SHIFT));
>> +
>> +	/* Don't request folios larger than the VMA */
>> +	order = min(order, ilog2(vma_len));
> 

Hi Jan,

Thanks for the feedback and sorry for the late reply! I was travelling
during the week.

> Hum, as far as I'm checking page_cache_ra_order() used in
> do_sync_mmap_readahead(), ra->order is the preferred order but it will be
> trimmed down to fit both within the file and within ra->size. And ra->size
> is set for the readahead to fit within the vma so I don't think any order
> trimming based on vma length is needed in this place?

Ack, yes makes sense.

> 
>> +	/* Step down under memory pressure */
>> +	gfp = mapping_gfp_mask(vma->vm_file->f_mapping);
>> +	zone = first_zones_zonelist(node_zonelist(numa_node_id(), gfp),
>> +				    gfp_zone(gfp), NULL)->zone;
>> +	if (zone) {
>> +		while (order > 0 &&
>> +		       !zone_watermark_ok(zone, order,
>> +					  high_wmark_pages(zone), 0, 0))
>> +			order--;
>> +	}
> 
> It looks wrong for this logic to be here. Trimming order based on memory
> pressure makes sense (and we've already got reports that on memory limited
> devices large order folios in the page cache have too big memory overhead
> so we'll likely need to handle that for page cache allocations in general)
> but IMHO it belongs to page_cache_ra_order() or some other common place
> like that.
> 
> 								Honza

So I have been thinking about this. readahead_gfp_mask() already sets
__GFP_NORETRY, so we wont try aggressive reclaim/compaction to satisfy
the allocation. page_cache_ra_order() falls through to the fallback path
faulting in order 0 page when allocation is not satsified.

So the allocator already naturally steps down under memory pressure,
the explicit zone_watermark_ok() loop might be redundant?

What are your thoughts on just setting
ra->order = min(HPAGE_PMD_ORDER, ilog2(SZ_2M >> PAGE_SHIFT))?
We can do the higher orlder allocation with gfp &= ~__GFP_RECLAIM
for the VM_EXEC case.


  reply	other threads:[~2026-03-26 12:41 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-20 13:58 [PATCH v2 0/4] mm: improve large folio readahead and alignment for exec memory Usama Arif
2026-03-20 13:58 ` [PATCH v2 1/4] mm: bypass mmap_miss heuristic for VM_EXEC readahead Usama Arif
2026-03-20 14:18   ` Jan Kara
2026-03-20 14:26   ` Kiryl Shutsemau
2026-03-20 13:58 ` [PATCH v2 2/4] mm: replace exec_folio_order() with generic preferred_exec_order() Usama Arif
2026-03-20 14:41   ` Kiryl Shutsemau
2026-03-20 14:42   ` Jan Kara
2026-03-26 12:40     ` Usama Arif [this message]
2026-03-20 13:58 ` [PATCH v2 3/4] elf: align ET_DYN base to max folio size for PTE coalescing Usama Arif
2026-03-20 14:55   ` Kiryl Shutsemau
2026-03-20 15:58   ` Matthew Wilcox
2026-03-20 16:05   ` WANG Rui
2026-03-20 17:47     ` Matthew Wilcox
2026-03-20 13:58 ` [PATCH v2 4/4] mm: align file-backed mmap to max folio order in thp_get_unmapped_area Usama Arif
2026-03-20 15:06   ` Kiryl Shutsemau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0ce07bc5-6365-4c54-90e2-4e56ad2b7465@linux.dev \
    --to=usama.arif@linux.dev \
    --cc=Liam.Howlett@oracle.com \
    --cc=ajd@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jack@suse.cz \
    --cc=kas@kernel.org \
    --cc=kees@kernel.org \
    --cc=kernel-team@meta.com \
    --cc=kevin.brodsky@arm.com \
    --cc=lance.yang@linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=r@hev.cc \
    --cc=rmclure@linux.ibm.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=wilts.infradead.org@quack3 \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox