From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org,
	david@redhat.com, wangkefeng.wang@huawei.com, 21cnbao@gmail.com,
	ryan.roberts@arm.com, ioworker0@gmail.com, da.gomez@samsung.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	regressions@lists.linux.dev, intel-gfx@lists.freedesktop.org,
	Eero Tamminen <eero.t.tamminen@intel.com>
Subject: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
Date: Tue, 29 Apr 2025 20:44:27 +0300
Message-ID: <aBEP-6iFhIC87zmb@intel.com>
In-Reply-To: <035bf55fbdebeff65f5cb2cdb9907b7d632c3228.1732779148.git.baolin.wang@linux.alibaba.com>

On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
> Add large folio support for tmpfs write and fallocate paths matching the
> same high order preference mechanism used in the iomap buffered IO path
> as used in __filemap_get_folio().
> 
> Add shmem_mapping_size_orders() to get a hint for the orders of the folio
> based on the file size which takes care of the mapping requirements.
> 
> Traditionally, tmpfs only supported PMD-sized large folios. However, now
> that other file systems support large folios of any size and anonymous
> memory has been extended to support mTHP, we should not restrict tmpfs to
> PMD-sized large folios, which makes it a special case. Instead, we should
> allow tmpfs to allocate large folios of any size.
> 
> Considering that tmpfs already has the 'huge=' option to control PMD-sized
> large folio allocation, we can extend the 'huge=' option to allow large
> folios of any size. The semantics of the 'huge=' mount option are:
> 
> huge=never: no large folios of any size
> huge=always: large folios of any size
> huge=within_size: like 'always' but respect the i_size
> huge=advise: like 'always' if requested with madvise()
> 
> Note: for tmpfs mmap() faults, due to the lack of a write size hint, tmpfs
> still allocates PMD-sized huge folios if huge=always/within_size/advise is set.
> 
> Moreover, the 'deny' and 'force' testing options controlled by
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled' retain the same
> semantics: 'deny' disables large folios of any size for tmpfs, while
> 'force' enables PMD-sized large folios for tmpfs.
> 
> Co-developed-by: Daniel Gomez <da.gomez@samsung.com>
> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Hi,

This causes a huge regression in Intel iGPU texturing performance.

I haven't had time to look at this in detail, but presumably the
problem is that we're no longer getting huge pages from our
private tmpfs mount (done in i915_gemfs_init()).

Some more details at
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845
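
To make the suspicion concrete, here is a minimal userspace sketch of the
huge=within_size check before and after the quoted patch. It is a model, not
kernel code: the model_* names, the 4 KiB page size and the PMD order of 9 are
assumptions, and size_orders stands in for whatever
shmem_mapping_size_orders() would return. Whether the i915 allocations really
reach this path with no vma and write_end == 0 is also an assumption that has
not been verified here.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1LL << MODEL_PAGE_SHIFT)
#define MODEL_PMD_ORDER  9			/* assumption: 2 MiB PMD */

/* Round x up to a multiple of a; a must be a power of two. */
static long long model_round_up(long long x, long long a)
{
	assert(a > 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Index of the highest set bit; the caller guarantees orders != 0. */
static unsigned int model_highest_order(unsigned int orders)
{
	unsigned int order = 0;

	while ((orders >>= 1) != 0)
		order++;
	return order;
}

/* Pre-patch huge=within_size check: PMD order or nothing. */
static unsigned int model_old_within_size(long long index, long long write_end,
					  long long i_size)
{
	long long end_index = model_round_up(index + 1, 1LL << MODEL_PMD_ORDER);
	long long size = write_end > i_size ? write_end : i_size;

	size = model_round_up(size, MODEL_PAGE_SIZE);
	if ((size >> MODEL_PAGE_SHIFT) >= end_index)
		return 1U << MODEL_PMD_ORDER;
	return 0;
}

/*
 * Post-patch huge=within_size check. size_orders models the result of
 * shmem_mapping_size_orders(), which is 0 whenever write_end is 0.
 */
static unsigned int model_new_within_size(long long index, long long write_end,
					  long long i_size, bool has_vma,
					  unsigned int size_orders)
{
	unsigned int orders = has_vma ? 1U << MODEL_PMD_ORDER : size_orders;

	while (orders) {
		unsigned int order = model_highest_order(orders);
		long long end_index = model_round_up(index + 1, 1LL << order);
		long long size = write_end > i_size ? write_end : i_size;

		size = model_round_up(size, MODEL_PAGE_SIZE);
		if ((size >> MODEL_PAGE_SHIFT) >= end_index)
			return orders;
		orders &= ~(1U << order);	/* try the next lower order */
	}
	/* Falls through to the 'advise' case; VM_HUGEPAGE assumed unset. */
	return 0;
}
```

In this model, a 4 MiB file queried at index 0 with write_end == 0 and no vma
gets BIT(9) from the old check and 0 from the new one, because the orders
bitmap is already empty before the i_size comparison runs.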

> ---
>  mm/shmem.c | 99 ++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 81 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 7595c3db4c1c..54eaa724c153 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -554,34 +554,100 @@ static bool shmem_confirm_swap(struct address_space *mapping,
>  
>  static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
>  
> +/**
> + * shmem_mapping_size_orders - Get allowable folio orders for the given file size.
> + * @mapping: Target address_space.
> + * @index: The page index.
> + * @write_end: end of a write, could extend inode size.
> + *
> + * This returns huge orders for folios (when supported) based on the file size
> + * which the mapping currently allows at the given index. The index is relevant
> + * due to alignment considerations the mapping might have. The returned order
> + * may be less than the size passed.
> + *
> + * Return: The orders.
> + */
> +static inline unsigned int
> +shmem_mapping_size_orders(struct address_space *mapping, pgoff_t index, loff_t write_end)
> +{
> +	unsigned int order;
> +	size_t size;
> +
> +	if (!mapping_large_folio_support(mapping) || !write_end)
> +		return 0;
> +
> +	/* Calculate the write size based on the write_end */
> +	size = write_end - (index << PAGE_SHIFT);
> +	order = filemap_get_order(size);
> +	if (!order)
> +		return 0;
> +
> +	/* If we're not aligned, allocate a smaller folio */
> +	if (index & ((1UL << order) - 1))
> +		order = __ffs(index);
> +
> +	order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
> +	return order > 0 ? BIT(order + 1) - 1 : 0;
> +}
> +
>  static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
>  					      loff_t write_end, bool shmem_huge_force,
> +					      struct vm_area_struct *vma,
>  					      unsigned long vm_flags)
>  {
> +	unsigned int maybe_pmd_order = HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER ?
> +		0 : BIT(HPAGE_PMD_ORDER);
> +	unsigned long within_size_orders;
> +	unsigned int order;
> +	pgoff_t aligned_index;
>  	loff_t i_size;
>  
> -	if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
> -		return 0;
>  	if (!S_ISREG(inode->i_mode))
>  		return 0;
>  	if (shmem_huge == SHMEM_HUGE_DENY)
>  		return 0;
>  	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
> -		return BIT(HPAGE_PMD_ORDER);
> +		return maybe_pmd_order;
>  
> +	/*
> +	 * The huge order allocation for anon shmem is controlled through
> +	 * the mTHP interface, so we still use PMD-sized huge order to
> +	 * check whether global control is enabled.
> +	 *
> +	 * For tmpfs mmap()'s huge order, we still use PMD-sized order to
> +	 * allocate huge pages due to lack of a write size hint.
> +	 *
> +	 * Otherwise, tmpfs will allow getting a highest order hint based on
> +	 * the size of write and fallocate paths, then will try each allowable
> +	 * huge orders.
> +	 */
>  	switch (SHMEM_SB(inode->i_sb)->huge) {
>  	case SHMEM_HUGE_ALWAYS:
> -		return BIT(HPAGE_PMD_ORDER);
> +		if (vma)
> +			return maybe_pmd_order;
> +
> +		return shmem_mapping_size_orders(inode->i_mapping, index, write_end);
>  	case SHMEM_HUGE_WITHIN_SIZE:
> -		index = round_up(index + 1, HPAGE_PMD_NR);
> -		i_size = max(write_end, i_size_read(inode));
> -		i_size = round_up(i_size, PAGE_SIZE);
> -		if (i_size >> PAGE_SHIFT >= index)
> -			return BIT(HPAGE_PMD_ORDER);
> +		if (vma)
> +			within_size_orders = maybe_pmd_order;
> +		else
> +			within_size_orders = shmem_mapping_size_orders(inode->i_mapping,
> +								       index, write_end);
> +
> +		order = highest_order(within_size_orders);
> +		while (within_size_orders) {
> +			aligned_index = round_up(index + 1, 1 << order);
> +			i_size = max(write_end, i_size_read(inode));
> +			i_size = round_up(i_size, PAGE_SIZE);
> +			if (i_size >> PAGE_SHIFT >= aligned_index)
> +				return within_size_orders;
> +
> +			order = next_order(&within_size_orders, order);
> +		}
>  		fallthrough;
>  	case SHMEM_HUGE_ADVISE:
>  		if (vm_flags & VM_HUGEPAGE)
> -			return BIT(HPAGE_PMD_ORDER);
> +			return maybe_pmd_order;
>  		fallthrough;
>  	default:
>  		return 0;
> @@ -781,6 +847,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
>  
>  static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
>  					      loff_t write_end, bool shmem_huge_force,
> +					      struct vm_area_struct *vma,
>  					      unsigned long vm_flags)
>  {
>  	return 0;
> @@ -1176,7 +1243,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
>  			STATX_ATTR_NODUMP);
>  	generic_fillattr(idmap, request_mask, inode, stat);
>  
> -	if (shmem_huge_global_enabled(inode, 0, 0, false, 0))
> +	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
>  		stat->blksize = HPAGE_PMD_SIZE;
>  
>  	if (request_mask & STATX_BTIME) {
> @@ -1693,14 +1760,10 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>  		return 0;
>  
>  	global_orders = shmem_huge_global_enabled(inode, index, write_end,
> -						  shmem_huge_force, vm_flags);
> -	if (!vma || !vma_is_anon_shmem(vma)) {
> -		/*
> -		 * For tmpfs, we now only support PMD sized THP if huge page
> -		 * is enabled, otherwise fallback to order 0.
> -		 */
> +						  shmem_huge_force, vma, vm_flags);
> +	/* Tmpfs huge pages allocation */
> +	if (!vma || !vma_is_anon_shmem(vma))
>  		return global_orders;
> -	}
>  
>  	/*
>  	 * Following the 'deny' semantics of the top level, force the huge
> -- 
> 2.39.3
> 
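For reference, the order-bitmap calculation in shmem_mapping_size_orders()
above can be sketched in userspace roughly as follows. This is a sketch under
stated assumptions, not the kernel implementation: the sketch_* names are
hypothetical, 4 KiB pages and a maximum page cache order of 9 are assumed,
the mapping_large_folio_support() check is left out, and sketch_get_order()
is only an approximation of filemap_get_order() (floor(log2(size)) minus the
page shift).

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER  9	/* assumption: MAX_PAGECACHE_ORDER == 9 */

/* floor(log2(v)) for v > 0. */
static unsigned int sketch_ilog2(unsigned long long v)
{
	unsigned int r = 0;

	while ((v >>= 1) != 0)
		r++;
	return r;
}

/* Approximation of filemap_get_order(): largest order that fits in size. */
static unsigned int sketch_get_order(unsigned long long size)
{
	unsigned int shift;

	if (!size)
		return 0;
	shift = sketch_ilog2(size);
	return shift <= SKETCH_PAGE_SHIFT ? 0 : shift - SKETCH_PAGE_SHIFT;
}

/* Returns a bitmap with bits 0..order set, or 0 if only order 0 is allowed. */
static unsigned int sketch_size_orders(unsigned long index, long long write_end)
{
	unsigned long long start = (unsigned long long)index << SKETCH_PAGE_SHIFT;
	unsigned long long size;
	unsigned int order;

	if (!write_end)
		return 0;	/* no write size hint: no large orders */

	/* The write is assumed to end at or after the start of this page. */
	assert((unsigned long long)write_end >= start);
	size = (unsigned long long)write_end - start;

	order = sketch_get_order(size);
	if (!order)
		return 0;

	/* Unaligned index: fall back to the order the index is aligned to. */
	if (index & ((1UL << order) - 1))
		order = (unsigned int)__builtin_ctzl(index);

	if (order > SKETCH_MAX_ORDER)
		order = SKETCH_MAX_ORDER;
	return order > 0 ? (1U << (order + 1)) - 1 : 0;
}
```

With these assumptions, a 64 KiB write at index 0 yields 0x1f (orders 0 to 4),
a 2 MiB write at index 0 yields 0x3ff, the same 64 KiB write starting at
index 4 yields 0x7, and write_end == 0 always yields 0. __builtin_ctzl() is a
GCC/Clang builtin used here in place of the kernel's __ffs().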

-- 
Ville Syrjälä
Intel

Thread overview: 18+ messages
2024-11-28  7:40 [PATCH v3 0/6] Support large folios for tmpfs Baolin Wang
2024-11-28  7:40 ` [PATCH v3 1/6] mm: factor out the order calculation into a new helper Baolin Wang
2024-11-28  7:40 ` [PATCH v3 2/6] mm: shmem: change shmem_huge_global_enabled() to return huge order bitmap Baolin Wang
2024-11-28  7:40 ` [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs Baolin Wang
2025-04-29 17:44   ` Ville Syrjälä [this message]
2025-04-30  6:32     ` [REGRESSION] " Baolin Wang
2025-04-30 11:20       ` Ville Syrjälä
2025-04-30 13:24         ` Daniel Gomez
2025-05-02  1:02           ` Baolin Wang
2025-05-02  7:18             ` David Hildenbrand
2025-05-02 13:10               ` Daniel Gomez
2025-05-02 15:31                 ` David Hildenbrand
2025-05-06  3:33                   ` Baolin Wang
2025-05-06 14:36                     ` David Hildenbrand
2024-11-28  7:40 ` [PATCH v3 4/6] mm: shmem: add a kernel command line to change the default huge policy " Baolin Wang
2024-11-28  7:40 ` [PATCH v3 5/6] docs: tmpfs: update the large folios policy for tmpfs and shmem Baolin Wang
2024-11-28  7:40 ` [PATCH v3 6/6] docs: tmpfs: drop 'fadvise()' from the documentation Baolin Wang
2025-04-30  7:02 ` ✗ Fi.CI.BUILD: failure for Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs Patchwork
