[REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs

Intel-GFX Archive on lore.kernel.org

* [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
       [not found] ` <035bf55fbdebeff65f5cb2cdb9907b7d632c3228.1732779148.git.baolin.wang@linux.alibaba.com>
@ 2025-04-29 17:44   ` Ville Syrjälä
  2025-04-30  6:32     ` Baolin Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Ville Syrjälä @ 2025-04-29 17:44 UTC
  To: Baolin Wang
  Cc: akpm, hughd, willy, david, wangkefeng.wang, 21cnbao, ryan.roberts,
	ioworker0, da.gomez, linux-mm, linux-kernel, regressions,
	intel-gfx, Eero Tamminen

On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
> Add large folio support for tmpfs write and fallocate paths matching the
> same high order preference mechanism used in the iomap buffered IO path
> as used in __filemap_get_folio().
> 
> Add shmem_mapping_size_orders() to get a hint for the orders of the folio
> based on the file size which takes care of the mapping requirements.
> 
> Traditionally, tmpfs only supported PMD-sized large folios. However nowadays
> with other file systems supporting any sized large folios, and extending
> anonymous to support mTHP, we should not restrict tmpfs to allocating only
> PMD-sized large folios, making it more special. Instead, we should allow
> tmpfs can allocate any sized large folios.
> 
> Considering that tmpfs already has the 'huge=' option to control the PMD-sized
> large folios allocation, we can extend the 'huge=' option to allow any sized
> large folios. The semantics of the 'huge=' mount option are:
> 
> huge=never: no any sized large folios
> huge=always: any sized large folios
> huge=within_size: like 'always' but respect the i_size
> huge=advise: like 'always' if requested with madvise()
> 
> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.
> 
> Moreover, the 'deny' and 'force' testing options controlled by
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
> semantics. The 'deny' can disable any sized large folios for tmpfs, while
> the 'force' can enable PMD sized large folios for tmpfs.
> 
> Co-developed-by: Daniel Gomez <da.gomez@samsung.com>
> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Hi,

This causes a huge regression in Intel iGPU texturing performance.

I haven't had time to look at this in detail, but presumably the
problem is that we're no longer getting huge pages from our
private tmpfs mount (done in i915_gemfs_init()).

Some more details at
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845

> ---
>  mm/shmem.c | 99 ++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 81 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 7595c3db4c1c..54eaa724c153 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -554,34 +554,100 @@ static bool shmem_confirm_swap(struct address_space *mapping,
>  
>  static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
>  
> +/**
> + * shmem_mapping_size_orders - Get allowable folio orders for the given file size.
> + * @mapping: Target address_space.
> + * @index: The page index.
> + * @write_end: end of a write, could extend inode size.
> + *
> + * This returns huge orders for folios (when supported) based on the file size
> + * which the mapping currently allows at the given index. The index is relevant
> + * due to alignment considerations the mapping might have. The returned order
> + * may be less than the size passed.
> + *
> + * Return: The orders.
> + */
> +static inline unsigned int
> +shmem_mapping_size_orders(struct address_space *mapping, pgoff_t index, loff_t write_end)
> +{
> +	unsigned int order;
> +	size_t size;
> +
> +	if (!mapping_large_folio_support(mapping) || !write_end)
> +		return 0;
> +
> +	/* Calculate the write size based on the write_end */
> +	size = write_end - (index << PAGE_SHIFT);
> +	order = filemap_get_order(size);
> +	if (!order)
> +		return 0;
> +
> +	/* If we're not aligned, allocate a smaller folio */
> +	if (index & ((1UL << order) - 1))
> +		order = __ffs(index);
> +
> +	order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
> +	return order > 0 ? BIT(order + 1) - 1 : 0;
> +}
> +
>  static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
>  					      loff_t write_end, bool shmem_huge_force,
> +					      struct vm_area_struct *vma,
>  					      unsigned long vm_flags)
>  {
> +	unsigned int maybe_pmd_order = HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER ?
> +		0 : BIT(HPAGE_PMD_ORDER);
> +	unsigned long within_size_orders;
> +	unsigned int order;
> +	pgoff_t aligned_index;
>  	loff_t i_size;
>  
> -	if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
> -		return 0;
>  	if (!S_ISREG(inode->i_mode))
>  		return 0;
>  	if (shmem_huge == SHMEM_HUGE_DENY)
>  		return 0;
>  	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
> -		return BIT(HPAGE_PMD_ORDER);
> +		return maybe_pmd_order;
>  
> +	/*
> +	 * The huge order allocation for anon shmem is controlled through
> +	 * the mTHP interface, so we still use PMD-sized huge order to
> +	 * check whether global control is enabled.
> +	 *
> +	 * For tmpfs mmap()'s huge order, we still use PMD-sized order to
> +	 * allocate huge pages due to lack of a write size hint.
> +	 *
> +	 * Otherwise, tmpfs will allow getting a highest order hint based on
> +	 * the size of write and fallocate paths, then will try each allowable
> +	 * huge orders.
> +	 */
>  	switch (SHMEM_SB(inode->i_sb)->huge) {
>  	case SHMEM_HUGE_ALWAYS:
> -		return BIT(HPAGE_PMD_ORDER);
> +		if (vma)
> +			return maybe_pmd_order;
> +
> +		return shmem_mapping_size_orders(inode->i_mapping, index, write_end);
>  	case SHMEM_HUGE_WITHIN_SIZE:
> -		index = round_up(index + 1, HPAGE_PMD_NR);
> -		i_size = max(write_end, i_size_read(inode));
> -		i_size = round_up(i_size, PAGE_SIZE);
> -		if (i_size >> PAGE_SHIFT >= index)
> -			return BIT(HPAGE_PMD_ORDER);
> +		if (vma)
> +			within_size_orders = maybe_pmd_order;
> +		else
> +			within_size_orders = shmem_mapping_size_orders(inode->i_mapping,
> +								       index, write_end);
> +
> +		order = highest_order(within_size_orders);
> +		while (within_size_orders) {
> +			aligned_index = round_up(index + 1, 1 << order);
> +			i_size = max(write_end, i_size_read(inode));
> +			i_size = round_up(i_size, PAGE_SIZE);
> +			if (i_size >> PAGE_SHIFT >= aligned_index)
> +				return within_size_orders;
> +
> +			order = next_order(&within_size_orders, order);
> +		}
>  		fallthrough;
>  	case SHMEM_HUGE_ADVISE:
>  		if (vm_flags & VM_HUGEPAGE)
> -			return BIT(HPAGE_PMD_ORDER);
> +			return maybe_pmd_order;
>  		fallthrough;
>  	default:
>  		return 0;
> @@ -781,6 +847,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
>  
>  static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
>  					      loff_t write_end, bool shmem_huge_force,
> +					      struct vm_area_struct *vma,
>  					      unsigned long vm_flags)
>  {
>  	return 0;
> @@ -1176,7 +1243,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
>  			STATX_ATTR_NODUMP);
>  	generic_fillattr(idmap, request_mask, inode, stat);
>  
> -	if (shmem_huge_global_enabled(inode, 0, 0, false, 0))
> +	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
>  		stat->blksize = HPAGE_PMD_SIZE;
>  
>  	if (request_mask & STATX_BTIME) {
> @@ -1693,14 +1760,10 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>  		return 0;
>  
>  	global_orders = shmem_huge_global_enabled(inode, index, write_end,
> -						  shmem_huge_force, vm_flags);
> -	if (!vma || !vma_is_anon_shmem(vma)) {
> -		/*
> -		 * For tmpfs, we now only support PMD sized THP if huge page
> -		 * is enabled, otherwise fallback to order 0.
> -		 */
> +						  shmem_huge_force, vma, vm_flags);
> +	/* Tmpfs huge pages allocation */
> +	if (!vma || !vma_is_anon_shmem(vma))
>  		return global_orders;
> -	}
>  
>  	/*
>  	 * Following the 'deny' semantics of the top level, force the huge
> -- 
> 2.39.3
> 

-- 
Ville Syrjälä
Intel


* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-04-29 17:44   ` [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs Ville Syrjälä
@ 2025-04-30  6:32     ` Baolin Wang
  2025-04-30 11:20       ` Ville Syrjälä
  0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2025-04-30  6:32 UTC
  To: Ville Syrjälä
  Cc: akpm, hughd, willy, david, wangkefeng.wang, 21cnbao, ryan.roberts,
	ioworker0, da.gomez, linux-mm, linux-kernel, regressions,
	intel-gfx, Eero Tamminen

Hi,

On 2025/4/30 01:44, Ville Syrjälä wrote:
> On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
>> Add large folio support for tmpfs write and fallocate paths matching the
>> same high order preference mechanism used in the iomap buffered IO path
>> as used in __filemap_get_folio().
>>
>> Add shmem_mapping_size_orders() to get a hint for the orders of the folio
>> based on the file size which takes care of the mapping requirements.
>>
>> Traditionally, tmpfs only supported PMD-sized large folios. However nowadays
>> with other file systems supporting any sized large folios, and extending
>> anonymous to support mTHP, we should not restrict tmpfs to allocating only
>> PMD-sized large folios, making it more special. Instead, we should allow
>> tmpfs can allocate any sized large folios.
>>
>> Considering that tmpfs already has the 'huge=' option to control the PMD-sized
>> large folios allocation, we can extend the 'huge=' option to allow any sized
>> large folios. The semantics of the 'huge=' mount option are:
>>
>> huge=never: no any sized large folios
>> huge=always: any sized large folios
>> huge=within_size: like 'always' but respect the i_size
>> huge=advise: like 'always' if requested with madvise()
>>
>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
>> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.
>>
>> Moreover, the 'deny' and 'force' testing options controlled by
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
>> semantics. The 'deny' can disable any sized large folios for tmpfs, while
>> the 'force' can enable PMD sized large folios for tmpfs.
>>
>> Co-developed-by: Daniel Gomez <da.gomez@samsung.com>
>> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> 
> Hi,
> 
> This causes a huge regression in Intel iGPU texturing performance.

Unfortunately, I don't have such a platform to test this on.

> 
> I haven't had time to look at this in detail, but presumably the
> problem is that we're no longer getting huge pages from our
> private tmpfs mount (done in i915_gemfs_init()).

IIUC, the i915 driver still limits the maximum write size to PAGE_SIZE
in shmem_pwrite(), which prevents tmpfs from allocating large folios.
As mentioned in the code comments in the patch, tmpfs, like other file
systems that support large folios, now takes a highest order hint from
the size of the write or fallocate request, and then attempts each
allowable huge order.
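
To make the size-hint dependency concrete, here is a minimal userspace sketch of the order calculation done by shmem_mapping_size_orders() in the patch. It is an approximation, not the kernel code: the sketch_* names, the 4 KiB page size and the order-9 cap standing in for MAX_PAGECACHE_ORDER are all assumptions, and the `write_end <= start` guard is added here so the sketch never takes the log of zero.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER   9u	/* assumption: stand-in for MAX_PAGECACHE_ORDER */

/* floor(log2(v)); v must be non-zero */
static unsigned int sketch_ilog2(uint64_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Index of the lowest set bit; v must be non-zero */
static unsigned int sketch_ffs(uint64_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (!(v & 1)) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Approximates filemap_get_order(): 0 for anything up to one page */
static unsigned int sketch_get_order(uint64_t size)
{
	unsigned int shift = sketch_ilog2(size);

	return shift <= SKETCH_PAGE_SHIFT ? 0 : shift - SKETCH_PAGE_SHIFT;
}

/* Returns a bitmask of allowable orders, or 0 if only order 0 is possible */
static unsigned int sketch_size_orders(uint64_t index, uint64_t write_end)
{
	uint64_t start = index << SKETCH_PAGE_SHIFT;
	unsigned int order;

	if (write_end <= start)		/* guard added for this sketch */
		return 0;

	/* The write size is the only hint available on this path */
	order = sketch_get_order(write_end - start);
	if (!order)
		return 0;

	/* If the index is not aligned to the order, fall back to a smaller one */
	if (index & ((1ull << order) - 1))
		order = sketch_ffs(index);

	if (order > SKETCH_MAX_ORDER)
		order = SKETCH_MAX_ORDER;

	return order ? (1u << (order + 1)) - 1 : 0;
}
```

Under these assumptions, calling sketch_size_orders(0, 4096) from a test program returns 0, so a PAGE_SIZE write never asks for a large folio, while sketch_size_orders(0, 2 << 20) returns a mask with every order up to 9 set.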

Therefore, I think the shmem_pwrite() function should be changed to 
remove the limitation that the write size cannot exceed PAGE_SIZE.

Something like the following code (untested):
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index ae3343c81a64..97eefb73c5d2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -420,6 +420,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
         struct address_space *mapping = obj->base.filp->f_mapping;
         const struct address_space_operations *aops = mapping->a_ops;
         char __user *user_data = u64_to_user_ptr(arg->data_ptr);
+       size_t chunk = mapping_max_folio_size(mapping);
         u64 remain;
         loff_t pos;
         unsigned int pg;
@@ -463,10 +464,10 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
                 void *data, *vaddr;
                 int err;
                 char __maybe_unused c;
+               size_t offset;

-               len = PAGE_SIZE - pg;
-               if (len > remain)
-                       len = remain;
+               offset = pos & (chunk - 1);
+               len = min(chunk - offset, remain);

                 /* Prefault the user page to reduce potential recursion */
                 err = __get_user(c, user_data);
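
The effect of the length calculation in the untested diff above can be sketched in userspace as follows. This is only an illustration: the sketch_* names are hypothetical, and `chunk` is a plain parameter standing in for whatever mapping_max_folio_size() would return, assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Length of the next write: up to the end of the current chunk-aligned
 * window, but never more than what remains. Mirrors
 *   offset = pos & (chunk - 1);
 *   len = min(chunk - offset, remain);
 */
static uint64_t sketch_next_len(uint64_t pos, uint64_t remain, uint64_t chunk)
{
	uint64_t offset;

	/* The mask below is only valid for a power-of-two chunk */
	assert(chunk != 0 && (chunk & (chunk - 1)) == 0);

	offset = pos & (chunk - 1);
	return (chunk - offset < remain) ? chunk - offset : remain;
}

/* Number of separate writes needed to cover [pos, pos + total) */
static unsigned int sketch_count_writes(uint64_t pos, uint64_t total,
					uint64_t chunk)
{
	unsigned int n = 0;

	while (total) {
		uint64_t len = sketch_next_len(pos, total, chunk);

		pos += len;
		total -= len;
		n++;
	}
	return n;
}
```

For a 2 MiB object written from offset 0, a 4 KiB chunk results in 512 separate writes, each too small to hint at a large folio, whereas a 2 MiB chunk covers it in one write.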


* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-04-30  6:32     ` Baolin Wang
@ 2025-04-30 11:20       ` Ville Syrjälä
  2025-04-30 13:24         ` Daniel Gomez
  0 siblings, 1 reply; 11+ messages in thread
From: Ville Syrjälä @ 2025-04-30 11:20 UTC
  To: Baolin Wang
  Cc: akpm, hughd, willy, david, wangkefeng.wang, 21cnbao, ryan.roberts,
	ioworker0, da.gomez, linux-mm, linux-kernel, regressions,
	intel-gfx, Eero Tamminen

On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
> Hi,
> 
> On 2025/4/30 01:44, Ville Syrjälä wrote:
> > On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
> >> Add large folio support for tmpfs write and fallocate paths matching the
> >> same high order preference mechanism used in the iomap buffered IO path
> >> as used in __filemap_get_folio().
> >>
> >> Add shmem_mapping_size_orders() to get a hint for the orders of the folio
> >> based on the file size which takes care of the mapping requirements.
> >>
> >> Traditionally, tmpfs only supported PMD-sized large folios. However nowadays
> >> with other file systems supporting any sized large folios, and extending
> >> anonymous to support mTHP, we should not restrict tmpfs to allocating only
> >> PMD-sized large folios, making it more special. Instead, we should allow
> >> tmpfs can allocate any sized large folios.
> >>
> >> Considering that tmpfs already has the 'huge=' option to control the PMD-sized
> >> large folios allocation, we can extend the 'huge=' option to allow any sized
> >> large folios. The semantics of the 'huge=' mount option are:
> >>
> >> huge=never: no any sized large folios
> >> huge=always: any sized large folios
> >> huge=within_size: like 'always' but respect the i_size
> >> huge=advise: like 'always' if requested with madvise()
> >>
> >> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
> >> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.
> >>
> >> Moreover, the 'deny' and 'force' testing options controlled by
> >> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
> >> semantics. The 'deny' can disable any sized large folios for tmpfs, while
> >> the 'force' can enable PMD sized large folios for tmpfs.
> >>
> >> Co-developed-by: Daniel Gomez <da.gomez@samsung.com>
> >> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > 
> > Hi,
> > 
> > This causes a huge regression in Intel iGPU texturing performance.
> 
> Unfortunately, I don't have such platform to test it.
> 
> > 
> > I haven't had time to look at this in detail, but presumably the
> > problem is that we're no longer getting huge pages from our
> > private tmpfs mount (done in i915_gemfs_init()).
> 
> IIUC, the i915 driver still limits the maximum write size to PAGE_SIZE 
> in the shmem_pwrite(),

pwrite is just one random way to write to objects, and probably
not something that's even used by current Mesa.

> which prevents tmpfs from allocating large 
> folios. As mentioned in the comments below, tmpfs like other file 
> systems that support large folios, will allow getting a highest order 
> hint based on the size of the write and fallocate paths, and then will 
> attempt each allowable huge order.
> 
> Therefore, I think the shmem_pwrite() function should be changed to 
> remove the limitation that the write size cannot exceed PAGE_SIZE.
> 
> Something like the following code (untested):
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index ae3343c81a64..97eefb73c5d2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -420,6 +420,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
>          struct address_space *mapping = obj->base.filp->f_mapping;
>          const struct address_space_operations *aops = mapping->a_ops;
>          char __user *user_data = u64_to_user_ptr(arg->data_ptr);
> +       size_t chunk = mapping_max_folio_size(mapping);
>          u64 remain;
>          loff_t pos;
>          unsigned int pg;
> @@ -463,10 +464,10 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
>                  void *data, *vaddr;
>                  int err;
>                  char __maybe_unused c;
> +               size_t offset;
> 
> -               len = PAGE_SIZE - pg;
> -               if (len > remain)
> -                       len = remain;
> +               offset = pos & (chunk - 1);
> +               len = min(chunk - offset, remain);
> 
>                  /* Prefault the user page to reduce potential recursion */
>                  err = __get_user(c, user_data);

-- 
Ville Syrjälä
Intel


* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-04-30 11:20       ` Ville Syrjälä
@ 2025-04-30 13:24         ` Daniel Gomez
  2025-05-02  1:02           ` Baolin Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Gomez @ 2025-04-30 13:24 UTC
  To: Ville Syrjälä
  Cc: Baolin Wang, akpm, hughd, willy, david, wangkefeng.wang, 21cnbao,
	ryan.roberts, ioworker0, da.gomez, linux-mm, linux-kernel,
	regressions, intel-gfx, Eero Tamminen

On Wed, Apr 30, 2025 at 02:20:02PM +0100, Ville Syrjälä wrote:
> On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
> > On 2025/4/30 01:44, Ville Syrjälä wrote:
> > > On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
> > > Hi,
> > > 
> > > This causes a huge regression in Intel iGPU texturing performance.
> > 
> > Unfortunately, I don't have such platform to test it.
> > 
> > > 
> > > I haven't had time to look at this in detail, but presumably the
> > > problem is that we're no longer getting huge pages from our
> > > private tmpfs mount (done in i915_gemfs_init()).
> > 
> > IIUC, the i915 driver still limits the maximum write size to PAGE_SIZE 
> > in the shmem_pwrite(),
> 
> pwrite is just one random way to write to objects, and probably
> not something that's even used by current Mesa.
> 
> > which prevents tmpfs from allocating large 
> > folios. As mentioned in the comments below, tmpfs like other file 
> > systems that support large folios, will allow getting a highest order 
> > hint based on the size of the write and fallocate paths, and then will 
> > attempt each allowable huge order.
> > 
> > Therefore, I think the shmem_pwrite() function should be changed to 
> > remove the limitation that the write size cannot exceed PAGE_SIZE.

To enable mTHP on tmpfs, the necessary knobs must first be enabled in sysfs,
as they are not enabled by default IIRC (only THP, PMD level). Ville, I see
that i915_gemfs passes the huge=within_size mount option. Can you confirm
whether /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled are also set
to 'always' when the regression is seen?

Even if these are enabled, the possible difference may be that before, i915 was
using PMD pages (THP) always and now mTHP will be used, unless the file size is
as big as the PMD page. I think the always mount option would also try to infer
the size to actually give a proper order folio according to that size. Baolin,
is that correct?

And Ville, can you confirm if what i915 needs is to enable PMD-size allocations
always?


* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-04-30 13:24         ` Daniel Gomez
@ 2025-05-02  1:02           ` Baolin Wang
  2025-05-02  7:18             ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2025-05-02  1:02 UTC
  To: Daniel Gomez, Ville Syrjälä
  Cc: akpm, hughd, willy, david, wangkefeng.wang, 21cnbao, ryan.roberts,
	ioworker0, da.gomez, linux-mm, linux-kernel, regressions,
	intel-gfx, Eero Tamminen



On 2025/4/30 21:24, Daniel Gomez wrote:
> On Wed, Apr 30, 2025 at 02:20:02PM +0100, Ville Syrjälä wrote:
>> On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
>>> On 2025/4/30 01:44, Ville Syrjälä wrote:
>>>> On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
>>>> Hi,
>>>>
>>>> This causes a huge regression in Intel iGPU texturing performance.
>>>
>>> Unfortunately, I don't have such platform to test it.
>>>
>>>>
>>>> I haven't had time to look at this in detail, but presumably the
>>>> problem is that we're no longer getting huge pages from our
>>>> private tmpfs mount (done in i915_gemfs_init()).
>>>
>>> IIUC, the i915 driver still limits the maximum write size to PAGE_SIZE
>>> in the shmem_pwrite(),
>>
>> pwrite is just one random way to write to objects, and probably
>> not something that's even used by current Mesa.
>>
>>> which prevents tmpfs from allocating large
>>> folios. As mentioned in the comments below, tmpfs like other file
>>> systems that support large folios, will allow getting a highest order
>>> hint based on the size of the write and fallocate paths, and then will
>>> attempt each allowable huge order.
>>>
>>> Therefore, I think the shmem_pwrite() function should be changed to
>>> remove the limitation that the write size cannot exceed PAGE_SIZE.
> 
> To enable mTHP on tmpfs, the necessary knobs must first be enabled in sysfs
> as they are not enabled by default IIRC (only THP, PMD level). Ville, I
> see i915_gemfs the huge=within_size mount option is passed. Can you confirm
> if /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled are also marked as
> 'always' when the regression is found?

The tmpfs mount will not be controlled by
'/sys/kernel/mm/transparent_hugepage/hugepages-*kB/enabled' (except for
the debugging options 'deny' and 'force').

> Even if these are enabled, the possible difference may be that before, i915 was
> using PMD pages (THP) always and now mTHP will be used, unless the file size is
> as big as the PMD page. I think the always mount option would also try to infer
> the size to actually give a proper order folio according to that size. Baolin,
> is that correct?

Right.

> And Ville, can you confirm if what i915 needs is to enable PMD-size allocations
> always?


* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-05-02  1:02           ` Baolin Wang
@ 2025-05-02  7:18             ` David Hildenbrand
  2025-05-02 13:10               ` Daniel Gomez
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-05-02  7:18 UTC
  To: Baolin Wang, Daniel Gomez, Ville Syrjälä
  Cc: akpm, hughd, willy, wangkefeng.wang, 21cnbao, ryan.roberts,
	ioworker0, da.gomez, linux-mm, linux-kernel, regressions,
	intel-gfx, Eero Tamminen

On 02.05.25 03:02, Baolin Wang wrote:
> 
> 
> On 2025/4/30 21:24, Daniel Gomez wrote:
>> On Wed, Apr 30, 2025 at 02:20:02PM +0100, Ville Syrjälä wrote:
>>> On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
>>>> On 2025/4/30 01:44, Ville Syrjälä wrote:
>>>>> On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
>>>>> Hi,
>>>>>
>>>>> This causes a huge regression in Intel iGPU texturing performance.
>>>>
>>>> Unfortunately, I don't have such platform to test it.
>>>>
>>>>>
>>>>> I haven't had time to look at this in detail, but presumably the
>>>>> problem is that we're no longer getting huge pages from our
>>>>> private tmpfs mount (done in i915_gemfs_init()).
>>>>
>>>> IIUC, the i915 driver still limits the maximum write size to PAGE_SIZE
>>>> in the shmem_pwrite(),
>>>
>>> pwrite is just one random way to write to objects, and probably
>>> not something that's even used by current Mesa.
>>>
>>>> which prevents tmpfs from allocating large
>>>> folios. As mentioned in the comments below, tmpfs like other file
>>>> systems that support large folios, will allow getting a highest order
>>>> hint based on the size of the write and fallocate paths, and then will
>>>> attempt each allowable huge order.
>>>>
>>>> Therefore, I think the shmem_pwrite() function should be changed to
>>>> remove the limitation that the write size cannot exceed PAGE_SIZE.
>>
>> To enable mTHP on tmpfs, the necessary knobs must first be enabled in sysfs
>> as they are not enabled by default IIRC (only THP, PMD level). Ville, I
>> see i915_gemfs the huge=within_size mount option is passed. Can you confirm
>> if /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled are also marked as
>> 'always' when the regression is found?
> 
> The tmpfs mount will not be controlled by
> '/sys/kernel/mm/transparent_hugepage/hugepages-*Kb/enabled' (except for
> the debugging options 'deny' and 'force').

Right, IIRC as requested by Willy, it should behave like other FSes 
where there is no control over the folio size to be used.

-- 
Cheers,

David / dhildenb



* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-05-02  7:18             ` David Hildenbrand
@ 2025-05-02 13:10               ` Daniel Gomez
  2025-05-02 15:31                 ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Gomez @ 2025-05-02 13:10 UTC
  To: David Hildenbrand, Baolin Wang
  Cc: Ville Syrjälä, akpm, hughd, willy, wangkefeng.wang,
	21cnbao, ryan.roberts, ioworker0, da.gomez, linux-mm,
	linux-kernel, regressions, intel-gfx, Eero Tamminen

On Fri, May 02, 2025 at 09:18:41AM +0100, David Hildenbrand wrote:
> On 02.05.25 03:02, Baolin Wang wrote:
> > 
> > 
> > On 2025/4/30 21:24, Daniel Gomez wrote:
> > > On Wed, Apr 30, 2025 at 02:20:02PM +0100, Ville Syrjälä wrote:
> > > > On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
> > > > > On 2025/4/30 01:44, Ville Syrjälä wrote:
> > > > > > On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > This causes a huge regression in Intel iGPU texturing performance.
> > > > > 
> > > > > Unfortunately, I don't have such platform to test it.
> > > > > 
> > > > > > 
> > > > > > I haven't had time to look at this in detail, but presumably the
> > > > > > problem is that we're no longer getting huge pages from our
> > > > > > private tmpfs mount (done in i915_gemfs_init()).
> > > > > 
> > > > > IIUC, the i915 driver still limits the maximum write size to PAGE_SIZE
> > > > > in the shmem_pwrite(),
> > > > 
> > > > pwrite is just one random way to write to objects, and probably
> > > > not something that's even used by current Mesa.
> > > > 
> > > > > which prevents tmpfs from allocating large
> > > > > folios. As mentioned in the comments below, tmpfs like other file
> > > > > systems that support large folios, will allow getting a highest order
> > > > > hint based on the size of the write and fallocate paths, and then will
> > > > > attempt each allowable huge order.
> > > > > 
> > > > > Therefore, I think the shmem_pwrite() function should be changed to
> > > > > remove the limitation that the write size cannot exceed PAGE_SIZE.
> > > 
> > > To enable mTHP on tmpfs, the necessary knobs must first be enabled in sysfs
> > > as they are not enabled by default IIRC (only THP, PMD level). Ville, I
> > > see i915_gemfs the huge=within_size mount option is passed. Can you confirm
> > > if /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled are also marked as
> > > 'always' when the regression is found?
> > 
> > The tmpfs mount will not be controlled by
> > '/sys/kernel/mm/transparent_hugepage/hugepages-*Kb/enabled' (except for
> > the debugging options 'deny' and 'force').
> 
> Right, IIRC as requested by Willy, it should behave like other FSes where
> there is no control over the folio size to be used.

Thanks for reminding me. I forgot we finally changed it.

Could the performance drop be due to the driver no longer using PMD-level pages?

I also recall a performance drop when using order-8 and order-9 folios in tmpfs
with the initial per-block implementation. Baolin, did you experience anything
similar in the final implementation?

These were my numbers:

| Block Size (bs) | Linux Kernel v6.9 (GiB/s) | tmpfs with Large Folios v6.9 (GiB/s) |
|-----------------|---------------------------|--------------------------------------|
| 4k              | 20.4                      | 20.5                                 |
| 8k              | 34.3                      | 34.3                                 |
| 16k             | 52.9                      | 52.2                                 |
| 32k             | 70.2                      | 76.9                                 |
| 64k             | 73.9                      | 92.5                                 |
| 128k            | 76.7                      | 101                                  |
| 256k            | 80.5                      | 114                                  |
| 512k            | 80.3                      | 132                                  |
| 1M              | 78.5                      | 75.2                                 |
| 2M              | 65.7                      | 47.1                                 |

> 
> -- 
> Cheers,
> 
> David / dhildenb
> 


* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-05-02 13:10               ` Daniel Gomez
@ 2025-05-02 15:31                 ` David Hildenbrand
  2025-05-06  3:33                   ` Baolin Wang
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-05-02 15:31 UTC
  To: Daniel Gomez, Baolin Wang
  Cc: Ville Syrjälä, akpm, hughd, willy, wangkefeng.wang,
	21cnbao, ryan.roberts, ioworker0, da.gomez, linux-mm,
	linux-kernel, regressions, intel-gfx, Eero Tamminen

On 02.05.25 15:10, Daniel Gomez wrote:
> On Fri, May 02, 2025 at 09:18:41AM +0100, David Hildenbrand wrote:
>> On 02.05.25 03:02, Baolin Wang wrote:
>>>
>>>
>>> On 2025/4/30 21:24, Daniel Gomez wrote:
>>>> On Wed, Apr 30, 2025 at 02:20:02PM +0100, Ville Syrjälä wrote:
>>>>> On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
>>>>>> On 2025/4/30 01:44, Ville Syrjälä wrote:
>>>>>>> On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> This causes a huge regression in Intel iGPU texturing performance.
>>>>>>
>>>>>> Unfortunately, I don't have such platform to test it.
>>>>>>
>>>>>>>
>>>>>>> I haven't had time to look at this in detail, but presumably the
>>>>>>> problem is that we're no longer getting huge pages from our
>>>>>>> private tmpfs mount (done in i915_gemfs_init()).
>>>>>>
>>>>>> IIUC, the i915 driver still limits the maximum write size to PAGE_SIZE
>>>>>> in the shmem_pwrite(),
>>>>>
>>>>> pwrite is just one random way to write to objects, and probably
>>>>> not something that's even used by current Mesa.
>>>>>
>>>>>> which prevents tmpfs from allocating large
>>>>>> folios. As mentioned in the comments below, tmpfs like other file
>>>>>> systems that support large folios, will allow getting a highest order
>>>>>> hint based on the size of the write and fallocate paths, and then will
>>>>>> attempt each allowable huge order.
>>>>>>
>>>>>> Therefore, I think the shmem_pwrite() function should be changed to
>>>>>> remove the limitation that the write size cannot exceed PAGE_SIZE.
>>>>
>>>> To enable mTHP on tmpfs, the necessary knobs must first be enabled in sysfs
>>>> as they are not enabled by default IIRC (only THP, PMD level). Ville, I
>>>> see i915_gemfs the huge=within_size mount option is passed. Can you confirm
>>>> if /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled are also marked as
>>>> 'always' when the regression is found?
>>>
>>> The tmpfs mount will not be controlled by
>>> '/sys/kernel/mm/transparent_hugepage/hugepages-*Kb/enabled' (except for
>>> the debugging options 'deny' and 'force').
>>
>> Right, IIRC as requested by Willy, it should behave like other FSes where
>> there is no control over the folio size to be used.
> 
> Thanks for reminding me. I forgot we finally changed it.
> 
> Could the performance drop be due to the driver no longer using PMD-level pages?

I suspect that the faulting logic will now go to a smaller order first, 
indeed.

... trying to digest shmem_allowable_huge_orders() and 
shmem_huge_global_enabled(), having a hard time trying to isolate the 
tmpfs case: especially, if we run here into the vma vs. !vma case.

Without a VMA, I think we should have "tmpfs will allow getting a highest 
order hint based on the size of write and fallocate paths, then will try 
each allowable order".

With a VMA (no access hint), "we still use PMD-sized order to locate 
huge pages due to lack of a write size hint."

So if we get a fallocate()/write() that is, say, 1 MiB, we'd now 
allocate a 1 MiB folio instead of a 2 MiB one.
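
As a rough illustration of that arithmetic, here is a user-space sketch
in Python. It is not the kernel code: the function names, the 4 KiB
base page size and the 2 MiB PMD size (PMD_ORDER = 9) are assumptions
made for the example, loosely following the "highest order hint based
on the size of write and fallocate paths" description.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB base pages
PMD_ORDER = 9     # assumption: 2 MiB PMD-sized folios (e.g. x86-64)


def size_hint_order(index, write_end, max_order=PMD_ORDER):
    """Hypothetical helper: highest folio order suggested by a
    write()/fallocate() that starts at page `index` and ends at byte
    offset `write_end`."""
    size = write_end - (index << PAGE_SHIFT)
    if size <= 0:
        return 0
    # floor(log2(size)) - PAGE_SHIFT, clamped at order 0
    order = max(size.bit_length() - 1 - PAGE_SHIFT, 0)
    if order == 0:
        return 0
    # A start index that is not aligned to the folio size forces a
    # smaller folio: fall back to the lowest set bit of the index.
    if index & ((1 << order) - 1):
        order = (index & -index).bit_length() - 1
    return min(order, max_order)


def mmap_fault_order():
    """Hypothetical helper: with a VMA there is no write size hint, so
    the PMD-sized order is used."""
    return PMD_ORDER


def folio_bytes(order):
    """Size in bytes of a folio of the given order."""
    return 1 << (order + PAGE_SHIFT)
```

Under these assumptions, size_hint_order(0, 1 << 20) gives order 8
(a 1 MiB folio), while size_hint_order(0, 2 << 20) and
mmap_fault_order() both give order 9 (2 MiB).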

-- 
Cheers,

David / dhildenb



* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-05-02 15:31                 ` David Hildenbrand
@ 2025-05-06  3:33                   ` Baolin Wang
  2025-05-06 14:36                     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2025-05-06  3:33 UTC (permalink / raw)
  To: David Hildenbrand, Daniel Gomez
  Cc: Ville Syrjälä, akpm, hughd, willy, wangkefeng.wang,
	21cnbao, ryan.roberts, ioworker0, da.gomez, linux-mm,
	linux-kernel, regressions, intel-gfx, Eero Tamminen



On 2025/5/2 23:31, David Hildenbrand wrote:
> On 02.05.25 15:10, Daniel Gomez wrote:
>> On Fri, May 02, 2025 at 09:18:41AM +0100, David Hildenbrand wrote:
>>> On 02.05.25 03:02, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2025/4/30 21:24, Daniel Gomez wrote:
>>>>> On Wed, Apr 30, 2025 at 02:20:02PM +0100, Ville Syrjälä wrote:
>>>>>> On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
>>>>>>> On 2025/4/30 01:44, Ville Syrjälä wrote:
>>>>>>>> On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This causes a huge regression in Intel iGPU texturing performance.
>>>>>>>
>>>>>>> Unfortunately, I don't have such platform to test it.
>>>>>>>
>>>>>>>>
>>>>>>>> I haven't had time to look at this in detail, but presumably the
>>>>>>>> problem is that we're no longer getting huge pages from our
>>>>>>>> private tmpfs mount (done in i915_gemfs_init()).
>>>>>>>
>>>>>>> IIUC, the i915 driver still limits the maximum write size to 
>>>>>>> PAGE_SIZE
>>>>>>> in the shmem_pwrite(),
>>>>>>
>>>>>> pwrite is just one random way to write to objects, and probably
>>>>>> not something that's even used by current Mesa.
>>>>>>
>>>>>>> which prevents tmpfs from allocating large
>>>>>>> folios. As mentioned in the comments below, tmpfs like other file
>>>>>>> systems that support large folios, will allow getting a highest 
>>>>>>> order
>>>>>>> hint based on the size of the write and fallocate paths, and then 
>>>>>>> will
>>>>>>> attempt each allowable huge order.
>>>>>>>
>>>>>>> Therefore, I think the shmem_pwrite() function should be changed to
>>>>>>> remove the limitation that the write size cannot exceed PAGE_SIZE.
>>>>>
>>>>> To enable mTHP on tmpfs, the necessary knobs must first be enabled 
>>>>> in sysfs
>>>>> as they are not enabled by default IIRC (only THP, PMD level). 
>>>>> Ville, I
>>>>> see i915_gemfs the huge=within_size mount option is passed. Can you 
>>>>> confirm
>>>>> if /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled are also 
>>>>> marked as
>>>>> 'always' when the regression is found?
>>>>
>>>> The tmpfs mount will not be controlled by
>>>> '/sys/kernel/mm/transparent_hugepage/hugepages-*Kb/enabled' (except for
>>>> the debugging options 'deny' and 'force').
>>>
>>> Right, IIRC as requested by Willy, it should behave like other FSes 
>>> where
>>> there is no control over the folio size to be used.
>>
>> Thanks for reminding me. I forgot we finally changed it.
>>
>> Could the performance drop be due to the driver no longer using 
>> PMD-level pages?
> 
> I suspect that the faulting logic will now go to a smaller order first, 
> indeed.
> 
> ... trying to digest shmem_allowable_huge_orders() and 
> shmem_huge_global_enabled(), having a hard time trying to isolate the 
> tmpfs case: especially, if we run here into the vma vs. !vma case.
> 
> Without a VMA, I think we should have "tmpfs will allow getting a highest 
> order hint based on the size of write and fallocate paths, then will try 
> each allowable order".
> 
> With a VMA (no access hint), "we still use PMD-sized order to locate 
> huge pages due to lack of a write size hint."
> 
> So if we get a fallocate()/write() that is, say, 1 MiB, we'd now 
> allocate a 1 MiB folio instead of a 2 MiB one.

Right.

So I asked Ville how the shmem folios are allocated in the i915 driver, 
and to see if we can make some improvements.


* Re: [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
  2025-05-06  3:33                   ` Baolin Wang
@ 2025-05-06 14:36                     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2025-05-06 14:36 UTC (permalink / raw)
  To: Baolin Wang, Daniel Gomez
  Cc: Ville Syrjälä, akpm, hughd, willy, wangkefeng.wang,
	21cnbao, ryan.roberts, ioworker0, da.gomez, linux-mm,
	linux-kernel, regressions, intel-gfx, Eero Tamminen

On 06.05.25 05:33, Baolin Wang wrote:
> 
> 
> On 2025/5/2 23:31, David Hildenbrand wrote:
>> On 02.05.25 15:10, Daniel Gomez wrote:
>>> On Fri, May 02, 2025 at 09:18:41AM +0100, David Hildenbrand wrote:
>>>> On 02.05.25 03:02, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2025/4/30 21:24, Daniel Gomez wrote:
>>>>>> On Wed, Apr 30, 2025 at 02:20:02PM +0100, Ville Syrjälä wrote:
>>>>>>> On Wed, Apr 30, 2025 at 02:32:39PM +0800, Baolin Wang wrote:
>>>>>>>> On 2025/4/30 01:44, Ville Syrjälä wrote:
>>>>>>>>> On Thu, Nov 28, 2024 at 03:40:41PM +0800, Baolin Wang wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This causes a huge regression in Intel iGPU texturing performance.
>>>>>>>>
>>>>>>>> Unfortunately, I don't have such platform to test it.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I haven't had time to look at this in detail, but presumably the
>>>>>>>>> problem is that we're no longer getting huge pages from our
>>>>>>>>> private tmpfs mount (done in i915_gemfs_init()).
>>>>>>>>
>>>>>>>> IIUC, the i915 driver still limits the maximum write size to
>>>>>>>> PAGE_SIZE
>>>>>>>> in the shmem_pwrite(),
>>>>>>>
>>>>>>> pwrite is just one random way to write to objects, and probably
>>>>>>> not something that's even used by current Mesa.
>>>>>>>
>>>>>>>> which prevents tmpfs from allocating large
>>>>>>>> folios. As mentioned in the comments below, tmpfs like other file
>>>>>>>> systems that support large folios, will allow getting a highest
>>>>>>>> order
>>>>>>>> hint based on the size of the write and fallocate paths, and then
>>>>>>>> will
>>>>>>>> attempt each allowable huge order.
>>>>>>>>
>>>>>>>> Therefore, I think the shmem_pwrite() function should be changed to
>>>>>>>> remove the limitation that the write size cannot exceed PAGE_SIZE.
>>>>>>
>>>>>> To enable mTHP on tmpfs, the necessary knobs must first be enabled
>>>>>> in sysfs
>>>>>> as they are not enabled by default IIRC (only THP, PMD level).
>>>>>> Ville, I
>>>>>> see i915_gemfs the huge=within_size mount option is passed. Can you
>>>>>> confirm
>>>>>> if /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled are also
>>>>>> marked as
>>>>>> 'always' when the regression is found?
>>>>>
>>>>> The tmpfs mount will not be controlled by
>>>>> '/sys/kernel/mm/transparent_hugepage/hugepages-*Kb/enabled' (except for
>>>>> the debugging options 'deny' and 'force').
>>>>
>>>> Right, IIRC as requested by Willy, it should behave like other FSes
>>>> where
>>>> there is no control over the folio size to be used.
>>>
>>> Thanks for reminding me. I forgot we finally changed it.
>>>
>>> Could the performance drop be due to the driver no longer using
>>> PMD-level pages?
>>
>> I suspect that the faulting logic will now go to a smaller order first,
>> indeed.
>>
>> ... trying to digest shmem_allowable_huge_orders() and
>> shmem_huge_global_enabled(), having a hard time trying to isolate the
>> tmpfs case: especially, if we run here into the vma vs. !vma case.
>>
>> Without a VMA, I think we should have "tmpfs will allow getting a highest
>> order hint based on the size of write and fallocate paths, then will try
>> each allowable order".
>>
>> With a VMA (no access hint), "we still use PMD-sized order to locate
>> huge pages due to lack of a write size hint."
>>
>> So if we get a fallocate()/write() that is, say, 1 MiB, we'd now
>> allocate a 1 MiB folio instead of a 2 MiB one.
> 
> Right.
> 
> So I asked Ville how the shmem folios are allocated in the i915 driver,
> and to see if we can make some improvements.

Maybe preallocation (using fallocate) might be reasonable for their use 
case: if they know they will consume all that memory either way. If it's 
sparse, it's more problematic.

-- 
Cheers,

David / dhildenb



* ✗ Fi.CI.BUILD: failure for Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
       [not found] <cover.1732779148.git.baolin.wang@linux.alibaba.com>
       [not found] ` <035bf55fbdebeff65f5cb2cdb9907b7d632c3228.1732779148.git.baolin.wang@linux.alibaba.com>
@ 2025-04-30  7:02 ` Patchwork
  1 sibling, 0 replies; 11+ messages in thread
From: Patchwork @ 2025-04-30  7:02 UTC (permalink / raw)
  To: Baolin Wang; +Cc: intel-gfx

== Series Details ==

Series: Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs
URL   : https://patchwork.freedesktop.org/series/148465/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/148465/revisions/1/mbox/ not applied
Applying: mm: shmem: add large folio support for tmpfs
error: git diff header lacks filename information when removing 1 leading pathname component (line 2)
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 mm: shmem: add large folio support for tmpfs
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Build failed, no error log produced


end of thread, other threads:[~2025-05-06 14:36 UTC | newest]

Thread overview: 11+ messages
     [not found] <cover.1732779148.git.baolin.wang@linux.alibaba.com>
     [not found] ` <035bf55fbdebeff65f5cb2cdb9907b7d632c3228.1732779148.git.baolin.wang@linux.alibaba.com>
2025-04-29 17:44   ` [REGRESSION] Re: [PATCH v3 3/6] mm: shmem: add large folio support for tmpfs Ville Syrjälä
2025-04-30  6:32     ` Baolin Wang
2025-04-30 11:20       ` Ville Syrjälä
2025-04-30 13:24         ` Daniel Gomez
2025-05-02  1:02           ` Baolin Wang
2025-05-02  7:18             ` David Hildenbrand
2025-05-02 13:10               ` Daniel Gomez
2025-05-02 15:31                 ` David Hildenbrand
2025-05-06  3:33                   ` Baolin Wang
2025-05-06 14:36                     ` David Hildenbrand
2025-04-30  7:02 ` ✗ Fi.CI.BUILD: failure for " Patchwork
