* [PATCH v3] mm: shmem: always support large folios for internal shmem mount
@ 2026-04-17 3:25 Baolin Wang
2026-04-17 9:21 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2026-04-17 3:25 UTC (permalink / raw)
To: akpm, hughd
Cc: willy, ziy, david, ljs, lance.yang, baolin.wang, linux-mm,
linux-kernel
Currently, when a shmem mount is initialized, only 'sbinfo->huge' is used to
determine whether the mount supports large folios. However, for anonymous
shmem, large folio support can be configured dynamically via the sysfs
interfaces, so whether or not mapping_set_large_folios() was called during
initialization cannot accurately reflect whether anonymous shmem actually
supports large folios, which has already caused some confusion[1].
As discussed with David[2], anonymous shmem can be treated as always
potentially having large folios. Therefore, always support large folios for
the internal shmem mount (e.g., anonymous shmem); which large order
allocations are actually allowed can still be configured dynamically via the
'shmem_enabled' interfaces.
[1] https://lore.kernel.org/all/ec927492-4577-4192-8fad-85eb1bb43121@linux.alibaba.com/
[2] https://lore.kernel.org/all/875dc63b-0cd2-49e5-8b0d-3fb062789813@kernel.org/
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
Changes from v2:
- Always support large folios for internal shmem mount, per David.
Changes from v1:
- Update the comments and commit message, per Lance.
---
mm/shmem.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 4ecefe02881d..769ef37b1ea9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3088,8 +3088,17 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
if (sbinfo->noswap)
mapping_set_unevictable(inode->i_mapping);
- /* Don't consider 'deny' for emergencies and 'force' for testing */
- if (sbinfo->huge)
+ /*
+ * Always support large folios for the internal shmem mount (e.g.,
+ * anonymous shmem), and which large order allocations are allowed
+ * can be configured dynamically via the 'shmem_enabled' interfaces.
+ *
+ * For tmpfs, honour the 'huge=' mount option to determine whether
+ * large folios are supported.
+ *
+ * Note: don't consider 'deny' for emergencies and 'force' for testing.
+ */
+ if (sbinfo->huge || (sb->s_flags & SB_KERNMOUNT))
mapping_set_large_folios(inode->i_mapping);
switch (mode & S_IFMT) {
--
2.47.3
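The semantics of the new check can be sketched as a small user-space model.
This is illustrative only, not the kernel code: the helper name is invented,
and the SB_KERNMOUNT value matches include/linux/fs.h at the time of writing
but should be treated as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the patched condition in __shmem_get_inode(). */
#define SB_KERNMOUNT 0x400000UL	/* kern_mount(), i.e. an internal mount */

static bool wants_large_folios(int sbinfo_huge, unsigned long s_flags)
{
	/* sbinfo->huge covers tmpfs 'huge=' settings other than 'never';
	 * internal kernel mounts (anonymous shmem) always qualify. */
	return sbinfo_huge || (s_flags & SB_KERNMOUNT);
}
```

With this, a tmpfs inode created under 'huge=never' gets no large folios,
while an anonymous shmem inode always does, regardless of sbinfo->huge.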
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-17 3:25 [PATCH v3] mm: shmem: always support large folios for internal shmem mount Baolin Wang
@ 2026-04-17 9:21 ` David Hildenbrand (Arm)
2026-04-17 9:27 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-17 9:21 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel
On 4/17/26 05:25, Baolin Wang wrote:
> Currently, when shmem mounts are initialized, they only use 'sbinfo->huge' to
> determine whether the shmem mount supports large folios. However, for anonymous
> shmem, whether it supports large folios can be dynamically configured via sysfs
> interfaces, so setting or not setting mapping_set_large_folios() during initialization
> cannot accurately reflect whether anonymous shmem actually supports large folios,
> which has already caused some confusion[1].
>
> As discussed with David[2], for anonymous shmem we can treat it as always potentially
> having large folios. Therefore, always support large folios for the internal shmem
> mount (e.g., anonymous shmem), and which large order allocations are allowed can be
> configured dynamically via the 'shmem_enabled' interfaces.
>
> [1] https://lore.kernel.org/all/ec927492-4577-4192-8fad-85eb1bb43121@linux.alibaba.com/
> [2] https://lore.kernel.org/all/875dc63b-0cd2-49e5-8b0d-3fb062789813@kernel.org/
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> Changes from v2:
> - Always support large folios for internal shmem mount, per David.
> Changes from v1:
> - Update the comments and commit message, per Lance.
> ---
> mm/shmem.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4ecefe02881d..769ef37b1ea9 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3088,8 +3088,17 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
> if (sbinfo->noswap)
> mapping_set_unevictable(inode->i_mapping);
>
> - /* Don't consider 'deny' for emergencies and 'force' for testing */
> - if (sbinfo->huge)
> + /*
> + * Always support large folios for the internal shmem mount (e.g.,
> + * anonymous shmem), and which large order allocations are allowed
> + * can be configured dynamically via the 'shmem_enabled' interfaces.
> + *
> + * For tmpfs, honour the 'huge=' mount option to determine whether
> + * large folios are supported.
> + *
> + * Note: don't consider 'deny' for emergencies and 'force' for testing.
> + */
> + if (sbinfo->huge || (sb->s_flags & SB_KERNMOUNT))
> mapping_set_large_folios(inode->i_mapping);
Two questions from a non-fs person about the semantics here:
a) Can sbinfo->huge be triggered later, for example, through a remount
(staring at shmem_reconfigure())
b) Do we cover all cases with the SB_KERNMOUNT where sbinfo->huge cannot
be changed later?
--
Cheers,
David
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-17 9:21 ` David Hildenbrand (Arm)
@ 2026-04-17 9:27 ` Baolin Wang
2026-04-17 9:52 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2026-04-17 9:27 UTC (permalink / raw)
To: David Hildenbrand (Arm), akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel
On 4/17/26 5:21 PM, David Hildenbrand (Arm) wrote:
> On 4/17/26 05:25, Baolin Wang wrote:
>> Currently, when shmem mounts are initialized, they only use 'sbinfo->huge' to
>> determine whether the shmem mount supports large folios. However, for anonymous
>> shmem, whether it supports large folios can be dynamically configured via sysfs
>> interfaces, so setting or not setting mapping_set_large_folios() during initialization
>> cannot accurately reflect whether anonymous shmem actually supports large folios,
>> which has already caused some confusion[1].
>>
>> As discussed with David[2], for anonymous shmem we can treat it as always potentially
>> having large folios. Therefore, always support large folios for the internal shmem
>> mount (e.g., anonymous shmem), and which large order allocations are allowed can be
>> configured dynamically via the 'shmem_enabled' interfaces.
>>
>> [1] https://lore.kernel.org/all/ec927492-4577-4192-8fad-85eb1bb43121@linux.alibaba.com/
>> [2] https://lore.kernel.org/all/875dc63b-0cd2-49e5-8b0d-3fb062789813@kernel.org/
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> Changes from v2:
>> - Always support large folios for internal shmem mount, per David.
>> Changes from v1:
>> - Update the comments and commit message, per Lance.
>> ---
>> mm/shmem.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 4ecefe02881d..769ef37b1ea9 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -3088,8 +3088,17 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>> if (sbinfo->noswap)
>> mapping_set_unevictable(inode->i_mapping);
>>
>> - /* Don't consider 'deny' for emergencies and 'force' for testing */
>> - if (sbinfo->huge)
>> + /*
>> + * Always support large folios for the internal shmem mount (e.g.,
>> + * anonymous shmem), and which large order allocations are allowed
>> + * can be configured dynamically via the 'shmem_enabled' interfaces.
>> + *
>> + * For tmpfs, honour the 'huge=' mount option to determine whether
>> + * large folios are supported.
>> + *
>> + * Note: don't consider 'deny' for emergencies and 'force' for testing.
>> + */
>> + if (sbinfo->huge || (sb->s_flags & SB_KERNMOUNT))
>> mapping_set_large_folios(inode->i_mapping);
>
> Two questions from a non-fs person about the semantics here:
>
> a) Can sbinfo->huge be triggered later, for example, through a remount
> (staring at shmem_reconfigure())
For tmpfs, yes.
> b) Do we cover all cases with the SB_KERNMOUNT where sbinfo->huge cannot
> be changed later?
For mounts with the SB_KERNMOUNT flag set, which is essentially the
internal shmem mount, we don't care about sbinfo->huge, because, as we
discussed, the internal shmem mount is always considered as potentially
having large folios.
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-17 9:27 ` Baolin Wang
@ 2026-04-17 9:52 ` David Hildenbrand (Arm)
2026-04-17 12:45 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-17 9:52 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel
On 4/17/26 11:27, Baolin Wang wrote:
>
>
> On 4/17/26 5:21 PM, David Hildenbrand (Arm) wrote:
>> On 4/17/26 05:25, Baolin Wang wrote:
>>> Currently, when shmem mounts are initialized, they only use 'sbinfo-
>>> >huge' to
>>> determine whether the shmem mount supports large folios. However, for
>>> anonymous
>>> shmem, whether it supports large folios can be dynamically configured
>>> via sysfs
>>> interfaces, so setting or not setting mapping_set_large_folios()
>>> during initialization
>>> cannot accurately reflect whether anonymous shmem actually supports
>>> large folios,
>>> which has already caused some confusion[1].
>>>
>>> As discussed with David[2], for anonymous shmem we can treat it as
>>> always potentially
>>> having large folios. Therefore, always support large folios for the
>>> internal shmem
>>> mount (e.g., anonymous shmem), and which large order allocations are
>>> allowed can be
>>> configured dynamically via the 'shmem_enabled' interfaces.
>>>
>>> [1] https://lore.kernel.org/all/
>>> ec927492-4577-4192-8fad-85eb1bb43121@linux.alibaba.com/
>>> [2] https://lore.kernel.org/
>>> all/875dc63b-0cd2-49e5-8b0d-3fb062789813@kernel.org/
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>> Changes from v2:
>>> - Always support large folios for internal shmem mount, per David.
>>> Changes from v1:
>>> - Update the comments and commit message, per Lance.
>>> ---
>>> mm/shmem.c | 13 +++++++++++--
>>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 4ecefe02881d..769ef37b1ea9 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -3088,8 +3088,17 @@ static struct inode *__shmem_get_inode(struct
>>> mnt_idmap *idmap,
>>> if (sbinfo->noswap)
>>> mapping_set_unevictable(inode->i_mapping);
>>> - /* Don't consider 'deny' for emergencies and 'force' for
>>> testing */
>>> - if (sbinfo->huge)
>>> + /*
>>> + * Always support large folios for the internal shmem mount (e.g.,
>>> + * anonymous shmem), and which large order allocations are allowed
>>> + * can be configured dynamically via the 'shmem_enabled'
>>> interfaces.
>>> + *
>>> + * For tmpfs, honour the 'huge=' mount option to determine whether
>>> + * large folios are supported.
>>> + *
>>> + * Note: don't consider 'deny' for emergencies and 'force' for
>>> testing.
>>> + */
>>> + if (sbinfo->huge || (sb->s_flags & SB_KERNMOUNT))
>>> mapping_set_large_folios(inode->i_mapping);
>>
>> Two questions from a non-fs person about the semantics here:
>>
>> a) Can sbinfo->huge be triggered later, for example, through a remount
>> (staring at shmem_reconfigure())
>
> For tmpfs, yes.
So, we could pass this check here without setting
mapping_set_large_folios(), but later someone toggles it and we have
missed setting mapping_set_large_folios()?
Or would we always go through another __shmem_get_inode() after a remount?
--
Cheers,
David
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-17 9:52 ` David Hildenbrand (Arm)
@ 2026-04-17 12:45 ` Baolin Wang
2026-04-20 19:00 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2026-04-17 12:45 UTC (permalink / raw)
To: David Hildenbrand (Arm), akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel
On 4/17/26 5:52 PM, David Hildenbrand (Arm) wrote:
> On 4/17/26 11:27, Baolin Wang wrote:
>>
>>
>> On 4/17/26 5:21 PM, David Hildenbrand (Arm) wrote:
>>> On 4/17/26 05:25, Baolin Wang wrote:
>>>> Currently, when shmem mounts are initialized, they only use 'sbinfo-
>>>>> huge' to
>>>> determine whether the shmem mount supports large folios. However, for
>>>> anonymous
>>>> shmem, whether it supports large folios can be dynamically configured
>>>> via sysfs
>>>> interfaces, so setting or not setting mapping_set_large_folios()
>>>> during initialization
>>>> cannot accurately reflect whether anonymous shmem actually supports
>>>> large folios,
>>>> which has already caused some confusion[1].
>>>>
>>>> As discussed with David[2], for anonymous shmem we can treat it as
>>>> always potentially
>>>> having large folios. Therefore, always support large folios for the
>>>> internal shmem
>>>> mount (e.g., anonymous shmem), and which large order allocations are
>>>> allowed can be
>>>> configured dynamically via the 'shmem_enabled' interfaces.
>>>>
>>>> [1] https://lore.kernel.org/all/
>>>> ec927492-4577-4192-8fad-85eb1bb43121@linux.alibaba.com/
>>>> [2] https://lore.kernel.org/
>>>> all/875dc63b-0cd2-49e5-8b0d-3fb062789813@kernel.org/
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>> Changes from v2:
>>>> - Always support large folios for internal shmem mount, per David.
>>>> Changes from v1:
>>>> - Update the comments and commit message, per Lance.
>>>> ---
>>>> mm/shmem.c | 13 +++++++++++--
>>>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 4ecefe02881d..769ef37b1ea9 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -3088,8 +3088,17 @@ static struct inode *__shmem_get_inode(struct
>>>> mnt_idmap *idmap,
>>>> if (sbinfo->noswap)
>>>> mapping_set_unevictable(inode->i_mapping);
>>>> - /* Don't consider 'deny' for emergencies and 'force' for
>>>> testing */
>>>> - if (sbinfo->huge)
>>>> + /*
>>>> + * Always support large folios for the internal shmem mount (e.g.,
>>>> + * anonymous shmem), and which large order allocations are allowed
>>>> + * can be configured dynamically via the 'shmem_enabled'
>>>> interfaces.
>>>> + *
>>>> + * For tmpfs, honour the 'huge=' mount option to determine whether
>>>> + * large folios are supported.
>>>> + *
>>>> + * Note: don't consider 'deny' for emergencies and 'force' for
>>>> testing.
>>>> + */
>>>> + if (sbinfo->huge || (sb->s_flags & SB_KERNMOUNT))
>>>> mapping_set_large_folios(inode->i_mapping);
>>>
>>> Two questions from a non-fs person about the semantics here:
>>>
>>> a) Can sbinfo->huge be triggered later, for example, through a remount
>>> (staring at shmem_reconfigure())
>>
>> For tmpfs, yes.
>
> So, we could pass this check here, not setting
> mapping_set_large_folios(), but later someone toggles it and we missed
> to set mapping_set_large_folios()?
Indeed. Good point.
>
> Or would we always go through another __shmem_get_inode() after a remount?
Not really. There could be files created before remount whose mappings
don't support large folios (with 'huge=never' option), while files
created after remount will have mappings that support large folios (if
remounted with 'huge=always' option).
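The inconsistency described above can be sketched as a toy model (all names
are invented, not kernel API): each inode's mapping snapshots the mount's
'huge' setting at creation time, so a remount only affects inodes created
afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the superblock holds the mount-wide 'huge' setting, and the
 * mapping flag is copied from it once, when the inode is created. */
struct sb_model { bool huge; };
struct mapping_model { bool large_folios; };

static struct mapping_model model_create_inode(const struct sb_model *sb)
{
	struct mapping_model m = { .large_folios = sb->huge };
	return m;
}
```

A file created under 'huge=never' keeps large_folios == false even after a
remount to 'huge=always', while files created after the remount get true.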
It looks like the previous commit 5a90c155defa was also problematic. The
huge mount option has introduced a lot of tricky issues:(
Now I think Zi's previous suggestion should be able to clean up this
mess? That is, calling mapping_set_large_folios() unconditionally for
all shmem mounts, and revisiting Kefeng's first version to fix the
performance issue.
[1]
https://lore.kernel.org/all/20240914140613.2334139-1-wangkefeng.wang@huawei.com/
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-17 12:45 ` Baolin Wang
@ 2026-04-20 19:00 ` David Hildenbrand (Arm)
2026-04-21 6:27 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-20 19:00 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel
On 4/17/26 14:45, Baolin Wang wrote:
>
>
> On 4/17/26 5:52 PM, David Hildenbrand (Arm) wrote:
>> On 4/17/26 11:27, Baolin Wang wrote:
>>>
>>>
>>>
>>> For tmpfs, yes.
>>
>> So, we could pass this check here, not setting
>> mapping_set_large_folios(), but later someone toggles it and we missed
>> to set mapping_set_large_folios()?
>
> Indeed. Good point.
>
>>
>> Or would we always go through another __shmem_get_inode() after a
>> remount?
>
> Not really. There could be files created before remount whose mappings
> don't support large folios (with 'huge=never' option), while files
> created after remount will have mappings that support large folios (if
> remounted with 'huge=always' option).
>
> It looks like the previous commit 5a90c155defa was also problematic. The
> huge mount option has introduced a lot of tricky issues:(
>
> Now I think Zi's previous suggestion should be able to clean up this
> mess? That is, calling mapping_set_large_folios() unconditionally for
> all shmem mounts, and revisiting Kefeng's first version to fix the
> performance issue.
Okay, so you'll send a patch to just set mapping_set_large_folios()
unconditionally?
>
> [1] https://lore.kernel.org/all/20240914140613.2334139-1-
> wangkefeng.wang@huawei.com/
Is that really required? Which call path would be the problematic bit
with the above?
I'd say, we'd check in the large folio allocation code whether ->huge is
set to never instead?
--
Cheers,
David
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-20 19:00 ` David Hildenbrand (Arm)
@ 2026-04-21 6:27 ` Baolin Wang
2026-04-21 13:39 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2026-04-21 6:27 UTC (permalink / raw)
To: David Hildenbrand (Arm), akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel
On 4/21/26 3:00 AM, David Hildenbrand (Arm) wrote:
> On 4/17/26 14:45, Baolin Wang wrote:
>>
>>
>> On 4/17/26 5:52 PM, David Hildenbrand (Arm) wrote:
>>> On 4/17/26 11:27, Baolin Wang wrote:
>>>>
>>>>
>>>>
>>>> For tmpfs, yes.
>>>
>>> So, we could pass this check here, not setting
>>> mapping_set_large_folios(), but later someone toggles it and we missed
>>> to set mapping_set_large_folios()?
>>
>> Indeed. Good point.
>>
>>>
>>> Or would we always go through another __shmem_get_inode() after a
>>> remount?
>>
>> Not really. There could be files created before remount whose mappings
>> don't support large folios (with 'huge=never' option), while files
>> created after remount will have mappings that support large folios (if
>> remounted with 'huge=always' option).
>>
>> It looks like the previous commit 5a90c155defa was also problematic. The
>> huge mount option has introduced a lot of tricky issues:(
>>
>> Now I think Zi's previous suggestion should be able to clean up this
>> mess? That is, calling mapping_set_large_folios() unconditionally for
>> all shmem mounts, and revisiting Kefeng's first version to fix the
>> performance issue.
>
> Okay, so you'll send a patch to just set mapping_set_large_folios()
> unconditionally?
I'm still hesitating on this. If we set mapping_set_large_folios()
unconditionally, we need to re-fix the performance regression that was
addressed by commit 5a90c155defa.
But it's hard for me to convince myself to add a new flag similar to
IOCB_NO_LARGE_CHUNK for this hack (like the patch in [1] does).
>> [1] https://lore.kernel.org/all/20240914140613.2334139-1-
>> wangkefeng.wang@huawei.com/
>
> Is that really required? Which call path would be the problematic bit
> with the above?
>
> I'd say, we'd check in the large folio allocation code whether ->huge is
> set to never instead?
Yes, this is exactly our current logic. When allocating large folios,
we'll check the ->huge setting in shmem_huge_global_enabled(), which
means large folio allocations always respect the ->huge setting.
But as I mentioned earlier, the ->huge setting cannot keep the
mapping_set_large_folios() setting consistent across all mappings in the
entire tmpfs mount. My concern is that under the same tmpfs mount, after
remount, we might end up with some mappings supporting large folios
(calling mapping_set_large_folios()) while others don't.
However, I got some insights from
Documentation/admin-guide/mm/transhuge.rst. Does this mean that, after a
remount, large folio support for the mappings of existing files should
remain unchanged?
“
``mount -o remount,huge= /mountpoint`` works fine after mount:
remounting ``huge=never`` will not attempt to break up huge pages at
all, just stop more from being allocated.
”
Do you think this makes sense?
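The documented semantics quoted above can be sketched as a toy model
(invented names, not kernel code): the allocation path honours the current
'huge' setting, while folios already in the page cache keep their order.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a shmem folio allocation. */
struct folio_model { int order; };

static struct folio_model model_shmem_alloc(bool huge_enabled, int order)
{
	/* With huge disabled, fall back to order-0; remount never splits
	 * folios that were allocated earlier. */
	struct folio_model f = { .order = huge_enabled ? order : 0 };
	return f;
}
```

So after remounting to 'huge=never', an existing order-9 folio stays
order-9, and only new allocations drop to order-0.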
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-21 6:27 ` Baolin Wang
@ 2026-04-21 13:39 ` David Hildenbrand (Arm)
2026-04-22 6:28 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-21 13:39 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel
On 4/21/26 08:27, Baolin Wang wrote:
>
>
> On 4/21/26 3:00 AM, David Hildenbrand (Arm) wrote:
>> On 4/17/26 14:45, Baolin Wang wrote:
>>>
>>>
>>>
>>> Indeed. Good point.
>>>
>>>
>>> Not really. There could be files created before remount whose mappings
>>> don't support large folios (with 'huge=never' option), while files
>>> created after remount will have mappings that support large folios (if
>>> remounted with 'huge=always' option).
>>>
>>> It looks like the previous commit 5a90c155defa was also problematic. The
>>> huge mount option has introduced a lot of tricky issues:(
>>>
>>> Now I think Zi's previous suggestion should be able to clean up this
>>> mess? That is, calling mapping_set_large_folios() unconditionally for
>>> all shmem mounts, and revisiting Kefeng's first version to fix the
>>> performance issue.
>>
>> Okay, so you'll send a patch to just set mapping_set_large_folios()
>> unconditionally?
>
> I'm still hesitating on this. If we set mapping_set_large_folios()
> unconditionally, we need to re-fix the performance regression that was
> addressed by commit 5a90c155defa.
Just so I can follow: in which test for large folios would unlocking
large folios cause a regression?
>
> But it's hard for me to convince myself to add a new flag similar to
> IOCB_NO_LARGE_CHUNK for this hack (like the patch in [1] does).
>
>>> [1] https://lore.kernel.org/all/20240914140613.2334139-1-
>>> wangkefeng.wang@huawei.com/
>>
>> Is that really required? Which call path would be the problematic bit
>> with the above?
>>
>> I'd say, we'd check in the large folio allocation code whether ->huge is
>> set to never instead?
>
> Yes, this is exactly our current logic. When allocating large folios,
> we'll check the ->huge setting in shmem_huge_global_enabled(), which
> means large folio allocations always respect the ->huge setting.
Makes sense.
>
> But as I mentioned earlier, the ->huge setting cannot keep the
> mapping_set_large_folios() setting consistent across all mappings in the
> entire tmpfs mount. My concern is that under the same tmpfs mount, after
> remount, we might end up with some mappings supporting large folios
> (calling mapping_set_large_folios()) while others don't.
If we at least always set mapping_set_large_folios(), then there is no
inconsistency in that regard :)
>
> However, I got some insights from Documentation/admin-guide/mm/
> transhuge.rst. Does this mean that after remount, whether the mappings
> of existing files support large folios should remain unchanged?
That's the current behavior, right?
>
> “
> ``mount -o remount,huge= /mountpoint`` works fine after mount:
> remounting ``huge=never`` will not attempt to break up huge pages at
> all, just stop more from being allocated.
> ”
>
> Do you think this makes sense?
I suspect that matches existing behavior, so it should be fine.
--
Cheers,
David
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-21 13:39 ` David Hildenbrand (Arm)
@ 2026-04-22 6:28 ` Baolin Wang
2026-04-22 15:03 ` Kefeng Wang
0 siblings, 1 reply; 11+ messages in thread
From: Baolin Wang @ 2026-04-22 6:28 UTC (permalink / raw)
To: David Hildenbrand (Arm), akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel, Kefeng Wang
CC Kefeng,
On 4/21/26 9:39 PM, David Hildenbrand (Arm) wrote:
> On 4/21/26 08:27, Baolin Wang wrote:
>>
>>
>> On 4/21/26 3:00 AM, David Hildenbrand (Arm) wrote:
>>> On 4/17/26 14:45, Baolin Wang wrote:
>>>>
>>>>
>>>>
>>>> Indeed. Good point.
>>>>
>>>>
>>>> Not really. There could be files created before remount whose mappings
>>>> don't support large folios (with 'huge=never' option), while files
>>>> created after remount will have mappings that support large folios (if
>>>> remounted with 'huge=always' option).
>>>>
>>>> It looks like the previous commit 5a90c155defa was also problematic. The
>>>> huge mount option has introduced a lot of tricky issues:(
>>>>
>>>> Now I think Zi's previous suggestion should be able to clean up this
>>>> mess? That is, calling mapping_set_large_folios() unconditionally for
>>>> all shmem mounts, and revisiting Kefeng's first version to fix the
>>>> performance issue.
>>>
>>> Okay, so you'll send a patch to just set mapping_set_large_folios()
>>> unconditionally?
>>
>> I'm still hesitating on this. If we set mapping_set_large_folios()
>> unconditionally, we need to re-fix the performance regression that was
>> addressed by commit 5a90c155defa.
>
> Just so I can follow: where is the test for large folios that we would
> unlock large folios and cause a regression?
I spent some time investigating the performance regression that was
addressed by commit 5a90c155defa ("tmpfs: don't enable large folios if
not supported"). From my testing, I found that the performance issue no
longer exists on upstream:
mount tmpfs -t tmpfs -o size=50G /mnt/tmpfs
Base:
dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.2 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.2 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.1 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)
Base + revert 5a90c155defa:
dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.3 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.3 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.2 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.1 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)
The data is basically consistent with minor fluctuation noise.
Later, I continued investigating and found that commit 665575cff098b
("filemap: move prefaulting out of hot write path") fixed the write
operation performance.
Base + revert 665575cff098b + revert 5a90c155defa:
dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.0 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (2.9 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (2.6 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (2.6 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (2.5 GB/s)
dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (2.5 GB/s)
We can see that after reverting commit 665575cff098b, there is a
noticeable drop in write performance for tmpfs files.
So my conclusion is that we can now safely revert commit 5a90c155defa to
set mapping_set_large_folios() for all shmem mounts unconditionally.
Kefeng, please correct me if I missed anything.
* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-22 6:28 ` Baolin Wang
@ 2026-04-22 15:03 ` Kefeng Wang
2026-04-23 0:43 ` Baolin Wang
0 siblings, 1 reply; 11+ messages in thread
From: Kefeng Wang @ 2026-04-22 15:03 UTC (permalink / raw)
To: Baolin Wang, David Hildenbrand (Arm), akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel, Dave Hansen
On 4/22/2026 2:28 PM, Baolin Wang wrote:
> CC Kefeng,
>
> On 4/21/26 9:39 PM, David Hildenbrand (Arm) wrote:
>> On 4/21/26 08:27, Baolin Wang wrote:
>>>
>>>
>>> On 4/21/26 3:00 AM, David Hildenbrand (Arm) wrote:
>>>> On 4/17/26 14:45, Baolin Wang wrote:
>>>>>
>>>>>
>>>>>
>>>>> Indeed. Good point.
>>>>>
>>>>>
>>>>> Not really. There could be files created before remount whose mappings
>>>>> don't support large folios (with 'huge=never' option), while files
>>>>> created after remount will have mappings that support large folios (if
>>>>> remounted with 'huge=always' option).
>>>>>
>>>>> It looks like the previous commit 5a90c155defa was also
>>>>> problematic. The
>>>>> huge mount option has introduced a lot of tricky issues:(
>>>>>
>>>>> Now I think Zi's previous suggestion should be able to clean up this
>>>>> mess? That is, calling mapping_set_large_folios() unconditionally for
>>>>> all shmem mounts, and revisiting Kefeng's first version to fix the
>>>>> performance issue.
>>>>
>>>> Okay, so you'll send a patch to just set mapping_set_large_folios()
>>>> unconditionally?
>>>
>>> I'm still hesitating on this. If we set mapping_set_large_folios()
>>> unconditionally, we need to re-fix the performance regression that was
>>> addressed by commit 5a90c155defa.
>>
>> Just so I can follow: where is the test for large folios that we would
>> unlock large folios and cause a regression?
>
> I spent some time investigating the performance regression that was
> addressed by commit 5a90c155defa ("tmpfs: don't enable large folios if
> not supported"). From my testing, I found that the performance issue no
> longer exists on upstream:
>
> mount tmpfs -t tmpfs -o size=50G /mnt/tmpfs
>
> Base:
> dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.2 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.2 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.1 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.0 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)
>
> Base + revert 5a90c155defa:
> dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.3 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.3 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.2 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.1 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)
>
> The data is basically consistent with minor fluctuation noise.
>
> Later, I continued investigating and found that commit 665575cff098b
> ("filemap: move prefaulting out of hot write path") fixed the write
> operation performance.
>
> Base + revert 665575cff098b + revert 5a90c155defa:
> dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.0 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (2.9 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (2.6 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (2.6 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (2.5 GB/s)
> dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (2.5 GB/s)
>
> We can see that after reverting commit 665575cff098b, there is a
> noticeable drop in write performance for tmpfs files.
>
> So my conclusion is that we can now safely revert commit 5a90c155defa to
> set mapping_set_large_folios() for all shmem mounts unconditionally.
>
> Kefeng, please correct me if I missed anything.
Hi Baolin, I found the issue with my "bonnie Block/Re Write" testcases:
./bonnie -d /tmp -s Size (Size is from 100,256,512,1024,2048,4096).
The dd test behaves similarly, and as commit 4e527d5841e2
("iomap: fault in smaller chunks for non-large folio mappings") said,
the issue is:
"If chunk is 2MB, total 512 pages need to be handled finally. During this
period, fault_in_iov_iter_readable() is called to check iov_iter readable
validity. Since only 4KB will be handled each time, below address space
will be checked over and over again"
But after 665575cff098b, the fault_in_iov_iter_readable() call is moved
out of the hot write path, so the issue should be fixed.
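The before/after shape of that change can be illustrated with a small userspace sketch (not kernel code; prefault() and try_copy() are hypothetical stand-ins for fault_in_iov_iter_readable() and the atomic copy in generic_perform_write()):

```c
#include <assert.h>

static int prefault_calls;
static int faulted_in;

static void prefault(void) { prefault_calls++; faulted_in = 1; }
/* the copy "fails" until the user buffer has been faulted in */
static int try_copy(void) { return faulted_in; }

/* old shape: prefault at the top of every iteration of the hot loop */
static int write_old(int chunks)
{
	for (int i = 0; i < chunks; i++) {
		prefault();		/* runs 'chunks' times */
		if (!try_copy())
			return -1;
	}
	return 0;
}

/* new shape (665575cff098b): copy optimistically, prefault only on
 * the slow path after a failed copy, then retry the same chunk */
static int write_new(int chunks)
{
	for (int i = 0; i < chunks; i++) {
		if (!try_copy()) {
			prefault();	/* runs once per fault, not per chunk */
			i--;		/* retry this chunk */
		}
	}
	return 0;
}
```

For a 2MB write handled in 4KB chunks (512 iterations), the old shape calls the prefault helper 512 times while the new shape calls it once, which is the repeated-checking behavior the 4e527d5841e2 quote above describes.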
+CC Dave,
Since 665575cff098b works well in generic_perform_write(), I think we
could do the same optimization in iomap_write_iter()? But it seems the
maintainer forgot to pick them up[1].
[1]
https://lore.kernel.org/all/20250129181753.3927F212@davehans-spike.ostc.intel.com/
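As an aside on reproducing the numbers quoted earlier in this message, the dd sweep can be scripted; this is a hypothetical helper (mount point and block sizes taken from the thread), keeping the total bytes written roughly constant at ~4GiB per block size:

```shell
# Sketch of the dd sweep from this thread. run_sweep assumes the caller
# has already mounted tmpfs at the given path (needs root).
run_sweep() {
    mnt=$1
    for bs_k in 400 800 1600 2200 3000 4500; do
        # keep total size ~4GiB, e.g. 400K * 10485 as quoted above
        count=$(( 4 * 1024 * 1024 / bs_k ))
        dd if=/dev/zero of="$mnt/test" bs="${bs_k}K" count="$count" 2>&1 | tail -1
        rm -f "$mnt/test"
    done
}
# mount -t tmpfs -o size=50G tmpfs /mnt/tmpfs && run_sweep /mnt/tmpfs
```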
^ permalink raw reply [flat|nested] 11+ messages in thread

* Re: [PATCH v3] mm: shmem: always support large folios for internal shmem mount
2026-04-22 15:03 ` Kefeng Wang
@ 2026-04-23 0:43 ` Baolin Wang
0 siblings, 0 replies; 11+ messages in thread
From: Baolin Wang @ 2026-04-23 0:43 UTC (permalink / raw)
To: Kefeng Wang, David Hildenbrand (Arm), akpm, hughd
Cc: willy, ziy, ljs, lance.yang, linux-mm, linux-kernel, Dave Hansen
On 4/22/26 11:03 PM, Kefeng Wang wrote:
>
>
> On 4/22/2026 2:28 PM, Baolin Wang wrote:
>> CC Kefeng,
>>
>> On 4/21/26 9:39 PM, David Hildenbrand (Arm) wrote:
>>> On 4/21/26 08:27, Baolin Wang wrote:
>>>>
>>>>
>>>> On 4/21/26 3:00 AM, David Hildenbrand (Arm) wrote:
>>>>> On 4/17/26 14:45, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Indeed. Good point.
>>>>>>
>>>>>>
>>>>>> Not really. There could be files created before remount whose
>>>>>> mappings
>>>>>> don't support large folios (with 'huge=never' option), while files
>>>>>> created after remount will have mappings that support large folios
>>>>>> (if
>>>>>> remounted with 'huge=always' option).
>>>>>>
>>>>>> It looks like the previous commit 5a90c155defa was also
>>>>>> problematic. The
>>>>>> huge mount option has introduced a lot of tricky issues:(
>>>>>>
>>>>>> Now I think Zi's previous suggestion should be able to clean up this
>>>>>> mess? That is, calling mapping_set_large_folios() unconditionally for
>>>>>> all shmem mounts, and revisiting Kefeng's first version to fix the
>>>>>> performance issue.
>>>>>
>>>>> Okay, so you'll send a patch to just set mapping_set_large_folios()
>>>>> unconditionally?
>>>>
>>>> I'm still hesitating on this. If we set mapping_set_large_folios()
>>>> unconditionally, we need to re-fix the performance regression that was
>>>> addressed by commit 5a90c155defa.
>>>
>>> Just so I can follow: where is the test for large folios that we would
>>> unlock large folios and cause a regression?
>>
>> I spent some time investigating the performance regression that was
>> addressed by commit 5a90c155defa ("tmpfs: don't enable large folios if
>> not supported"). From my testing, I found that the performance issue
>> no longer exists on upstream:
>>
>> mount -t tmpfs -o size=50G tmpfs /mnt/tmpfs
>>
>> Base:
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.2 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.2 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.1 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.0 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)
>>
>> Base + revert 5a90c155defa:
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.3 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (3.3 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (3.2 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (3.1 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (3.0 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (3.1 GB/s)
>>
>> The data is basically consistent with minor fluctuation noise.
>>
>> Later, I continued investigating and found that commit 665575cff098b
>> ("filemap: move prefaulting out of hot write path") fixed the write
>> operation performance.
>>
>> Base + revert 665575cff098b + revert 5a90c155defa:
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=400K count=10485 (3.0 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=800K count=5242 (2.9 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=1600K count=2621 (2.6 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=2200K count=1906 (2.6 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=3000K count=1398 (2.5 GB/s)
>> dd if=/dev/zero of=/mnt/tmpfs/test bs=4500K count=932 (2.5 GB/s)
>>
>> We can see that after reverting commit 665575cff098b, there is a
>> noticeable drop in write performance for tmpfs files.
>>
>> So my conclusion is that we can now safely revert commit 5a90c155defa
>> to set mapping_set_large_folios() for all shmem mounts unconditionally.
>>
>> Kefeng, please correct me if I missed anything.
>
> Hi Baolin, I found the issue with my "bonnie Block/Re Write" testcases:
>
> ./bonnie -d /tmp -s Size (size is from 100,256,512,1024,2048,4096).
>
> The dd test behaves similarly, and as commit 4e527d5841e2
> ("iomap: fault in smaller chunks for non-large folio mappings") said,
> the issue is:
>
> "If chunk is 2MB, total 512 pages need to be handled finally. During this
> period, fault_in_iov_iter_readable() is called to check iov_iter readable
> validity. Since only 4KB will be handled each time, below address space
> will be checked over and over again"
>
> But after 665575cff098b, the fault_in_iov_iter_readable() call is moved
> out of the hot write path, so the issue should be fixed.
Kefeng, thanks for confirming.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-04-23 0:43 UTC | newest]
Thread overview: 11+ messages
2026-04-17 3:25 [PATCH v3] mm: shmem: always support large folios for internal shmem mount Baolin Wang
2026-04-17 9:21 ` David Hildenbrand (Arm)
2026-04-17 9:27 ` Baolin Wang
2026-04-17 9:52 ` David Hildenbrand (Arm)
2026-04-17 12:45 ` Baolin Wang
2026-04-20 19:00 ` David Hildenbrand (Arm)
2026-04-21 6:27 ` Baolin Wang
2026-04-21 13:39 ` David Hildenbrand (Arm)
2026-04-22 6:28 ` Baolin Wang
2026-04-22 15:03 ` Kefeng Wang
2026-04-23 0:43 ` Baolin Wang