From: Lance Yang <lance.yang@linux.dev>
To: Baolin Wang <baolin.wang@linux.alibaba.com>, luizcap@redhat.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
david@kernel.org, ziy@nvidia.com, corbet@lwn.net,
tsbogend@alpha.franken.de, maddy@linux.ibm.com,
mpe@ellerman.id.au, agordeev@linux.ibm.com,
gerald.schaefer@linux.ibm.com, hca@linux.ibm.com,
gor@linux.ibm.com, x86@kernel.org, dave.hansen@linux.intel.com,
djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com,
akpm@linux-foundation.org, lorenzo.stoakes@oracle.com
Subject: Re: (sashiko review) Re: [PATCH v4 6/9] mm: shmem: drop has_transparent_hugepage() usage
Date: Mon, 18 May 2026 13:00:49 +0800 [thread overview]
Message-ID: <8c1ab541-babb-462e-b366-4555d2a09b43@linux.dev> (raw)
In-Reply-To: <6591a74c-7ef9-4614-9ae9-cb2fbed86ebf@linux.alibaba.com>
On 2026/5/18 11:47, Baolin Wang wrote:
>
>
> On 5/17/26 9:32 PM, Lance Yang wrote:
>>
>> On Wed, May 06, 2026 at 02:12:41PM -0400, Luiz Capitulino wrote:
>>> On 2026-05-01 15:18, Luiz Capitulino wrote:
>>>> Shmem uses has_transparent_hugepage() in the following ways:
>>>>
>>>> - shmem_parse_one() and shmem_parse_huge(): Check if THP is built-in
>>>> and
>>>> if the CPU supports PMD-sized pages
>>>>
>>>> - shmem_init(): Since the CONFIG_TRANSPARENT_HUGEPAGE guard is outside
>>>> the code block calling has_transparent_hugepage(), the
>>>> has_transparent_hugepage() call is exclusively checking if the CPU
>>>> supports PMD-sized pages
>>>>
>>>> While it's necessary to check if CONFIG_TRANSPARENT_HUGEPAGE is enabled
>>>> in all cases, shmem can determine mTHP size support at folio allocation
>>>> time. Therefore, drop has_transparent_hugepage() usage while keeping
>>>> the
>>>> CONFIG_TRANSPARENT_HUGEPAGE checks.
>>>>
>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
>>>> ---
>>>> mm/shmem.c | 7 +++----
>>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 3b5dc21b323c..1948d73fb1e3 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -689,7 +689,7 @@ static int shmem_parse_huge(const char *str)
>>>> else
>>>> return -EINVAL;
>>>> - if (!has_transparent_hugepage() &&
>>>> + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>>>> huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY)
>>>> return -EINVAL;
>>>> @@ -4656,8 +4656,7 @@ static int shmem_parse_one(struct fs_context
>>>> *fc, struct fs_parameter *param)
>>>> case Opt_huge:
>>>> ctx->huge = result.uint_32;
>>>> if (ctx->huge != SHMEM_HUGE_NEVER &&
>>>> - !(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>>>> - has_transparent_hugepage()))
>>>> + !IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>>>> goto unsupported_parameter;
>>>> ctx->seen |= SHMEM_SEEN_HUGE;
>>>> break;
>>>
>>> """
>>> By dropping the has_transparent_hugepage() check, will mount -t tmpfs
>>> -o huge=always now succeed on hardware lacking PMD support?
>>>
>>> If so, since hugepage_init() still sets the
>>> TRANSPARENT_HUGEPAGE_UNSUPPORTED
>>> flag, thp_disabled_by_hw() will unconditionally block all large folio
>>> allocations in shmem_allowable_huge_orders().
>>>
>>> Does this create an intermediate state where the mount silently succeeds
>>> but no huge pages of any size can actually be allocated?
>>>
>>> I see this is resolved later in the series by commit cd27430097e8
>>> ("mm: replace thp_disabled_by_hw() with pgtable_has_pmd_leaves()") and
>>> commit 641a20ae032f ("mm: thp: always enable mTHP support").
>>> """
>>>
>>> The mount -t tmpfs -o huge=always succeeding on hardware without PMD
>>> support can happen in this patch, yes. But this seems very minor, the
>>> impact seems to be someone doing bisection, landing on this patch and
>>> their reproducer is depedent on mounting tmpfs with -o huge=always on
>>> hardware without PMD size support? I can fix it if others feel strong
>>> about this.
>>>
>>>> @@ -5449,7 +5448,7 @@ void __init shmem_init(void)
>>>> #endif
>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> - if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
>>>> + if (shmem_huge > SHMEM_HUGE_DENY)
>>>> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
>>>> else
>>>> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was
>>>> patched */
>>>
>>> """
>>> Also, by allowing shmem_huge to be set to SHMEM_HUGE_ALWAYS on systems
>>> without PMD support, does this incorrectly affect shmem_getattr()?
>>>
>>> shmem_getattr() relies on shmem_huge_global_enabled(), which only checks
>>> the software configuration and not hardware PMD support. Consequently,
>>> shmem_getattr() will erroneously report stat->blksize = HPAGE_PMD_SIZE
>>> to userspace.
>>>
>>> Since subsequent patches in the series do not appear to update
>>> shmem_getattr(), could this misleading block size cause userspace tools
>>> to over-allocate IO buffers on hardware where PMD-sized pages are
>>> structurally impossible?
>>> """
>>>
>>> This a real issue (albeit small one), the problem is this check in
>>> shmem_getattr():
>>>
>>> if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
>>> stat->blksize = HPAGE_PMD_SIZE;
>>>
>>> So, we may report HPAGE_PMD_SIZE even when PMD size is not supported.
>>> Looks like we may over-report today as well for the
>>> SHMEM_HUGE_WITHIN_SIZE case? In any case, I'll fix this.
>>
>> Well spotted.
>>
>> @Baolin looks like shmem_getattr() might already be buggy?
>>
>> shmem_huge_global_enabled() returns an order mask. For huge=always and
>> huge=within_size it can return THP_ORDERS_ALL_FILE_DEFAULT, which is not
>> PMD-only and can include smaller file mTHP orders as well ...
>>
>> So shmem_getattr() treating any non-zero mask as HPAGE_PMD_SIZE looks
>> like an over-report?
>
> Normally, it looks fine because we always start trying from PMD-sized
> large order if tmpfs supports large order (see commit 69e0a3b49003 ("mm:
> shmem: fix the strategy for the tmpfs 'huge=' options")).
>
> And the code here works for 'force' or 'always', but not so much for
> 'within_size' or 'advise', as Hugh mentioned before[1].
>
> Especially in 'within_size' mode, after we allow fallback to smaller
> large orders, the logic becomes unreasonable. Although I'm not sure if
> there's any benefit after the modification, it would make the logic
> clearer, for example:
Ah, I see. Thanks for the explanation!
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 106e4de943fb..e9f43aefdc7d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1297,8 +1297,14 @@ static int shmem_getattr(struct mnt_idmap *idmap,
> STATX_ATTR_NODUMP);
> generic_fillattr(idmap, request_mask, inode, stat);
>
> - if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
> - stat->blksize = HPAGE_PMD_SIZE;
> + orders = shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0);
> + if (!pgtable_has_pmd_leaves())
> + orders &= ~BIT(PMD_ORDER);
> + if (orders) {
> + unsigned int hi_order = highest_order(orders);
> +
> + stat->blksize = PAGE_SIZE << hi_order;
> + }
Yep, using the highest remaining order after filtering out unsupported
PMD/PUD orders looks better to me. That would keep the stat hint aligned
with the allocation path :D
>
> if (request_mask & STATX_BTIME) {
> stat->result_mask |= STATX_BTIME;
>
> [1] https://lore.kernel.org/all/1524665633-83806-1-git-send-email-
> yang.shi@linux.alibaba.com/T/#m21037b23be70fb9f7ab1965bb8b39242752594d1
Thanks, Lance
next prev parent reply other threads:[~2026-05-18 5:01 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-01 19:18 [PATCH v4 0/9] mm: thp: always enable mTHP support Luiz Capitulino
2026-05-01 19:18 ` [PATCH v4 1/9] docs: tmpfs: remove implementation detail reference Luiz Capitulino
2026-05-01 19:18 ` [PATCH v4 2/9] mm: introduce pgtable_has_pmd_leaves() Luiz Capitulino
2026-05-06 17:50 ` (sashiko review) " Luiz Capitulino
2026-05-13 15:30 ` David Hildenbrand (Arm)
2026-05-14 1:52 ` Luiz Capitulino
2026-05-17 12:03 ` Lance Yang
2026-05-19 13:10 ` Luiz Capitulino
2026-05-13 15:36 ` David Hildenbrand (Arm)
2026-05-14 2:18 ` Luiz Capitulino
2026-05-17 12:41 ` Lance Yang
2026-05-19 13:14 ` Luiz Capitulino
2026-05-01 19:18 ` [PATCH v4 3/9] drivers: dax: use pgtable_has_pmd_leaves() Luiz Capitulino
2026-05-13 15:40 ` David Hildenbrand (Arm)
2026-05-14 2:25 ` Luiz Capitulino
2026-05-01 19:18 ` [PATCH v4 4/9] drivers: nvdimm: " Luiz Capitulino
2026-05-01 19:18 ` [PATCH v4 5/9] mm: debug_vm_pgtable: " Luiz Capitulino
2026-05-01 19:18 ` [PATCH v4 6/9] mm: shmem: drop has_transparent_hugepage() usage Luiz Capitulino
2026-05-06 18:12 ` (sashiko review) " Luiz Capitulino
2026-05-17 13:32 ` Lance Yang
2026-05-18 3:47 ` Baolin Wang
2026-05-18 5:00 ` Lance Yang [this message]
2026-05-01 19:18 ` [PATCH v4 7/9] treewide: introduce arch_has_pmd_leaves() Luiz Capitulino
2026-05-06 18:22 ` (sashiko review) " Luiz Capitulino
2026-05-06 18:30 ` Luiz Capitulino
2026-05-01 19:18 ` [PATCH v4 8/9] mm: replace thp_disabled_by_hw() with pgtable_has_pmd_leaves() Luiz Capitulino
2026-05-13 15:50 ` David Hildenbrand (Arm)
2026-05-01 19:18 ` [PATCH v4 9/9] mm: thp: always enable mTHP support Luiz Capitulino
2026-05-06 5:46 ` Baolin Wang
2026-05-06 18:34 ` (sashiko review) " Luiz Capitulino
2026-05-13 15:58 ` David Hildenbrand (Arm)
2026-05-14 1:14 ` Baolin Wang
2026-05-03 15:02 ` [PATCH v4 0/9] " Andrew Morton
2026-05-04 19:11 ` Luiz Capitulino
2026-05-14 6:35 ` Lorenzo Stoakes
2026-05-14 12:14 ` Luiz Capitulino
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8c1ab541-babb-462e-b366-4555d2a09b43@linux.dev \
--to=lance.yang@linux.dev \
--cc=agordeev@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=david@kernel.org \
--cc=djbw@kernel.org \
--cc=gerald.schaefer@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=luizcap@redhat.com \
--cc=maddy@linux.ibm.com \
--cc=mpe@ellerman.id.au \
--cc=tsbogend@alpha.franken.de \
--cc=vishal.l.verma@intel.com \
--cc=x86@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.