From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-180.mta1.migadu.com (out-180.mta1.migadu.com [95.215.58.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9F3AA33F5A3 for ; Mon, 18 May 2026 05:01:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.180 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779080486; cv=none; b=sYbUdNAQhiVq1NyZ4pebVhjMnn4GkiGTVlmgL/ftSf5+7zya89dQGtH9HeYj6xRhDn/VHMIbYa0uSGO96cfNENPhMfeXQRUgziqySCakDZ6n0b97Pq725usIknY5jy9stXanVGD3ufXPJ3hVzfGsknoUwaXkaHFzC18E+QL1B+A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779080486; c=relaxed/simple; bh=sQ5uK1/ziQ7xT6bHO5gc9W00rFhnCZHl1BCa0vIHavQ=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=tHvq7gT7FR3L63Fz3FIWHyHE13bNdsRJOrYSc3g4KWcp9smDjATs/nxH4OiqSbE8f2/BtHbkkzTfS1aVIl0Sj6HzSXefPTrVw1bCkc262NQWgGl/RMnUHT8VaMkef95Yt35xfZd2DAtRCwOzN89+J/W4KeI09oMkVPdVLYAHhtA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=u1nu6tPq; arc=none smtp.client-ip=95.215.58.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="u1nu6tPq" Message-ID: <8c1ab541-babb-462e-b366-4555d2a09b43@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1779080480; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tlo2znxNKON1HwKplhS/NvaV1BD9SxzqEEMSISG29kQ=; b=u1nu6tPqa9rl30GPY5zmDKUsg/5PpE5pCLgXqaOaPeB01c8rdFyhMqr68AntP26KX122Ie KlCKElKIYNFvMUtVgIfHhBysKIVIcKzV5xPeCLUYQBrzpkvhiHPAOOqDExQAEYvr2a/CST 59qNezxjuz+1i27rOG2ldrCk0shu7WA= Date: Mon, 18 May 2026 13:00:49 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: (sashiko review) Re: [PATCH v4 6/9] mm: shmem: drop has_transparent_hugepage() usage To: Baolin Wang , luizcap@redhat.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, david@kernel.org, ziy@nvidia.com, corbet@lwn.net, tsbogend@alpha.franken.de, maddy@linux.ibm.com, mpe@ellerman.id.au, agordeev@linux.ibm.com, gerald.schaefer@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com, x86@kernel.org, dave.hansen@linux.intel.com, djbw@kernel.org, vishal.l.verma@intel.com, dave.jiang@intel.com, akpm@linux-foundation.org, lorenzo.stoakes@oracle.com References: <20260517133239.26416-1-lance.yang@linux.dev> <6591a74c-7ef9-4614-9ae9-cb2fbed86ebf@linux.alibaba.com> Content-Language: en-US X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Lance Yang In-Reply-To: <6591a74c-7ef9-4614-9ae9-cb2fbed86ebf@linux.alibaba.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 2026/5/18 11:47, Baolin Wang wrote: > > > On 5/17/26 9:32 PM, Lance Yang wrote: >> >> On Wed, May 06, 2026 at 02:12:41PM -0400, Luiz Capitulino wrote: >>> On 2026-05-01 15:18, Luiz Capitulino wrote: >>>> Shmem uses has_transparent_hugepage() in the following ways: >>>> >>>> - shmem_parse_one() and shmem_parse_huge(): Check if THP is built-in >>>> and >>>>     if the CPU supports PMD-sized pages >>>> >>>> - shmem_init(): Since the CONFIG_TRANSPARENT_HUGEPAGE guard is outside >>>>     the code block calling has_transparent_hugepage(), the >>>>     has_transparent_hugepage() call is exclusively checking if the CPU >>>>     supports PMD-sized pages >>>> >>>> While it's necessary to check if CONFIG_TRANSPARENT_HUGEPAGE is enabled >>>> in all cases, shmem can determine mTHP size support at folio allocation >>>> time. Therefore, drop has_transparent_hugepage() usage while keeping >>>> the >>>> CONFIG_TRANSPARENT_HUGEPAGE checks. >>>> >>>> Reviewed-by: Baolin Wang >>>> Reviewed-by: Lance Yang >>>> Acked-by: Zi Yan >>>> Signed-off-by: Luiz Capitulino >>>> --- >>>>    mm/shmem.c | 7 +++---- >>>>    1 file changed, 3 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/mm/shmem.c b/mm/shmem.c >>>> index 3b5dc21b323c..1948d73fb1e3 100644 >>>> --- a/mm/shmem.c >>>> +++ b/mm/shmem.c >>>> @@ -689,7 +689,7 @@ static int shmem_parse_huge(const char *str) >>>>        else >>>>            return -EINVAL; >>>> -    if (!has_transparent_hugepage() && >>>> +    if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && >>>>            huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY) >>>>            return -EINVAL; >>>> @@ -4656,8 +4656,7 @@ static int shmem_parse_one(struct fs_context >>>> *fc, struct fs_parameter *param) >>>>        case Opt_huge: >>>>            ctx->huge = result.uint_32; >>>>            if (ctx->huge != SHMEM_HUGE_NEVER && >>>> -            !(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && >>>> -              has_transparent_hugepage())) >>>> +            !IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) >>>>                goto unsupported_parameter; >>>>            ctx->seen |= SHMEM_SEEN_HUGE; >>>>            break; >>> >>> """ >>> By dropping the has_transparent_hugepage() check, will mount -t tmpfs >>> -o huge=always now succeed on hardware lacking PMD support? >>> >>> If so, since hugepage_init() still sets the >>> TRANSPARENT_HUGEPAGE_UNSUPPORTED >>> flag, thp_disabled_by_hw() will unconditionally block all large folio >>> allocations in shmem_allowable_huge_orders(). >>> >>> Does this create an intermediate state where the mount silently succeeds >>> but no huge pages of any size can actually be allocated? >>> >>> I see this is resolved later in the series by commit cd27430097e8 >>> ("mm: replace thp_disabled_by_hw() with pgtable_has_pmd_leaves()") and >>> commit 641a20ae032f ("mm: thp: always enable mTHP support"). >>> """ >>> >>> The mount -t tmpfs -o huge=always succeeding on hardware without PMD >>> support can happen in this patch, yes. But this seems very minor, the >>> impact seems to be someone doing bisection, landing on this patch and >>> their reproducer is depedent on mounting tmpfs with -o huge=always on >>> hardware without PMD size support? I can fix it if others feel strong >>> about this. >>> >>>> @@ -5449,7 +5448,7 @@ void __init shmem_init(void) >>>>    #endif >>>>    #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>> -    if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY) >>>> +    if (shmem_huge > SHMEM_HUGE_DENY) >>>>            SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; >>>>        else >>>>            shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was >>>> patched */ >>> >>> """ >>> Also, by allowing shmem_huge to be set to SHMEM_HUGE_ALWAYS on systems >>> without PMD support, does this incorrectly affect shmem_getattr()? >>> >>> shmem_getattr() relies on shmem_huge_global_enabled(), which only checks >>> the software configuration and not hardware PMD support. Consequently, >>> shmem_getattr() will erroneously report stat->blksize = HPAGE_PMD_SIZE >>> to userspace. >>> >>> Since subsequent patches in the series do not appear to update >>> shmem_getattr(), could this misleading block size cause userspace tools >>> to over-allocate IO buffers on hardware where PMD-sized pages are >>> structurally impossible? >>> """ >>> >>> This a real issue (albeit small one), the problem is this check in >>> shmem_getattr(): >>> >>>     if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) >>>         stat->blksize = HPAGE_PMD_SIZE; >>> >>> So, we may report HPAGE_PMD_SIZE even when PMD size is not supported. >>> Looks like we may over-report today as well for the >>> SHMEM_HUGE_WITHIN_SIZE case? In any case, I'll fix this. >> >> Well spotted. >> >> @Baolin looks like shmem_getattr() might already be buggy? >> >> shmem_huge_global_enabled() returns an order mask. For huge=always and >> huge=within_size it can return THP_ORDERS_ALL_FILE_DEFAULT, which is not >> PMD-only and can include smaller file mTHP orders as well ... >> >> So shmem_getattr() treating any non-zero mask as HPAGE_PMD_SIZE looks >> like an over-report? > > Normally, it looks fine because we always start trying from PMD-sized > large order if tmpfs supports large order (see commit 69e0a3b49003 ("mm: > shmem: fix the strategy for the tmpfs 'huge=' options")). > > And the code here works for 'force' or 'always', but not so much for > 'within_size' or 'advise', as Hugh mentioned before[1]. > > Especially in 'within_size' mode, after we allow fallback to smaller > large orders, the logic becomes unreasonable. Although I'm not sure if > there's any benefit after the modification, it would make the logic > clearer, for example: Ah, I see. Thanks for the explanation! > diff --git a/mm/shmem.c b/mm/shmem.c > index 106e4de943fb..e9f43aefdc7d 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -1297,8 +1297,14 @@ static int shmem_getattr(struct mnt_idmap *idmap, >                         STATX_ATTR_NODUMP); >         generic_fillattr(idmap, request_mask, inode, stat); > > -       if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) > -               stat->blksize = HPAGE_PMD_SIZE; > +       orders = shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0); > +       if (!pgtable_has_pmd_leaves()) > +               orders &= ~BIT(PMD_ORDER); > +       if (orders) { > +               unsigned int hi_order = highest_order(orders); > + > +               stat->blksize = PAGE_SIZE << hi_order; > +       } Yep, using the highest remaining order after filtering out unsupported PMD/PUD orders looks better to me. That would keep the stat hint aligned with the allocation path :D > >         if (request_mask & STATX_BTIME) { >                 stat->result_mask |= STATX_BTIME; > > [1] https://lore.kernel.org/all/1524665633-83806-1-git-send-email- > yang.shi@linux.alibaba.com/T/#m21037b23be70fb9f7ab1965bb8b39242752594d1 Thanks, Lance