From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Chengfeng Lin <chengfenglin@stu.xmu.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <liam@infradead.org>,
Lorenzo Stoakes <ljs@kernel.org>,
Vlastimil Babka <vbabka@kernel.org>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>, Dev Jain <dev.jain@arm.com>,
Ryan Roberts <ryan.roberts@arm.com>, Zi Yan <ziy@nvidia.com>
Subject: Re: [RFC] mm/mincore: present-PTE scan cost after pte_batch_hint() batching
Date: Tue, 9 Jun 2026 16:27:50 +0200
Message-ID: <106e71e4-975c-4337-aaff-a00628827477@kernel.org>
In-Reply-To: <2780f0f6.176fd.19eacba49bb.Coremail.chengfenglin@stu.xmu.edu.cn>

On 6/9/26 16:12, Chengfeng Lin wrote:
> Hi David,

Hi,

> Thanks, and sorry for the confusing wording. The plain statement is: I
> observed a performance difference in a narrow x86/QEMU synthetic mincore()
> case, and after your comment I checked whether this is really a codegen issue.
>
> The wording in my first mail was too abstract. What I was trying to say is
> only that the benchmark focuses on one specific case:
>
>   - private anonymous memory
>   - MADV_NOHUGEPAGE
>   - faulted/resident base pages
>   - repeated mincore() over the range
>
> so the measured path should mostly be the present-PTE scan in
> mincore_pte_range(). I agree that the 8/16 CPU rows are not very useful for
> this path; please treat them as extra context only. The useful data is the
> single-threaded / low-CPU v6.15 -> v6.16 A/B and the patched variants.
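
For anyone who wants to reproduce that scenario, a minimal userspace sketch
could look like the following. To be clear, this is not the benchmark that
produced the numbers in this thread: the helper name mincore_resident_scan()
and its parameters are made up for illustration, and timing the calls is left
out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the original benchmark: map nr_pages
 * of private anonymous memory, opt out of THP, fault every page in, then
 * call mincore() over the whole range "iters" times.
 *
 * Returns the number of pages the last mincore() call reported as resident,
 * or -1 on error (including nr_pages == 0 or iters <= 0).
 */
static long mincore_resident_scan(size_t nr_pages, int iters)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	long resident = -1;
	size_t len, i;
	char *buf;
	int it;

	if (page_size <= 0 || nr_pages == 0)
		return -1;
	len = nr_pages * (size_t)page_size;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	vec = malloc(nr_pages);
	if (!vec)
		goto out_unmap;

	/* Best effort: this can fail on kernels built without THP support. */
	(void)madvise(buf, len, MADV_NOHUGEPAGE);

	/* Write to every page so that each one gets faulted in. */
	for (i = 0; i < nr_pages; i++)
		buf[i * (size_t)page_size] = 1;

	for (it = 0; it < iters; it++) {
		if (mincore(buf, len, vec) != 0) {
			resident = -1;
			goto out_free;
		}
		/* Only the least significant bit of each byte is defined. */
		resident = 0;
		for (i = 0; i < nr_pages; i++)
			resident += vec[i] & 1;
	}
	assert(resident <= (long)nr_pages);

out_free:
	free(vec);
out_unmap:
	munmap(buf, len);
	return resident;
}
```

Calling e.g. mincore_resident_scan(4096, 1000) from main() and wrapping the
call in clock_gettime() would give a rough per-call cost, assuming nothing
gets swapped out in between.
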
>
> The compiler used for the lab kernels was:
>
>   gcc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0

Okay, GCC 13 was released 3 years ago.

>   GNU ld (GNU Binutils for Ubuntu) 2.42
>
> Your point about x86 pte_batch_hint() is exactly the right thing to check.
> Since pte_batch_hint() returns 1 on x86, I agree that the expectation would be
> for the compiler to optimize the batching logic back down to something very
> close to the old base-page path.
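
To illustrate that expectation, here is a simplified userspace model of the
two loop shapes. It is only a sketch and not the kernel code: batch_hint(),
scan_simple() and scan_batched() are hypothetical stand-ins, and the "present"
array merely models whether an entry is a present PTE. With the hint being a
compile-time constant 1, the "batch > 1" branch is dead and the inner loop
runs exactly once, so both functions compute the same thing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a pte_batch_hint() that always returns 1, as on x86. */
static inline unsigned int batch_hint(const unsigned char *present)
{
	(void)present;
	return 1;
}

/* Model of the old loop shape: handle exactly one entry per iteration. */
static void scan_simple(const unsigned char *present, size_t nr,
			unsigned char *vec)
{
	size_t i;

	for (i = 0; i < nr; i++)
		vec[i] = present[i] ? 1 : 0;
}

/* Model of the batched loop shape: advance by "step" entries at a time. */
static void scan_batched(const unsigned char *present, size_t nr,
			 unsigned char *vec)
{
	size_t i, j, step;

	for (i = 0; i < nr; i += step) {
		step = 1;
		if (present[i]) {
			unsigned int batch = batch_hint(&present[i]);

			/* Dead code when batch_hint() is a constant 1. */
			if (batch > 1) {
				size_t max_nr = nr - i;

				step = batch < max_nr ? batch : max_nr;
			}
			for (j = 0; j < step; j++)
				vec[i + j] = 1;
		} else {
			vec[i] = 0;
		}
		assert(step >= 1);	/* guarantees forward progress */
	}
}
```

Compiling both functions with -O2 and comparing the disassembly is a quick
way to see whether a given compiler folds the batched variant back into the
simple one. Whether that carries over to the real mincore_pte_range() is
exactly what needs checking.
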
>
> I checked the generated mincore_pte_range() code with the same GCC/config setup.
> The function sizes from nm are:
>
>   v6.15 original:              0x1fb
>   v6.16 original:              0x245
>   v6.16 batch<=1 fastpath:     0x1ec
>   v6.16 with batching removed: 0x1ec
>
> So, with GCC 13.3, the v6.16 original build does not look optimized back to the
> old x86 base-page shape. The v6.16 batch<=1 fastpath and the v6.16 nobatch
> variant produce the same mincore_pte_range() objdump output in my build.
>
> I also checked Clang 18.1.3 as a cross-check. With Clang, v6.15 original,
> v6.16 original, v6.16 batch<=1 fastpath and v6.16 nobatch all produce the same
> mincore_pte_range() size, 0x1f9, and the objdump output is byte-identical.
>
> So your expectation does hold with Clang, but not with the GCC 13.3 build I used
> for the original lab runs. This does not prove a compiler bug, and it means my
> original report should be narrowed: it is not a generic x86 mincore()
> regression claim. In this check, GCC 13.3 generates a different
> mincore_pte_range() shape for v6.16 original, while Clang 18.1.3 generates
> byte-identical output for all checked variants. The timing signal I reported
> came from the GCC-built QEMU lab kernels.

It's probably a good idea to

1) Try with a newer GCC
2) Take a look at the actual difference in the generated code

Is it down to some inlining decisions? E.g., is the function larger because
other code is now getting inlined into it?

The function is not particularly large, so it's a bit unexpected.

--
Cheers,

David

Thread overview (6+ messages):
2026-06-09 7:26 [RFC] mm/mincore: present-PTE scan cost after pte_batch_hint() batching Chengfeng Lin
2026-06-09 9:01 ` David Hildenbrand (Arm)
2026-06-09 9:55 ` Barry Song
2026-06-09 14:12 ` Chengfeng Lin
2026-06-09 14:27 ` David Hildenbrand (Arm) [this message]
2026-06-09 21:12 ` Pedro Falcato