From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Li Qiang <liqiang01@kylinos.cn>
Cc: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com
Subject: Re: [PATCH] mm: memory: Force-inline PTE/PMD zapping functions for performance
Date: Tue, 5 Aug 2025 14:35:22 +0100
Message-ID: <9d60bae4-a61b-4d4a-a0a8-19058df30b0f@lucifer.local>
In-Reply-To: <20250805120435.1142283-1-liqiang01@kylinos.cn>

On Tue, Aug 05, 2025 at 08:04:35PM +0800, Li Qiang wrote:
> Ah, missed it after the performance numbers. As Vlastimil mentioned, I
> would have expected a bloat-o-meter output.
>

You're weirdly quoting David here unattributed as if it were your reply?

> >
> > My 2 cents is that usually it may be better to understand why it is
> > not inlined and address that (e.g., likely() hints or something else)
> > instead of blindly putting __always_inline. The __always_inline might
> > stay there for no reason after some code changes and therefore become
> > a maintenance burden. Concretely, in this case, where there is a single
> > caller, one can expect the compiler to really prefer to inline the
> > callees.
>
> >
> > Agreed, although the compiler is sometimes hard to convince to do the
> > right thing when dealing with rather large+complicated code in my
> > experience.
>

Some nits on reply:

- Please reply to mails individually, rather than replying to one arbitrary
  mail with answers to questions raised in the others.

- Try to wrap to 75 characters per line in replies.

- Make sure it's clear who you're quoting.

This makes life easier; I've had to go and read through a bunch of mails in
the thread to get context here.

> Question 1: Will this patch increase the vmlinux size?
> Reply:
> 	Actually, the overall vmlinux size becomes smaller on x86_64:
> 	[root@localhost linux_old1]# ./scripts/bloat-o-meter before.vmlinux after.vmlinux
> 	add/remove: 6/0 grow/shrink: 0/1 up/down: 4569/-4747 (-178)
> 	Function                                     old     new   delta
> 	zap_present_ptes.constprop                     -    2696   +2696
> 	zap_pte_range                                  -    1236   +1236
> 	zap_pmd_range.isra                             -     589    +589
> 	__pfx_zap_pte_range                            -      16     +16
> 	__pfx_zap_present_ptes.constprop               -      16     +16
> 	__pfx_zap_pmd_range.isra                       -      16     +16
> 	unmap_page_range                            5765    1018   -4747
> 	Total: Before=35379786, After=35379608, chg -0.00%
>
>
> Question 2: Why doesn't GCC inline these functions by default? Are there any side effects of forced inlining?
> Reply:
> 	1) GCC's default parameter max-inline-insns-single imposes restrictions. However, since these are leaf functions, inlining them not only improves performance but also reduces code size. May we consider relaxing the max-inline-insns-single restriction in this case?

Yeah I wonder if we could just increase this... I noticed in my analysis
(presumably what you're replying to here?) that this is what was causing
inlining to stop.

We have a _lot_ of static functions that behave like this, so I actually
wonder if we could get perf wins more broadly by doing this...

Could you experiment with this?...


>
> 	2) The functions being inlined in this patch follow a single call path and are ultimately inlined into unmap_page_range. This only increases the size of the unmap_page_range assembly function, but since unmap_page_range itself won't be further inlined, the impact is well-contained.
>

Yup. This is something I already mentioned.

>
>
> Question 3: Does this inlining modification affect code maintainability?
> Reply: The modified inline functions are exclusively called by unmap_page_range, forming a single call path. This doesn't introduce additional maintenance complexity.

Not sure why maintenance would be an issue, code is virtually unchanged.

>
>
> Question 4: Have you performed performance testing on other platforms? Have you tested other scenarios?
> Reply:
> 	1) I tested the same GCC version on arm64 architecture. Even without this patch, these functions get inlined into unmap_page_range automatically. This appears to be due to architecture-specific differences in GCC's max-inline-insns-single default values.

OK, interesting. I suspect that's due to more registers, right?

>
> 	2) I believe UnixBench serves as a reasonably representative server benchmark. Theoretically, this patch should improve performance by reducing multi-layer function call overhead. However, I would sincerely appreciate your guidance on what additional tests might better demonstrate the performance improvements. Could you kindly suggest some specific benchmarks or test scenarios I should consider?

I'm not sure. Actual workloads would be best, but presumably you don't have
one where you've noticed a demonstrable difference, otherwise you'd have
mentioned it...

At any rate I've come around on this series, and think this is probably
reasonable, but I would like to see what increasing max-inline-insns-single
does first?

>
> --
> Cheers,
>
> Li Qiang

Cheers, Lorenzo

