All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, Juergen Gross <jgross@suse.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	"Ahmed S . Darwish" <darwi@linutronix.de>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	John Ogness <john.ogness@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: [PATCH] compiler/gcc: Make asm() templates asm __inline__() by default
Date: Wed, 19 Mar 2025 23:34:47 +0100	[thread overview]
Message-ID: <Z9tGh0Fa96gACmpQ@gmail.com> (raw)
In-Reply-To: <CAFULd4aFUun7+1izxbDM8nTEEta5PApysyTGsn1hjvQND=2UtQ@mail.gmail.com>


* Uros Bizjak <ubizjak@gmail.com> wrote:

> On Tue, Mar 18, 2025 at 9:11 PM Ingo Molnar <mingo@kernel.org> wrote:
> 
> >  #ifdef CONFIG_CC_HAS_ASM_INLINE
> >  # define asm_inline __asm__ __inline
> >  # define asm(...) asm_inline(__VA_ARGS__)
> >  #else
> >  # define asm_inline asm
> >  #endif
> >
> > And I fixed up the places where this isn't syntactically correct:
> >
> >  35 files changed, 82 insertions(+), 79 deletions(-)
> >
> > I haven't looked at code generation much yet, but text size changes are
> > minimal:
> >
> >       text         data     bss      dec            hex filename
> >   29429076      7931870 1401196 38762142        24f769e vmlinux.before
> >   29429631      7931870 1401200 38762701        24f78cd vmlinux.after
> >
> > Which is promising, assuming I haven't messed up anywhere.
> 
> Please use bloat-o-meter, it is more precise.

Here's the bloat-o-meter output between vanilla and patched vmlinux:

add/remove: 6/20 grow/shrink: 43/13 up/down: 4245/-2812 (1433)
Function                                     old     new   delta
__ia32_sys_pidfd_send_signal                  21     818    +797
__x64_sys_pidfd_send_signal                   22     818    +796
nl80211_send_wiphy                         11189   11867    +678
icl_update_topdown_event                     473     691    +218
intel_joiner_adjust_timings.isra               -     145    +145
deactivate_locked_super                      148     249    +101
tcp_v4_send_synack                           301     389     +88
kill_anon_super                               53     137     +84
xa_destroy                                   291     371     +80
__xa_set_mark                                135     209     +74
ip_fraglist_prepare                          204     269     +65
store_hwp_dynamic_boost                      136     200     +64
csum_partial                                 239     302     +63
ip_fraglist_init                             155     217     +62
store_max_perf_pct                           252     311     +59
ip_frag_next                                 377     434     +57
ip_do_fragment                              1644    1701     +57
__ip_local_out                               283     340     +57
store_min_perf_pct                           273     329     +56
__udp4_lib_rcv                              2917    2970     +53
freeze_super                                1124    1175     +51
super_wake                                     -      47     +47
ieee80211_parse_tx_radiotap                 1267    1311     +44
ic_bootp_recv                               1333    1371     +38
vlv_compute_watermarks                      2267    2299     +32
intel_atomic_commit_tail                    5423    5455     +32
_g4x_compute_pipe_wm                         397     429     +32
__ieee80211_xmit_fast                       2611    2643     +32
intel_format_info_is_yuv_semiplanar           43      72     +29
skl_main_to_aux_plane                        113     136     +23
ip_auto_config                              4029    4050     +21
__memcpy_flushcache                          369     386     +17
validate_beacon_head                         240     256     +16
tcp_v4_rcv                                  4850    4866     +16
intel_enable_transcoder                     1443    1459     +16
intel_cx0pll_state_verify                   1859    1875     +16
intel_cx0_phy_check_hdmi_link_rate           181     197     +16
inet_gro_receive                             569     585     +16
__pfx_super_wake                               -      16     +16
__pfx_intel_joiner_adjust_timings.isra         -      16     +16
__pfx_intel_cx0_read.constprop                 -      16     +16
intel_fb_is_gen12_ccs_aux_plane.isra          95     107     +12
_intel_modeset_primary_pipes                  79      91     +12
intel_cx0_read.constprop                       -      10     +10
intel_fb_is_ccs_aux_plane                    110     117      +7
intel_crtc_readout_derived_state             513     516      +3
wq_update_node_max_active                    538     540      +2
intel_set_cdclk_pre_plane_update             838     840      +2
intel_sseu_subslice_total                     67      68      +1
intel_crtc_compute_config                    890     889      -1
_intel_modeset_secondary_pipes                50      47      -3
intel_mtl_pll_enable                        7472    7460     -12
bxt_set_cdclk                               1328    1316     -12
vfs_get_tree                                 205     189     -16
intel_fb_modifier_to_tiling                  186     170     -16
__pfx_nl80211_send_iftype_data                16       -     -16
__pfx_lane_mask_to_lane                       16       -     -16
__pfx_kill_super_notify.part                  16       -     -16
__pfx_ip_fast_csum                            16       -     -16
__pfx_intel_pstate_update_policies            16       -     -16
__pfx_intel_joiner_adjust_timings             16       -     -16
__pfx_format_is_yuv_semiplanar.part.isra      16       -     -16
__pfx_cdclk_divider                           16       -     -16
__pfx___icl_update_topdown_event              16       -     -16
__pfx___do_sys_pidfd_send_signal              16       -     -16
intel_c20_sram_read.constprop                173     149     -24
xas_split_alloc                              302     270     -32
nl80211_parse_sched_scan                    3311    3279     -32
kill_litter_super                             70      31     -39
format_is_yuv_semiplanar.part.isra            41       -     -41
lane_mask_to_lane                             44       -     -44
ip_fast_csum                                  48       -     -48
intel_cx0pll_readout_hw_state               1085    1037     -48
intel_pstate_update_policies                  66       -     -66
cdclk_divider                                 69       -     -69
kill_super_notify.part                       111       -    -111
__icl_update_topdown_event                   112       -    -112
intel_joiner_adjust_timings                  140       -    -140
intel_fill_fb_info                          2998    2813    -185
intel_tile_width_bytes                       747     531    -216
nl80211_send_iftype_data                     574       -    -574
__do_sys_pidfd_send_signal                   811       -    -811
Total: Before=22547058, After=22548491, chg +0.01%

A lot fewer functions are affected than I expected from such a 
large-scope change.

> Actually, functions with the most impact (x86 locking functions and
> __arch_hweight) were recently converted to asm_inline, so besides
> __untagged_addr, the remaining have very little impact, if at all
> (c.f. amd_clear_divider() ). There is also no need to convert asm()
> without directives inside.

I did the test with Linus-vanilla (81e4f8d68c66) to maximize the 
potential effect, which doesn't have those changes yet.

See tip:WIP.x86/core.

> My proposal would be to convert the remaining few cases (the remaining
> asms involving ALTERNATIVE and exceptions) "by hand" to asm_inline()
> and stick a rule in checkpatch to use asm_inline() in the code
> involving asm(), like we have the rule with asm volatile.
> 
> I don't think redefining an important C keyword is a good approach, it
> obfuscates its meaning too much. And as has been shown by Ingo's
> experiment, there is a substantial effort to fix false positives.
> Instead of fixing these, we can trivially convert the remaining cases
> to asm_volatile() as well, without obfuscating asm(). Checkpatch can
> take care of future cases.

This would work for me too. The cross-arch impact and churn seems 
substantial.

Thanks,

	Ingo

  reply	other threads:[~2025-03-19 22:34 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-17 22:18 [PATCH 0/5] x86/cpu: Introduce <asm/cpuid/types.h> and <asm/cpuid/api.h> and clean them up Ingo Molnar
2025-03-17 22:18 ` [PATCH 1/5] x86/cpuid: Refactor <asm/cpuid.h> Ingo Molnar
2025-03-18 12:00   ` [tip: x86/cpu] " tip-bot2 for Ahmed S. Darwish
2025-03-19 11:03   ` [tip: x86/core] " tip-bot2 for Ahmed S. Darwish
2025-03-17 22:18 ` [PATCH 2/5] x86/cpuid: Clean up <asm/cpuid/types.h> Ingo Molnar
2025-03-18 12:00   ` [tip: x86/cpu] " tip-bot2 for Ingo Molnar
2025-03-19 11:03   ` [tip: x86/core] " tip-bot2 for Ingo Molnar
2025-03-17 22:18 ` [PATCH 3/5] x86/cpuid: Clean up <asm/cpuid/api.h> Ingo Molnar
2025-03-18 12:00   ` [tip: x86/cpu] " tip-bot2 for Ingo Molnar
2025-03-19 11:03   ` [tip: x86/core] " tip-bot2 for Ingo Molnar
2025-03-17 22:18 ` [PATCH 4/5] x86/cpuid: Standardize on u32 in <asm/cpuid/api.h> Ingo Molnar
2025-03-18  5:59   ` Xin Li
2025-03-18 12:00   ` [tip: x86/cpu] " tip-bot2 for Ingo Molnar
2025-03-19 11:03   ` [tip: x86/core] " tip-bot2 for Ingo Molnar
2025-03-17 22:18 ` [PATCH 5/5] x86/cpuid: Use u32 in instead of uint32_t " Ingo Molnar
2025-03-18  6:01   ` Xin Li
2025-03-18  8:34     ` Ingo Molnar
2025-03-18  9:37       ` Borislav Petkov
2025-03-18 11:53         ` Ingo Molnar
2025-03-18 12:15           ` Borislav Petkov
2025-03-18 18:20             ` Ingo Molnar
2025-03-19  8:08               ` Thomas Gleixner
2025-03-19 20:16                 ` Ingo Molnar
2025-03-18 12:00   ` [tip: x86/cpu] " tip-bot2 for Ingo Molnar
2025-03-19 11:03   ` [tip: x86/core] " tip-bot2 for Ingo Molnar
2025-03-18 14:05 ` [PATCH 0/5] x86/cpu: Introduce <asm/cpuid/types.h> and <asm/cpuid/api.h> and clean them up H. Peter Anvin
2025-03-18 18:04   ` Ingo Molnar
2025-03-18 18:33     ` Linus Torvalds
2025-03-18 18:46       ` Ingo Molnar
2025-03-18 20:11         ` [PATCH] compiler/gcc: Make asm() templates asm __inline__() by default Ingo Molnar
2025-03-18 22:07           ` Josh Poimboeuf
2025-03-19  4:57           ` Uros Bizjak
2025-03-19 22:34             ` Ingo Molnar [this message]
2025-03-20  8:21               ` Uros Bizjak
2025-03-20  8:59                 ` Ingo Molnar
2025-03-20 10:30                   ` Uros Bizjak
2025-03-20 11:58                     ` Uros Bizjak
2025-03-19  3:30     ` [PATCH 0/5] x86/cpu: Introduce <asm/cpuid/types.h> and <asm/cpuid/api.h> and clean them up H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9tGh0Fa96gACmpQ@gmail.com \
    --to=mingo@kernel.org \
    --cc=andrew.cooper3@citrix.com \
    --cc=bp@alien8.de \
    --cc=darwi@linutronix.de \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=john.ogness@linutronix.de \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sstabellini@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.