public inbox for linux-sparse@vger.kernel.org
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Zhang <zxh@xh-zhang.com>,
	linux-sparse@vger.kernel.org, dan.carpenter@linaro.org,
	chriscli@google.com, ben.dooks@codethink.co.uk,
	rf@opensource.cirrus.com
Subject: Re: [RFC PATCH] pre-process: add __VA_OPT__ support
Date: Thu, 19 Mar 2026 05:34:57 +0000
Message-ID: <20260319053457.GH3836593@ZenIV>
In-Reply-To: <CAHk-=wh7zNVQ0k1o_3cgGiCdcpYFCvbZ2O3ZLQW-q2xXqusmww@mail.gmail.com>

On Wed, Mar 18, 2026 at 09:07:17PM -0700, Linus Torvalds wrote:

> I looked at clang at one point, and iirc it generated *much* better
> code, because I think it does the smart thing, which is to get rid of
> the notion of bitfields as quickly as possible, and then doing just
> regular integer optimizations.

FWIW, with the local tokenizer patches the single worst spot at the moment
(both for clang and for gcc build) is this:
	p = &hash_table[hash];
	while ((ident = *p) != NULL) {
		if (ident->len == (unsigned char) len) {
in create_hashed_ident().  This is from clang build, where
it's inlined into tokenize_stream():
   0.14 │       mov    (%rdx,%rcx,8),%rax
  12.34 │       mov    %r12d,%r15d
   0.02 │       test   %rax,%rax
   0.43 │     ↓ je     33e
        │       mov    %rsp,%r14
   0.29 │     ↓ jmp    319
        │       nop
        │310:┌─→mov    0x0(%r13),%rax
   2.00 │    │  test   %rax,%rax
   0.07 │    │↓ je     345
   0.00 │319:│  mov    %rax,%r13
        │    │if (ident->len == (unsigned char) len) {
   0.19 │    ├──cmp    %r12b,0x10(%rax)
  16.86 │    └──jne    310
and this is gcc build, where it's not inlined, so the percentages are
several times higher (out of 5.9% vs. out of 17.1% on the profiles I'm
looking at):
        │    p = &hash_table[hash];
   0.77 │      mov    (%rax,%rdx,8),%rbx
        │    while ((ident = *p) != NULL) {
  27.77 │      test   %rbx,%rbx
   0.98 │    ↓ jne    3b
        │    ↓ jmp    c2
        │      nop
        │    ident_hit++;
        │    return ident;
        │    }
        │    next:
        │    //misses++;
        │    p = &ident->next;
   0.00 │30:┌─→mov    (%rbx),%rax
        │   │while ((ident = *p) != NULL) {
   6.11 │   │  test   %rax,%rax
   0.24 │   │↓ je     70
        │   │  mov    %rax,%rbx
        │   │if (ident->len == (unsigned char) len) {
   0.81 │3b:├──cmp    %bpl,0x10(%rbx)
  50.82 │   └──jne    30

Most of the accesses are to single-element chains; it's not walking the
lists that hurts, it's the very first step.  The profiles are for userland
cycles; looking at stalled-cycles-frontend gives the exact same hotspots.
The next one is lookup_symbol(); there we also walk linked lists.
The only difference is that the lists are often longer than one entry...
Not sure what can be done about either.
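
For reference, a minimal self-contained sketch of the kind of chained
lookup the annotated loop performs.  This is not the sparse source: the
sketch_* names, the table size and the toy hash are hypothetical, and the
struct layout is only an assumption chosen so that len lands at offset
0x10 on LP64, matching the cmp in the disassembly above.  One plausible
reading of the profiles is that the first load from *ident (the len
compare) is where the miss on a freshly reached entry gets charged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HASH_SIZE 1024	/* hypothetical; must be a power of two */

/* Assumed layout: two pointers, then the length byte (0x10 on LP64). */
struct sketch_ident {
	struct sketch_ident *next;	/* hash chain link */
	void *symbols;			/* placeholder for per-ident data */
	unsigned char len;		/* compared first in the loop */
	char name[];			/* identifier bytes, not NUL-terminated */
};

static struct sketch_ident *sketch_table[SKETCH_HASH_SIZE];

/* Toy hash, only here to make the sketch complete. */
static unsigned long sketch_hash(const char *name, size_t len)
{
	unsigned long h = 5381;

	for (size_t i = 0; i < len; i++)
		h = h * 33 + (unsigned char)name[i];
	return h & (SKETCH_HASH_SIZE - 1);
}

/* Return the entry for name[0..len), creating it if it is not there. */
static struct sketch_ident *sketch_lookup_or_create(const char *name, size_t len)
{
	struct sketch_ident **p, *ident;

	assert(len > 0 && len <= 255);
	p = &sketch_table[sketch_hash(name, len)];	/* bucket head */
	while ((ident = *p) != NULL) {
		/* First touch of *ident: the load of ->len. */
		if (ident->len == (unsigned char)len &&
		    memcmp(ident->name, name, len) == 0)
			return ident;
		p = &ident->next;
	}

	/* Not found: append a new entry at the end of the chain. */
	ident = malloc(sizeof(*ident) + len);
	if (!ident)
		return NULL;
	ident->next = NULL;
	ident->symbols = NULL;
	ident->len = (unsigned char)len;
	memcpy(ident->name, name, len);
	*p = ident;
	return ident;
}
```

With mostly single-element chains, each lookup in such a scheme does two
dependent loads (bucket head, then the entry itself) before it can
compare anything, and the second one goes to wherever the entry happened
to be allocated.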
