LinuxPPC-Dev Archive on lore.kernel.org
From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: X86 ML <x86@kernel.org>, Peter Zijlstra <peterz@infradead.org>,
	Chen Zhongjin <chenzhongjin@huawei.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Jason Baron <jbaron@akamai.com>, Ingo Molnar <mingo@redhat.com>,
	"sv@linux.ibm.com" <sv@linux.ibm.com>,
	"Steven Rostedt \(VMware\)" <rostedt@goodmis.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@alien8.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	"agust@denx.de" <agust@denx.de>,
	"open list:LINUX FOR POWERPC \(32-BIT AND 64-BIT\)"
	<linuxppc-dev@lists.ozlabs.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>
Subject: Re: [PATCH v2 0/7] Implement inline static calls on PPC32 - v2
Date: Thu, 1 Sep 2022 16:46:40 +0000
Message-ID: <d35a2039-1755-b0be-6733-bb7ec19b2ea8@csgroup.eu>
In-Reply-To: <CAMj1kXFqs=YAbTDJOgzpse9ZkggSxPNNJJphEA=J94FQzF55qg@mail.gmail.com>



On 09/07/2022 at 08:52, Ard Biesheuvel wrote:
> Hello Christophe,
> 
> On Fri, 8 Jul 2022 at 19:32, Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>> This series applies on top of the series v3 "objtool: Enable and
>> implement --mcount option on powerpc" [1] rebased on powerpc-next branch
>>
>> A few modifications are done to core parts to enable powerpc
>> implementation:
>> - R_X86_64_PC32 is abstracted to R_REL32 so that it can then be
>> redefined as R_PPC_REL32.
>> - A call to static_call_init() is added to start_kernel() to avoid
>> every architecture to have to call it
>> - Trampoline address is provided to arch_static_call_transform() even
>> when setting a site to fallback on a call to the trampoline when the
>> target is too far.
>>
>> [1] https://lore.kernel.org/lkml/70b6d08d-aced-7f4e-b958-a3c7ae1a9319@csgroup.eu/T/#rb3a073c54aba563a135fba891e0c34c46e47beef
>>
>> Christophe Leroy (7):
>>    powerpc: Add missing asm/asm.h for objtool
>>    objtool/powerpc: Activate objtool on PPC32
>>    objtool: Add architecture specific R_REL32 macro
>>    objtool/powerpc: Add necessary support for inline static calls
>>    init: Call static_call_init() from start_kernel()
>>    static_call_inline: Provide trampoline address when updating sites
>>    powerpc/static_call: Implement inline static calls
>>
> 
> Could you quantify the performance gains of moving from out-of-line,
> patched tail-call branch instructions to full-fledged inline static
> calls? On x86, the retpoline problem makes this glaringly obvious, but
> on other architectures, the complexity of supporting this model may
> outweigh the performance advantages.

Surprisingly, I get worse performance with inline static calls than with
out-of-line static calls:

No static call:

root@vgoip:~# perf stat -r 10 ./hackbench 1
Running with 1*40 (== 40) tasks.
Time: 17.186
Running with 1*40 (== 40) tasks.
Time: 16.738
Running with 1*40 (== 40) tasks.
Time: 16.579
Running with 1*40 (== 40) tasks.
Time: 16.838
Running with 1*40 (== 40) tasks.
Time: 16.652
Running with 1*40 (== 40) tasks.
Time: 17.380
Running with 1*40 (== 40) tasks.
Time: 16.630
Running with 1*40 (== 40) tasks.
Time: 16.850
Running with 1*40 (== 40) tasks.
Time: 17.161
Running with 1*40 (== 40) tasks.
Time: 16.722

  Performance counter stats for './hackbench 1' (10 runs):

           17019.55 msec task-clock                #    0.980 CPUs utilized            ( +-  0.51% )
               4847      context-switches          #  282.280 /sec                     ( +-  6.32% )
                  0      cpu-migrations            #    0.000 /sec
               1249      page-faults               #   72.739 /sec                     ( +-  0.49% )
         2245344976      cycles                    #    0.131 GHz                      ( +-  0.51% )
          727437072      instructions              #    0.32  insn per cycle           ( +-  0.40% )
    <not supported>      branches
    <not supported>      branch-misses

            17.3585 +- 0.0909 seconds time elapsed  ( +-  0.52% )


Out-of-line static call:

root@vgoip:~# perf stat -r 10 ./hackbench 1
Running with 1*40 (== 40) tasks.
Time: 15.892
Running with 1*40 (== 40) tasks.
Time: 15.731
Running with 1*40 (== 40) tasks.
Time: 15.507
Running with 1*40 (== 40) tasks.
Time: 16.269
Running with 1*40 (== 40) tasks.
Time: 15.934
Running with 1*40 (== 40) tasks.
Time: 16.048
Running with 1*40 (== 40) tasks.
Time: 15.700
Running with 1*40 (== 40) tasks.
Time: 16.063
Running with 1*40 (== 40) tasks.
Time: 15.852
Running with 1*40 (== 40) tasks.
Time: 15.941

  Performance counter stats for './hackbench 1' (10 runs):

           16227.32 msec task-clock                #    0.992 CPUs utilized            ( +-  0.42% )
               3732      context-switches          #  230.525 /sec                     ( +-  6.42% )
                  0      cpu-migrations            #    0.000 /sec
               1244      page-faults               #   76.842 /sec                     ( +-  0.11% )
         2141094288      cycles                    #    0.132 GHz                      ( +-  0.42% )
          712598441      instructions              #    0.33  insn per cycle           ( +-  0.29% )
    <not supported>      branches
    <not supported>      branch-misses

            16.3539 +- 0.0675 seconds time elapsed  ( +-  0.41% )


Inline static call:

root@vgoip:~# perf stat -r 10 ./hackbench 1
Running with 1*40 (== 40) tasks.
Time: 17.512
Running with 1*40 (== 40) tasks.
Time: 17.240
Running with 1*40 (== 40) tasks.
Time: 16.901
Running with 1*40 (== 40) tasks.
Time: 17.125
Running with 1*40 (== 40) tasks.
Time: 17.262
Running with 1*40 (== 40) tasks.
Time: 17.298
Running with 1*40 (== 40) tasks.
Time: 17.182
Running with 1*40 (== 40) tasks.
Time: 16.988
Running with 1*40 (== 40) tasks.
Time: 17.102
Running with 1*40 (== 40) tasks.
Time: 16.669

  Performance counter stats for './hackbench 1' (10 runs):

           16976.76 msec task-clock                #    0.964 CPUs utilized            ( +-  0.44% )
               4760      context-switches          #  273.007 /sec                     ( +-  4.93% )
                  0      cpu-migrations            #    0.000 /sec
               1252      page-faults               #   71.808 /sec                     ( +-  0.35% )
         2239986112      cycles                    #    0.128 GHz                      ( +-  0.44% )
          721540184      instructions              #    0.31  insn per cycle           ( +-  0.31% )
    <not supported>      branches
    <not supported>      branch-misses

            17.6126 +- 0.0762 seconds time elapsed  ( +-  0.43% )


Summary:

No static calls:
            17.3585 +- 0.0909 seconds time elapsed  ( +-  0.52% )
Out-of-line static calls:
            16.3539 +- 0.0675 seconds time elapsed  ( +-  0.41% )
Inline static calls:
            17.6126 +- 0.0762 seconds time elapsed  ( +-  0.43% )
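
To put those three figures in relative terms, here is a minimal sketch in
Python that computes the percent change in elapsed time against the
"no static calls" baseline. It uses only the means and error bounds reported
above; the dictionary keys and the relative_change() helper are illustrative
names, not part of any tool.

```python
# Mean elapsed time (s) and reported +- bound (s), copied from the summary above.
results = {
    "no static calls": (17.3585, 0.0909),
    "out-of-line static calls": (16.3539, 0.0675),
    "inline static calls": (17.6126, 0.0762),
}


def relative_change(name, baseline="no static calls"):
    """Percent change in mean elapsed time versus the baseline (negative = faster)."""
    base_mean, _ = results[baseline]
    mean, _ = results[name]
    return (mean - base_mean) / base_mean * 100.0


for name, (mean, err) in results.items():
    print(f"{name:26s} {mean:8.4f} s +- {err:.4f}  {relative_change(name):+6.2f}%")
```

With these inputs, out-of-line static calls come out roughly 5.8% faster
than the baseline, while inline static calls come out roughly 1.5% slower.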

Is there anything wrong with inline static calls?

Christophe
