From: Ingo Molnar <mingo@kernel.org>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Nadav Amit <namit@vmware.com>, Andy Lutomirski <luto@kernel.org>,
	Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH tip] x86/percpu: Rewrite arch_raw_cpu_ptr()
Date: Sat, 14 Oct 2023 12:04:21 +0200
Message-ID: <ZSpnpfn/mSYrgC9C@gmail.com>
In-Reply-To: <20231011204150.51166-1-ubizjak@gmail.com>


* Uros Bizjak <ubizjak@gmail.com> wrote:

> Implement arch_raw_cpu_ptr() as a load from this_cpu_off and then
> add the ptr value to the base. This way, the compiler can propagate
> addend to the following instruction and simplify address calculation.
> 
> E.g.: address calculation in amd_pmu_enable_virt() improves from:
> 
>     48 c7 c0 00 00 00 00 	mov    $0x0,%rax
> 	87b7: R_X86_64_32S	cpu_hw_events
> 
>     65 48 03 05 00 00 00 	add    %gs:0x0(%rip),%rax
>     00
> 	87bf: R_X86_64_PC32	this_cpu_off-0x4
> 
>     48 c7 80 28 13 00 00 	movq   $0x0,0x1328(%rax)
>     00 00 00 00
> 
> to:
> 
>     65 48 8b 05 00 00 00 	mov    %gs:0x0(%rip),%rax
>     00
> 	8798: R_X86_64_PC32	this_cpu_off-0x4
>     48 c7 80 00 00 00 00 	movq   $0x0,0x0(%rax)
>     00 00 00 00
> 	87a6: R_X86_64_32S	cpu_hw_events+0x1328
> 
> The compiler can also eliminate redundant loads from this_cpu_off,
> reducing the number of percpu offset reads (either from this_cpu_off
> or with rdgsbase) from 1663 to 1571.
> 
> Additionally, the patch introduces 'rdgsbase' alternative for CPUs with
> X86_FEATURE_FSGSBASE. The rdgsbase instruction *probably* will end up
> only decoding in the first decoder etc. But we're talking single-cycle
> kind of effects, and the rdgsbase case should be much better from
> a cache perspective and might use fewer memory pipeline resources to
> offset the fact that it uses an unusual front end decoder resource...

So the 'additionally' wording in the changelog should have been a big hint 
already that the introduction of RDGSBASE usage needs to be a separate 
patch. ;-)

Thanks,

	Ingo
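
For readers following the thread, the load-then-add scheme described in
the quoted changelog can be modelled in plain userspace C. This is only
a sketch under stated assumptions: every sim_* name, NR_SIM_CPUS and
struct sim_events are hypothetical, the %gs-relative load of
this_cpu_off is replaced by an ordinary variable, and none of it is the
kernel's actual arch_raw_cpu_ptr() code.

```c
#include <stdint.h>

#define NR_SIM_CPUS 4   /* hypothetical CPU count for this model */

/* Hypothetical stand-in for a per-CPU structure such as cpu_hw_events. */
struct sim_events {
	int pad[4];
	int enabled;
};

/* "Template" object: its address only serves as a key for the lookup. */
static struct sim_events sim_cpu_hw_events;

/* One private copy of the object per simulated CPU. */
static struct sim_events sim_percpu_copies[NR_SIM_CPUS];

/* Models this_cpu_off; the kernel reads the real one relative to %gs. */
static uintptr_t sim_this_cpu_off;

/* Pretend to run on 'cpu' by recomputing the per-CPU offset. */
static void sim_set_cpu(unsigned int cpu)
{
	sim_this_cpu_off = (uintptr_t)&sim_percpu_copies[cpu] -
			   (uintptr_t)&sim_cpu_hw_events;
}

/* Load the offset first, then add the pointer value to that base. */
static void *sim_raw_cpu_ptr(const void *ptr)
{
	uintptr_t base = sim_this_cpu_off;	/* the load */

	return (void *)(base + (uintptr_t)ptr);	/* the add */
}

/* Store to a field of the current CPU's copy through the helper. */
static void sim_clear_enabled(void)
{
	struct sim_events *ev = sim_raw_cpu_ptr(&sim_cpu_hw_events);

	ev->enabled = 0;
}
```

With the base coming from a separate load, the field offset can in
principle be folded into the displacement of the final store, which is
the effect the cpu_hw_events+0x1328 relocation in the quoted
disassembly shows.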


Thread overview: 7+ messages
2023-10-11 20:40 [PATCH tip] x86/percpu: Rewrite arch_raw_cpu_ptr() Uros Bizjak
2023-10-13 16:04 ` Sean Christopherson
2023-10-13 19:30   ` Uros Bizjak
2023-10-13 20:07     ` Linus Torvalds
2023-10-13 21:02     ` Sean Christopherson
2023-10-14 10:04 ` Ingo Molnar [this message]
2023-10-14 10:34   ` Uros Bizjak
