From: Greg KH <gregkh@linuxfoundation.org>
To: Kishon Vijay Abraham I <kvijayab@amd.com>
Cc: stable@vger.kernel.org, Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH] x86/barrier: Do not serialize MSR accesses on AMD
Date: Wed, 21 Feb 2024 11:41:58 +0100
Message-ID: <2024022146-chunk-fencing-1e8f@gregkh>
In-Reply-To: <20240130092628.1807154-1-kvijayab@amd.com>

On Tue, Jan 30, 2024 at 09:26:28AM +0000, Kishon Vijay Abraham I wrote:
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
>
> commit 04c3024560d3a14acd18d0a51a1d0a89d29b7eb5 upstream.
>
> AMD does not have the requirement for a synchronization barrier when
> accessing a certain group of MSRs. Do not incur that unnecessary
> penalty there.
>
> There will be a CPUID bit which explicitly states that an MFENCE is not
> needed. Once that bit is added to the APM, this will be extended with
> it.
>
> While at it, move to processor.h to avoid include hell. Untangling that
> file properly is a matter for another day.
>
> Some notes on the performance aspect of why this is relevant, courtesy
> of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>:
>
> On an AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM
> shows that the x2AVIC IPI rate is 3% to 4% lower than the AVIC IPI rate.
> The ipi-bench is modified so that the IPIs are sent between two vCPUs in
> the same CCX. This also requires pinning each vCPU to a physical core to
> prevent any latencies. This simulates the use case of pinning vCPUs to
> the threads of a single CCX to avoid IPI latency.
>
> In order to avoid run-to-run variance (for both x2AVIC and AVIC), the
> following configuration is used:
>
> 1) Disable Power States in BIOS (to prevent the system from entering a
> lower power state)
>
> 2) Run the system at a fixed frequency of 2500MHz (to prevent the system
> from increasing the frequency when the load increases)
>
> With the above configuration:
>
> *) Performance measured using ipi-bench for AVIC:
> Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU]
>
> Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from
> 48 vCPUs simultaneously]
>
> *) Performance measured using ipi-bench for x2AVIC:
> Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU]
>
> Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from
> 48 vCPUs simultaneously]
>
> From the above, x2AVIC latency is ~4% higher than AVIC. However, x2AVIC
> performance is expected to be better than or equivalent to AVIC. Analyzing
> the perf captures shows that significant time is spent in
> weak_wrmsr_fence() invoked by x2apic_send_IPI().
>
> With the fix to skip weak_wrmsr_fence():
>
> *) Performance measured using ipi-bench for x2AVIC:
> Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU]
>
> Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from
> 48 vCPUs simultaneously]
>
> Comparing the performance of x2AVIC with and without the fix, the fix
> improves performance by ~4%.
>
> Performance captured with an unmodified ipi-bench using the 'mesh-ipi' option,
> with and without weak_wrmsr_fence(), on a Zen4 system also showed a significant
> improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores
> CCX or CCD and just picks a random vCPU.
>
> Average throughput (10 iterations) with weak_wrmsr_fence():
> Cumulative throughput: 4933374 IPI/s
>
> Average throughput (10 iterations) without weak_wrmsr_fence():
> Cumulative throughput: 6355156 IPI/s
>
> [1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench
>
> Cc: stable@vger.kernel.org # 6.6+
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de
> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
> ---
> Kindly merge this patch to stable releases (v6.6+) as it's a perf optimization.
> [It does not apply as is on earlier releases and has to be reworked]
Sorry for the delay, now queued up.

greg k-h
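
For anyone wanting to double-check the percentages in the quoted commit
message, here is a minimal sketch that recomputes them from the raw
figures. The numbers are copied from the message above; the helper
pct_change() and all variable names are made up for illustration and are
not part of the patch or of ipi-bench.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the quoted commit message (names are illustrative).
avic_latency_ns = 1124.98
x2avic_latency_ns = 1172.42          # with weak_wrmsr_fence()
x2avic_fixed_latency_ns = 1117.44    # with the fence skipped

x2avic_tput_mps = 40.9432            # with weak_wrmsr_fence()
x2avic_fixed_tput_mps = 42.9608      # with the fence skipped

mesh_with_fence_ips = 4933374
mesh_without_fence_ips = 6355156

# x2AVIC latency penalty relative to AVIC before the fix.
latency_penalty = pct_change(avic_latency_ns, x2avic_latency_ns)

# x2AVIC latency and throughput change due to the fix.
latency_gain = pct_change(x2avic_latency_ns, x2avic_fixed_latency_ns)
tput_gain = pct_change(x2avic_tput_mps, x2avic_fixed_tput_mps)

# 'mesh-ipi' throughput change when the fence is removed.
mesh_gain = pct_change(mesh_with_fence_ips, mesh_without_fence_ips)

print(f"x2AVIC vs AVIC latency:     {latency_penalty:+.2f}%")
print(f"x2AVIC latency with fix:    {latency_gain:+.2f}%")
print(f"x2AVIC throughput with fix: {tput_gain:+.2f}%")
print(f"mesh-ipi throughput w/o fence: {mesh_gain:+.2f}%")
```

The pinned same-CCX numbers come out in the 4% to 5% range, consistent
with the "~4%" in the commit message, while the 'mesh-ipi' run shows a
gain of roughly 29% without the fence.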