* [RFC PATCH v2 0/2] x86/speculation: Add finer control for when to issue IBPB
@ 2021-04-29 8:44 Anand K Mistry
From: Anand K Mistry @ 2021-04-29 8:44 UTC
To: x86
Cc: joelaf, asteinhauser, bp, tglx, Anand K Mistry, Andy Lutomirski,
Ben Segall, Catalin Marinas, Chang S. Bae,
Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
Fenghua Yu, Gabriel Krisman Bertazi, H. Peter Anvin, Ingo Molnar,
Jay Lang, Jens Axboe, Juri Lelli, Kees Cook, Lai Jiangshan,
Mel Gorman, Mike Rapoport, Oleg Nesterov, Peter Collingbourne,
Peter Zijlstra, Shuah Khan, Steven Rostedt, Tony Luck,
Vincent Guittot, linux-kernel, linux-kselftest
Documentation/admin-guide/hw-vuln/spectre.rst documents that disabling
indirect branch speculation for a user-space process creates more
overhead and causes it to run slower. The performance hit varies by
CPU, but on the AMD A4-9120C and A6-9220C CPUs, a simple ping-pong using
pipes between two processes runs ~10x slower when IB speculation is
disabled.
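For reference, a minimal sketch of how a process can query and disable indirect branch speculation through the existing prctl() interface. The helper names (ib_spec_query, ib_spec_is_disabled, ib_spec_disable) are made up for illustration and are not part of this series; only the PR_* constants come from <linux/prctl.h>.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Read the calling task's indirect branch speculation state.
 * Returns the PR_SPEC_* bitmask, or -1 if the query is not supported
 * (errno is set by prctl()). Hypothetical helper name. */
int ib_spec_query(void)
{
	return prctl(PR_GET_SPECULATION_CTRL,
		     (unsigned long)PR_SPEC_INDIRECT_BRANCH, 0UL, 0UL, 0UL);
}

/* Interpret a bitmask returned by ib_spec_query(): non-zero when indirect
 * branch speculation is disabled (mitigation active) for the task. */
int ib_spec_is_disabled(int bits)
{
	assert(bits >= 0);	/* a negative value is an error, not a bitmask */
	return (bits & (PR_SPEC_DISABLE | PR_SPEC_FORCE_DISABLE)) != 0;
}

/* Ask the kernel to disable indirect branch speculation for this task.
 * Returns 0 on success, -1 on failure (for example when the mitigation
 * is not controllable per task). */
int ib_spec_disable(void)
{
	return prctl(PR_SET_SPECULATION_CTRL,
		     (unsigned long)PR_SPEC_INDIRECT_BRANCH,
		     (unsigned long)PR_SPEC_DISABLE, 0UL, 0UL);
}
```

A process would typically call ib_spec_disable() once early on, then use ib_spec_is_disabled(ib_spec_query()) to confirm the state.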
Patch 2, included in this RFC but not intended for commit, is a simple
program that demonstrates this issue. Running on an A4-9120C with IB
speculation left enabled, each process ping-pong takes ~7us:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1936300, iter/sec: 135383, us/iter: 7
But when IB speculation is disabled, that number increases
significantly:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 16384, t: 1500518, iter/sec: 10918, us/iter: 91
Although this test is a worst-case scenario, we can also consider a real
situation: an audio server (e.g. pulse). If we imagine a low-latency
capture, with 10ms packets and a concurrent task on the same CPU (e.g.
video encoding, for a video call), the audio server will preempt the
CPU at a rate of 100 Hz. At 91us overhead per preemption (switching to
and from the audio process), that's 100 * 91us = 9.1ms per second, or
0.9% overhead, for one process doing preemption. In real-world testing
(on an A4-9120C), I've seen 9% of CPU used by IBPB when doing a 2-person
video call.
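The estimate above can be reproduced with a small helper. This is only a sketch of the arithmetic; the function name is made up for illustration, and it assumes every wakeup pays the full measured round-trip cost.

```c
#include <assert.h>

/* Fraction of one CPU spent on switch overhead, given a wakeup rate in Hz
 * and a cost per preemption (switch to and from the task) in microseconds.
 * Hypothetical helper, not part of the patch. */
double switch_overhead_fraction(double wakeups_per_sec, double cost_us)
{
	assert(wakeups_per_sec >= 0 && cost_us >= 0);
	return wakeups_per_sec * cost_us / 1e6;	/* 1e6 us in one second */
}
```

With the numbers from the text, switch_overhead_fraction(100, 91) gives 0.0091, i.e. roughly 0.9% of one CPU.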
With this patch, the number of IBPBs issued can be reduced to the
minimum necessary, only when there's a potential attacker->victim
process switch.
Running on the same A4-9120C device, this patch reduces the performance
hit of IBPB by ~half, as expected:
localhost ~ # taskset 1 /usr/local/bin/test ds
...
iters: 32768, t: 1824043, iter/sec: 17964, us/iter: 55
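One way to read "~half" is to compare the cost added on top of the ~7us baseline rather than the raw iteration times. A sketch of that calculation follows; the function name is made up for illustration, and the inputs are the truncated us/iter values printed by the benchmark, so the result is approximate.

```c
#include <assert.h>

/* Fraction by which the added per-iteration cost shrinks.
 * base_us:   IB speculation enabled (no mitigation cost)
 * before_us: IB speculation disabled, default behaviour
 * after_us:  IB speculation disabled, with the new IBPB mode
 * Hypothetical helper, not part of the patch. */
double added_cost_reduction(double base_us, double before_us, double after_us)
{
	assert(before_us > base_us);
	return 1.0 - (after_us - base_us) / (before_us - base_us);
}
```

For the A4-9120C figures, added_cost_reduction(7, 91, 55) is 1 - 48/84, about 0.43.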
Note that CPUs from multiple vendors experience a performance hit due
to IBPB. I also tested an Intel i3-8130U, which sees a noticeable
(~2x) increase in process switch time due to IBPB.
IB spec enabled:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1210821us, iter/sec: 216501, us/iter: 4
IB spec disabled:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 131072, t: 1257583us, iter/sec: 104225, us/iter: 9
Open questions:
- There is now a significant number of task flags, and they reach the
limit of a 'long' on 32-bit systems. Should the 'mode' flags be
stored somewhere else?
- Having x86-specific flags in linux/sched.h feels wrong. However, this
is the mechanism for doing atomic flag updates. Is there an alternate
approach?
Open tasks:
- Documentation
- Naming
Changes in v2:
- Make flag per-process using prctl().
Anand K Mistry (2):
x86/speculation: Allow per-process control of when to issue IBPB
selftests: Benchmark for the cost of disabling IB speculation
arch/x86/include/asm/thread_info.h | 4 +
arch/x86/kernel/cpu/bugs.c | 56 +++++++++
arch/x86/kernel/process.c | 10 ++
arch/x86/mm/tlb.c | 51 ++++++--
include/linux/sched.h | 10 ++
include/uapi/linux/prctl.h | 5 +
.../testing/selftests/ib_spec/ib_spec_bench.c | 109 ++++++++++++++++++
7 files changed, 236 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/ib_spec/ib_spec_bench.c
--
2.31.1.498.g6c1eba8ee3d-goog
* [RFC PATCH v2 2/2] selftests: Benchmark for the cost of disabling IB speculation
From: Anand K Mistry @ 2021-04-29 8:44 UTC
To: x86
Cc: joelaf, asteinhauser, bp, tglx, Anand K Mistry, Shuah Khan,
linux-kernel, linux-kselftest
This is a simple benchmark for determining the cost of disabling IB
speculation. It forks a child process and does a simple ping-pong
using pipes between the parent and child. The child process can have IB
speculation disabled by running with 'd' as the first argument.
The test increases the number of iterations until the iterations take at
least 1 second, to minimise noise.
This file is NOT intended for inclusion in the kernel source. It is
presented here as a patch for reference and for others to replicate
results.
The binary should be run with 'taskset' and pinned to a single core,
since the goal is to benchmark process switching on a single core.
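As an alternative to 'taskset', the pinning could be done inside the program before fork(), since the child inherits the affinity mask. The following is a sketch only; the function name is made up for illustration and the benchmark in this patch does not do this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling process to the first CPU in its current affinity mask.
 * Returns the CPU number on success, -1 on failure.
 * Hypothetical helper, not part of the patch. */
int pin_to_first_allowed_cpu(void)
{
	cpu_set_t allowed, one;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		/* Build a mask containing only this CPU and apply it. */
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(0, sizeof(one), &one) != 0)
			return -1;
		return cpu;
	}
	return -1;	/* empty mask: should not happen in practice */
}
```

Picking the first allowed CPU rather than hard-coding CPU 0 keeps the sketch usable inside a restricted cpuset.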
Signed-off-by: Anand K Mistry <amistry@google.com>
---
(no changes since v1)
.../testing/selftests/ib_spec/ib_spec_bench.c | 109 ++++++++++++++++++
1 file changed, 109 insertions(+)
create mode 100644 tools/testing/selftests/ib_spec/ib_spec_bench.c
diff --git a/tools/testing/selftests/ib_spec/ib_spec_bench.c b/tools/testing/selftests/ib_spec/ib_spec_bench.c
new file mode 100644
index 000000000000..e8eab910a9d0
--- /dev/null
+++ b/tools/testing/selftests/ib_spec/ib_spec_bench.c
@@ -0,0 +1,109 @@
+#include <stdio.h>
+#include <time.h>
+#include <stdint.h>
+#include <assert.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+
+#define PR_SPEC_IBPB_MODE 2
+#define PR_SPEC_IBPB_MODE_DEFAULT 0
+#define PR_SPEC_IBPB_MODE_SANDBOX 1
+#define PR_SPEC_IBPB_MODE_PROTECT 2
+
+int64_t get_time_us() {
+ struct timespec ts = {0};
+ assert(clock_gettime(CLOCK_MONOTONIC, &ts) == 0);
+ return ((int64_t)ts.tv_sec * 1000000) + (ts.tv_nsec/1000);
+}
+
+void pong(int read_fd, int write_fd) {
+ int ret;
+ char buf;
+
+ while (1) {
+ ret = read(read_fd, &buf, 1);
+ if (ret == 0)
+ return;
+ assert(ret == 1);
+
+ assert(write(write_fd, &buf, 1) == 1);
+ }
+}
+
+void ping_once(int write_fd, int read_fd) {
+ char buf = 42;
+ assert(write(write_fd, &buf, 1) == 1);
+ assert(read(read_fd, &buf, 1) == 1);
+}
+
+int64_t ping_multi(int64_t iters, int write_fd, int read_fd) {
+ int64_t start_time = get_time_us();
+ for (int64_t i = 0; i < iters; i++)
+ ping_once(write_fd, read_fd);
+ return get_time_us() - start_time;
+}
+
+void run_test(int write_fd, int read_fd) {
+ int64_t iters = 1;
+ int64_t t;
+ for (int i = 0; i < 60; i++) {
+ t = ping_multi(iters, write_fd, read_fd);
+ printf("iters: %lld, t: %lldus, iter/sec: %lld, us/iter: %lld\n",
+ (long long)iters, (long long)t, (iters * 1000000LL) / t, (long long)(t / iters));
+
+ if (t > 1000000)
+ break;
+ iters <<= 1;
+ }
+}
+
+int main(int argc, char* argv[]) {
+ int fds_ping[2], fds_pong[2];
+ assert(pipe(fds_ping) == 0);
+ assert(pipe(fds_pong) == 0);
+
+ int disable_ib = 0;
+ int spec_ibpb_mode = 0;
+
+ if (argc > 1) {
+ int done = 0;
+ for (int i = 0; !done; i++) {
+ switch (argv[1][i]) {
+ case 0:
+ done = 1;
+ break;
+ case 'd':
+ disable_ib = 1;
+ break;
+ case 's':
+ spec_ibpb_mode = PR_SPEC_IBPB_MODE_SANDBOX;
+ break;
+ case 'p':
+ spec_ibpb_mode = PR_SPEC_IBPB_MODE_PROTECT;
+ break;
+ }
+ }
+ }
+
+ pid_t pid = fork();
+ assert(pid >= 0);
+ if (!pid) {
+ if (prctl(PR_SET_SPECULATION_CTRL,
+ PR_SPEC_IBPB_MODE, spec_ibpb_mode, 0, 0)) {
+ perror("Unable to set IBPB mode");
+ }
+
+ if (disable_ib)
+ assert(prctl(PR_SET_SPECULATION_CTRL,
+ PR_SPEC_INDIRECT_BRANCH,
+ PR_SPEC_DISABLE, 0, 0) == 0);
+
+ close(fds_ping[1]);
+ pong(fds_ping[0], fds_pong[1]);
+ } else {
+ run_test(fds_ping[1], fds_pong[0]);
+ close(fds_ping[1]);
+ }
+
+ return 0;
+}
--
2.31.1.498.g6c1eba8ee3d-goog