From: Tim Chen <tim.c.chen@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
the arch/x86 maintainers <x86@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Johannes Wikner <kwikner@ethz.ch>,
Alyssa Milburn <alyssa.milburn@linux.intel.com>,
Jann Horn <jannh@google.com>, "H.J. Lu" <hjl.tools@gmail.com>,
Joao Moreira <joao.moreira@intel.com>,
Joseph Nuzman <joseph.nuzman@intel.com>,
Steven Rostedt <rostedt@goodmis.org>,
Juergen Gross <jgross@suse.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
Date: Fri, 22 Jul 2022 13:11:25 -0700
Message-ID: <e84fd559e79152d7065f7ceb3bcdd9af6b496ac5.camel@linux.intel.com>
In-Reply-To: <87o7xmup5t.ffs@tglx>
On Mon, 2022-07-18 at 22:44 +0200, Thomas Gleixner wrote:
> On Mon, Jul 18 2022 at 12:51, Linus Torvalds wrote:
> > On Mon, Jul 18, 2022 at 12:30 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > Let the compiler add a 16 byte padding in front of each function entry
> > > point and put the call depth accounting there. That avoids calling out
> > > into the module area and reduces ITLB pressure.
> >
> > Ooh.
> >
> > I actually like this a lot better.
> >
> > Could we just say "use this instead if you have SKL and care about the issue?"
> >
> > I don't hate your module thunk trick, but this does seem *so* much
> > simpler, and if it performs better anyway, it really does seem like
> > the better approach.
>
> Yes, Peter and I came from avoiding a new compiler and the overhead for
> everyone when putting the padding into the code. We realized only when
> staring at the perf data that this padding in front of the function
> might be an acceptable solution. I did some more tests today on different
> machines with mitigations=off with kernels compiled with and without
> that padding. I couldn't find a single test case where the result was
> outside of the usual noise. But then my tests are definitely incomplete.
>
Here are some performance numbers for FIO running on a SKX server with an
Intel Cold Stream SSD. Padding improves performance significantly: against
mitigations=off, retbleed=stuff loses about 8% to 13% with padding versus
about 14% to 21% without it.

I tested the latest depth tracking code from Thomas:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/log/?h=depthtracking
(SHA1 714d29e3e7e3faac27142424ae2533163ddd3a46)
The latest gcc patch from Thomas is included at the end.

                                               Baseline      Baseline
read (kIOPs)               Mean  stdev  mitigations=off  retbleed=off  CPU util
================================================================================
mitigations=off          356.33   6.35            0.00%         7.11%    98.93%
retbleed=off             332.67   5.51           -6.64%         0.00%    99.16%
retbleed=ibrs            242.00   5.57          -32.09%       -27.25%    99.41%
retbleed=stuff (nopad)   281.67   4.62          -20.95%       -15.33%    99.35%
retbleed=stuff (pad)     310.67   0.58          -12.82%        -6.61%    99.29%

read/write                                     Baseline      Baseline
70/30 (kIOPs)              Mean  stdev  mitigations=off  retbleed=off  CPU util
================================================================================
mitigations=off          340.60   8.12            0.00%         4.01%    96.80%
retbleed=off             327.47   8.03           -3.86%         0.00%    97.06%
retbleed=ibrs            239.47   0.75          -29.69%       -26.87%    98.23%
retbleed=stuff (nopad)   275.20   0.69          -19.20%       -15.96%    97.86%
retbleed=stuff (pad)     296.60   2.03          -12.92%        -9.43%    97.14%

                                               Baseline      Baseline
write (kIOPs)              Mean  stdev  mitigations=off  retbleed=off  CPU util
================================================================================
mitigations=off          299.33   4.04            0.00%         7.16%    93.51%
retbleed=off             279.33   7.51           -6.68%         0.00%    94.30%
retbleed=ibrs            231.33   0.58          -22.72%       -17.18%    95.84%
retbleed=stuff (nopad)   257.67   0.58          -13.92%        -7.76%    94.96%
retbleed=stuff (pad)     274.67   1.53           -8.24%        -1.67%    94.31%
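
The two "Baseline" columns above appear to be the relative change of each
mean against the named configuration. A minimal sketch of that arithmetic,
assuming the definition (mean / baseline - 1) * 100; the names
rel_change_pct, struct result, read_results and print_relative are
illustrative and not part of any tool used for the measurements:

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `mean` against `baseline`, in percent.
   Assumption: this is how the "Baseline" columns were derived. */
double rel_change_pct(double mean, double baseline)
{
    assert(baseline > 0.0);
    return (mean / baseline - 1.0) * 100.0;
}

struct result {
    const char *config;
    double mean_kiops;
};

/* Mean read kIOPs copied from the first table above. */
const struct result read_results[] = {
    { "mitigations=off",        356.33 },
    { "retbleed=off",           332.67 },
    { "retbleed=ibrs",          242.00 },
    { "retbleed=stuff (nopad)", 281.67 },
    { "retbleed=stuff (pad)",   310.67 },
};

/* Print every configuration relative to the two baseline means. */
void print_relative(const struct result *r, int n,
                    double base_mitigations_off, double base_retbleed_off)
{
    for (int i = 0; i < n; i++)
        printf("%-24s %8.2f%% %8.2f%%\n", r[i].config,
               rel_change_pct(r[i].mean_kiops, base_mitigations_off),
               rel_change_pct(r[i].mean_kiops, base_retbleed_off));
}
```

Calling print_relative(read_results, 5, 356.33, 332.67) recomputes the read
table. Because the means above are already rounded to two decimals, the
last digit can differ slightly from the posted percentages.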
Tim
gcc patch from Thomas:
---
gcc/config/i386/i386.cc | 13 +++++++++++++
gcc/config/i386/i386.h | 7 +++++++
gcc/config/i386/i386.opt | 4 ++++
gcc/doc/invoke.texi | 6 ++++++
4 files changed, 30 insertions(+)
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6182,6 +6182,19 @@ ix86_code_end (void)
file_end_indicate_split_stack ();
}
+void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+                                const char *fnname ATTRIBUTE_UNUSED)
+{
+  if (force_function_padding)
+    {
+      fprintf (asm_out_file, "\t.align %d\n",
+               1 << force_function_padding);
+      fprintf (asm_out_file, "\t.skip %d,0xcc\n",
+               1 << force_function_padding);
+    }
+}
+
/* Emit code for the SET_GOT patterns. */
const char *
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2860,6 +2860,13 @@ extern enum attr_cpu ix86_schedule;
#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse")))
#endif
+#include <stdio.h>
+extern void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+ const char *fnname ATTRIBUTE_UNUSED);
+#undef ASM_OUTPUT_FUNCTION_PREFIX
+#define ASM_OUTPUT_FUNCTION_PREFIX x86_asm_output_function_prefix
+
/*
Local variables:
version-control: t
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1064,6 +1064,10 @@ mindirect-branch=
Target RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
Convert indirect call and jump to call and return thunks.
+mforce-function-padding=
+Target Joined UInteger Var(force_function_padding) Init(0) IntegerRange(0, 6)
+Put a 2^N byte padding area before each function.
+
mfunction-return=
Target RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
Convert function return to call and return thunk.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1451,6 +1451,7 @@ See RS/6000 and PowerPC Options.
-mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol
-mindirect-branch-register -mharden-sls=@var{choice} @gol
-mindirect-branch-cs-prefix -mneeded -mno-direct-extern-access}
+-mforce-function-padding=@var{n} @gol
@emph{x86 Windows Options}
@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -32849,6 +32850,11 @@ Force all calls to functions to be indir
when using Intel Processor Trace where it generates more precise timing
information for function calls.
+@item -mforce-function-padding=@var{n}
+@opindex mforce-function-padding
+Force a 2^@var{n} byte padding area (16 bytes for @var{n}=4) before each
+function, which allows run-time code patching to put a special prologue
+before the function entry.
+
@item -mmanual-endbr
@opindex mmanual-endbr
Insert ENDBR instruction at function entry only via the @code{cf_check}
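
To illustrate what the hook in the patch emits, here is a stand-alone
sketch that is not GCC code. format_function_prefix and padded_entry are
hypothetical names; the first mirrors the two fprintf calls from the patch,
the second computes where the function entry would land, assuming that
nothing else is assembled between the directives and the function label
and that `.align` takes a byte count, as GNU as does on x86 ELF targets:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of x86_asm_output_function_prefix(): formats the
   two directives for a padding of 2^log2_pad bytes into buf.  Returns
   the snprintf() result, or 0 when padding is disabled (log2_pad == 0),
   matching the `if (force_function_padding)` test in the patch. */
int format_function_prefix(char *buf, size_t len, unsigned log2_pad)
{
    assert(len > 0);
    assert(log2_pad <= 6);      /* IntegerRange(0, 6) in i386.opt */
    buf[0] = '\0';
    if (log2_pad == 0)
        return 0;
    unsigned bytes = 1u << log2_pad;
    return snprintf(buf, len, "\t.align %u\n\t.skip %u,0xcc\n",
                    bytes, bytes);
}

/* Address of the function entry when the location counter is at `loc`
   before the directives: align up to 2^log2_pad, then skip 2^log2_pad
   bytes of 0xcc filler. */
unsigned long padded_entry(unsigned long loc, unsigned log2_pad)
{
    assert(log2_pad <= 6);
    if (log2_pad == 0)
        return loc;
    unsigned long bytes = 1ul << log2_pad;
    unsigned long aligned = (loc + bytes - 1) & ~(bytes - 1);
    return aligned + bytes;
}
```

With log2_pad = 4 the formatted text is `.align 16` followed by
`.skip 16,0xcc`, so under the assumptions above every function entry stays
16-byte aligned and is preceded by 16 bytes of 0xcc (int3) that run-time
patching can overwrite with the call depth accounting.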