* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups [not found] ` <ad6c2633f39e39583bc5c5eaf7ccbe52@overdrivepizza.com> @ 2022-02-09 4:05 ` Kees Cook 2022-02-09 5:18 ` Joao Moreira 0 siblings, 1 reply; 18+ messages in thread From: Kees Cook @ 2022-02-09 4:05 UTC (permalink / raw) To: Joao Moreira Cc: Peter Zijlstra, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm On Tue, Feb 08, 2022 at 06:21:16PM -0800, Joao Moreira wrote: > > > Note: This feature was already submitted for upstreaming with the > > > llvm-project: https://reviews.llvm.org/D116070 > > > > Ah nice; I see this has been committed now. > > Yes, but then some front-end changes also required this fix > https://reviews.llvm.org/D118052, which is currently under review (posting > this here in case someone is trying this out). > > > > > Given that IBT will need to work with both Clang and gcc, I suspect the > > objtool approach will still end up needing to do all the verification. > > > > (And as you say, it has limited visibility into assembly.) > > Agreed that at this point objtool provides more coverage. Yet, besides being > an attempt to relief objtool and improve a bit the compiler support as > mentioned in the series cover letter, it is still nice to reduce the > left-over nops and fixups which end-up scattered all around. > > FWIIW, https://reviews.llvm.org/D118438 and https://reviews.llvm.org/D118355 > are also being cooked. Comments and ideas for new approaches or improvements > in the compiler support for this are very welcome :) Ah, excellent, thanks for the pointers. There's also this in the works: https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice to objtool, IBT, etc.) -- Kees Cook ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-09 4:05 ` [RFC][PATCH 6/6] objtool: Add IBT validation / fixups Kees Cook @ 2022-02-09 5:18 ` Joao Moreira 2022-02-11 13:38 ` Peter Zijlstra 0 siblings, 1 reply; 18+ messages in thread From: Joao Moreira @ 2022-02-09 5:18 UTC (permalink / raw) To: Kees Cook Cc: Peter Zijlstra, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm > Ah, excellent, thanks for the pointers. There's also this in the works: > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice > to objtool, IBT, etc.) Oh, great! Thanks for pointing it out. I guess I saw something with a similar name before ;) https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf Jokes aside (and perhaps questions more targeted to Sami), from a diagonal look it seems that this follows the good old tag approach proposed by PaX/grsecurity, right? If this is the case, should I assume it could also benefit from features like -mibt-seal? Also are you considering that perhaps we can use alternatives to flip different CFI instrumentation as suggested by PeterZ in another thread? Tks, Joao ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-09 5:18 ` Joao Moreira @ 2022-02-11 13:38 ` Peter Zijlstra 2022-02-14 21:38 ` Sami Tolvanen ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: Peter Zijlstra @ 2022-02-11 13:38 UTC (permalink / raw) To: Joao Moreira Cc: Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote: > > Ah, excellent, thanks for the pointers. There's also this in the works: > > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice > > to objtool, IBT, etc.) > > Oh, great! Thanks for pointing it out. I guess I saw something with a > similar name before ;) https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf > > Jokes aside (and perhaps questions more targeted to Sami), from a diagonal > look it seems that this follows the good old tag approach proposed by > PaX/grsecurity, right? If this is the case, should I assume it could also > benefit from features like -mibt-seal? Also are you considering that perhaps > we can use alternatives to flip different CFI instrumentation as suggested > by PeterZ in another thread? So, lets try and recap things from IRC yesterday. There's a whole bunch of things intertwining making indirect branches 'interesting'. Most of which I've not seen mentioned in Sami's KCFI proposal which makes it rather pointless. I think we'll end up with something related to KCFI, but with distinct differences: - 32bit immediates for smaller code - __kcfi_check_fail() is out for smaller code - it must interact with IBT/BTI and retpolines - we must be very careful with speculation. Right, so because !IBT-CFI needs the check at the call site, like: caller: cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes je 1f # 2 bytes ud2 # 2 bytes 1: call __x86_indirect_thunk_rax # 5 bytes .align 16 .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes func: ... ret While FineIBT has them at the landing site: caller: movl $0xdeadbeef, %r11d # 6 bytes call __x86_indirect_thunk_rax # 5 bytes .align 16 func: endbr # 4 bytes cmpl $0xdeadbeef, %r11d # 7 bytes je 1f # 2 bytes ud2 # 2 bytes 1: ... ret It seems to me that always doing the check at the call-site is simpler, since it avoids code-bloat and patching work. That is, if we allow both we'll effectivly blow up the code by 11 + 13 bytes (11 at the call site, 13 at function entry) as opposed to 11+4 or 6+13. Which then yields: caller: cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes je 1f # 2 bytes ud2 # 2 bytes 1: call __x86_indirect_thunk_rax # 5 bytes .align 16 .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes func: endbr # 4 bytes ... ret For a combined 11+8 bytes overhead :/ Now, this setup provides: - minimal size (please yell if there's a smaller option I missed; s/ud2/int3/ ?) - since the retpoline handles speculation from stuff before it, the load-miss induced speculation is covered. - the 'je' branch is binary, leading to either the retpoline or the ud2, both which are a speculation stop. - the ud2 is placed such that if the exception is non-fatal, code execution can recover - when IBT is present we can rewrite the thunk call to: lfence call *(%rax) and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes). - can disable CFI by replacing the cmpl with: jmp 1f (or an 11 byte nop, which is just about possible). And since we already have all retpoline thunk callsites in a section, we can trivially find all CFI bits that are always in front it them. - function pointer sanity Additionally, if we ensure all direct call are +4 and only indirect calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR. We can replace those ENDBR instructions of functions that should never be indirectly called with: ud1 0x0(%rax),%eax which is a 4 byte #UD. This gives us the property that even on !IBT hardware such a call will go *splat*. Further, Andrew put in the request for __attribute__((cfi_seed(blah))) to allow distinguishing indirect functions with otherwise identical signature; eg. cookie = hash32(blah##signature). Did I miss anything? Got anything wrong? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-11 13:38 ` Peter Zijlstra @ 2022-02-14 21:38 ` Sami Tolvanen 2022-02-14 22:25 ` Peter Zijlstra 2022-02-15 22:45 ` Joao Moreira ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: Sami Tolvanen @ 2022-02-14 21:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <peterz@infradead.org> wrote: > I think we'll end up with something related to KCFI, but with distinct > differences: > > - 32bit immediates for smaller code Sure, I don't see issues with that. Based on a quick test with defconfig, this reduces vmlinux size by 0.30%. > - __kcfi_check_fail() is out for smaller code I'm fine with adding a trap mode that's used by default, but having more helpful diagnostics when something fails is useful even in production systems in my experience. This change results in a vmlinux that's another 0.92% smaller. > Which then yields: > > caller: > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes > je 1f # 2 bytes > ud2 # 2 bytes > 1: call __x86_indirect_thunk_rax # 5 bytes Note that the compiler might not emit this *exact* sequence of instructions. For example, Clang generates this for events_sysfs_show with the modified KCFI patch: 2274: cmpl $0x4d7bed9e,-0x4(%r11) 227c: jne 22c0 <events_sysfs_show+0x6c> 227e: call 2283 <events_sysfs_show+0x2f> 227f: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4 ... 22c0: ud2 In this case the function has two indirect calls and Clang seems to prefer to emit just one ud2. > .align 16 > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > func: > endbr # 4 bytes Here func is no longer aligned to 16 bytes, in case that's important. > Further, Andrew put in the request for __attribute__((cfi_seed(blah))) > to allow distinguishing indirect functions with otherwise identical > signature; eg. cookie = hash32(blah##signature). Sounds reasonable. > Did I miss anything? Got anything wrong? How would you like to deal with the 4-byte hashes in objtool? We either need to annotate all function symbols in the kernel, or we need a way to distinguish the hashes from random instructions, so we can also have functions that don't have a type hash. Sami ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-14 21:38 ` Sami Tolvanen @ 2022-02-14 22:25 ` Peter Zijlstra 2022-02-15 16:56 ` Sami Tolvanen 0 siblings, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2022-02-14 22:25 UTC (permalink / raw) To: Sami Tolvanen Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote: > On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <peterz@infradead.org> wrote: > > I think we'll end up with something related to KCFI, but with distinct > > differences: > > > > - 32bit immediates for smaller code > > Sure, I don't see issues with that. Based on a quick test with > defconfig, this reduces vmlinux size by 0.30%. > > > - __kcfi_check_fail() is out for smaller code > > I'm fine with adding a trap mode that's used by default, but having > more helpful diagnostics when something fails is useful even in > production systems in my experience. This change results in a vmlinux > that's another 0.92% smaller. You can easily have the exception generate a nice warning, you can even have it continue. You really don't need a call for that. > > Which then yields: > > > > caller: > > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes > > je 1f # 2 bytes > > ud2 # 2 bytes > > 1: call __x86_indirect_thunk_rax # 5 bytes > > Note that the compiler might not emit this *exact* sequence of > instructions. For example, Clang generates this for events_sysfs_show > with the modified KCFI patch: > > 2274: cmpl $0x4d7bed9e,-0x4(%r11) > 227c: jne 22c0 <events_sysfs_show+0x6c> > 227e: call 2283 <events_sysfs_show+0x2f> > 227f: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4 > ... > 22c0: ud2 > > In this case the function has two indirect calls and Clang seems to > prefer to emit just one ud2. That will not allow you to recover from the exception. UD2 is not an unconditional fail. It should have an out-going edge in this case too. Heck, most of the WARN_ON() things are UD2 instructions. Also, you really should add a CS prefix to the retpoline thunk call if you insist on using r11 (or any of the higher regs). > > .align 16 > > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > > func: > > endbr # 4 bytes > > Here func is no longer aligned to 16 bytes, in case that's important. The idea was to have the hash and the endbr in the same cacheline. > > Did I miss anything? Got anything wrong? > > How would you like to deal with the 4-byte hashes in objtool? We > either need to annotate all function symbols in the kernel, or we need > a way to distinguish the hashes from random instructions, so we can > also have functions that don't have a type hash. Easiest would be to create a special section with all the hash offsets in I suppose. A bit like -mfentry-section=name. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-14 22:25 ` Peter Zijlstra @ 2022-02-15 16:56 ` Sami Tolvanen 2022-02-15 20:03 ` Kees Cook 2022-02-15 20:53 ` Peter Zijlstra 0 siblings, 2 replies; 18+ messages in thread From: Sami Tolvanen @ 2022-02-15 16:56 UTC (permalink / raw) To: Peter Zijlstra Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote: > > I'm fine with adding a trap mode that's used by default, but having > > more helpful diagnostics when something fails is useful even in > > production systems in my experience. This change results in a vmlinux > > that's another 0.92% smaller. > > You can easily have the exception generate a nice warning, you can even > have it continue. You really don't need a call for that. Sure, but wouldn't that require us to generate something like __bug_table, so we know where the CFI specific traps are? > > In this case the function has two indirect calls and Clang seems to > > prefer to emit just one ud2. > > That will not allow you to recover from the exception. UD2 is not an > unconditional fail. It should have an out-going edge in this case too. Yes, CFI failures are not recoverable in that code. In fact, LLVM assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I suppose we could just use an int3 instead. I assume that's sufficient to stop speculation? > Also, you really should add a CS prefix to the retpoline thunk call if > you insist on using r11 (or any of the higher regs). I actually didn't touch the retpoline thunk call, that's exactly the code Clang normally generates. > > How would you like to deal with the 4-byte hashes in objtool? We > > either need to annotate all function symbols in the kernel, or we need > > a way to distinguish the hashes from random instructions, so we can > > also have functions that don't have a type hash. > > Easiest would be to create a special section with all the hash offsets > in I suppose. A bit like -mfentry-section=name. OK, I'll take a look. With 64-bit hashes I was planning to use a known prefix, but that's not really an option with a 32-bit hash. Sami ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-15 16:56 ` Sami Tolvanen @ 2022-02-15 20:03 ` Kees Cook 2022-02-15 21:05 ` Peter Zijlstra 2022-02-15 20:53 ` Peter Zijlstra 1 sibling, 1 reply; 18+ messages in thread From: Kees Cook @ 2022-02-15 20:03 UTC (permalink / raw) To: Sami Tolvanen Cc: Peter Zijlstra, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote: > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote: > > > I'm fine with adding a trap mode that's used by default, but having > > > more helpful diagnostics when something fails is useful even in > > > production systems in my experience. This change results in a vmlinux > > > that's another 0.92% smaller. > > > > You can easily have the exception generate a nice warning, you can even > > have it continue. You really don't need a call for that. > > Sure, but wouldn't that require us to generate something like > __bug_table, so we know where the CFI specific traps are? It also means the trap handler needs to do a bunch of instruction decoding to find the address that was going to be jumped to, etc. > > > In this case the function has two indirect calls and Clang seems to > > > prefer to emit just one ud2. > > > > That will not allow you to recover from the exception. UD2 is not an > > unconditional fail. It should have an out-going edge in this case too. > > Yes, CFI failures are not recoverable in that code. In fact, LLVM > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I > suppose we could just use an int3 instead. I assume that's sufficient > to stop speculation? Peter, is there a reason you want things in the specific order of: cmp, je-to-call, trap, call Isn't it more run-time efficient to have an out-of-line failure of the form: cmp, jne-to-trap, call, ...code..., trap, jmp-to-call I thought the static label stuff allowed the "default out of line" option, as far as pessimizing certain states, etc? The former is certainly code-size smaller, though, yes, but doesn't it waste space in the cache line for the unlikely case, etc? > > Also, you really should add a CS prefix to the retpoline thunk call if > > you insist on using r11 (or any of the higher regs). > > I actually didn't touch the retpoline thunk call, that's exactly the > code Clang normally generates. > > > > How would you like to deal with the 4-byte hashes in objtool? We > > > either need to annotate all function symbols in the kernel, or we need > > > a way to distinguish the hashes from random instructions, so we can > > > also have functions that don't have a type hash. > > > > Easiest would be to create a special section with all the hash offsets > > in I suppose. A bit like -mfentry-section=name. > > OK, I'll take a look. With 64-bit hashes I was planning to use a known > prefix, but that's not really an option with a 32-bit hash. 32-bit hashes would have both code size and runtime benefits: fewer instructions for the compare therefore a smaller set of instructions added. -- Kees Cook ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-15 20:03 ` Kees Cook @ 2022-02-15 21:05 ` Peter Zijlstra 2022-02-15 23:05 ` Kees Cook 2022-02-16 12:24 ` Peter Zijlstra 0 siblings, 2 replies; 18+ messages in thread From: Peter Zijlstra @ 2022-02-15 21:05 UTC (permalink / raw) To: Kees Cook Cc: Sami Tolvanen, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote: > On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote: > > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote: > > > > I'm fine with adding a trap mode that's used by default, but having > > > > more helpful diagnostics when something fails is useful even in > > > > production systems in my experience. This change results in a vmlinux > > > > that's another 0.92% smaller. > > > > > > You can easily have the exception generate a nice warning, you can even > > > have it continue. You really don't need a call for that. > > > > Sure, but wouldn't that require us to generate something like > > __bug_table, so we know where the CFI specific traps are? > > It also means the trap handler needs to do a bunch of instruction > decoding to find the address that was going to be jumped to, etc. arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we need to to know that to re-write the thunk-call. > > > > In this case the function has two indirect calls and Clang seems to > > > > prefer to emit just one ud2. > > > > > > That will not allow you to recover from the exception. UD2 is not an > > > unconditional fail. It should have an out-going edge in this case too. > > > > Yes, CFI failures are not recoverable in that code. In fact, LLVM > > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I > > suppose we could just use an int3 instead. I assume that's sufficient > > to stop speculation? > > Peter, is there a reason you want things in the specific order of: > > cmp, je-to-call, trap, call > > Isn't it more run-time efficient to have an out-of-line failure of > the form: > > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call > > I thought the static label stuff allowed the "default out of line" > option, as far as pessimizing certain states, etc? The former is certainly > code-size smaller, though, yes, but doesn't it waste space in the cache > line for the unlikely case, etc? Mostly so that we can deduce the address of the trap from the retpoline site, also the above has a fairly high chance of using jcc.d32 which is actually larger than jcc.d8+ud2. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-15 21:05 ` Peter Zijlstra @ 2022-02-15 23:05 ` Kees Cook 2022-02-15 23:38 ` Joao Moreira 2022-02-16 12:24 ` Peter Zijlstra 1 sibling, 1 reply; 18+ messages in thread From: Kees Cook @ 2022-02-15 23:05 UTC (permalink / raw) To: Peter Zijlstra Cc: Sami Tolvanen, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote: > On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote: > > On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote: > > > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote: > > > > > I'm fine with adding a trap mode that's used by default, but having > > > > > more helpful diagnostics when something fails is useful even in > > > > > production systems in my experience. This change results in a vmlinux > > > > > that's another 0.92% smaller. > > > > > > > > You can easily have the exception generate a nice warning, you can even > > > > have it continue. You really don't need a call for that. > > > > > > Sure, but wouldn't that require us to generate something like > > > __bug_table, so we know where the CFI specific traps are? > > > > It also means the trap handler needs to do a bunch of instruction > > decoding to find the address that was going to be jumped to, etc. > > arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we > need to to know that to re-write the thunk-call. Ah, okay, well that makes things easier. :) > > > > > In this case the function has two indirect calls and Clang seems to > > > > > prefer to emit just one ud2. > > > > > > > > That will not allow you to recover from the exception. UD2 is not an > > > > unconditional fail. It should have an out-going edge in this case too. > > > > > > Yes, CFI failures are not recoverable in that code. In fact, LLVM > > > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I > > > suppose we could just use an int3 instead. I assume that's sufficient > > > to stop speculation? > > > > Peter, is there a reason you want things in the specific order of: > > > > cmp, je-to-call, trap, call > > > > Isn't it more run-time efficient to have an out-of-line failure of > > the form: > > > > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call > > > > I thought the static label stuff allowed the "default out of line" > > option, as far as pessimizing certain states, etc? The former is certainly > > code-size smaller, though, yes, but doesn't it waste space in the cache > > line for the unlikely case, etc? > > Mostly so that we can deduce the address of the trap from the retpoline > site, also the above has a fairly high chance of using jcc.d32 which is > actually larger than jcc.d8+ud2. Ah, yeah, that's an interesting point. Still, I worry about finding ways to convinces Clang to emit precisely cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't. :P -- Kees Cook ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-15 23:05 ` Kees Cook @ 2022-02-15 23:38 ` Joao Moreira 0 siblings, 0 replies; 18+ messages in thread From: Joao Moreira @ 2022-02-15 23:38 UTC (permalink / raw) To: Kees Cook Cc: Peter Zijlstra, Sami Tolvanen, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm >> >> Mostly so that we can deduce the address of the trap from the >> retpoline >> site, also the above has a fairly high chance of using jcc.d32 which >> is >> actually larger than jcc.d8+ud2. > > Ah, yeah, that's an interesting point. > > Still, I worry about finding ways to convinces Clang to emit precisely > cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't. > :P This can probably be done more easily/precisely if implemented directly in the compiler's arch-specific backend. At least for x86 it wasn't a hassle to emit a defined sequence of instructions in the past. The price is that it will require a pass specific to each supported architecture, but I guess this isn't that bad. Perhaps this is discussion for a different mailing list, idk... but just pointing that it is not a huge wall. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-15 21:05 ` Peter Zijlstra 2022-02-15 23:05 ` Kees Cook @ 2022-02-16 12:24 ` Peter Zijlstra 1 sibling, 0 replies; 18+ messages in thread From: Peter Zijlstra @ 2022-02-16 12:24 UTC (permalink / raw) To: Kees Cook Cc: Sami Tolvanen, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote: > > Peter, is there a reason you want things in the specific order of: > > > > cmp, je-to-call, trap, call > > > > Isn't it more run-time efficient to have an out-of-line failure of > > the form: > > > > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call > > > > I thought the static label stuff allowed the "default out of line" > > option, as far as pessimizing certain states, etc? The former is certainly > > code-size smaller, though, yes, but doesn't it waste space in the cache > > line for the unlikely case, etc? > > Mostly so that we can deduce the address of the trap from the retpoline > site, also the above has a fairly high chance of using jcc.d32 which is > actually larger than jcc.d8+ud2. Also; and I think I mentioned this a few emails back, by having the whole CFI thing as a single range of bytes it becomes easier to patch. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-15 16:56 ` Sami Tolvanen 2022-02-15 20:03 ` Kees Cook @ 2022-02-15 20:53 ` Peter Zijlstra 1 sibling, 0 replies; 18+ messages in thread From: Peter Zijlstra @ 2022-02-15 20:53 UTC (permalink / raw) To: Sami Tolvanen Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote: > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote: > > > I'm fine with adding a trap mode that's used by default, but having > > > more helpful diagnostics when something fails is useful even in > > > production systems in my experience. This change results in a vmlinux > > > that's another 0.92% smaller. > > > > You can easily have the exception generate a nice warning, you can even > > have it continue. You really don't need a call for that. > > Sure, but wouldn't that require us to generate something like > __bug_table, so we know where the CFI specific traps are? Yes, but since you're going to emit a retpoline, and objtool already generates a list of retpoline sites, we can abuse that. If the instruction after the trap is a retpoline site, we a CFI fail. > > > In this case the function has two indirect calls and Clang seems to > > > prefer to emit just one ud2. > > > > That will not allow you to recover from the exception. UD2 is not an > > unconditional fail. It should have an out-going edge in this case too. > > Yes, CFI failures are not recoverable in that code. In fact, LLVM > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I > suppose we could just use an int3 instead. I assume that's sufficient > to stop speculation? It is. int3 is also smaller by one byte, but the exception is already fairly complicated, but I can see if we can make it work. > > Also, you really should add a CS prefix to the retpoline thunk call if > > you insist on using r11 (or any of the higher regs). > > I actually didn't touch the retpoline thunk call, that's exactly the > code Clang normally generates. Yeah, and it needs help, also see: https://lore.kernel.org/lkml/20211119165630.276205624@infradead.org/ Specifically, clang needs to: - CS prefix stuff the retpoline thunk call/jmp; - stop doing conditional indirect tail-calls. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-11 13:38 ` Peter Zijlstra 2022-02-14 21:38 ` Sami Tolvanen @ 2022-02-15 22:45 ` Joao Moreira 2022-02-16 0:57 ` Andrew Cooper 2022-03-02 3:06 ` Peter Collingbourne 3 siblings, 0 replies; 18+ messages in thread From: Joao Moreira @ 2022-02-15 22:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm On 2022-02-11 05:38, Peter Zijlstra wrote: > On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote: >> > Ah, excellent, thanks for the pointers. There's also this in the works: >> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice >> > to objtool, IBT, etc.) >> >> Oh, great! Thanks for pointing it out. I guess I saw something with a >> similar name before ;) >> https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf >> >> Jokes aside (and perhaps questions more targeted to Sami), from a >> diagonal >> look it seems that this follows the good old tag approach proposed by >> PaX/grsecurity, right? If this is the case, should I assume it could >> also >> benefit from features like -mibt-seal? Also are you considering that >> perhaps >> we can use alternatives to flip different CFI instrumentation as >> suggested >> by PeterZ in another thread? > > So, lets try and recap things from IRC yesterday. There's a whole bunch > of things intertwining making indirect branches 'interesting'. Most of > which I've not seen mentioned in Sami's KCFI proposal which makes it > rather pointless. > > I think we'll end up with something related to KCFI, but with distinct > differences: > > - 32bit immediates for smaller code > - __kcfi_check_fail() is out for smaller code > - it must interact with IBT/BTI and retpolines > - we must be very careful with speculation. > > Right, so because !IBT-CFI needs the check at the call site, like: > > caller: > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes > je 1f # 2 bytes > ud2 # 2 bytes > 1: call __x86_indirect_thunk_rax # 5 bytes > > > .align 16 > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > func: > ... > ret > > > While FineIBT has them at the landing site: > > caller: > movl $0xdeadbeef, %r11d # 6 bytes > call __x86_indirect_thunk_rax # 5 bytes > > > .align 16 > func: > endbr # 4 bytes > cmpl $0xdeadbeef, %r11d # 7 bytes > je 1f # 2 bytes > ud2 # 2 bytes > 1: ... > ret > > > It seems to me that always doing the check at the call-site is simpler, > since it avoids code-bloat and patching work. That is, if we allow both > we'll effectivly blow up the code by 11 + 13 bytes (11 at the call > site, > 13 at function entry) as opposed to 11+4 or 6+13. > > Which then yields: > > caller: > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes > je 1f # 2 bytes > ud2 # 2 bytes > 1: call __x86_indirect_thunk_rax # 5 bytes > > > .align 16 > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > func: > endbr # 4 bytes > ... > ret > > For a combined 11+8 bytes overhead :/ > > Now, this setup provides: > > - minimal size (please yell if there's a smaller option I missed; > s/ud2/int3/ ?) > - since the retpoline handles speculation from stuff before it, the > load-miss induced speculation is covered. > - the 'je' branch is binary, leading to either the retpoline or the > ud2, both which are a speculation stop. > - the ud2 is placed such that if the exception is non-fatal, code > execution can recover > - when IBT is present we can rewrite the thunk call to: > > lfence > call *(%rax) > > and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes). > - can disable CFI by replacing the cmpl with: > > jmp 1f > > (or an 11 byte nop, which is just about possible). And since we > already have all retpoline thunk callsites in a section, we can > trivially find all CFI bits that are always in front it them. > - function pointer sanity > Agreed that it is sensible to support CFI for CPUs without CET. KCFI is a win. Yet, I still think that we should support FineIBT and there are a few reasons for that (other than hopeful performance benefits). - KCFI is more prone to the presence of unintended allowed targets. Since the IBT scheme always rely on the same watermark tag (the ENDBR instruction), it is easier to prevent these from being emitted by JIT/compilers (fwiiw, see https://reviews.llvm.org/rGf385823e04f300c92ec03dbd660d621cc618a271 and https://dl.acm.org/doi/10.1145/3337167.3337175). - Regarding the space overhead, I can try to verify if FineIBT can be feasibly reshaped into: caller: movl $0xdeadbeef,%r10 # 6 bytes sub $0x5,%r11 # 4 bytes call *(%r11) # 5 bytes .align 16 endbr # 4 bytes call __x86_fineibt_thunk_deadbeef # 5 bytes func+4: ... ret __x86_fineibt_thunk_deadbeef: xor $0xdeadbeef, %r10 je 1f ud2 1: retq This scheme would require less space and less runtime patching. We would need one additional byte on callees to patch both the ENDBR and the 5-byte call instruction, plus space to hold the thunks somewhere else. The thunks overhead is then dilluted across the functions belonging to the same equivalence set, as these will use the same thunk. On a quick analysis over defconfig, it seems that the number of equivalence sets are ~25% of the number of functions (thus, space overhead is cut to a fourth). I guess this would only require the KCFI compiler support to emit an additional "0xcc" before the tag. Thunks can also be placed in a special section and dropped during boot to reduce the runtime memory footprint. What do you think, is this worth investigating? Also, in my understanding, r11 does not need to be preserved as, per-ABI, it is a scratch register. So there is no need to add $0x5,%r11 back after the call. Let me know if I'm mistaken. > > Additionally, if we ensure all direct call are +4 and only indirect > calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR. > We > can replace those ENDBR instructions of functions that should never be > indirectly called with: > > ud1 0x0(%rax),%eax > > which is a 4 byte #UD. This gives us the property that even on !IBT > hardware such a call will go *splat*. Given that the space overhead is a big concern, isn't it better to use the compiler support to prevent ENDBRs in the first place? Patching #UD is kinda cool, yeah, but I don't see it providing meaningful security guarantees over not having the ENDBRs emitted at all. > > Further, Andrew put in the request for __attribute__((cfi_seed(blah))) > to allow distinguishing indirect functions with otherwise identical > signature; eg. cookie = hash32(blah##signature). > > > Did I miss anything? Got anything wrong? I understand that Today there are not too many CET-enabled CPUs out there, and that the benefits might be a bit blurred currently. Yet, given that this is something that should continuously change with time, would you object to FineIBT as a build option, i.e., something which might not be supported by alternatives when using defconfig? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-11 13:38 ` Peter Zijlstra 2022-02-14 21:38 ` Sami Tolvanen 2022-02-15 22:45 ` Joao Moreira @ 2022-02-16 0:57 ` Andrew Cooper 2022-03-02 3:06 ` Peter Collingbourne 3 siblings, 0 replies; 18+ messages in thread From: Andrew Cooper @ 2022-02-16 0:57 UTC (permalink / raw) To: Peter Zijlstra, Joao Moreira Cc: Kees Cook, x86@kernel.org, hjl.tools@gmail.com, jpoimboe@redhat.com, linux-kernel@vger.kernel.org, ndesaulniers@google.com, samitolvanen@google.com, llvm@lists.linux.dev On 11/02/2022 13:38, Peter Zijlstra wrote: > Now, this setup provides: > > - minimal size (please yell if there's a smaller option I missed; > s/ud2/int3/ ?) It's probably best not to overload int3. #BP is already has magic properties, and what happens if someone wants to breakpoint one of these? I'll make my usual into suggestion for 64bit builds. ud2 is the sensible approach, with a caveat that whatever is hiding here should have metadata. ~Andrew ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-02-11 13:38 ` Peter Zijlstra ` (2 preceding siblings ...) 2022-02-16 0:57 ` Andrew Cooper @ 2022-03-02 3:06 ` Peter Collingbourne 2022-03-02 3:32 ` Joao Moreira 2022-06-08 17:53 ` Fāng-ruì Sòng 3 siblings, 2 replies; 18+ messages in thread From: Peter Collingbourne @ 2022-03-02 3:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Joao Moreira, Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm Hi Peter, One issue with this call sequence is that: On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote: > caller: > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes Because this instruction ends in the constant 0xdeadbeef, it may be used as a "gadget" that would effectively allow branching to an arbitrary address in %rax if the attacker can arrange to set ZF=1. > je 1f # 2 bytes > ud2 # 2 bytes > 1: call __x86_indirect_thunk_rax # 5 bytes > > > .align 16 > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > func: > endbr # 4 bytes > ... > ret I think we can avoid this problem with a slight tweak to your instruction sequence, at the cost of 2 bytes per function prologue. First, change the call sequence like so: cmpl $0xdeadbeef, -0x6(%rax) # 6 bytes je 1f # 2 bytes ud2 # 2 bytes 1: call __x86_indirect_thunk_rax # 5 bytes The key difference is that we've changed 0x4 to 0x6. Then change the function prologue to this: .align 16 .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes .zero 2 # 2 bytes func: The end result of the above is that the constant embedded in the cmpl instruction may only be used to reach the following ud2 instruction, which will "harmlessly" terminate execution in the same way as if the prologue signature did not match. Peter ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-03-02 3:06 ` Peter Collingbourne @ 2022-03-02 3:32 ` Joao Moreira 2022-06-08 17:53 ` Fāng-ruì Sòng 1 sibling, 0 replies; 18+ messages in thread From: Joao Moreira @ 2022-03-02 3:32 UTC (permalink / raw) To: Peter Collingbourne Cc: Peter Zijlstra, Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm On 2022-03-01 19:06, Peter Collingbourne wrote: > Hi Peter, > > One issue with this call sequence is that: > > On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote: >> caller: >> cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes > > Because this instruction ends in the constant 0xdeadbeef, it may > be used as a "gadget" that would effectively allow branching to an > arbitrary address in %rax if the attacker can arrange to set ZF=1. > >> je 1f # 2 bytes >> ud2 # 2 bytes >> 1: call __x86_indirect_thunk_rax # 5 bytes >> >> >> .align 16 >> .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes >> func: >> endbr # 4 bytes >> ... >> ret > > I think we can avoid this problem with a slight tweak to your > instruction sequence, at the cost of 2 bytes per function prologue. > First, change the call sequence like so: > > cmpl $0xdeadbeef, -0x6(%rax) # 6 bytes > je 1f # 2 bytes > ud2 # 2 bytes > 1: call __x86_indirect_thunk_rax # 5 bytes > > The key difference is that we've changed 0x4 to 0x6. > > Then change the function prologue to this: > > .align 16 > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > .zero 2 # 2 bytes > func: > > The end result of the above is that the constant embedded in the cmpl > instruction may only be used to reach the following ud2 instruction, > which will "harmlessly" terminate execution in the same way as if > the prologue signature did not match. FWIIW, this makes sense IMHO. These additional pre-prologue bytes are also what might be needed to enable the smaller version of FineIBT that I suggested in this thread some time ago. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-03-02 3:06 ` Peter Collingbourne 2022-03-02 3:32 ` Joao Moreira @ 2022-06-08 17:53 ` Fāng-ruì Sòng 2022-06-09 0:05 ` Sami Tolvanen 1 sibling, 1 reply; 18+ messages in thread From: Fāng-ruì Sòng @ 2022-06-08 17:53 UTC (permalink / raw) To: Peter Collingbourne Cc: Peter Zijlstra, Joao Moreira, Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm Hi Peter, On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <pcc@google.com> wrote: > > Hi Peter, > One issue with this call sequence is that: > > On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote: > > caller: > > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes > > Because this instruction ends in the constant 0xdeadbeef, it may > be used as a "gadget" that would effectively allow branching to an > arbitrary address in %rax if the attacker can arrange to set ZF=1. Do you mind elaborating how this instruction can be used as a gadget? How does it look like? The information will be useful to the summary of Sami's KCFI LLVM patch: https://reviews.llvm.org/D119296 > > je 1f # 2 bytes > > ud2 # 2 bytes > > 1: call __x86_indirect_thunk_rax # 5 bytes > > > > > > .align 16 > > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > > func: > > endbr # 4 bytes > > ... > > ret > > I think we can avoid this problem with a slight tweak to your > instruction sequence, at the cost of 2 bytes per function prologue. > First, change the call sequence like so: > > cmpl $0xdeadbeef, -0x6(%rax) # 6 bytes > je 1f # 2 bytes > ud2 # 2 bytes > 1: call __x86_indirect_thunk_rax # 5 bytes > > The key difference is that we've changed 0x4 to 0x6. > > Then change the function prologue to this: > > .align 16 > .byte 0xef, 0xbe, 0xad, 0xde # 4 bytes > .zero 2 # 2 bytes > func: > > The end result of the above is that the constant embedded in the cmpl > instruction may only be used to reach the following ud2 instruction, > which will "harmlessly" terminate execution in the same way as if > the prologue signature did not match. > > Peter > -- 宋方睿 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups 2022-06-08 17:53 ` Fāng-ruì Sòng @ 2022-06-09 0:05 ` Sami Tolvanen 0 siblings, 0 replies; 18+ messages in thread From: Sami Tolvanen @ 2022-06-09 0:05 UTC (permalink / raw) To: Fāng-ruì Sòng Cc: Peter Collingbourne, Peter Zijlstra, Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML, Nick Desaulniers, llvm On Wed, Jun 8, 2022 at 10:53 AM Fāng-ruì Sòng <maskray@google.com> wrote: > > Hi Peter, > > On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <pcc@google.com> wrote: > > > > Hi Peter, > > One issue with this call sequence is that: > > > > On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote: > > > caller: > > > cmpl $0xdeadbeef, -0x4(%rax) # 7 bytes > > > > Because this instruction ends in the constant 0xdeadbeef, it may > > be used as a "gadget" that would effectively allow branching to an > > arbitrary address in %rax if the attacker can arrange to set ZF=1. > > Do you mind elaborating how this instruction can be used as a gadget? > How does it look like? With the offset of -4, the je instruction here can be an indirect call target because it's preceded by a valid type hash at the end of the cmpl instruction. If we change the offset to -6, only the ud2 instruction is a potential call target in this sequence, which will be less useful to an attacker. > The information will be useful to the summary of Sami's KCFI LLVM > patch: https://reviews.llvm.org/D119296 I'll add more information about the X86 preamble to the description. Sami ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-06-09 0:06 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20211122170301.764232470@infradead.org>
[not found] ` <20211122170805.338489412@infradead.org>
[not found] ` <6ebb0ab131c522f20c094294d49091fc@overdrivepizza.com>
[not found] ` <202202081541.900F9E1B@keescook>
[not found] ` <ad6c2633f39e39583bc5c5eaf7ccbe52@overdrivepizza.com>
2022-02-09 4:05 ` [RFC][PATCH 6/6] objtool: Add IBT validation / fixups Kees Cook
2022-02-09 5:18 ` Joao Moreira
2022-02-11 13:38 ` Peter Zijlstra
2022-02-14 21:38 ` Sami Tolvanen
2022-02-14 22:25 ` Peter Zijlstra
2022-02-15 16:56 ` Sami Tolvanen
2022-02-15 20:03 ` Kees Cook
2022-02-15 21:05 ` Peter Zijlstra
2022-02-15 23:05 ` Kees Cook
2022-02-15 23:38 ` Joao Moreira
2022-02-16 12:24 ` Peter Zijlstra
2022-02-15 20:53 ` Peter Zijlstra
2022-02-15 22:45 ` Joao Moreira
2022-02-16 0:57 ` Andrew Cooper
2022-03-02 3:06 ` Peter Collingbourne
2022-03-02 3:32 ` Joao Moreira
2022-06-08 17:53 ` Fāng-ruì Sòng
2022-06-09 0:05 ` Sami Tolvanen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox