public inbox for llvm@lists.linux.dev
 help / color / mirror / Atom feed
* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
       [not found]       ` <ad6c2633f39e39583bc5c5eaf7ccbe52@overdrivepizza.com>
@ 2022-02-09  4:05         ` Kees Cook
  2022-02-09  5:18           ` Joao Moreira
  0 siblings, 1 reply; 18+ messages in thread
From: Kees Cook @ 2022-02-09  4:05 UTC (permalink / raw)
  To: Joao Moreira
  Cc: Peter Zijlstra, x86, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, samitolvanen, llvm

On Tue, Feb 08, 2022 at 06:21:16PM -0800, Joao Moreira wrote:
> > > Note: This feature was already submitted for upstreaming with the
> > > llvm-project: https://reviews.llvm.org/D116070
> > 
> > Ah nice; I see this has been committed now.
> 
> Yes, but then some front-end changes also required this fix
> https://reviews.llvm.org/D118052, which is currently under review (posting
> this here in case someone is trying this out).
> 
> > 
> > Given that IBT will need to work with both Clang and gcc, I suspect the
> > objtool approach will still end up needing to do all the verification.
> > 
> > (And as you say, it has limited visibility into assembly.)
> 
> Agreed that at this point objtool provides more coverage. Yet, besides being
> an attempt to relief objtool and improve a bit the compiler support as
> mentioned in the series cover letter, it is still nice to reduce the
> left-over nops and fixups which end-up scattered all around.
> 
> FWIIW, https://reviews.llvm.org/D118438 and https://reviews.llvm.org/D118355
> are also being cooked. Comments and ideas for new approaches or improvements
> in the compiler support for this are very welcome :)

Ah, excellent, thanks for the pointers. There's also this in the works:
https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
to objtool, IBT, etc.)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-09  4:05         ` [RFC][PATCH 6/6] objtool: Add IBT validation / fixups Kees Cook
@ 2022-02-09  5:18           ` Joao Moreira
  2022-02-11 13:38             ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Joao Moreira @ 2022-02-09  5:18 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, x86, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, samitolvanen, llvm

> Ah, excellent, thanks for the pointers. There's also this in the works:
> https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
> to objtool, IBT, etc.)

Oh, great! Thanks for pointing it out. I guess I saw something with a 
similar name before ;) 
https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf

Jokes aside (and perhaps questions more targeted to Sami), from a 
diagonal look it seems that this follows the good old tag approach 
proposed by PaX/grsecurity, right? If this is the case, should I assume 
it could also benefit from features like -mibt-seal? Also are you 
considering that perhaps we can use alternatives to flip different CFI 
instrumentation as suggested by PeterZ in another thread?

Tks,
Joao

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-09  5:18           ` Joao Moreira
@ 2022-02-11 13:38             ` Peter Zijlstra
  2022-02-14 21:38               ` Sami Tolvanen
                                 ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-11 13:38 UTC (permalink / raw)
  To: Joao Moreira
  Cc: Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, llvm

On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote:
> > Ah, excellent, thanks for the pointers. There's also this in the works:
> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
> > to objtool, IBT, etc.)
> 
> Oh, great! Thanks for pointing it out. I guess I saw something with a
> similar name before ;) https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
> 
> Jokes aside (and perhaps questions more targeted to Sami), from a diagonal
> look it seems that this follows the good old tag approach proposed by
> PaX/grsecurity, right? If this is the case, should I assume it could also
> benefit from features like -mibt-seal? Also are you considering that perhaps
> we can use alternatives to flip different CFI instrumentation as suggested
> by PeterZ in another thread?

So, lets try and recap things from IRC yesterday. There's a whole bunch
of things intertwining making indirect branches 'interesting'. Most of
which I've not seen mentioned in Sami's KCFI proposal which makes it
rather pointless.

I think we'll end up with something related to KCFI, but with distinct
differences:

 - 32bit immediates for smaller code
 - __kcfi_check_fail() is out for smaller code
 - it must interact with IBT/BTI and retpolines
 - we must be very careful with speculation.

Right, so because !IBT-CFI needs the check at the call site, like:

caller:
	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes
	je	1f				# 2 bytes
	ud2					# 2 bytes
1:	call	__x86_indirect_thunk_rax	# 5 bytes


	.align 16
	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
func:
	...
	ret


While FineIBT has them at the landing site:

caller:
	movl	$0xdeadbeef, %r11d		# 6 bytes
	call	__x86_indirect_thunk_rax	# 5 bytes


	.align 16
func:
	endbr					# 4 bytes
	cmpl	$0xdeadbeef, %r11d		# 7 bytes
	je	1f				# 2 bytes
	ud2					# 2 bytes
1:	...
	ret


It seems to me that always doing the check at the call-site is simpler,
since it avoids code-bloat and patching work. That is, if we allow both
we'll effectivly blow up the code by 11 + 13 bytes (11 at the call site,
13 at function entry) as opposed to 11+4 or 6+13.

Which then yields:

caller:
	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes
	je	1f				# 2 bytes
	ud2					# 2 bytes
1:	call	__x86_indirect_thunk_rax	# 5 bytes


	.align 16
	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
func:
	endbr					# 4 bytes
	...
	ret

For a combined 11+8 bytes overhead :/

Now, this setup provides:

 - minimal size (please yell if there's a smaller option I missed;
   s/ud2/int3/ ?)
 - since the retpoline handles speculation from stuff before it, the
   load-miss induced speculation is covered.
 - the 'je' branch is binary, leading to either the retpoline or the
   ud2, both which are a speculation stop.
 - the ud2 is placed such that if the exception is non-fatal, code
   execution can recover
 - when IBT is present we can rewrite the thunk call to:

	lfence
	call *(%rax)

   and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes).
 - can disable CFI by replacing the cmpl with:

	jmp	1f

   (or an 11 byte nop, which is just about possible). And since we
   already have all retpoline thunk callsites in a section, we can
   trivially find all CFI bits that are always in front it them.
 - function pointer sanity


Additionally, if we ensure all direct call are +4 and only indirect
calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR. We
can replace those ENDBR instructions of functions that should never be
indirectly called with:

	ud1    0x0(%rax),%eax

which is a 4 byte #UD. This gives us the property that even on !IBT
hardware such a call will go *splat*.

Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
to allow distinguishing indirect functions with otherwise identical
signature; eg. cookie = hash32(blah##signature).


Did I miss anything? Got anything wrong?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-11 13:38             ` Peter Zijlstra
@ 2022-02-14 21:38               ` Sami Tolvanen
  2022-02-14 22:25                 ` Peter Zijlstra
  2022-02-15 22:45               ` Joao Moreira
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Sami Tolvanen @ 2022-02-14 21:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <peterz@infradead.org> wrote:
> I think we'll end up with something related to KCFI, but with distinct
> differences:
>
>  - 32bit immediates for smaller code

Sure, I don't see issues with that. Based on a quick test with
defconfig, this reduces vmlinux size by 0.30%.

>  - __kcfi_check_fail() is out for smaller code

I'm fine with adding a trap mode that's used by default, but having
more helpful diagnostics when something fails is useful even in
production systems in my experience. This change results in a vmlinux
that's another 0.92% smaller.

> Which then yields:
>
> caller:
>         cmpl    $0xdeadbeef, -0x4(%rax)         # 7 bytes
>         je      1f                              # 2 bytes
>         ud2                                     # 2 bytes
> 1:      call    __x86_indirect_thunk_rax        # 5 bytes

Note that the compiler might not emit this *exact* sequence of
instructions. For example, Clang generates this for events_sysfs_show
with the modified KCFI patch:

2274:       cmpl   $0x4d7bed9e,-0x4(%r11)
227c:       jne    22c0 <events_sysfs_show+0x6c>
227e:       call   2283 <events_sysfs_show+0x2f>
                    227f: R_X86_64_PLT32    __x86_indirect_thunk_r11-0x4
...
22c0:       ud2

In this case the function has two indirect calls and Clang seems to
prefer to emit just one ud2.

>         .align 16
>         .byte 0xef, 0xbe, 0xad, 0xde            # 4 bytes
> func:
>         endbr                                   # 4 bytes

Here func is no longer aligned to 16 bytes, in case that's important.

> Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
> to allow distinguishing indirect functions with otherwise identical
> signature; eg. cookie = hash32(blah##signature).

Sounds reasonable.

> Did I miss anything? Got anything wrong?

How would you like to deal with the 4-byte hashes in objtool? We
either need to annotate all function symbols in the kernel, or we need
a way to distinguish the hashes from random instructions, so we can
also have functions that don't have a type hash.

Sami

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-14 21:38               ` Sami Tolvanen
@ 2022-02-14 22:25                 ` Peter Zijlstra
  2022-02-15 16:56                   ` Sami Tolvanen
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-14 22:25 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> On Fri, Feb 11, 2022 at 5:38 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > I think we'll end up with something related to KCFI, but with distinct
> > differences:
> >
> >  - 32bit immediates for smaller code
> 
> Sure, I don't see issues with that. Based on a quick test with
> defconfig, this reduces vmlinux size by 0.30%.
> 
> >  - __kcfi_check_fail() is out for smaller code
> 
> I'm fine with adding a trap mode that's used by default, but having
> more helpful diagnostics when something fails is useful even in
> production systems in my experience. This change results in a vmlinux
> that's another 0.92% smaller.

You can easily have the exception generate a nice warning, you can even
have it continue. You really don't need a call for that.

> > Which then yields:
> >
> > caller:
> >         cmpl    $0xdeadbeef, -0x4(%rax)         # 7 bytes
> >         je      1f                              # 2 bytes
> >         ud2                                     # 2 bytes
> > 1:      call    __x86_indirect_thunk_rax        # 5 bytes
> 
> Note that the compiler might not emit this *exact* sequence of
> instructions. For example, Clang generates this for events_sysfs_show
> with the modified KCFI patch:
> 
> 2274:       cmpl   $0x4d7bed9e,-0x4(%r11)
> 227c:       jne    22c0 <events_sysfs_show+0x6c>
> 227e:       call   2283 <events_sysfs_show+0x2f>
>                     227f: R_X86_64_PLT32    __x86_indirect_thunk_r11-0x4
> ...
> 22c0:       ud2
> 
> In this case the function has two indirect calls and Clang seems to
> prefer to emit just one ud2.

That will not allow you to recover from the exception. UD2 is not an
unconditional fail. It should have an out-going edge in this case too.

Heck, most of the WARN_ON() things are UD2 instructions.

Also, you really should add a CS prefix to the retpoline thunk call if
you insist on using r11 (or any of the higher regs).

> >         .align 16
> >         .byte 0xef, 0xbe, 0xad, 0xde            # 4 bytes
> > func:
> >         endbr                                   # 4 bytes
> 
> Here func is no longer aligned to 16 bytes, in case that's important.

The idea was to have the hash and the endbr in the same cacheline.

> > Did I miss anything? Got anything wrong?
> 
> How would you like to deal with the 4-byte hashes in objtool? We
> either need to annotate all function symbols in the kernel, or we need
> a way to distinguish the hashes from random instructions, so we can
> also have functions that don't have a type hash.

Easiest would be to create a special section with all the hash offsets
in I suppose. A bit like -mfentry-section=name.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-14 22:25                 ` Peter Zijlstra
@ 2022-02-15 16:56                   ` Sami Tolvanen
  2022-02-15 20:03                     ` Kees Cook
  2022-02-15 20:53                     ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Sami Tolvanen @ 2022-02-15 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > I'm fine with adding a trap mode that's used by default, but having
> > more helpful diagnostics when something fails is useful even in
> > production systems in my experience. This change results in a vmlinux
> > that's another 0.92% smaller.
>
> You can easily have the exception generate a nice warning, you can even
> have it continue. You really don't need a call for that.

Sure, but wouldn't that require us to generate something like
__bug_table, so we know where the CFI specific traps are?

> > In this case the function has two indirect calls and Clang seems to
> > prefer to emit just one ud2.
>
> That will not allow you to recover from the exception. UD2 is not an
> unconditional fail. It should have an out-going edge in this case too.

Yes, CFI failures are not recoverable in that code. In fact, LLVM
assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
suppose we could just use an int3 instead. I assume that's sufficient
to stop speculation?

> Also, you really should add a CS prefix to the retpoline thunk call if
> you insist on using r11 (or any of the higher regs).

I actually didn't touch the retpoline thunk call, that's exactly the
code Clang normally generates.

> > How would you like to deal with the 4-byte hashes in objtool? We
> > either need to annotate all function symbols in the kernel, or we need
> > a way to distinguish the hashes from random instructions, so we can
> > also have functions that don't have a type hash.
>
> Easiest would be to create a special section with all the hash offsets
> in I suppose. A bit like -mfentry-section=name.

OK, I'll take a look. With 64-bit hashes I was planning to use a known
prefix, but that's not really an option with a 32-bit hash.

Sami

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-15 16:56                   ` Sami Tolvanen
@ 2022-02-15 20:03                     ` Kees Cook
  2022-02-15 21:05                       ` Peter Zijlstra
  2022-02-15 20:53                     ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Kees Cook @ 2022-02-15 20:03 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Peter Zijlstra, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > I'm fine with adding a trap mode that's used by default, but having
> > > more helpful diagnostics when something fails is useful even in
> > > production systems in my experience. This change results in a vmlinux
> > > that's another 0.92% smaller.
> >
> > You can easily have the exception generate a nice warning, you can even
> > have it continue. You really don't need a call for that.
> 
> Sure, but wouldn't that require us to generate something like
> __bug_table, so we know where the CFI specific traps are?

It also means the trap handler needs to do a bunch of instruction
decoding to find the address that was going to be jumped to, etc.

> > > In this case the function has two indirect calls and Clang seems to
> > > prefer to emit just one ud2.
> >
> > That will not allow you to recover from the exception. UD2 is not an
> > unconditional fail. It should have an out-going edge in this case too.
> 
> Yes, CFI failures are not recoverable in that code. In fact, LLVM
> assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> suppose we could just use an int3 instead. I assume that's sufficient
> to stop speculation?

Peter, is there a reason you want things in the specific order of:

cmp, je-to-call, trap, call

Isn't it more run-time efficient to have an out-of-line failure of
the form:

cmp, jne-to-trap, call, ...code..., trap, jmp-to-call

I thought the static label stuff allowed the "default out of line"
option, as far as pessimizing certain states, etc? The former is certainly
code-size smaller, though, yes, but doesn't it waste space in the cache
line for the unlikely case, etc?

> > Also, you really should add a CS prefix to the retpoline thunk call if
> > you insist on using r11 (or any of the higher regs).
> 
> I actually didn't touch the retpoline thunk call, that's exactly the
> code Clang normally generates.
> 
> > > How would you like to deal with the 4-byte hashes in objtool? We
> > > either need to annotate all function symbols in the kernel, or we need
> > > a way to distinguish the hashes from random instructions, so we can
> > > also have functions that don't have a type hash.
> >
> > Easiest would be to create a special section with all the hash offsets
> > in I suppose. A bit like -mfentry-section=name.
> 
> OK, I'll take a look. With 64-bit hashes I was planning to use a known
> prefix, but that's not really an option with a 32-bit hash.

32-bit hashes would have both code size and runtime benefits: fewer
instructions for the compare therefore a smaller set of instructions
added.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-15 16:56                   ` Sami Tolvanen
  2022-02-15 20:03                     ` Kees Cook
@ 2022-02-15 20:53                     ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-15 20:53 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Joao Moreira, Kees Cook, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > I'm fine with adding a trap mode that's used by default, but having
> > > more helpful diagnostics when something fails is useful even in
> > > production systems in my experience. This change results in a vmlinux
> > > that's another 0.92% smaller.
> >
> > You can easily have the exception generate a nice warning, you can even
> > have it continue. You really don't need a call for that.
> 
> Sure, but wouldn't that require us to generate something like
> __bug_table, so we know where the CFI specific traps are?

Yes, but since you're going to emit a retpoline, and objtool already
generates a list of retpoline sites, we can abuse that. If the
instruction after the trap is a retpoline site, we a CFI fail.

> > > In this case the function has two indirect calls and Clang seems to
> > > prefer to emit just one ud2.
> >
> > That will not allow you to recover from the exception. UD2 is not an
> > unconditional fail. It should have an out-going edge in this case too.
> 
> Yes, CFI failures are not recoverable in that code. In fact, LLVM
> assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> suppose we could just use an int3 instead. I assume that's sufficient
> to stop speculation?

It is. int3 is also smaller by one byte, but the exception is already
fairly complicated, but I can see if we can make it work.

> > Also, you really should add a CS prefix to the retpoline thunk call if
> > you insist on using r11 (or any of the higher regs).
> 
> I actually didn't touch the retpoline thunk call, that's exactly the
> code Clang normally generates.

Yeah, and it needs help, also see:

  https://lore.kernel.org/lkml/20211119165630.276205624@infradead.org/

Specifically, clang needs to:

 - CS prefix stuff the retpoline thunk call/jmp;
 - stop doing conditional indirect tail-calls.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-15 20:03                     ` Kees Cook
@ 2022-02-15 21:05                       ` Peter Zijlstra
  2022-02-15 23:05                         ` Kees Cook
  2022-02-16 12:24                         ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-15 21:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Sami Tolvanen, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote:
> On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > > I'm fine with adding a trap mode that's used by default, but having
> > > > more helpful diagnostics when something fails is useful even in
> > > > production systems in my experience. This change results in a vmlinux
> > > > that's another 0.92% smaller.
> > >
> > > You can easily have the exception generate a nice warning, you can even
> > > have it continue. You really don't need a call for that.
> > 
> > Sure, but wouldn't that require us to generate something like
> > __bug_table, so we know where the CFI specific traps are?
> 
> It also means the trap handler needs to do a bunch of instruction
> decoding to find the address that was going to be jumped to, etc.

arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we
need to to know that to re-write the thunk-call.

> > > > In this case the function has two indirect calls and Clang seems to
> > > > prefer to emit just one ud2.
> > >
> > > That will not allow you to recover from the exception. UD2 is not an
> > > unconditional fail. It should have an out-going edge in this case too.
> > 
> > Yes, CFI failures are not recoverable in that code. In fact, LLVM
> > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> > suppose we could just use an int3 instead. I assume that's sufficient
> > to stop speculation?
> 
> Peter, is there a reason you want things in the specific order of:
> 
> cmp, je-to-call, trap, call
> 
> Isn't it more run-time efficient to have an out-of-line failure of
> the form:
> 
> cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
> 
> I thought the static label stuff allowed the "default out of line"
> option, as far as pessimizing certain states, etc? The former is certainly
> code-size smaller, though, yes, but doesn't it waste space in the cache
> line for the unlikely case, etc?

Mostly so that we can deduce the address of the trap from the retpoline
site, also the above has a fairly high chance of using jcc.d32 which is
actually larger than jcc.d8+ud2.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-11 13:38             ` Peter Zijlstra
  2022-02-14 21:38               ` Sami Tolvanen
@ 2022-02-15 22:45               ` Joao Moreira
  2022-02-16  0:57               ` Andrew Cooper
  2022-03-02  3:06               ` Peter Collingbourne
  3 siblings, 0 replies; 18+ messages in thread
From: Joao Moreira @ 2022-02-15 22:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, llvm

On 2022-02-11 05:38, Peter Zijlstra wrote:
> On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote:
>> > Ah, excellent, thanks for the pointers. There's also this in the works:
>> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
>> > to objtool, IBT, etc.)
>> 
>> Oh, great! Thanks for pointing it out. I guess I saw something with a
>> similar name before ;) 
>> https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
>> 
>> Jokes aside (and perhaps questions more targeted to Sami), from a 
>> diagonal
>> look it seems that this follows the good old tag approach proposed by
>> PaX/grsecurity, right? If this is the case, should I assume it could 
>> also
>> benefit from features like -mibt-seal? Also are you considering that 
>> perhaps
>> we can use alternatives to flip different CFI instrumentation as 
>> suggested
>> by PeterZ in another thread?
> 
> So, lets try and recap things from IRC yesterday. There's a whole bunch
> of things intertwining making indirect branches 'interesting'. Most of
> which I've not seen mentioned in Sami's KCFI proposal which makes it
> rather pointless.
> 
> I think we'll end up with something related to KCFI, but with distinct
> differences:
> 
>  - 32bit immediates for smaller code
>  - __kcfi_check_fail() is out for smaller code
>  - it must interact with IBT/BTI and retpolines
>  - we must be very careful with speculation.
> 
> Right, so because !IBT-CFI needs the check at the call site, like:
> 
> caller:
> 	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes
> 	je	1f				# 2 bytes
> 	ud2					# 2 bytes
> 1:	call	__x86_indirect_thunk_rax	# 5 bytes
> 
> 
> 	.align 16
> 	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
> func:
> 	...
> 	ret
> 
> 
> While FineIBT has them at the landing site:
> 
> caller:
> 	movl	$0xdeadbeef, %r11d		# 6 bytes
> 	call	__x86_indirect_thunk_rax	# 5 bytes
> 
> 
> 	.align 16
> func:
> 	endbr					# 4 bytes
> 	cmpl	$0xdeadbeef, %r11d		# 7 bytes
> 	je	1f				# 2 bytes
> 	ud2					# 2 bytes
> 1:	...
> 	ret
> 
> 
> It seems to me that always doing the check at the call-site is simpler,
> since it avoids code-bloat and patching work. That is, if we allow both
> we'll effectivly blow up the code by 11 + 13 bytes (11 at the call 
> site,
> 13 at function entry) as opposed to 11+4 or 6+13.
> 
> Which then yields:
> 
> caller:
> 	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes
> 	je	1f				# 2 bytes
> 	ud2					# 2 bytes
> 1:	call	__x86_indirect_thunk_rax	# 5 bytes
> 
> 
> 	.align 16
> 	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
> func:
> 	endbr					# 4 bytes
> 	...
> 	ret
> 
> For a combined 11+8 bytes overhead :/
> 
> Now, this setup provides:
> 
>  - minimal size (please yell if there's a smaller option I missed;
>    s/ud2/int3/ ?)
>  - since the retpoline handles speculation from stuff before it, the
>    load-miss induced speculation is covered.
>  - the 'je' branch is binary, leading to either the retpoline or the
>    ud2, both which are a speculation stop.
>  - the ud2 is placed such that if the exception is non-fatal, code
>    execution can recover
>  - when IBT is present we can rewrite the thunk call to:
> 
> 	lfence
> 	call *(%rax)
> 
>    and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes).
>  - can disable CFI by replacing the cmpl with:
> 
> 	jmp	1f
> 
>    (or an 11 byte nop, which is just about possible). And since we
>    already have all retpoline thunk callsites in a section, we can
>    trivially find all CFI bits that are always in front it them.
>  - function pointer sanity
> 

Agreed that it is sensible to support CFI for CPUs without CET. KCFI is 
a win.
Yet, I still think that we should support FineIBT and there are a few 
reasons
for that (other than hopeful performance benefits).

- KCFI is more prone to the presence of unintended allowed targets. 
Since the
IBT scheme always rely on the same watermark tag (the ENDBR 
instruction), it
is easier to prevent these from being emitted by JIT/compilers (fwiiw, 
see
https://reviews.llvm.org/rGf385823e04f300c92ec03dbd660d621cc618a271 and
https://dl.acm.org/doi/10.1145/3337167.3337175).

- Regarding the space overhead, I can try to verify if FineIBT can be 
feasibly
reshaped into:

caller:
     movl    $0xdeadbeef,%r10            # 6 bytes
     sub     $0x5,%r11                   # 4 bytes
     call    *(%r11)                     # 5 bytes

     .align 16
     endbr                               # 4 bytes
     call __x86_fineibt_thunk_deadbeef   # 5 bytes
func+4:
     ...
     ret

__x86_fineibt_thunk_deadbeef:
     xor     $0xdeadbeef, %r10
     je      1f
     ud2
1:
     retq

This scheme would require less space and less runtime patching. We would 
need
one additional byte on callees to patch both the ENDBR and the 5-byte 
call
instruction, plus space to hold the thunks somewhere else. The thunks 
overhead
is then dilluted across the functions belonging to the same equivalence 
set,
as these will use the same thunk. On a quick analysis over defconfig, it 
seems
that the number of equivalence sets are ~25% of the number of functions 
(thus,
space overhead is cut to a fourth). I guess this would only require the 
KCFI
compiler support to emit an additional "0xcc" before the tag.

Thunks can also be placed in a special section and dropped during boot 
to
reduce the runtime memory footprint.

What do you think, is this worth investigating?

Also, in my understanding, r11 does not need to be preserved as, 
per-ABI, it
is a scratch register. So there is no need to add $0x5,%r11 back after 
the
call. Let me know if I'm mistaken.

> 
> Additionally, if we ensure all direct call are +4 and only indirect
> calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR. 
> We
> can replace those ENDBR instructions of functions that should never be
> indirectly called with:
> 
> 	ud1    0x0(%rax),%eax
> 
> which is a 4 byte #UD. This gives us the property that even on !IBT
> hardware such a call will go *splat*.

Given that the space overhead is a big concern, isn't it better to use 
the
compiler support to prevent ENDBRs in the first place? Patching #UD is 
kinda
cool, yeah, but I don't see it providing meaningful security guarantees 
over
not having the ENDBRs emitted at all.

> 
> Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
> to allow distinguishing indirect functions with otherwise identical
> signature; eg. cookie = hash32(blah##signature).
> 
> 
> Did I miss anything? Got anything wrong?

I understand that Today there are not too many CET-enabled CPUs out 
there,
and that the benefits might be a bit blurred currently. Yet, given that 
this
is something that should continuously change with time, would you object 
to
FineIBT as a build option, i.e., something which might not be supported 
by
alternatives when using defconfig?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-15 21:05                       ` Peter Zijlstra
@ 2022-02-15 23:05                         ` Kees Cook
  2022-02-15 23:38                           ` Joao Moreira
  2022-02-16 12:24                         ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Kees Cook @ 2022-02-15 23:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sami Tolvanen, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 15, 2022 at 12:03:12PM -0800, Kees Cook wrote:
> > On Tue, Feb 15, 2022 at 08:56:03AM -0800, Sami Tolvanen wrote:
> > > On Mon, Feb 14, 2022 at 2:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Mon, Feb 14, 2022 at 01:38:18PM -0800, Sami Tolvanen wrote:
> > > > > I'm fine with adding a trap mode that's used by default, but having
> > > > > more helpful diagnostics when something fails is useful even in
> > > > > production systems in my experience. This change results in a vmlinux
> > > > > that's another 0.92% smaller.
> > > >
> > > > You can easily have the exception generate a nice warning, you can even
> > > > have it continue. You really don't need a call for that.
> > > 
> > > Sure, but wouldn't that require us to generate something like
> > > __bug_table, so we know where the CFI specific traps are?
> > 
> > It also means the trap handler needs to do a bunch of instruction
> > decoding to find the address that was going to be jumped to, etc.
> 
> arch/x86/kernel/alternative.c:apply_retpolines() has all that, since we
> need to to know that to re-write the thunk-call.

Ah, okay, well that makes things easier. :)

> > > > > In this case the function has two indirect calls and Clang seems to
> > > > > prefer to emit just one ud2.
> > > >
> > > > That will not allow you to recover from the exception. UD2 is not an
> > > > unconditional fail. It should have an out-going edge in this case too.
> > > 
> > > Yes, CFI failures are not recoverable in that code. In fact, LLVM
> > > assumes that the llvm.trap intrinsic (i.e. ud2) never returns, but I
> > > suppose we could just use an int3 instead. I assume that's sufficient
> > > to stop speculation?
> > 
> > Peter, is there a reason you want things in the specific order of:
> > 
> > cmp, je-to-call, trap, call
> > 
> > Isn't it more run-time efficient to have an out-of-line failure of
> > the form:
> > 
> > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
> > 
> > I thought the static label stuff allowed the "default out of line"
> > option, as far as pessimizing certain states, etc? The former is certainly
> > code-size smaller, though, yes, but doesn't it waste space in the cache
> > line for the unlikely case, etc?
> 
> Mostly so that we can deduce the address of the trap from the retpoline
> site, also the above has a fairly high chance of using jcc.d32 which is
> actually larger than jcc.d8+ud2.

Ah, yeah, that's an interesting point.

Still, I worry about finding ways to convinces Clang to emit precisely
cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't.
:P

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-15 23:05                         ` Kees Cook
@ 2022-02-15 23:38                           ` Joao Moreira
  0 siblings, 0 replies; 18+ messages in thread
From: Joao Moreira @ 2022-02-15 23:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Sami Tolvanen, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

>> 
>> Mostly so that we can deduce the address of the trap from the 
>> retpoline
>> site, also the above has a fairly high chance of using jcc.d32 which 
>> is
>> actually larger than jcc.d8+ud2.
> 
> Ah, yeah, that's an interesting point.
> 
> Still, I worry about finding ways to convinces Clang to emit precisely
> cmp/je/trap/call, but I guess we'll catch it immediately if it doesn't.
> :P

This can probably be done more easily/precisely if implemented directly
in the compiler's arch-specific backend. At least for x86 it wasn't a
hassle to emit a defined sequence of instructions in the past. The price
is that it will require a pass specific to each supported architecture,
but I guess this isn't that bad.

Perhaps this is discussion for a different mailing list, idk... but
just pointing that it is not a huge wall.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-11 13:38             ` Peter Zijlstra
  2022-02-14 21:38               ` Sami Tolvanen
  2022-02-15 22:45               ` Joao Moreira
@ 2022-02-16  0:57               ` Andrew Cooper
  2022-03-02  3:06               ` Peter Collingbourne
  3 siblings, 0 replies; 18+ messages in thread
From: Andrew Cooper @ 2022-02-16  0:57 UTC (permalink / raw)
  To: Peter Zijlstra, Joao Moreira
  Cc: Kees Cook, x86@kernel.org, hjl.tools@gmail.com,
	jpoimboe@redhat.com, linux-kernel@vger.kernel.org,
	ndesaulniers@google.com, samitolvanen@google.com,
	llvm@lists.linux.dev

On 11/02/2022 13:38, Peter Zijlstra wrote:
> Now, this setup provides:
>
>  - minimal size (please yell if there's a smaller option I missed;
>    s/ud2/int3/ ?)

It's probably best not to overload int3.  #BP is already has magic
properties, and what happens if someone wants to breakpoint one of these?

I'll make my usual into suggestion for 64bit builds.

ud2 is the sensible approach, with a caveat that whatever is hiding here
should have metadata.

~Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-15 21:05                       ` Peter Zijlstra
  2022-02-15 23:05                         ` Kees Cook
@ 2022-02-16 12:24                         ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-16 12:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Sami Tolvanen, Joao Moreira, X86 ML, hjl.tools, Josh Poimboeuf,
	andrew.cooper3, LKML, Nick Desaulniers, llvm

On Tue, Feb 15, 2022 at 10:05:50PM +0100, Peter Zijlstra wrote:

> > Peter, is there a reason you want things in the specific order of:
> > 
> > cmp, je-to-call, trap, call
> > 
> > Isn't it more run-time efficient to have an out-of-line failure of
> > the form:
> > 
> > cmp, jne-to-trap, call, ...code..., trap, jmp-to-call
> > 
> > I thought the static label stuff allowed the "default out of line"
> > option, as far as pessimizing certain states, etc? The former is certainly
> > code-size smaller, though, yes, but doesn't it waste space in the cache
> > line for the unlikely case, etc?
> 
> Mostly so that we can deduce the address of the trap from the retpoline
> site, also the above has a fairly high chance of using jcc.d32 which is
> actually larger than jcc.d8+ud2.

Also; and I think I mentioned this a few emails back, by having the
whole CFI thing as a single range of bytes it becomes easier to patch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-02-11 13:38             ` Peter Zijlstra
                                 ` (2 preceding siblings ...)
  2022-02-16  0:57               ` Andrew Cooper
@ 2022-03-02  3:06               ` Peter Collingbourne
  2022-03-02  3:32                 ` Joao Moreira
  2022-06-08 17:53                 ` Fāng-ruì Sòng
  3 siblings, 2 replies; 18+ messages in thread
From: Peter Collingbourne @ 2022-03-02  3:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joao Moreira, Kees Cook, x86, hjl.tools, jpoimboe, andrew.cooper3,
	linux-kernel, ndesaulniers, samitolvanen, llvm

Hi Peter,

One issue with this call sequence is that:

On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> caller:
> 	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes

Because this instruction ends in the constant 0xdeadbeef, it may
be used as a "gadget" that would effectively allow branching to an
arbitrary address in %rax if the attacker can arrange to set ZF=1.

> 	je	1f				# 2 bytes
> 	ud2					# 2 bytes
> 1:	call	__x86_indirect_thunk_rax	# 5 bytes
> 
> 
> 	.align 16
> 	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
> func:
> 	endbr					# 4 bytes
> 	...
> 	ret

I think we can avoid this problem with a slight tweak to your
instruction sequence, at the cost of 2 bytes per function prologue.
First, change the call sequence like so:

 	cmpl	$0xdeadbeef, -0x6(%rax)		# 6 bytes
	je	1f				# 2 bytes
	ud2					# 2 bytes
1:	call	__x86_indirect_thunk_rax	# 5 bytes

The key difference is that we've changed 0x4 to 0x6.

Then change the function prologue to this:

	.align 16
	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
	.zero 2					# 2 bytes
func:

The end result of the above is that the constant embedded in the cmpl
instruction may only be used to reach the following ud2 instruction,
which will "harmlessly" terminate execution in the same way as if
the prologue signature did not match.

Peter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-03-02  3:06               ` Peter Collingbourne
@ 2022-03-02  3:32                 ` Joao Moreira
  2022-06-08 17:53                 ` Fāng-ruì Sòng
  1 sibling, 0 replies; 18+ messages in thread
From: Joao Moreira @ 2022-03-02  3:32 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Peter Zijlstra, Kees Cook, x86, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm

On 2022-03-01 19:06, Peter Collingbourne wrote:
> Hi Peter,
> 
> One issue with this call sequence is that:
> 
> On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
>> caller:
>> 	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes
> 
> Because this instruction ends in the constant 0xdeadbeef, it may
> be used as a "gadget" that would effectively allow branching to an
> arbitrary address in %rax if the attacker can arrange to set ZF=1.
> 
>> 	je	1f				# 2 bytes
>> 	ud2					# 2 bytes
>> 1:	call	__x86_indirect_thunk_rax	# 5 bytes
>> 
>> 
>> 	.align 16
>> 	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
>> func:
>> 	endbr					# 4 bytes
>> 	...
>> 	ret
> 
> I think we can avoid this problem with a slight tweak to your
> instruction sequence, at the cost of 2 bytes per function prologue.
> First, change the call sequence like so:
> 
>  	cmpl	$0xdeadbeef, -0x6(%rax)		# 6 bytes
> 	je	1f				# 2 bytes
> 	ud2					# 2 bytes
> 1:	call	__x86_indirect_thunk_rax	# 5 bytes
> 
> The key difference is that we've changed 0x4 to 0x6.
> 
> Then change the function prologue to this:
> 
> 	.align 16
> 	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
> 	.zero 2					# 2 bytes
> func:
> 
> The end result of the above is that the constant embedded in the cmpl
> instruction may only be used to reach the following ud2 instruction,
> which will "harmlessly" terminate execution in the same way as if
> the prologue signature did not match.

FWIIW, this makes sense IMHO. These additional pre-prologue bytes are 
also what might be needed to enable the smaller version of FineIBT that 
I suggested in this thread some time ago.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-03-02  3:06               ` Peter Collingbourne
  2022-03-02  3:32                 ` Joao Moreira
@ 2022-06-08 17:53                 ` Fāng-ruì Sòng
  2022-06-09  0:05                   ` Sami Tolvanen
  1 sibling, 1 reply; 18+ messages in thread
From: Fāng-ruì Sòng @ 2022-06-08 17:53 UTC (permalink / raw)
  To: Peter Collingbourne
  Cc: Peter Zijlstra, Joao Moreira, Kees Cook, x86, hjl.tools, jpoimboe,
	andrew.cooper3, linux-kernel, ndesaulniers, samitolvanen, llvm

Hi Peter,

On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <pcc@google.com> wrote:
>
> Hi Peter,
> One issue with this call sequence is that:
>
> On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> > caller:
> >       cmpl    $0xdeadbeef, -0x4(%rax)         # 7 bytes
>
> Because this instruction ends in the constant 0xdeadbeef, it may
> be used as a "gadget" that would effectively allow branching to an
> arbitrary address in %rax if the attacker can arrange to set ZF=1.

Do you mind elaborating how this instruction can be used as a gadget?
How does it look like?

The information will be useful to the summary of Sami's KCFI LLVM
patch: https://reviews.llvm.org/D119296

> >       je      1f                              # 2 bytes
> >       ud2                                     # 2 bytes
> > 1:    call    __x86_indirect_thunk_rax        # 5 bytes
> >
> >
> >       .align 16
> >       .byte 0xef, 0xbe, 0xad, 0xde            # 4 bytes
> > func:
> >       endbr                                   # 4 bytes
> >       ...
> >       ret
>
> I think we can avoid this problem with a slight tweak to your
> instruction sequence, at the cost of 2 bytes per function prologue.
> First, change the call sequence like so:
>
>         cmpl    $0xdeadbeef, -0x6(%rax)         # 6 bytes
>         je      1f                              # 2 bytes
>         ud2                                     # 2 bytes
> 1:      call    __x86_indirect_thunk_rax        # 5 bytes
>
> The key difference is that we've changed 0x4 to 0x6.
>
> Then change the function prologue to this:
>
>         .align 16
>         .byte 0xef, 0xbe, 0xad, 0xde            # 4 bytes
>         .zero 2                                 # 2 bytes
> func:
>
> The end result of the above is that the constant embedded in the cmpl
> instruction may only be used to reach the following ud2 instruction,
> which will "harmlessly" terminate execution in the same way as if
> the prologue signature did not match.
>
> Peter
>


-- 
宋方睿

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups
  2022-06-08 17:53                 ` Fāng-ruì Sòng
@ 2022-06-09  0:05                   ` Sami Tolvanen
  0 siblings, 0 replies; 18+ messages in thread
From: Sami Tolvanen @ 2022-06-09  0:05 UTC (permalink / raw)
  To: Fāng-ruì Sòng
  Cc: Peter Collingbourne, Peter Zijlstra, Joao Moreira, Kees Cook,
	X86 ML, hjl.tools, Josh Poimboeuf, andrew.cooper3, LKML,
	Nick Desaulniers, llvm

On Wed, Jun 8, 2022 at 10:53 AM Fāng-ruì Sòng <maskray@google.com> wrote:
>
> Hi Peter,
>
> On Tue, Mar 1, 2022 at 7:06 PM Peter Collingbourne <pcc@google.com> wrote:
> >
> > Hi Peter,
> > One issue with this call sequence is that:
> >
> > On Fri, Feb 11, 2022 at 02:38:03PM +0100, Peter Zijlstra wrote:
> > > caller:
> > >       cmpl    $0xdeadbeef, -0x4(%rax)         # 7 bytes
> >
> > Because this instruction ends in the constant 0xdeadbeef, it may
> > be used as a "gadget" that would effectively allow branching to an
> > arbitrary address in %rax if the attacker can arrange to set ZF=1.
>
> Do you mind elaborating how this instruction can be used as a gadget?
> How does it look like?

With the offset of -4, the je instruction here can be an indirect call
target because it's preceded by a valid type hash at the end of the
cmpl instruction. If we change the offset to -6, only the ud2
instruction is a potential call target in this sequence, which will be
less useful to an attacker.

> The information will be useful to the summary of Sami's KCFI LLVM
> patch: https://reviews.llvm.org/D119296

I'll add more information about the X86 preamble to the description.

Sami

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-06-09  0:06 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20211122170301.764232470@infradead.org>
     [not found] ` <20211122170805.338489412@infradead.org>
     [not found]   ` <6ebb0ab131c522f20c094294d49091fc@overdrivepizza.com>
     [not found]     ` <202202081541.900F9E1B@keescook>
     [not found]       ` <ad6c2633f39e39583bc5c5eaf7ccbe52@overdrivepizza.com>
2022-02-09  4:05         ` [RFC][PATCH 6/6] objtool: Add IBT validation / fixups Kees Cook
2022-02-09  5:18           ` Joao Moreira
2022-02-11 13:38             ` Peter Zijlstra
2022-02-14 21:38               ` Sami Tolvanen
2022-02-14 22:25                 ` Peter Zijlstra
2022-02-15 16:56                   ` Sami Tolvanen
2022-02-15 20:03                     ` Kees Cook
2022-02-15 21:05                       ` Peter Zijlstra
2022-02-15 23:05                         ` Kees Cook
2022-02-15 23:38                           ` Joao Moreira
2022-02-16 12:24                         ` Peter Zijlstra
2022-02-15 20:53                     ` Peter Zijlstra
2022-02-15 22:45               ` Joao Moreira
2022-02-16  0:57               ` Andrew Cooper
2022-03-02  3:06               ` Peter Collingbourne
2022-03-02  3:32                 ` Joao Moreira
2022-06-08 17:53                 ` Fāng-ruì Sòng
2022-06-09  0:05                   ` Sami Tolvanen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox