* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Jann Horn @ 2025-02-12 22:29 UTC
To: Jennifer Miller, Andy Lutomirski
Cc: linux-hardening, kees, joao, samitolvanen, kernel list

+Andy Lutomirski (x86 entry code maintainer)

On Wed, Feb 12, 2025 at 10:08 PM Jennifer Miller <jmill@asu.edu> wrote:
> As part of a recently accepted paper we demonstrated that syscall entrypoints can be misused on x86-64 systems to generically bypass FineIBT/KERNEL_IBT from forward-edge control flow hijacking. We communicated this finding to s@k.o before submitting the paper and were encouraged to bring the issue to the hardening list after the paper was accepted, to have a discussion on how to address it.
>
> The bypass takes advantage of the architectural requirement that entrypoints begin with the endbr64 instruction, plus the ability to control GS_BASE from userspace via wrgsbase (courtesy of the FSGSBASE extension), in order to perform a stack pivot to a ROP chain.

Oh, fun, that's a gnarly quirk.

> Here is a snippet of the 64-bit entrypoint code:
> ```
> entry_SYSCALL_64:
>    <+0>:  endbr64
>    <+4>:  swapgs
>    <+7>:  mov    QWORD PTR gs:0x6014,rsp
>    <+16>: jmp    <entry_SYSCALL_64+36>
>    <+18>: mov    rsp,cr3
>    <+21>: nop
>    <+26>: and    rsp,0xffffffffffffe7ff
>    <+33>: mov    cr3,rsp
>    <+36>: mov    rsp,QWORD PTR gs:0x32c98
> ```
>
> This is a valid target from any indirect callsite under FineIBT due to the endbr64 instruction and the lack of a software CFI check. After hijacking control flow to the entrypoint, executing swapgs will swap to the user-controlled GS_BASE, which is then used to set the stack pointer, resulting in a stack pivot. The rest of the entrypoint executes with a hijacked GS_BASE on a user-controlled stack. The stack page we use is one mapped in the user address space, and from another thread we race to overwrite return addresses on that stack to pivot a second time to a ROP chain. For this to succeed we required a large area of user-controlled kernel memory that can serve as the forged GS_BASE address; we achieved this by spraying 2MB Transparent Huge Pages to fill the kernel physical memory map with controlled 2MB allocations and guessing relative to the base address of that area to hit a page we control.
>
> We evaluated an approach to patching the issue in the paper, but it touched the userspace API a bit: it added an error code returned by syscalls if they are invoked with a kernel address in GS_BASE, which is not a great solution.
>
> Linus provided some thoughts on how to potentially address this issue in our communication with s@k.o, suggesting the kernel could make KERNEL_GS_BASE match the GS_BASE value, so both registers always contain a valid kernel address and a confusion induced by executing swapgs an extra time cannot occur, and restore the value of KERNEL_GS_BASE ahead of executing swapgs in the exit path.
>
> I started working on a patch based on the approach suggested by Linus, but I haven't been able to get it passing the relevant x86 selftests yet. It turned out that more than just the entrypoint code needs to be modified for it to work: we need to correctly save and restore the user's GS_BASE across task switches and ensure it is updated correctly when set via arch_prctl and ptrace.
> Unfortunately, I lack familiarity with those parts of the kernel, and my understanding is that the paper will be made public in a couple of weeks, so I didn't want to delay too long in bringing the issue to this list.
>
> Assuming this is an issue you all feel is worth addressing, I will continue working on providing a patch. I'm concerned though that the overhead from adding a wrmsr on both syscall entry and exit to overwrite and restore the KERNEL_GS_BASE MSR may be quite high, so any feedback in regards to the approach or suggestions of alternate approaches to patching are welcome :)

Since the kernel, as far as I understand, uses FineIBT without backwards control flow protection (in other words, I think we assume that the kernel stack is trusted?), could we build a cheaper check on that basis somehow? For example, maybe we could do something like:

```
endbr64
test rsp, rsp
js slowpath
swapgs
```

So we'd have the fast normal case where RSP points to userspace (meaning we can't be coming from the kernel unless our stack has already been pivoted, in which case forward-edge protection alone can't help anymore), and the slow case where RSP points to kernel memory - in that case we'd then have to do some slower checks to figure out whether weird userspace is making a syscall with RSP pointing to the kernel, or whether we're coming from hijacked kernel control flow.
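For concreteness: once FSGSBASE is enabled, the user-controlled GS_BASE in the attack above can be staged with a couple of instructions, since wrgsbase at CPL3 accepts any canonical linear address, kernel half included. A minimal standalone sketch (assumptions: the target address is a hypothetical guess into the THP-sprayed physmap region, and the CPU/kernel pair enables CR4.FSGSBASE - otherwise wrgsbase raises #UD):

```
	.text
	.globl	_start
_start:
	# Hypothetical guess into the sprayed 2MB-THP physmap region.
	movabs	$0xffff888123400000, %rax
	wrgsbase %rax			# user GS_BASE now holds a kernel address
	# ... here the forward-edge hijack into entry_SYSCALL_64 would be
	# triggered; after its swapgs, this value becomes the "kernel" base.
	mov	$60, %rax		# plain exit(0) so the sketch runs standalone
	xor	%rdi, %rdi
	syscall
```

(Builds with `as --64` and `ld`; without the hijack step it simply exits.)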
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Andrew Cooper @ 2025-02-13  1:31 UTC
To: jannh
Cc: jmill, joao, kees, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

>> Assuming this is an issue you all feel is worth addressing, I will continue working on providing a patch. I'm concerned though that the overhead from adding a wrmsr on both syscall entry and exit to overwrite and restore the KERNEL_GS_BASE MSR may be quite high, so any feedback in regards to the approach or suggestions of alternate approaches to patching are welcome :)
>
> Since the kernel, as far as I understand, uses FineIBT without backwards control flow protection (in other words, I think we assume that the kernel stack is trusted?),

This is fun indeed. Linux cannot use supervisor shadow stacks because the mess around NMI re-entrancy (and IST more generally) requires ROP gadgets in order to function safely. Implementing this with shadow stacks active, while not impossible, is deemed to be prohibitively complicated.

Linux's supervisor shadow stack support is waiting for FRED support, which fixes both the NMI re-entrancy problem and other exceptions nesting within NMIs, as well as prohibiting the use of the SWAPGS instruction, as FRED tries to make sure that the correct GS is always in context.

But FRED support is slated for PantherLake/DiamondRapids, which haven't shipped yet, so it's of no use for the problem right now.

> could we build a cheaper check on that basis somehow? For example, maybe we could do something like:
>
> ```
> endbr64
> test rsp, rsp
> js slowpath
> swapgs
> ```

I presume it's been pointed out already, but there are three related entrypoints here: SYSCALL64, which is discussed, plus SYSCALL32 and SYSENTER, which are related.

But any other IDT entry is in a similar bucket. If we're corrupting a function pointer or return address to redirect here, then the check of CS(%rsp) that controls the conditional SWAPGS is an OoB read in the caller's stack frame.

For IDT entries, checking %rsp is reasonable, because userspace can't forge a kernel-like %rsp. However, SYSCALL64 specifically leaves %rsp entirely attacker-controlled (and even potentially non-canonical), so I'm wondering what you had in mind for the slowpath to truly distinguish kernel context from user context?

~Andrew
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Jann Horn @ 2025-02-13  2:09 UTC
To: Andrew Cooper
Cc: jmill, joao, kees, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

On Thu, Feb 13, 2025 at 2:31 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> [...]
> I presume it's been pointed out already, but there are three related entrypoints here: SYSCALL64, which is discussed, plus SYSCALL32 and SYSENTER, which are related.
>
> But any other IDT entry is in a similar bucket. If we're corrupting a function pointer or return address to redirect here, then the check of CS(%rsp) that controls the conditional SWAPGS is an OoB read in the caller's stack frame.
>
> For IDT entries, checking %rsp is reasonable, because userspace can't forge a kernel-like %rsp. However, SYSCALL64 specifically leaves %rsp entirely attacker-controlled (and even potentially non-canonical), so I'm wondering what you had in mind for the slowpath to truly distinguish kernel context from user context?

Hm, yeah, that seems hard - maybe the best we could do is to make sure that the inactive gsbase has the correct value for our CPU's kernel gsbase? Kinda like a paranoid_entry, except more painful because we'd first have to figure out a place to spill registers to before we can start using stuff like rdmsr... Then a function pointer overwrite might still turn into returning to userspace with a sysret with GPRs full of kernel pointers, but at least we wouldn't run off of a bogus gsbase anymore?
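A rough sketch of that direction, just to make its shape concrete. Everything here is assumption-laden: it presumes FSGSBASE for a cheap rdgsbase (otherwise it's an rdmsr of MSR_GS_BASE clobbering %rax/%rcx/%rdx), uses a hypothetical known-good slot, and deliberately leaves the register-spill problem visible, since without a trusted %gs or %rsp there is nowhere safe to spill:

```
	# After the entry swapgs, verify the now-active GS base really is
	# this CPU's kernel base before touching any gs:-relative memory.
	swapgs
	push	%rax			# only safe once %rsp is trusted - circular,
					# which is exactly the unsolved part
	rdgsbase %rax
	cmpq	this_cpu_kernel_gsbase(%rip), %rax	# hypothetical trusted slot;
	jne	.Lhijacked_entry			# a plain global isn't per-CPU
	pop	%rax
```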
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Andrew Cooper @ 2025-02-13  2:42 UTC
To: Jann Horn
Cc: jmill, joao, kees, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

On 13/02/2025 2:09 am, Jann Horn wrote:
> On Thu, Feb 13, 2025 at 2:31 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> [...]
>> For IDT entries, checking %rsp is reasonable, because userspace can't forge a kernel-like %rsp. However, SYSCALL64 specifically leaves %rsp entirely attacker-controlled (and even potentially non-canonical), so I'm wondering what you had in mind for the slowpath to truly distinguish kernel context from user context?
>
> Hm, yeah, that seems hard - maybe the best we could do is to make sure that the inactive gsbase has the correct value for our CPU's kernel gsbase? Kinda like a paranoid_entry, except more painful because we'd first have to figure out a place to spill registers to before we can start using stuff like rdmsr... Then a function pointer overwrite might still turn into returning to userspace with a sysret with GPRs full of kernel pointers, but at least we wouldn't run off of a bogus gsbase anymore?

Thinking about this some more, I think it's impossible to distinguish.

One of the many sharp edges of SYSCALL (and SYSENTER, for that matter) is that they're instructions expected to only be used by userspace, but that can be executed in supervisor mode too[1].
They're asymmetric with their SYSRET (and SYSEXIT) counterparts, which are CPL0 instructions that strictly transition into CPL3.

The SYSCALL behaviour TLDR is:

  %rcx  = %rip
  %r11  = %eflags
  %cs   = fixed attr
  %ss   = fixed attr
  %rip  = MSR_LSTAR

which means that %rcx (the old rip) is the only piece of state which userspace can't feasibly forge (and which therefore could distinguish a SYSCALL from user vs kernel mode), yet if we're talking about a JOP chain to get here, then %rcx is under attacker control too.

There are a variety of solutions to this problem that involve not using %gs for per-cpu data. I also expect that to be wholly unpopular and dismissed as an approach.

~Andrew

[1] No-one back then was brave enough to design CPL3-only instructions.
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Rudolf Marek @ 2025-02-22 20:43 UTC
To: Andrew Cooper, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

Hi,

On 13. 02. 25 at 3:42, Andrew Cooper wrote:
> The SYSCALL behaviour TLDR is:
>
>   %rcx  = %rip
>   %r11  = %eflags
>   %cs   = fixed attr
>   %ss   = fixed attr
>   %rip  = MSR_LSTAR
>
> which means that %rcx (the old rip) is the only piece of state which userspace can't feasibly forge (and which therefore could distinguish a SYSCALL from user vs kernel mode), yet if we're talking about a JOP chain to get here, then %rcx is under attacker control too.

The SYSCALL instruction also provides a means to create an "incoherent" state of the processor selectors, where the value of a selector does not match the pre-loaded values in the descriptor caches.

Would it work to have KERNEL_CS as the last entry in the GDT? Executing SYSCALL would then set up CS as usual, but the numeric value of the SS selector would be larger than the GDT limit. That would mean an "impossible" selector is loaded into SS if we came from usermode, but operations with the stack would still work because the descriptor caches remain sane. The "impossible" selector value can be fixed by loading SS with NULL, which is cheap.

The check in the hotpath could maybe use VERR on %ss, which would fail because the GDT limit is exceeded. VERR with a mem operand does not use any GPR!

Or simply checking for the "impossible" selector would work if we misuse the zeros in the high 32 bits of R11 (the usermode rflags), maybe like:

entry:
	endbr64
	rol	$32, %r11
	movw	%ss, %r11w
	cmpw	$IMPOSSIBLE_SEL, %r11w
	jnz	panic
	; load null to SS, fix R11 and pretend above never happened

If an attacker executed SYSCALL in the kernel, we could likely check whether %RCX is OK or not?

A variation on this theme would be to keep the SYSCALL SS GDT entry in the GDT but mark it "not present".

Another brainstorm idea would be to misuse RFLAGS.ID and clear it in MSR_FMASK, but run the kernel (or most of it) with RFLAGS.ID set.

I don't know exactly what threat model you are trying to fix. Let's fight x86 insanity with yet another x86 insanity - I think that's fair.

I hope the above helps, or at least I will learn why not if I've overlooked something obvious! I tried to CC all the lists; I'm not subscribed.

Thanks,
Rudolf

PS: I'm leaving NMI and #MC handling as an exercise for the reader!
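Spelling out the VERR variant as a sketch (under the proposal's assumption that the %ss value loaded by SYSCALL, IMPOSSIBLE_SEL, lies beyond the GDT limit; the scratch slot and label are hypothetical). One caveat: VERR also reports "not verifiable" for a NULL selector, so this additionally assumes the kernel never reaches this check while running with a NULL %ss:

```
	# VERR sets ZF only for a valid, readable selector. A genuine SYSCALL
	# left the out-of-limit IMPOSSIBLE_SEL in %ss, for which ZF stays 0.
	movw	%ss, scratch16(%rip)	# hypothetical 2-byte scratch slot
	verr	scratch16(%rip)		# mem operand: no GPR clobbered
	jz	.Lnot_a_real_syscall	# %ss verifies -> not a raw SYSCALL entry
	# fall through: genuine SYSCALL; now load a NULL/sane %ss as proposed
```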
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Andrew Cooper @ 2025-02-25 18:10 UTC
To: Rudolf Marek, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

On 22/02/2025 8:43 pm, Rudolf Marek wrote:
> The SYSCALL instruction also provides a means to create an "incoherent" state of the processor selectors, where the value of a selector does not match the pre-loaded values in the descriptor caches.

Very cunning. Yes it does, but the state needs to be safe to IRET back to, and ...

> Would it work to have KERNEL_CS as the last entry in the GDT? Executing SYSCALL would then set up CS as usual, but the numeric value of the SS selector would be larger than the GDT limit.

... this isn't safe. Any exception/interrupt will yield #SS when trying to load an out-of-limit %ss, i.e. a wrongly-timed NMI will take out the system with a very bizarre-looking oops.

You can do this in a less fatal way by e.g. having the in-GDT form carry a segment limit, but any exception/interrupt will resync the out-of-sync state and break detection. It would also make the segment unusable for compatibility userspace, where the limit would take effect.

Finally, while this potentially gives us an option for SYSCALL and maybe SYSENTER, it doesn't help with any of the main IDT entrypoints, which can also be attacked.

~Andrew
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Rudolf Marek @ 2025-02-25 20:06 UTC
To: Andrew Cooper, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

Hi Andrew,

On 25. 02. 25 at 19:10, Andrew Cooper wrote:
> Very cunning. Yes it does, but the state needs to be safe to IRET back to, and ...

... and intellectually very pleasing!

>> Would it work to have KERNEL_CS as the last entry in the GDT? Executing SYSCALL would then set up CS as usual, but the numeric value of the SS selector would be larger than the GDT limit.
>
> ... this isn't safe. Any exception/interrupt will yield #SS when trying to load an out-of-limit %ss, i.e. a wrongly-timed NMI will take out the system with a very bizarre-looking oops.

Hmm, I was hoping that "the reader" would perform this NMI/#MC exercise :)

The SYSCALL/SYSENTER startup has interrupts disabled, so it is the problem of the NMI/#MC handler, which would need to deal with both the normal case and the attack case. It would need to check whether it interrupted the critical part of the syscall64 entry, from the endbr64 to the selector-check section, and if so, the saved %ss needs to be the "impossible" one. If it isn't -> panic.

For the non-attack case it just needs to advance RIP past the check...

> You can do this in a less fatal way by e.g. having the in-GDT form carry a segment limit, but any exception/interrupt will resync the out-of-sync state and break detection. It would also make the segment unusable for compatibility userspace, where the limit would take effect.

Yeah, I couldn't figure out anything else that would work "vice-versa" :(

> Finally, while this potentially gives us an option for SYSCALL and maybe SYSENTER, it doesn't help with any of the main IDT entrypoints, which can also be attacked.

I see, sorry, I wasn't aware of this. But if I recall correctly, only the "paranoid" IDT entries do something with swapgs. But is there also some stack pivot that would depend on GS? Or is it a somewhat unrelated issue, in that you might just redirect to "any endbr64", which the IDT entrypoints are? Maybe you can share some details of how the attack would work in this case, or point me somewhere where I can read about it.

If it is the "any endbr64" case, would it work to just sanity-check the exception stack frame, i.e. whether it is real or some random kernel stack state (see the sketch below)?

1) check %rsp alignment
2) check %ss and %cs against all possible valid (16-bit) values - unfortunately I think Intel does not clear the high 48 bits of a saved selector; AMD does
3) check if %rip is in the kernel range
4) check if %rflags is sane (bit 1 is 1)

Because if the attacker has no or only limited control over the stack content, it would be difficult to fake all of this.

Thanks,
Rudolf
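A sketch of those checks, purely illustrative: it assumes the standard no-error-code 64-bit IRET frame at %rsp (RIP, CS, RFLAGS, RSP, SS), a hypothetical panic label, hand-waves the scratch register, and shows the CPL-change (from-user) case, where point 3 inverts into "the saved %rip must not be kernel-half". Per point 2, only the low 16 selector bits are compared, since Intel leaves the rest undefined:

```
	mov	%esp, %eax		# (1) hardware starts a CPL-change frame
	and	$15, %eax		#     16-byte aligned; five 8-byte pushes
	cmp	$8, %eax		#     leave %rsp = 8 (mod 16)
	jne	.Lbogus_frame
	cmpw	$__USER_CS, 8(%rsp)	# (2) low 16 bits of the saved %cs
	jne	.Lbogus_frame
	btq	$63, (%rsp)		# (3) a from-user %rip must not be
	jc	.Lbogus_frame		#     in the kernel half
	btq	$1, 16(%rsp)		# (4) RFLAGS bit 1 is architecturally 1
	jnc	.Lbogus_frame
```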
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Andrew Cooper @ 2025-02-25 21:14 UTC
To: Rudolf Marek, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

On 25/02/2025 8:06 pm, Rudolf Marek wrote:
> Hmm, I was hoping that "the reader" would perform this NMI/#MC exercise :)

As stand-in for "the reader", I'll point out that you need to add #DB to that list, or you're in for a rude surprise when running the x86 selftests.

> The SYSCALL/SYSENTER startup has interrupts disabled, so it is the problem of the NMI/#MC handler, which would need to deal with both the normal case and the attack case.

Right, but in the case of the attack, regular interrupts are most likely enabled too. And writing this has just caused me to realise a yet-more-fun case.

An interrupt hitting the syscall entry path (prior to SWAPGS) will cause the interrupt handler's CPL check and conditional SWAPGS to do the wrong thing and switch onto the user GS base too. (Prior research, e.g. GhostRace, has shown how to get an hrtimer to reliably hit an instruction boundary.)

i.e. you'd need paranoid_entry on every vector, not just the IST ones.

> [...]
> If it is the "any endbr64" case, would it work to just sanity-check the exception stack frame?

The problem is type confusion.
Because ENDBR marks both the regular function callees and the system entrypoints (256*IDT + 2*SYSCALL + SYSENTER), a function pointer corrupted to refer to a system entrypoint will pass the CET-IBT check and not yield #CP.

All entrypoints then conditionally (IDT) or unconditionally (SYSCALL/SYSENTER) SWAPGS. For the attack case, this switches back onto the user GS base.

Interrupts and exceptions look at %cs in the IRET frame to judge whether to SWAPGS or not (and this is one of the main things that paranoid_entry does differently). In the case of the attack, there's no IRET frame pushed on the stack and the read of %cs is out of bounds - most likely into the stack frame of the function which followed the corrupt function pointer.

The SYSCALL entrypoint is simply the easiest to pivot on, but all of them can be attacked in this manner. Fixing only the SYSCALL entrypoint doesn't improve things much.

Peter Zijlstra has added a FineIBT=paranoid mode which performs the hash check ahead of calling the function pointer, which ought to mitigate this, but at even higher overhead.

~Andrew
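For readers without the entry code paged in, the conditional-SWAPGS shape being described looks roughly like this (simplified from the kernel's IDT/error-entry paths; CS names the saved-%cs slot of the IRET frame):

```
	testb	$3, CS(%rsp)		# CPL bits of the saved %cs
	jz	.Lfrom_kernel		# CPL0: GS base is already the kernel's
	swapgs				# CPL3: switch to the kernel GS base
.Lfrom_kernel:
```

Under the attack there is no hardware-pushed frame, so CS(%rsp) reads whatever sits in the victim caller's stack frame - the out-of-bounds read described above - and if those bits happen to look like CPL3, the handler switches onto the attacker-controlled user GS base while actually running in the kernel.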
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Kees Cook @ 2025-02-26  2:55 UTC
To: Andrew Cooper, Rudolf Marek, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

On February 25, 2025 1:14:01 PM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Peter Zijlstra has added a FineIBT=paranoid mode which performs the hash check ahead of calling the function pointer, which ought to mitigate this, but at even higher overhead.

Was kCFI vs FineIBT perf ever measured? Is the assumption of higher overhead based on kCFI filling dcache in addition to icache, whereas FineIBT only fills icache?

-Kees

--
Kees Cook
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Rudolf Marek @ 2025-02-26 22:48 UTC
To: Andrew Cooper, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

Hi Andrew,

On 25. 02. 25 at 22:14, Andrew Cooper wrote:
> As stand-in for "the reader", I'll point out that you need to add #DB to that list, or you're in for a rude surprise when running the x86 selftests.

Thanks for pointing this out. I forgot about the interrupt shadow on SYSCALL and the possibilities for breakpoints in the kernel.

>> The SYSCALL/SYSENTER startup has interrupts disabled, so it is the problem of the NMI/#MC handler, which would need to deal with both the normal case and the attack case.
>
> Right, but in the case of the attack, regular interrupts are most likely enabled too. And writing this has just caused me to realise a yet-more-fun case.
> An interrupt hitting the syscall entry path (prior to SWAPGS) will cause the interrupt handler's CPL check and conditional SWAPGS to do the wrong thing and switch onto the user GS base too. (Prior research, e.g. GhostRace, has shown how to get an hrtimer to reliably hit an instruction boundary.)

I don't see it: if the attacker starts at the syscall entry, interrupts are enabled, and the interrupt happens right there, the handler will just see a proper IRET frame with a kernel %cs and will not perform swapgs. I will try to think about it again tomorrow; I have likely missed something.

> Interrupts and exceptions look at %cs in the IRET frame to judge whether to SWAPGS or not (and this is one of the main things that paranoid_entry does differently). In the case of the attack, there's no IRET frame pushed on the stack and the read of %cs is out of bounds - most likely into the stack frame of the function which followed the corrupt function pointer.

Thank you for your detailed explanation.

> The SYSCALL entrypoint is simply the easiest to pivot on, but all of them can be attacked in this manner. Fixing only the SYSCALL entrypoint doesn't improve things much.

Maybe a more elegant and cheap check of IDT-entry "authenticity" would be to check the current %ss, which needs to be NULL, and possibly check the %cs in the stack frame against the kernel %cs rather than just the two CPL bits, and/or perform more checks.

Some other ideas, if you think this topic is still worth discussing:

What about using a completely different %cs selector for all entry code? The early entry code would check the %cs selector and panic if it is the wrong one. After the swapgs dance, we would need to perform a far jump back to the normal kernel %cs, which might cost something.

To fix the interrupt-on-fake-entry problem, we could check in the relevant IDT handlers that we never arrive from the "completely different" %cs used for the early entry code.

And one last idea would be to somehow persuade Last Branch Recording to record exception entries only, and just check it from the MSR. But maybe that is too costly and/or not possible.

Thanks,
Rudolf
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Andrew Cooper @ 2025-02-27  0:41 UTC
To: Rudolf Marek, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

On 26/02/2025 10:48 pm, Rudolf Marek wrote:
> Thanks for pointing this out. I forgot about the interrupt shadow on SYSCALL and the possibilities for breakpoints in the kernel.

Isn't x86 lovely. This is yet another thing fixed in FRED; a CPL change cancels pending_dbg.

> I don't see it: if the attacker starts at the syscall entry, interrupts are enabled, and the interrupt happens right there, the handler will just see a proper IRET frame with a kernel %cs and will not perform swapgs. I will try to think about it again tomorrow; I have likely missed something.

Nope, you're correct. I meant *after* the SWAPGS. The linear sequence of actions is:

* Follow bad fnptr to the SYSCALL entry
* SWAPGS (now on user gs)
* Interrupt. Handler sees %cs == kernel, so doesn't SWAPGS again
* Interrupt handler runs fully on user gs

> What about using a completely different %cs selector for all entry code? The early entry code would check the %cs selector and panic if it is the wrong one. After the swapgs dance, we would need to perform a far jump back to the normal kernel %cs, which might cost something.
>
> To fix the interrupt-on-fake-entry problem, we could check in the relevant IDT handlers that we never arrive from the "completely different" %cs used for the early entry code.

Ooh, this looks promising.

For IDT it's quite easy.
Have a separate DPL0 %cs in the GDT, and write it into the IDT.

For SYSCALL/SYSENTER it's a little more complicated. I think you want to move the selectors so they don't alias __KERN_CS directly, so you can then move back to __KERN_CS in a similar way.

Give or take paranoid_entry for the IST vectors, any entrypoint that finds itself on __KERN_CS did not get there through the CPU loading a new context.

It would depend on an attacker not being able to include a FAR CALL in their exploit chain, or being able to write the IDT. I don't know how reasonable that would be if we're ruling out all architectural paths not beginning with an ENDBR, but FAR CALLs are rare owing to them being dog slow in general, and an attacker who can write the IDT doesn't need these kinds of games to pivot.

We do need at least one scratch register to check %cs. For IDT and SYSENTER entries, we can reasonably well spill to the stack (again, an attacker that can modify the stack has won without playing these games), and for SYSCALL, we can use the low part of %r11 as you already demonstrated.

Anyone fancy doing a prototype of this?

> And one last idea would be to somehow persuade Last Branch Recording to record exception entries only, and just check it from the MSR. But maybe that is too costly and/or not possible.

This doesn't cover all cases, I don't think. It also won't work under virt, where LBR isn't reliably available. Also, LBR is reasonably full of errata, and quite slow. Also, VMX clears it unilaterally on vmexit, and at least we don't have an ENDBR in that path to worry about.

~Andrew
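As a seed for such a prototype, the SYSCALL-side check might look like the sketch below. The selector name __ENTRY_CS is hypothetical (it is whatever MSR_STAR makes SYSCALL load), and the scratch trick is Rudolf's: the user RFLAGS saved in %r11 has its upper 32 bits architecturally zero, so the low half can be borrowed and restored:

```
entry_SYSCALL_64:
	endbr64
	rol	$32, %r11		# stash user RFLAGS[31:0] in the high half
	movw	%cs, %r11w
	cmpw	$__ENTRY_CS, %r11w	# only a genuine SYSCALL runs on this %cs
	jne	.Lnot_a_real_syscall
	movw	$0, %r11w		# scrub the borrowed low word...
	rol	$32, %r11		# ...and put the user RFLAGS back
	swapgs
	# ... stack switch, then a far transfer back to __KERN_CS ...
```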
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Rudolf Marek @ 2025-03-01 22:48 UTC
To: Andrew Cooper, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

Hi Andrew,

On 27. 02. 25 at 1:41, Andrew Cooper wrote:
> For SYSCALL/SYSENTER it's a little more complicated. I think you want to move the selectors so they don't alias __KERN_CS directly, so you can then move back to __KERN_CS in a similar way.

Yes, I thought the CHECK_CS could sit right before KERN_DS, so that at least the kernel SS is right.

> Give or take paranoid_entry for the IST vectors, any entrypoint that finds itself on __KERN_CS did not get there through the CPU loading a new context.

Yes.

> It would depend on an attacker not being able to include a FAR CALL in their exploit chain, or being able to write the IDT. I don't know how reasonable that would be if we're ruling out all architectural paths not beginning with an ENDBR, but FAR CALLs are rare owing to them being dog slow in general, and an attacker who can write the IDT doesn't need these kinds of games to pivot.

In fact I wanted to use a far jump, but is that OK? On the 64-bit architecture there is no absolute direct jump with a CS change, only an indirect one. Do all CPUs with FineIBT somehow reasonably handle all the Spectre v2 and various other indirect-branch speculation problems? To speed it up, we could use "fallthrough" speculation to our advantage and place the target right after the instruction.

> Anyone fancy doing a prototype of this?

Maybe we can discuss the following first, if you still find this conversation entertaining :)

1) Implement the different %cs for entry points.

It looks non-trivial for an attacker to obtain the right %cs before landing on the IDT/SYSCALL entrypoints. Each entrypoint would check whether the current %cs is __KERN_CHECK_CS, and panic if not. Then it would change %cs back to __KERN_CS via a far jump. I don't know how slow the jump back via a far jump is.

2) Implement a weaker version of what I was proposing, mostly checking %ss. The attacker would need to control/load %ss before jumping to the endbr64, or provide a reasonable exception stack.

SYSCALL:
- maybe do "cli" to avoid issues with interrupts/nesting
- use a valid %ss selector different from __KERN_DS
- check whether %ss == __KERN_CHECK_DS; if not, panic
- reload %ss with the __KERN_DS selector

IDT entrypoints (see the sketch below):
- maybe do "cli" to avoid issues with interrupts/nesting
- if %ss == 0, skip the other checks because the CPL changed (maybe too weak?)
- perform more sanity checks on the exception stack, maybe in the direction of what I proposed in the other email - it depends on whether it makes the attacker's life miserable or not
- reload %ss with the __KERN_DS selector if the CPL changed (maybe needed?)

>> And one last idea would be to somehow persuade Last Branch Recording to record exception entries only, and just check it from the MSR. But maybe that is too costly and/or not possible.
>
> This doesn't cover all cases, I don't think. It also won't work under virt, where LBR isn't reliably available. Also, LBR is reasonably full of errata, and quite slow.

OK, thanks - it was just an idea.

Thanks,
Rudolf
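The %ss-based IDT check from (2) could be sketched like this (labels hypothetical). The property being leaned on: a 64-bit inter-privilege interrupt loads NULL into %ss, and CPL3 code cannot itself run with a NULL %ss, so a non-NULL %ss at an IDT entrypoint means we did not arrive via a genuine CPL-change interrupt:

```
idt_entry_common:
	endbr64
	movw	%ss, %ax		# scratch handling hand-waved: an attacker
	testw	%ax, %ax		# who controls this stack has already won
	jnz	.Lsame_cpl_or_fake	# non-NULL %ss: same-CPL interrupt or a
					# fake entry - frame sanity checks needed
	swapgs				# genuine CPL3->CPL0 interrupt
	movw	$__KERN_DS, %ax		# reload a sane %ss, as proposed above
	movw	%ax, %ss
```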
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Rudolf Marek @ 2025-03-02 19:16 UTC
To: Andrew Cooper, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

On 01. 03. 25 at 23:48, Rudolf Marek wrote:
> I don't know how slow the jump back via a far jump is.

I did some micro-benchmarking on a Raptor Lake platform using another operating system I'm very familiar with. I added the following sequence to the SYSCALL64 entrypoint:

	.balign 16
syscallentry64:
	.byte	0x48
	ljmp	*jmpaddr(%rip)
continuehere:
	swapgs
	<...>

jmpaddr:
	.quad	continuehere
	.word	KERN_OTHER_CS << 3

And well, it is 1.5x slower. The unmodified syscall benchmark took on average 261 cycles / 104 ns, and the one with the indirect jump with the %cs change took 386 cycles / 154 ns.

This whole thing is quite literally a trap next to a trap, because GAS wasn't adding the REX.W prefix and somehow complained about ljmpq.

Thanks,
Rudolf
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Andrew Cooper @ 2025-03-02 22:31 UTC
To: Rudolf Marek, Jann Horn
Cc: jmill, joao, luto, samitolvanen, Peter Zijlstra (Intel), linux-hardening, lkml, x86 maintainers

On 02/03/2025 7:16 pm, Rudolf Marek wrote:
> I did some micro-benchmarking on a Raptor Lake platform using another operating system I'm very familiar with. I added the following sequence to the SYSCALL64 entrypoint:
>
> 	.balign 16
> syscallentry64:
> 	.byte	0x48
> 	ljmp	*jmpaddr(%rip)
> continuehere:
> 	swapgs
> 	<...>
>
> jmpaddr:
> 	.quad	continuehere
> 	.word	KERN_OTHER_CS << 3
>
> And well, it is 1.5x slower. The unmodified syscall benchmark took on average 261 cycles / 104 ns, and the one with the indirect jump with the %cs change took 386 cycles / 154 ns.
>
> This whole thing is quite literally a trap next to a trap, because GAS wasn't adding the REX.W prefix and somehow complained about ljmpq.

(I've not finished replying to your other email, but here's one bit brought forward.)

Sadly, far jumps and calls are where Intel and AMD CPUs disagree on how to decode the instruction stream. Intel CPUs obey the REX prefix for operand size, while AMD CPUs do not, i.e. AMD CPUs cannot far-transfer to kernel addresses at all.

This is why you generally only see far returns, which do behave the same between vendors, but require a stack.

~Andrew
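For reference, the vendor-neutral far-return idiom for switching back to the normal kernel %cs would be roughly (a sketch assuming a trusted stack is already established and one scratch GPR is free):

```
	push	$__KERN_CS		# selector to reload into %cs
	lea	1f(%rip), %rax
	push	%rax			# return %rip, popped first by lretq
	lretq				# pops %rip then %cs; decodes the same
1:					# on Intel and AMD, unlike ljmp
```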
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Florian Weimer @ 2025-02-28 12:13 UTC
To: Andrew Cooper
Cc: Jann Horn, jmill, joao, kees, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

* Andrew Cooper:

> The SYSCALL behaviour TLDR is:
>
>   %rcx  = %rip
>   %r11  = %eflags
>   %cs   = fixed attr
>   %ss   = fixed attr
>   %rip  = MSR_LSTAR
>
> which means that %rcx (the old rip) is the only piece of state which userspace can't feasibly forge (and which therefore could distinguish a SYSCALL from user vs kernel mode), yet if we're talking about a JOP chain to get here, then %rcx is under attacker control too.

Will the syscall handler do anything useful if called with an invalid system call number? If not, and if you could change the FineIBT cookie register to %rax, would that address this particular gap? As long as the cookies do not overlap with valid system call numbers?
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Kees Cook @ 2025-02-13 20:28 UTC
To: Andrew Cooper
Cc: jannh, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

On Thu, Feb 13, 2025 at 01:31:30AM +0000, Andrew Cooper wrote:
>> Since the kernel, as far as I understand, uses FineIBT without backwards control flow protection (in other words, I think we assume that the kernel stack is trusted?),
>
> This is fun indeed. Linux cannot use supervisor shadow stacks because the mess around NMI re-entrancy (and IST more generally) requires ROP gadgets in order to function safely. Implementing this with shadow stacks active, while not impossible, is deemed to be prohibitively complicated.

And just to validate my understanding here: this attack is fundamentally about FineIBT, not regular CFI (IBT or not), since under CFI the validation of target addresses is done at indirect call time, yes?

-Kees

--
Kees Cook
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Andrew Cooper @ 2025-02-13 20:41 UTC
To: Kees Cook
Cc: jannh, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

On 13/02/2025 8:28 pm, Kees Cook wrote:
> And just to validate my understanding here: this attack is fundamentally about FineIBT, not regular CFI (IBT or not), since under CFI the validation of target addresses is done at indirect call time, yes?

I'm not sure I'd classify it like that. As a pivot primitive, it works very widely.

FineIBT (more specifically, any hybrid CFI scheme which includes CET-IBT) relies on hardware to do the coarse-grained violation detection, and some software hash for fine-grained violation detection.

In this case, the requirement for the SYSCALL entrypoint to have an ENDBR64 instruction means it passes the CET-IBT check (does not yield #CP), and then the software hash check is absent as well.

i.e. this renders FineIBT (and other hybrid CFI schemes) rather moot, because one hole is all the attacker needs to win if they can control a function pointer / return address. At which point it's a large overhead for no security benefit over simple CET-IBT.

The problem is that SYSCALL entry/exit is a toxic operating mode, because you only have to think about sneezing and another user->kernel priv-esc appears.

~Andrew
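To illustrate the asymmetry (shapes simplified from the comments in arch/x86/kernel/alternative.c; the hash is illustrative): a FineIBT-instrumented callee pairs its ENDBR with a software hash check, while the syscall entrypoint is a bare ENDBR64:

```
__cfi_some_func:			# FineIBT callee preamble
	endbr64				# passes the hardware CET-IBT check
	subl	$0x12345678, %r10d	# fine-grained software hash check
	jz	1f
	ud2
1:	# some_func body ...

entry_SYSCALL_64:
	endbr64				# passes the hardware check too...
	swapgs				# ...but runs straight into the pivot
```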
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Kees Cook @ 2025-02-13 20:53 UTC
To: Andrew Cooper
Cc: jannh, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

On Thu, Feb 13, 2025 at 08:41:16PM +0000, Andrew Cooper wrote:
> I'm not sure I'd classify it like that. As a pivot primitive, it works very widely.
>
> FineIBT (more specifically, any hybrid CFI scheme which includes CET-IBT) relies on hardware to do the coarse-grained violation detection, and some software hash for fine-grained violation detection.
>
> In this case, the requirement for the SYSCALL entrypoint to have an ENDBR64 instruction means it passes the CET-IBT check (does not yield #CP), and then the software hash check is absent as well.
>
> i.e. this renders FineIBT (and other hybrid CFI schemes) rather moot, because one hole is all the attacker needs to win if they can control a function pointer / return address. At which point it's a large overhead for no security benefit over simple CET-IBT.

Right, the "if they can control a function pointer" part is what I'm focusing on. This attack depends on making an indirect call with a controlled pointer. Non-FineIBT CFI will protect against that step, so I think this is only an issue for IBT-only and FineIBT, but not for CFI nor CFI+IBT.

> The problem is that SYSCALL entry/exit is a toxic operating mode, because you only have to think about sneezing and another user->kernel priv-esc appears.

Yeah, once an attacker can make an indirect call to a controlled address, everything falls apart. Using the entrypoint just makes the pivot all that much easier to find/use.

-Kees

--
Kees Cook
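The caller-side step being referred to looks roughly like this under kCFI (shape per the kCFI/FineIBT comparison in the kernel's alternative.c comments, simplified; the hash and the -15 offset into the kCFI preamble are illustrative):

```
	movl	$(-0x12345678), %r10d	# negated expected type hash
	addl	-15(%r11), %r10d	# plus the hash embedded before the target
	je	1f			# sums to zero only on a matching target
	ud2				# mismatch traps before any branch occurs
1:	call	__x86_indirect_thunk_r11
```

entry_SYSCALL_64 carries no kCFI preamble hash, so it can never pass such a check - which is why caller-side validation closes this particular hole.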
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Jann Horn @ 2025-02-13 20:57 UTC
To: Kees Cook
Cc: Andrew Cooper, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

On Thu, Feb 13, 2025 at 9:53 PM Kees Cook <kees@kernel.org> wrote:
> Right, the "if they can control a function pointer" part is what I'm focusing on. This attack depends on making an indirect call with a controlled pointer. Non-FineIBT CFI will protect against that step, so I think this is only an issue for IBT-only and FineIBT, but not for CFI nor CFI+IBT.

To me, "CFI" is really just a fairly abstract concept; are you talking specifically about the Clang scheme from <https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html>, or something else?
* Re: [RFC] Circumventing FineIBT Via Entrypoints
From: Kees Cook @ 2025-02-16 23:42 UTC
To: Jann Horn
Cc: Andrew Cooper, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen, Peter Zijlstra (Intel)

On Thu, Feb 13, 2025 at 09:57:37PM +0100, Jann Horn wrote:
> To me, "CFI" is really just a fairly abstract concept; are you talking specifically about the Clang scheme from <https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html>, or something else?

Ah, sorry, I mean KCFI (and note that FineIBT is a run-time alternatives pass that transforms the "stock" KCFI):

https://clang.llvm.org/docs/ControlFlowIntegrity.html#fsanitize-kcfi
https://lpc.events/event/16/contributions/1315/
https://www.youtube.com/watch?v=bmv6blX_F_g

-Kees

--
Kees Cook
* Re: [RFC] Circumventing FineIBT Via Entrypoints 2025-02-13 20:53 ` Kees Cook 2025-02-13 20:57 ` Jann Horn @ 2025-02-14 9:57 ` Peter Zijlstra 2025-02-15 21:07 ` Peter Zijlstra 1 sibling, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2025-02-14 9:57 UTC (permalink / raw) To: Kees Cook Cc: Andrew Cooper, jannh, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen On Thu, Feb 13, 2025 at 12:53:28PM -0800, Kees Cook wrote: > Right, the "if they can control a function pointer" is the part I'm > focusing on. This attack depends on making an indirect call with a > controlled pointer. Non-FineIBT CFI will protect against that step, > so I think this is only an issue for IBT-only and FineIBT, but not CFI > nor CFI+IBT. Yes, the whole caller side validation should stop this. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-14  9:57 ` Peter Zijlstra
@ 2025-02-15 21:07 ` Peter Zijlstra
  2025-02-16 23:51 ` Kees Cook
  2025-02-17 13:06 ` David Laight
  0 siblings, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2025-02-15 21:07 UTC (permalink / raw)
To: Kees Cook
Cc: Andrew Cooper, jannh, jmill, joao, linux-hardening, linux-kernel,
  luto, samitolvanen, scott.d.constable, x86

On Fri, Feb 14, 2025 at 10:57:51AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 13, 2025 at 12:53:28PM -0800, Kees Cook wrote:
>
> > Right, the "if they can control a function pointer" is the part I'm
> > focusing on. This attack depends on making an indirect call with a
> > controlled pointer. Non-FineIBT CFI will protect against that step,
> > so I think this is only an issue for IBT-only and FineIBT, but not CFI
> > nor CFI+IBT.
>
> Yes, the whole caller side validation should stop this.

And I think we can retro-fit that in FineIBT. Notably the current call
sites look like:

0000000000000060 <fineibt_caller>:
  60:	41 ba 78 56 34 12	mov    $0x12345678,%r10d
  66:	49 83 eb 10		sub    $0x10,%r11
  6a:	0f 1f 40 00		nopl   0x0(%rax)
  6e:	41 ff d3		call   *%r11
  71:	0f 1f 00		nopl   (%rax)

Of which the last 6 bytes are the retpoline site (starting at 0x6e). It
is trivially possible to re-arrange things to have both nops next to one
another, giving us 7 bytes to muck about with.

And I think we can just about manage to do a caller side hash validation
in them bytes like:

0000000000000080 <fineibt_paranoid>:
  80:	41 ba 78 56 34 12	mov    $0x12345678,%r10d
  86:	49 83 eb 10		sub    $0x10,%r11
  8a:	45 3b 53 07		cmp    0x7(%r11),%r10d
  8e:	74 01			je     91 <fineibt_paranoid+0x11>
  90:	ea			(bad)
  91:	41 ff d3		call   *%r11

And while this is somewhat daft, it would close the hole vs this entry
point swizzle afaict, no?

Patch against tip/x86/core (which includes the latest ibt bits as per
this morning).

Boots and builds the next kernel on my ADL.
--- arch/x86/include/asm/bug.h | 1 + arch/x86/include/asm/cfi.h | 8 ++-- arch/x86/kernel/alternative.c | 107 +++++++++++++++++++++++++++++++++++++++--- arch/x86/kernel/cfi.c | 4 +- arch/x86/kernel/traps.c | 13 ++++- 5 files changed, 120 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h index 1a5e4b372694..bc8a2ca3c82e 100644 --- a/arch/x86/include/asm/bug.h +++ b/arch/x86/include/asm/bug.h @@ -25,6 +25,7 @@ #define BUG_UD2 0xfffe #define BUG_UD1 0xfffd #define BUG_UD1_UBSAN 0xfffc +#define BUG_EA 0xffea #ifdef CONFIG_GENERIC_BUG diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h index 7dd5ab239c87..550f75450e43 100644 --- a/arch/x86/include/asm/cfi.h +++ b/arch/x86/include/asm/cfi.h @@ -104,7 +104,7 @@ extern enum cfi_mode cfi_mode; struct pt_regs; #ifdef CONFIG_CFI_CLANG -enum bug_trap_type handle_cfi_failure(struct pt_regs *regs); +enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs); #define __bpfcall extern u32 cfi_bpf_hash; extern u32 cfi_bpf_subprog_hash; @@ -127,10 +127,10 @@ static inline int cfi_get_offset(void) extern u32 cfi_get_func_hash(void *func); #ifdef CONFIG_FINEIBT -extern bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type); +extern bool decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u32 *type); #else static inline bool -decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type) +decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u32 *type) { return false; } @@ -138,7 +138,7 @@ decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type) #endif #else -static inline enum bug_trap_type handle_cfi_failure(struct pt_regs *regs) +static inline enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs) { return BUG_TRAP_TYPE_NONE; } diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 247ee5ffbff4..9e327b5e9f75 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -741,6 +741,11 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) op2 = insn.opcode.bytes[1]; switch (op1) { + case 0x70 ... 0x7f: /* Jcc.d8 */ + /* See cfi_paranoid. */ + WARN_ON_ONCE(cfi_mode != CFI_FINEIBT); + continue; + case CALL_INSN_OPCODE: case JMP32_INSN_OPCODE: break; @@ -983,6 +988,8 @@ u32 cfi_get_func_hash(void *func) static bool cfi_rand __ro_after_init = true; static u32 cfi_seed __ro_after_init; +static bool cfi_paranoid __ro_after_init = false; + /* * Re-hash the CFI hash with a boot-time seed while making sure the result is * not a valid ENDBR instruction. 
@@ -1022,6 +1029,8 @@ static __init int cfi_parse_cmdline(char *str) cfi_mode = CFI_FINEIBT; } else if (!strcmp(str, "norand")) { cfi_rand = false; + } else if (!strcmp(str, "paranoid")) { + cfi_paranoid = true; } else { pr_err("Ignoring unknown cfi option (%s).", str); } @@ -1097,6 +1106,29 @@ extern u8 fineibt_caller_end[]; #define fineibt_caller_jmp (fineibt_caller_size - 2) +asm( ".pushsection .rodata \n" + "fineibt_paranoid_start: \n" + " movl $0x12345678, %r10d \n" + " sub $16, %r11 \n" + " cmpl 7(%r11), %r10d \n" + " je fineibt_paranoid_call \n" + "fineibt_paranoid_trap: \n" + " .byte 0xea \n" + "fineibt_paranoid_call: \n" + " call *%r11 \n" + "fineibt_paranoid_end: \n" + ".popsection \n" +); + +extern u8 fineibt_paranoid_start[]; +extern u8 fineibt_paranoid_trap[]; +extern u8 fineibt_paranoid_call[]; +extern u8 fineibt_paranoid_end[]; + +#define fineibt_paranoid_size (fineibt_paranoid_end - fineibt_paranoid_start) +#define fineibt_paranoid_ud (fineibt_paranoid_trap - fineibt_paranoid_start) +#define fineibt_paranoid_ind (fineibt_paranoid_call - fineibt_paranoid_start) + static u32 decode_preamble_hash(void *addr) { u8 *p = addr; @@ -1260,18 +1292,48 @@ static int cfi_rewrite_callers(s32 *start, s32 *end) { s32 *s; + BUG_ON(fineibt_paranoid_size != 20); + for (s = start; s < end; s++) { void *addr = (void *)s + *s; + struct insn insn; + u8 bytes[20]; u32 hash; + int ret; + u8 op; addr -= fineibt_caller_size; hash = decode_caller_hash(addr); - if (hash) { + if (!hash) + continue; + + if (!cfi_paranoid) { text_poke_early(addr, fineibt_caller_start, fineibt_caller_size); WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678); text_poke_early(addr + fineibt_caller_hash, &hash, 4); + /* rely on apply_retpolines() */ + continue; } - /* rely on apply_retpolines() */ + + /* cfi_paranoid */ + ret = insn_decode_kernel(&insn, addr + fineibt_caller_size); + if (WARN_ON_ONCE(ret < 0)) + continue; + + op = insn.opcode.bytes[0]; + if (op != CALL_INSN_OPCODE && op != JMP32_INSN_OPCODE) { + WARN_ON_ONCE(1); + continue; + } + + memcpy(bytes, fineibt_paranoid_start, fineibt_paranoid_size); + memcpy(bytes + fineibt_caller_hash, &hash, 4); + + ret = emit_indirect(op, 11, bytes + fineibt_paranoid_ind); + if (WARN_ON_ONCE(ret != 3)) + continue; + + text_poke_early(addr, bytes, fineibt_paranoid_size); } return 0; @@ -1288,8 +1350,11 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, if (cfi_mode == CFI_AUTO) { cfi_mode = CFI_KCFI; - if (HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT)) + if (HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT)) { + if (!cpu_feature_enabled(X86_FEATURE_FRED)) + cfi_paranoid = true; cfi_mode = CFI_FINEIBT; + } } /* @@ -1346,8 +1411,10 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, /* now that nobody targets func()+0, remove ENDBR there */ cfi_rewrite_endbr(start_cfi, end_cfi); - if (builtin) - pr_info("Using FineIBT CFI\n"); + if (builtin) { + pr_info("Using FineIBT %s CFI\n", + cfi_paranoid ? "paranoid" : ""); + } return; default: @@ -1420,7 +1487,8 @@ static void poison_cfi(void *addr) * We check the preamble by checking for the ENDBR instruction relative to the * UD2 instruction. 
*/ -bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type) +static bool decode_fineibt_preamble(int ud_type, struct pt_regs *regs, + unsigned long *target, u32 *type) { unsigned long addr = regs->ip - fineibt_preamble_ud2; u32 endbr, hash; @@ -1440,6 +1508,33 @@ bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type) return false; } +/* + * regs->ip points to a 0xea instruction from the fineibt_paranoid_start[] + * sequence. + */ +static bool decode_fineibt_paranoid(int ud_type, struct pt_regs *regs, + unsigned long *target, u32 *type) +{ + unsigned long addr = regs->ip - fineibt_paranoid_ud; + u32 hash; + + __get_kernel_nofault(&hash, addr + fineibt_caller_hash, u32, Efault); + *target = regs->r11 + 16; + *type = regs->r10; + return true; + +Efault: + return false; +} + +bool decode_fineibt_insn(int ud_type, struct pt_regs *regs, + unsigned long *target, u32 *type) +{ + if (ud_type == BUG_EA) + return decode_fineibt_paranoid(ud_type, regs, target, type); + return decode_fineibt_preamble(ud_type, regs, target, type); +} + #else static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, diff --git a/arch/x86/kernel/cfi.c b/arch/x86/kernel/cfi.c index f6905bef0af8..f9eb7465eec6 100644 --- a/arch/x86/kernel/cfi.c +++ b/arch/x86/kernel/cfi.c @@ -65,7 +65,7 @@ static bool decode_cfi_insn(struct pt_regs *regs, unsigned long *target, * Checks if a ud2 trap is because of a CFI failure, and handles the trap * if needed. Returns a bug_trap_type value similarly to report_bug. */ -enum bug_trap_type handle_cfi_failure(struct pt_regs *regs) +enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs) { unsigned long target; u32 type; @@ -81,7 +81,7 @@ enum bug_trap_type handle_cfi_failure(struct pt_regs *regs) break; case CFI_FINEIBT: - if (!decode_fineibt_insn(regs, &target, &type)) + if (!decode_fineibt_insn(ud_type, regs, &target, &type)) return BUG_TRAP_TYPE_NONE; break; diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 05b86c05e446..500030ab8036 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -113,6 +113,10 @@ __always_inline int decode_bug(unsigned long addr, s32 *imm, int *len) v = *(u8 *)(addr++); if (v == INSN_ASOP) v = *(u8 *)(addr++); + if (v == 0xea) { + *len = addr - start; + return BUG_EA; + } if (v != OPCODE_ESCAPE) return BUG_NONE; @@ -308,9 +312,16 @@ static noinstr bool handle_bug(struct pt_regs *regs) raw_local_irq_enable(); switch (ud_type) { + case BUG_EA: + if (handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) { + regs->ip += ud_len; + handled = true; + } + break; + case BUG_UD2: if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN || - handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) { + handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) { regs->ip += ud_len; handled = true; } ^ permalink raw reply related [flat|nested] 40+ messages in thread
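To see why the paranoid callsite closes the entrypoint hole, line the new `cmpl 7(%r11), %r10d` up against the FineIBT preamble layout: after the `sub $0x10, %r11`, %r11 points at the callee's 16-byte preamble (endbr64, then `subl $hash, %r10d`, whose imm32 sits 7 bytes into the preamble). A small user-space model of the check follows; it is illustrative only, with offsets taken from the patch above.

```
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Model of the fineibt_paranoid caller check: the caller verifies that
 * the hash immediate embedded in the target's preamble matches its own
 * hash before issuing the indirect call. */
static bool paranoid_caller_ok(const uint8_t *target_preamble,
			       uint32_t caller_hash)
{
	uint32_t preamble_hash;

	/* cmpl 7(%r11), %r10d: imm32 of `subl $hash, %r10d` at +7 */
	memcpy(&preamble_hash, target_preamble + 7, sizeof(preamble_hash));
	return preamble_hash == caller_hash;	/* mismatch: 0xea, #UD */
}
```

entry_SYSCALL_64 begins endbr64; swapgs; mov gs:..., so the bytes seven in from its start are fixed instruction bytes the attacker cannot choose. The compare therefore fails, and a hijacked indirect call now lands on the 0xea trap before swapgs ever executes.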
* Re: [RFC] Circumventing FineIBT Via Entrypoints 2025-02-15 21:07 ` Peter Zijlstra @ 2025-02-16 23:51 ` Kees Cook 2025-02-17 10:39 ` Peter Zijlstra 2025-02-17 13:06 ` David Laight 1 sibling, 1 reply; 40+ messages in thread From: Kees Cook @ 2025-02-16 23:51 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Cooper, jannh, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen, scott.d.constable, x86 On Sat, Feb 15, 2025 at 10:07:29PM +0100, Peter Zijlstra wrote: > On Fri, Feb 14, 2025 at 10:57:51AM +0100, Peter Zijlstra wrote: > > On Thu, Feb 13, 2025 at 12:53:28PM -0800, Kees Cook wrote: > > > > > Right, the "if they can control a function pointer" is the part I'm > > > focusing on. This attack depends on making an indirect call with a > > > controlled pointer. Non-FineIBT CFI will protect against that step, > > > so I think this is only an issue for IBT-only and FineIBT, but not CFI > > > nor CFI+IBT. > > > > Yes, the whole caller side validation should stop this. > > And I think we can retro-fit that in FineIBT. Notably the current call > sites look like: > > 0000000000000060 <fineibt_caller>: > 60: 41 ba 78 56 34 12 mov $0x12345678,%r10d > 66: 49 83 eb 10 sub $0x10,%r11 > 6a: 0f 1f 40 00 nopl 0x0(%rax) > 6e: 41 ff d3 call *%r11 > 71: 0f 1f 00 nopl (%rax) > > Of which the last 6 bytes are the retpoline site (starting at 0x6e). It > is trivially possible to re-arrange things to have both nops next to one > another, giving us 7 bytes to muck about with. > > And I think we can just about manage to do a caller side hash validation > in them bytes like: > > 0000000000000080 <fineibt_paranoid>: > 80: 41 ba 78 56 34 12 mov $0x12345678,%r10d > 86: 49 83 eb 10 sub $0x10,%r11 > 8a: 45 3b 53 07 cmp 0x7(%r11),%r10d > 8e: 74 01 je 91 <fineibt_paranoid+0x11> > 90: ea (bad) > 91: 41 ff d3 call *%r11 Ah nice! Yes, that would be great and removes all my concerns about FineIBT. :) (And you went with EA just to distinguish it more easily? Can't we still use the UD2 bug tables to find this like normal?) > And while this is somewhat daft, it would close the hole vs this entry > point swizzle afaict, no? > > Patch against tip/x86/core (which includes the latest ibt bits as per > this morning). > > Boots and builds the next kernel on my ADL. Lovely! Based on the patch, I assume you were testing CFI crash location reporting too? I'll try to get this spun up for testing here too. -- Kees Cook ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-16 23:51 ` Kees Cook
@ 2025-02-17 10:39 ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2025-02-17 10:39 UTC (permalink / raw)
To: Kees Cook
Cc: Andrew Cooper, jannh, jmill, joao, linux-hardening, linux-kernel,
  luto, samitolvanen, scott.d.constable, x86

On Sun, Feb 16, 2025 at 03:51:27PM -0800, Kees Cook wrote:
> On Sat, Feb 15, 2025 at 10:07:29PM +0100, Peter Zijlstra wrote:
> > And I think we can just about manage to do a caller side hash validation
> > in them bytes like:
> >
> > 0000000000000080 <fineibt_paranoid>:
> >   80:	41 ba 78 56 34 12	mov    $0x12345678,%r10d
> >   86:	49 83 eb 10		sub    $0x10,%r11
> >   8a:	45 3b 53 07		cmp    0x7(%r11),%r10d
> >   8e:	74 01			je     91 <fineibt_paranoid+0x11>
> >   90:	ea			(bad)
> >   91:	41 ff d3		call   *%r11
>
> Ah nice! Yes, that would be great and removes all my concerns about
> FineIBT. :)

Excellent!

> (And you went with EA just to distinguish it more easily?
> Can't we still use the UD2 bug tables to find this like normal?)

No space; UD2 is a 2 byte instruction.

IIUC all the single byte instructions that trip #UD are more or less
'reserved' and we shouldn't be using them, but I think we can use 0xEA
here since it is specific to the paranoid FineIBT thing -- and if people
want to reclaim the usage, all they need to do is fix IBT :-) -- which
as I said before should be done once FRED happens.

(/me makes note to go read the very latest FRED spec -- it's been a
while).

> > And while this is somewhat daft, it would close the hole vs this entry
> > point swizzle afaict, no?
> >
> > Patch against tip/x86/core (which includes the latest ibt bits as per
> > this morning).
> >
> > Boots and builds the next kernel on my ADL.
>
> Lovely! Based on the patch, I assume you were testing CFI crash location
> reporting too?

Sami was, he reminded me I forgot to hook up FineIBT, so I did :-)

> I'll try to get this spun up for testing here too.

Thanks!

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-15 21:07 ` Peter Zijlstra
  2025-02-16 23:51 ` Kees Cook
@ 2025-02-17 13:06 ` David Laight
  2025-02-17 13:13 ` Peter Zijlstra
  1 sibling, 1 reply; 40+ messages in thread
From: David Laight @ 2025-02-17 13:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kees Cook, Andrew Cooper, jannh, jmill, joao, linux-hardening,
  linux-kernel, luto, samitolvanen, scott.d.constable, x86

On Sat, 15 Feb 2025 22:07:29 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Feb 14, 2025 at 10:57:51AM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 13, 2025 at 12:53:28PM -0800, Kees Cook wrote:
> > > Right, the "if they can control a function pointer" is the part I'm
> > > focusing on. [...]
> >
> > Yes, the whole caller side validation should stop this.
>
> And I think we can retro-fit that in FineIBT. Notably the current call
> sites look like:
>
> 0000000000000060 <fineibt_caller>:
>   60:	41 ba 78 56 34 12	mov    $0x12345678,%r10d
>   66:	49 83 eb 10		sub    $0x10,%r11
>   6a:	0f 1f 40 00		nopl   0x0(%rax)
>   6e:	41 ff d3		call   *%r11
>   71:	0f 1f 00		nopl   (%rax)

I tried building a fineibt kernel (without LTO) and that isn't what I
see in the object files.
(I'm not trying to run it, just do some analysis.)
While the call targets have a 16 byte preamble it is all nops apart
from a final 'mov $hash,%eax'.
The call site loads $-hash and adds -4(target) and checks for zero.
It is too small to be patchable into the above.

There are far too many TLAs (and ETLAs) to follow all the options.

I did notice that although objtool seems to have code to remove 'spare'
endbr64s, the 'mov $hash,%eax' was present on all external functions.
Some 1600 are void fn(void) - there are high counts of others.

> Of which the last 6 bytes are the retpoline site (starting at 0x6e). It
> is trivially possible to re-arrange things to have both nops next to one
> another, giving us 7 bytes to muck about with.
>
> And I think we can just about manage to do a caller side hash validation
> in them bytes like:
>
> 0000000000000080 <fineibt_paranoid>:
>   80:	41 ba 78 56 34 12	mov    $0x12345678,%r10d
>   86:	49 83 eb 10		sub    $0x10,%r11
>   8a:	45 3b 53 07		cmp    0x7(%r11),%r10d
>   8e:	74 01			je     91 <fineibt_paranoid+0x11>
>   90:	ea			(bad)
>   91:	41 ff d3		call   *%r11
>
> And while this is somewhat daft, it would close the hole vs this entry
> point swizzle afaict, no?

Doesn't it have the problem that it includes the value of the hash?
So you can arrange to jump directly into the sequence itself.

	David

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints 2025-02-17 13:06 ` David Laight @ 2025-02-17 13:13 ` Peter Zijlstra 2025-02-17 18:38 ` David Laight 0 siblings, 1 reply; 40+ messages in thread From: Peter Zijlstra @ 2025-02-17 13:13 UTC (permalink / raw) To: David Laight Cc: Kees Cook, Andrew Cooper, jannh, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen, scott.d.constable, x86 On Mon, Feb 17, 2025 at 01:06:29PM +0000, David Laight wrote: > On Sat, 15 Feb 2025 22:07:29 +0100 > Peter Zijlstra <peterz@infradead.org> wrote: > > > On Fri, Feb 14, 2025 at 10:57:51AM +0100, Peter Zijlstra wrote: > > > On Thu, Feb 13, 2025 at 12:53:28PM -0800, Kees Cook wrote: > > > > > > > Right, the "if they can control a function pointer" is the part I'm > > > > focusing on. This attack depends on making an indirect call with a > > > > controlled pointer. Non-FineIBT CFI will protect against that step, > > > > so I think this is only an issue for IBT-only and FineIBT, but not CFI > > > > nor CFI+IBT. > > > > > > Yes, the whole caller side validation should stop this. > > > > And I think we can retro-fit that in FineIBT. Notably the current call > > sites look like: > > > > 0000000000000060 <fineibt_caller>: > > 60: 41 ba 78 56 34 12 mov $0x12345678,%r10d > > 66: 49 83 eb 10 sub $0x10,%r11 > > 6a: 0f 1f 40 00 nopl 0x0(%rax) > > 6e: 41 ff d3 call *%r11 > > 71: 0f 1f 00 nopl (%rax) > > I tried building a fineibt kernel (without LTO) and that isn't what I > see in the object files. > (I not trying to run it, just do some analysis.) > While the call targets have a 16 byte preamble it is all nops apart > from a final 'mov $hash,%rax'. > The call site loads $-hash and adds -4(target) and checks for zero. > It is too small to be patchable into the above. Right after that comes the retpoline site, which is another 6 bytes (assuming you have indirect-branch-cs-prefix, which all kCFI enabled compilers should have). You need to go read arch/x86/kernel/alternative.c search for FineIBT ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-17 13:13 ` Peter Zijlstra
@ 2025-02-17 18:38 ` David Laight
  2025-02-17 18:54 ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: David Laight @ 2025-02-17 18:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kees Cook, Andrew Cooper, jannh, jmill, joao, linux-hardening,
  linux-kernel, luto, samitolvanen, scott.d.constable, x86

On Mon, 17 Feb 2025 14:13:21 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 17, 2025 at 01:06:29PM +0000, David Laight wrote:
> > I tried building a fineibt kernel (without LTO) and that isn't what I
> > see in the object files.
> > [...]
> > While the call targets have a 16 byte preamble it is all nops apart
> > from a final 'mov $hash,%eax'.
> > The call site loads $-hash and adds -4(target) and checks for zero.
> > It is too small to be patchable into the above.
>
> Right after that comes the retpoline site, which is another 6 bytes
> (assuming you have indirect-branch-cs-prefix, which all kCFI enabled
> compilers should have).

I'm building with clang 18.1.8 - should be new enough.

I may not have retpolines enabled; a typical call site is (from vmlinux.o):

    3628:	48 89 c6		mov    %rax,%rsi
    362b:	41 ba 83 c5 2c af	mov    $0xaf2cc583,%r10d
    3631:	44 03 51 fc		add    -0x4(%rcx),%r10d
    3635:	74 02			je     3639 <vc_handle_exitcode+0x739>
    3637:	0f 0b			ud2
    3639:	ff d1			call   *%rcx
    363b:	4c 89 f6		mov    %r14,%rsi

That one has three targets, one is:

000000000008a5c0 <__cfi_kvm_sev_es_hcall_prepare>:
   8a5c0:	90			nop
   8a5c1:	90			nop
   8a5c2:	90			nop
   8a5c3:	90			nop
   8a5c4:	90			nop
   8a5c5:	90			nop
   8a5c6:	90			nop
   8a5c7:	90			nop
   8a5c8:	90			nop
   8a5c9:	90			nop
   8a5ca:	90			nop
   8a5cb:	b8 7d 3a d3 50		mov    $0x50d33a7d,%eax

000000000008a5d0 <kvm_sev_es_hcall_prepare>:
   8a5d0:	0f 1f 44 00 00		nopl   0x0(%rax,%rax,1)
			8a5d1: R_X86_64_NONE	__fentry__-0x4
   8a5d5:	48 8b 46 28		mov    0x28(%rsi),%rax

I think that if I had endbr64 enabled objtool would remove them from
non-exported functions whose address isn't taken.
But none of the 'mov $hash,%eax' get removed - and I think they should
suffer the same fate.

I'm not sure why I don't have endbr64 though. I did remove a lot of the
mitigations from the config I copied to add the caller-side fineibt
(I think) hash checks. After all, this is a local system I want to run
fast, not a semi-public one someone might try to hack.

> You need to go read arch/x86/kernel/alternative.c search for FineIBT

I found some stuff in one of the docs. Didn't read that bit of source.

What I was hoping to obtain was a list of the valid target functions
for each indirect call site. With the stack offset of the call (which
objtool knows) and a lot of 'shaking', a real estimate of max stack
depth can be determined (and recursive loops found).

	David

^ permalink raw reply [flat|nested] 40+ messages in thread
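The analysis David describes is essentially a longest-path computation over the call graph, with each indirect callsite expanded to every function that shares its KCFI hash. A rough sketch of the propagation step, under the assumption that frame sizes and the hash-expanded callee lists have already been extracted (e.g. from objtool data; all names illustrative):

```
#include <stdint.h>

struct func {
	uint32_t frame_size;	/* stack bytes incl. return address */
	int ncallees;
	struct func **callees;	/* direct + hash-matched indirect targets */
	uint64_t max_depth;	/* memoized worst-case depth */
	uint8_t state;		/* 0=unvisited, 1=on DFS stack, 2=done */
};

/* Worst-case stack depth below f; UINT64_MAX marks a recursive loop. */
static uint64_t max_stack(struct func *f)
{
	if (f->state == 1)	/* back edge: recursion found */
		return UINT64_MAX;
	if (f->state == 2)
		return f->max_depth;

	f->state = 1;
	uint64_t worst = 0;
	for (int i = 0; i < f->ncallees; i++) {
		uint64_t d = max_stack(f->callees[i]);
		if (d > worst)
			worst = d;
	}
	f->state = 2;
	f->max_depth = (worst == UINT64_MAX) ? UINT64_MAX
					     : worst + f->frame_size;
	return f->max_depth;
}
```

The "shaking" then amounts to pruning indirect edges that type information rules out, which tightens the estimate; any function reported as UINT64_MAX sits on a recursive cycle.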
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-17 18:38 ` David Laight
@ 2025-02-17 18:54 ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2025-02-17 18:54 UTC (permalink / raw)
To: David Laight
Cc: Kees Cook, Andrew Cooper, jannh, jmill, joao, linux-hardening,
  linux-kernel, luto, samitolvanen, scott.d.constable, x86

On Mon, Feb 17, 2025 at 06:38:27PM +0000, David Laight wrote:

> I may not have retpolines enabled; a typical call site is (from vmlinux.o):

Make sure CONFIG_FINEIBT=y, otherwise there is no point in talking
about this. This requires KERNEL_IBT=y RETPOLINE=y CALL_PADDING=y
CFI_CLANG=y.

Then look at arch/x86/include/asm/cfi.h and make sure to read the
comment, and then read arch/x86/kernel/alternative.c:__apply_fineibt().

Whichever way you turn this, you'll never find the fineibt code in the
object files.

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints 2025-02-13 20:41 ` Andrew Cooper 2025-02-13 20:53 ` Kees Cook @ 2025-02-14 10:05 ` Peter Zijlstra 1 sibling, 0 replies; 40+ messages in thread From: Peter Zijlstra @ 2025-02-14 10:05 UTC (permalink / raw) To: Andrew Cooper Cc: Kees Cook, jannh, jmill, joao, linux-hardening, linux-kernel, luto, samitolvanen On Thu, Feb 13, 2025 at 08:41:16PM +0000, Andrew Cooper wrote: > The problem is that SYSCALL entry/exit is a toxic operating mode, > because you only have to think about sneezing and another user->kernel > priv-esc appears. For a very brief moment I thought we could leave out the ENDBR there and eat the #CP, but 1) slow, and 2) then #CP needs to be an IST and ARGHH. So yeah, I didn't just suggest anything at all. I hate all this. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints 2025-02-13 1:31 ` Andrew Cooper 2025-02-13 2:09 ` Jann Horn 2025-02-13 20:28 ` Kees Cook @ 2025-02-14 9:54 ` Peter Zijlstra 2 siblings, 0 replies; 40+ messages in thread From: Peter Zijlstra @ 2025-02-14 9:54 UTC (permalink / raw) To: Andrew Cooper Cc: jannh, jmill, joao, kees, linux-hardening, linux-kernel, luto, samitolvanen On Thu, Feb 13, 2025 at 01:31:30AM +0000, Andrew Cooper wrote: > But, FRED support is slated for PantherLake/DiamondRapids which haven't > shipped yet, so are no use to the problem right now. FRED also fixes this IBT 'oopsie' IIRC. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-12 22:29 ` [RFC] Circumventing FineIBT Via Entrypoints Jann Horn
  2025-02-13  1:31 ` Andrew Cooper
@ 2025-02-13  6:15 ` Jennifer Miller
  2025-02-13 19:23 ` Jann Horn
  1 sibling, 1 reply; 40+ messages in thread
From: Jennifer Miller @ 2025-02-13 6:15 UTC (permalink / raw)
To: Jann Horn
Cc: Andy Lutomirski, linux-hardening, kees, joao, samitolvanen,
  kernel list, Andrew Cooper

On Wed, Feb 12, 2025 at 11:29:02PM +0100, Jann Horn wrote:
> +Andy Lutomirski (X86 entry code maintainer)
>
> On Wed, Feb 12, 2025 at 10:08 PM Jennifer Miller <jmill@asu.edu> wrote:
> > As part of a recently accepted paper we demonstrated that syscall
> > entrypoints can be misused on x86-64 systems to generically bypass
> > FineIBT/KERNEL_IBT from forwards-edge control flow hijacking. [...]
>
> Oh, fun, that's a gnarly quirk.

yeah :)

> Since the kernel, as far as I understand, uses FineIBT without
> backwards control flow protection (in other words, I think we assume
> that the kernel stack is trusted?), could we build a cheaper
> check on that basis somehow? For example, maybe we could do something like:
>
> ```
> endbr64
> test rsp, rsp
> js slowpath
> swapgs
> ```
>
> So we'd have the fast normal case where RSP points to userspace
> [...] and the slow case where RSP points to kernel
> memory - in that case we'd then have to do some slower checks to
> figure out whether weird userspace is making a syscall with RSP
> pointing to the kernel, or whether we're coming from hijacked kernel
> control flow.

I've been tinkering with this idea a bit and came up with something.

In short, we could have the slowpath branch as you suggested, permit the
stack switch and the preserving of registers on the stack in the
slowpath, but then sanity-check gsbase against the __per_cpu_offset
array and decide from there whether we should continue executing the
entrypoint or die/attempt to recover.

Here is some napkin asm I wrote for the 64-bit syscall entrypoint; I
think more or less the same could be done for the other entrypoints.

```
endbr64
test rsp, rsp
js slowpath

swapgs
~~fastpath continues~~

; path taken when rsp was a kernel address
; we have no choice really but to switch to the stack from the untrusted
; gsbase but after doing so we have to be careful about what we put on the
; stack
slowpath:
swapgs

; swap stacks as normal
mov QWORD PTR gs:[rip+0x7f005f85],rsp  # 0x6014 <cpu_tss_rw+20>
mov rsp,QWORD PTR gs:[rip+0x7f02c56d]  # 0x2c618 <pcpu_hot+24>

~~normal push and clear GPRs sequence here~~

; we entered with an rsp in the kernel address range.
; we already did swapgs but we don't know if we can trust our gsbase yet.
; we should be able to trust the ro_after_init __per_cpu_offset array
; though.

; check that gsbase is the expected value for our current cpu
rdtscp
mov rax, QWORD PTR [8*ecx-0x7d7be460] <__per_cpu_offset>

rdgsbase rbx

cmp rbx, rax
je fastpath_after_regs_preserved

wrgsbase rax

; if we reach here we are being exploited and should explode or attempt
; to recover
```

The unfortunate part is that it would still result in the register state
being dumped on top of some attacker controlled address, so if the error
path is recoverable someone could still use entrypoints to convert
control flow hijacking into memory corruption via register dump. So it
would kill the ability to get ROP but it would still be possible to dump
regs over modprobe_path, core_pattern, etc.

Does this seem feasible and any better than the alternative of
overwriting and restoring KERNEL_GS_BASE?

~Jennifer

^ permalink raw reply [flat|nested] 40+ messages in thread
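For readers wondering what the rdtscp is doing in the napkin asm: it is a gs-independent way to learn the current CPU, because Linux programs IA32_TSC_AUX with cpu | (node << 12) for the vdso getcpu path. A user-space sketch of that recovery follows; note the masking, since indexing __per_cpu_offset with the full %ecx (as the napkin asm does) would presumably need fixing on multi-node systems where the node bits are non-zero.

```
#include <stdint.h>

/* rdtscp returns IA32_TSC_AUX in %ecx; Linux sets it to cpu|(node<<12). */
static inline unsigned int cpu_from_rdtscp(void)
{
	uint32_t lo, hi, aux;

	asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
	return aux & 0xfff;	/* keep only the CPU number */
}

/* The expected kernel gsbase is then __per_cpu_offset[cpu_from_rdtscp()]. */
```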
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-13  6:15 ` Jennifer Miller
@ 2025-02-13 19:23 ` Jann Horn
  2025-02-13 21:24 ` Andrew Cooper
  2025-02-14 22:25 ` Josh Poimboeuf
  0 siblings, 2 replies; 40+ messages in thread
From: Jann Horn @ 2025-02-13 19:23 UTC (permalink / raw)
To: Jennifer Miller
Cc: Andy Lutomirski, linux-hardening, kees, joao, samitolvanen,
  kernel list, Andrew Cooper

On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@asu.edu> wrote:
> On Wed, Feb 12, 2025 at 11:29:02PM +0100, Jann Horn wrote:
> > Since the kernel, as far as I understand, uses FineIBT without
> > backwards control flow protection (in other words, I think we assume
> > that the kernel stack is trusted?), could we build a cheaper
> > check on that basis somehow? [...]
>
> I've been tinkering with this idea a bit and came up with something.
>
> In short, we could have the slowpath branch as you suggested, permit the
> stack switch and the preserving of registers on the stack in the
> slowpath, but then sanity-check gsbase against the __per_cpu_offset
> array and decide from there whether we should continue executing the
> entrypoint or die/attempt to recover.

One ugly option to avoid the register spilling might be to say
"userspace is not allowed to execute a SYSCALL instruction while RSP
is a kernel address, and if userspace does it anyway, the kernel can
kill the process". Then the slowpath could immediately start using the
GPRs without having to worry about where to save their old values, and
it could read the correct gsbase with the GET_PERCPU_BASE macro. It
would be an ABI change, but one that is probably fairly unlikely to
actually break stuff? But it would require a bit of extra kernel code
on the slowpath, which is kinda annoying...

> Here is some napkin asm I wrote for the 64-bit syscall entrypoint; I
> think more or less the same could be done for the other entrypoints.
>
> ```
> endbr64
> test rsp, rsp
> js slowpath
> [...]
> ; if we reach here we are being exploited and should explode or attempt
> ; to recover
> ```
>
> The unfortunate part is that it would still result in the register state
> being dumped on top of some attacker controlled address, so if the error
> path is recoverable someone could still use entrypoints to convert
> control flow hijacking into memory corruption via register dump. So it
> would kill the ability to get ROP but it would still be possible to dump
> regs over modprobe_path, core_pattern, etc.

It is annoying that we (as far as I know) don't have a nice clear
security model for what exactly CFI in the kernel is supposed to
achieve - though I guess that's partly because in its current version,
it only happens to protect against cases where an attacker gets a
function pointer overwrite, but not the probably more common cases
where the attacker (also?) gets an object pointer overwrite...

> Does this seem feasible and any better than the alternative of
> overwriting and restoring KERNEL_GS_BASE?

The syscall entry point is a hot path; my main reason for suggesting
the RSP check is that I'm worried about the performance impact of the
gsbase-overwriting approach, but I don't actually have numbers on
that. I figure a test + conditional jump is about the cheapest we can
do... Do we know how many cycles wrgsbase takes, and how serializing
is it? Sadly Agner Fog's tables don't seem to list it...

How would we actually do that overwriting and restoring of
KERNEL_GS_BASE? Would we need a scratch register for that?

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-13 19:23 ` Jann Horn
@ 2025-02-13 21:24 ` Andrew Cooper
  2025-02-13 23:24 ` Jennifer Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Cooper @ 2025-02-13 21:24 UTC (permalink / raw)
To: Jann Horn, Jennifer Miller
Cc: Andy Lutomirski, linux-hardening, kees, joao, samitolvanen,
  kernel list

On 13/02/2025 7:23 pm, Jann Horn wrote:
> On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@asu.edu> wrote:
>> Here is some napkin asm I wrote for the 64-bit syscall entrypoint; I
>> think more or less the same could be done for the other entrypoints.
>>
>> ```
>> endbr64
>> test rsp, rsp
>> js slowpath
>>
>> swapgs
>> ~~fastpath continues~~
>>
>> ; path taken when rsp was a kernel address
>> ; we have no choice really but to switch to the stack from the untrusted
>> ; gsbase but after doing so we have to be careful about what we put on the
>> ; stack
>> slowpath:
>> swapgs

I'm afraid I don't follow. By this point, both basic blocks are the
same (a single swapgs).

Malicious userspace can get onto the slowpath by loading a kernel
pointer into %rsp. Furthermore, if the origin of this really was in the
kernel, then ...

>>
>> ; swap stacks as normal
>> mov QWORD PTR gs:[rip+0x7f005f85],rsp  # 0x6014 <cpu_tss_rw+20>
>> mov rsp,QWORD PTR gs:[rip+0x7f02c56d]  # 0x2c618 <pcpu_hot+24>

... these are memory accesses using the user %gs. As you note a few
lines lower, %gs isn't safe at this point.

A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
at which point we'll have loaded an attacker controlled %rsp, then take
#PF trying to spill %rsp into pcpu_hot, and now we're running the
pagefault handler on an attacker controlled stack and gsbase.

>> ~~normal push and clear GPRs sequence here~~
>>
>> ; we entered with an rsp in the kernel address range.
>> ; we already did swapgs but we don't know if we can trust our gsbase yet.
>> ; we should be able to trust the ro_after_init __per_cpu_offset array
>> ; though.
>>
>> ; check that gsbase is the expected value for our current cpu
>> rdtscp
>> mov rax, QWORD PTR [8*ecx-0x7d7be460] <__per_cpu_offset>
>>
>> rdgsbase rbx
>>
>> cmp rbx, rax
>> je fastpath_after_regs_preserved
>>
>> wrgsbase rax

Irrespective of other things, you'll need some compatibility strategy
for the fact that RDTSCP and {RD,WR}{FS,GS}BASE cannot be used
unconditionally in 64bit mode. It might be as simple as making FineIBT
depend on their presence to activate, but taking a #UD exception in this
path is also a priv-esc vulnerability.

While all CET-IBT capable CPUs ought to have RDTSCP/*BASE, there are
virt environments where this implication does not hold.

>> ; if we reach here we are being exploited and should explode or attempt
>> ; to recover
>> ```
>>
>> The unfortunate part is that it would still result in the register state
>> being dumped on top of some attacker controlled address [...]
> It is annoying that we (as far as I know) don't have a nice clear
> security model for what exactly CFI in the kernel is supposed to
> achieve [...]
>
>> Does this seem feasible and any better than the alternative of
>> overwriting and restoring KERNEL_GS_BASE?
> The syscall entry point is a hot path; my main reason for suggesting
> the RSP check is that I'm worried about the performance impact of the
> gsbase-overwriting approach, but I don't actually have numbers on
> that. I figure a test + conditional jump is about the cheapest we can
> do...

Yeah, this is the cheapest I can think of too. TEST+JS has been able to
macrofuse since the Core2 era.

> Do we know how many cycles wrgsbase takes, and how serializing
> is it? Sadly Agner Fog's tables don't seem to list it...

Not (architecturally) serialising, and pretty quick IIRC. It is
microcoded, but the segment registers are renamed so it can execute
speculatively.

~Andrew

> How would we actually do that overwriting and restoring of
> KERNEL_GS_BASE? Would we need a scratch register for that?

^ permalink raw reply [flat|nested] 40+ messages in thread
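A sketch of the compatibility gating Andrew is asking for, assuming the policy is simply "no paranoid entry path unless the instructions it relies on exist" (the function name and fallback are hypothetical; the point is that the decision is made once at boot and never shows up as a #UD inside the entry path):

```
/* Boot-time gate for the rsp-sign-check slowpath (kernel context). */
static void __init fineibt_entry_setup(void)
{
	if (!boot_cpu_has(X86_FEATURE_FSGSBASE) ||
	    !boot_cpu_has(X86_FEATURE_RDTSCP)) {
		/* hypothetical fallback, e.g. the MSR-mirroring scheme */
		return;
	}

	/* patch the hardened slowpath in via alternatives */
}
```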
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-13 21:24 ` Andrew Cooper
@ 2025-02-13 23:24 ` Jennifer Miller
  2025-02-13 23:43 ` Jann Horn
  2025-02-14 23:06 ` Andrew Cooper
  0 siblings, 2 replies; 40+ messages in thread
From: Jennifer Miller @ 2025-02-13 23:24 UTC (permalink / raw)
To: Andrew Cooper, Jann Horn
Cc: Andy Lutomirski, linux-hardening, kees, joao, samitolvanen,
  kernel list

On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote:
> On 13/02/2025 7:23 pm, Jann Horn wrote:
> > On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@asu.edu> wrote:
> >> Here is some napkin asm I wrote for the 64-bit syscall entrypoint; I
> >> think more or less the same could be done for the other entrypoints.
> >> [...]
> >> slowpath:
> >> swapgs
>
> I'm afraid I don't follow. By this point, both basic blocks are the
> same (a single swapgs).

Ah sure, the test/js could be moved to occur after swapgs to save an
instruction.

> Malicious userspace can get onto the slowpath by loading a kernel
> pointer into %rsp. Furthermore, if the origin of this really was in the
> kernel, then ...
>
> >> ; swap stacks as normal
> >> mov QWORD PTR gs:[rip+0x7f005f85],rsp  # 0x6014 <cpu_tss_rw+20>
> >> mov rsp,QWORD PTR gs:[rip+0x7f02c56d]  # 0x2c618 <pcpu_hot+24>
>
> ... these are memory accesses using the user %gs. As you note a few
> lines lower, %gs isn't safe at this point.
>
> A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
> at which point we'll have loaded an attacker controlled %rsp, then take
> #PF trying to spill %rsp into pcpu_hot, and now we're running the
> pagefault handler on an attacker controlled stack and gsbase.

I don't follow; the spill of %rsp into pcpu_hot occurs first, before we
would move to the attacker controlled stack. This is Intel asm syntax,
sorry if that was unclear.

Still, I hadn't considered misusing readonly/unmapped pages on the GPR
register spill that follows. Could we enforce that the stack pointer we
get be page aligned to prevent this vector? So that if one were to
attempt to point the stack to readonly or unmapped memory they should be
guaranteed to double fault?

> Irrespective of other things, you'll need some compatibility strategy
> for the fact that RDTSCP and {RD,WR}{FS,GS}BASE cannot be used
> unconditionally in 64bit mode. It might be as simple as making FineIBT
> depend on their presence to activate, but taking a #UD exception in this
> path is also a priv-esc vulnerability.

Sure, we could rdmsr IA32_TSC_AUX in place of rdtscp. After the wrgsbase
we could switch to the expected kernel stack now that gsbase is fixed
before taking any #UD.

> While all CET-IBT capable CPUs ought to have RDTSCP/*BASE, there are
> virt environments where this implication does not hold.
> [...]
> > How would we actually do that overwriting and restoring of
> > KERNEL_GS_BASE? Would we need a scratch register for that?

I think we can do the overwrite at any point before actually calling
into the individual syscall handlers, really anywhere before potentially
hijacked indirect control flow can occur, and then restore it just after
those return. E.g., for the 64-bit path I am currently overwriting it at
the start of do_syscall_64 and then restoring it just before
syscall_exit_to_user_mode. I'm not sure if there is any reason to do it
sooner while we'd still be register constrained.

~Jennifer

^ permalink raw reply [flat|nested] 40+ messages in thread
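Spelled out, the bracketing Jennifer describes would look something like the following kernel-context sketch. The helper names are hypothetical, and it assumes the user's gsbase is already tracked in thread.gsbase, which is exactly the save/restore plumbing across task switches, arch_prctl and ptrace that makes the real patch fiddly.

```
/* After syscall entry, once on a trusted stack: make the inactive
 * gsbase (KERNEL_GS_BASE) mirror the kernel per-cpu base, so that an
 * extra swapgs reached through hijacked control flow yields a valid
 * kernel value rather than an attacker-controlled one. */
static __always_inline void kernel_gs_base_mirror(void)
{
	wrmsrl(MSR_KERNEL_GS_BASE, __per_cpu_offset[smp_processor_id()]);
}

/* On the exit path, before the final swapgs: put the user's value
 * back so userspace sees the gsbase it expects. */
static __always_inline void kernel_gs_base_restore(void)
{
	wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
}
```

The performance question Jann raises is then simply how much the two wrmsr-class operations per syscall cost relative to the test + js on the RSP-check approach.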
* Re: [RFC] Circumventing FineIBT Via Entrypoints 2025-02-13 23:24 ` Jennifer Miller @ 2025-02-13 23:43 ` Jann Horn 2025-02-14 23:06 ` Andrew Cooper 1 sibling, 0 replies; 40+ messages in thread From: Jann Horn @ 2025-02-13 23:43 UTC (permalink / raw) To: Jennifer Miller Cc: Andrew Cooper, Andy Lutomirski, linux-hardening, kees, joao, samitolvanen, kernel list On Fri, Feb 14, 2025 at 12:24 AM Jennifer Miller <jmill@asu.edu> wrote: > On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote: > > On 13/02/2025 7:23 pm, Jann Horn wrote: > > > How would we actually do that overwriting and restoring of > > > KERNEL_GS_BASE? Would we need a scratch register for that? > > > > I think we can do the overwrite at any point before actually calling into > the individual syscall handlers, really anywhere before potentially > hijacked indirect control flow can occur and then restore it just after > those return e.g., for the 64-bit path I am currently overwriting it at the > start of do_syscall_64 and then restoring it just before > syscall_exit_to_user_mode. I'm not sure if there is any reason to do it > sooner while we'd still be register constrained. Right, makes sense - sorry, I misremembered the details of the KERNEL_GS_BASE overwrite proposal, I had to re-read your first mail. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-13 23:24 ` Jennifer Miller
  2025-02-13 23:43 ` Jann Horn
@ 2025-02-14 23:06 ` Andrew Cooper
  2025-02-15  0:07 ` Jennifer Miller
  1 sibling, 1 reply; 40+ messages in thread
From: Andrew Cooper @ 2025-02-14 23:06 UTC (permalink / raw)
To: Jennifer Miller, Jann Horn
Cc: Andy Lutomirski, linux-hardening, kees, joao, samitolvanen,
  kernel list

On 13/02/2025 11:24 pm, Jennifer Miller wrote:
> On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote:
>>>> ; swap stacks as normal
>>>> mov QWORD PTR gs:[rip+0x7f005f85],rsp  # 0x6014 <cpu_tss_rw+20>
>>>> mov rsp,QWORD PTR gs:[rip+0x7f02c56d]  # 0x2c618 <pcpu_hot+24>
>> ... these are memory accesses using the user %gs. As you note a few
>> lines lower, %gs isn't safe at this point.
>>
>> A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
>> at which point we'll have loaded an attacker controlled %rsp, then take
>> #PF trying to spill %rsp into pcpu_hot, and now we're running the
>> pagefault handler on an attacker controlled stack and gsbase.
>>
> I don't follow; the spill of %rsp into pcpu_hot occurs first, before we
> would move to the attacker controlled stack. This is Intel asm syntax,
> sorry if that was unclear.

No, sorry. It's clearly written; I simply wasn't paying enough attention.

> Still, I hadn't considered misusing readonly/unmapped pages on the GPR
> register spill that follows. Could we enforce that the stack pointer we
> get be page aligned to prevent this vector? So that if one were to
> attempt to point the stack to readonly or unmapped memory they should be
> guaranteed to double fault?

Hmm.

Espfix64 does involve #DF recovering from a write to a read-only stack.
(This broken corner of x86 is also fixed in FRED. We fixed a *lot* of
things.)

As long as the #DF handler can be updated to safely distinguish espfix64
from this entrypoint attack, this seems like it might mitigate the
read-only case.

> I think we can do the overwrite at any point before actually calling
> into the individual syscall handlers, really anywhere before potentially
> hijacked indirect control flow can occur, and then restore it just after
> those return. E.g., for the 64-bit path I am currently overwriting it at
> the start of do_syscall_64 and then restoring it just before
> syscall_exit_to_user_mode. I'm not sure if there is any reason to do it
> sooner while we'd still be register constrained.

I don't follow. If any "bad" execution is found in an entrypoint, Linux
needs to panic(). Detecting the malice involves clobbering an in-use
stack, and there's no ability to safely recover.

~Andrew

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-14 23:06 ` Andrew Cooper
@ 2025-02-15  0:07 ` Jennifer Miller
  2025-02-15  0:11 ` Andrew Cooper
  0 siblings, 1 reply; 40+ messages in thread
From: Jennifer Miller @ 2025-02-15 0:07 UTC (permalink / raw)
To: Andrew Cooper
Cc: Jann Horn, Andy Lutomirski, linux-hardening, kees, joao,
  samitolvanen, kernel list

On Fri, Feb 14, 2025 at 11:06:50PM +0000, Andrew Cooper wrote:
> On 13/02/2025 11:24 pm, Jennifer Miller wrote:
> > Still, I hadn't considered misusing readonly/unmapped pages on the GPR
> > register spill that follows. Could we enforce that the stack pointer we
> > get be page aligned to prevent this vector? So that if one were to
> > attempt to point the stack to readonly or unmapped memory they should be
> > guaranteed to double fault?
>
> Hmm.
>
> Espfix64 does involve #DF recovering from a write to a read-only stack.
> (This broken corner of x86 is also fixed in FRED. We fixed a *lot* of
> things.)

Interesting, I haven't gotten around to reading up on how FRED works;
it sounds neat.

> As long as the #DF handler can be updated to safely distinguish espfix64
> from this entrypoint attack, this seems like it might mitigate the
> read-only case.
>
> > I think we can do the overwrite at any point before actually calling
> > into the individual syscall handlers [...]
>
> I don't follow. If any "bad" execution is found in an entrypoint, Linux
> needs to panic(). Detecting the malice involves clobbering an in-use
> stack, and there's no ability to safely recover.

Sorry, this was in response to Jann's question about the mitigation
strategy proposed in my initial email.

> ~Andrew

~Jennifer

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints 2025-02-15 0:07 ` Jennifer Miller @ 2025-02-15 0:11 ` Andrew Cooper 2025-02-15 0:19 ` Jennifer Miller 0 siblings, 1 reply; 40+ messages in thread From: Andrew Cooper @ 2025-02-15 0:11 UTC (permalink / raw) To: Jennifer Miller Cc: Jann Horn, Andy Lutomirski, linux-hardening, kees, joao, samitolvanen, kernel list On 15/02/2025 12:07 am, Jennifer Miller wrote: > On Fri, Feb 14, 2025 at 11:06:50PM +0000, Andrew Cooper wrote: >> On 13/02/2025 11:24 pm, Jennifer Miller wrote: >>> On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote: >>> Still, I hadn't considered misusing readonly/unmapped pages on the GPR >>> register spill that follows. Could we enforce that the stack pointer we get >>> be page aligned to prevent this vector? So that if one were to attempt to >>> point the stack to readonly or unmapped memory they should be guaranteed to >>> double fault? >> Hmm. >> >> Espfix64 does involve #DF recovering from a write to a read-only stack. >> (This broken corner of x86 is also fixed in FRED. We fixed a *lot* of >> thing.) > Interesting, I haven't gotten around to reading into how FRED works, it > sounds neat. Start with https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing Then https://www.intel.com/content/www/us/en/content-details/779982/flexible-return-and-event-delivery-fred-specification.html ~Andrew ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-15  0:11                 ` Andrew Cooper
@ 2025-02-15  0:19                   ` Jennifer Miller
  0 siblings, 0 replies; 40+ messages in thread
From: Jennifer Miller @ 2025-02-15  0:19 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jann Horn, Andy Lutomirski, linux-hardening, kees, joao,
	samitolvanen, kernel list

On Sat, Feb 15, 2025 at 12:11:17AM +0000, Andrew Cooper wrote:
> On 15/02/2025 12:07 am, Jennifer Miller wrote:
> > On Fri, Feb 14, 2025 at 11:06:50PM +0000, Andrew Cooper wrote:
> >> On 13/02/2025 11:24 pm, Jennifer Miller wrote:
> >>> On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote:
> >>> Still, I hadn't considered misusing readonly/unmapped pages on the GPR
> >>> register spill that follows. Could we enforce that the stack pointer
> >>> we get be page aligned to prevent this vector? So that if one were to
> >>> attempt to point the stack at readonly or unmapped memory, they would
> >>> be guaranteed to double fault?
> >> Hmm.
> >>
> >> Espfix64 does involve #DF recovering from a write to a read-only stack.
> >> (This broken corner of x86 is also fixed in FRED. We fixed a *lot* of
> >> things.)
> > Interesting, I haven't gotten around to reading up on how FRED works
> > yet; it sounds neat.
>
> Start with
> https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing
>
> Then
> https://www.intel.com/content/www/us/en/content-details/779982/flexible-return-and-event-delivery-fred-specification.html
>
> ~Andrew

Thanks, I'll give those a read!

~Jennifer
* Re: [RFC] Circumventing FineIBT Via Entrypoints
  2025-02-13 19:23     ` Jann Horn
  2025-02-13 21:24       ` Andrew Cooper
@ 2025-02-14 22:25       ` Josh Poimboeuf
  1 sibling, 0 replies; 40+ messages in thread
From: Josh Poimboeuf @ 2025-02-14 22:25 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jennifer Miller, Andy Lutomirski, linux-hardening, kees, joao,
	samitolvanen, kernel list, Andrew Cooper

On Thu, Feb 13, 2025 at 08:23:34PM +0100, Jann Horn wrote:
> On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@asu.edu> wrote:
> > In short, we could have the slowpath branch as you suggested, in the
> > slowpath permit the stack switch and preserving of the registers on the
> > stack, but then do a sanity check against the __per_cpu_offset array
> > and decide from there whether we should continue executing the
> > entrypoint or die/attempt to recover.
>
> One ugly option to avoid the register spilling might be to say
> "userspace is not allowed to execute a SYSCALL instruction while RSP
> is a kernel address, and if userspace does it anyway, the kernel can
> kill the process". Then the slowpath could immediately start using the
> GPRs without having to worry about where to save their old values, and
> it could read the correct gsbase with the GET_PERCPU_BASE macro. It
> would be an ABI change, but one that is probably fairly unlikely to
> actually break stuff? But it would require a bit of extra kernel code
> on the slowpath, which is kinda annoying...

Could all this be made easier if we went back to having per-CPU entry
trampolines? Then the trampoline could just use a PC-relative access to
get the kernel stack pointer without needing %gs.

I think the main reason the entry trampolines were removed was that they
needed an indirect branch to jump back to the global text. But they
could be allocated within 2GB of the entry text and do a direct jump.

-- 
Josh
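For illustration, a rough sketch of what such a per-CPU trampoline
might look like. This is speculative: the symbol names
(`per_cpu_syscall_trampoline`, `tramp_user_rsp`, `tramp_kernel_rsp`,
`entry_SYSCALL_64_continue`) are hypothetical, and the SYSCALL64
trampolines that were removed from the PTI code were laid out somewhat
differently.

```
; One trampoline page is mapped per CPU, within 2GB of the entry text,
; with a per-CPU data page mapped directly after it (keeping W^X intact).
per_cpu_syscall_trampoline:
	endbr64
	swapgs
	; RIP-relative per-CPU accesses: no %gs needed at all, so a
	; user-controlled gsbase never feeds the stack switch
	mov	QWORD PTR [rip + tramp_user_rsp], rsp	; spill user RSP
	mov	rsp, QWORD PTR [rip + tramp_kernel_rsp]	; this CPU's stack,
							; written at bring-up
	jmp	entry_SYSCALL_64_continue	; direct jmp; the target is
						; within 2GB by construction

	; in the adjacent per-CPU data page:
tramp_user_rsp:		dq 0
tramp_kernel_rsp:	dq 0
```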
Thread overview: 40+ messages
[not found] <Z60NwR4w/28Z7XUa@ubun>
2025-02-12 22:29 ` [RFC] Circumventing FineIBT Via Entrypoints Jann Horn
2025-02-13 1:31 ` Andrew Cooper
2025-02-13 2:09 ` Jann Horn
2025-02-13 2:42 ` Andrew Cooper
2025-02-22 20:43 ` Rudolf Marek
2025-02-25 18:10 ` Andrew Cooper
2025-02-25 20:06 ` Rudolf Marek
2025-02-25 21:14 ` Andrew Cooper
2025-02-26 2:55 ` Kees Cook
2025-02-26 22:48 ` Rudolf Marek
2025-02-27 0:41 ` Andrew Cooper
2025-03-01 22:48 ` Rudolf Marek
2025-03-02 19:16 ` Rudolf Marek
2025-03-02 22:31 ` Andrew Cooper
2025-02-28 12:13 ` Florian Weimer
2025-02-13 20:28 ` Kees Cook
2025-02-13 20:41 ` Andrew Cooper
2025-02-13 20:53 ` Kees Cook
2025-02-13 20:57 ` Jann Horn
2025-02-16 23:42 ` Kees Cook
2025-02-14 9:57 ` Peter Zijlstra
2025-02-15 21:07 ` Peter Zijlstra
2025-02-16 23:51 ` Kees Cook
2025-02-17 10:39 ` Peter Zijlstra
2025-02-17 13:06 ` David Laight
2025-02-17 13:13 ` Peter Zijlstra
2025-02-17 18:38 ` David Laight
2025-02-17 18:54 ` Peter Zijlstra
2025-02-14 10:05 ` Peter Zijlstra
2025-02-14 9:54 ` Peter Zijlstra
2025-02-13 6:15 ` Jennifer Miller
2025-02-13 19:23 ` Jann Horn
2025-02-13 21:24 ` Andrew Cooper
2025-02-13 23:24 ` Jennifer Miller
2025-02-13 23:43 ` Jann Horn
2025-02-14 23:06 ` Andrew Cooper
2025-02-15 0:07 ` Jennifer Miller
2025-02-15 0:11 ` Andrew Cooper
2025-02-15 0:19 ` Jennifer Miller
2025-02-14 22:25 ` Josh Poimboeuf