public inbox for linux-kernel@vger.kernel.org
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jann Horn <jannh@google.com>
Cc: jmill@asu.edu, joao@overdrivepizza.com, kees@kernel.org,
	linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org,
	luto@kernel.org, samitolvanen@google.com,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints
Date: Thu, 13 Feb 2025 02:42:10 +0000
Message-ID: <1cf8d5a5-bf3e-4667-bc6a-d1b1d662d822@citrix.com>
In-Reply-To: <CAG48ez0h0wUS6y+W1HTOwN14V95gKmmFZ_2TamAX+JKTmXT=DA@mail.gmail.com>

On 13/02/2025 2:09 am, Jann Horn wrote:
> On Thu, Feb 13, 2025 at 2:31 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> Assuming this is an issue you all feel is worth addressing, I will
>>>> continue working on providing a patch. I'm concerned though that the
>>>> overhead from adding a wrmsr on both syscall entry and exit to
>>>> overwrite and restore the KERNEL_GS_BASE MSR may be quite high, so
>>>> any feedback in regards to the approach or suggestions of alternate
>>>> approaches to patching are welcome :)
>>> Since the kernel, as far as I understand, uses FineIBT without
>>> backwards control flow protection (in other words, I think we assume
>>> that the kernel stack is trusted?),
>> This is fun indeed.  Linux cannot use supervisor shadow stacks because
>> the mess around NMI re-entrancy (and IST more generally) requires ROP
>> gadgets in order to function safely.  Implementing this with shadow
>> stacks active, while not impossible, is deemed to be prohibitively
>> complicated.
>>
>> Linux's supervisor shadow stack support is waiting for FRED support,
>> which fixes both the NMI re-entrancy problem, and other exceptions
>> nesting within NMIs, as well as prohibiting the use of the SWAPGS
>> instruction as FRED tries to make sure that the correct GS is always in
>> context.
>>
>> But, FRED support is slated for PantherLake/DiamondRapids which haven't
>> shipped yet, so are no use to the problem right now.
>>
>>> could we build a cheaper
>>> check on that basis somehow? For example, maybe we could do something like:
>>>
>>> ```
>>> endbr64
>>> test rsp, rsp
>>> js slowpath
>>> swapgs
>>> ```
>> I presume it's been pointed out already, but there are 3 related
>> entrypoints here.  SYSCALL64 which is discussed, SYSCALL32 and SYSENTER
>> which are related.
>>
>> But, any other IDT entry is in a similar bucket.  If we're corrupting a
>> function pointer or return address to redirect here, then the check of
>> CS(%rsp) to control the conditional SWAPGS is an OoB read in the caller's
>> stack frame.
>>
>> For IDT entries, checking %rsp is reasonable, because userspace can't
>> forge a kernel-like %rsp.  However, SYSCALL64 specifically leaves %rsp
>> entirely attacker controlled (and even potentially non-canonical), so
>> I'm wondering what you had in mind for the slowpath to truly
>> distinguish kernel context from user context?
> Hm, yeah, that seems hard - maybe the best we could do is to make sure
> that the inactive gsbase has the correct value for our CPU's kernel
> gsbase? Kinda like a paranoid_entry, except more painful because we'd
> first have to figure out a place to spill registers to before we can
> start using stuff like rdmsr... Then a function pointer overwrite
> might still turn into returning to userspace with a sysret with GPRs
> full of kernel pointers, but at least we wouldn't run off of a bogus
> gsbase anymore?

Thinking about this some more, I think it's impossible to distinguish.

One of the many sharp edges of SYSCALL (and SYSENTER for that matter) is
that they're instructions expected to only be used by userspace, but
that can be executed in supervisor mode too[1].  They're asymmetric with
their SYSRET (and SYSEXIT) counterparts, which are CPL0 instructions
that strictly transition into CPL3.

The SYSCALL behaviour TLDR is:

    %rcx = %rip
    %r11 = %rflags
    %cs = fixed attr
    %ss = fixed attr
    %rip = MSR_LSTAR

which means that %rcx (old rip) is the only piece of state which
userspace can't feasibly forge (and which could therefore distinguish a
SYSCALL executed in user mode from one executed in kernel mode).  Yet if
we're talking about a JOP chain to get here, then %rcx is under attacker
control too.
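
As a rough illustration of the point above, here is a toy C model of the
register state that the SYSCALL64 entry point gets to look at.  It is a
sketch, not kernel code: struct regs, TOY_LSTAR and all function names
are made up for this example, and it ignores the %rflags masking,
%cs/%ss and the GS bases.  It only shows that a forged jump from kernel
context can present the same %rip/%rcx/%r11/%rsp as a genuine SYSCALL
from userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the value programmed into MSR_LSTAR. */
#define TOY_LSTAR 0xffffffff81000000ULL

/* Only the state from the TLDR above, plus %rsp (untouched by SYSCALL). */
struct regs {
    uint64_t rip, rcx, r11, rsp, rflags;
};

/* Genuine SYSCALL: hardware saves old %rip/%rflags, loads %rip from LSTAR. */
static struct regs do_syscall(struct regs r)
{
    r.rcx = r.rip;
    r.r11 = r.rflags;
    r.rip = TOY_LSTAR;
    return r;
}

/* Corrupted indirect branch in the kernel: only %rip changes.  The
 * attacker's JOP chain is assumed to have set every other register. */
static struct regs do_jop_jump(struct regs r)
{
    r.rip = TOY_LSTAR;
    return r;
}

/* What the first instructions of the entry point could compare. */
static bool same_entry_state(struct regs a, struct regs b)
{
    return a.rip == b.rip && a.rcx == b.rcx &&
           a.r11 == b.r11 && a.rsp == b.rsp;
}

/* Returns true if the forged entry matches the genuine one. */
static bool demo_indistinguishable(void)
{
    struct regs user = {
        .rip = 0x401000, .rsp = 0x7ffc0000f000ULL, .rflags = 0x202,
    };
    struct regs genuine = do_syscall(user);

    /* Attacker pre-loads the values SYSCALL would have produced. */
    struct regs forged = {
        .rcx = 0x401000, .r11 = 0x202, .rsp = 0x7ffc0000f000ULL,
    };
    forged = do_jop_jump(forged);

    return same_entry_state(genuine, forged);
}
```

In this model demo_indistinguishable() returns true, i.e. nothing in
these registers tells the two entries apart.  The one real difference,
which GS base is currently active, is exactly what the entry code cannot
see before it has decided whether to SWAPGS.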

There are a variety of solutions to this problem that involve not using
%gs for per-cpu data.  I also expect that to be wholly unpopular and
dismissed as an approach.

~Andrew

[1] No-one back then was brave enough to design CPL3-only instructions.
