From: Juergen Gross <jgross@suse.com>
To: Demi Marie Obenour <demi@invisiblethingslab.com>,
	Marco Elver <elver@google.com>
Cc: "Alexander Potapenko" <glider@google.com>,
	kasan-dev <kasan-dev@googlegroups.com>,
	"Xen development discussion" <xen-devel@lists.xenproject.org>,
	"Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Subject: Re: kfence_protect_page() writing L1TF vulnerable PTE
Date: Mon, 12 Dec 2022 06:19:51 +0100
Message-ID: <c250b8f5-bce5-da43-ae11-e5355141ea3c@suse.com>
In-Reply-To: <Y5azcFUxAWuEVicY@itl-email>


On 12.12.22 05:55, Demi Marie Obenour wrote:
> On Sun, Dec 11, 2022 at 11:50:39PM +0100, Marco Elver wrote:
>> On Sun, 11 Dec 2022 at 22:34, Demi Marie Obenour
>> <demi@invisiblethingslab.com> wrote:
>>> On Sun, Dec 11, 2022 at 01:15:06PM +0100, Juergen Gross wrote:
>>>> During tests with QubesOS a problem was found which seemed to be related
>>>> to kfence_protect_page() writing a L1TF vulnerable page table entry [1].
>>>>
>>>> Looking into the function I'm seeing:
>>>>
>>>>        set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
>>>>
>>>> I don't think this can be correct, as keeping the PFN unmodified and
>>>> just removing the _PAGE_PRESENT bit is wrong regarding L1TF.
>>>>
>>>> There should be at least the highest PFN bit set in order to be L1TF
>>>> safe.
> 
>> Could you elaborate what we want to be safe from?
> 
> The problem is not Linux’s safety, but Xen’s.  To prevent PV guests from
> arbitrarily reading and writing memory, all updates to PV guest page
> tables must be done via hypercalls.  This allows Xen to ensure that a
> guest can only read from its own memory and that pages used for page
> tables or segment descriptors are not mapped writable.
> 
>> KFENCE is only for kernel memory, i.e. slab allocations. The
>> page-protection mechanism is used to detect memory safety bugs in the
>> Linux kernel. The page protection does not prevent or mitigate any
>> such bugs because KFENCE only samples sl[au]b allocations. Normal slab
>> allocations never change the page protection bits; KFENCE merely uses
>> them to receive a page fault, upon which we determine either a
>> use-after-free or out-of-bounds access. After a bug is detected,
>> KFENCE unprotects the page so that the kernel can proceed "as normal"
>> given that's the state of things if it had been a normal sl[au]b
>> allocation.
> 
>> https://docs.kernel.org/dev-tools/kfence.html
> 
>> From [1] I see: "If an instruction accesses a virtual address for
>> which the relevant page table entry (PTE) has the Present bit cleared
>> or other reserved bits set, then speculative execution ignores the
>> invalid PTE and loads the referenced data if it is present in the
>> Level 1 Data Cache, as if the page referenced by the address bits in
>> the PTE was still present and accessible."
> 
>> [1] https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html
> 
>> This is perfectly fine in the context of KFENCE, as stated above, the
>> page protection is merely used to detect out-of-bounds and
>> use-after-free bugs of sampled slab allocations. KFENCE does not
>> mitigate nor prevent such bugs, because it samples allocations, i.e.
>> most allocations are still serviced by sl[au]b.
> 
> It is not fine when running paravirtualized under Xen, though.  Xen
> strictly validates that present PTEs point into a guest’s own memory,
> but (in the absence of L1TF) allows not-present PTEs to have any value.
> However, L1TF means that doing so would allow a PV guest to leak memory
> from Xen or other guests!  Therefore, Xen requires that not-present PTEs
> be L1TF-safe, ensuring that PV guests cannot use L1TF to obtain memory
> from other guests or the hypervisor.
> 
> If a guest creates an L1TF-vulnerable PTE, Xen’s behavior depends on
> whether it has been compiled with shadow paging support.  If it has, Xen
> will transition the guest to shadow paging mode.  This works, but comes
> at a significant performance hit, so you don’t want that.  If shadow
> paging has been disabled at compile time, as is the case in Qubes, Xen
> simply crashes the guest.
> 
> dom0 is exempted from these checks by default, because the dom0 kernel
> is considered trusted.  However, this can be changed by a Xen
> command-line option, so it is not to be relied on.
> 
>> How can we teach whatever is complaining about L1TF on that KFENCE PTE
>> modification that KFENCE does not use page protection to stop anyone
>> from accessing that memory?
> 
> With current Xen, you can’t.  Any not-present PTE must be L1TF-safe on
> L1TF-vulnerable hardware, and I am not aware of any way to ask Xen if it
> considers the hardware vulnerable to L1TF.  Therefore, KFENCE would need
> to either not generate L1TF-vulnerable not-present PTEs, or
> automatically disable itself when running in Xen PV mode.
> 
> In theory, it ought to be safe for Xen to instead treat not-present
> L1TF-vulnerable PTEs as if they were present, and apply the same
> validation that it does for present PTEs.  However, the PV memory
> management code has been involved in several fatal, reliably exploitable
> PV guest escape vulnerabilities, and I would rather not make it any more
> complex than it already is.

Treating non-present PTEs like present ones has a major drawback: it
requires keeping track of all page frames that are potentially referenced,
which induces a major performance hit in the "regular" case. It would also
make memory ballooning a lot more complicated.

> A much better solution would be for KFENCE to use PTE inversion just
> like the rest of the kernel does.  This solves the problem
> unconditionally, and avoids needing Xen PV specific code.  I have a
> patch that disables KFENCE on Xen PV, but I would much rather see KFENCE
> fixed, which is why I have not submitted the patch.

I can supply a kernel patch for doing the PFN inversion in the PTE.
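
To illustrate the idea, here is a standalone userspace model. It is not the
patch itself and it is untested against a real kernel. All model_* names are
made up for illustration. The layout (Present in bit 0, PFN field in bits
12..51) and the use of bit 51 as "the highest PFN bit" are simplifying
assumptions; the real limit depends on the CPU's physical address width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified x86-64 4K PTE layout. Illustrative names, not kernel macros. */
#define MODEL_PAGE_PRESENT ((uint64_t)1 << 0)
#define MODEL_PFN_SHIFT    12
/* PFN field: bits 12..51 (assumed width, see the note above). */
#define MODEL_PFN_MASK \
	((((uint64_t)1 << 52) - 1) & ~(((uint64_t)1 << MODEL_PFN_SHIFT) - 1))
#define MODEL_PFN_TOP_BIT  ((uint64_t)1 << 51)

/* The pattern under discussion: clear Present, keep the PFN unmodified. */
static uint64_t model_protect_plain(uint64_t pte)
{
	return pte & ~MODEL_PAGE_PRESENT;
}

/* Inverting variant: clear Present and flip every bit of the PFN field. */
static uint64_t model_protect_inverted(uint64_t pte)
{
	assert(pte & MODEL_PAGE_PRESENT);	/* only protect present PTEs */
	return (pte & ~MODEL_PAGE_PRESENT) ^ MODEL_PFN_MASK;
}

/* Undo the inversion and set Present again. */
static uint64_t model_unprotect_inverted(uint64_t pte)
{
	assert(!(pte & MODEL_PAGE_PRESENT));	/* only unprotect non-present PTEs */
	return (pte ^ MODEL_PFN_MASK) | MODEL_PAGE_PRESENT;
}

/*
 * Crude stand-in for an L1TF safety check: a non-present PTE counts as
 * safe here only if the top bit of the PFN field is set. Present PTEs
 * are validated by other means and are not judged by this model.
 */
static bool model_l1tf_safe(uint64_t pte)
{
	if (pte & MODEL_PAGE_PRESENT)
		return true;
	return (pte & MODEL_PFN_TOP_BIT) != 0;
}
```

With a present PTE for some low PFN, model_protect_plain() yields an entry
that model_l1tf_safe() rejects, while model_protect_inverted() yields one it
accepts, and model_unprotect_inverted() restores the original value.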


Juergen


