From: Dave Hansen <dave.hansen@intel.com>
To: Nadav Amit <namit@vmware.com>
Cc: Linux-MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Xu <peterx@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
Nick Piggin <npiggin@gmail.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault
Date: Fri, 11 Mar 2022 12:59:17 -0800 [thread overview]
Message-ID: <70e08bd5-187a-daee-2822-1d9a437a9cff@intel.com> (raw)
In-Reply-To: <AC8D21EA-CD32-4F9F-B5C1-ED8804EC76FF@vmware.com>
On 3/11/22 12:38, Nadav Amit wrote:
>> On Mar 11, 2022, at 11:41 AM, Dave Hansen <dave.hansen@intel.com> wrote:
...
>> Can any sane code trigger this?
>
> Well, regarding this question and the previous one: I do not think that
> this scenario is possible today since mprotect() holds the mmap_lock
> for write. There is no other code that I am aware of that toggles
> the NX bit on a present entry.
>
> But I will not bet my life on it. That’s the reason for the somewhat
> vague phrasing that I used.
From the userspace perspective, mmap(MAP_FIXED) can do this too. But,
sane userspace can't rely on the syscall to have done any work and the
TLB flushing is currently done before the syscall returns.
I'd put it this way:
Today, it is possible for a thread to end up in access_error()
for a PF_INSN fault and observe a VM_EXEC VMA. If you are
generous, this could be considered a spurious fault.
However, the faulting thread would have had to race with the
thread which was changing the PTE and the VMA and is currently
*in* mprotect() (or some other syscall). In other words, the
faulting thread can encounter this situation, but it never had
any assurance from the kernel that it wouldn't fault. This is
because the faulting thread never had a chance to observe the
syscall return.
There is no evidence that the existing behavior can cause any
issues with sane userspace.
>>> index d0074c6ed31a..ad0ef0a6087a 100644
>>> --- a/arch/x86/mm/fault.c
>>> +++ b/arch/x86/mm/fault.c
>>> @@ -1107,10 +1107,28 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
>>> (error_code & X86_PF_INSTR), foreign))
>>> return 1;
>>>
>>> - if (error_code & X86_PF_WRITE) {
>>> + if (error_code & (X86_PF_WRITE | X86_PF_INSTR)) {
>>> + /*
>>> + * CPUs are not expected to set the two error code bits
>>> + * together, but to ensure that hypervisors do not misbehave,
>>> + * run an additional sanity check.
>>> + */
>>> + if ((error_code & (X86_PF_WRITE|X86_PF_INSTR)) ==
>>> + (X86_PF_WRITE|X86_PF_INSTR)) {
>>> + WARN_ON_ONCE(1);
>>> + return 1;
>>> + }
>>
>> access_error() is only used on the do_user_addr_fault() side of things.
>> Can we stick this check somewhere that also works for kernel address
>> faults? This is a generic sanity check. It can also be in a separate
>> patch.
>
> I can wrap it in a different function and also call it from
> do_kern_addr_fault() or spurious_kernel_fault().
>
> Anyhow, spurious_kernel_fault() should handle spurious faults on
> executable code correctly.
This is really about checking the sanity of the "hardware"-provided
error code. Let's just do it in handle_page_fault(), maybe hidden in a
function like:
void check_error_code_sanity(unsigned long error_code)
{
WARN_ON_ONCE(...);
}
You can leave the X86_PF_PK check in place for now. It's probably going
away soon anyway.
>> Also, we should *probably* stop talking about CPUs here. If there's
>> ever something wonky with error code bits, I'd put my money on a weird
>> hypervisor before any kind of CPU issue.
>
> I thought I manage to convey exactly that in the comment. Can you provide
> a better phrasing?
Maybe:
/*
* X86_PF_INSTR for instruction _fetches_. Fetches never write.
* X86_PF_WRITE should never be set with X86_PF_INSTR.
*
* This is most likely due to a buggy hypervisor.
*/
next prev parent reply other threads:[~2022-03-11 20:59 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-11 19:07 [RESEND PATCH v3 0/5] mm/mprotect: avoid unnecessary TLB flushes Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 1/5] x86: Detection of Knights Landing A/D leak Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 4/5] mm/mprotect: do not flush on permission promotion Nadav Amit
2022-03-11 22:45 ` Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd() Nadav Amit
2022-03-11 20:41 ` Dave Hansen
2022-03-11 20:53 ` Nadav Amit
[not found] ` <20220311190749.338281-3-namit@vmware.com>
2022-03-11 19:41 ` [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault Dave Hansen
2022-03-11 20:38 ` Nadav Amit
2022-03-11 20:59 ` Dave Hansen [this message]
2022-03-11 21:16 ` Nadav Amit
2022-03-11 21:23 ` Dave Hansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=70e08bd5-187a-daee-2822-1d9a437a9cff@intel.com \
--to=dave.hansen@intel.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andrew.cooper3@citrix.com \
--cc=dave.hansen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=namit@vmware.com \
--cc=npiggin@gmail.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox