linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-doc@vger.kernel.org, Linux-MM <linux-mm@kvack.org>,
	linux-arch <linux-arch@vger.kernel.org>, X86 ML <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. J. Lu" <hjl.tools@gmail.com>,
	"Shanbhogue, Vedvyas" <vedvyas.shanbhogue@intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Jonathan Corbet <corbet@lwn.net>, Oleg Nesterov <oleg@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>,
	mike.kravetz@oracle.com
Subject: Re: [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions
Date: Thu, 7 Jun 2018 17:59:37 -0700	[thread overview]
Message-ID: <CALCETrU7uNpSp8DWKnpH28wHE3JOeXkmp-H97n2nWHJEu4pDEA@mail.gmail.com> (raw)
In-Reply-To: <5c39caf1-2198-3c2b-b590-8c38a525747f@linux.intel.com>

On Thu, Jun 7, 2018 at 1:30 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
> On 06/07/2018 09:24 AM, Andy Lutomirski wrote:
>
> >> +static inline void ptep_set_wrprotect_flush(struct vm_area_struct *vma,
> >> +                                           unsigned long addr, pte_t *ptep)
> >> +{
> >> +       bool rw;
> >> +
> >> +       rw = test_and_clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> >> +       if (IS_ENABLED(CONFIG_X86_INTEL_SHADOW_STACK_USER)) {
> >> +               struct mm_struct *mm = vma->vm_mm;
> >> +               pte_t pte;
> >> +
> >> +               if (rw && (atomic_read(&mm->mm_users) > 1))
> >> +                       pte = ptep_clear_flush(vma, addr, ptep);
> > Why are you clearing the pte?
>
> I found my notes on the subject. :)
>
> Here's the sequence that causes the problem.  This could happen any time
> we try to take a PTE from read-write to read-only.  P==Present, W=Write,
> D=Dirty:
>
> CPU0 does a write, sees PTE with P=1,W=1,D=0
> CPU0 decides to set D=1
> CPU1 comes in and sets W=0
> CPU0 does locked operation to set D=1
>         CPU0 sees P=1,W=0,D=0
>         CPU0 sets back P=1,W=0,D=1
> CPU0 loads P=1,W=0,D=1 into the TLB
> CPU0 attempts to continue the write, but sees W=0 in the TLB and a #PF
> is generated because of the write fault.
>
> The problem with this is that we end up with a shadowstack-PTE
> (Write=0,Dirty=1) where we didn't want one.  This, unfortunately,
> imposes extra TLB flushing overhead on the R/W->R/O transitions that
> does not exist before shadowstack enabling.
>

So what exactly do the architects want the OS to do?  AFAICS the only
valid ways to clear the dirty bit are:

--- Choice 1 ---
a) Set P=0.
b) Flush using an IPI
c) Read D (so we know if the page was actually dirty)
d) Set P=1,W=0,D=0

and we need to handle spurious faults that happen between steps (a)
and (c).  This isn't so easy because the straightforward "is the fault
spurious" check is going to think it's *not* spurious.

--- Choice 2 ---
a) Set W=0
b) flush
c) Test and clear D

and we need to handle the spurious fault between b and c.  At least
this particular spurious fault is easier to handle since we can check
the error code.

But surely the right solution is to get the architecture team to see
if they can fix the dirty-bit-setting algorithm or, even better, to
look and see if the dirty-bit-setting algorithm is *already* better
and just document it.  If the cpu does a locked set-bit on D in your
example, the CPU is just being silly.  The CPU should make the whole
operation fully atomic: when trying to write to a page that's D=0 in
the TLB, it should re-walk the page tables and, atomically, load the
PTE and, if it's W=1,D=0, set D=1.  I'd honestly be a bit surprised if
modern CPUs don't already do this.

(Hmm.  If the CPUs were that smart, then we wouldn't need a flush at
all in some cases.  If we lock cmpxchg to change W=1,D=0 to W=0,D=0,
then we know that no other CPU can subsequently write the page without
re-walking, and we don't need to flush.)

Can you ask the architecture folks to clarify the situation?  And, if
your notes are indeed correct, don't we need code to handle spurious
faults?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2018-06-08  0:59 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-07 14:36 [PATCH 0/9] Control Flow Enforcement - Part (2) Yu-cheng Yu
2018-06-07 14:36 ` [PATCH 1/9] x86/cet: Control protection exception handler Yu-cheng Yu
2018-06-07 15:46   ` Andy Lutomirski
2018-06-07 16:23     ` Yu-cheng Yu
2018-06-08  4:17   ` kbuild test robot
2018-06-08  4:18   ` kbuild test robot
2018-06-07 14:36 ` [PATCH 2/9] x86/cet: Add Kconfig option for user-mode shadow stack Yu-cheng Yu
2018-06-07 15:47   ` Andy Lutomirski
2018-06-07 15:58     ` Yu-cheng Yu
2018-06-07 16:28       ` Andy Lutomirski
2018-06-07 14:36 ` [PATCH 3/9] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
2018-06-07 14:37 ` [PATCH 4/9] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Yu-cheng Yu
2018-06-08  3:53   ` kbuild test robot
2018-06-07 14:37 ` [PATCH 5/9] x86/mm: Introduce _PAGE_DIRTY_SW Yu-cheng Yu
2018-06-08  5:15   ` kbuild test robot
2018-06-07 14:37 ` [PATCH 6/9] x86/mm: Introduce ptep_set_wrprotect_flush and related functions Yu-cheng Yu
2018-06-07 16:24   ` Andy Lutomirski
2018-06-07 18:21     ` Dave Hansen
2018-06-07 18:24       ` Andy Lutomirski
2018-06-07 20:29     ` Dave Hansen
2018-06-07 20:36       ` Yu-cheng Yu
2018-06-08  0:59       ` Andy Lutomirski [this message]
2018-06-08  1:20         ` Dave Hansen
2018-06-08  4:43   ` kbuild test robot
2018-06-08 14:13   ` kbuild test robot
2018-06-07 14:37 ` [PATCH 7/9] x86/mm: Shadow stack page fault error checking Yu-cheng Yu
2018-06-07 16:26   ` Andy Lutomirski
2018-06-07 16:46     ` Yu-cheng Yu
2018-06-07 16:56     ` Dave Hansen
2018-06-07 14:37 ` [PATCH 8/9] x86/cet: Handle shadow stack page fault Yu-cheng Yu
2018-06-07 14:37 ` [PATCH 9/9] x86/cet: Handle THP/HugeTLB shadow stack page copying Yu-cheng Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrU7uNpSp8DWKnpH28wHE3JOeXkmp-H97n2nWHJEu4pDEA@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=arnd@arndb.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=hjl.tools@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=vedvyas.shanbhogue@intel.com \
    --cc=x86@kernel.org \
    --cc=yu-cheng.yu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).