Re: [GIT PULL] x86/shstk for 6.4

From: Dave Hansen <dave.hansen@intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"keescook@chromium.org" <keescook@chromium.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Subject: Re: [GIT PULL] x86/shstk for 6.4
Date: Mon, 8 May 2023 15:57:11 -0700
Message-ID: <4171c4b0-e24b-a7e2-9928-030cc14f1d8d@intel.com>
In-Reply-To: <CAHk-=wiB0wy6oXOsPtYU4DSbqJAY8z5iNBKdjdOp2LP23khUoA@mail.gmail.com>

On 5/6/23 13:09, Linus Torvalds wrote:
> Now, my reaction here is
> 
>  - the whole shadow stack notion of "dirty but not writable is a magic
> marker" is *DISGUSTING*. It's wrong.
> 
>    Whatever Intel designer that came up with that - instead of just
> picking another bit for the *HARDWARE* to check - should be ashamed.
> 
>    Now we have to pick a software bit instead, and play games for
> this. BAD BAD BAD.
> 
>    I'm assuming this is something where Microsoft went "we already
> don't have that, and we want all the sw bits for sw, so do this". But
> from a design standpoint it's just nasty.

Heh, I won't name names.  But, yeah, it was something like that.

>  - But if we have to play those games, just *play* them. Do it all
> unconditionally, and make the x86-64 rules be that "dirty but not
> writable" is something we should never have.

There's a wrinkle to enforcing that universally.  From the SDM's
"ACCESSED AND DIRTY FLAGS" section:

	If software on one logical processor writes to a page while
	software on another logical processor concurrently clears the
	R/W flag in the paging-structure entry that maps the page,
	execution on some processors may result in the entry’s dirty
	flag being set.

This behavior is gone on shadow stack CPUs, but it does exist on older
ones.  We could theoretically stop being exposed to it by transitioning
all PTE operations that today do:

	1. RW => RO (usually more than one)
	2. TLB flush

to instead take a trip through Present=0 first:

	1. RW => Present=0
	2. TLB flush
	3. Present=0 => RO

This is similar to what we already do for Dirty=1->0 transitions.
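To make the difference between the two sequences concrete, here is a
toy user-space model.  It is only a sketch: the function names are made
up for illustration, and remote_stale_write() encodes my assumption
about the older-CPU behavior quoted above (a CPU holding a stale
writable TLB entry sets Dirty in a Present PTE without re-checking R/W).
Only the Present, R/W and Dirty bit positions are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bit positions. */
#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_RW      (UINT64_C(1) << 1)
#define PTE_DIRTY   (UINT64_C(1) << 6)

typedef uint64_t pte_t;	/* toy stand-in, not the kernel type */

/*
 * ASSUMPTION: models an older CPU that still has a writable TLB entry.
 * It sets Dirty if the in-memory PTE is Present, ignoring R/W.
 */
static inline void remote_stale_write(pte_t *pte)
{
	if (*pte & PTE_PRESENT)
		*pte |= PTE_DIRTY;
}

/* Today's sequence: RW => RO, then TLB flush. */
static inline pte_t wrprotect_direct(pte_t pte, bool racing_write)
{
	pte &= ~PTE_RW;			/* 1. RW => RO */
	if (racing_write)
		remote_stale_write(&pte);	/* window before the flush */
	/* 2. TLB flush would happen here */
	return pte;
}

/* Alternative sequence: take a trip through Present=0 first. */
static inline pte_t wrprotect_via_nonpresent(pte_t pte, bool racing_write)
{
	pte_t old = pte;

	pte = 0;			/* 1. RW => Present=0 */
	if (racing_write)
		remote_stale_write(&pte);	/* cannot set Dirty now */
	/* 2. TLB flush would happen here */
	return old & ~PTE_RW;		/* 3. Present=0 => RO */
}

/* True for the combination we want to avoid: Write=0,Dirty=1. */
static inline bool write0_dirty1(pte_t pte)
{
	return (pte & (PTE_RW | PTE_DIRTY)) == PTE_DIRTY;
}
```

Starting from a clean writable PTE (PTE_PRESENT | PTE_RW) with a racing
write, the direct sequence ends up Write=0,Dirty=1 in this model, while
the Present=0 sequence does not.  The model ignores the fault that the
racing writer would take in the second case.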

We could probably tolerate the cost for some of the users like ksm.  But
I can't think of a way to do it without making fork() suffer.  fork() of
course modifies the PTE (RW->RO) and flushes the TLB now.  But there
would need to be a Present=0 PTE in there somewhere before the TLB flush.

That fundamentally means there needs to be a second pass over the PTEs
and some fault handling for folks that do read-only accesses through
those PTEs during the Present=0 window.

That said, there are some places like:

	pte_mksaveddirty()
and
	pte_clear_saveddirty()

that are doing _extra_ things on shadow stack systems.  That stuff could
be made the common case without functionally breaking any old systems.

So, the rule would be something like:

	The *kernel* will never itself create Write=0,Dirty=1 PTEs

That won't prevent the hardware from still being able to do it behind
our backs on older CPUs.  But it does avoid a few of the special cases.
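As a rough illustration of what that rule means for the helpers, here is
a user-space sketch.  It is not the kernel implementation: the model_*
names are hypothetical, and the position of the software SavedDirty bit
is picked only for the example.  The idea is that any time the kernel
clears Write on a Dirty PTE, Dirty gets parked in a software bit, and it
moves back when the PTE becomes writable again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_RW          (UINT64_C(1) << 1)	/* architectural */
#define PTE_DIRTY       (UINT64_C(1) << 6)	/* architectural */
#define PTE_SAVED_DIRTY (UINT64_C(1) << 58)	/* software bit, position illustrative */

typedef uint64_t pte_t;	/* toy stand-in, not the kernel type */

/* If the PTE is read-only and Dirty, park Dirty in the software bit. */
static inline pte_t model_mksaveddirty(pte_t pte)
{
	if (!(pte & PTE_RW) && (pte & PTE_DIRTY)) {
		pte &= ~PTE_DIRTY;
		pte |= PTE_SAVED_DIRTY;
	}
	return pte;
}

/* Inverse: move a parked SavedDirty back to the hardware Dirty bit. */
static inline pte_t model_clear_saveddirty(pte_t pte)
{
	if (pte & PTE_SAVED_DIRTY) {
		pte &= ~PTE_SAVED_DIRTY;
		pte |= PTE_DIRTY;
	}
	return pte;
}

/* Write-protect without ever producing Write=0,Dirty=1. */
static inline pte_t model_wrprotect(pte_t pte)
{
	return model_mksaveddirty(pte & ~PTE_RW);
}

/* Make writable again and restore the hardware Dirty bit. */
static inline pte_t model_mkwrite(pte_t pte)
{
	return model_clear_saveddirty(pte | PTE_RW);
}

/* "Dirty" as software sees it: either bit counts. */
static inline bool model_dirty(pte_t pte)
{
	return (pte & (PTE_DIRTY | PTE_SAVED_DIRTY)) != 0;
}

/* True for the combination the kernel would never create itself. */
static inline bool model_write0_dirty1(pte_t pte)
{
	return (pte & (PTE_RW | PTE_DIRTY)) == PTE_DIRTY;
}
```

With these, model_wrprotect() on a writable dirty PTE still reports
dirty via model_dirty() but never has Write=0,Dirty=1 set, and
model_mkwrite() round-trips it back to the original value.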
