public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Andy Lutomirski <luto@kernel.org>, Brendan Jackman <jackmanb@google.com>
Cc: Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lai Jiangshan <laijs@linux.alibaba.com>,
	yosryahmed@google.com, reijiw@google.com, oweisse@google.com
Subject: Re: [PATCH RESEND] x86/entry: Don't write to CR3 when restoring to kernel CR3
Date: Tue, 19 Sep 2023 11:07:17 +0200	[thread overview]
Message-ID: <87sf7awmyi.ffs@tglx> (raw)
In-Reply-To: <b2967925-0a57-4a19-8657-3d9f4da83a20@app.fastmail.com>

On Mon, Sep 18 2023 at 20:28, Andy Lutomirski wrote:
> On Thu, Aug 17, 2023, at 5:15 AM, Brendan Jackman wrote:
>> From: Lai Jiangshan <laijs@linux.alibaba.com>
>>
>> Skip resuming KERNEL pages since it is already KERNEL CR3
>>
>> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
>> ---
>>
>> While staring at paranoid_exit I was confused about why we had this CR3
>> write, avoiding it seems like a free optimisation. The original commit
>> 21e94459110252 ("x86/mm: Optimize RESTORE_CR3") says "Most NMI/paranoid
>> exceptions will not in fact change pagetables" but I didn't't understand
>> what the "most" was referring to. I then discovered this patch on the
>> mailing list, Andy said[1] that it looks correct so maybe now is the
>> time to merge it?
>
> I did?
>
> Looking at the link, I think I was saying that the opposite patch
> (*always* flush) looked okay.

That always flush part was solely for the user CR3 restore path.

>> @@ -236,14 +236,13 @@ For 32-bit we have the following conventions - 
>> kernel is built with
>>  .macro RESTORE_CR3 scratch_reg:req save_reg:req
>>  	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>> 
>> -	ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
>> -
>>  	/*
>> -	 * KERNEL pages can always resume with NOFLUSH as we do
>> -	 * explicit flushes.
>> +	 * Skip resuming KERNEL pages since it is already KERNEL CR3.
>>  	 */
>>  	bt	$PTI_USER_PGTABLE_BIT, \save_reg
>> -	jnc	.Lnoflush_\@
>> +	jnc	.Lend_\@
>
> I don't get it.  How do you know that the actual loaded CR3 is correct?
>
> I'm willing to believe that there is some constraint in the way the
> kernel works such that every paranoid entry will, as part of its
> execution, switch CR3 to kernel *and leave it like that* *and that
> this will be the _same_ kernel CR3 that was saved*.  But I'm not
> really convinced that's true.  (Can we schedule in a paranoid entry?
> Probably not.  What about the weird NMI paths?  What if we do
> something that switches to init mm?  Hmm -- doing that in a paranoid
> context is probably not a brilliant idea.)

You have to differentiate between entry from kernel and entry from user.

Entry from user switches to the task stack, while entry from kernel
always runs on IST.

Entry from user cannot have kernel CR3 obviously, while entry from
kernel can have either kernel CR3 or user CR3. Entry from user does not
use the paranoid entry/exit paths at all, so that's a non-issue.

IST prevents that the exception can schedule, which in turn guarantees
that CR3 stays the same until it returns. Unless some completely stupid
code path would trie to switch to a different mm from an exception which
can hit into the middle of an mm switch. Then the failed restore is
probably the least of our problems.

> Maybe it is true, and maybe a convincing argument could be made.  But
> that seems like a lot of thinking and fragility for an optimization
> that seems pretty minor.

I don't think its pretty minor. CR3 writes even with the noflush bit set
are not necessarily cheap.

While most exceptions which go through the paranoid path are not
hotpath, the one which matters is #NMI due to perf. So I think it's
worth to spare the redundant CR3 switch in that case.

Thanks,

        tglx


      reply	other threads:[~2023-09-19  9:07 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-17 12:15 [PATCH RESEND] x86/entry: Don't write to CR3 when restoring to kernel CR3 Brendan Jackman
2023-08-23 18:42 ` Brendan Jackman
2023-09-18  7:35 ` Ingo Molnar
2023-09-18 16:56 ` Thomas Gleixner
2023-09-19  3:28 ` Andy Lutomirski
2023-09-19  9:07   ` Thomas Gleixner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sf7awmyi.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jackmanb@google.com \
    --cc=laijs@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=oweisse@google.com \
    --cc=reijiw@google.com \
    --cc=x86@kernel.org \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox