From: Dave Hansen <dave.hansen@intel.com>
To: Brian Geffon <bgeffon@google.com>, Thomas Gleixner <tglx@linutronix.de>
Cc: Willis Kung <williskung@google.com>,
Guenter Roeck <groeck@google.com>, Borislav Petkov <bp@suse.de>,
Andy Lutomirski <luto@kernel.org>,
stable@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
Date: Tue, 15 Feb 2022 09:07:45 -0800
Message-ID: <56fc0ced-d8d2-146f-6ca8-b95bd7e0b4f5@intel.com>
In-Reply-To: <20220215153644.3654582-1-bgeffon@google.com>
On 2/15/22 07:36, Brian Geffon wrote:
> There are two issues with PKRU handling prior to 5.13.
Are you sure both of these issues were introduced by 0cecca9d03c? I'm
surprised that the get_xsave_addr() issue is not older.
Should this be two patches?
> The first is that when eagerly switching PKRU we check that current
Don't forget to write in imperative mood. No "we's", please.
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
This goes for changelogs and comments too.
> is not a kernel thread as kernel threads will never use PKRU. It's
> possible that this_cpu_read_stable() on current_task (ie.
> get_current()) is returning an old cached value. By forcing the read
> with this_cpu_read() the correct task is used. Without this it's
> possible when switching from a kernel thread to a userspace thread
> that we'll still observe the PF_KTHREAD flag and never restore the
> PKRU. And as a result this issue only occurs when switching from a
> kernel thread to a userspace thread, switching from a non kernel
> thread works perfectly fine because all we consider in that situation
> is the flags from some other non kernel task and the next fpu is
> passed in to switch_fpu_finish().
It makes *sense* that there would be a place in the context switch code
where 'current' is wonky, but I never realized this. This seems really
fragile, but *also* trivially detectable.
Is the PKRU code really the only code to use 'current' in a buggy way
like this?
> The second issue is when using write_pkru() we only write to the
> xstate when the feature bit is set because get_xsave_addr() returns
> NULL when the feature bit is not set. This is problematic as the CPU
> is free to clear the feature bit when it observes the xstate in the
> init state, this behavior seems to be documented a few places throughout
> the kernel. If the bit was cleared then in write_pkru() we would happily
> write to PKRU without ever updating the xstate, and the FPU restore on
> return to userspace would load the old value agian.
^ again
It's probably worth noting that the AMD init tracker is a lot more
aggressive than Intel's. On Intel, I think XRSTOR is the only way to
get back to the init state. You're obviously hitting this on AMD.
It's also *very* unlikely that PKRU gets back to a value of 0. I think
we added a selftest for this case in later kernels.
That helps explain why this bug hung around for so long.
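For readers following along, the failure mode can be modelled in plain
userspace C. This is only a sketch under stated assumptions, not kernel
code: every model_* name is hypothetical, the struct stands in for the
relevant pieces of the xsave buffer, and model_xrstor() imitates only
the XRSTOR rule that a component whose xfeatures bit is clear is loaded
with its init value (0 for PKRU).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PKRU is xfeature number 9; the mask mirrors XFEATURE_MASK_PKRU. */
#define MODEL_FEATURE_PKRU (1ULL << 9)

/* Hypothetical stand-in for the parts of the xsave buffer that matter. */
struct model_xsave {
	uint64_t xfeatures;	/* models xsave.header.xfeatures */
	uint32_t pkru;		/* models the PKRU component */
};

/* Hypothetical stand-in for the PKRU register itself. */
static uint32_t model_cpu_pkru;

/* Like get_xsave_addr(): NULL when the feature bit is clear. */
static uint32_t *model_get_addr(struct model_xsave *xs)
{
	if (!(xs->xfeatures & MODEL_FEATURE_PKRU))
		return NULL;
	return &xs->pkru;
}

/* Models XRSTOR: a component with a clear bit is set to its init value. */
static void model_xrstor(const struct model_xsave *xs)
{
	model_cpu_pkru = (xs->xfeatures & MODEL_FEATURE_PKRU) ? xs->pkru : 0;
}

/* Pre-patch behaviour: the buffer is skipped when the bit is clear. */
static void model_write_pkru_old(struct model_xsave *xs, uint32_t pkru)
{
	uint32_t *pk = model_get_addr(xs);

	if (pk)
		*pk = pkru;
	model_cpu_pkru = pkru;
}

/* Patched behaviour: set the bit first so the buffer is always updated. */
static void model_write_pkru_new(struct model_xsave *xs, uint32_t pkru)
{
	xs->xfeatures |= MODEL_FEATURE_PKRU;
	xs->pkru = pkru;
	model_cpu_pkru = pkru;
}
```

Starting from a zeroed buffer (the init state), model_write_pkru_old()
followed by model_xrstor() leaves the modelled register at 0, while
model_write_pkru_new() followed by model_xrstor() keeps the value that
was written.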
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 03b3de491b5e..540bda5bdd28 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
> * PKRU state is switched eagerly because it needs to be valid before we
> * return to userland e.g. for a copy_to_user() operation.
> */
> - if (!(current->flags & PF_KTHREAD)) {
> + if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
This really deserves a specific comment.
> /*
> * If the PKRU bit in xsave.header.xfeatures is not set,
> * then the PKRU component was in init state, which means
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 9e71bf86d8d0..aa381b530de0 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
> if (!boot_cpu_has(X86_FEATURE_OSPKE))
> return;
>
> - pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> -
> /*
> * The PKRU value in xstate needs to be in sync with the value that is
> * written to the CPU. The FPU restore on return to userland would
> * otherwise load the previous value again.
> */
> fpregs_lock();
> - if (pk)
> - pk->pkru = pkru;
> + /*
> + * The CPU is free to clear the feature bit when the xstate is in the
> + * init state. For this reason, we need to make sure the feature bit is
> + * reset when we're explicitly writing to pkru. If we did not then we
> + * would write to pkru and it would not be saved on a context switch.
> + */
> + current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
I don't think we need to describe how the init optimization works again.
I'm also not sure it's worth mentioning context switches here. It's a
wider problem than that. Maybe:
	/*
	 * All fpregs will be XRSTOR'd from this buffer before returning
	 * to userspace. Ensure that XRSTOR does not init PKRU and that
	 * get_xsave_addr() will work.
	 */
> + pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> + BUG_ON(!pk);
A BUG_ON() a line before a NULL pointer dereference doesn't tend to do
much good.
> + pk->pkru = pkru;
> __write_pkru(pkru);
> fpregs_unlock();
> }