From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5FFBC64E7B for ; Mon, 2 Nov 2020 20:53:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5D770206B2 for ; Mon, 2 Nov 2020 20:53:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5D770206B2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 02C176B007D; Mon, 2 Nov 2020 15:53:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id F1EA66B0081; Mon, 2 Nov 2020 15:53:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D24D56B0080; Mon, 2 Nov 2020 15:53:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0243.hostedemail.com [216.40.44.243]) by kanga.kvack.org (Postfix) with ESMTP id A435B6B007E for ; Mon, 2 Nov 2020 15:53:45 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 4EA82180AD802 for ; Mon, 2 Nov 2020 20:53:45 +0000 (UTC) X-FDA: 77440679610.29.body37_1b11927272b3 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id 23D39180868DC for ; Mon, 2 Nov 2020 20:53:45 +0000 (UTC) X-HE-Tag: body37_1b11927272b3 X-Filterd-Recvd-Size: 10506 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Mon, 2 Nov 2020 20:53:43 +0000 (UTC) IronPort-SDR: YwaIjq63BHCXeMEmEkf4cP/flC9ans//5+dJZCNeh70QnbzZNQuXsoDplqCtMCyh/5DF1gOpGp LKXUcOeClmyQ== X-IronPort-AV: E=McAfee;i="6000,8403,9793"; a="148231660" X-IronPort-AV: E=Sophos;i="5.77,445,1596524400"; d="scan'208";a="148231660" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 12:53:42 -0800 IronPort-SDR: ta3U9kd0Jsn/cIY2nPtdojoxRCkd+dStFMatPLl9Nam6TvbTjoC4kRArmbI3g4iAWYMaKeVB6x NR6Q8kmAW2bQ== X-IronPort-AV: E=Sophos;i="5.77,445,1596524400"; d="scan'208";a="363369735" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 12:53:41 -0800 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra , Dave Hansen Cc: Ira Weiny , x86@kernel.org, Dan Williams , Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [PATCH V2 08/10] x86/entry: Preserve PKRS MSR across exceptions Date: Mon, 2 Nov 2020 12:53:18 -0800 Message-Id: <20201102205320.1458656-9-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20201102205320.1458656-1-ira.weiny@intel.com> References: <20201102205320.1458656-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ira Weiny The PKRS MSR is not managed by XSAVE. It is preserved through a context switch but this support leaves exception handling code open to memory accesses during exceptions. 2 possible places for preserving this state were considered, irqentry_state_t or pt_regs.[1] pt_regs was much more complicated and was potentially fraught with unintended consequences.[2] irqentry_state_t was already an object being used in the exception handling and is straightforward. It is also easy for any number of nested states to be tracked and eventually can be enhanced to store the reference counting required to support PKS through kmap reentry Preserve the current task's PKRS values in irqentry_state_t on exception entry and restoring them on exception exit. Each nested exception is further saved allowing for any number of levels of exception handling. Peter and Thomas both suggested parts of the patch, IDT and NMI respectiv= ely. [1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN= 1F577M36g+w@mail.gmail.com/ [2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#= t Cc: Dave Hansen Cc: Andy Lutomirski Suggested-by: Peter Zijlstra Suggested-by: Thomas Gleixner Signed-off-by: Ira Weiny --- Changes from V1 remove redundant irq_state->pkrs This value is only needed for the global tracking. So it should be included in that patch and not in this one. Changes from RFC V3 Standardize on 'irq_state' variable name Per Dave Hansen irq_save_pkrs() -> irq_save_set_pkrs() Rebased based on clean up patch by Thomas Gleixner This includes moving irq_[save_set|restore]_pkrs() to the core as well. --- arch/x86/entry/common.c | 38 +++++++++++++++++++++++++++++ arch/x86/include/asm/pkeys_common.h | 5 ++-- arch/x86/mm/pkeys.c | 2 +- include/linux/entry-common.h | 13 ++++++++++ kernel/entry/common.c | 14 +++++++++-- 5 files changed, 67 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 87dea56a15d2..1b6a419a6fac 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -19,6 +19,7 @@ #include #include #include +#include =20 #ifdef CONFIG_XEN_PV #include @@ -209,6 +210,41 @@ SYSCALL_DEFINE0(ni_syscall) return -ENOSYS; } =20 +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +/* + * PKRS is a per-logical-processor MSR which overlays additional protect= ion for + * pages which have been mapped with a protection key. + * + * The register is not maintained with XSAVE so we have to maintain the = MSR + * value in software during context switch and exception handling. + * + * Context switches save the MSR in the task struct thus taking that val= ue to + * other processors if necessary. + * + * To protect against exceptions having access to this memory we save th= e + * current running value and set the PKRS value for the duration of the + * exception. Thus preventing exception handlers from having the elevat= ed + * access of the interrupted task. + */ +noinstr void irq_save_set_pkrs(irqentry_state_t *irq_state, u32 val) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + irq_state->thread_pkrs =3D current->thread.saved_pkrs; + write_pkrs(INIT_PKRS_VALUE); +} + +noinstr void irq_restore_pkrs(irqentry_state_t *irq_state) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + write_pkrs(irq_state->thread_pkrs); + current->thread.saved_pkrs =3D irq_state->thread_pkrs; +} +#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #ifdef CONFIG_XEN_PV #ifndef CONFIG_PREEMPTION /* @@ -272,6 +308,8 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct= pt_regs *regs) =20 inhcall =3D get_and_clear_inhcall(); if (inhcall && !WARN_ON_ONCE(irq_state.exit_rcu)) { + /* Normally called by irqentry_exit, we must restore pkrs here */ + irq_restore_pkrs(&irq_state); instrumentation_begin(); irqentry_exit_cond_resched(); instrumentation_end(); diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/p= keys_common.h index cd492c23b28c..f921c58793f9 100644 --- a/arch/x86/include/asm/pkeys_common.h +++ b/arch/x86/include/asm/pkeys_common.h @@ -31,9 +31,10 @@ #define PKS_NUM_KEYS 16 =20 #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS -void write_pkrs(u32 new_pkrs); +DECLARE_PER_CPU(u32, pkrs_cache); +noinstr void write_pkrs(u32 new_pkrs); #else -static inline void write_pkrs(u32 new_pkrs) { } +static __always_inline void write_pkrs(u32 new_pkrs) { } #endif =20 #endif /*_ASM_X86_PKEYS_INTERNAL_H */ diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 0dc77409957a..39ec034cf379 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -252,7 +252,7 @@ DEFINE_PER_CPU(u32, pkrs_cache); * until all prior executions of WRPKRU have completed execution * and updated the PKRU register. */ -void write_pkrs(u32 new_pkrs) +noinstr void write_pkrs(u32 new_pkrs) { u32 *pkrs; =20 diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 0e10d0b85817..235110f34a06 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -343,6 +343,8 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs)= ; #ifndef irqentry_state /** * struct irqentry_state - Opaque object for exception state storage + * @thread_pkrs: Thread Supervisor Pkey value to be restored when except= ion is + * complete. * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whethe= r the * exit path has to invoke rcu_irq_exit(). * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures tha= t @@ -357,6 +359,9 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs)= ; * the maintenance of the irqentry_*() functions. */ typedef struct irqentry_state { +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS + u32 thread_pkrs; +#endif union { bool exit_rcu; bool lockdep; @@ -364,6 +369,14 @@ typedef struct irqentry_state { } irqentry_state_t; #endif =20 +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +noinstr void irq_save_set_pkrs(irqentry_state_t *irq_state, u32 val); +noinstr void irq_restore_pkrs(irqentry_state_t *irq_state); +#else +static __always_inline void irq_save_set_pkrs(irqentry_state_t *irq_stat= e, u32 val) { } +static __always_inline void irq_restore_pkrs(irqentry_state_t *irq_state= ) { } +#endif + /** * irqentry_enter - Handle state tracking on ordinary interrupt entries * @regs: Pointer to pt_regs of interrupted context diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 359688400c6b..2340011a5a06 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -326,7 +326,7 @@ noinstr void irqentry_enter(struct pt_regs *regs, irq= entry_state_t *irq_state) instrumentation_end(); =20 irq_state->exit_rcu =3D true; - return; + goto done; } =20 /* @@ -340,6 +340,9 @@ noinstr void irqentry_enter(struct pt_regs *regs, irq= entry_state_t *irq_state) /* Use the combo lockdep/tracing function */ trace_hardirqs_off(); instrumentation_end(); + +done: + irq_save_set_pkrs(irq_state, INIT_PKRS_VALUE); } =20 void irqentry_exit_cond_resched(void) @@ -361,7 +364,12 @@ noinstr void irqentry_exit(struct pt_regs *regs, irq= entry_state_t *irq_state) /* Check whether this returns to user mode */ if (user_mode(regs)) { irqentry_exit_to_user_mode(regs); - } else if (!regs_irqs_disabled(regs)) { + return; + } + + irq_restore_pkrs(irq_state); + + if (!regs_irqs_disabled(regs)) { /* * If RCU was not watching on entry this needs to be done * carefully and needs the same ordering of lockdep/tracing @@ -407,10 +415,12 @@ void noinstr irqentry_nmi_enter(struct pt_regs *reg= s, irqentry_state_t *irq_stat trace_hardirqs_off_finish(); ftrace_nmi_enter(); instrumentation_end(); + irq_save_set_pkrs(irq_state, INIT_PKRS_VALUE); } =20 void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t *i= rq_state) { + irq_restore_pkrs(irq_state); instrumentation_begin(); ftrace_nmi_exit(); if (irq_state->lockdep) { --=20 2.28.0.rc0.12.gb6a658bd00c9