virtualization.lists.linux-foundation.org archive mirror
* [PATCH v2 00/12] x86: major paravirt cleanup
@ 2020-11-20 11:46 Juergen Gross via Virtualization
  2020-11-20 11:46 ` [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE Juergen Gross via Virtualization
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, virtualization, linux-hyperv, kvm
  Cc: Juri Lelli, Wanpeng Li, peterz, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, Daniel Lezcano, VMware, Inc., Ingo Molnar,
	Mel Gorman, Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	Juergen Gross, Sean Christopherson, Paolo Bonzini,
	Daniel Bristot de Oliveira

This is a major cleanup of the paravirt infrastructure, aiming to
eliminate all custom code patching via paravirt patching.

This is achieved by using ALTERNATIVE instead, which makes it possible
to give objtool access to the patched-in instructions.
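
As a reminder of how ALTERNATIVE works: the compiler emits a default
instruction sequence plus patch-site metadata, and apply_alternatives()
rewrites the site in place at boot when the named feature bit is set.
A minimal sketch, using the SWAPGS case from patch 3 of this series:

	/* emit "swapgs"; patched out (to NOPs) at boot on Xen PV */
	#define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV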

In order to remove most of the 32-bit special handling from pvops, the
time-related operations are switched to static_call() instead.
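
A condensed sketch of the static_call() scheme, with the names
introduced in patch 6 of this series:

	/* default implementation, bound at build time: */
	DEFINE_STATIC_CALL(pv_sched_clock, native_sched_clock);

	static inline u64 paravirt_sched_clock(void)
	{
		/* patched to a direct call, no pv_ops indirection: */
		return static_call(pv_sched_clock)();
	}

	/* a guest rebinds the call during early boot, e.g.: */
	static_call_update(pv_sched_clock, xen_sched_clock);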

At the end of this series all that paravirt patching has to do is
replace indirect calls with direct ones. In a further step this could
be switched to static_call(), too, but that would require a major
disentangling of the header files.

Note that an updated objtool is needed for this series; otherwise the
build will issue lots of warnings about alternative instructions
modifying the stack.

Changes in V2:
- added patches 5-12

Juergen Gross (12):
  x86/xen: use specific Xen pv interrupt entry for MCE
  x86/xen: use specific Xen pv interrupt entry for DF
  x86/pv: switch SWAPGS to ALTERNATIVE
  x86/xen: drop USERGS_SYSRET64 paravirt call
  x86: rework arch_local_irq_restore() to not use popf
  x86/paravirt: switch time pvops functions to use static_call()
  x86: add new features for paravirt patching
  x86/paravirt: remove no longer needed 32-bit pvops cruft
  x86/paravirt: switch iret pvops to ALTERNATIVE
  x86/paravirt: add new macros PVOP_ALT* supporting pvops in
    ALTERNATIVEs
  x86/paravirt: switch functions with custom code to ALTERNATIVE
  x86/paravirt: have only one paravirt patch function

 arch/x86/Kconfig                      |   1 +
 arch/x86/entry/entry_32.S             |   4 +-
 arch/x86/entry/entry_64.S             |  32 ++--
 arch/x86/include/asm/cpufeatures.h    |   3 +
 arch/x86/include/asm/idtentry.h       |   6 +
 arch/x86/include/asm/irqflags.h       |  51 ++----
 arch/x86/include/asm/mshyperv.h       |  11 --
 arch/x86/include/asm/paravirt.h       | 170 ++++++--------------
 arch/x86/include/asm/paravirt_time.h  |  38 +++++
 arch/x86/include/asm/paravirt_types.h | 222 ++++++++++++--------------
 arch/x86/kernel/Makefile              |   3 +-
 arch/x86/kernel/alternative.c         |  30 +++-
 arch/x86/kernel/asm-offsets.c         |   8 -
 arch/x86/kernel/asm-offsets_64.c      |   3 -
 arch/x86/kernel/cpu/vmware.c          |   5 +-
 arch/x86/kernel/head_64.S             |   2 -
 arch/x86/kernel/irqflags.S            |  11 --
 arch/x86/kernel/kvm.c                 |   3 +-
 arch/x86/kernel/kvmclock.c            |   3 +-
 arch/x86/kernel/paravirt.c            |  70 +++-----
 arch/x86/kernel/paravirt_patch.c      | 109 -------------
 arch/x86/kernel/tsc.c                 |   3 +-
 arch/x86/xen/enlighten_pv.c           |  36 +++--
 arch/x86/xen/irq.c                    |  23 ---
 arch/x86/xen/time.c                   |  12 +-
 arch/x86/xen/xen-asm.S                |  52 +-----
 arch/x86/xen/xen-ops.h                |   3 -
 drivers/clocksource/hyperv_timer.c    |   5 +-
 drivers/xen/time.c                    |   3 +-
 kernel/sched/sched.h                  |   1 +
 30 files changed, 325 insertions(+), 598 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_time.h
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

-- 
2.26.2


* [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-27 11:31   ` Borislav Petkov
  2020-12-09 21:07   ` Thomas Gleixner
  2020-11-20 11:46 ` [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call Juergen Gross via Virtualization
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, virtualization
  Cc: Juergen Gross, Stefano Stabellini, peterz, VMware, Inc.,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin,
	Thomas Gleixner, Boris Ostrovsky

SWAPGS is used only for interrupts coming from user mode or for
returning to user mode, so there is no reason to use the PARAVIRT
framework for it: it can easily be replaced by an ALTERNATIVE
depending on X86_FEATURE_XENPV.

There are several instances of the PV-aware SWAPGS macro in paths
which are never executed in a Xen PV guest. Replace those with the
plain swapgs instruction. The same applies to SWAPGS_UNSAFE_STACK.
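
With that, the macro simply becomes (quoted from the irqflags.h hunk
below):

	#ifdef CONFIG_XEN_PV
	#define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
	#else
	#define SWAPGS	swapgs
	#endif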

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S             | 10 +++++-----
 arch/x86/include/asm/irqflags.h       | 20 ++++++++------------
 arch/x86/include/asm/paravirt.h       | 20 --------------------
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/asm-offsets_64.c      |  1 -
 arch/x86/kernel/paravirt.c            |  1 -
 arch/x86/kernel/paravirt_patch.c      |  3 ---
 arch/x86/xen/enlighten_pv.c           |  3 ---
 8 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cad08703c4ad..a876204a73e0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -669,7 +669,7 @@ native_irq_return_ldt:
 	 */
 
 	pushq	%rdi				/* Stash user RDI */
-	SWAPGS					/* to kernel GS */
+	swapgs					/* to kernel GS */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi	/* to kernel CR3 */
 
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
@@ -699,7 +699,7 @@ native_irq_return_ldt:
 	orq	PER_CPU_VAR(espfix_stack), %rax
 
 	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
-	SWAPGS					/* to user GS */
+	swapgs					/* to user GS */
 	popq	%rdi				/* Restore user RDI */
 
 	movq	%rax, %rsp
@@ -943,7 +943,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
 	ret
 
 .Lparanoid_entry_swapgs:
-	SWAPGS
+	swapgs
 
 	/*
 	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
@@ -1001,7 +1001,7 @@ SYM_CODE_START_LOCAL(paranoid_exit)
 	jnz		restore_regs_and_return_to_kernel
 
 	/* We are returning to a context with user GSBASE */
-	SWAPGS_UNSAFE_STACK
+	swapgs
 	jmp		restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
@@ -1426,7 +1426,7 @@ nmi_no_fsgsbase:
 	jnz	nmi_restore
 
 nmi_swapgs:
-	SWAPGS_UNSAFE_STACK
+	swapgs
 
 nmi_restore:
 	POP_REGS
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 2dfc8d380dab..8c86edefa115 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -131,18 +131,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 #define SAVE_FLAGS(x)		pushfq; popq %rax
 #endif
 
-#define SWAPGS	swapgs
-/*
- * Currently paravirt can't handle swapgs nicely when we
- * don't have a stack we can rely on (such as a user space
- * stack).  So we either find a way around these or just fault
- * and emulate if a guest tries to call swapgs directly.
- *
- * Either way, this is a good way to document that we don't
- * have a reliable stack. x86_64 only.
- */
-#define SWAPGS_UNSAFE_STACK	swapgs
-
 #define INTERRUPT_RETURN	jmp native_iret
 #define USERGS_SYSRET64				\
 	swapgs;					\
@@ -170,6 +158,14 @@ static __always_inline int arch_irqs_disabled(void)
 
 	return arch_irqs_disabled_flags(flags);
 }
+#else
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_XEN_PV
+#define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
+#else
+#define SWAPGS	swapgs
+#endif
+#endif
 #endif /* !__ASSEMBLY__ */
 
 #endif
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d25cc6830e89..5647bcdba776 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,26 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-/*
- * If swapgs is used while the userspace stack is still current,
- * there's no way to call a pvop.  The PV replacement *must* be
- * inlined, or the swapgs instruction must be trapped and emulated.
- */
-#define SWAPGS_UNSAFE_STACK						\
-	PARA_SITE(PARA_PATCH(PV_CPU_swapgs), swapgs)
-
-/*
- * Note: swapgs is very special, and in practise is either going to be
- * implemented with a single "swapgs" instruction or something very
- * special.  Either way, we don't need to save any registers for
- * it.
- */
-#define SWAPGS								\
-	PARA_SITE(PARA_PATCH(PV_CPU_swapgs),				\
-		  ANNOTATE_RETPOLINE_SAFE;				\
-		  call PARA_INDIRECT(pv_ops+PV_CPU_swapgs);		\
-		 )
-
 #define USERGS_SYSRET64							\
 	PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),			\
 		  ANNOTATE_RETPOLINE_SAFE;				\
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0fad9f61c76a..903d71884fa2 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -169,8 +169,6 @@ struct pv_cpu_ops {
 	   frame set up. */
 	void (*iret)(void);
 
-	void (*swapgs)(void);
-
 	void (*start_context_switch)(struct task_struct *prev);
 	void (*end_context_switch)(struct task_struct *next);
 #endif
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 828be792231e..1354bc30614d 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -15,7 +15,6 @@ int main(void)
 #ifdef CONFIG_PARAVIRT_XXL
 	OFFSET(PV_CPU_usergs_sysret64, paravirt_patch_template,
 	       cpu.usergs_sysret64);
-	OFFSET(PV_CPU_swapgs, paravirt_patch_template, cpu.swapgs);
 #ifdef CONFIG_DEBUG_ENTRY
 	OFFSET(PV_IRQ_save_fl, paravirt_patch_template, irq.save_fl);
 #endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 6c3407ba6ee9..5e5fcf5c376d 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -312,7 +312,6 @@ struct paravirt_patch_template pv_ops = {
 
 	.cpu.usergs_sysret64	= native_usergs_sysret64,
 	.cpu.iret		= native_iret,
-	.cpu.swapgs		= native_swapgs,
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 	.cpu.invalidate_io_bitmap	= native_tss_invalidate_io_bitmap,
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
index ace6e334cb39..7c518b08aa3c 100644
--- a/arch/x86/kernel/paravirt_patch.c
+++ b/arch/x86/kernel/paravirt_patch.c
@@ -28,7 +28,6 @@ struct patch_xxl {
 	const unsigned char	irq_restore_fl[2];
 	const unsigned char	cpu_wbinvd[2];
 	const unsigned char	cpu_usergs_sysret64[6];
-	const unsigned char	cpu_swapgs[3];
 	const unsigned char	mov64[3];
 };
 
@@ -43,7 +42,6 @@ static const struct patch_xxl patch_data_xxl = {
 	.cpu_wbinvd		= { 0x0f, 0x09 },	// wbinvd
 	.cpu_usergs_sysret64	= { 0x0f, 0x01, 0xf8,
 				    0x48, 0x0f, 0x07 },	// swapgs; sysretq
-	.cpu_swapgs		= { 0x0f, 0x01, 0xf8 },	// swapgs
 	.mov64			= { 0x48, 0x89, 0xf8 },	// mov %rdi, %rax
 };
 
@@ -86,7 +84,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,
 	PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len);
 
 	PATCH_CASE(cpu, usergs_sysret64, xxl, insn_buff, len);
-	PATCH_CASE(cpu, swapgs, xxl, insn_buff, len);
 	PATCH_CASE(cpu, wbinvd, xxl, insn_buff, len);
 #endif
 
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 76616024129e..44bb18adfb51 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1085,9 +1085,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
 #endif
 	.io_delay = xen_io_delay,
 
-	/* Xen takes care of %gs when switching to usermode for us */
-	.swapgs = paravirt_nop,
-
 	.start_context_switch = paravirt_start_context_switch,
 	.end_context_switch = xen_end_context_switch,
 };
-- 
2.26.2


* [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
  2020-11-20 11:46 ` [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-12-02 12:32   ` Borislav Petkov
  2020-11-20 11:46 ` [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf Juergen Gross via Virtualization
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, virtualization
  Cc: Juergen Gross, Stefano Stabellini, peterz, VMware, Inc.,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin,
	Thomas Gleixner, Boris Ostrovsky

USERGS_SYSRET64 is used to return from a syscall via sysret, but
a Xen PV guest will nevertheless use the iret hypercall, as there
is no sysret PV hypercall defined.

So instead of testing all the prerequisites for doing a sysret and
then, for Xen PV, mangling the stack again in order to do an iret,
just use the iret exit path from the beginning.

This can easily be done via an ALTERNATIVE, as is already done for
the sysenter compat case.
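
The core of the change in entry_64.S: the SYSRET eligibility checks
are replaced wholesale by a jump to the iret exit path when running
on Xen PV (quoted from the hunk below):

	ALTERNATIVE __stringify( \
		movq	RCX(%rsp), %rcx; \
		movq	RIP(%rsp), %r11; \
		cmpq	%rcx, %r11;	/* SYSRET requires RCX == RIP */ \
		jne	swapgs_restore_regs_and_return_to_usermode), \
	"jmp	swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV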

Note that this drops the Xen optimization of not restoring a few
registers when returning to user mode, but the instructions saved in
the kernel appear to more than compensate for that (a kernel build in
a Xen PV guest was slightly faster with this patch applied).

While at it, remove the stale sysret32 remnants.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S             | 22 +++++++++-------------
 arch/x86/include/asm/irqflags.h       |  6 ------
 arch/x86/include/asm/paravirt.h       |  5 -----
 arch/x86/include/asm/paravirt_types.h |  8 --------
 arch/x86/kernel/asm-offsets_64.c      |  2 --
 arch/x86/kernel/paravirt.c            |  5 +----
 arch/x86/kernel/paravirt_patch.c      |  4 ----
 arch/x86/xen/enlighten_pv.c           |  1 -
 arch/x86/xen/xen-asm.S                | 20 --------------------
 arch/x86/xen/xen-ops.h                |  2 --
 10 files changed, 10 insertions(+), 65 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a876204a73e0..df865eebd3d7 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -46,14 +46,6 @@
 .code64
 .section .entry.text, "ax"
 
-#ifdef CONFIG_PARAVIRT_XXL
-SYM_CODE_START(native_usergs_sysret64)
-	UNWIND_HINT_EMPTY
-	swapgs
-	sysretq
-SYM_CODE_END(native_usergs_sysret64)
-#endif /* CONFIG_PARAVIRT_XXL */
-
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
  *
@@ -123,12 +115,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 	 * Try to use SYSRET instead of IRET if we're returning to
 	 * a completely clean 64-bit userspace context.  If we're not,
 	 * go to the slow exit path.
+	 * In the Xen PV case we must use iret anyway.
 	 */
-	movq	RCX(%rsp), %rcx
-	movq	RIP(%rsp), %r11
 
-	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
-	jne	swapgs_restore_regs_and_return_to_usermode
+	ALTERNATIVE __stringify( \
+		movq	RCX(%rsp), %rcx; \
+		movq	RIP(%rsp), %r11; \
+		cmpq	%rcx, %r11;	/* SYSRET requires RCX == RIP */ \
+		jne	swapgs_restore_regs_and_return_to_usermode), \
+	"jmp	swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
 
 	/*
 	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
@@ -215,7 +210,8 @@ syscall_return_via_sysret:
 
 	popq	%rdi
 	popq	%rsp
-	USERGS_SYSRET64
+	swapgs
+	sysretq
 SYM_CODE_END(entry_SYSCALL_64)
 
 /*
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c86edefa115..e585a4705b8d 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -132,12 +132,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 #endif
 
 #define INTERRUPT_RETURN	jmp native_iret
-#define USERGS_SYSRET64				\
-	swapgs;					\
-	sysretq;
-#define USERGS_SYSRET32				\
-	swapgs;					\
-	sysretl
 
 #else
 #define INTERRUPT_RETURN		iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5647bcdba776..8121cf9b8d81 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,11 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-#define USERGS_SYSRET64							\
-	PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),			\
-		  ANNOTATE_RETPOLINE_SAFE;				\
-		  jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
-
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)                                        \
 	PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),			    \
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 903d71884fa2..55d8b7950e61 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -157,14 +157,6 @@ struct pv_cpu_ops {
 
 	u64 (*read_pmc)(int counter);
 
-	/*
-	 * Switch to usermode gs and return to 64-bit usermode using
-	 * sysret.  Only used in 64-bit kernels to return to 64-bit
-	 * processes.  Usermode register state, including %rsp, must
-	 * already be restored.
-	 */
-	void (*usergs_sysret64)(void);
-
 	/* Normal iret.  Jump to this with the standard iret stack
 	   frame set up. */
 	void (*iret)(void);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1354bc30614d..b14533af7676 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -13,8 +13,6 @@ int main(void)
 {
 #ifdef CONFIG_PARAVIRT
 #ifdef CONFIG_PARAVIRT_XXL
-	OFFSET(PV_CPU_usergs_sysret64, paravirt_patch_template,
-	       cpu.usergs_sysret64);
 #ifdef CONFIG_DEBUG_ENTRY
 	OFFSET(PV_IRQ_save_fl, paravirt_patch_template, irq.save_fl);
 #endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 5e5fcf5c376d..18560b71e717 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -135,8 +135,7 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
 	else if (opfunc == _paravirt_ident_64)
 		ret = paravirt_patch_ident_64(insn_buff, len);
 
-	else if (type == PARAVIRT_PATCH(cpu.iret) ||
-		 type == PARAVIRT_PATCH(cpu.usergs_sysret64))
+	else if (type == PARAVIRT_PATCH(cpu.iret))
 		/* If operation requires a jmp, then jmp */
 		ret = paravirt_patch_jmp(insn_buff, opfunc, addr, len);
 #endif
@@ -170,7 +169,6 @@ static u64 native_steal_clock(int cpu)
 
 /* These are in entry.S */
 extern void native_iret(void);
-extern void native_usergs_sysret64(void);
 
 static struct resource reserve_ioports = {
 	.start = 0,
@@ -310,7 +308,6 @@ struct paravirt_patch_template pv_ops = {
 
 	.cpu.load_sp0		= native_load_sp0,
 
-	.cpu.usergs_sysret64	= native_usergs_sysret64,
 	.cpu.iret		= native_iret,
 
 #ifdef CONFIG_X86_IOPL_IOPERM
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
index 7c518b08aa3c..2fada2c347c9 100644
--- a/arch/x86/kernel/paravirt_patch.c
+++ b/arch/x86/kernel/paravirt_patch.c
@@ -27,7 +27,6 @@ struct patch_xxl {
 	const unsigned char	mmu_write_cr3[3];
 	const unsigned char	irq_restore_fl[2];
 	const unsigned char	cpu_wbinvd[2];
-	const unsigned char	cpu_usergs_sysret64[6];
 	const unsigned char	mov64[3];
 };
 
@@ -40,8 +39,6 @@ static const struct patch_xxl patch_data_xxl = {
 	.mmu_write_cr3		= { 0x0f, 0x22, 0xdf },	// mov %rdi, %cr3
 	.irq_restore_fl		= { 0x57, 0x9d },	// push %rdi; popfq
 	.cpu_wbinvd		= { 0x0f, 0x09 },	// wbinvd
-	.cpu_usergs_sysret64	= { 0x0f, 0x01, 0xf8,
-				    0x48, 0x0f, 0x07 },	// swapgs; sysretq
 	.mov64			= { 0x48, 0x89, 0xf8 },	// mov %rdi, %rax
 };
 
@@ -83,7 +80,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,
 	PATCH_CASE(mmu, read_cr3, xxl, insn_buff, len);
 	PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len);
 
-	PATCH_CASE(cpu, usergs_sysret64, xxl, insn_buff, len);
 	PATCH_CASE(cpu, wbinvd, xxl, insn_buff, len);
 #endif
 
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 44bb18adfb51..5476423fc6d0 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1060,7 +1060,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
 	.read_pmc = xen_read_pmc,
 
 	.iret = xen_iret,
-	.usergs_sysret64 = xen_sysret64,
 
 	.load_tr_desc = paravirt_nop,
 	.set_ldt = xen_set_ldt,
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 1d054c915046..c0630fd9f44e 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -214,26 +214,6 @@ SYM_CODE_START(xen_iret)
 	jmp hypercall_iret
 SYM_CODE_END(xen_iret)
 
-SYM_CODE_START(xen_sysret64)
-	/*
-	 * We're already on the usermode stack at this point, but
-	 * still with the kernel gs, so we can easily switch back.
-	 *
-	 * tss.sp2 is scratch space.
-	 */
-	movq %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
-	movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-
-	pushq $__USER_DS
-	pushq PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
-	pushq %r11
-	pushq $__USER_CS
-	pushq %rcx
-
-	pushq $VGCF_in_syscall
-	jmp hypercall_iret
-SYM_CODE_END(xen_sysret64)
-
 /*
  * Xen handles syscall callbacks much like ordinary exceptions, which
  * means we have:
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9546c3384c75..b2fd80a01a36 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -138,8 +138,6 @@ __visible unsigned long xen_read_cr2_direct(void);
 
 /* These are not functions, and cannot be called normally */
 __visible void xen_iret(void);
-__visible void xen_sysret32(void);
-__visible void xen_sysret64(void);
 
 extern int xen_panic_handler_init(void);
 
-- 
2.26.2


* [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
  2020-11-20 11:46 ` [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE Juergen Gross via Virtualization
  2020-11-20 11:46 ` [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-20 11:59   ` Peter Zijlstra
  2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross via Virtualization
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, virtualization
  Cc: Juergen Gross, Stefano Stabellini, peterz, VMware, Inc.,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin,
	Thomas Gleixner, Boris Ostrovsky

"popf" is a rather expensive operation, so don't use it for restoring
irq flags. Instead test whether interrupts are enabled in the flags
parameter and enable interrupts via "sti" in that case.

As a result, the restore_fl paravirt op is no longer needed.
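
The replacement then boils down to (from the irqflags.h hunk below):

	static __always_inline void arch_local_irq_restore(unsigned long flags)
	{
		/* only X86_EFLAGS_IF matters; "sti" if it was set */
		if (!arch_irqs_disabled_flags(flags))
			arch_local_irq_enable();
	}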

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/irqflags.h       | 20 ++++++-------------
 arch/x86/include/asm/paravirt.h       |  5 -----
 arch/x86/include/asm/paravirt_types.h |  7 ++-----
 arch/x86/kernel/irqflags.S            | 11 -----------
 arch/x86/kernel/paravirt.c            |  1 -
 arch/x86/kernel/paravirt_patch.c      |  3 ---
 arch/x86/xen/enlighten_pv.c           |  2 --
 arch/x86/xen/irq.c                    | 23 ----------------------
 arch/x86/xen/xen-asm.S                | 28 ---------------------------
 arch/x86/xen/xen-ops.h                |  1 -
 10 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index e585a4705b8d..144d70ea4393 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -35,15 +35,6 @@ extern __always_inline unsigned long native_save_fl(void)
 	return flags;
 }
 
-extern inline void native_restore_fl(unsigned long flags);
-extern inline void native_restore_fl(unsigned long flags)
-{
-	asm volatile("push %0 ; popf"
-		     : /* no output */
-		     :"g" (flags)
-		     :"memory", "cc");
-}
-
 static __always_inline void native_irq_disable(void)
 {
 	asm volatile("cli": : :"memory");
@@ -79,11 +70,6 @@ static __always_inline unsigned long arch_local_save_flags(void)
 	return native_save_fl();
 }
 
-static __always_inline void arch_local_irq_restore(unsigned long flags)
-{
-	native_restore_fl(flags);
-}
-
 static __always_inline void arch_local_irq_disable(void)
 {
 	native_irq_disable();
@@ -152,6 +138,12 @@ static __always_inline int arch_irqs_disabled(void)
 
 	return arch_irqs_disabled_flags(flags);
 }
+
+static __always_inline void arch_local_irq_restore(unsigned long flags)
+{
+	if (!arch_irqs_disabled_flags(flags))
+		arch_local_irq_enable();
+}
 #else
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 8121cf9b8d81..ce2b8c5aecde 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -648,11 +648,6 @@ static inline notrace unsigned long arch_local_save_flags(void)
 	return PVOP_CALLEE0(unsigned long, irq.save_fl);
 }
 
-static inline notrace void arch_local_irq_restore(unsigned long f)
-{
-	PVOP_VCALLEE1(irq.restore_fl, f);
-}
-
 static inline notrace void arch_local_irq_disable(void)
 {
 	PVOP_VCALLEE0(irq.irq_disable);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 55d8b7950e61..2031631160d0 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -169,16 +169,13 @@ struct pv_cpu_ops {
 struct pv_irq_ops {
 #ifdef CONFIG_PARAVIRT_XXL
 	/*
-	 * Get/set interrupt state.  save_fl and restore_fl are only
-	 * expected to use X86_EFLAGS_IF; all other bits
-	 * returned from save_fl are undefined, and may be ignored by
-	 * restore_fl.
+	 * Get/set interrupt state.  save_fl is expected to use X86_EFLAGS_IF;
+	 * all other bits returned from save_fl are undefined.
 	 *
 	 * NOTE: These functions callers expect the callee to preserve
 	 * more registers than the standard C calling convention.
 	 */
 	struct paravirt_callee_save save_fl;
-	struct paravirt_callee_save restore_fl;
 	struct paravirt_callee_save irq_disable;
 	struct paravirt_callee_save irq_enable;
 
diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S
index 0db0375235b4..8ef35063964b 100644
--- a/arch/x86/kernel/irqflags.S
+++ b/arch/x86/kernel/irqflags.S
@@ -13,14 +13,3 @@ SYM_FUNC_START(native_save_fl)
 	ret
 SYM_FUNC_END(native_save_fl)
 EXPORT_SYMBOL(native_save_fl)
-
-/*
- * void native_restore_fl(unsigned long flags)
- * %eax/%rdi: flags
- */
-SYM_FUNC_START(native_restore_fl)
-	push %_ASM_ARG1
-	popf
-	ret
-SYM_FUNC_END(native_restore_fl)
-EXPORT_SYMBOL(native_restore_fl)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 18560b71e717..c60222ab8ab9 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -320,7 +320,6 @@ struct paravirt_patch_template pv_ops = {
 
 	/* Irq ops. */
 	.irq.save_fl		= __PV_IS_CALLEE_SAVE(native_save_fl),
-	.irq.restore_fl		= __PV_IS_CALLEE_SAVE(native_restore_fl),
 	.irq.irq_disable	= __PV_IS_CALLEE_SAVE(native_irq_disable),
 	.irq.irq_enable		= __PV_IS_CALLEE_SAVE(native_irq_enable),
 	.irq.safe_halt		= native_safe_halt,
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
index 2fada2c347c9..abd27ec67397 100644
--- a/arch/x86/kernel/paravirt_patch.c
+++ b/arch/x86/kernel/paravirt_patch.c
@@ -25,7 +25,6 @@ struct patch_xxl {
 	const unsigned char	mmu_read_cr2[3];
 	const unsigned char	mmu_read_cr3[3];
 	const unsigned char	mmu_write_cr3[3];
-	const unsigned char	irq_restore_fl[2];
 	const unsigned char	cpu_wbinvd[2];
 	const unsigned char	mov64[3];
 };
@@ -37,7 +36,6 @@ static const struct patch_xxl patch_data_xxl = {
 	.mmu_read_cr2		= { 0x0f, 0x20, 0xd0 },	// mov %cr2, %[re]ax
 	.mmu_read_cr3		= { 0x0f, 0x20, 0xd8 },	// mov %cr3, %[re]ax
 	.mmu_write_cr3		= { 0x0f, 0x22, 0xdf },	// mov %rdi, %cr3
-	.irq_restore_fl		= { 0x57, 0x9d },	// push %rdi; popfq
 	.cpu_wbinvd		= { 0x0f, 0x09 },	// wbinvd
 	.mov64			= { 0x48, 0x89, 0xf8 },	// mov %rdi, %rax
 };
@@ -71,7 +69,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,
 	switch (type) {
 
 #ifdef CONFIG_PARAVIRT_XXL
-	PATCH_CASE(irq, restore_fl, xxl, insn_buff, len);
 	PATCH_CASE(irq, save_fl, xxl, insn_buff, len);
 	PATCH_CASE(irq, irq_enable, xxl, insn_buff, len);
 	PATCH_CASE(irq, irq_disable, xxl, insn_buff, len);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5476423fc6d0..32b295cc2716 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1022,8 +1022,6 @@ void __init xen_setup_vcpu_info_placement(void)
 	 */
 	if (xen_have_vcpu_info_placement) {
 		pv_ops.irq.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
-		pv_ops.irq.restore_fl =
-			__PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
 		pv_ops.irq.irq_disable =
 			__PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
 		pv_ops.irq.irq_enable =
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 850c93f346c7..dfa091d79c2e 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -42,28 +42,6 @@ asmlinkage __visible unsigned long xen_save_fl(void)
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);
 
-__visible void xen_restore_fl(unsigned long flags)
-{
-	struct vcpu_info *vcpu;
-
-	/* convert from IF type flag */
-	flags = !(flags & X86_EFLAGS_IF);
-
-	/* See xen_irq_enable() for why preemption must be disabled. */
-	preempt_disable();
-	vcpu = this_cpu_read(xen_vcpu);
-	vcpu->evtchn_upcall_mask = flags;
-
-	if (flags == 0) {
-		barrier(); /* unmask then check (avoid races) */
-		if (unlikely(vcpu->evtchn_upcall_pending))
-			xen_force_evtchn_callback();
-		preempt_enable();
-	} else
-		preempt_enable_no_resched();
-}
-PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);
-
 asmlinkage __visible void xen_irq_disable(void)
 {
 	/* There's a one instruction preempt window here.  We need to
@@ -118,7 +96,6 @@ static void xen_halt(void)
 
 static const struct pv_irq_ops xen_irq_ops __initconst = {
 	.save_fl = PV_CALLEE_SAVE(xen_save_fl),
-	.restore_fl = PV_CALLEE_SAVE(xen_restore_fl),
 	.irq_disable = PV_CALLEE_SAVE(xen_irq_disable),
 	.irq_enable = PV_CALLEE_SAVE(xen_irq_enable),
 
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index c0630fd9f44e..1ea7e41044b5 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -72,34 +72,6 @@ SYM_FUNC_START(xen_save_fl_direct)
 	ret
 SYM_FUNC_END(xen_save_fl_direct)
 
-
-/*
- * In principle the caller should be passing us a value return from
- * xen_save_fl_direct, but for robustness sake we test only the
- * X86_EFLAGS_IF flag rather than the whole byte. After setting the
- * interrupt mask state, it checks for unmasked pending events and
- * enters the hypervisor to get them delivered if so.
- */
-SYM_FUNC_START(xen_restore_fl_direct)
-	FRAME_BEGIN
-	testw $X86_EFLAGS_IF, %di
-	setz PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
-	/*
-	 * Preempt here doesn't matter because that will deal with any
-	 * pending interrupts.  The pending check may end up being run
-	 * on the wrong CPU, but that doesn't hurt.
-	 */
-
-	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
-	jnz 1f
-	call check_events
-1:
-	FRAME_END
-	ret
-SYM_FUNC_END(xen_restore_fl_direct)
-
-
 /*
  * Force an event check by making a hypercall, but preserve regs
  * before making the call.
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index b2fd80a01a36..8d7ec49a35fb 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -131,7 +131,6 @@ static inline void __init xen_efi_init(struct boot_params *boot_params)
 __visible void xen_irq_enable_direct(void);
 __visible void xen_irq_disable_direct(void);
 __visible unsigned long xen_save_fl_direct(void);
-__visible void xen_restore_fl_direct(unsigned long);
 
 __visible unsigned long xen_read_cr2(void);
 __visible unsigned long xen_read_cr2_direct(void);
-- 
2.26.2


* [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call()
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
                   ` (2 preceding siblings ...)
  2020-11-20 11:46 ` [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-20 12:01   ` Peter Zijlstra
  2020-11-20 11:46 ` [PATCH v2 08/12] x86/paravirt: remove no longer needed 32-bit pvops cruft Juergen Gross via Virtualization
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, linux-hyperv, virtualization, kvm
  Cc: Juri Lelli, Wanpeng Li, peterz, Ben Segall, H. Peter Anvin,
	Thomas Gleixner, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, Daniel Lezcano, VMware, Inc., Ingo Molnar,
	Mel Gorman, Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Boris Ostrovsky, Dietmar Eggemann, Jim Mattson,
	Juergen Gross, Sean Christopherson, Paolo Bonzini,
	Daniel Bristot de Oliveira

The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows a substantial simplification of the pvops implementation.

Due to include hell, this requires splitting the time interfaces out
into a new header file.
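
On the guest side the registration then looks like this (taken from
the KVM hunks below):

	/* kvm_guest_init() / kvm_sched_clock_init(): */
	static_call_update(pv_steal_clock, kvm_steal_clock);
	paravirt_set_sched_clock(kvm_sched_clock_read);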

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/Kconfig                      |  1 +
 arch/x86/include/asm/mshyperv.h       | 11 --------
 arch/x86/include/asm/paravirt.h       | 14 ----------
 arch/x86/include/asm/paravirt_time.h  | 38 +++++++++++++++++++++++++++
 arch/x86/include/asm/paravirt_types.h |  6 -----
 arch/x86/kernel/cpu/vmware.c          |  5 ++--
 arch/x86/kernel/kvm.c                 |  3 ++-
 arch/x86/kernel/kvmclock.c            |  3 ++-
 arch/x86/kernel/paravirt.c            | 16 ++++++++---
 arch/x86/kernel/tsc.c                 |  3 ++-
 arch/x86/xen/time.c                   | 12 ++++-----
 drivers/clocksource/hyperv_timer.c    |  5 ++--
 drivers/xen/time.c                    |  3 ++-
 kernel/sched/sched.h                  |  1 +
 14 files changed, 71 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_time.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f6946b81f74a..56775acc243e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -767,6 +767,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
 	bool "Enable paravirtualization code"
+	depends on HAVE_STATIC_CALL
 	help
 	  This changes the kernel so it can modify itself when it is run
 	  under a hypervisor, potentially improving performance significantly
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ffc289992d1b..45942d420626 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)(
 #define hv_get_raw_timer() rdtsc_ordered()
 #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR
 
-/*
- * Reference to pv_ops must be inline so objtool
- * detection of noinstr violations can work correctly.
- */
-static __always_inline void hv_setup_sched_clock(void *sched_clock)
-{
-#ifdef CONFIG_PARAVIRT
-	pv_ops.time.sched_clock = sched_clock;
-#endif
-}
-
 void hyperv_vector_handler(struct pt_regs *regs);
 
 static inline void hv_enable_stimer0_percpu_irq(int irq) {}
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index ce2b8c5aecde..01b3e36462c3 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -17,25 +17,11 @@
 #include <linux/cpumask.h>
 #include <asm/frame.h>
 
-static inline unsigned long long paravirt_sched_clock(void)
-{
-	return PVOP_CALL0(unsigned long long, time.sched_clock);
-}
-
-struct static_key;
-extern struct static_key paravirt_steal_enabled;
-extern struct static_key paravirt_steal_rq_enabled;
-
 __visible void __native_queued_spin_unlock(struct qspinlock *lock);
 bool pv_is_native_spin_unlock(void);
 __visible bool __native_vcpu_is_preempted(long cpu);
 bool pv_is_native_vcpu_is_preempted(void);
 
-static inline u64 paravirt_steal_clock(int cpu)
-{
-	return PVOP_CALL1(u64, time.steal_clock, cpu);
-}
-
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
diff --git a/arch/x86/include/asm/paravirt_time.h b/arch/x86/include/asm/paravirt_time.h
new file mode 100644
index 000000000000..76cf94b7c899
--- /dev/null
+++ b/arch/x86/include/asm/paravirt_time.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PARAVIRT_TIME_H
+#define _ASM_X86_PARAVIRT_TIME_H
+
+/* Time related para-virtualized functions. */
+
+#ifdef CONFIG_PARAVIRT
+
+#include <linux/types.h>
+#include <linux/jump_label.h>
+#include <linux/static_call.h>
+
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+u64 dummy_sched_clock(void);
+
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
+
+extern bool paravirt_using_native_sched_clock;
+
+void paravirt_set_sched_clock(u64 (*func)(void));
+
+static inline u64 paravirt_sched_clock(void)
+{
+	return static_call(pv_sched_clock)();
+}
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+	return static_call(pv_steal_clock)(cpu);
+}
+
+#endif /* CONFIG_PARAVIRT */
+
+#endif /* _ASM_X86_PARAVIRT_TIME_H */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 2031631160d0..01af7b944224 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -96,11 +96,6 @@ struct pv_lazy_ops {
 } __no_randomize_layout;
 #endif
 
-struct pv_time_ops {
-	unsigned long long (*sched_clock)(void);
-	unsigned long long (*steal_clock)(int cpu);
-} __no_randomize_layout;
-
 struct pv_cpu_ops {
 	/* hooks for various privileged instructions */
 	void (*io_delay)(void);
@@ -292,7 +287,6 @@ struct pv_lock_ops {
  * what to patch. */
 struct paravirt_patch_template {
 	struct pv_init_ops	init;
-	struct pv_time_ops	time;
 	struct pv_cpu_ops	cpu;
 	struct pv_irq_ops	irq;
 	struct pv_mmu_ops	mmu;
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 924571fe5864..f265426a1c3e 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -34,6 +34,7 @@
 #include <asm/apic.h>
 #include <asm/vmware.h>
 #include <asm/svm.h>
+#include <asm/paravirt_time.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"vmware: " fmt
@@ -336,11 +337,11 @@ static void __init vmware_paravirt_ops_setup(void)
 	vmware_cyc2ns_setup();
 
 	if (vmw_sched_clock)
-		pv_ops.time.sched_clock = vmware_sched_clock;
+		paravirt_set_sched_clock(vmware_sched_clock);
 
 	if (vmware_is_stealclock_available()) {
 		has_steal_clock = true;
-		pv_ops.time.steal_clock = vmware_steal_clock;
+		static_call_update(pv_steal_clock, vmware_steal_clock);
 
 		/* We use reboot notifier only to disable steal clock */
 		register_reboot_notifier(&vmware_pv_reboot_nb);
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7f57ede3cb8e..6c525fdd0312 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -38,6 +38,7 @@
 #include <asm/cpuidle_haltpoll.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
+#include <asm/paravirt_time.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -650,7 +651,7 @@ static void __init kvm_guest_init(void)
 
 	if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
 		has_steal_clock = 1;
-		pv_ops.time.steal_clock = kvm_steal_clock;
+		static_call_update(pv_steal_clock, kvm_steal_clock);
 	}
 
 	if (pv_tlb_flush_supported()) {
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 34b18f6eeb2c..02f60ee16f10 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
 #include <asm/kvmclock.h>
+#include <asm/paravirt_time.h>
 
 static int kvmclock __initdata = 1;
 static int kvmclock_vsyscall __initdata = 1;
@@ -107,7 +108,7 @@ static inline void kvm_sched_clock_init(bool stable)
 	if (!stable)
 		clear_sched_clock_stable();
 	kvm_sched_clock_offset = kvm_clock_read();
-	pv_ops.time.sched_clock = kvm_sched_clock_read;
+	paravirt_set_sched_clock(kvm_sched_clock_read);
 
 	pr_info("kvm-clock: using sched offset of %llu cycles",
 		kvm_sched_clock_offset);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c60222ab8ab9..9f8aa18aa378 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -31,6 +31,7 @@
 #include <asm/special_insns.h>
 #include <asm/tlb.h>
 #include <asm/io_bitmap.h>
+#include <asm/paravirt_time.h>
 
 /*
  * nop stub, which must not clobber anything *including the stack* to
@@ -167,6 +168,17 @@ static u64 native_steal_clock(int cpu)
 	return 0;
 }
 
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+DEFINE_STATIC_CALL(pv_sched_clock, native_sched_clock);
+
+bool paravirt_using_native_sched_clock = true;
+
+void paravirt_set_sched_clock(u64 (*func)(void))
+{
+	static_call_update(pv_sched_clock, func);
+	paravirt_using_native_sched_clock = (func == native_sched_clock);
+}
+
 /* These are in entry.S */
 extern void native_iret(void);
 
@@ -272,10 +284,6 @@ struct paravirt_patch_template pv_ops = {
 	/* Init ops. */
 	.init.patch		= native_patch,
 
-	/* Time ops. */
-	.time.sched_clock	= native_sched_clock,
-	.time.steal_clock	= native_steal_clock,
-
 	/* Cpu ops. */
 	.cpu.io_delay		= native_io_delay,
 
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc2771f..d01245b770de 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -28,6 +28,7 @@
 #include <asm/intel-family.h>
 #include <asm/i8259.h>
 #include <asm/uv/uv.h>
+#include <asm/paravirt_time.h>
 
 unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
@@ -254,7 +255,7 @@ unsigned long long sched_clock(void)
 
 bool using_native_sched_clock(void)
 {
-	return pv_ops.time.sched_clock == native_sched_clock;
+	return paravirt_using_native_sched_clock;
 }
 #else
 unsigned long long
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 91f5b330dcc6..17e62f4f69a9 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -18,6 +18,7 @@
 #include <linux/timekeeper_internal.h>
 
 #include <asm/pvclock.h>
+#include <asm/paravirt_time.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -379,11 +380,6 @@ void xen_timer_resume(void)
 	}
 }
 
-static const struct pv_time_ops xen_time_ops __initconst = {
-	.sched_clock = xen_sched_clock,
-	.steal_clock = xen_steal_clock,
-};
-
 static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
 static u64 xen_clock_value_saved;
 
@@ -528,7 +524,8 @@ static void __init xen_time_init(void)
 void __init xen_init_time_ops(void)
 {
 	xen_sched_clock_offset = xen_clocksource_read();
-	pv_ops.time = xen_time_ops;
+	static_call_update(pv_steal_clock, xen_steal_clock);
+	paravirt_set_sched_clock(xen_sched_clock);
 
 	x86_init.timers.timer_init = xen_time_init;
 	x86_init.timers.setup_percpu_clockev = x86_init_noop;
@@ -570,7 +567,8 @@ void __init xen_hvm_init_time_ops(void)
 	}
 
 	xen_sched_clock_offset = xen_clocksource_read();
-	pv_ops.time = xen_time_ops;
+	static_call_update(pv_steal_clock, xen_steal_clock);
+	paravirt_set_sched_clock(xen_sched_clock);
 	x86_init.timers.setup_percpu_clockev = xen_time_init;
 	x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
 
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba04cb381cd3..1ed79993fc50 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -21,6 +21,7 @@
 #include <clocksource/hyperv_timer.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
+#include <asm/paravirt_time.h>
 
 static struct clock_event_device __percpu *hv_clock_event;
 static u64 hv_sched_clock_offset __ro_after_init;
@@ -445,7 +446,7 @@ static bool __init hv_init_tsc_clocksource(void)
 	clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
 	hv_sched_clock_offset = hv_read_reference_counter();
-	hv_setup_sched_clock(read_hv_sched_clock_tsc);
+	paravirt_set_sched_clock(read_hv_sched_clock_tsc);
 
 	return true;
 }
@@ -470,6 +471,6 @@ void __init hv_init_clocksource(void)
 	clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 
 	hv_sched_clock_offset = hv_read_reference_counter();
-	hv_setup_sched_clock(read_hv_sched_clock_msr);
+	static_call_update(pv_sched_clock, read_hv_sched_clock_msr);
 }
 EXPORT_SYMBOL_GPL(hv_init_clocksource);
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 108edbcbc040..b35ce88427c9 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 
 #include <asm/paravirt.h>
+#include <asm/paravirt_time.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -175,7 +176,7 @@ void __init xen_time_setup_guest(void)
 	xen_runstate_remote = !HYPERVISOR_vm_assist(VMASST_CMD_enable,
 					VMASST_TYPE_runstate_update_flag);
 
-	pv_ops.time.steal_clock = xen_steal_clock;
+	static_call_update(pv_steal_clock, xen_steal_clock);
 
 	static_key_slow_inc(&paravirt_steal_enabled);
 	if (xen_runstate_remote)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index df80bfcea92e..1687c5383797 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -71,6 +71,7 @@
 
 #ifdef CONFIG_PARAVIRT
 # include <asm/paravirt.h>
+# include <asm/paravirt_time.h>
 #endif
 
 #include "cpupri.h"
-- 
2.26.2


* [PATCH v2 08/12] x86/paravirt: remove no longer needed 32-bit pvops cruft
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
                   ` (3 preceding siblings ...)
  2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-20 12:08   ` Peter Zijlstra
  2020-11-20 11:46 ` [PATCH v2 09/12] x86/paravirt: switch iret pvops to ALTERNATIVE Juergen Gross via Virtualization
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, linux-kernel, virtualization
  Cc: Juergen Gross, peterz, VMware, Inc., Ingo Molnar, Borislav Petkov,
	luto, H. Peter Anvin, Thomas Gleixner

PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for the 64-bit case for symmetry reasons.

This allows removing the 32-bit definitions of those macros, leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.

PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.

Another case that is no longer needed is the special handling of
return types larger than unsigned long. Replace that with a
BUILD_BUG_ON().
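
The two former code paths collapse into a single guard (from the
paravirt_types.h hunk below):

	/* wider return types would need the old %edx:%eax splitting: */
	BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));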

DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.

INTERRUPT_RETURN in 32-bit code can be replaced by iret.

GET_CR2_INTO_AX and ENABLE_INTERRUPTS are used nowhere, so they can
be removed.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/entry/entry_32.S             |   4 +-
 arch/x86/include/asm/irqflags.h       |   5 --
 arch/x86/include/asm/paravirt.h       |  46 +----------
 arch/x86/include/asm/paravirt_types.h | 112 ++++++++------------------
 arch/x86/kernel/asm-offsets.c         |   3 -
 arch/x86/kernel/head_64.S             |   2 -
 6 files changed, 35 insertions(+), 137 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017e6161..765487e57d6e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -430,7 +430,7 @@
 	 * will soon execute iret and the tracer was already set to
 	 * the irqstate after the IRET:
 	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
+	cli
 	lss	(%esp), %esp			/* switch to espfix segment */
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
@@ -1077,7 +1077,7 @@ restore_all_switch_stack:
 	 * when returning from IPI handler and when returning from
 	 * scheduler to user-space.
 	 */
-	INTERRUPT_RETURN
+	iret
 
 .section .fixup, "ax"
 SYM_CODE_START(asm_iret_error)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 144d70ea4393..a0efbcd24b86 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -109,9 +109,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 }
 #else
 
-#define ENABLE_INTERRUPTS(x)	sti
-#define DISABLE_INTERRUPTS(x)	cli
-
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(x)		pushfq; popq %rax
@@ -119,8 +116,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 
 #define INTERRUPT_RETURN	jmp native_iret
 
-#else
-#define INTERRUPT_RETURN		iret
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 01b3e36462c3..1dd30c95505d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -692,6 +692,7 @@ extern void default_banner(void);
 	.if ((~(set)) & mask); pop %reg; .endif
 
 #ifdef CONFIG_X86_64
+#ifdef CONFIG_PARAVIRT_XXL
 
 #define PV_SAVE_REGS(set)			\
 	COND_PUSH(set, CLBR_RAX, rax);		\
@@ -717,46 +718,12 @@ extern void default_banner(void);
 #define PARA_PATCH(off)		((off) / 8)
 #define PARA_SITE(ptype, ops)	_PVSITE(ptype, ops, .quad, 8)
 #define PARA_INDIRECT(addr)	*addr(%rip)
-#else
-#define PV_SAVE_REGS(set)			\
-	COND_PUSH(set, CLBR_EAX, eax);		\
-	COND_PUSH(set, CLBR_EDI, edi);		\
-	COND_PUSH(set, CLBR_ECX, ecx);		\
-	COND_PUSH(set, CLBR_EDX, edx)
-#define PV_RESTORE_REGS(set)			\
-	COND_POP(set, CLBR_EDX, edx);		\
-	COND_POP(set, CLBR_ECX, ecx);		\
-	COND_POP(set, CLBR_EDI, edi);		\
-	COND_POP(set, CLBR_EAX, eax)
-
-#define PARA_PATCH(off)		((off) / 4)
-#define PARA_SITE(ptype, ops)	_PVSITE(ptype, ops, .long, 4)
-#define PARA_INDIRECT(addr)	*%cs:addr
-#endif
 
-#ifdef CONFIG_PARAVIRT_XXL
 #define INTERRUPT_RETURN						\
 	PARA_SITE(PARA_PATCH(PV_CPU_iret),				\
 		  ANNOTATE_RETPOLINE_SAFE;				\
 		  jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
 
-#define DISABLE_INTERRUPTS(clobbers)					\
-	PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable),			\
-		  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);		\
-		  ANNOTATE_RETPOLINE_SAFE;				\
-		  call PARA_INDIRECT(pv_ops+PV_IRQ_irq_disable);	\
-		  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
-
-#define ENABLE_INTERRUPTS(clobbers)					\
-	PARA_SITE(PARA_PATCH(PV_IRQ_irq_enable),			\
-		  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);		\
-		  ANNOTATE_RETPOLINE_SAFE;				\
-		  call PARA_INDIRECT(pv_ops+PV_IRQ_irq_enable);		\
-		  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
-#endif
-
-#ifdef CONFIG_X86_64
-#ifdef CONFIG_PARAVIRT_XXL
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)                                        \
 	PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),			    \
@@ -768,17 +735,6 @@ extern void default_banner(void);
 #endif /* CONFIG_PARAVIRT_XXL */
 #endif	/* CONFIG_X86_64 */
 
-#ifdef CONFIG_PARAVIRT_XXL
-
-#define GET_CR2_INTO_AX							\
-	PARA_SITE(PARA_PATCH(PV_MMU_read_cr2),				\
-		  ANNOTATE_RETPOLINE_SAFE;				\
-		  call PARA_INDIRECT(pv_ops+PV_MMU_read_cr2);		\
-		 )
-
-#endif /* CONFIG_PARAVIRT_XXL */
-
-
 #endif /* __ASSEMBLY__ */
 #else  /* CONFIG_PARAVIRT */
 # define default_banner x86_init_noop
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 01af7b944224..b86acbb6449f 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -471,55 +471,34 @@ int paravirt_disable_iospace(void);
 	})
 
 
-#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr,		\
-		      pre, post, ...)					\
+#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, ...)	\
 	({								\
-		rettype __ret;						\
 		PVOP_CALL_ARGS;						\
 		PVOP_TEST_NULL(op);					\
-		/* This is 32-bit specific, but is okay in 64-bit */	\
-		/* since this condition will never hold */		\
-		if (sizeof(rettype) > sizeof(unsigned long)) {		\
-			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
-				     post				\
-				     : call_clbr, ASM_CALL_CONSTRAINT	\
-				     : paravirt_type(op),		\
-				       paravirt_clobber(clbr),		\
-				       ##__VA_ARGS__			\
-				     : "memory", "cc" extra_clbr);	\
-			__ret = (rettype)((((u64)__edx) << 32) | __eax); \
-		} else {						\
-			asm volatile(pre				\
-				     paravirt_alt(PARAVIRT_CALL)	\
-				     post				\
-				     : call_clbr, ASM_CALL_CONSTRAINT	\
-				     : paravirt_type(op),		\
-				       paravirt_clobber(clbr),		\
-				       ##__VA_ARGS__			\
-				     : "memory", "cc" extra_clbr);	\
-			__ret = (rettype)(__eax & PVOP_RETMASK(rettype));	\
-		}							\
-		__ret;							\
+		BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));	\
+		asm volatile(paravirt_alt(PARAVIRT_CALL)		\
+			     : call_clbr, ASM_CALL_CONSTRAINT		\
+			     : paravirt_type(op),			\
+			       paravirt_clobber(clbr),			\
+			       ##__VA_ARGS__				\
+			     : "memory", "cc" extra_clbr);		\
+		(rettype)(__eax & PVOP_RETMASK(rettype));		\
 	})
 
-#define __PVOP_CALL(rettype, op, pre, post, ...)			\
+#define __PVOP_CALL(rettype, op, ...)					\
 	____PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,	\
-		      EXTRA_CLOBBERS, pre, post, ##__VA_ARGS__)
+		      EXTRA_CLOBBERS, ##__VA_ARGS__)
 
-#define __PVOP_CALLEESAVE(rettype, op, pre, post, ...)			\
+#define __PVOP_CALLEESAVE(rettype, op, ...)				\
 	____PVOP_CALL(rettype, op.func, CLBR_RET_REG,			\
-		      PVOP_CALLEE_CLOBBERS, ,				\
-		      pre, post, ##__VA_ARGS__)
+		      PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
 
-#define ____PVOP_VCALL(op, clbr, call_clbr, extra_clbr, pre, post, ...)	\
+#define ____PVOP_VCALL(op, clbr, call_clbr, extra_clbr, ...)		\
 	({								\
 		PVOP_VCALL_ARGS;					\
 		PVOP_TEST_NULL(op);					\
-		asm volatile(pre					\
-			     paravirt_alt(PARAVIRT_CALL)		\
-			     post					\
+		asm volatile(paravirt_alt(PARAVIRT_CALL)		\
 			     : call_clbr, ASM_CALL_CONSTRAINT		\
 			     : paravirt_type(op),			\
 			       paravirt_clobber(clbr),			\
@@ -527,84 +506,57 @@ int paravirt_disable_iospace(void);
 			     : "memory", "cc" extra_clbr);		\
 	})
 
-#define __PVOP_VCALL(op, pre, post, ...)				\
+#define __PVOP_VCALL(op, ...)						\
 	____PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_CLOBBERS,		\
-		       VEXTRA_CLOBBERS,					\
-		       pre, post, ##__VA_ARGS__)
+		       VEXTRA_CLOBBERS, ##__VA_ARGS__)
 
-#define __PVOP_VCALLEESAVE(op, pre, post, ...)				\
+#define __PVOP_VCALLEESAVE(op, ...)					\
 	____PVOP_VCALL(op.func, CLBR_RET_REG,				\
-		      PVOP_VCALLEE_CLOBBERS, ,				\
-		      pre, post, ##__VA_ARGS__)
+		      PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
 
 
 #define PVOP_CALL0(rettype, op)						\
-	__PVOP_CALL(rettype, op, "", "")
+	__PVOP_CALL(rettype, op)
 #define PVOP_VCALL0(op)							\
-	__PVOP_VCALL(op, "", "")
+	__PVOP_VCALL(op)
 
 #define PVOP_CALLEE0(rettype, op)					\
-	__PVOP_CALLEESAVE(rettype, op, "", "")
+	__PVOP_CALLEESAVE(rettype, op)
 #define PVOP_VCALLEE0(op)						\
-	__PVOP_VCALLEESAVE(op, "", "")
+	__PVOP_VCALLEESAVE(op)
 
 
 #define PVOP_CALL1(rettype, op, arg1)					\
-	__PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1))
+	__PVOP_CALL(rettype, op, PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALL1(op, arg1)						\
-	__PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1))
+	__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1))
 
 #define PVOP_CALLEE1(rettype, op, arg1)					\
-	__PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1))
+	__PVOP_CALLEESAVE(rettype, op, PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALLEE1(op, arg1)						\
-	__PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1))
+	__PVOP_VCALLEESAVE(op, PVOP_CALL_ARG1(arg1))
 
 
 #define PVOP_CALL2(rettype, op, arg1, arg2)				\
-	__PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1),		\
-		    PVOP_CALL_ARG2(arg2))
+	__PVOP_CALL(rettype, op, PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2))
 #define PVOP_VCALL2(op, arg1, arg2)					\
-	__PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1),			\
-		     PVOP_CALL_ARG2(arg2))
-
-#define PVOP_CALLEE2(rettype, op, arg1, arg2)				\
-	__PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1),	\
-			  PVOP_CALL_ARG2(arg2))
-#define PVOP_VCALLEE2(op, arg1, arg2)					\
-	__PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1),		\
-			   PVOP_CALL_ARG2(arg2))
-
+	__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2))
 
 #define PVOP_CALL3(rettype, op, arg1, arg2, arg3)			\
-	__PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1),		\
+	__PVOP_CALL(rettype, op, PVOP_CALL_ARG1(arg1),			\
 		    PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3))
 #define PVOP_VCALL3(op, arg1, arg2, arg3)				\
-	__PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1),			\
+	__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1),				\
 		     PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3))
 
-/* This is the only difference in x86_64. We can make it much simpler */
-#ifdef CONFIG_X86_32
 #define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4)			\
 	__PVOP_CALL(rettype, op,					\
-		    "push %[_arg4];", "lea 4(%%esp),%%esp;",		\
-		    PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2),		\
-		    PVOP_CALL_ARG3(arg3), [_arg4] "mr" ((u32)(arg4)))
-#define PVOP_VCALL4(op, arg1, arg2, arg3, arg4)				\
-	__PVOP_VCALL(op,						\
-		    "push %[_arg4];", "lea 4(%%esp),%%esp;",		\
-		    "0" ((u32)(arg1)), "1" ((u32)(arg2)),		\
-		    "2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4)))
-#else
-#define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4)			\
-	__PVOP_CALL(rettype, op, "", "",				\
 		    PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2),		\
 		    PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4))
 #define PVOP_VCALL4(op, arg1, arg2, arg3, arg4)				\
-	__PVOP_VCALL(op, "", "",					\
-		     PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2),	\
+	__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2),	\
 		     PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4))
-#endif
 
 /* Lazy mode for batching updates / context switch */
 enum paravirt_lazy_mode {
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 70b7154f4bdd..736508004b30 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -63,10 +63,7 @@ static void __used common(void)
 
 #ifdef CONFIG_PARAVIRT_XXL
 	BLANK();
-	OFFSET(PV_IRQ_irq_disable, paravirt_patch_template, irq.irq_disable);
-	OFFSET(PV_IRQ_irq_enable, paravirt_patch_template, irq.irq_enable);
 	OFFSET(PV_CPU_iret, paravirt_patch_template, cpu.iret);
-	OFFSET(PV_MMU_read_cr2, paravirt_patch_template, mmu.read_cr2);
 #endif
 
 #ifdef CONFIG_XEN
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3c417734790f..ccb3a16ae6d0 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -29,10 +29,8 @@
 #ifdef CONFIG_PARAVIRT_XXL
 #include <asm/asm-offsets.h>
 #include <asm/paravirt.h>
-#define GET_CR2_INTO(reg) GET_CR2_INTO_AX ; _ASM_MOV %_ASM_AX, reg
 #else
 #define INTERRUPT_RETURN iretq
-#define GET_CR2_INTO(reg) _ASM_MOV %cr2, reg
 #endif
 
 /*
-- 
2.26.2


* [PATCH v2 09/12] x86/paravirt: switch iret pvops to ALTERNATIVE
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
                   ` (4 preceding siblings ...)
  2020-11-20 11:46 ` [PATCH v2 08/12] x86/paravirt: remove no longer needed 32-bit pvops cruft Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-20 11:46 ` [PATCH v2 10/12] x86/paravirt: add new macros PVOP_ALT* supporting pvops in ALTERNATIVEs Juergen Gross via Virtualization
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, virtualization, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, peterz, VMware, Inc.,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin,
	Thomas Gleixner, Boris Ostrovsky

The iret paravirt op is rather special, as it uses a jmp instead of
a call instruction. Switch it to ALTERNATIVE.
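
The net effect can be illustrated with a small user-space sketch (a
simplification, not kernel code: alternatives_patched and cpu_has_xenpv
are made-up stand-ins for the kernel's patching state and for the
complementary X86_FEATURE_XENPV/X86_FEATURE_NOT_XENPV features):

#include <stdbool.h>
#include <stdio.h>

static void native_iret(void) { puts("native_iret"); }
static void xen_iret(void)    { puts("xen_iret"); }

/* Mirrors the new paravirt_iret pointer: the build-time default is the
 * indirect jmp through this pointer; Xen PV sets it to xen_iret before
 * alternatives patching runs. */
static void (*paravirt_iret)(void) = native_iret;

static bool alternatives_patched;
static bool cpu_has_xenpv;

static void interrupt_return(void)
{
	if (!alternatives_patched) {
		paravirt_iret();	/* "jmp *paravirt_iret(%rip)" */
		return;
	}
	/* Exactly one alternative always applies, as NOT_XENPV and
	 * XENPV are complementary. */
	if (cpu_has_xenpv)
		xen_iret();		/* "jmp xen_iret" */
	else
		native_iret();		/* "jmp native_iret" */
}

int main(void)
{
	interrupt_return();		/* early boot: via the pointer */
	alternatives_patched = true;
	interrupt_return();		/* bare metal: direct native_iret */
	return 0;
}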

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt.h       |  7 ++++---
 arch/x86/include/asm/paravirt_types.h |  5 +----
 arch/x86/kernel/asm-offsets.c         |  5 -----
 arch/x86/kernel/paravirt.c            | 26 ++------------------------
 arch/x86/xen/enlighten_pv.c           |  3 +--
 5 files changed, 8 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 1dd30c95505d..62fbcd899539 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -720,9 +720,10 @@ extern void default_banner(void);
 #define PARA_INDIRECT(addr)	*addr(%rip)
 
 #define INTERRUPT_RETURN						\
-	PARA_SITE(PARA_PATCH(PV_CPU_iret),				\
-		  ANNOTATE_RETPOLINE_SAFE;				\
-		  jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
+	ANNOTATE_RETPOLINE_SAFE;					\
+	ALTERNATIVE_2 "jmp *paravirt_iret(%rip);",			\
+		      "jmp native_iret;", X86_FEATURE_NOT_XENPV,	\
+		      "jmp xen_iret;", X86_FEATURE_XENPV
 
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)                                        \
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index b86acbb6449f..b66650dc869e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -152,10 +152,6 @@ struct pv_cpu_ops {
 
 	u64 (*read_pmc)(int counter);
 
-	/* Normal iret.  Jump to this with the standard iret stack
-	   frame set up. */
-	void (*iret)(void);
-
 	void (*start_context_switch)(struct task_struct *prev);
 	void (*end_context_switch)(struct task_struct *next);
 #endif
@@ -295,6 +291,7 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
+extern void (*paravirt_iret)(void);
 
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 736508004b30..ecd3fd6993d1 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -61,11 +61,6 @@ static void __used common(void)
 	OFFSET(IA32_RT_SIGFRAME_sigcontext, rt_sigframe_ia32, uc.uc_mcontext);
 #endif
 
-#ifdef CONFIG_PARAVIRT_XXL
-	BLANK();
-	OFFSET(PV_CPU_iret, paravirt_patch_template, cpu.iret);
-#endif
-
 #ifdef CONFIG_XEN
 	BLANK();
 	OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 9f8aa18aa378..2ab547dd66c3 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -86,25 +86,6 @@ u64 notrace _paravirt_ident_64(u64 x)
 {
 	return x;
 }
-
-static unsigned paravirt_patch_jmp(void *insn_buff, const void *target,
-				   unsigned long addr, unsigned len)
-{
-	struct branch *b = insn_buff;
-	unsigned long delta = (unsigned long)target - (addr+5);
-
-	if (len < 5) {
-#ifdef CONFIG_RETPOLINE
-		WARN_ONCE(1, "Failing to patch indirect JMP in %ps\n", (void *)addr);
-#endif
-		return len;	/* call too long for patch site */
-	}
-
-	b->opcode = 0xe9;	/* jmp */
-	b->delta = delta;
-
-	return 5;
-}
 #endif
 
 DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
@@ -136,9 +117,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
 	else if (opfunc == _paravirt_ident_64)
 		ret = paravirt_patch_ident_64(insn_buff, len);
 
-	else if (type == PARAVIRT_PATCH(cpu.iret))
-		/* If operation requires a jmp, then jmp */
-		ret = paravirt_patch_jmp(insn_buff, opfunc, addr, len);
 #endif
 	else
 		/* Otherwise call the function. */
@@ -316,8 +294,6 @@ struct paravirt_patch_template pv_ops = {
 
 	.cpu.load_sp0		= native_load_sp0,
 
-	.cpu.iret		= native_iret,
-
 #ifdef CONFIG_X86_IOPL_IOPERM
 	.cpu.invalidate_io_bitmap	= native_tss_invalidate_io_bitmap,
 	.cpu.update_io_bitmap		= native_tss_update_io_bitmap,
@@ -422,6 +398,8 @@ struct paravirt_patch_template pv_ops = {
 NOKPROBE_SYMBOL(native_get_debugreg);
 NOKPROBE_SYMBOL(native_set_debugreg);
 NOKPROBE_SYMBOL(native_load_idt);
+
+void (*paravirt_iret)(void) = native_iret;
 #endif
 
 EXPORT_SYMBOL(pv_ops);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 32b295cc2716..4716383c64a9 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1057,8 +1057,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = {
 
 	.read_pmc = xen_read_pmc,
 
-	.iret = xen_iret,
-
 	.load_tr_desc = paravirt_nop,
 	.set_ldt = xen_set_ldt,
 	.load_gdt = xen_load_gdt,
@@ -1222,6 +1220,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	pv_info = xen_info;
 	pv_ops.init.patch = paravirt_patch_default;
 	pv_ops.cpu = xen_cpu_ops;
+	paravirt_iret = xen_iret;
 	xen_init_irq_ops();
 
 	/*
-- 
2.26.2


* [PATCH v2 10/12] x86/paravirt: add new macros PVOP_ALT* supporting pvops in ALTERNATIVEs
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
                   ` (5 preceding siblings ...)
  2020-11-20 11:46 ` [PATCH v2 09/12] x86/paravirt: switch iret pvops to ALTERNATIVE Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-20 11:46 ` [PATCH v2 11/12] x86/paravirt: switch functions with custom code to ALTERNATIVE Juergen Gross via Virtualization
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, virtualization, linux-kernel
  Cc: Juergen Gross, peterz, VMware, Inc., Ingo Molnar, Borislav Petkov,
	luto, H. Peter Anvin, Thomas Gleixner

Instead of using paravirt patching for custom code sequences, add
support for combining ALTERNATIVE handling with paravirt call
patching.
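
As an example of the intended use, the next patch converts read_cr2()
to:

static inline unsigned long read_cr2(void)
{
	return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
				"mov %%cr2, %%rax;", X86_FEATURE_NOT_XENPV);
}

On bare metal the alternatives code thus replaces the call site with
the single mov instruction, while Xen PV guests keep the patched call
to the pv_ops function.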

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt_types.h | 62 +++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index b66650dc869e..1f8e1b76e78b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -482,14 +482,39 @@ int paravirt_disable_iospace(void);
 		(rettype)(__eax & PVOP_RETMASK(rettype));		\
 	})
 
+#define ____PVOP_ALT_CALL(rettype, op, alt, cond, clbr, call_clbr,	\
+			  extra_clbr, ...)				\
+	({								\
+		PVOP_CALL_ARGS;						\
+		PVOP_TEST_NULL(op);					\
+		BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));	\
+		asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),	\
+					 alt, cond)			\
+			     : call_clbr, ASM_CALL_CONSTRAINT		\
+			     : paravirt_type(op),			\
+			       paravirt_clobber(clbr),			\
+			       ##__VA_ARGS__				\
+			     : "memory", "cc" extra_clbr);		\
+		(rettype)(__eax & PVOP_RETMASK(rettype));		\
+	})
+
 #define __PVOP_CALL(rettype, op, ...)					\
 	____PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,	\
 		      EXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALL(rettype, op, alt, cond, ...)			\
+	____PVOP_ALT_CALL(rettype, op, alt, cond, CLBR_ANY,		\
+			  PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS,		\
+			  ##__VA_ARGS__)
+
 #define __PVOP_CALLEESAVE(rettype, op, ...)				\
 	____PVOP_CALL(rettype, op.func, CLBR_RET_REG,			\
 		      PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond, ...)		\
+	____PVOP_ALT_CALL(rettype, op.func, alt, cond, CLBR_RET_REG,	\
+			  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
+
 
 #define ____PVOP_VCALL(op, clbr, call_clbr, extra_clbr, ...)		\
 	({								\
@@ -503,36 +528,73 @@ int paravirt_disable_iospace(void);
 			     : "memory", "cc" extra_clbr);		\
 	})
 
+#define ____PVOP_ALT_VCALL(op, alt, cond, clbr, call_clbr,		\
+			   extra_clbr, ...)				\
+	({								\
+		PVOP_VCALL_ARGS;					\
+		PVOP_TEST_NULL(op);					\
+		asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),	\
+					 alt, cond)			\
+			     : call_clbr, ASM_CALL_CONSTRAINT		\
+			     : paravirt_type(op),			\
+			       paravirt_clobber(clbr),			\
+			       ##__VA_ARGS__				\
+			     : "memory", "cc" extra_clbr);		\
+	})
+
 #define __PVOP_VCALL(op, ...)						\
 	____PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_CLOBBERS,		\
 		       VEXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALL(op, alt, cond, ...)				\
+	____PVOP_ALT_VCALL(op, alt, cond, CLBR_ANY,			\
+			   PVOP_VCALL_CLOBBERS, VEXTRA_CLOBBERS,	\
+			   ##__VA_ARGS__)
+
 #define __PVOP_VCALLEESAVE(op, ...)					\
 	____PVOP_VCALL(op.func, CLBR_RET_REG,				\
 		      PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALLEESAVE(op, alt, cond, ...)			\
+	____PVOP_ALT_VCALL(op.func, alt, cond, CLBR_RET_REG,		\
+			   PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
+
 
 
 #define PVOP_CALL0(rettype, op)						\
 	__PVOP_CALL(rettype, op)
 #define PVOP_VCALL0(op)							\
 	__PVOP_VCALL(op)
+#define PVOP_ALT_CALL0(rettype, op, alt, cond)				\
+	__PVOP_ALT_CALL(rettype, op, alt, cond)
+#define PVOP_ALT_VCALL0(op, alt, cond)					\
+	__PVOP_ALT_VCALL(op, alt, cond)
 
 #define PVOP_CALLEE0(rettype, op)					\
 	__PVOP_CALLEESAVE(rettype, op)
 #define PVOP_VCALLEE0(op)						\
 	__PVOP_VCALLEESAVE(op)
+#define PVOP_ALT_CALLEE0(rettype, op, alt, cond)			\
+	__PVOP_ALT_CALLEESAVE(rettype, op, alt, cond)
+#define PVOP_ALT_VCALLEE0(op, alt, cond)				\
+	__PVOP_ALT_VCALLEESAVE(op, alt, cond)
 
 
 #define PVOP_CALL1(rettype, op, arg1)					\
 	__PVOP_CALL(rettype, op, PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALL1(op, arg1)						\
 	__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1))
+#define PVOP_ALT_VCALL1(op, arg1, alt, cond)				\
+	__PVOP_ALT_VCALL(op, alt, cond, PVOP_CALL_ARG1(arg1))
 
 #define PVOP_CALLEE1(rettype, op, arg1)					\
 	__PVOP_CALLEESAVE(rettype, op, PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALLEE1(op, arg1)						\
 	__PVOP_VCALLEESAVE(op, PVOP_CALL_ARG1(arg1))
+#define PVOP_ALT_CALLEE1(rettype, op, arg1, alt, cond)			\
+	__PVOP_ALT_CALLEESAVE(rettype, op, alt, cond, PVOP_CALL_ARG1(arg1))
+#define PVOP_ALT_VCALLEE1(op, arg1, alt, cond)				\
+	__PVOP_ALT_VCALLEESAVE(op, alt, cond, PVOP_CALL_ARG1(arg1))
 
 
 #define PVOP_CALL2(rettype, op, arg1, arg2)				\
-- 
2.26.2


* [PATCH v2 11/12] x86/paravirt: switch functions with custom code to ALTERNATIVE
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
                   ` (6 preceding siblings ...)
  2020-11-20 11:46 ` [PATCH v2 10/12] x86/paravirt: add new macros PVOP_ALT* supporting pvops in ALTERNATIVEs Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-20 15:46   ` kernel test robot
  2020-11-20 11:46 ` [PATCH v2 12/12] x86/paravirt: have only one paravirt patch function Juergen Gross via Virtualization
  2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
  9 siblings, 1 reply; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, virtualization, linux-kernel
  Cc: Juergen Gross, peterz, VMware, Inc., Ingo Molnar, Borislav Petkov,
	luto, H. Peter Anvin, Thomas Gleixner

Instead of using paravirt patching for custom code sequences, use
ALTERNATIVE for the functions with custom code replacements.

Instead of patching a ud2 instruction into the caller site for
unpopulated vector entries, use a simple function just calling BUG()
as a replacement.
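
The replacement strings are just the native implementations encoded as
instructions; as a sketch of the semantics being inlined (the helper
names here are made up for illustration only):

#include <stdbool.h>

/* "movb $0, (%rdi)": the native queued spinlock unlock is a plain
 * store of 0 to the locked byte. */
static inline void unlock_sketch(unsigned char *locked)
{
	*locked = 0;
}

/* "xor %eax, %eax": on bare metal a vCPU is never preempted, so the
 * call collapses to "return false". */
static inline bool vcpu_is_preempted_sketch(long cpu)
{
	(void)cpu;
	return false;
}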

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt.h       | 73 ++++++++++++++--------
 arch/x86/include/asm/paravirt_types.h |  1 -
 arch/x86/kernel/paravirt.c            | 16 ++---
 arch/x86/kernel/paravirt_patch.c      | 88 ---------------------------
 4 files changed, 54 insertions(+), 124 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 62fbcd899539..500c64d0cfcb 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -108,7 +108,8 @@ static inline void write_cr0(unsigned long x)
 
 static inline unsigned long read_cr2(void)
 {
-	return PVOP_CALLEE0(unsigned long, mmu.read_cr2);
+	return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
+				"mov %%cr2, %%rax;", X86_FEATURE_NOT_XENPV);
 }
 
 static inline void write_cr2(unsigned long x)
@@ -118,12 +119,14 @@ static inline void write_cr2(unsigned long x)
 
 static inline unsigned long __read_cr3(void)
 {
-	return PVOP_CALL0(unsigned long, mmu.read_cr3);
+	return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3,
+			      "mov %%cr3, %%rax;", X86_FEATURE_NOT_XENPV);
 }
 
 static inline void write_cr3(unsigned long x)
 {
-	PVOP_VCALL1(mmu.write_cr3, x);
+	PVOP_ALT_VCALL1(mmu.write_cr3, x,
+			"mov %%rdi, %%cr3", X86_FEATURE_NOT_XENPV);
 }
 
 static inline void __write_cr4(unsigned long x)
@@ -143,7 +146,7 @@ static inline void halt(void)
 
 static inline void wbinvd(void)
 {
-	PVOP_VCALL0(cpu.wbinvd);
+	PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", X86_FEATURE_NOT_XENPV);
 }
 
 static inline u64 paravirt_read_msr(unsigned msr)
@@ -357,22 +360,28 @@ static inline void paravirt_release_p4d(unsigned long pfn)
 
 static inline pte_t __pte(pteval_t val)
 {
-	return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) };
+	return (pte_t) { PVOP_ALT_CALLEE1(pteval_t, mmu.make_pte, val,
+					  "mov %%rdi, %%rax",
+					  X86_FEATURE_NOT_XENPV) };
 }
 
 static inline pteval_t pte_val(pte_t pte)
 {
-	return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
+	return PVOP_ALT_CALLEE1(pteval_t, mmu.pte_val, pte.pte,
+				"mov %%rdi, %%rax", X86_FEATURE_NOT_XENPV);
 }
 
 static inline pgd_t __pgd(pgdval_t val)
 {
-	return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) };
+	return (pgd_t) { PVOP_ALT_CALLEE1(pgdval_t, mmu.make_pgd, val,
+					  "mov %%rdi, %%rax",
+					  X86_FEATURE_NOT_XENPV) };
 }
 
 static inline pgdval_t pgd_val(pgd_t pgd)
 {
-	return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
+	return PVOP_ALT_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd,
+				"mov %%rdi, %%rax", X86_FEATURE_NOT_XENPV);
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -405,12 +414,15 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline pmd_t __pmd(pmdval_t val)
 {
-	return (pmd_t) { PVOP_CALLEE1(pmdval_t, mmu.make_pmd, val) };
+	return (pmd_t) { PVOP_ALT_CALLEE1(pmdval_t, mmu.make_pmd, val,
+					  "mov %%rdi, %%rax",
+					  X86_FEATURE_NOT_XENPV) };
 }
 
 static inline pmdval_t pmd_val(pmd_t pmd)
 {
-	return PVOP_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd);
+	return PVOP_ALT_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd,
+				"mov %%rdi, %%rax", X86_FEATURE_NOT_XENPV);
 }
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
@@ -422,14 +434,16 @@ static inline pud_t __pud(pudval_t val)
 {
 	pudval_t ret;
 
-	ret = PVOP_CALLEE1(pudval_t, mmu.make_pud, val);
+	ret = PVOP_ALT_CALLEE1(pudval_t, mmu.make_pud, val,
+			       "mov %%rdi, %%rax", X86_FEATURE_NOT_XENPV);
 
 	return (pud_t) { ret };
 }
 
 static inline pudval_t pud_val(pud_t pud)
 {
-	return PVOP_CALLEE1(pudval_t, mmu.pud_val, pud.pud);
+	return PVOP_ALT_CALLEE1(pudval_t, mmu.pud_val, pud.pud,
+				"mov %%rdi, %%rax", X86_FEATURE_NOT_XENPV);
 }
 
 static inline void pud_clear(pud_t *pudp)
@@ -448,14 +462,17 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 
 static inline p4d_t __p4d(p4dval_t val)
 {
-	p4dval_t ret = PVOP_CALLEE1(p4dval_t, mmu.make_p4d, val);
+	p4dval_t ret = PVOP_ALT_CALLEE1(p4dval_t, mmu.make_p4d, val,
+					"mov %%rdi, %%rax",
+					X86_FEATURE_NOT_XENPV);
 
 	return (p4d_t) { ret };
 }
 
 static inline p4dval_t p4d_val(p4d_t p4d)
 {
-	return PVOP_CALLEE1(p4dval_t, mmu.p4d_val, p4d.p4d);
+	return PVOP_ALT_CALLEE1(p4dval_t, mmu.p4d_val, p4d.p4d,
+				"mov %%rdi, %%rax", X86_FEATURE_NOT_XENPV);
 }
 
 static inline void __set_pgd(pgd_t *pgdp, pgd_t pgd)
@@ -542,7 +559,9 @@ static __always_inline void pv_queued_spin_lock_slowpath(struct qspinlock *lock,
 
 static __always_inline void pv_queued_spin_unlock(struct qspinlock *lock)
 {
-	PVOP_VCALLEE1(lock.queued_spin_unlock, lock);
+	PVOP_ALT_VCALLEE1(lock.queued_spin_unlock, lock,
+			  "movb $0, (%%" _ASM_ARG1 ");",
+			  X86_FEATURE_NO_PVUNLOCK);
 }
 
 static __always_inline void pv_wait(u8 *ptr, u8 val)
@@ -557,7 +576,9 @@ static __always_inline void pv_kick(int cpu)
 
 static __always_inline bool pv_vcpu_is_preempted(long cpu)
 {
-	return PVOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
+	return PVOP_ALT_CALLEE1(bool, lock.vcpu_is_preempted, cpu,
+				"xor %%" _ASM_AX ", %%" _ASM_AX ";",
+				X86_FEATURE_NO_VCPUPREEMPT);
 }
 
 void __raw_callee_save___native_queued_spin_unlock(struct qspinlock *lock);
@@ -631,17 +652,18 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
 #ifdef CONFIG_PARAVIRT_XXL
 static inline notrace unsigned long arch_local_save_flags(void)
 {
-	return PVOP_CALLEE0(unsigned long, irq.save_fl);
+	return PVOP_ALT_CALLEE0(unsigned long, irq.save_fl,
+				"pushf; pop %%rax;", X86_FEATURE_NOT_XENPV);
 }
 
 static inline notrace void arch_local_irq_disable(void)
 {
-	PVOP_VCALLEE0(irq.irq_disable);
+	PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", X86_FEATURE_NOT_XENPV);
 }
 
 static inline notrace void arch_local_irq_enable(void)
 {
-	PVOP_VCALLEE0(irq.irq_enable);
+	PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", X86_FEATURE_NOT_XENPV);
 }
 
 static inline notrace unsigned long arch_local_irq_save(void)
@@ -726,12 +748,13 @@ extern void default_banner(void);
 		      "jmp xen_iret;", X86_FEATURE_XENPV
 
 #ifdef CONFIG_DEBUG_ENTRY
-#define SAVE_FLAGS(clobbers)                                        \
-	PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),			    \
-		  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);        \
-		  ANNOTATE_RETPOLINE_SAFE;			    \
-		  call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);	    \
-		  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
+#define SAVE_FLAGS(clobbers)						      \
+	ALTERNATIVE(PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),		      \
+			      PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);      \
+			      ANNOTATE_RETPOLINE_SAFE;			      \
+			      call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);      \
+			      PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);), \
+		    "pushf; pop %rax;", X86_FEATURE_NOT_XENPV)
 #endif
 #endif /* CONFIG_PARAVIRT_XXL */
 #endif	/* CONFIG_X86_64 */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 1f8e1b76e78b..481d3b667005 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -323,7 +323,6 @@ extern void (*paravirt_iret)(void);
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
 
-unsigned paravirt_patch_ident_64(void *insn_buff, unsigned len);
 unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, unsigned len);
 unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char *start, const char *end);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 2ab547dd66c3..db6ae7f7c14e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -53,7 +53,10 @@ void __init default_banner(void)
 }
 
 /* Undefined instruction for dealing with missing ops pointers. */
-static const unsigned char ud2a[] = { 0x0f, 0x0b };
+static void paravirt_BUG(void)
+{
+	BUG();
+}
 
 struct branch {
 	unsigned char opcode;
@@ -107,17 +110,10 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
 	unsigned ret;
 
 	if (opfunc == NULL)
-		/* If there's no function, patch it with a ud2a (BUG) */
-		ret = paravirt_patch_insns(insn_buff, len, ud2a, ud2a+sizeof(ud2a));
+		/* If there's no function, patch it with paravirt_BUG() */
+		ret = paravirt_patch_call(insn_buff, paravirt_BUG, addr, len);
 	else if (opfunc == _paravirt_nop)
 		ret = 0;
-
-#ifdef CONFIG_PARAVIRT_XXL
-	/* identity functions just return their single argument */
-	else if (opfunc == _paravirt_ident_64)
-		ret = paravirt_patch_ident_64(insn_buff, len);
-
-#endif
 	else
 		/* Otherwise call the function. */
 		ret = paravirt_patch_call(insn_buff, opfunc, addr, len);
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
index abd27ec67397..10543dcc8211 100644
--- a/arch/x86/kernel/paravirt_patch.c
+++ b/arch/x86/kernel/paravirt_patch.c
@@ -4,96 +4,8 @@
 #include <asm/paravirt.h>
 #include <asm/asm-offsets.h>
 
-#define PSTART(d, m)							\
-	patch_data_##d.m
-
-#define PEND(d, m)							\
-	(PSTART(d, m) + sizeof(patch_data_##d.m))
-
-#define PATCH(d, m, insn_buff, len)						\
-	paravirt_patch_insns(insn_buff, len, PSTART(d, m), PEND(d, m))
-
-#define PATCH_CASE(ops, m, data, insn_buff, len)				\
-	case PARAVIRT_PATCH(ops.m):					\
-		return PATCH(data, ops##_##m, insn_buff, len)
-
-#ifdef CONFIG_PARAVIRT_XXL
-struct patch_xxl {
-	const unsigned char	irq_irq_disable[1];
-	const unsigned char	irq_irq_enable[1];
-	const unsigned char	irq_save_fl[2];
-	const unsigned char	mmu_read_cr2[3];
-	const unsigned char	mmu_read_cr3[3];
-	const unsigned char	mmu_write_cr3[3];
-	const unsigned char	cpu_wbinvd[2];
-	const unsigned char	mov64[3];
-};
-
-static const struct patch_xxl patch_data_xxl = {
-	.irq_irq_disable	= { 0xfa },		// cli
-	.irq_irq_enable		= { 0xfb },		// sti
-	.irq_save_fl		= { 0x9c, 0x58 },	// pushf; pop %[re]ax
-	.mmu_read_cr2		= { 0x0f, 0x20, 0xd0 },	// mov %cr2, %[re]ax
-	.mmu_read_cr3		= { 0x0f, 0x20, 0xd8 },	// mov %cr3, %[re]ax
-	.mmu_write_cr3		= { 0x0f, 0x22, 0xdf },	// mov %rdi, %cr3
-	.cpu_wbinvd		= { 0x0f, 0x09 },	// wbinvd
-	.mov64			= { 0x48, 0x89, 0xf8 },	// mov %rdi, %rax
-};
-
-unsigned int paravirt_patch_ident_64(void *insn_buff, unsigned int len)
-{
-	return PATCH(xxl, mov64, insn_buff, len);
-}
-# endif /* CONFIG_PARAVIRT_XXL */
-
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-struct patch_lock {
-	unsigned char queued_spin_unlock[3];
-	unsigned char vcpu_is_preempted[2];
-};
-
-static const struct patch_lock patch_data_lock = {
-	.vcpu_is_preempted	= { 0x31, 0xc0 },	// xor %eax, %eax
-
-# ifdef CONFIG_X86_64
-	.queued_spin_unlock	= { 0xc6, 0x07, 0x00 },	// movb $0, (%rdi)
-# else
-	.queued_spin_unlock	= { 0xc6, 0x00, 0x00 },	// movb $0, (%eax)
-# endif
-};
-#endif /* CONFIG_PARAVIRT_SPINLOCKS */
-
 unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,
 			  unsigned int len)
 {
-	switch (type) {
-
-#ifdef CONFIG_PARAVIRT_XXL
-	PATCH_CASE(irq, save_fl, xxl, insn_buff, len);
-	PATCH_CASE(irq, irq_enable, xxl, insn_buff, len);
-	PATCH_CASE(irq, irq_disable, xxl, insn_buff, len);
-
-	PATCH_CASE(mmu, read_cr2, xxl, insn_buff, len);
-	PATCH_CASE(mmu, read_cr3, xxl, insn_buff, len);
-	PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len);
-
-	PATCH_CASE(cpu, wbinvd, xxl, insn_buff, len);
-#endif
-
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-	case PARAVIRT_PATCH(lock.queued_spin_unlock):
-		if (pv_is_native_spin_unlock())
-			return PATCH(lock, queued_spin_unlock, insn_buff, len);
-		break;
-
-	case PARAVIRT_PATCH(lock.vcpu_is_preempted):
-		if (pv_is_native_vcpu_is_preempted())
-			return PATCH(lock, vcpu_is_preempted, insn_buff, len);
-		break;
-#endif
-	default:
-		break;
-	}
-
 	return paravirt_patch_default(type, insn_buff, addr, len);
 }
-- 
2.26.2


* [PATCH v2 12/12] x86/paravirt: have only one paravirt patch function
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
                   ` (7 preceding siblings ...)
  2020-11-20 11:46 ` [PATCH v2 11/12] x86/paravirt: switch functions with custom code to ALTERNATIVE Juergen Gross via Virtualization
@ 2020-11-20 11:46 ` Juergen Gross via Virtualization
  2020-11-20 14:18   ` kernel test robot
  2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
  9 siblings, 1 reply; 45+ messages in thread
From: Juergen Gross via Virtualization @ 2020-11-20 11:46 UTC (permalink / raw)
  To: xen-devel, x86, virtualization, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, peterz, VMware, Inc.,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin,
	Thomas Gleixner, Boris Ostrovsky

There is no longer any need to have different paravirt patch
functions for native and Xen. Eliminate native_patch() and rename
paravirt_patch_default() to paravirt_patch().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt_types.h | 19 +------------------
 arch/x86/kernel/Makefile              |  3 +--
 arch/x86/kernel/alternative.c         |  2 +-
 arch/x86/kernel/paravirt.c            |  7 ++-----
 arch/x86/kernel/paravirt_patch.c      | 11 -----------
 arch/x86/xen/enlighten_pv.c           |  1 -
 6 files changed, 5 insertions(+), 38 deletions(-)
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 481d3b667005..bb79e21e1ead 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -74,19 +74,6 @@ struct pv_info {
 	const char *name;
 };
 
-struct pv_init_ops {
-	/*
-	 * Patch may replace one of the defined code sequences with
-	 * arbitrary code, subject to the same register constraints.
-	 * This generally means the code is not free to clobber any
-	 * registers other than EAX.  The patch function should return
-	 * the number of bytes of code generated, as we nop pad the
-	 * rest in generic code.
-	 */
-	unsigned (*patch)(u8 type, void *insn_buff,
-			  unsigned long addr, unsigned len);
-} __no_randomize_layout;
-
 #ifdef CONFIG_PARAVIRT_XXL
 struct pv_lazy_ops {
 	/* Set deferred update mode, used for batching operations. */
@@ -282,7 +269,6 @@ struct pv_lock_ops {
  * number for each function using the offset which we use to indicate
  * what to patch. */
 struct paravirt_patch_template {
-	struct pv_init_ops	init;
 	struct pv_cpu_ops	cpu;
 	struct pv_irq_ops	irq;
 	struct pv_mmu_ops	mmu;
@@ -323,10 +309,7 @@ extern void (*paravirt_iret)(void);
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, unsigned len);
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char *start, const char *end);
-
-unsigned native_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
+unsigned paravirt_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
 
 int paravirt_disable_iospace(void);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 68608bd892c0..61f52f95670b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -35,7 +35,6 @@ KASAN_SANITIZE_sev-es.o					:= n
 KCSAN_SANITIZE := n
 
 OBJECT_FILES_NON_STANDARD_test_nx.o			:= y
-OBJECT_FILES_NON_STANDARD_paravirt_patch.o		:= y
 
 ifdef CONFIG_FRAME_POINTER
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o		:= y
@@ -122,7 +121,7 @@ obj-$(CONFIG_AMD_NB)		+= amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
 
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
-obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch.o
+obj-$(CONFIG_PARAVIRT)		+= paravirt.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index f8f9700719cf..7ed2c3992eb3 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -617,7 +617,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
 		BUG_ON(p->len > MAX_PATCH_LEN);
 		/* prep the buffer with the original instructions */
 		memcpy(insn_buff, p->instr, p->len);
-		used = pv_ops.init.patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
+		used = paravirt_patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
 
 		BUG_ON(used > p->len);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index db6ae7f7c14e..f05404844245 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -99,8 +99,8 @@ void __init native_pv_lock_init(void)
 		static_branch_disable(&virt_spin_lock_key);
 }
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff,
-				unsigned long addr, unsigned len)
+unsigned paravirt_patch(u8 type, void *insn_buff, unsigned long addr,
+			unsigned len)
 {
 	/*
 	 * Neat trick to map patch type back to the call within the
@@ -255,9 +255,6 @@ struct pv_info pv_info = {
 #define PTE_IDENT	__PV_IS_CALLEE_SAVE(_paravirt_ident_64)
 
 struct paravirt_patch_template pv_ops = {
-	/* Init ops. */
-	.init.patch		= native_patch,
-
 	/* Cpu ops. */
 	.cpu.io_delay		= native_io_delay,
 
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
deleted file mode 100644
index 10543dcc8211..000000000000
--- a/arch/x86/kernel/paravirt_patch.c
+++ /dev/null
@@ -1,11 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/stringify.h>
-
-#include <asm/paravirt.h>
-#include <asm/asm-offsets.h>
-
-unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,
-			  unsigned int len)
-{
-	return paravirt_patch_default(type, insn_buff, addr, len);
-}
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4716383c64a9..66f83de4d9e0 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1218,7 +1218,6 @@ asmlinkage __visible void __init xen_start_kernel(void)
 
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
-	pv_ops.init.patch = paravirt_patch_default;
 	pv_ops.cpu = xen_cpu_ops;
 	paravirt_iret = xen_iret;
 	xen_init_irq_ops();
-- 
2.26.2


* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-20 11:46 ` [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf Juergen Gross via Virtualization
@ 2020-11-20 11:59   ` Peter Zijlstra
  2020-11-20 12:05     ` Jürgen Groß via Virtualization
                       ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Peter Zijlstra @ 2020-11-20 11:59 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, VMware, Inc., x86, linux-kernel,
	virtualization, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Boris Ostrovsky

On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> +static __always_inline void arch_local_irq_restore(unsigned long flags)
> +{
> +	if (!arch_irqs_disabled_flags(flags))
> +		arch_local_irq_enable();
> +}

If someone were to write horrible code like:

	local_irq_disable();
	local_irq_save(flags);
	local_irq_enable();
	local_irq_restore(flags);

we'd be up some creek without a paddle... now I don't _think_ we have
genius code like that, but I'd feel safer if we can haz an assertion in
there somewhere...

Maybe something like:

#ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
	WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
#endif

At the end?

* Re: [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call()
  2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross via Virtualization
@ 2020-11-20 12:01   ` Peter Zijlstra
  2020-11-20 12:07     ` Jürgen Groß via Virtualization
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2020-11-20 12:01 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	linux-kernel, Sean Christopherson, Paolo Bonzini,
	Daniel Bristot de Oliveira

On Fri, Nov 20, 2020 at 12:46:24PM +0100, Juergen Gross wrote:
> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Due to include hell this requires splitting out the time interfaces
> into a new header file.

There's also this patch floating around; just in case that would come in
handy:

  https://lkml.kernel.org/r/20201110005609.40989-3-frederic@kernel.org

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-20 11:59   ` Peter Zijlstra
@ 2020-11-20 12:05     ` Jürgen Groß via Virtualization
  2020-11-22  6:55     ` Jürgen Groß via Virtualization
  2020-12-09 18:15     ` Mark Rutland
  2 siblings, 0 replies; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-11-20 12:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stefano Stabellini, VMware, Inc., x86, linux-kernel,
	virtualization, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Boris Ostrovsky


On 20.11.20 12:59, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
>> +static __always_inline void arch_local_irq_restore(unsigned long flags)
>> +{
>> +	if (!arch_irqs_disabled_flags(flags))
>> +		arch_local_irq_enable();
>> +}
> 
> If someone were to write horrible code like:
> 
> 	local_irq_disable();
> 	local_irq_save(flags);
> 	local_irq_enable();
> 	local_irq_restore(flags);
> 
> we'd be up some creek without a paddle... now I don't _think_ we have
> genius code like that, but I'd feel safer if we can haz an assertion in
> there somewhere...
> 
> Maybe something like:
> 
> #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
> 	WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
> #endif
> 
> At the end?
> 

I'd be fine with that. I didn't add something like that because I
couldn't find a suitable CONFIG_ :-)


Juergen


* Re: [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call()
  2020-11-20 12:01   ` Peter Zijlstra
@ 2020-11-20 12:07     ` Jürgen Groß via Virtualization
  0 siblings, 0 replies; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-11-20 12:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	linux-kernel, Sean Christopherson, Paolo Bonzini,
	Daniel Bristot de Oliveira


On 20.11.20 13:01, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:24PM +0100, Juergen Gross wrote:
>> The time pvops functions are the only ones left which might be
>> used in 32-bit mode and which return a 64-bit value.
>>
>> Switch them to use the static_call() mechanism instead of pvops, as
>> this allows quite some simplification of the pvops implementation.
>>
>> Due to include hell this requires splitting out the time interfaces
>> into a new header file.
> 
> There's also this patch floating around; just in case that would come in
> handy:
> 
>    https://lkml.kernel.org/r/20201110005609.40989-3-frederic@kernel.org
> 

Ah, yes. This would make life much easier.


Juergen


* Re: [PATCH v2 08/12] x86/paravirt: remove no longer needed 32-bit pvops cruft
  2020-11-20 11:46 ` [PATCH v2 08/12] x86/paravirt: remove no longer needed 32-bit pvops cruft Juergen Gross via Virtualization
@ 2020-11-20 12:08   ` Peter Zijlstra
  2020-11-20 12:16     ` Jürgen Groß via Virtualization
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2020-11-20 12:08 UTC (permalink / raw)
  To: Juergen Gross
  Cc: VMware, Inc., x86, linux-kernel, virtualization, Ingo Molnar,
	Borislav Petkov, luto, H. Peter Anvin, xen-devel, Thomas Gleixner

On Fri, Nov 20, 2020 at 12:46:26PM +0100, Juergen Gross wrote:
> +#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, ...)	\
>  	({								\
>  		PVOP_CALL_ARGS;						\
>  		PVOP_TEST_NULL(op);					\
> +		BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));	\
> +		asm volatile(paravirt_alt(PARAVIRT_CALL)		\
> +			     : call_clbr, ASM_CALL_CONSTRAINT		\
> +			     : paravirt_type(op),			\
> +			       paravirt_clobber(clbr),			\
> +			       ##__VA_ARGS__				\
> +			     : "memory", "cc" extra_clbr);		\
> +		(rettype)(__eax & PVOP_RETMASK(rettype));		\
>  	})

This is now very similar to ____PVOP_VCALL() (note how PVOP_CALL_ARGS is
PVOP_VCALL_ARGS).

Could we get away with doing something horrible like:

#define ____PVOP_VCALL(X...) (void)____PVOP_CALL(long, X)

?
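
For illustration, a user-space reduction of that construct (dummy_op
is a made-up stand-in and the real macros wrap inline asm, but the
discard-via-cast trick is the same; builds with gcc, which supports
the statement expressions the kernel relies on):

#include <stdio.h>

/* Reduced stand-in for the value-returning pvops call macro. */
#define ____PVOP_CALL(rettype, op, ...)				\
	({							\
		rettype __ret = (rettype)op(__VA_ARGS__);	\
		__ret;						\
	})

/* The void flavour evaluates the value-returning variant purely for
 * its side effects and throws the result away. */
#define ____PVOP_VCALL(X...) ((void)____PVOP_CALL(long, X))

static long dummy_op(long arg)
{
	printf("op called with %ld\n", arg);
	return arg;
}

int main(void)
{
	long r = ____PVOP_CALL(long, dummy_op, 42);	/* result used */
	____PVOP_VCALL(dummy_op, r);			/* result discarded */
	return 0;
}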

* Re: [PATCH v2 08/12] x86/paravirt: remove no longer needed 32-bit pvops cruft
  2020-11-20 12:08   ` Peter Zijlstra
@ 2020-11-20 12:16     ` Jürgen Groß via Virtualization
  0 siblings, 0 replies; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-11-20 12:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: VMware, Inc., x86, linux-kernel, virtualization, Ingo Molnar,
	Borislav Petkov, luto, H. Peter Anvin, xen-devel, Thomas Gleixner


On 20.11.20 13:08, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:26PM +0100, Juergen Gross wrote:
>> +#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, ...)	\
>>   	({								\
>>   		PVOP_CALL_ARGS;						\
>>   		PVOP_TEST_NULL(op);					\
>> +		BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));	\
>> +		asm volatile(paravirt_alt(PARAVIRT_CALL)		\
>> +			     : call_clbr, ASM_CALL_CONSTRAINT		\
>> +			     : paravirt_type(op),			\
>> +			       paravirt_clobber(clbr),			\
>> +			       ##__VA_ARGS__				\
>> +			     : "memory", "cc" extra_clbr);		\
>> +		(rettype)(__eax & PVOP_RETMASK(rettype));		\
>>   	})
> 
> This is now very similar to ____PVOP_VCALL() (note how PVOP_CALL_ARGS is
> PVOP_VCALL_ARGS).
> 
> Could we get away with doing something horrible like:
> 
> #define ____PVOP_VCALL(X...) (void)____PVOP_CALL(long, X)
> 
> ?

Oh, indeed. And in patch 9 the same could be done for the ALT variants.


Juergen


* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
                   ` (8 preceding siblings ...)
  2020-11-20 11:46 ` [PATCH v2 12/12] x86/paravirt: have only one paravirt patch function Juergen Gross via Virtualization
@ 2020-11-20 12:53 ` Peter Zijlstra
  2020-11-23 13:43   ` Peter Zijlstra
  9 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2020-11-20 12:53 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	linux-kernel, Sean Christopherson, Paolo Bonzini,
	Daniel Bristot de Oliveira

On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
>  30 files changed, 325 insertions(+), 598 deletions(-)

Much awesome ! I'll try and get that objtool thing sorted.

* Re: [PATCH v2 12/12] x86/paravirt: have only one paravirt patch function
  2020-11-20 11:46 ` [PATCH v2 12/12] x86/paravirt: have only one paravirt patch function Juergen Gross via Virtualization
@ 2020-11-20 14:18   ` kernel test robot
  0 siblings, 0 replies; 45+ messages in thread
From: kernel test robot @ 2020-11-20 14:18 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, x86, virtualization, linux-kernel
  Cc: Juergen Gross, VMware, Inc., kbuild-all, peterz,
	clang-built-linux, luto, Thomas Gleixner

Hi Juergen,

I love your patch! Perhaps something to improve:

[auto build test WARNING on v5.10-rc4]
[also build test WARNING on next-20201120]
[cannot apply to tip/x86/core xen-tip/linux-next tip/x86/asm]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Juergen-Gross/x86-major-paravirt-cleanup/20201120-194934
base:    09162bc32c880a791c6c0668ce0745cf7958f576
config: x86_64-randconfig-r014-20201120 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 3ded927cf80ac519f9f9c4664fef08787f7c537d)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/340df5e02c66ec37486a1f31e6497a22dab65059
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Juergen-Gross/x86-major-paravirt-cleanup/20201120-194934
        git checkout 340df5e02c66ec37486a1f31e6497a22dab65059
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/paravirt.c:124:10: warning: no previous prototype for function 'paravirt_patch_insns' [-Wmissing-prototypes]
   unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
            ^
   arch/x86/kernel/paravirt.c:124:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
   ^
   static 
   1 warning generated.

vim +/paravirt_patch_insns +124 arch/x86/kernel/paravirt.c

63f70270ccd981c arch/i386/kernel/paravirt.c Jeremy Fitzhardinge 2007-05-02  123  
1fc654cf6e04b40 arch/x86/kernel/paravirt.c  Ingo Molnar         2019-04-25 @124  unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
63f70270ccd981c arch/i386/kernel/paravirt.c Jeremy Fitzhardinge 2007-05-02  125  			      const char *start, const char *end)
63f70270ccd981c arch/i386/kernel/paravirt.c Jeremy Fitzhardinge 2007-05-02  126  {
63f70270ccd981c arch/i386/kernel/paravirt.c Jeremy Fitzhardinge 2007-05-02  127  	unsigned insn_len = end - start;
63f70270ccd981c arch/i386/kernel/paravirt.c Jeremy Fitzhardinge 2007-05-02  128  
2777cae2b19d4a0 arch/x86/kernel/paravirt.c  Ingo Molnar         2019-04-25  129  	/* Alternative instruction is too large for the patch site and we cannot continue: */
2777cae2b19d4a0 arch/x86/kernel/paravirt.c  Ingo Molnar         2019-04-25  130  	BUG_ON(insn_len > len || start == NULL);
2777cae2b19d4a0 arch/x86/kernel/paravirt.c  Ingo Molnar         2019-04-25  131  
1fc654cf6e04b40 arch/x86/kernel/paravirt.c  Ingo Molnar         2019-04-25  132  	memcpy(insn_buff, start, insn_len);
139ec7c416248b9 arch/i386/kernel/paravirt.c Rusty Russell       2006-12-07  133  
139ec7c416248b9 arch/i386/kernel/paravirt.c Rusty Russell       2006-12-07  134  	return insn_len;
139ec7c416248b9 arch/i386/kernel/paravirt.c Rusty Russell       2006-12-07  135  }
139ec7c416248b9 arch/i386/kernel/paravirt.c Rusty Russell       2006-12-07  136  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [PATCH v2 11/12] x86/paravirt: switch functions with custom code to ALTERNATIVE
  2020-11-20 11:46 ` [PATCH v2 11/12] x86/paravirt: switch functions with custom code to ALTERNATIVE Juergen Gross via Virtualization
@ 2020-11-20 15:46   ` kernel test robot
  0 siblings, 0 replies; 45+ messages in thread
From: kernel test robot @ 2020-11-20 15:46 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, x86, virtualization, linux-kernel
  Cc: Juergen Gross, kbuild-all, peterz, VMware, Inc., luto,
	Thomas Gleixner

Hi Juergen,

I love your patch! Perhaps something to improve:

[auto build test WARNING on v5.10-rc4]
[also build test WARNING on next-20201120]
[cannot apply to tip/x86/core xen-tip/linux-next tip/x86/asm]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Juergen-Gross/x86-major-paravirt-cleanup/20201120-194934
base:    09162bc32c880a791c6c0668ce0745cf7958f576
config: x86_64-rhel-7.6-kselftests (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/fd8d46a7a2c51313ee14c43af60ff337279384ef
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Juergen-Gross/x86-major-paravirt-cleanup/20201120-194934
        git checkout fd8d46a7a2c51313ee14c43af60ff337279384ef
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/misc/lkdtm/bugs.o: warning: objtool: .altinstr_replacement+0x0: alternative modifies stack

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-20 11:59   ` Peter Zijlstra
  2020-11-20 12:05     ` Jürgen Groß via Virtualization
@ 2020-11-22  6:55     ` Jürgen Groß via Virtualization
  2020-11-22 21:44       ` Andy Lutomirski
  2020-12-09 18:15     ` Mark Rutland
  2 siblings, 1 reply; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-11-22  6:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stefano Stabellini, VMware, Inc., x86, linux-kernel,
	virtualization, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Boris Ostrovsky


On 20.11.20 12:59, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
>> +static __always_inline void arch_local_irq_restore(unsigned long flags)
>> +{
>> +	if (!arch_irqs_disabled_flags(flags))
>> +		arch_local_irq_enable();
>> +}
> 
> If someone were to write horrible code like:
> 
> 	local_irq_disable();
> 	local_irq_save(flags);
> 	local_irq_enable();
> 	local_irq_restore(flags);
> 
> we'd be up some creek without a paddle... now I don't _think_ we have
> genius code like that, but I'd feel safer if we can haz an assertion in
> there somewhere...
> 
> Maybe something like:
> 
> #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
> 	WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
> #endif
> 
> At the end?

I'd like to, but using WARN_ON_ONCE() in include/asm/irqflags.h sounds
like a perfect recipe for include dependency hell.

We could use a plain asm("ud2") instead.
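
A minimal sketch of that, assuming Peter's X86_EFLAGS_IF check from
above:

static __always_inline void arch_local_irq_restore(unsigned long flags)
{
	if (!arch_irqs_disabled_flags(flags))
		arch_local_irq_enable();
#ifdef CONFIG_DEBUG_ENTRY
	/* Trap on a mismatched restore; no WARN machinery needed here. */
	if (unlikely((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF))
		asm volatile("ud2");
#endif
}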


Juergen

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-22  6:55     ` Jürgen Groß via Virtualization
@ 2020-11-22 21:44       ` Andy Lutomirski
  2020-11-23  5:21         ` Jürgen Groß via Virtualization
  2020-12-09 13:27         ` Mark Rutland
  0 siblings, 2 replies; 45+ messages in thread
From: Andy Lutomirski @ 2020-11-22 21:44 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Stefano Stabellini, Peter Zijlstra, X86 ML, LKML,
	Linux Virtualization, VMware, Inc., Ingo Molnar, Borislav Petkov,
	Andrew Lutomirski, H. Peter Anvin, xen-devel, Thomas Gleixner,
	Boris Ostrovsky

On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß <jgross@suse.com> wrote:
>
> On 20.11.20 12:59, Peter Zijlstra wrote:
> > On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> >> +static __always_inline void arch_local_irq_restore(unsigned long flags)
> >> +{
> >> +    if (!arch_irqs_disabled_flags(flags))
> >> +            arch_local_irq_enable();
> >> +}
> >
> > If someone were to write horrible code like:
> >
> >       local_irq_disable();
> >       local_irq_save(flags);
> >       local_irq_enable();
> >       local_irq_restore(flags);
> >
> > we'd be up some creek without a paddle... now I don't _think_ we have
> > genius code like that, but I'd feel safer if we can haz an assertion in
> > there somewhere...
> >
> > Maybe something like:
> >
> > #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
> >       WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
> > #endif
> >
> > At the end?
>
> I'd like to, but using WARN_ON_ONCE() in include/asm/irqflags.h sounds
> like a perfect recipe for include dependency hell.
>
> We could use a plain asm("ud2") instead.

How about out-of-lining it:

#ifdef CONFIG_DEBUG_ENTRY
extern void warn_bogus_irqrestore(void);
#endif

static __always_inline void arch_local_irq_restore(unsigned long flags)
{
       if (!arch_irqs_disabled_flags(flags)) {
               arch_local_irq_enable();
       } else {
#ifdef CONFIG_DEBUG_ENTRY
               if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
                    warn_bogus_irqrestore();
#endif
       }
}
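
The helper itself could then live somewhere out of line where
WARN_ON_ONCE() is usable, e.g. (file placement hypothetical):

/* e.g. arch/x86/kernel/irqflags.c, keeping the WARN machinery out of
 * include/asm/irqflags.h entirely: */
#include <linux/bug.h>

void warn_bogus_irqrestore(void)
{
	WARN_ON_ONCE(1);
}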

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-22 21:44       ` Andy Lutomirski
@ 2020-11-23  5:21         ` Jürgen Groß via Virtualization
  2020-11-23 15:15           ` Andy Lutomirski
  2020-12-09 13:27         ` Mark Rutland
  1 sibling, 1 reply; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-11-23  5:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stefano Stabellini, Peter Zijlstra, X86 ML, LKML,
	Linux Virtualization, VMware, Inc., Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Boris Ostrovsky


On 22.11.20 22:44, Andy Lutomirski wrote:
> On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß <jgross@suse.com> wrote:
>>
>> On 20.11.20 12:59, Peter Zijlstra wrote:
>>> On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
>>>> +static __always_inline void arch_local_irq_restore(unsigned long flags)
>>>> +{
>>>> +    if (!arch_irqs_disabled_flags(flags))
>>>> +            arch_local_irq_enable();
>>>> +}
>>>
>>> If someone were to write horrible code like:
>>>
>>>        local_irq_disable();
>>>        local_irq_save(flags);
>>>        local_irq_enable();
>>>        local_irq_restore(flags);
>>>
>>> we'd be up some creek without a paddle... now I don't _think_ we have
>>> genius code like that, but I'd feel safer if we can haz an assertion in
>>> there somewhere...
>>>
>>> Maybe something like:
>>>
>>> #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
>>>        WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
>>> #endif
>>>
>>> At the end?
>>
>> I'd like to, but using WARN_ON_ONCE() in include/asm/irqflags.h sounds
>> like a perfect recipe for include dependency hell.
>>
>> We could use a plain asm("ud2") instead.
> 
> How about out-of-lining it:
> 
> #ifdef CONFIG_DEBUG_ENTRY
> extern void warn_bogus_irqrestore(void);
> #endif
> 
> static __always_inline void arch_local_irq_restore(unsigned long flags)
> {
>         if (!arch_irqs_disabled_flags(flags)) {
>                 arch_local_irq_enable();
>         } else {
> #ifdef CONFIG_DEBUG_ENTRY
>                 if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
>                      warn_bogus_irqrestore();
> #endif
>         }
> }
> 

This couldn't be a WARN_ON_ONCE() then (or it would be a catch all).
Another approach might be to open-code the WARN_ON_ONCE(), like:

#ifdef CONFIG_DEBUG_ENTRY
extern void warn_bogus_irqrestore(bool *once);
#endif

static __always_inline void arch_local_irq_restore(unsigned long flags)
{
	if (!arch_irqs_disabled_flags(flags))
		arch_local_irq_enable();
#ifdef CONFIG_DEBUG_ENTRY
	else {
		static bool once;

		if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
			warn_bogus_irqrestore(&once);
	}
#endif
}
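
with the matching out-of-line side looking roughly like (placement
again hypothetical):

#include <linux/bug.h>
#include <linux/types.h>

/* The once flag lives at the call site, so each site warns
 * independently instead of the first hit eating the warning for
 * everyone: */
void warn_bogus_irqrestore(bool *once)
{
	if (*once)
		return;
	*once = true;
	WARN(1, "bogus irqrestore: flags say disabled, IRQs were enabled\n");
}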


Juergen

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
@ 2020-11-23 13:43   ` Peter Zijlstra
  2020-12-15 11:42     ` Jürgen Groß via Virtualization
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2020-11-23 13:43 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Josh Poimboeuf, Vincent Guittot, Thomas Gleixner,
	Dietmar Eggemann, Jim Mattson, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira

On Fri, Nov 20, 2020 at 01:53:42PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
> >  30 files changed, 325 insertions(+), 598 deletions(-)
> 
> Much awesome ! I'll try and get that objtool thing sorted.

This seems to work for me. It isn't 100% accurate, because it doesn't
know about the direct call instruction, but I can either fudge that or
the switch to static_call() will cure it.

It's not exactly pretty, but it should be straightforward.
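
For reference, a minimal static_call() shape (hypothetical names),
which leaves objtool an ordinary direct call to reason about:

#include <linux/static_call.h>
#include <linux/types.h>

static u64 native_clock_sketch(void) { return 0; }

DEFINE_STATIC_CALL(clock_sketch, native_clock_sketch);

static inline u64 read_clock(void)
{
	/* compiles to a patchable direct call, not an indirect one */
	return static_call(clock_sketch)();
}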

Index: linux-2.6/tools/objtool/check.c
===================================================================
--- linux-2.6.orig/tools/objtool/check.c
+++ linux-2.6/tools/objtool/check.c
@@ -1090,6 +1090,32 @@ static int handle_group_alt(struct objto
 		return -1;
 	}
 
+	/*
+	 * Add the filler NOP, required for alternative CFI.
+	 */
+	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
+		struct instruction *nop = malloc(sizeof(*nop));
+		if (!nop) {
+			WARN("malloc failed");
+			return -1;
+		}
+		memset(nop, 0, sizeof(*nop));
+		INIT_LIST_HEAD(&nop->alts);
+		INIT_LIST_HEAD(&nop->stack_ops);
+		init_cfi_state(&nop->cfi);
+
+		nop->sec = last_new_insn->sec;
+		nop->ignore = last_new_insn->ignore;
+		nop->func = last_new_insn->func;
+		nop->alt_group = alt_group;
+		nop->offset = last_new_insn->offset + last_new_insn->len;
+		nop->type = INSN_NOP;
+		nop->len = special_alt->orig_len - special_alt->new_len;
+
+		list_add(&nop->list, &last_new_insn->list);
+		last_new_insn = nop;
+	}
+
 	if (fake_jump)
 		list_add(&fake_jump->list, &last_new_insn->list);
 
@@ -2190,18 +2216,12 @@ static int handle_insn_ops(struct instru
 	struct stack_op *op;
 
 	list_for_each_entry(op, &insn->stack_ops, list) {
-		struct cfi_state old_cfi = state->cfi;
 		int res;
 
 		res = update_cfi_state(insn, &state->cfi, op);
 		if (res)
 			return res;
 
-		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
-			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
-			return -1;
-		}
-
 		if (op->dest.type == OP_DEST_PUSHF) {
 			if (!state->uaccess_stack) {
 				state->uaccess_stack = 1;
@@ -2399,19 +2419,137 @@ static int validate_return(struct symbol
  * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
  * states which then results in ORC entries, which we just said we didn't want.
  *
- * Avoid them by copying the CFI entry of the first instruction into the whole
- * alternative.
+ * Avoid them by copying the CFI entry of the first instruction into the hole.
  */
-static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
+static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
 {
 	struct instruction *first_insn = insn;
 	int alt_group = insn->alt_group;
 
-	sec_for_each_insn_continue(file, insn) {
+	sec_for_each_insn_from(file, insn) {
 		if (insn->alt_group != alt_group)
 			break;
-		insn->cfi = first_insn->cfi;
+
+		if (!insn->visited)
+			insn->cfi = first_insn->cfi;
+	}
+}
+
+static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	__fill_alt_cfi(file, alt_insn);
+
+	list_for_each_entry(alt, &alt_insn->alts, list)
+		__fill_alt_cfi(file, alt->insn);
+}
+
+static struct instruction *
+__find_unwind(struct objtool_file *file,
+	      struct instruction *insn, unsigned long offset)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *next;
+	unsigned long off = 0;
+
+	while ((off + insn->len) <= offset) {
+		next = next_insn_same_sec(file, insn);
+		if (next && next->alt_group != alt_group)
+			next = NULL;
+
+		if (!next)
+			break;
+
+		off += insn->len;
+		insn = next;
 	}
+
+	return insn;
+}
+
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset)
+{
+	struct instruction *fit;
+	struct alternative *alt;
+	unsigned long fit_off;
+
+	fit = __find_unwind(file, alt_insn, offset);
+	fit_off = (fit->offset - alt_insn->offset);
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		struct instruction *x;
+		unsigned long x_off;
+
+		x = __find_unwind(file, alt->insn, offset);
+		x_off = (x->offset - alt->insn->offset);
+
+		if (fit_off < x_off) {
+			fit = x;
+			fit_off = x_off;
+
+		} else if (fit_off == x_off &&
+			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(fit->sec, fit->offset);
+			char *_str2 = offstr(x->sec, x->offset);
+			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
+			free(_str1);
+			free(_str2);
+			return fit;
+		}
+	}
+
+	return fit;
+}
+
+static int __validate_unwind(struct objtool_file *file,
+			     struct instruction *alt_insn,
+			     struct instruction *insn)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *unwind;
+	unsigned long offset = 0;
+
+	sec_for_each_insn_from(file, insn) {
+		if (insn->alt_group != alt_group)
+			break;
+
+		unwind = find_alt_unwind(file, alt_insn, offset);
+
+		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(insn->sec, insn->offset);
+			char *_str2 = offstr(unwind->sec, unwind->offset);
+			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
+			     _str1, _str2, offset);
+			free(_str1);
+			free(_str2);
+			return 1;
+		}
+
+		offset += insn->len;
+	}
+
+	return 0;
+}
+
+static int validate_alt_unwind(struct objtool_file *file,
+			       struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	if (__validate_unwind(file, alt_insn, alt_insn))
+		return 1;
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		if (__validate_unwind(file, alt_insn, alt->insn))
+			return 1;
+	}
+
+	return 0;
 }
 
 /*
@@ -2423,9 +2561,10 @@ static void fill_alternative_cfi(struct
 static int validate_branch(struct objtool_file *file, struct symbol *func,
 			   struct instruction *insn, struct insn_state state)
 {
+	struct instruction *next_insn, *alt_insn = NULL;
 	struct alternative *alt;
-	struct instruction *next_insn;
 	struct section *sec;
+	int alt_group = 0;
 	u8 visited;
 	int ret;
 
@@ -2480,8 +2619,10 @@ static int validate_branch(struct objtoo
 				}
 			}
 
-			if (insn->alt_group)
-				fill_alternative_cfi(file, insn);
+			if (insn->alt_group) {
+				alt_insn = insn;
+				alt_group = insn->alt_group;
+			}
 
 			if (skip_orig)
 				return 0;
@@ -2613,6 +2754,17 @@ static int validate_branch(struct objtoo
 		}
 
 		insn = next_insn;
+
+		if (alt_insn && insn->alt_group != alt_group) {
+			alt_insn->alt_end = insn;
+
+			fill_alt_cfi(file, alt_insn);
+
+			if (validate_alt_unwind(file, alt_insn))
+				return 1;
+
+			alt_insn = NULL;
+		}
 	}
 
 	return 0;
Index: linux-2.6/tools/objtool/check.h
===================================================================
--- linux-2.6.orig/tools/objtool/check.h
+++ linux-2.6/tools/objtool/check.h
@@ -40,6 +40,7 @@ struct instruction {
 	struct instruction *first_jump_src;
 	struct reloc *jump_table;
 	struct list_head alts;
+	struct instruction *alt_end;
 	struct symbol *func;
 	struct list_head stack_ops;
 	struct cfi_state cfi;
@@ -54,6 +55,10 @@ static inline bool is_static_jump(struct
 	       insn->type == INSN_JUMP_UNCONDITIONAL;
 }
 
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset);
+
 struct instruction *find_insn(struct objtool_file *file,
 			      struct section *sec, unsigned long offset);
 
Index: linux-2.6/tools/objtool/orc_gen.c
===================================================================
--- linux-2.6.orig/tools/objtool/orc_gen.c
+++ linux-2.6/tools/objtool/orc_gen.c
@@ -12,75 +12,86 @@
 #include "check.h"
 #include "warn.h"
 
-int create_orc(struct objtool_file *file)
+static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
 {
-	struct instruction *insn;
+	struct orc_entry *orc = &insn->orc;
+	struct cfi_reg *cfa = &insn->cfi.cfa;
+	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+
+	orc->end = insn->cfi.end;
+
+	if (cfa->base == CFI_UNDEFINED) {
+		orc->sp_reg = ORC_REG_UNDEFINED;
+		return 0;
+	}
 
-	for_each_insn(file, insn) {
-		struct orc_entry *orc = &insn->orc;
-		struct cfi_reg *cfa = &insn->cfi.cfa;
-		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+	switch (cfa->base) {
+	case CFI_SP:
+		orc->sp_reg = ORC_REG_SP;
+		break;
+	case CFI_SP_INDIRECT:
+		orc->sp_reg = ORC_REG_SP_INDIRECT;
+		break;
+	case CFI_BP:
+		orc->sp_reg = ORC_REG_BP;
+		break;
+	case CFI_BP_INDIRECT:
+		orc->sp_reg = ORC_REG_BP_INDIRECT;
+		break;
+	case CFI_R10:
+		orc->sp_reg = ORC_REG_R10;
+		break;
+	case CFI_R13:
+		orc->sp_reg = ORC_REG_R13;
+		break;
+	case CFI_DI:
+		orc->sp_reg = ORC_REG_DI;
+		break;
+	case CFI_DX:
+		orc->sp_reg = ORC_REG_DX;
+		break;
+	default:
+		WARN_FUNC("unknown CFA base reg %d",
+			  insn->sec, insn->offset, cfa->base);
+		return -1;
+	}
 
-		if (!insn->sec->text)
-			continue;
+	switch(bp->base) {
+	case CFI_UNDEFINED:
+		orc->bp_reg = ORC_REG_UNDEFINED;
+		break;
+	case CFI_CFA:
+		orc->bp_reg = ORC_REG_PREV_SP;
+		break;
+	case CFI_BP:
+		orc->bp_reg = ORC_REG_BP;
+		break;
+	default:
+		WARN_FUNC("unknown BP base reg %d",
+			  insn->sec, insn->offset, bp->base);
+		return -1;
+	}
 
-		orc->end = insn->cfi.end;
+	orc->sp_offset = cfa->offset;
+	orc->bp_offset = bp->offset;
+	orc->type = insn->cfi.type;
 
-		if (cfa->base == CFI_UNDEFINED) {
-			orc->sp_reg = ORC_REG_UNDEFINED;
-			continue;
-		}
+	return 0;
+}
 
-		switch (cfa->base) {
-		case CFI_SP:
-			orc->sp_reg = ORC_REG_SP;
-			break;
-		case CFI_SP_INDIRECT:
-			orc->sp_reg = ORC_REG_SP_INDIRECT;
-			break;
-		case CFI_BP:
-			orc->sp_reg = ORC_REG_BP;
-			break;
-		case CFI_BP_INDIRECT:
-			orc->sp_reg = ORC_REG_BP_INDIRECT;
-			break;
-		case CFI_R10:
-			orc->sp_reg = ORC_REG_R10;
-			break;
-		case CFI_R13:
-			orc->sp_reg = ORC_REG_R13;
-			break;
-		case CFI_DI:
-			orc->sp_reg = ORC_REG_DI;
-			break;
-		case CFI_DX:
-			orc->sp_reg = ORC_REG_DX;
-			break;
-		default:
-			WARN_FUNC("unknown CFA base reg %d",
-				  insn->sec, insn->offset, cfa->base);
-			return -1;
-		}
+int create_orc(struct objtool_file *file)
+{
+	struct instruction *insn;
 
-		switch(bp->base) {
-		case CFI_UNDEFINED:
-			orc->bp_reg = ORC_REG_UNDEFINED;
-			break;
-		case CFI_CFA:
-			orc->bp_reg = ORC_REG_PREV_SP;
-			break;
-		case CFI_BP:
-			orc->bp_reg = ORC_REG_BP;
-			break;
-		default:
-			WARN_FUNC("unknown BP base reg %d",
-				  insn->sec, insn->offset, bp->base);
-			return -1;
-		}
+	for_each_insn(file, insn) {
+		int ret;
+	       
+		if (!insn->sec->text)
+			continue;
 
-		orc->sp_offset = cfa->offset;
-		orc->bp_offset = bp->offset;
-		orc->type = insn->cfi.type;
+		ret = create_orc_insn(file, insn);
+		if (ret)
+			return ret;
 	}
 
 	return 0;
@@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					/* XXX: skipped earlier ! */
+					create_orc_insn(file, unwind);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+						idx++;
+//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn ||
 			    memcmp(&insn->orc, &prev_insn->orc,
 				   sizeof(struct orc_entry))) {
@@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+
+						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
+								     insn->sec, insn->offset + offset,
+								     &unwind->orc))
+							return -1;
+
+						idx++;
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
 						 sizeof(struct orc_entry))) {
 
^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-23  5:21         ` Jürgen Groß via Virtualization
@ 2020-11-23 15:15           ` Andy Lutomirski
  0 siblings, 0 replies; 45+ messages in thread
From: Andy Lutomirski @ 2020-11-23 15:15 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Stefano Stabellini, Peter Zijlstra, X86 ML, LKML,
	Linux Virtualization, VMware, Inc., Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, H. Peter Anvin, xen-devel, Thomas Gleixner,
	Boris Ostrovsky

> On Nov 22, 2020, at 9:22 PM, Jürgen Groß <jgross@suse.com> wrote:
> 
> On 22.11.20 22:44, Andy Lutomirski wrote:
>>> On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß <jgross@suse.com> wrote:
>>> 
>>> On 20.11.20 12:59, Peter Zijlstra wrote:
>>>> On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
>>>>> +static __always_inline void arch_local_irq_restore(unsigned long flags)
>>>>> +{
>>>>> +    if (!arch_irqs_disabled_flags(flags))
>>>>> +            arch_local_irq_enable();
>>>>> +}
>>>> 
>>>> If someone were to write horrible code like:
>>>> 
>>>>       local_irq_disable();
>>>>       local_irq_save(flags);
>>>>       local_irq_enable();
>>>>       local_irq_restore(flags);
>>>> 
>>>> we'd be up some creek without a paddle... now I don't _think_ we have
>>>> genius code like that, but I'd feel safer if we can haz an assertion in
>>>> there somewhere...
>>>> 
>>>> Maybe something like:
>>>> 
>>>> #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
>>>>       WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
>>>> #endif
>>>> 
>>>> At the end?
>>> 
>>> I'd like to, but using WARN_ON_ONCE() in include/asm/irqflags.h sounds
>>> like a perfect recipe for include dependency hell.
>>> 
>>> We could use a plain asm("ud2") instead.
>> How about out-of-lining it:
>> #ifdef CONFIG_DEBUG_ENTRY
>> extern void warn_bogus_irqrestore(void);
>> #endif
>> static __always_inline void arch_local_irq_restore(unsigned long flags)
>> {
>>        if (!arch_irqs_disabled_flags(flags)) {
>>                arch_local_irq_enable();
>>        } else {
>> #ifdef CONFIG_DEBUG_ENTRY
>>                if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
>>                     warn_bogus_irqrestore();
>> #endif
>>        }
>> }
> 
> This couldn't be a WARN_ON_ONCE() then (or it would be a catch all).

If you put the WARN_ON_ONCE in the out-of-line helper, it should work reasonably well.

> Another approach might be to open-code the WARN_ON_ONCE(), like:
> 
> #ifdef CONFIG_DEBUG_ENTRY
> extern void warn_bogus_irqrestore(bool *once);
> #endif
> 
> static __always_inline void arch_local_irq_restore(unsigned long flags)
> {
>    if (!arch_irqs_disabled_flags(flags))
>        arch_local_irq_enable();
> #ifdef CONFIG_DEBUG_ENTRY
>    else {
>        static bool once;
> 
>        if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
>            warn_bogus_irqrestore(&once);
>    }
> #endif
> }
> 

I don’t know precisely what a static variable in an __always_inline function will do, but I imagine it will be, at best, erratic, especially when modules are involved.
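
Roughly: since the function is static and always inlined, each
translation unit ends up with its own copy of the static, i.e.
something like:

/* Illustration only: every object file that inlines this gets its own
 * 'once', so the warning fires once per .o (and once per module), not
 * once system-wide. */
static __always_inline void restore_check_sketch(bool bogus)
{
	static bool once;	/* duplicated into every inlining TU */

	if (bogus && !once) {
		once = true;
		/* warn here */
	}
}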

> 
> Juergen

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE
  2020-11-20 11:46 ` [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE Juergen Gross via Virtualization
@ 2020-11-27 11:31   ` Borislav Petkov
  2020-12-09 21:07   ` Thomas Gleixner
  1 sibling, 0 replies; 45+ messages in thread
From: Borislav Petkov @ 2020-11-27 11:31 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, peterz, x86, linux-kernel, virtualization,
	VMware, Inc., Ingo Molnar, luto, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Boris Ostrovsky

On Fri, Nov 20, 2020 at 12:46:21PM +0100, Juergen Gross wrote:
> SWAPGS is used only for interrupts coming from user mode or for
> returning to user mode. So there is no reason to use the PARAVIRT
> framework, as it can easily be replaced by an ALTERNATIVE depending
> on X86_FEATURE_XENPV.
> 
> There are several instances using the PV-aware SWAPGS macro in paths
> which are never executed in a Xen PV guest. Replace those with the
> plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Andy Lutomirski <luto@kernel.org>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/entry/entry_64.S             | 10 +++++-----
>  arch/x86/include/asm/irqflags.h       | 20 ++++++++------------
>  arch/x86/include/asm/paravirt.h       | 20 --------------------
>  arch/x86/include/asm/paravirt_types.h |  2 --
>  arch/x86/kernel/asm-offsets_64.c      |  1 -
>  arch/x86/kernel/paravirt.c            |  1 -
>  arch/x86/kernel/paravirt_patch.c      |  3 ---
>  arch/x86/xen/enlighten_pv.c           |  3 ---
>  8 files changed, 13 insertions(+), 47 deletions(-)

I love patches like this one! Give me more...

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call
  2020-11-20 11:46 ` [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call Juergen Gross via Virtualization
@ 2020-12-02 12:32   ` Borislav Petkov
  2020-12-02 14:48     ` Jürgen Groß via Virtualization
  0 siblings, 1 reply; 45+ messages in thread
From: Borislav Petkov @ 2020-12-02 12:32 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, peterz, x86, linux-kernel, virtualization,
	VMware, Inc., Ingo Molnar, luto, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Boris Ostrovsky

On Fri, Nov 20, 2020 at 12:46:22PM +0100, Juergen Gross wrote:
> @@ -123,12 +115,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>  	 * Try to use SYSRET instead of IRET if we're returning to
>  	 * a completely clean 64-bit userspace context.  If we're not,
>  	 * go to the slow exit path.
> +	 * In the Xen PV case we must use iret anyway.
>  	 */
> -	movq	RCX(%rsp), %rcx
> -	movq	RIP(%rsp), %r11
>  
> -	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
> -	jne	swapgs_restore_regs_and_return_to_usermode
> +	ALTERNATIVE __stringify( \
> +		movq	RCX(%rsp), %rcx; \
> +		movq	RIP(%rsp), %r11; \
> +		cmpq	%rcx, %r11;	/* SYSRET requires RCX == RIP */ \
> +		jne	swapgs_restore_regs_and_return_to_usermode), \
> +	"jmp	swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV

Why such a big ALTERNATIVE when you can simply do:

        /*
         * Try to use SYSRET instead of IRET if we're returning to
         * a completely clean 64-bit userspace context.  If we're not,
         * go to the slow exit path.
         * In the Xen PV case we must use iret anyway.
         */
        ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV

        movq    RCX(%rsp), %rcx
        movq    RIP(%rsp), %r11
        cmpq    %rcx, %r11      /* SYSRET requires RCX == RIP */
        jne     swapgs_restore_regs_and_return_to_usermode

?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call
  2020-12-02 12:32   ` Borislav Petkov
@ 2020-12-02 14:48     ` Jürgen Groß via Virtualization
  2020-12-02 17:08       ` Borislav Petkov
  0 siblings, 1 reply; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-12-02 14:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stefano Stabellini, peterz, x86, linux-kernel, virtualization,
	VMware, Inc., Ingo Molnar, luto, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Boris Ostrovsky


On 02.12.20 13:32, Borislav Petkov wrote:
> On Fri, Nov 20, 2020 at 12:46:22PM +0100, Juergen Gross wrote:
>> @@ -123,12 +115,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>>   	 * Try to use SYSRET instead of IRET if we're returning to
>>   	 * a completely clean 64-bit userspace context.  If we're not,
>>   	 * go to the slow exit path.
>> +	 * In the Xen PV case we must use iret anyway.
>>   	 */
>> -	movq	RCX(%rsp), %rcx
>> -	movq	RIP(%rsp), %r11
>>   
>> -	cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
>> -	jne	swapgs_restore_regs_and_return_to_usermode
>> +	ALTERNATIVE __stringify( \
>> +		movq	RCX(%rsp), %rcx; \
>> +		movq	RIP(%rsp), %r11; \
>> +		cmpq	%rcx, %r11;	/* SYSRET requires RCX == RIP */ \
>> +		jne	swapgs_restore_regs_and_return_to_usermode), \
>> +	"jmp	swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
> 
> Why such a big ALTERNATIVE when you can simply do:
> 
>          /*
>           * Try to use SYSRET instead of IRET if we're returning to
>           * a completely clean 64-bit userspace context.  If we're not,
>           * go to the slow exit path.
>           * In the Xen PV case we must use iret anyway.
>           */
>          ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
> 
>          movq    RCX(%rsp), %rcx
>          movq    RIP(%rsp), %r11
>          cmpq    %rcx, %r11      /* SYSRET requires RCX == RIP */
>          jne     swapgs_restore_regs_and_return_to_usermode
> 
> ?
> 

I wanted to avoid the additional NOPs for the bare metal case.

If you don't mind them I can do as you are suggesting.


Juergen

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call
  2020-12-02 14:48     ` Jürgen Groß via Virtualization
@ 2020-12-02 17:08       ` Borislav Petkov
  0 siblings, 0 replies; 45+ messages in thread
From: Borislav Petkov @ 2020-12-02 17:08 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Stefano Stabellini, peterz, x86, linux-kernel, virtualization,
	VMware, Inc., Ingo Molnar, luto, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Boris Ostrovsky

On Wed, Dec 02, 2020 at 03:48:21PM +0100, Jürgen Groß wrote:
> I wanted to avoid the additional NOPs for the bare metal case.

Yeah, in that case it gets optimized to a single NOP:

[    0.176692] SMP alternatives: ffffffff81a00068: [0:5) optimized NOPs: 0f 1f 44 00 00

which is nopl 0x0(%rax,%rax,1), and I don't think that's noticeable on
modern CPUs, where a NOP is basically just a rIP increment that goes
down the pipe almost for free. :-)

> If you don't mind them I can do as you are suggesting.

Yes pls, I think asm readability is more important than a 5-byte NOP.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-22 21:44       ` Andy Lutomirski
  2020-11-23  5:21         ` Jürgen Groß via Virtualization
@ 2020-12-09 13:27         ` Mark Rutland
  2020-12-09 14:02           ` Mark Rutland
  1 sibling, 1 reply; 45+ messages in thread
From: Mark Rutland @ 2020-12-09 13:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jürgen Groß, Stefano Stabellini, Peter Zijlstra, X86 ML,
	LKML, Linux Virtualization, VMware, Inc., Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, xen-devel, Thomas Gleixner,
	Boris Ostrovsky

On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß <jgross@suse.com> wrote:
> >
> > On 20.11.20 12:59, Peter Zijlstra wrote:
> > > On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> > >> +static __always_inline void arch_local_irq_restore(unsigned long flags)
> > >> +{
> > >> +    if (!arch_irqs_disabled_flags(flags))
> > >> +            arch_local_irq_enable();
> > >> +}
> > >
> > > If someone were to write horrible code like:
> > >
> > >       local_irq_disable();
> > >       local_irq_save(flags);
> > >       local_irq_enable();
> > >       local_irq_restore(flags);
> > >
> > > we'd be up some creek without a paddle... now I don't _think_ we have
> > > genius code like that, but I'd feel safer if we can haz an assertion in
> > > there somewhere...
> > >
> > > Maybe something like:
> > >
> > > #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
> > >       WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
> > > #endif
> > >
> > > At the end?
> >
> > I'd like to, but using WARN_ON_ONCE() in include/asm/irqflags.h sounds
> > like a perfect recipe for include dependency hell.
> >
> > We could use a plain asm("ud2") instead.
> 
> How about out-of-lining it:
> 
> #ifdef CONFIG_DEBUG_ENTRY
> extern void warn_bogus_irqrestore();
> #endif
> 
> static __always_inline void arch_local_irq_restore(unsigned long flags)
> {
>        if (!arch_irqs_disabled_flags(flags)) {
>                arch_local_irq_enable();
>        } else {
> #ifdef CONFIG_DEBUG_ENTRY
>                if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
>                     warn_bogus_irqrestore();
> #endif
> }

I was just talking to Peter on IRC about implementing the same thing for
arm64, so could we put this in the generic irqflags code? IIUC we can
use raw_irqs_disabled() to do the check.

As this isn't really entry specific (and IIUC the cases this should
catch would break lockdep today), maybe we should add a new
DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?
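
On the Kconfig side that could be as small as (a sketch, wording
illustrative):

config DEBUG_IRQFLAGS
	bool "Debug IRQ flag manipulation"
	help
	  Enables checks for potentially unsafe enabling or disabling
	  of interrupts, such as calling raw_local_irq_restore() with
	  interrupts enabled.

... with DEBUG_LOCKDEP growing a 'select DEBUG_IRQFLAGS'.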

Something like:

#define local_irq_restore(flags)                               \
       do {                                                    \
               if (!raw_irqs_disabled_flags(flags)) {          \
                       trace_hardirqs_on();                    \
               } else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS)) { \
                       if (unlikely(raw_irqs_disabled()))      \
                               warn_bogus_irqrestore();        \
               }                                               \
               raw_local_irq_restore(flags);                   \
        } while (0)

... perhaps? (ignoring, for now, how we deal with once-ness).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-12-09 13:27         ` Mark Rutland
@ 2020-12-09 14:02           ` Mark Rutland
  2020-12-09 14:05             ` Jürgen Groß via Virtualization
  0 siblings, 1 reply; 45+ messages in thread
From: Mark Rutland @ 2020-12-09 14:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jürgen Groß, Stefano Stabellini, Peter Zijlstra, X86 ML,
	LKML, Linux Virtualization, VMware, Inc., Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, xen-devel, Thomas Gleixner,
	Boris Ostrovsky

On Wed, Dec 09, 2020 at 01:27:10PM +0000, Mark Rutland wrote:
> On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:
> > On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß <jgross@suse.com> wrote:
> > > On 20.11.20 12:59, Peter Zijlstra wrote:
> > > > If someone were to write horrible code like:
> > > >
> > > >       local_irq_disable();
> > > >       local_irq_save(flags);
> > > >       local_irq_enable();
> > > >       local_irq_restore(flags);
> > > >
> > > > we'd be up some creek without a paddle... now I don't _think_ we have
> > > > genius code like that, but I'd feel safer if we can haz an assertion in
> > > > there somewhere...

> I was just talking to Peter on IRC about implementing the same thing for
> arm64, so could we put this in the generic irqflags code? IIUC we can
> use raw_irqs_disabled() to do the check.
> 
> As this isn't really entry specific (and IIUC the cases this should
> catch would break lockdep today), maybe we should add a new
> DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?
> 
> Something like:
> 
> #define local_irq_restore(flags)                               \
>        do {                                                    \
>                if (!raw_irqs_disabled_flags(flags)) {          \
>                        trace_hardirqs_on();                    \
> >                 } else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS)) { \
> >                         if (unlikely(raw_irqs_disabled()))      \

Whoops; that should be !raw_irqs_disabled().

>                                warn_bogus_irqrestore();        \
>                }                                               \
>                raw_local_irq_restore(flags);                   \
>         } while (0)
> 
> ... perhaps? (ignoring, for now, how we deal with once-ness).

If no-one shouts in the next day or two I'll spin this as its own patch.

Mark.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-12-09 14:02           ` Mark Rutland
@ 2020-12-09 14:05             ` Jürgen Groß via Virtualization
  0 siblings, 0 replies; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-12-09 14:05 UTC (permalink / raw)
  To: Mark Rutland, Andy Lutomirski
  Cc: Stefano Stabellini, Peter Zijlstra, X86 ML, LKML,
	Linux Virtualization, VMware, Inc., Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Boris Ostrovsky


On 09.12.20 15:02, Mark Rutland wrote:
> On Wed, Dec 09, 2020 at 01:27:10PM +0000, Mark Rutland wrote:
>> On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:
>>> On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß <jgross@suse.com> wrote:
>>>> On 20.11.20 12:59, Peter Zijlstra wrote:
>>>>> If someone were to write horrible code like:
>>>>>
>>>>>        local_irq_disable();
>>>>>        local_irq_save(flags);
>>>>>        local_irq_enable();
>>>>>        local_irq_restore(flags);
>>>>>
>>>>> we'd be up some creek without a paddle... now I don't _think_ we have
>>>>> genius code like that, but I'd feel safer if we can haz an assertion in
>>>>> there somewhere...
> 
>> I was just talking to Peter on IRC about implementing the same thing for
>> arm64, so could we put this in the generic irqflags code? IIUC we can
>> use raw_irqs_disabled() to do the check.
>>
>> As this isn't really entry specific (and IIUC the cases this should
>> catch would break lockdep today), maybe we should add a new
>> DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?
>>
>> Something like:
>>
>> #define local_irq_restore(flags)                               \
>>         do {                                                    \
>>                 if (!raw_irqs_disabled_flags(flags)) {          \
>>                         trace_hardirqs_on();                    \
>>                 } else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS)) { \
>>                         if (unlikely(raw_irqs_disabled()))      \
> 
> Whoops; that should be !raw_irqs_disabled().
> 
>>                                 warn_bogus_irqrestore();        \
>>                 }                                               \
>>                 raw_local_irq_restore(flags);                   \
>>          } while (0)
>>
>> ... perhaps? (ignoring, for now, how we deal with once-ness).
> 
> If no-one shouts in the next day or two I'll spin this as its own patch.

Fine with me. So I'll just ignore a potential error case in my patch.

Thanks,


Juergen


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-11-20 11:59   ` Peter Zijlstra
  2020-11-20 12:05     ` Jürgen Groß via Virtualization
  2020-11-22  6:55     ` Jürgen Groß via Virtualization
@ 2020-12-09 18:15     ` Mark Rutland
  2020-12-09 18:54       ` Thomas Gleixner
  2 siblings, 1 reply; 45+ messages in thread
From: Mark Rutland @ 2020-12-09 18:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juergen Gross, Stefano Stabellini, VMware, Inc., x86,
	linux-kernel, virtualization, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Boris Ostrovsky

On Fri, Nov 20, 2020 at 12:59:43PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> > +static __always_inline void arch_local_irq_restore(unsigned long flags)
> > +{
> > +	if (!arch_irqs_disabled_flags(flags))
> > +		arch_local_irq_enable();
> > +}
> 
> If someone were to write horrible code like:
> 
> 	local_irq_disable();
> 	local_irq_save(flags);
> 	local_irq_enable();
> 	local_irq_restore(flags);
> 
> we'd be up some creek without a paddle... now I don't _think_ we have
> genius code like that, but I'd feel safer if we can haz an assertion in
> there somewhere...

I've cobbled that together locally (I'll post it momentarily), and gave it a
spin on both arm64 and x86, whereupon it exploded at boot time on x86.

In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:

	local_irq_save(flags);
	local_irq_enable();

	[ trigger an IRQ here ]

	local_irq_restore(flags);

... and in check_timer() we call that a number of times after either a
local_irq_save() or local_irq_disable(), eventually trailing with a
local_irq_disable() that will balance things up before calling
local_irq_restore().

I guess that timer_irq_works() should instead do:

	local_irq_save(flags);
	local_irq_enable();
	...
	local_irq_disable();
	local_irq_restore(flags);

... assuming we consider that legitimate?

With that, and all the calls to local_irq_disable() in check_timer() removed
(diff below), I get a clean boot under QEMU with the assertion hacked in and
DEBUG_LOCKDEP enabled.

Thanks
Mark.

---->8----
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7b3c7e0d4a09..e79e665a3aeb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1631,6 +1631,7 @@ static int __init timer_irq_works(void)
        else
                delay_without_tsc();
 
+       local_irq_disable();
        local_irq_restore(flags);
 
        /*
@@ -2191,7 +2192,6 @@ static inline void __init check_timer(void)
                        goto out;
                }
                panic_if_irq_remap("timer doesn't work through Interrupt-remapped IO-APIC");
-               local_irq_disable();
                clear_IO_APIC_pin(apic1, pin1);
                if (!no_pin1)
                        apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2215,6 @@ static inline void __init check_timer(void)
                /*
                 * Cleanup, just in case ...
                 */
-               local_irq_disable();
                legacy_pic->mask(0);
                clear_IO_APIC_pin(apic2, pin2);
                apic_printk(APIC_QUIET, KERN_INFO "....... failed.\n");
@@ -2232,7 +2231,6 @@ static inline void __init check_timer(void)
                apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
                goto out;
        }
-       local_irq_disable();
        legacy_pic->mask(0);
        apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
        apic_printk(APIC_QUIET, KERN_INFO "..... failed.\n");
@@ -2251,7 +2249,6 @@ static inline void __init check_timer(void)
                apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
                goto out;
        }
-       local_irq_disable();
        apic_printk(APIC_QUIET, KERN_INFO "..... failed :(.\n");
        if (apic_is_x2apic_enabled())
                apic_printk(APIC_QUIET, KERN_INFO

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-12-09 18:15     ` Mark Rutland
@ 2020-12-09 18:54       ` Thomas Gleixner
  2020-12-10 11:10         ` Mark Rutland
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2020-12-09 18:54 UTC (permalink / raw)
  To: Mark Rutland, Peter Zijlstra
  Cc: Juergen Gross, Stefano Stabellini, VMware, Inc., x86,
	linux-kernel, virtualization, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, xen-devel, Boris Ostrovsky

On Wed, Dec 09 2020 at 18:15, Mark Rutland wrote:
> In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:
>
> 	local_irq_save(flags);
> 	local_irq_enable();
>
> 	[ trigger an IRQ here ]
>
> 	local_irq_restore(flags);
>
> ... and in check_timer() we call that a number of times after either a
> local_irq_save() or local_irq_disable(), eventually trailing with a
> local_irq_disable() that will balance things up before calling
> local_irq_restore().
>
> I guess that timer_irq_works() should instead do:
>
> 	local_irq_save(flags);
> 	local_irq_enable();
> 	...
> 	local_irq_disable();
> 	local_irq_restore(flags);
>
> ... assuming we consider that legitimate?

Nah. That's old and insane gunk.

Thanks,

        tglx
---
 arch/x86/kernel/apic/io_apic.c |   22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
 static int __init timer_irq_works(void)
 {
 	unsigned long t1 = jiffies;
-	unsigned long flags;
 
 	if (no_timer_check)
 		return 1;
 
-	local_save_flags(flags);
 	local_irq_enable();
-
 	if (boot_cpu_has(X86_FEATURE_TSC))
 		delay_with_tsc();
 	else
 		delay_without_tsc();
 
-	local_irq_restore(flags);
-
 	/*
 	 * Expect a few ticks at least, to be sure some possible
 	 * glue logic does not lock up after one or two first
@@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
 	 * least one tick may be lost due to delays.
 	 */
 
-	/* jiffies wrap? */
-	if (time_after(jiffies, t1 + 4))
-		return 1;
-	return 0;
+	local_irq_disable();
+
+	/* Did jiffies advance? */
+	return time_after(jiffies, t1 + 4);
 }
 
 /*
@@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
 	struct irq_cfg *cfg = irqd_cfg(irq_data);
 	int node = cpu_to_node(0);
 	int apic1, pin1, apic2, pin2;
-	unsigned long flags;
 	int no_pin1 = 0;
 
 	if (!global_clock_event)
 		return;
 
-	local_irq_save(flags);
+	local_irq_disable();
 
 	/*
 	 * get/set the timer IRQ vector:
@@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
 			goto out;
 		}
 		panic_if_irq_remap("timer doesn't work through Interrupt-remapped IO-APIC");
-		local_irq_disable();
 		clear_IO_APIC_pin(apic1, pin1);
 		if (!no_pin1)
 			apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
 		/*
 		 * Cleanup, just in case ...
 		 */
-		local_irq_disable();
 		legacy_pic->mask(0);
 		clear_IO_APIC_pin(apic2, pin2);
 		apic_printk(APIC_QUIET, KERN_INFO "....... failed.\n");
@@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
 		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
 		goto out;
 	}
-	local_irq_disable();
 	legacy_pic->mask(0);
 	apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
 	apic_printk(APIC_QUIET, KERN_INFO "..... failed.\n");
@@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
 		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
 		goto out;
 	}
-	local_irq_disable();
 	apic_printk(APIC_QUIET, KERN_INFO "..... failed :(.\n");
 	if (apic_is_x2apic_enabled())
 		apic_printk(APIC_QUIET, KERN_INFO
@@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
 	panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
 		"report.  Then try booting with the 'noapic' option.\n");
 out:
-	local_irq_restore(flags);
+	local_irq_enable();
 }
 
 /*

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE
  2020-11-20 11:46 ` [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE Juergen Gross via Virtualization
  2020-11-27 11:31   ` Borislav Petkov
@ 2020-12-09 21:07   ` Thomas Gleixner
  1 sibling, 0 replies; 45+ messages in thread
From: Thomas Gleixner @ 2020-12-09 21:07 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, x86, linux-kernel, virtualization
  Cc: Juergen Gross, Stefano Stabellini, peterz, VMware, Inc.,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin,
	Boris Ostrovsky

On Fri, Nov 20 2020 at 12:46, Juergen Gross wrote:
> SWAPGS is used only for interrupts coming from user mode or for
> returning to user mode. So there is no reason to use the PARAVIRT
> framework, as it can easily be replaced by an ALTERNATIVE depending
> on X86_FEATURE_XENPV.
>
> There are several instances using the PV-aware SWAPGS macro in paths
> which are never executed in a Xen PV guest. Replace those with the
> plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Andy Lutomirski <luto@kernel.org>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf
  2020-12-09 18:54       ` Thomas Gleixner
@ 2020-12-10 11:10         ` Mark Rutland
  2020-12-10 20:15           ` x86/ioapic: Cleanup the timer_works() irqflags mess Thomas Gleixner
  0 siblings, 1 reply; 45+ messages in thread
From: Mark Rutland @ 2020-12-10 11:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juergen Gross, Stefano Stabellini, Peter Zijlstra, x86,
	linux-kernel, virtualization, VMware, Inc., Ingo Molnar,
	Borislav Petkov, luto, H. Peter Anvin, xen-devel, Boris Ostrovsky

On Wed, Dec 09, 2020 at 07:54:26PM +0100, Thomas Gleixner wrote:
> On Wed, Dec 09 2020 at 18:15, Mark Rutland wrote:
> > In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:
> >
> > 	local_irq_save(flags);
> > 	local_irq_enable();
> >
> > 	[ trigger an IRQ here ]
> >
> > 	local_irq_restore(flags);
> >
> > ... and in check_timer() we call that a number of times after either a
> > local_irq_save() or local_irq_disable(), eventually trailing with a
> > local_irq_disable() that will balance things up before calling
> > local_irq_restore().

I gave the patchlet below a spin with my debug patch, and it boots
cleanly for me under QEMU. If you spin it as a real patch, feel free to
add:

Tested-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/x86/kernel/apic/io_apic.c |   22 ++++++----------------
>  1 file changed, 6 insertions(+), 16 deletions(-)
> 
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
>  static int __init timer_irq_works(void)
>  {
>  	unsigned long t1 = jiffies;
> -	unsigned long flags;
>  
>  	if (no_timer_check)
>  		return 1;
>  
> -	local_save_flags(flags);
>  	local_irq_enable();
> -
>  	if (boot_cpu_has(X86_FEATURE_TSC))
>  		delay_with_tsc();
>  	else
>  		delay_without_tsc();
>  
> -	local_irq_restore(flags);
> -
>  	/*
>  	 * Expect a few ticks at least, to be sure some possible
>  	 * glue logic does not lock up after one or two first
> @@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
>  	 * least one tick may be lost due to delays.
>  	 */
>  
> -	/* jiffies wrap? */
> -	if (time_after(jiffies, t1 + 4))
> -		return 1;
> -	return 0;
> +	local_irq_disable();
> +
> +	/* Did jiffies advance? */
> +	return time_after(jiffies, t1 + 4);
>  }
>  
>  /*
> @@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
>  	struct irq_cfg *cfg = irqd_cfg(irq_data);
>  	int node = cpu_to_node(0);
>  	int apic1, pin1, apic2, pin2;
> -	unsigned long flags;
>  	int no_pin1 = 0;
>  
>  	if (!global_clock_event)
>  		return;
>  
> -	local_irq_save(flags);
> +	local_irq_disable();
>  
>  	/*
>  	 * get/set the timer IRQ vector:
> @@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
>  			goto out;
>  		}
>  		panic_if_irq_remap("timer doesn't work through Interrupt-remapped IO-APIC");
> -		local_irq_disable();
>  		clear_IO_APIC_pin(apic1, pin1);
>  		if (!no_pin1)
>  			apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
> @@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
>  		/*
>  		 * Cleanup, just in case ...
>  		 */
> -		local_irq_disable();
>  		legacy_pic->mask(0);
>  		clear_IO_APIC_pin(apic2, pin2);
>  		apic_printk(APIC_QUIET, KERN_INFO "....... failed.\n");
> @@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
>  		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
>  		goto out;
>  	}
> -	local_irq_disable();
>  	legacy_pic->mask(0);
>  	apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
>  	apic_printk(APIC_QUIET, KERN_INFO "..... failed.\n");
> @@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
>  		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
>  		goto out;
>  	}
> -	local_irq_disable();
>  	apic_printk(APIC_QUIET, KERN_INFO "..... failed :(.\n");
>  	if (apic_is_x2apic_enabled())
>  		apic_printk(APIC_QUIET, KERN_INFO
> @@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
>  	panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
>  		"report.  Then try booting with the 'noapic' option.\n");
>  out:
> -	local_irq_restore(flags);
> +	local_irq_enable();
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 45+ messages in thread

* x86/ioapic: Cleanup the timer_works() irqflags mess
  2020-12-10 11:10         ` Mark Rutland
@ 2020-12-10 20:15           ` Thomas Gleixner
  2020-12-11  5:10             ` Jürgen Groß via Virtualization
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2020-12-10 20:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Juergen Gross, Stefano Stabellini, Peter Zijlstra, x86,
	linux-kernel, virtualization, VMware, Inc., Ingo Molnar,
	Borislav Petkov, luto, H. Peter Anvin, xen-devel, Boris Ostrovsky

Mark tripped over the creative irqflags handling in the IO-APIC timer
delivery check, which ends up doing:

	local_irq_save(flags);
	local_irq_enable();
	local_irq_restore(flags);

which triggered a new consistency check he's working on required for
replacing the POPF based restore with a conditional STI.

That code is a historical mess and none of this is needed. Make it
straightforward use local_irq_disable()/enable() as that's all what is
required. It is invoked from interrupt enabled code nowadays.
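
For illustration, a minimal sketch of the kind of restore such a check
guards (hypothetical helper, not the actual arch code):

	static inline void sti_based_irq_restore(unsigned long flags)
	{
		/* Restoring an "enabled" state may simply STI. */
		if (!irqs_disabled_flags(flags)) {
			local_irq_enable();
			return;
		}
		/*
		 * Restoring a "disabled" state cannot disable anything;
		 * it is only valid while interrupts are still disabled.
		 * The save/enable/restore pattern above violates that.
		 */
		WARN_ON_ONCE(!irqs_disabled());
	}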

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/x86/kernel/apic/io_apic.c |   22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
 static int __init timer_irq_works(void)
 {
 	unsigned long t1 = jiffies;
-	unsigned long flags;
 
 	if (no_timer_check)
 		return 1;
 
-	local_save_flags(flags);
 	local_irq_enable();
-
 	if (boot_cpu_has(X86_FEATURE_TSC))
 		delay_with_tsc();
 	else
 		delay_without_tsc();
 
-	local_irq_restore(flags);
-
 	/*
 	 * Expect a few ticks at least, to be sure some possible
 	 * glue logic does not lock up after one or two first
@@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
 	 * least one tick may be lost due to delays.
 	 */
 
-	/* jiffies wrap? */
-	if (time_after(jiffies, t1 + 4))
-		return 1;
-	return 0;
+	local_irq_disable();
+
+	/* Did jiffies advance? */
+	return time_after(jiffies, t1 + 4);
 }
 
 /*
@@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
 	struct irq_cfg *cfg = irqd_cfg(irq_data);
 	int node = cpu_to_node(0);
 	int apic1, pin1, apic2, pin2;
-	unsigned long flags;
 	int no_pin1 = 0;
 
 	if (!global_clock_event)
 		return;
 
-	local_irq_save(flags);
+	local_irq_disable();
 
 	/*
 	 * get/set the timer IRQ vector:
@@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
 			goto out;
 		}
 		panic_if_irq_remap("timer doesn't work through Interrupt-remapped IO-APIC");
-		local_irq_disable();
 		clear_IO_APIC_pin(apic1, pin1);
 		if (!no_pin1)
 			apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
 		/*
 		 * Cleanup, just in case ...
 		 */
-		local_irq_disable();
 		legacy_pic->mask(0);
 		clear_IO_APIC_pin(apic2, pin2);
 		apic_printk(APIC_QUIET, KERN_INFO "....... failed.\n");
@@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
 		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
 		goto out;
 	}
-	local_irq_disable();
 	legacy_pic->mask(0);
 	apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
 	apic_printk(APIC_QUIET, KERN_INFO "..... failed.\n");
@@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
 		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
 		goto out;
 	}
-	local_irq_disable();
 	apic_printk(APIC_QUIET, KERN_INFO "..... failed :(.\n");
 	if (apic_is_x2apic_enabled())
 		apic_printk(APIC_QUIET, KERN_INFO
@@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
 	panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
 		"report.  Then try booting with the 'noapic' option.\n");
 out:
-	local_irq_restore(flags);
+	local_irq_enable();
 }
 
 /*

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: x86/ioapic: Cleanup the timer_works() irqflags mess
  2020-12-10 20:15           ` x86/ioapic: Cleanup the timer_works() irqflags mess Thomas Gleixner
@ 2020-12-11  5:10             ` Jürgen Groß via Virtualization
  0 siblings, 0 replies; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-12-11  5:10 UTC (permalink / raw)
  To: Thomas Gleixner, Mark Rutland
  Cc: Stefano Stabellini, Peter Zijlstra, x86, linux-kernel,
	virtualization, VMware, Inc., Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, xen-devel, Boris Ostrovsky


On 10.12.20 21:15, Thomas Gleixner wrote:
> Mark tripped over the creative irqflags handling in the IO-APIC timer
> delivery check which ends up doing:
> 
>          local_irq_save(flags);
> 	local_irq_enable();
>          local_irq_restore(flags);
> 
> which triggered a new consistency check he's working on required for
> replacing the POPF based restore with a conditional STI.
> 
> That code is a historical mess and none of this is needed. Make it
> straightforward use local_irq_disable()/enable() as that's all what is
> required. It is invoked from interrupt enabled code nowadays.
> 
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Mark Rutland <mark.rutland@arm.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-11-23 13:43   ` Peter Zijlstra
@ 2020-12-15 11:42     ` Jürgen Groß via Virtualization
  2020-12-15 14:18       ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-12-15 11:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Josh Poimboeuf, Vincent Guittot, Thomas Gleixner,
	Dietmar Eggemann, Jim Mattson, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira


Peter,

On 23.11.20 14:43, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 01:53:42PM +0100, Peter Zijlstra wrote:
>> On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
>>>   30 files changed, 325 insertions(+), 598 deletions(-)
>>
>> Much awesome ! I'll try and get that objtool thing sorted.
> 
> This seems to work for me. It isn't 100% accurate, because it doesn't
> know about the direct call instruction, but I can either fudge that or
> switching to static_call() will cure that.
> 
> It's not exactly pretty, but it should be straight forward.

Are you planning to send this out as an "official" patch, or should I
include it in my series (in that case I'd need a variant with a proper
commit message)?

I'd like to have this settled soon, as I'm going to send V2 of my
series hopefully this week.


Juergen

> 
> Index: linux-2.6/tools/objtool/check.c
> ===================================================================
> --- linux-2.6.orig/tools/objtool/check.c
> +++ linux-2.6/tools/objtool/check.c
> @@ -1090,6 +1090,32 @@ static int handle_group_alt(struct objto
>   		return -1;
>   	}
>   
> +	/*
> +	 * Add the filler NOP, required for alternative CFI.
> +	 */
> +	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
> +		struct instruction *nop = malloc(sizeof(*nop));
> +		if (!nop) {
> +			WARN("malloc failed");
> +			return -1;
> +		}
> +		memset(nop, 0, sizeof(*nop));
> +		INIT_LIST_HEAD(&nop->alts);
> +		INIT_LIST_HEAD(&nop->stack_ops);
> +		init_cfi_state(&nop->cfi);
> +
> +		nop->sec = last_new_insn->sec;
> +		nop->ignore = last_new_insn->ignore;
> +		nop->func = last_new_insn->func;
> +		nop->alt_group = alt_group;
> +		nop->offset = last_new_insn->offset + last_new_insn->len;
> +		nop->type = INSN_NOP;
> +		nop->len = special_alt->orig_len - special_alt->new_len;
> +
> +		list_add(&nop->list, &last_new_insn->list);
> +		last_new_insn = nop;
> +	}
> +
>   	if (fake_jump)
>   		list_add(&fake_jump->list, &last_new_insn->list);
>   
> @@ -2190,18 +2216,12 @@ static int handle_insn_ops(struct instru
>   	struct stack_op *op;
>   
>   	list_for_each_entry(op, &insn->stack_ops, list) {
> -		struct cfi_state old_cfi = state->cfi;
>   		int res;
>   
>   		res = update_cfi_state(insn, &state->cfi, op);
>   		if (res)
>   			return res;
>   
> -		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
> -			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
> -			return -1;
> -		}
> -
>   		if (op->dest.type == OP_DEST_PUSHF) {
>   			if (!state->uaccess_stack) {
>   				state->uaccess_stack = 1;
> @@ -2399,19 +2419,137 @@ static int validate_return(struct symbol
>    * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
>    * states which then results in ORC entries, which we just said we didn't want.
>    *
> - * Avoid them by copying the CFI entry of the first instruction into the whole
> - * alternative.
> + * Avoid them by copying the CFI entry of the first instruction into the hole.
>    */
> -static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
> +static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
>   {
>   	struct instruction *first_insn = insn;
>   	int alt_group = insn->alt_group;
>   
> -	sec_for_each_insn_continue(file, insn) {
> +	sec_for_each_insn_from(file, insn) {
>   		if (insn->alt_group != alt_group)
>   			break;
> -		insn->cfi = first_insn->cfi;
> +
> +		if (!insn->visited)
> +			insn->cfi = first_insn->cfi;
> +	}
> +}
> +
> +static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	__fill_alt_cfi(file, alt_insn);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list)
> +		__fill_alt_cfi(file, alt->insn);
> +}
> +
> +static struct instruction *
> +__find_unwind(struct objtool_file *file,
> +	      struct instruction *insn, unsigned long offset)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *next;
> +	unsigned long off = 0;
> +
> +	while ((off + insn->len) <= offset) {
> +		next = next_insn_same_sec(file, insn);
> +		if (next && next->alt_group != alt_group)
> +			next = NULL;
> +
> +		if (!next)
> +			break;
> +
> +		off += insn->len;
> +		insn = next;
>   	}
> +
> +	return insn;
> +}
> +
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset)
> +{
> +	struct instruction *fit;
> +	struct alternative *alt;
> +	unsigned long fit_off;
> +
> +	fit = __find_unwind(file, alt_insn, offset);
> +	fit_off = (fit->offset - alt_insn->offset);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		struct instruction *x;
> +		unsigned long x_off;
> +
> +		x = __find_unwind(file, alt->insn, offset);
> +		x_off = (x->offset - alt->insn->offset);
> +
> +		if (fit_off < x_off) {
> +			fit = x;
> +			fit_off = x_off;
> +
> +		} else if (fit_off == x_off &&
> +			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(fit->sec, fit->offset);
> +			char *_str2 = offstr(x->sec, x->offset);
> +			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
> +			free(_str1);
> +			free(_str2);
> +			return fit;
> +		}
> +	}
> +
> +	return fit;
> +}
> +
> +static int __validate_unwind(struct objtool_file *file,
> +			     struct instruction *alt_insn,
> +			     struct instruction *insn)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *unwind;
> +	unsigned long offset = 0;
> +
> +	sec_for_each_insn_from(file, insn) {
> +		if (insn->alt_group != alt_group)
> +			break;
> +
> +		unwind = find_alt_unwind(file, alt_insn, offset);
> +
> +		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(insn->sec, insn->offset);
> +			char *_str2 = offstr(unwind->sec, unwind->offset);
> +			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
> +			     _str1, _str2, offset);
> +			free(_str1);
> +			free(_str2);
> +			return 1;
> +		}
> +
> +		offset += insn->len;
> +	}
> +
> +	return 0;
> +}
> +
> +static int validate_alt_unwind(struct objtool_file *file,
> +			       struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	if (__validate_unwind(file, alt_insn, alt_insn))
> +		return 1;
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		if (__validate_unwind(file, alt_insn, alt->insn))
> +			return 1;
> +	}
> +
> +	return 0;
>   }
>   
>   /*
> @@ -2423,9 +2561,10 @@ static void fill_alternative_cfi(struct
>   static int validate_branch(struct objtool_file *file, struct symbol *func,
>   			   struct instruction *insn, struct insn_state state)
>   {
> +	struct instruction *next_insn, *alt_insn = NULL;
>   	struct alternative *alt;
> -	struct instruction *next_insn;
>   	struct section *sec;
> +	int alt_group = 0;
>   	u8 visited;
>   	int ret;
>   
> @@ -2480,8 +2619,10 @@ static int validate_branch(struct objtoo
>   				}
>   			}
>   
> -			if (insn->alt_group)
> -				fill_alternative_cfi(file, insn);
> +			if (insn->alt_group) {
> +				alt_insn = insn;
> +				alt_group = insn->alt_group;
> +			}
>   
>   			if (skip_orig)
>   				return 0;
> @@ -2613,6 +2754,17 @@ static int validate_branch(struct objtoo
>   		}
>   
>   		insn = next_insn;
> +
> +		if (alt_insn && insn->alt_group != alt_group) {
> +			alt_insn->alt_end = insn;
> +
> +			fill_alt_cfi(file, alt_insn);
> +
> +			if (validate_alt_unwind(file, alt_insn))
> +				return 1;
> +
> +			alt_insn = NULL;
> +		}
>   	}
>   
>   	return 0;
> Index: linux-2.6/tools/objtool/check.h
> ===================================================================
> --- linux-2.6.orig/tools/objtool/check.h
> +++ linux-2.6/tools/objtool/check.h
> @@ -40,6 +40,7 @@ struct instruction {
>   	struct instruction *first_jump_src;
>   	struct reloc *jump_table;
>   	struct list_head alts;
> +	struct instruction *alt_end;
>   	struct symbol *func;
>   	struct list_head stack_ops;
>   	struct cfi_state cfi;
> @@ -54,6 +55,10 @@ static inline bool is_static_jump(struct
>   	       insn->type == INSN_JUMP_UNCONDITIONAL;
>   }
>   
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset);
> +
>   struct instruction *find_insn(struct objtool_file *file,
>   			      struct section *sec, unsigned long offset);
>   
> Index: linux-2.6/tools/objtool/orc_gen.c
> ===================================================================
> --- linux-2.6.orig/tools/objtool/orc_gen.c
> +++ linux-2.6/tools/objtool/orc_gen.c
> @@ -12,75 +12,86 @@
>   #include "check.h"
>   #include "warn.h"
>   
> -int create_orc(struct objtool_file *file)
> +static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
>   {
> -	struct instruction *insn;
> +	struct orc_entry *orc = &insn->orc;
> +	struct cfi_reg *cfa = &insn->cfi.cfa;
> +	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +
> +	orc->end = insn->cfi.end;
> +
> +	if (cfa->base == CFI_UNDEFINED) {
> +		orc->sp_reg = ORC_REG_UNDEFINED;
> +		return 0;
> +	}
>   
> -	for_each_insn(file, insn) {
> -		struct orc_entry *orc = &insn->orc;
> -		struct cfi_reg *cfa = &insn->cfi.cfa;
> -		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +	switch (cfa->base) {
> +	case CFI_SP:
> +		orc->sp_reg = ORC_REG_SP;
> +		break;
> +	case CFI_SP_INDIRECT:
> +		orc->sp_reg = ORC_REG_SP_INDIRECT;
> +		break;
> +	case CFI_BP:
> +		orc->sp_reg = ORC_REG_BP;
> +		break;
> +	case CFI_BP_INDIRECT:
> +		orc->sp_reg = ORC_REG_BP_INDIRECT;
> +		break;
> +	case CFI_R10:
> +		orc->sp_reg = ORC_REG_R10;
> +		break;
> +	case CFI_R13:
> +		orc->sp_reg = ORC_REG_R13;
> +		break;
> +	case CFI_DI:
> +		orc->sp_reg = ORC_REG_DI;
> +		break;
> +	case CFI_DX:
> +		orc->sp_reg = ORC_REG_DX;
> +		break;
> +	default:
> +		WARN_FUNC("unknown CFA base reg %d",
> +			  insn->sec, insn->offset, cfa->base);
> +		return -1;
> +	}
>   
> -		if (!insn->sec->text)
> -			continue;
> +	switch(bp->base) {
> +	case CFI_UNDEFINED:
> +		orc->bp_reg = ORC_REG_UNDEFINED;
> +		break;
> +	case CFI_CFA:
> +		orc->bp_reg = ORC_REG_PREV_SP;
> +		break;
> +	case CFI_BP:
> +		orc->bp_reg = ORC_REG_BP;
> +		break;
> +	default:
> +		WARN_FUNC("unknown BP base reg %d",
> +			  insn->sec, insn->offset, bp->base);
> +		return -1;
> +	}
>   
> -		orc->end = insn->cfi.end;
> +	orc->sp_offset = cfa->offset;
> +	orc->bp_offset = bp->offset;
> +	orc->type = insn->cfi.type;
>   
> -		if (cfa->base == CFI_UNDEFINED) {
> -			orc->sp_reg = ORC_REG_UNDEFINED;
> -			continue;
> -		}
> +	return 0;
> +}
>   
> -		switch (cfa->base) {
> -		case CFI_SP:
> -			orc->sp_reg = ORC_REG_SP;
> -			break;
> -		case CFI_SP_INDIRECT:
> -			orc->sp_reg = ORC_REG_SP_INDIRECT;
> -			break;
> -		case CFI_BP:
> -			orc->sp_reg = ORC_REG_BP;
> -			break;
> -		case CFI_BP_INDIRECT:
> -			orc->sp_reg = ORC_REG_BP_INDIRECT;
> -			break;
> -		case CFI_R10:
> -			orc->sp_reg = ORC_REG_R10;
> -			break;
> -		case CFI_R13:
> -			orc->sp_reg = ORC_REG_R13;
> -			break;
> -		case CFI_DI:
> -			orc->sp_reg = ORC_REG_DI;
> -			break;
> -		case CFI_DX:
> -			orc->sp_reg = ORC_REG_DX;
> -			break;
> -		default:
> -			WARN_FUNC("unknown CFA base reg %d",
> -				  insn->sec, insn->offset, cfa->base);
> -			return -1;
> -		}
> +int create_orc(struct objtool_file *file)
> +{
> +	struct instruction *insn;
>   
> -		switch(bp->base) {
> -		case CFI_UNDEFINED:
> -			orc->bp_reg = ORC_REG_UNDEFINED;
> -			break;
> -		case CFI_CFA:
> -			orc->bp_reg = ORC_REG_PREV_SP;
> -			break;
> -		case CFI_BP:
> -			orc->bp_reg = ORC_REG_BP;
> -			break;
> -		default:
> -			WARN_FUNC("unknown BP base reg %d",
> -				  insn->sec, insn->offset, bp->base);
> -			return -1;
> -		}
> +	for_each_insn(file, insn) {
> +		int ret;
> +	
> +		if (!insn->sec->text)
> +			continue;
>   
> -		orc->sp_offset = cfa->offset;
> -		orc->bp_offset = bp->offset;
> -		orc->type = insn->cfi.type;
> +		ret = create_orc_insn(file, insn);
> +		if (ret)
> +			return ret;
>   	}
>   
>   	return 0;
> @@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					/* XXX: skipped earlier ! */
> +					create_orc_insn(file, unwind);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +						idx++;
> +//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn ||
>   			    memcmp(&insn->orc, &prev_insn->orc,
>   				   sizeof(struct orc_entry))) {
> @@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +
> +						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
> +								     insn->sec, insn->offset + offset,
> +								     &unwind->orc))
> +							return -1;
> +
> +						idx++;
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
>   						 sizeof(struct orc_entry))) {
>   
> 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 11:42     ` Jürgen Groß via Virtualization
@ 2020-12-15 14:18       ` Peter Zijlstra
  2020-12-15 14:54         ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2020-12-15 14:18 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Josh Poimboeuf, Vincent Guittot, Thomas Gleixner,
	Dietmar Eggemann, Jim Mattson, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira

On Tue, Dec 15, 2020 at 12:42:45PM +0100, Jürgen Groß wrote:
> Peter,
> 
> On 23.11.20 14:43, Peter Zijlstra wrote:
> > On Fri, Nov 20, 2020 at 01:53:42PM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 20, 2020 at 12:46:18PM +0100, Juergen Gross wrote:
> > > >   30 files changed, 325 insertions(+), 598 deletions(-)
> > > 
> > > Much awesome ! I'll try and get that objtool thing sorted.
> > 
> > This seems to work for me. It isn't 100% accurate, because it doesn't
> > know about the direct call instruction, but I can either fudge that or
> > switching to static_call() will cure that.
> > 
> > It's not exactly pretty, but it should be straight forward.
> 
> Are you planning to send this out as an "official" patch, or should I
> include it in my series (in this case I'd need a variant with a proper
> commit message)?

Ah, I was waiting for Josh to have an opinion (and then sorta forgot
about the whole thing again). Let me refresh and provide at least a
Changelog.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 14:18       ` Peter Zijlstra
@ 2020-12-15 14:54         ` Peter Zijlstra
  2020-12-15 15:07           ` Jürgen Groß via Virtualization
  2020-12-16  0:38           ` Josh Poimboeuf
  0 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2020-12-15 14:54 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Josh Poimboeuf, Vincent Guittot, Thomas Gleixner,
	Dietmar Eggemann, Jim Mattson, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira

On Tue, Dec 15, 2020 at 03:18:34PM +0100, Peter Zijlstra wrote:
> Ah, I was waiting for Josh to have an opinion (and then sorta forgot
> about the whole thing again). Let me refresh and provide at least a
> Changelog.

How's this then?

---
Subject: objtool: Alternatives vs ORC, the hard way
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon, 23 Nov 2020 14:43:17 +0100

Alternatives pose an interesting problem for unwinders because, from
the unwinder's PoV, we're just executing instructions; it has no idea
the text has been modified, nor any way of retrieving what it was
modified with.

Therefore the stance has been that alternatives must not change stack
state, as encoded by commit 7117f16bf460 ("objtool: Fix ORC vs
alternatives"). This obviously guarantees that whatever actual
instructions end up in the text, the unwind information is correct.

However, there is one additional source of text patching that isn't
currently visible to objtool: paravirt immediate patching. And it
turns out one of these violates the rule.

As part of cleaning that up, the unfortunate reality is that objtool
now has to deal with alternatives modifying unwind state and validate
the combination is valid and generate ORC data to match.

The problem is that a single instance of unwind information (ORC) must
capture and correctly unwind all alternatives. Since the trivially
correct mandate is out, implement the straightforward brute-force
approach:

 1) generate CFI information for each alternative

 2) unwind every alternative with the merge-sort of the previously
    generated CFI information -- O(n^2)

 3) for any possible conflict: yell.

 4) generate ORC with the merge-sort

Specifically for 3 there are two possible classes of conflicts:

 - the merge-sort itself could find conflicting CFI for the same
   offset.

 - the unwind can fail with the merged CFI.

Concretely, this allows us to deal with:

	Alt1			Alt2			Alt3

 0x00	CALL *pv_ops.save_fl	CALL xen_save_fl	PUSHF
 0x01							POP %RAX
 0x02							NOP
 ...
 0x05				NOP
 ...
 0x07   <insn>

The unwind information for offset-0x00 is identical for all 3
alternatives. Similarly offset-0x05 and higher also are identical (and
the same as 0x00). However, offset-0x01 has deviating CFI, but that is
only relevant for Alt3; neither of the other alternative instruction
streams will ever hit that offset.
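
To make the selection rule concrete, here is a stand-alone toy model of
the merge for the example above (invented CFI ids and instruction
lengths; a sketch, not the objtool code below):

	#include <stdio.h>

	/* Toy model: an instruction is (start offset, length, CFI id). */
	struct toy_insn { unsigned int offset, len; int cfi; };

	/* Alt1: one 7-byte indirect CALL.  Alt3: PUSHF, POP %RAX, NOPs. */
	static const struct toy_insn alt1[] = { { 0, 7, 0 } };
	static const struct toy_insn alt3[] = {
		{ 0, 1, 0 }, { 1, 1, 1 }, { 2, 5, 0 },
	};

	/* Last instruction starting at or before @offset. */
	static const struct toy_insn *covering(const struct toy_insn *s,
					       int n, unsigned int offset)
	{
		int i, fit = 0;

		for (i = 1; i < n; i++) {
			if (s[i].offset <= offset)
				fit = i;
		}
		return &s[fit];
	}

	int main(void)
	{
		unsigned int off;

		for (off = 0; off < 7; off++) {
			const struct toy_insn *a = covering(alt1, 1, off);
			const struct toy_insn *b = covering(alt3, 3, off);
			/* Later-starting cover wins; equal starts must agree. */
			const struct toy_insn *fit = b->offset > a->offset ? b : a;

			if (a->offset == b->offset && a->cfi != b->cfi)
				printf("0x%02x: incompatible alternative\n", off);
			else
				printf("0x%02x: cfi %d\n", off, fit->cfi);
		}
		return 0;
	}

Running that, only offset 0x01 resolves to a deviating CFI (Alt3's
post-PUSHF state); every other offset resolves to the baseline.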

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c   |  180 ++++++++++++++++++++++++++++++++++++++++++++----
 tools/objtool/check.h   |    5 +
 tools/objtool/orc_gen.c |  180 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 290 insertions(+), 75 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1096,6 +1096,32 @@ static int handle_group_alt(struct objto
 		return -1;
 	}
 
+	/*
+	 * Add the filler NOP, required for alternative CFI.
+	 */
+	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
+		struct instruction *nop = malloc(sizeof(*nop));
+		if (!nop) {
+			WARN("malloc failed");
+			return -1;
+		}
+		memset(nop, 0, sizeof(*nop));
+		INIT_LIST_HEAD(&nop->alts);
+		INIT_LIST_HEAD(&nop->stack_ops);
+		init_cfi_state(&nop->cfi);
+
+		nop->sec = last_new_insn->sec;
+		nop->ignore = last_new_insn->ignore;
+		nop->func = last_new_insn->func;
+		nop->alt_group = alt_group;
+		nop->offset = last_new_insn->offset + last_new_insn->len;
+		nop->type = INSN_NOP;
+		nop->len = special_alt->orig_len - special_alt->new_len;
+
+		list_add(&nop->list, &last_new_insn->list);
+		last_new_insn = nop;
+	}
+
 	if (fake_jump)
 		list_add(&fake_jump->list, &last_new_insn->list);
 
@@ -2237,18 +2263,12 @@ static int handle_insn_ops(struct instru
 	struct stack_op *op;
 
 	list_for_each_entry(op, &insn->stack_ops, list) {
-		struct cfi_state old_cfi = state->cfi;
 		int res;
 
 		res = update_cfi_state(insn, &state->cfi, op);
 		if (res)
 			return res;
 
-		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
-			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
-			return -1;
-		}
-
 		if (op->dest.type == OP_DEST_PUSHF) {
 			if (!state->uaccess_stack) {
 				state->uaccess_stack = 1;
@@ -2460,19 +2480,137 @@ static int validate_return(struct symbol
  * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
  * states which then results in ORC entries, which we just said we didn't want.
  *
- * Avoid them by copying the CFI entry of the first instruction into the whole
- * alternative.
+ * Avoid them by copying the CFI entry of the first instruction into the hole.
  */
-static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
+static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
 {
 	struct instruction *first_insn = insn;
 	int alt_group = insn->alt_group;
 
-	sec_for_each_insn_continue(file, insn) {
+	sec_for_each_insn_from(file, insn) {
 		if (insn->alt_group != alt_group)
 			break;
-		insn->cfi = first_insn->cfi;
+
+		if (!insn->visited)
+			insn->cfi = first_insn->cfi;
+	}
+}
+
+static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	__fill_alt_cfi(file, alt_insn);
+
+	list_for_each_entry(alt, &alt_insn->alts, list)
+		__fill_alt_cfi(file, alt->insn);
+}
+
+static struct instruction *
+__find_unwind(struct objtool_file *file,
+	      struct instruction *insn, unsigned long offset)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *next;
+	unsigned long off = 0;
+
+	while ((off + insn->len) <= offset) {
+		next = next_insn_same_sec(file, insn);
+		if (next && next->alt_group != alt_group)
+			next = NULL;
+
+		if (!next)
+			break;
+
+		off += insn->len;
+		insn = next;
 	}
+
+	return insn;
+}
+
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset)
+{
+	struct instruction *fit;
+	struct alternative *alt;
+	unsigned long fit_off;
+
+	fit = __find_unwind(file, alt_insn, offset);
+	fit_off = (fit->offset - alt_insn->offset);
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		struct instruction *x;
+		unsigned long x_off;
+
+		x = __find_unwind(file, alt->insn, offset);
+		x_off = (x->offset - alt->insn->offset);
+
+		if (fit_off < x_off) {
+			fit = x;
+			fit_off = x_off;
+
+		} else if (fit_off == x_off &&
+			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(fit->sec, fit->offset);
+			char *_str2 = offstr(x->sec, x->offset);
+			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
+			free(_str1);
+			free(_str2);
+			return fit;
+		}
+	}
+
+	return fit;
+}
+
+static int __validate_unwind(struct objtool_file *file,
+			     struct instruction *alt_insn,
+			     struct instruction *insn)
+{
+	int alt_group = insn->alt_group;
+	struct instruction *unwind;
+	unsigned long offset = 0;
+
+	sec_for_each_insn_from(file, insn) {
+		if (insn->alt_group != alt_group)
+			break;
+
+		unwind = find_alt_unwind(file, alt_insn, offset);
+
+		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
+
+			char *_str1 = offstr(insn->sec, insn->offset);
+			char *_str2 = offstr(unwind->sec, unwind->offset);
+			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
+			     _str1, _str2, offset);
+			free(_str1);
+			free(_str2);
+			return 1;
+		}
+
+		offset += insn->len;
+	}
+
+	return 0;
+}
+
+static int validate_alt_unwind(struct objtool_file *file,
+			       struct instruction *alt_insn)
+{
+	struct alternative *alt;
+
+	if (__validate_unwind(file, alt_insn, alt_insn))
+		return 1;
+
+	list_for_each_entry(alt, &alt_insn->alts, list) {
+		if (__validate_unwind(file, alt_insn, alt->insn))
+			return 1;
+	}
+
+	return 0;
 }
 
 /*
@@ -2484,9 +2622,10 @@ static void fill_alternative_cfi(struct
 static int validate_branch(struct objtool_file *file, struct symbol *func,
 			   struct instruction *insn, struct insn_state state)
 {
+	struct instruction *next_insn, *alt_insn = NULL;
 	struct alternative *alt;
-	struct instruction *next_insn;
 	struct section *sec;
+	int alt_group = 0;
 	u8 visited;
 	int ret;
 
@@ -2541,8 +2680,10 @@ static int validate_branch(struct objtoo
 				}
 			}
 
-			if (insn->alt_group)
-				fill_alternative_cfi(file, insn);
+			if (insn->alt_group) {
+				alt_insn = insn;
+				alt_group = insn->alt_group;
+			}
 
 			if (skip_orig)
 				return 0;
@@ -2697,6 +2838,17 @@ static int validate_branch(struct objtoo
 		}
 
 		insn = next_insn;
+
+		if (alt_insn && insn->alt_group != alt_group) {
+			alt_insn->alt_end = insn;
+
+			fill_alt_cfi(file, alt_insn);
+
+			if (validate_alt_unwind(file, alt_insn))
+				return 1;
+
+			alt_insn = NULL;
+		}
 	}
 
 	return 0;
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -41,6 +41,7 @@ struct instruction {
 	struct instruction *first_jump_src;
 	struct reloc *jump_table;
 	struct list_head alts;
+	struct instruction *alt_end;
 	struct symbol *func;
 	struct list_head stack_ops;
 	struct cfi_state cfi;
@@ -55,6 +56,10 @@ static inline bool is_static_jump(struct
 	       insn->type == INSN_JUMP_UNCONDITIONAL;
 }
 
+struct instruction *
+find_alt_unwind(struct objtool_file *file,
+		struct instruction *alt_insn, unsigned long offset);
+
 struct instruction *find_insn(struct objtool_file *file,
 			      struct section *sec, unsigned long offset);
 
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -12,75 +12,86 @@
 #include "check.h"
 #include "warn.h"
 
-int create_orc(struct objtool_file *file)
+static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
 {
-	struct instruction *insn;
+	struct orc_entry *orc = &insn->orc;
+	struct cfi_reg *cfa = &insn->cfi.cfa;
+	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+
+	orc->end = insn->cfi.end;
+
+	if (cfa->base == CFI_UNDEFINED) {
+		orc->sp_reg = ORC_REG_UNDEFINED;
+		return 0;
+	}
 
-	for_each_insn(file, insn) {
-		struct orc_entry *orc = &insn->orc;
-		struct cfi_reg *cfa = &insn->cfi.cfa;
-		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+	switch (cfa->base) {
+	case CFI_SP:
+		orc->sp_reg = ORC_REG_SP;
+		break;
+	case CFI_SP_INDIRECT:
+		orc->sp_reg = ORC_REG_SP_INDIRECT;
+		break;
+	case CFI_BP:
+		orc->sp_reg = ORC_REG_BP;
+		break;
+	case CFI_BP_INDIRECT:
+		orc->sp_reg = ORC_REG_BP_INDIRECT;
+		break;
+	case CFI_R10:
+		orc->sp_reg = ORC_REG_R10;
+		break;
+	case CFI_R13:
+		orc->sp_reg = ORC_REG_R13;
+		break;
+	case CFI_DI:
+		orc->sp_reg = ORC_REG_DI;
+		break;
+	case CFI_DX:
+		orc->sp_reg = ORC_REG_DX;
+		break;
+	default:
+		WARN_FUNC("unknown CFA base reg %d",
+			  insn->sec, insn->offset, cfa->base);
+		return -1;
+	}
 
-		if (!insn->sec->text)
-			continue;
+	switch(bp->base) {
+	case CFI_UNDEFINED:
+		orc->bp_reg = ORC_REG_UNDEFINED;
+		break;
+	case CFI_CFA:
+		orc->bp_reg = ORC_REG_PREV_SP;
+		break;
+	case CFI_BP:
+		orc->bp_reg = ORC_REG_BP;
+		break;
+	default:
+		WARN_FUNC("unknown BP base reg %d",
+			  insn->sec, insn->offset, bp->base);
+		return -1;
+	}
 
-		orc->end = insn->cfi.end;
+	orc->sp_offset = cfa->offset;
+	orc->bp_offset = bp->offset;
+	orc->type = insn->cfi.type;
 
-		if (cfa->base == CFI_UNDEFINED) {
-			orc->sp_reg = ORC_REG_UNDEFINED;
-			continue;
-		}
+	return 0;
+}
 
-		switch (cfa->base) {
-		case CFI_SP:
-			orc->sp_reg = ORC_REG_SP;
-			break;
-		case CFI_SP_INDIRECT:
-			orc->sp_reg = ORC_REG_SP_INDIRECT;
-			break;
-		case CFI_BP:
-			orc->sp_reg = ORC_REG_BP;
-			break;
-		case CFI_BP_INDIRECT:
-			orc->sp_reg = ORC_REG_BP_INDIRECT;
-			break;
-		case CFI_R10:
-			orc->sp_reg = ORC_REG_R10;
-			break;
-		case CFI_R13:
-			orc->sp_reg = ORC_REG_R13;
-			break;
-		case CFI_DI:
-			orc->sp_reg = ORC_REG_DI;
-			break;
-		case CFI_DX:
-			orc->sp_reg = ORC_REG_DX;
-			break;
-		default:
-			WARN_FUNC("unknown CFA base reg %d",
-				  insn->sec, insn->offset, cfa->base);
-			return -1;
-		}
+int create_orc(struct objtool_file *file)
+{
+	struct instruction *insn;
 
-		switch(bp->base) {
-		case CFI_UNDEFINED:
-			orc->bp_reg = ORC_REG_UNDEFINED;
-			break;
-		case CFI_CFA:
-			orc->bp_reg = ORC_REG_PREV_SP;
-			break;
-		case CFI_BP:
-			orc->bp_reg = ORC_REG_BP;
-			break;
-		default:
-			WARN_FUNC("unknown BP base reg %d",
-				  insn->sec, insn->offset, bp->base);
-			return -1;
-		}
+	for_each_insn(file, insn) {
+		int ret;
+
+		if (!insn->sec->text)
+			continue;
 
-		orc->sp_offset = cfa->offset;
-		orc->bp_offset = bp->offset;
-		orc->type = insn->cfi.type;
+		ret = create_orc_insn(file, insn);
+		if (ret)
+			return ret;
 	}
 
 	return 0;
@@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					/* XXX: skipped earlier ! */
+					create_orc_insn(file, unwind);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+						idx++;
+//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn ||
 			    memcmp(&insn->orc, &prev_insn->orc,
 				   sizeof(struct orc_entry))) {
@@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
 
 		prev_insn = NULL;
 		sec_for_each_insn(file, sec, insn) {
+
+			if (insn->alt_end) {
+				unsigned int offset, alt_len;
+				struct instruction *unwind;
+
+				alt_len = insn->alt_end->offset - insn->offset;
+				for (offset = 0; offset < alt_len; offset++) {
+					unwind = find_alt_unwind(file, insn, offset);
+					if (!prev_insn ||
+					    memcmp(&unwind->orc, &prev_insn->orc,
+						   sizeof(struct orc_entry))) {
+
+						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
+								     insn->sec, insn->offset + offset,
+								     &unwind->orc))
+							return -1;
+
+						idx++;
+					}
+					prev_insn = unwind;
+				}
+
+				insn = insn->alt_end;
+			}
+
 			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
 						 sizeof(struct orc_entry))) {
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 14:54         ` Peter Zijlstra
@ 2020-12-15 15:07           ` Jürgen Groß via Virtualization
  2020-12-16  0:38           ` Josh Poimboeuf
  1 sibling, 0 replies; 45+ messages in thread
From: Jürgen Groß via Virtualization @ 2020-12-15 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Thomas Gleixner, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Josh Poimboeuf, Vincent Guittot, Boris Ostrovsky,
	Dietmar Eggemann, Jim Mattson, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira


On 15.12.20 15:54, Peter Zijlstra wrote:
> On Tue, Dec 15, 2020 at 03:18:34PM +0100, Peter Zijlstra wrote:
>> Ah, I was waiting for Josh to have an opinion (and then sorta forgot
>> about the whole thing again). Let me refresh and provide at least a
>> Changelog.
> 
> How's this then?

Thanks, will add it into my series.


Juergen

> 
> ---
> Subject: objtool: Alternatives vs ORC, the hard way
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon, 23 Nov 2020 14:43:17 +0100
> 
> Alternatives pose an interesting problem for unwinders because from
> the unwinders PoV we're just executing instructions, it has no idea
> the text is modified, nor any way of retrieving what with.
> 
> Therefore the stance has been that alternatives must not change stack
> state, as encoded by commit: 7117f16bf460 ("objtool: Fix ORC vs
> alternatives"). This obviously guarantees that whatever actual
> instructions end up in the text, the unwind information is correct.
> 
> However, there is one additional source of text patching that isn't
> currently visible to objtool: paravirt immediate patching. And it
> turns out one of these violates the rule.
> 
> As part of cleaning that up, the unfortunate reality is that objtool
> now has to deal with alternatives modifying unwind state and validate
> the combination is valid and generate ORC data to match.
> 
> The problem is that a single instance of unwind information (ORC) must
> capture and correctly unwind all alternatives. Since the trivially
> correct mandate is out, implement the straight forward brute-force
> approach:
> 
>   1) generate CFI information for each alternative
> 
>   2) unwind every alternative with the merge-sort of the previously
>      generated CFI information -- O(n^2)
> 
>   3) for any possible conflict: yell.
> 
>   4) Generate ORC with merge-sort
> 
> Specifically for 3 there are two possible classes of conflicts:
> 
>   - the merge-sort itself could find conflicting CFI for the same
>     offset.
> 
>   - the unwind can fail with the merged CFI.
> 
> In specific, this allows us to deal with:
> 
> 	Alt1			Alt2			Alt3
> 
>   0x00	CALL *pv_ops.save_fl	CALL xen_save_fl	PUSHF
>   0x01							POP %RAX
>   0x02							NOP
>   ...
>   0x05				NOP
>   ...
>   0x07   <insn>
> 
> The unwind information for offset-0x00 is identical for all 3
> alternatives. Similarly offset-0x05 and higher also are identical (and
> the same as 0x00). However offset-0x01 has deviating CFI, but that is
> only relevant for Alt3, neither of the other alternative instruction
> streams will ever hit that offset.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   tools/objtool/check.c   |  180 ++++++++++++++++++++++++++++++++++++++++++++----
>   tools/objtool/check.h   |    5 +
>   tools/objtool/orc_gen.c |  180 +++++++++++++++++++++++++++++++-----------------
>   3 files changed, 290 insertions(+), 75 deletions(-)
> 
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -1096,6 +1096,32 @@ static int handle_group_alt(struct objto
>   		return -1;
>   	}
>   
> +	/*
> +	 * Add the filler NOP, required for alternative CFI.
> +	 */
> +	if (special_alt->group && special_alt->new_len < special_alt->orig_len) {
> +		struct instruction *nop = malloc(sizeof(*nop));
> +		if (!nop) {
> +			WARN("malloc failed");
> +			return -1;
> +		}
> +		memset(nop, 0, sizeof(*nop));
> +		INIT_LIST_HEAD(&nop->alts);
> +		INIT_LIST_HEAD(&nop->stack_ops);
> +		init_cfi_state(&nop->cfi);
> +
> +		nop->sec = last_new_insn->sec;
> +		nop->ignore = last_new_insn->ignore;
> +		nop->func = last_new_insn->func;
> +		nop->alt_group = alt_group;
> +		nop->offset = last_new_insn->offset + last_new_insn->len;
> +		nop->type = INSN_NOP;
> +		nop->len = special_alt->orig_len - special_alt->new_len;
> +
> +		list_add(&nop->list, &last_new_insn->list);
> +		last_new_insn = nop;
> +	}
> +
>   	if (fake_jump)
>   		list_add(&fake_jump->list, &last_new_insn->list);
>   
> @@ -2237,18 +2263,12 @@ static int handle_insn_ops(struct instru
>   	struct stack_op *op;
>   
>   	list_for_each_entry(op, &insn->stack_ops, list) {
> -		struct cfi_state old_cfi = state->cfi;
>   		int res;
>   
>   		res = update_cfi_state(insn, &state->cfi, op);
>   		if (res)
>   			return res;
>   
> -		if (insn->alt_group && memcmp(&state->cfi, &old_cfi, sizeof(struct cfi_state))) {
> -			WARN_FUNC("alternative modifies stack", insn->sec, insn->offset);
> -			return -1;
> -		}
> -
>   		if (op->dest.type == OP_DEST_PUSHF) {
>   			if (!state->uaccess_stack) {
>   				state->uaccess_stack = 1;
> @@ -2460,19 +2480,137 @@ static int validate_return(struct symbol
>    * unreported (because they're NOPs), such holes would result in CFI_UNDEFINED
>    * states which then results in ORC entries, which we just said we didn't want.
>    *
> - * Avoid them by copying the CFI entry of the first instruction into the whole
> - * alternative.
> + * Avoid them by copying the CFI entry of the first instruction into the hole.
>    */
> -static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
> +static void __fill_alt_cfi(struct objtool_file *file, struct instruction *insn)
>   {
>   	struct instruction *first_insn = insn;
>   	int alt_group = insn->alt_group;
>   
> -	sec_for_each_insn_continue(file, insn) {
> +	sec_for_each_insn_from(file, insn) {
>   		if (insn->alt_group != alt_group)
>   			break;
> -		insn->cfi = first_insn->cfi;
> +
> +		if (!insn->visited)
> +			insn->cfi = first_insn->cfi;
> +	}
> +}
> +
> +static void fill_alt_cfi(struct objtool_file *file, struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	__fill_alt_cfi(file, alt_insn);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list)
> +		__fill_alt_cfi(file, alt->insn);
> +}
> +
> +static struct instruction *
> +__find_unwind(struct objtool_file *file,
> +	      struct instruction *insn, unsigned long offset)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *next;
> +	unsigned long off = 0;
> +
> +	while ((off + insn->len) <= offset) {
> +		next = next_insn_same_sec(file, insn);
> +		if (next && next->alt_group != alt_group)
> +			next = NULL;
> +
> +		if (!next)
> +			break;
> +
> +		off += insn->len;
> +		insn = next;
>   	}
> +
> +	return insn;
> +}
> +
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset)
> +{
> +	struct instruction *fit;
> +	struct alternative *alt;
> +	unsigned long fit_off;
> +
> +	fit = __find_unwind(file, alt_insn, offset);
> +	fit_off = (fit->offset - alt_insn->offset);
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		struct instruction *x;
> +		unsigned long x_off;
> +
> +		x = __find_unwind(file, alt->insn, offset);
> +		x_off = (x->offset - alt->insn->offset);
> +
> +		if (fit_off < x_off) {
> +			fit = x;
> +			fit_off = x_off;
> +
> +		} else if (fit_off == x_off &&
> +			   memcmp(&fit->cfi, &x->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(fit->sec, fit->offset);
> +			char *_str2 = offstr(x->sec, x->offset);
> +			WARN("%s: equal-offset incompatible alternative: %s\n", _str1, _str2);
> +			free(_str1);
> +			free(_str2);
> +			return fit;
> +		}
> +	}
> +
> +	return fit;
> +}
> +
> +static int __validate_unwind(struct objtool_file *file,
> +			     struct instruction *alt_insn,
> +			     struct instruction *insn)
> +{
> +	int alt_group = insn->alt_group;
> +	struct instruction *unwind;
> +	unsigned long offset = 0;
> +
> +	sec_for_each_insn_from(file, insn) {
> +		if (insn->alt_group != alt_group)
> +			break;
> +
> +		unwind = find_alt_unwind(file, alt_insn, offset);
> +
> +		if (memcmp(&insn->cfi, &unwind->cfi, sizeof(struct cfi_state))) {
> +
> +			char *_str1 = offstr(insn->sec, insn->offset);
> +			char *_str2 = offstr(unwind->sec, unwind->offset);
> +			WARN("%s: unwind incompatible alternative: %s (%ld)\n",
> +			     _str1, _str2, offset);
> +			free(_str1);
> +			free(_str2);
> +			return 1;
> +		}
> +
> +		offset += insn->len;
> +	}
> +
> +	return 0;
> +}
> +
> +static int validate_alt_unwind(struct objtool_file *file,
> +			       struct instruction *alt_insn)
> +{
> +	struct alternative *alt;
> +
> +	if (__validate_unwind(file, alt_insn, alt_insn))
> +		return 1;
> +
> +	list_for_each_entry(alt, &alt_insn->alts, list) {
> +		if (__validate_unwind(file, alt_insn, alt->insn))
> +			return 1;
> +	}
> +
> +	return 0;
>   }
>   
>   /*
> @@ -2484,9 +2622,10 @@ static void fill_alternative_cfi(struct
>   static int validate_branch(struct objtool_file *file, struct symbol *func,
>   			   struct instruction *insn, struct insn_state state)
>   {
> +	struct instruction *next_insn, *alt_insn = NULL;
>   	struct alternative *alt;
> -	struct instruction *next_insn;
>   	struct section *sec;
> +	int alt_group = 0;
>   	u8 visited;
>   	int ret;
>   
> @@ -2541,8 +2680,10 @@ static int validate_branch(struct objtoo
>   				}
>   			}
>   
> -			if (insn->alt_group)
> -				fill_alternative_cfi(file, insn);
> +			if (insn->alt_group) {
> +				alt_insn = insn;
> +				alt_group = insn->alt_group;
> +			}
>   
>   			if (skip_orig)
>   				return 0;
> @@ -2697,6 +2838,17 @@ static int validate_branch(struct objtoo
>   		}
>   
>   		insn = next_insn;
> +
> +		if (alt_insn && insn->alt_group != alt_group) {
> +			alt_insn->alt_end = insn;
> +
> +			fill_alt_cfi(file, alt_insn);
> +
> +			if (validate_alt_unwind(file, alt_insn))
> +				return 1;
> +
> +			alt_insn = NULL;
> +		}
>   	}
>   
>   	return 0;
> --- a/tools/objtool/check.h
> +++ b/tools/objtool/check.h
> @@ -41,6 +41,7 @@ struct instruction {
>   	struct instruction *first_jump_src;
>   	struct reloc *jump_table;
>   	struct list_head alts;
> +	struct instruction *alt_end;
>   	struct symbol *func;
>   	struct list_head stack_ops;
>   	struct cfi_state cfi;
> @@ -55,6 +56,10 @@ static inline bool is_static_jump(struct
>   	       insn->type == INSN_JUMP_UNCONDITIONAL;
>   }
>   
> +struct instruction *
> +find_alt_unwind(struct objtool_file *file,
> +		struct instruction *alt_insn, unsigned long offset);
> +
>   struct instruction *find_insn(struct objtool_file *file,
>   			      struct section *sec, unsigned long offset);
>   
> --- a/tools/objtool/orc_gen.c
> +++ b/tools/objtool/orc_gen.c
> @@ -12,75 +12,86 @@
>   #include "check.h"
>   #include "warn.h"
>   
> -int create_orc(struct objtool_file *file)
> +static int create_orc_insn(struct objtool_file *file, struct instruction *insn)
>   {
> -	struct instruction *insn;
> +	struct orc_entry *orc = &insn->orc;
> +	struct cfi_reg *cfa = &insn->cfi.cfa;
> +	struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +
> +	orc->end = insn->cfi.end;
> +
> +	if (cfa->base == CFI_UNDEFINED) {
> +		orc->sp_reg = ORC_REG_UNDEFINED;
> +		return 0;
> +	}
>   
> -	for_each_insn(file, insn) {
> -		struct orc_entry *orc = &insn->orc;
> -		struct cfi_reg *cfa = &insn->cfi.cfa;
> -		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
> +	switch (cfa->base) {
> +	case CFI_SP:
> +		orc->sp_reg = ORC_REG_SP;
> +		break;
> +	case CFI_SP_INDIRECT:
> +		orc->sp_reg = ORC_REG_SP_INDIRECT;
> +		break;
> +	case CFI_BP:
> +		orc->sp_reg = ORC_REG_BP;
> +		break;
> +	case CFI_BP_INDIRECT:
> +		orc->sp_reg = ORC_REG_BP_INDIRECT;
> +		break;
> +	case CFI_R10:
> +		orc->sp_reg = ORC_REG_R10;
> +		break;
> +	case CFI_R13:
> +		orc->sp_reg = ORC_REG_R13;
> +		break;
> +	case CFI_DI:
> +		orc->sp_reg = ORC_REG_DI;
> +		break;
> +	case CFI_DX:
> +		orc->sp_reg = ORC_REG_DX;
> +		break;
> +	default:
> +		WARN_FUNC("unknown CFA base reg %d",
> +			  insn->sec, insn->offset, cfa->base);
> +		return -1;
> +	}
>   
> -		if (!insn->sec->text)
> -			continue;
> +	switch(bp->base) {
> +	case CFI_UNDEFINED:
> +		orc->bp_reg = ORC_REG_UNDEFINED;
> +		break;
> +	case CFI_CFA:
> +		orc->bp_reg = ORC_REG_PREV_SP;
> +		break;
> +	case CFI_BP:
> +		orc->bp_reg = ORC_REG_BP;
> +		break;
> +	default:
> +		WARN_FUNC("unknown BP base reg %d",
> +			  insn->sec, insn->offset, bp->base);
> +		return -1;
> +	}
>   
> -		orc->end = insn->cfi.end;
> +	orc->sp_offset = cfa->offset;
> +	orc->bp_offset = bp->offset;
> +	orc->type = insn->cfi.type;
>   
> -		if (cfa->base == CFI_UNDEFINED) {
> -			orc->sp_reg = ORC_REG_UNDEFINED;
> -			continue;
> -		}
> +	return 0;
> +}
>   
> -		switch (cfa->base) {
> -		case CFI_SP:
> -			orc->sp_reg = ORC_REG_SP;
> -			break;
> -		case CFI_SP_INDIRECT:
> -			orc->sp_reg = ORC_REG_SP_INDIRECT;
> -			break;
> -		case CFI_BP:
> -			orc->sp_reg = ORC_REG_BP;
> -			break;
> -		case CFI_BP_INDIRECT:
> -			orc->sp_reg = ORC_REG_BP_INDIRECT;
> -			break;
> -		case CFI_R10:
> -			orc->sp_reg = ORC_REG_R10;
> -			break;
> -		case CFI_R13:
> -			orc->sp_reg = ORC_REG_R13;
> -			break;
> -		case CFI_DI:
> -			orc->sp_reg = ORC_REG_DI;
> -			break;
> -		case CFI_DX:
> -			orc->sp_reg = ORC_REG_DX;
> -			break;
> -		default:
> -			WARN_FUNC("unknown CFA base reg %d",
> -				  insn->sec, insn->offset, cfa->base);
> -			return -1;
> -		}
> +int create_orc(struct objtool_file *file)
> +{
> +	struct instruction *insn;
>   
> -		switch(bp->base) {
> -		case CFI_UNDEFINED:
> -			orc->bp_reg = ORC_REG_UNDEFINED;
> -			break;
> -		case CFI_CFA:
> -			orc->bp_reg = ORC_REG_PREV_SP;
> -			break;
> -		case CFI_BP:
> -			orc->bp_reg = ORC_REG_BP;
> -			break;
> -		default:
> -			WARN_FUNC("unknown BP base reg %d",
> -				  insn->sec, insn->offset, bp->base);
> -			return -1;
> -		}
> +	for_each_insn(file, insn) {
> +		int ret;
> +
> +		if (!insn->sec->text)
> +			continue;
>   
> -		orc->sp_offset = cfa->offset;
> -		orc->bp_offset = bp->offset;
> -		orc->type = insn->cfi.type;
> +		ret = create_orc_insn(file, insn);
> +		if (ret)
> +			return ret;
>   	}
>   
>   	return 0;
> @@ -166,6 +177,28 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					/* XXX: skipped earlier ! */
> +					create_orc_insn(file, unwind);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +						idx++;
> +//						WARN_FUNC("ORC @ %d/%d", sec, insn->offset+offset, offset, alt_len);
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn ||
>   			    memcmp(&insn->orc, &prev_insn->orc,
>   				   sizeof(struct orc_entry))) {
> @@ -203,6 +236,31 @@ int create_orc_sections(struct objtool_f
>   
>   		prev_insn = NULL;
>   		sec_for_each_insn(file, sec, insn) {
> +
> +			if (insn->alt_end) {
> +				unsigned int offset, alt_len;
> +				struct instruction *unwind;
> +
> +				alt_len = insn->alt_end->offset - insn->offset;
> +				for (offset = 0; offset < alt_len; offset++) {
> +					unwind = find_alt_unwind(file, insn, offset);
> +					if (!prev_insn ||
> +					    memcmp(&unwind->orc, &prev_insn->orc,
> +						   sizeof(struct orc_entry))) {
> +
> +						if (create_orc_entry(file->elf, u_sec, ip_relocsec, idx,
> +								     insn->sec, insn->offset + offset,
> +								     &unwind->orc))
> +							return -1;
> +
> +						idx++;
> +					}
> +					prev_insn = unwind;
> +				}
> +
> +				insn = insn->alt_end;
> +			}
> +
>   			if (!prev_insn || memcmp(&insn->orc, &prev_insn->orc,
>   						 sizeof(struct orc_entry))) {
>   



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-15 14:54         ` Peter Zijlstra
  2020-12-15 15:07           ` Jürgen Groß via Virtualization
@ 2020-12-16  0:38           ` Josh Poimboeuf
  2020-12-16  8:40             ` Peter Zijlstra
  1 sibling, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2020-12-16  0:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	Jürgen Groß, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira

On Tue, Dec 15, 2020 at 03:54:08PM +0100, Peter Zijlstra wrote:
> The problem is that a single instance of unwind information (ORC) must
> capture and correctly unwind all alternatives. Since the trivially
> correct mandate is out, implement the straight forward brute-force
> approach:
> 
>  1) generate CFI information for each alternative
> 
>  2) unwind every alternative with the merge-sort of the previously
>     generated CFI information -- O(n^2)
> 
>  3) for any possible conflict: yell.
> 
>  4) Generate ORC with merge-sort
> 
> Specifically for 3 there are two possible classes of conflicts:
> 
>  - the merge-sort itself could find conflicting CFI for the same
>    offset.
> 
>  - the unwind can fail with the merged CFI.

So much algorithm.  Could we make it easier by caching the shared
per-alt-group CFI state somewhere along the way?

For example:

struct alt_group_info {

	/* first original insn in the group */
	struct instruction *orig_insn;

	/* max # of bytes in the group (cfi array size) */
	unsigned long nbytes;

	/* byte-offset-addressed array of CFI pointers */
	struct cfi_state **cfi;
};

We could change 'insn->alt_group' to be a pointer to a shared instance
of the above struct, so that all original and replacement instructions
in a group have a pointer to it.

Starting out, the 'cfi' array is all NULLs.  Then when updating CFI, check
'insn->alt_group->cfi[offset]'.

[ 'offset' is a byte offset from the beginning of the group.  It could
  be calculated based on 'orig_insn' or 'orig_insn->alts', depending on
  whether 'insn' is an original or a replacement. ]

If the array entry is NULL, just update it with a pointer to the CFI.
If it's not NULL, make sure it matches the existing CFI, and WARN if it
doesn't.
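
[ A sketch of how that check might look, assuming the struct above;
  alt_group_offset() is a hypothetical helper and the WARN message is
  made up -- this is not code from any posted patch: ]

static int alt_group_update_cfi(struct alt_group_info *group,
				struct instruction *insn,
				struct cfi_state *cfi)
{
	/* byte offset of 'insn' from the start of the group */
	unsigned long offset = alt_group_offset(group, insn);

	if (!group->cfi[offset]) {
		/* first alternative to reach this byte: cache its CFI */
		group->cfi[offset] = cfi;
		return 0;
	}

	/*
	 * Another alternative already unwound this byte; all of them
	 * must agree, since a single ORC entry has to cover them all.
	 */
	if (memcmp(group->cfi[offset], cfi, sizeof(*cfi))) {
		WARN_FUNC("conflicting CFI for alternatives",
			  insn->sec, insn->offset);
		return -1;
	}

	return 0;
}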

Also, with this data structure, the ORC generation should be a lot more
straightforward: just ignore the NULL entries.
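
[ And the generation side, equally hypothetical; emit_orc_entry()
  stands in for the real create_orc_entry() call chain and its ELF
  section bookkeeping: ]

static int alt_group_gen_orc(struct objtool_file *file,
			     struct alt_group_info *group)
{
	struct cfi_state *prev = NULL;
	unsigned long offset;

	for (offset = 0; offset < group->nbytes; offset++) {
		struct cfi_state *cfi = group->cfi[offset];

		/* no alternative ever reaches this byte */
		if (!cfi)
			continue;

		/* same unwind state as the last emitted entry */
		if (prev && !memcmp(cfi, prev, sizeof(*cfi)))
			continue;

		if (emit_orc_entry(file, group->orig_insn, offset, cfi))
			return -1;

		prev = cfi;
	}

	return 0;
}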

Thoughts?  This is all theoretical of course, I could try to do a patch
tomorrow.

-- 
Josh


* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-16  0:38           ` Josh Poimboeuf
@ 2020-12-16  8:40             ` Peter Zijlstra
  2020-12-16 16:56               ` Josh Poimboeuf
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2020-12-16  8:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	Jürgen Groß, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira

On Tue, Dec 15, 2020 at 06:38:02PM -0600, Josh Poimboeuf wrote:
> On Tue, Dec 15, 2020 at 03:54:08PM +0100, Peter Zijlstra wrote:
> > The problem is that a single instance of unwind information (ORC) must
> > capture and correctly unwind all alternatives. Since the trivially
> > correct mandate is out, implement the straightforward brute-force
> > approach:
> > 
> >  1) generate CFI information for each alternative
> > 
> >  2) unwind every alternative with the merge-sort of the previously
> >     generated CFI information -- O(n^2)
> > 
> >  3) for any possible conflict: yell.
> > 
> >  4) generate ORC with merge-sort
> > 
> > Specifically for 3 there are two possible classes of conflicts:
> > 
> >  - the merge-sort itself could find conflicting CFI for the same
> >    offset.
> > 
> >  - the unwind can fail with the merged CFI.
> 
> So much algorithm.

:-)

It's not really hard, but it has a few pesky details (as always).

> Could we make it easier by caching the shared
> per-alt-group CFI state somewhere along the way?

Yes, but when I tried it, it grew the code required. Runtime costs would
be lower, but I figured that since alternatives are typically few and
small, that wasn't a real consideration.

That is, it would basically cache the results of find_alt_unwind(), but
you still need find_alt_unwind() to generate that data, and so you gain
the code for filling and using the extra data structure.

Yes, computing it 3 times is naff, but meh.

> [ 'offset' is a byte offset from the beginning of the group.  It could
>   be calculated based on 'orig_insn' or 'orig_insn->alts', depending on
>   whether 'insn' is an original or a replacement. ]

That's exactly what it already does of course ;-)

> If the array entry is NULL, just update it with a pointer to the CFI.
> If it's not NULL, make sure it matches the existing CFI, and WARN if it
> doesn't.
> 
> Also, with this data structure, the ORC generation should be a lot more
> straightforward: just ignore the NULL entries.

Yeah, I suppose it gets rid of the memcmp-prev thing.

> Thoughts?  This is all theoretical of course, I could try to do a patch
> tomorrow.

No real objection, I just didn't do it because 1) it works, and 2) even
moar lines.

* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-16  8:40             ` Peter Zijlstra
@ 2020-12-16 16:56               ` Josh Poimboeuf
  2020-12-16 17:58                 ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Josh Poimboeuf @ 2020-12-16 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	Jürgen Groß, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira

On Wed, Dec 16, 2020 at 09:40:59AM +0100, Peter Zijlstra wrote:
> > So much algorithm.
> 
> :-)
> 
> It's not really hard, but it has a few pesky details (as always).

It really hurt my brain to look at it.

> > Could we make it easier by caching the shared
> > per-alt-group CFI state somewhere along the way?
> 
> Yes, but when I tried it, it grew the code required. Runtime costs would
> be lower, but I figured that since alternatives are typically few and
> small, that wasn't a real consideration.

Aren't alternatives going to be everywhere now with paravirt using them?

> That is, it would basically cache the results of find_alt_unwind(), but
> you still need find_alt_unwind() to generate that data, and so you gain
> the code for filling and using the extra data structure.
> 
> Yes, computing it 3 times is naff, but meh.

Haha, I loved this sentence.

> > Thoughts?  This is all theoretical of course, I could try to do a patch
> > tomorrow.
> 
> No real objection, I just didn't do it because 1) it works, and 2) even
> moar lines.

I'm kind of surprised it would need moar lines.  Let me play around with
it and maybe I'll come around ;-)

-- 
Josh


* Re: [PATCH v2 00/12] x86: major paravirt cleanup
  2020-12-16 16:56               ` Josh Poimboeuf
@ 2020-12-16 17:58                 ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2020-12-16 17:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Juri Lelli, linux-hyperv, Daniel Lezcano, Wanpeng Li, kvm,
	VMware, Inc., virtualization, Ben Segall, H. Peter Anvin,
	Boris Ostrovsky, Wei Liu, Stefano Stabellini, Stephen Hemminger,
	Joerg Roedel, x86, Ingo Molnar, Mel Gorman, xen-devel,
	Haiyang Zhang, Steven Rostedt, Borislav Petkov, luto,
	Vincent Guittot, Thomas Gleixner, Dietmar Eggemann, Jim Mattson,
	Jürgen Groß, linux-kernel, Sean Christopherson,
	Paolo Bonzini, Daniel Bristot de Oliveira

On Wed, Dec 16, 2020 at 10:56:05AM -0600, Josh Poimboeuf wrote:
> On Wed, Dec 16, 2020 at 09:40:59AM +0100, Peter Zijlstra wrote:

> > > Could we make it easier by caching the shared
> > > per-alt-group CFI state somewhere along the way?
> > 
> > Yes, but when I tried it, it grew the code required. Runtime costs would
> > be lower, but I figured that since alternatives are typically few and
> > small, that wasn't a real consideration.
> 
> Aren't alternatives going to be everywhere now with paravirt using them?

What I meant was, they're at most 2-3 alternatives wide and only a few
instructions long, which greatly bounds the actual complexity of the
algorithm, however daft.

> > No real objection, I just didn't do it because 1) it works, and 2) even
> > moar lines.
> 
> I'm kind of surprised it would need moar lines.  Let me play around with
> it and maybe I'll come around ;-)

Please do; it could be that getting all the niggly bits right exhausted
my brain enough to miss the obvious ;-)

end of thread, other threads:[~2020-12-16 17:59 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
2020-11-20 11:46 [PATCH v2 00/12] x86: major paravirt cleanup Juergen Gross via Virtualization
2020-11-20 11:46 ` [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE Juergen Gross via Virtualization
2020-11-27 11:31   ` Borislav Petkov
2020-12-09 21:07   ` Thomas Gleixner
2020-11-20 11:46 ` [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call Juergen Gross via Virtualization
2020-12-02 12:32   ` Borislav Petkov
2020-12-02 14:48     ` Jürgen Groß via Virtualization
2020-12-02 17:08       ` Borislav Petkov
2020-11-20 11:46 ` [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf Juergen Gross via Virtualization
2020-11-20 11:59   ` Peter Zijlstra
2020-11-20 12:05     ` Jürgen Groß via Virtualization
2020-11-22  6:55     ` Jürgen Groß via Virtualization
2020-11-22 21:44       ` Andy Lutomirski
2020-11-23  5:21         ` Jürgen Groß via Virtualization
2020-11-23 15:15           ` Andy Lutomirski
2020-12-09 13:27         ` Mark Rutland
2020-12-09 14:02           ` Mark Rutland
2020-12-09 14:05             ` Jürgen Groß via Virtualization
2020-12-09 18:15     ` Mark Rutland
2020-12-09 18:54       ` Thomas Gleixner
2020-12-10 11:10         ` Mark Rutland
2020-12-10 20:15           ` x86/ioapic: Cleanup the timer_works() irqflags mess Thomas Gleixner
2020-12-11  5:10             ` Jürgen Groß via Virtualization
2020-11-20 11:46 ` [PATCH v2 06/12] x86/paravirt: switch time pvops functions to use static_call() Juergen Gross via Virtualization
2020-11-20 12:01   ` Peter Zijlstra
2020-11-20 12:07     ` Jürgen Groß via Virtualization
2020-11-20 11:46 ` [PATCH v2 08/12] x86/paravirt: remove no longer needed 32-bit pvops cruft Juergen Gross via Virtualization
2020-11-20 12:08   ` Peter Zijlstra
2020-11-20 12:16     ` Jürgen Groß via Virtualization
2020-11-20 11:46 ` [PATCH v2 09/12] x86/paravirt: switch iret pvops to ALTERNATIVE Juergen Gross via Virtualization
2020-11-20 11:46 ` [PATCH v2 10/12] x86/paravirt: add new macros PVOP_ALT* supporting pvops in ALTERNATIVEs Juergen Gross via Virtualization
2020-11-20 11:46 ` [PATCH v2 11/12] x86/paravirt: switch functions with custom code to ALTERNATIVE Juergen Gross via Virtualization
2020-11-20 15:46   ` kernel test robot
2020-11-20 11:46 ` [PATCH v2 12/12] x86/paravirt: have only one paravirt patch function Juergen Gross via Virtualization
2020-11-20 14:18   ` kernel test robot
2020-11-20 12:53 ` [PATCH v2 00/12] x86: major paravirt cleanup Peter Zijlstra
2020-11-23 13:43   ` Peter Zijlstra
2020-12-15 11:42     ` Jürgen Groß via Virtualization
2020-12-15 14:18       ` Peter Zijlstra
2020-12-15 14:54         ` Peter Zijlstra
2020-12-15 15:07           ` Jürgen Groß via Virtualization
2020-12-16  0:38           ` Josh Poimboeuf
2020-12-16  8:40             ` Peter Zijlstra
2020-12-16 16:56               ` Josh Poimboeuf
2020-12-16 17:58                 ` Peter Zijlstra
