* Re: [Xen-devel] [PATCH 04/10] x86/paravirt: use a single ops structure
From: Juergen Gross @ 2018-08-10 12:30 UTC
To: Jan Beulich
Cc: rusty, the arch/x86 maintainers, Alok Kataria, lkml,
Linux Virtualization, mingo, H. Peter Anvin, xen-devel,
Thomas Gleixner, Boris Ostrovsky
In-Reply-To: <5B6D7FB402000078001DCF30@suse.com>
On 10/08/18 14:06, Jan Beulich wrote:
>>>> On 10.08.18 at 13:52, <jgross@suse.com> wrote:
>> --- a/arch/x86/hyperv/mmu.c
>> +++ b/arch/x86/hyperv/mmu.c
>> @@ -228,9 +228,9 @@ void hyperv_setup_mmu_ops(void)
>>
>> if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED)) {
>> pr_info("Using hypercall for remote TLB flush\n");
>> - pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
>> + pv_ops.pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
>
> Taking just this as example, why not
>
> pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
>
> ? Both pv_ and _ops are redundant on the field names.
Good idea.
Juergen
* Re: [Xen-devel] [PATCH 04/10] x86/paravirt: use a single ops structure
From: Jan Beulich @ 2018-08-10 12:06 UTC
To: Juergen Gross
Cc: rusty, the arch/x86 maintainers, Alok Kataria, linux-kernel,
Linux Virtualization, mingo, hpa, xen-devel, tglx,
Boris Ostrovsky
In-Reply-To: <20180810115252.18213-5-jgross@suse.com>
>>> On 10.08.18 at 13:52, <jgross@suse.com> wrote:
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -228,9 +228,9 @@ void hyperv_setup_mmu_ops(void)
>
> if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED)) {
> pr_info("Using hypercall for remote TLB flush\n");
> - pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
> + pv_ops.pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
Taking just this as example, why not
pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
? Both pv_ and _ops are redundant on the field names.
Jan
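For readers following the naming discussion, here is a minimal sketch of what the shorter member name could look like. It is not code from the series: the structure contents are cut down to a single hook, the `int cpu` parameter replaces the real cpumask argument, and `last_backend` and `sketch_setup_mmu_ops()` are hypothetical helpers added only for illustration.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; the real kernel types differ. */
struct flush_tlb_info { unsigned long start, end; };

struct pv_mmu_ops {
	void (*flush_tlb_others)(int cpu, const struct flush_tlb_info *info);
};

/* One top-level structure; the member is just "mmu". */
struct paravirt_patch_template {
	struct pv_mmu_ops mmu;
};

static int last_backend;	/* 1 = native, 2 = hypercall (sketch only) */

static void native_flush_tlb_others(int cpu, const struct flush_tlb_info *info)
{
	(void)cpu; (void)info;
	last_backend = 1;
}

static void hyperv_flush_tlb_others(int cpu, const struct flush_tlb_info *info)
{
	(void)cpu; (void)info;
	last_backend = 2;
}

/* Native defaults, set with a nested designated initializer. */
static struct paravirt_patch_template pv_ops = {
	.mmu.flush_tlb_others = native_flush_tlb_others,
};

/* Mirrors the assignment in the quoted hunk, with the shorter name. */
static void sketch_setup_mmu_ops(void)
{
	pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
}
```

Since `pv_ops` already carries the `pv_` prefix and the `_ops` suffix, the call site reads `pv_ops.mmu.flush_tlb_others` without losing any information.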
* [PATCH 10/10] x86/paravirt: move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
From: Juergen Gross @ 2018-08-10 11:52 UTC
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
Most of the paravirt ops defined in pv_mmu_ops are for Xen PV guests
only. Define them only if CONFIG_PARAVIRT_XXL is set.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/fixmap.h | 2 +-
arch/x86/include/asm/mmu_context.h | 4 +-
arch/x86/include/asm/paravirt.h | 115 +++++++++++++++++-----------------
arch/x86/include/asm/paravirt_types.h | 29 ++++-----
arch/x86/include/asm/pgalloc.h | 2 +-
arch/x86/include/asm/pgtable.h | 7 +--
arch/x86/include/asm/special_insns.h | 11 +---
arch/x86/kernel/asm-offsets.c | 2 +-
arch/x86/kernel/head_64.S | 4 +-
arch/x86/kernel/paravirt.c | 15 +++--
arch/x86/kernel/paravirt_patch_32.c | 4 +-
arch/x86/kernel/paravirt_patch_64.c | 4 +-
12 files changed, 97 insertions(+), 102 deletions(-)
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index e203169931c7..ac80e7eadc3a 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -152,7 +152,7 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
void native_set_fixmap(enum fixed_addresses idx,
phys_addr_t phys, pgprot_t flags);
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_XXL
static inline void __set_fixmap(enum fixed_addresses idx,
phys_addr_t phys, pgprot_t flags)
{
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index bbc796eb0a3b..ffae17a8db36 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -16,12 +16,12 @@
extern atomic64_t last_mm_ctx_id;
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_XXL
static inline void paravirt_activate_mm(struct mm_struct *prev,
struct mm_struct *next)
{
}
-#endif /* !CONFIG_PARAVIRT */
+#endif /* !CONFIG_PARAVIRT_XXL */
#ifdef CONFIG_PERF_EVENTS
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 220c13d7e846..520c85b74c74 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -17,6 +17,57 @@
#include <linux/cpumask.h>
#include <asm/frame.h>
+static inline unsigned long long paravirt_sched_clock(void)
+{
+ return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
+}
+
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+ return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+}
+
+/* The paravirtualized I/O functions */
+static inline void slow_down_io(void)
+{
+ pv_ops.pv_cpu_ops.io_delay();
+#ifdef REALLY_SLOW_IO
+ pv_ops.pv_cpu_ops.io_delay();
+ pv_ops.pv_cpu_ops.io_delay();
+ pv_ops.pv_cpu_ops.io_delay();
+#endif
+}
+
+static inline void __flush_tlb(void)
+{
+ PVOP_VCALL0(pv_mmu_ops.flush_tlb_user);
+}
+
+static inline void __flush_tlb_global(void)
+{
+ PVOP_VCALL0(pv_mmu_ops.flush_tlb_kernel);
+}
+
+static inline void __flush_tlb_one_user(unsigned long addr)
+{
+ PVOP_VCALL1(pv_mmu_ops.flush_tlb_one_user, addr);
+}
+
+static inline void flush_tlb_others(const struct cpumask *cpumask,
+ const struct flush_tlb_info *info)
+{
+ PVOP_VCALL2(pv_mmu_ops.flush_tlb_others, cpumask, info);
+}
+
+static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
+{
+ PVOP_VCALL1(pv_mmu_ops.exit_mmap, mm);
+}
+
#ifdef CONFIG_PARAVIRT_XXL
static inline void load_sp0(unsigned long sp0)
{
@@ -52,7 +103,6 @@ static inline void write_cr0(unsigned long x)
{
PVOP_VCALL1(pv_cpu_ops.write_cr0, x);
}
-#endif
static inline unsigned long read_cr2(void)
{
@@ -74,7 +124,6 @@ static inline void write_cr3(unsigned long x)
PVOP_VCALL1(pv_mmu_ops.write_cr3, x);
}
-#ifdef CONFIG_PARAVIRT_XXL
static inline void __write_cr4(unsigned long x)
{
PVOP_VCALL1(pv_cpu_ops.write_cr4, x);
@@ -172,23 +221,7 @@ static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
*p = paravirt_read_msr_safe(msr, &err);
return err;
}
-#endif
-static inline unsigned long long paravirt_sched_clock(void)
-{
- return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
-}
-
-struct static_key;
-extern struct static_key paravirt_steal_enabled;
-extern struct static_key paravirt_steal_rq_enabled;
-
-static inline u64 paravirt_steal_clock(int cpu)
-{
- return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
-}
-
-#ifdef CONFIG_PARAVIRT_XXL
static inline unsigned long long paravirt_read_pmc(int counter)
{
return PVOP_CALL1(u64, pv_cpu_ops.read_pmc, counter);
@@ -267,18 +300,6 @@ static inline void set_iopl_mask(unsigned mask)
{
PVOP_VCALL1(pv_cpu_ops.set_iopl_mask, mask);
}
-#endif
-
-/* The paravirtualized I/O functions */
-static inline void slow_down_io(void)
-{
- pv_ops.pv_cpu_ops.io_delay();
-#ifdef REALLY_SLOW_IO
- pv_ops.pv_cpu_ops.io_delay();
- pv_ops.pv_cpu_ops.io_delay();
- pv_ops.pv_cpu_ops.io_delay();
-#endif
-}
static inline void paravirt_activate_mm(struct mm_struct *prev,
struct mm_struct *next)
@@ -292,30 +313,6 @@ static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
PVOP_VCALL2(pv_mmu_ops.dup_mmap, oldmm, mm);
}
-static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
-{
- PVOP_VCALL1(pv_mmu_ops.exit_mmap, mm);
-}
-
-static inline void __flush_tlb(void)
-{
- PVOP_VCALL0(pv_mmu_ops.flush_tlb_user);
-}
-static inline void __flush_tlb_global(void)
-{
- PVOP_VCALL0(pv_mmu_ops.flush_tlb_kernel);
-}
-static inline void __flush_tlb_one_user(unsigned long addr)
-{
- PVOP_VCALL1(pv_mmu_ops.flush_tlb_one_user, addr);
-}
-
-static inline void flush_tlb_others(const struct cpumask *cpumask,
- const struct flush_tlb_info *info)
-{
- PVOP_VCALL2(pv_mmu_ops.flush_tlb_others, cpumask, info);
-}
-
static inline int paravirt_pgd_alloc(struct mm_struct *mm)
{
return PVOP_CALL1(int, pv_mmu_ops.pgd_alloc, mm);
@@ -640,7 +637,6 @@ static inline void pmd_clear(pmd_t *pmdp)
}
#endif /* CONFIG_X86_PAE */
-#ifdef CONFIG_PARAVIRT_XXL
#define __HAVE_ARCH_START_CONTEXT_SWITCH
static inline void arch_start_context_switch(struct task_struct *prev)
{
@@ -651,7 +647,6 @@ static inline void arch_end_context_switch(struct task_struct *next)
{
PVOP_VCALL1(pv_cpu_ops.end_context_switch, next);
}
-#endif
#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
static inline void arch_enter_lazy_mmu_mode(void)
@@ -674,6 +669,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
{
pv_ops.pv_mmu_ops.set_fixmap(idx, phys, flags);
}
+#endif
#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
@@ -959,15 +955,20 @@ extern void default_banner(void);
#endif /* __ASSEMBLY__ */
#else /* CONFIG_PARAVIRT */
# define default_banner x86_init_noop
+#endif /* !CONFIG_PARAVIRT */
+
#ifndef __ASSEMBLY__
+#ifndef CONFIG_PARAVIRT_XXL
static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
struct mm_struct *mm)
{
}
+#endif
+#ifndef CONFIG_PARAVIRT
static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
{
}
+#endif
#endif /* __ASSEMBLY__ */
-#endif /* !CONFIG_PARAVIRT */
#endif /* _ASM_X86_PARAVIRT_H */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 938ac2bece81..a9b72dae18b2 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -90,13 +90,14 @@ struct pv_init_ops {
unsigned long addr, unsigned len);
} __no_randomize_layout;
-
+#ifdef CONFIG_PARAVIRT_XXL
struct pv_lazy_ops {
/* Set deferred update mode, used for batching operations. */
void (*enter)(void);
void (*leave)(void);
void (*flush)(void);
} __no_randomize_layout;
+#endif
struct pv_time_ops {
unsigned long long (*sched_clock)(void);
@@ -205,29 +206,28 @@ struct pv_irq_ops {
} __no_randomize_layout;
struct pv_mmu_ops {
+ /* TLB operations */
+ void (*flush_tlb_user)(void);
+ void (*flush_tlb_kernel)(void);
+ void (*flush_tlb_one_user)(unsigned long addr);
+ void (*flush_tlb_others)(const struct cpumask *cpus,
+ const struct flush_tlb_info *info);
+
+ /* Hook for intercepting the destruction of an mm_struct. */
+ void (*exit_mmap)(struct mm_struct *mm);
+
+#ifdef CONFIG_PARAVIRT_XXL
unsigned long (*read_cr2)(void);
void (*write_cr2)(unsigned long);
unsigned long (*read_cr3)(void);
void (*write_cr3)(unsigned long);
- /*
- * Hooks for intercepting the creation/use/destruction of an
- * mm_struct.
- */
+ /* Hooks for intercepting the creation/use of an mm_struct. */
void (*activate_mm)(struct mm_struct *prev,
struct mm_struct *next);
void (*dup_mmap)(struct mm_struct *oldmm,
struct mm_struct *mm);
- void (*exit_mmap)(struct mm_struct *mm);
-
-
- /* TLB operations */
- void (*flush_tlb_user)(void);
- void (*flush_tlb_kernel)(void);
- void (*flush_tlb_one_user)(unsigned long addr);
- void (*flush_tlb_others)(const struct cpumask *cpus,
- const struct flush_tlb_info *info);
/* Hooks for allocating and freeing a pagetable top-level */
int (*pgd_alloc)(struct mm_struct *mm);
@@ -302,6 +302,7 @@ struct pv_mmu_ops {
an mfn. We can tell which is which from the index. */
void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx,
phys_addr_t phys, pgprot_t flags);
+#endif
} __no_randomize_layout;
struct arch_spinlock;
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index fbd578daa66e..ec7f43327033 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -8,7 +8,7 @@
static inline int __paravirt_pgd_alloc(struct mm_struct *mm) { return 0; }
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
#define paravirt_pgd_alloc(mm) __paravirt_pgd_alloc(mm)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9ea291fe7107..b9abc525ece3 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -52,9 +52,9 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
extern pmdval_t early_pmd_flags;
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
-#else /* !CONFIG_PARAVIRT */
+#else /* !CONFIG_PARAVIRT_XXL */
#define set_pte(ptep, pte) native_set_pte(ptep, pte)
#define set_pte_at(mm, addr, ptep, pte) native_set_pte_at(mm, addr, ptep, pte)
@@ -108,9 +108,6 @@ extern pmdval_t early_pmd_flags;
#define pte_val(x) native_pte_val(x)
#define __pte(x) native_make_pte(x)
-#endif /* CONFIG_PARAVIRT */
-
-#ifndef CONFIG_PARAVIRT_XXL
#define arch_end_context_switch(prev) do {} while(0)
#endif /* CONFIG_PARAVIRT_XXL */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 2aa6ce4bf159..43c029cdc3fe 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -141,11 +141,10 @@ static inline unsigned long __read_cr4(void)
return native_read_cr4();
}
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
-#endif
+#else
-#ifndef CONFIG_PARAVIRT_XXL
static inline unsigned long read_cr0(void)
{
return native_read_cr0();
@@ -155,9 +154,7 @@ static inline void write_cr0(unsigned long x)
{
native_write_cr0(x);
}
-#endif
-#ifndef CONFIG_PARAVIRT
static inline unsigned long read_cr2(void)
{
return native_read_cr2();
@@ -181,9 +178,7 @@ static inline void write_cr3(unsigned long x)
{
native_write_cr3(x);
}
-#endif
-#ifndef CONFIG_PARAVIRT_XXL
static inline void __write_cr4(unsigned long x)
{
native_write_cr4(x);
@@ -213,7 +208,7 @@ static inline void load_gs_index(unsigned selector)
#endif
-#endif/* CONFIG_PARAVIRT_XXL */
+#endif /* CONFIG_PARAVIRT_XXL */
static inline void clflush(volatile void *__p)
{
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 429db3c8a0cc..1871bb388bb0 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -72,9 +72,9 @@ void common(void) {
pv_irq_ops.irq_enable);
#ifdef CONFIG_PARAVIRT_XXL
OFFSET(PV_CPU_iret, paravirt_patch_template, pv_cpu_ops.iret);
-#endif
OFFSET(PV_MMU_read_cr2, paravirt_patch_template, pv_mmu_ops.read_cr2);
#endif
+#endif
#ifdef CONFIG_XEN
BLANK();
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index e11b96b2dc6b..981fd802830f 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -25,14 +25,12 @@
#include <asm/export.h>
#include <asm/nospec-branch.h>
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#include <asm/asm-offsets.h>
#include <asm/paravirt.h>
#define GET_CR2_INTO(reg) GET_CR2_INTO_RAX ; movq %rax, reg
#else
#define GET_CR2_INTO(reg) movq %cr2, reg
-#endif
-#ifndef CONFIG_PARAVIRT_XXL
#define INTERRUPT_RETURN iretq
#endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 19bfb3d2083f..36c60fbf61fd 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -385,16 +385,19 @@ struct paravirt_patch_template pv_ops = {
#endif
/* Mmu ops. */
- .pv_mmu_ops.read_cr2 = native_read_cr2,
- .pv_mmu_ops.write_cr2 = native_write_cr2,
- .pv_mmu_ops.read_cr3 = __native_read_cr3,
- .pv_mmu_ops.write_cr3 = native_write_cr3,
-
.pv_mmu_ops.flush_tlb_user = native_flush_tlb,
.pv_mmu_ops.flush_tlb_kernel = native_flush_tlb_global,
.pv_mmu_ops.flush_tlb_one_user = native_flush_tlb_one_user,
.pv_mmu_ops.flush_tlb_others = native_flush_tlb_others,
+ .pv_mmu_ops.exit_mmap = paravirt_nop,
+
+#ifdef CONFIG_PARAVIRT_XXL
+ .pv_mmu_ops.read_cr2 = native_read_cr2,
+ .pv_mmu_ops.write_cr2 = native_write_cr2,
+ .pv_mmu_ops.read_cr3 = __native_read_cr3,
+ .pv_mmu_ops.write_cr3 = native_write_cr3,
+
.pv_mmu_ops.pgd_alloc = __paravirt_pgd_alloc,
.pv_mmu_ops.pgd_free = paravirt_nop,
@@ -447,7 +450,6 @@ struct paravirt_patch_template pv_ops = {
.pv_mmu_ops.make_pgd = PTE_IDENT,
.pv_mmu_ops.dup_mmap = paravirt_nop,
- .pv_mmu_ops.exit_mmap = paravirt_nop,
.pv_mmu_ops.activate_mm = paravirt_nop,
.pv_mmu_ops.lazy_mode = {
@@ -457,6 +459,7 @@ struct paravirt_patch_template pv_ops = {
},
.pv_mmu_ops.set_fixmap = native_set_fixmap,
+#endif
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
/* Lock ops. */
diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c
index 704ed7718062..a4e66a912121 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -7,10 +7,10 @@ DEF_NATIVE(pv_irq_ops, restore_fl, "push %eax; popf");
DEF_NATIVE(pv_irq_ops, save_fl, "pushf; pop %eax");
#ifdef CONFIG_PARAVIRT_XXL
DEF_NATIVE(pv_cpu_ops, iret, "iret");
-#endif
DEF_NATIVE(pv_mmu_ops, read_cr2, "mov %cr2, %eax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
+#endif
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%eax)");
@@ -49,10 +49,10 @@ unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
PATCH_SITE(pv_irq_ops, save_fl);
#ifdef CONFIG_PARAVIRT_XXL
PATCH_SITE(pv_cpu_ops, iret);
-#endif
PATCH_SITE(pv_mmu_ops, read_cr2);
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
+#endif
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
if (pv_is_native_spin_unlock()) {
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 131a7eb01a13..abc0c94b8caa 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -7,10 +7,10 @@ DEF_NATIVE(pv_irq_ops, irq_disable, "cli");
DEF_NATIVE(pv_irq_ops, irq_enable, "sti");
DEF_NATIVE(pv_irq_ops, restore_fl, "pushq %rdi; popfq");
DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax");
+#ifdef CONFIG_PARAVIRT_XXL
DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
-#ifdef CONFIG_PARAVIRT_XXL
DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
DEF_NATIVE(pv_cpu_ops, usergs_sysret64, "swapgs; sysretq");
@@ -59,10 +59,10 @@ unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
PATCH_SITE(pv_cpu_ops, usergs_sysret64);
PATCH_SITE(pv_cpu_ops, swapgs);
PATCH_SITE(pv_cpu_ops, wbinvd);
-#endif
PATCH_SITE(pv_mmu_ops, read_cr2);
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
+#endif
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
if (pv_is_native_spin_unlock()) {
--
2.13.7
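As a rough illustration of the pattern patch 10 applies to pv_mmu_ops, the sketch below gates both the structure members and the matching initializer entries on one macro. It is a standalone model rather than kernel code: `SKETCH_PARAVIRT_XXL` stands in for CONFIG_PARAVIRT_XXL, and every `sketch_` name is hypothetical.

```c
#include <assert.h>

/* Uncomment to model a build with Xen PV guest support. */
/* #define SKETCH_PARAVIRT_XXL 1 */

struct sketch_mmu_ops {
	/* Always present: used by more than one hypervisor. */
	void (*flush_tlb_user)(void);
	void (*exit_mmap)(void *mm);
#ifdef SKETCH_PARAVIRT_XXL
	/* Only present for Xen PV style guests. */
	unsigned long (*read_cr2)(void);
	void (*write_cr3)(unsigned long val);
#endif
};

static int sketch_flushes;

static void sketch_native_flush_tlb(void) { sketch_flushes++; }
static void sketch_nop_exit_mmap(void *mm) { (void)mm; }

#ifdef SKETCH_PARAVIRT_XXL
static unsigned long sketch_cr3;
static unsigned long sketch_native_read_cr2(void) { return 0; }
static void sketch_native_write_cr3(unsigned long val) { sketch_cr3 = val; }
#endif

/* The initializer is gated by the same macro as the members. */
static struct sketch_mmu_ops sketch_mmu = {
	.flush_tlb_user = sketch_native_flush_tlb,
	.exit_mmap = sketch_nop_exit_mmap,
#ifdef SKETCH_PARAVIRT_XXL
	.read_cr2 = sketch_native_read_cr2,
	.write_cr3 = sketch_native_write_cr3,
#endif
};

/* Number of function-pointer slots in this configuration. */
static unsigned long sketch_mmu_slots(void)
{
	return sizeof(struct sketch_mmu_ops) / sizeof(void (*)(void));
}
```

With the macro left undefined, the table holds only the two shared hooks, which is the effect the commit message describes for builds without Xen PV support.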
* [PATCH 09/10] x86/paravirt: move the Xen-only pv_irq_ops under the PARAVIRT_XXL umbrella
From: Juergen Gross @ 2018-08-10 11:52 UTC
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
Some of the paravirt ops defined in pv_irq_ops are for Xen PV guests
only. Define them only if CONFIG_PARAVIRT_XXL is set.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/irqflags.h | 38 ++++++++++++++++++-----------------
arch/x86/include/asm/paravirt.h | 2 --
arch/x86/include/asm/paravirt_types.h | 2 ++
arch/x86/kernel/paravirt.c | 2 ++
4 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 03bb451e4e6b..205e43e55144 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -88,24 +88,6 @@ static inline notrace void arch_local_irq_enable(void)
}
/*
- * Used in the idle loop; sti takes one instruction cycle
- * to complete:
- */
-static inline __cpuidle void arch_safe_halt(void)
-{
- native_safe_halt();
-}
-
-/*
- * Used when interrupts are already enabled or to
- * shutdown the processor:
- */
-static inline __cpuidle void halt(void)
-{
- native_halt();
-}
-
-/*
* For spinlocks, etc:
*/
static inline notrace unsigned long arch_local_irq_save(void)
@@ -154,6 +136,26 @@ static inline notrace unsigned long arch_local_irq_save(void)
#define INTERRUPT_RETURN iret
#endif
+#else
+
+/*
+ * Used in the idle loop; sti takes one instruction cycle
+ * to complete:
+ */
+static inline __cpuidle void arch_safe_halt(void)
+{
+ native_safe_halt();
+}
+
+/*
+ * Used when interrupts are already enabled or to
+ * shutdown the processor:
+ */
+static inline __cpuidle void halt(void)
+{
+ native_halt();
+}
+
#endif /* __ASSEMBLY__ */
#endif /* CONFIG_PARAVIRT_XXL */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index bc9a72a767c8..220c13d7e846 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -91,7 +91,6 @@ static inline void write_cr8(unsigned long x)
PVOP_VCALL1(pv_cpu_ops.write_cr8, x);
}
#endif
-#endif
static inline void arch_safe_halt(void)
{
@@ -103,7 +102,6 @@ static inline void halt(void)
PVOP_VCALL0(pv_irq_ops.halt);
}
-#ifdef CONFIG_PARAVIRT_XXL
static inline void wbinvd(void)
{
PVOP_VCALL0(pv_cpu_ops.wbinvd);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index be356aacc82c..938ac2bece81 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -197,8 +197,10 @@ struct pv_irq_ops {
struct paravirt_callee_save irq_disable;
struct paravirt_callee_save irq_enable;
+#ifdef CONFIG_PARAVIRT_XXL
void (*safe_halt)(void);
void (*halt)(void);
+#endif
} __no_randomize_layout;
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 437be9454cab..19bfb3d2083f 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -379,8 +379,10 @@ struct paravirt_patch_template pv_ops = {
.pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
.pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
.pv_irq_ops.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
+#ifdef CONFIG_PARAVIRT_XXL
.pv_irq_ops.safe_halt = native_safe_halt,
.pv_irq_ops.halt = native_halt,
+#endif
/* Mmu ops. */
.pv_mmu_ops.read_cr2 = native_read_cr2,
--
2.13.7
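The irqflags.h part of patch 09 moves the native halt helpers into the branch used when the Xen PV ops are not built. A minimal sketch of that call-site arrangement follows, assuming a stand-in macro `SKETCH_PARAVIRT_XXL` for CONFIG_PARAVIRT_XXL; the `sketch_` names and the counter are hypothetical and exist only so the behaviour can be observed outside the kernel.

```c
#include <assert.h>

/* Uncomment to model a build with Xen PV guest support. */
/* #define SKETCH_PARAVIRT_XXL 1 */

static int sketch_halts;

/* Stand-in for native_halt(); counts calls instead of halting. */
static void sketch_native_halt(void) { sketch_halts++; }

#ifdef SKETCH_PARAVIRT_XXL
/* Paravirt flavour: the wrapper goes through an ops table. */
struct sketch_irq_ops {
	void (*halt)(void);
};

static struct sketch_irq_ops sketch_irq = {
	.halt = sketch_native_halt,
};

static inline void sketch_halt(void)
{
	sketch_irq.halt();
}
#else
/* Everyone else: the wrapper calls the native helper directly. */
static inline void sketch_halt(void)
{
	sketch_native_halt();
}
#endif
```

Callers use `sketch_halt()` in both configurations; only the XXL build pays for the indirect call and the extra table entry.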
* [PATCH 08/10] x86/paravirt: move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
From: Juergen Gross @ 2018-08-10 11:52 UTC
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
Most of the paravirt ops defined in pv_cpu_ops are for Xen PV guests
only. Define them only if CONFIG_PARAVIRT_XXL is set.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/debugreg.h | 2 +-
arch/x86/include/asm/desc.h | 4 ++--
arch/x86/include/asm/irqflags.h | 16 +++++++++++-----
arch/x86/include/asm/msr.h | 4 ++--
arch/x86/include/asm/paravirt.h | 15 +++++++++++++--
arch/x86/include/asm/paravirt_types.h | 5 ++++-
arch/x86/include/asm/pgtable.h | 6 ++++--
arch/x86/include/asm/processor.h | 4 ++--
arch/x86/include/asm/special_insns.h | 9 +++++++--
arch/x86/kernel/asm-offsets.c | 2 ++
arch/x86/kernel/asm-offsets_64.c | 2 ++
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/head_64.S | 2 ++
arch/x86/kernel/paravirt.c | 13 ++++++++++++-
arch/x86/kernel/paravirt_patch_32.c | 4 ++++
arch/x86/kernel/paravirt_patch_64.c | 6 +++++-
16 files changed, 74 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index 4505ac2735ad..9e5ca30738e5 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -8,7 +8,7 @@
DECLARE_PER_CPU(unsigned long, cpu_dr7);
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_XXL
/*
* These special macros can be used to get or set a debugging register
*/
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 13c5ee878a47..68a99d2a5f33 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -108,7 +108,7 @@ static inline int desc_empty(const void *ptr)
return !(desc[0] | desc[1]);
}
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
#define load_TR_desc() native_load_tr_desc()
@@ -134,7 +134,7 @@ static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
static inline void paravirt_free_ldt(struct desc_struct *ldt, unsigned entries)
{
}
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_XXL */
#define store_ldt(ldt) asm("sldt %0" : "=m"(ldt))
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index b7a790d03229..03bb451e4e6b 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -120,6 +120,16 @@ static inline notrace unsigned long arch_local_irq_save(void)
#define DISABLE_INTERRUPTS(x) cli
#ifdef CONFIG_X86_64
+#ifdef CONFIG_DEBUG_ENTRY
+#define SAVE_FLAGS(x) pushfq; popq %rax
+#endif
+#endif
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_PARAVIRT */
+
+#ifndef CONFIG_PARAVIRT_XXL
+#ifdef __ASSEMBLY__
+#ifdef CONFIG_X86_64
#define SWAPGS swapgs
/*
* Currently paravirt can't handle swapgs nicely when we
@@ -140,16 +150,12 @@ static inline notrace unsigned long arch_local_irq_save(void)
swapgs; \
sysretl
-#ifdef CONFIG_DEBUG_ENTRY
-#define SAVE_FLAGS(x) pushfq; popq %rax
-#endif
#else
#define INTERRUPT_RETURN iret
#endif
-
#endif /* __ASSEMBLY__ */
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_XXL */
#ifndef __ASSEMBLY__
static inline int arch_irqs_disabled_flags(unsigned long flags)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 04addd6e0a4a..91e4cf189914 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -242,7 +242,7 @@ static inline unsigned long long native_read_pmc(int counter)
return EAX_EDX_VAL(val, low, high);
}
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
#include <linux/errno.h>
@@ -305,7 +305,7 @@ do { \
#define rdpmcl(counter, val) ((val) = native_read_pmc(counter))
-#endif /* !CONFIG_PARAVIRT */
+#endif /* !CONFIG_PARAVIRT_XXL */
/*
* 64-bit version of wrmsr_safe():
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index afc0469979f7..bc9a72a767c8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -17,6 +17,7 @@
#include <linux/cpumask.h>
#include <asm/frame.h>
+#ifdef CONFIG_PARAVIRT_XXL
static inline void load_sp0(unsigned long sp0)
{
PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0);
@@ -51,6 +52,7 @@ static inline void write_cr0(unsigned long x)
{
PVOP_VCALL1(pv_cpu_ops.write_cr0, x);
}
+#endif
static inline unsigned long read_cr2(void)
{
@@ -72,6 +74,7 @@ static inline void write_cr3(unsigned long x)
PVOP_VCALL1(pv_mmu_ops.write_cr3, x);
}
+#ifdef CONFIG_PARAVIRT_XXL
static inline void __write_cr4(unsigned long x)
{
PVOP_VCALL1(pv_cpu_ops.write_cr4, x);
@@ -88,6 +91,7 @@ static inline void write_cr8(unsigned long x)
PVOP_VCALL1(pv_cpu_ops.write_cr8, x);
}
#endif
+#endif
static inline void arch_safe_halt(void)
{
@@ -99,14 +103,13 @@ static inline void halt(void)
PVOP_VCALL0(pv_irq_ops.halt);
}
+#ifdef CONFIG_PARAVIRT_XXL
static inline void wbinvd(void)
{
PVOP_VCALL0(pv_cpu_ops.wbinvd);
}
-#ifdef CONFIG_PARAVIRT_XXL
#define get_kernel_rpl() (pv_info.kernel_rpl)
-#endif
static inline u64 paravirt_read_msr(unsigned msr)
{
@@ -171,6 +174,7 @@ static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
*p = paravirt_read_msr_safe(msr, &err);
return err;
}
+#endif
static inline unsigned long long paravirt_sched_clock(void)
{
@@ -186,6 +190,7 @@ static inline u64 paravirt_steal_clock(int cpu)
return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
}
+#ifdef CONFIG_PARAVIRT_XXL
static inline unsigned long long paravirt_read_pmc(int counter)
{
return PVOP_CALL1(u64, pv_cpu_ops.read_pmc, counter);
@@ -230,6 +235,7 @@ static inline unsigned long paravirt_store_tr(void)
{
return PVOP_CALL0(unsigned long, pv_cpu_ops.store_tr);
}
+
#define store_tr(tr) ((tr) = paravirt_store_tr())
static inline void load_TLS(struct thread_struct *t, unsigned cpu)
{
@@ -263,6 +269,7 @@ static inline void set_iopl_mask(unsigned mask)
{
PVOP_VCALL1(pv_cpu_ops.set_iopl_mask, mask);
}
+#endif
/* The paravirtualized I/O functions */
static inline void slow_down_io(void)
@@ -635,6 +642,7 @@ static inline void pmd_clear(pmd_t *pmdp)
}
#endif /* CONFIG_X86_PAE */
+#ifdef CONFIG_PARAVIRT_XXL
#define __HAVE_ARCH_START_CONTEXT_SWITCH
static inline void arch_start_context_switch(struct task_struct *prev)
{
@@ -645,6 +653,7 @@ static inline void arch_end_context_switch(struct task_struct *next)
{
PVOP_VCALL1(pv_cpu_ops.end_context_switch, next);
}
+#endif
#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
static inline void arch_enter_lazy_mmu_mode(void)
@@ -887,10 +896,12 @@ extern void default_banner(void);
#define PARA_INDIRECT(addr) *%cs:addr
#endif
+#ifdef CONFIG_PARAVIRT_XXL
#define INTERRUPT_RETURN \
PARA_SITE(PARA_PATCH(PV_CPU_iret), \
ANNOTATE_RETPOLINE_SAFE; \
jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
+#endif
#define DISABLE_INTERRUPTS(clobbers) \
PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable), \
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index f1bdc4c9ff4c..be356aacc82c 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -105,6 +105,9 @@ struct pv_time_ops {
struct pv_cpu_ops {
/* hooks for various privileged instructions */
+ void (*io_delay)(void);
+
+#ifdef CONFIG_PARAVIRT_XXL
unsigned long (*get_debugreg)(int regno);
void (*set_debugreg)(int regno, unsigned long value);
@@ -142,7 +145,6 @@ struct pv_cpu_ops {
void (*set_iopl_mask)(unsigned mask);
void (*wbinvd)(void);
- void (*io_delay)(void);
/* cpuid emulation, mostly so that caps bits can be disabled */
void (*cpuid)(unsigned int *eax, unsigned int *ebx,
@@ -177,6 +179,7 @@ struct pv_cpu_ops {
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
+#endif
} __no_randomize_layout;
struct pv_irq_ops {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5715647fc4fe..9ea291fe7107 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -108,10 +108,12 @@ extern pmdval_t early_pmd_flags;
#define pte_val(x) native_pte_val(x)
#define __pte(x) native_make_pte(x)
-#define arch_end_context_switch(prev) do {} while(0)
-
#endif /* CONFIG_PARAVIRT */
+#ifndef CONFIG_PARAVIRT_XXL
+#define arch_end_context_switch(prev) do {} while(0)
+#endif /* CONFIG_PARAVIRT_XXL */
+
/*
* The following only work if pte_present() is true.
* Undefined behaviour if not..
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index cfd29ee8c3da..7a8fa57218c2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -571,7 +571,7 @@ static inline bool on_thread_stack(void)
current_stack_pointer) < THREAD_SIZE;
}
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
#define __cpuid native_cpuid
@@ -582,7 +582,7 @@ static inline void load_sp0(unsigned long sp0)
}
#define set_iopl_mask native_set_iopl_mask
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_XXL */
/* Free all resources held by a thread. */
extern void release_thread(struct task_struct *);
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 317fc59b512c..2aa6ce4bf159 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -143,8 +143,9 @@ static inline unsigned long __read_cr4(void)
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
-#else
+#endif
+#ifndef CONFIG_PARAVIRT_XXL
static inline unsigned long read_cr0(void)
{
return native_read_cr0();
@@ -154,7 +155,9 @@ static inline void write_cr0(unsigned long x)
{
native_write_cr0(x);
}
+#endif
+#ifndef CONFIG_PARAVIRT
static inline unsigned long read_cr2(void)
{
return native_read_cr2();
@@ -178,7 +181,9 @@ static inline void write_cr3(unsigned long x)
{
native_write_cr3(x);
}
+#endif
+#ifndef CONFIG_PARAVIRT_XXL
static inline void __write_cr4(unsigned long x)
{
native_write_cr4(x);
@@ -208,7 +213,7 @@ static inline void load_gs_index(unsigned selector)
#endif
-#endif/* CONFIG_PARAVIRT */
+#endif/* CONFIG_PARAVIRT_XXL */
static inline void clflush(volatile void *__p)
{
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index d0f0348209cb..429db3c8a0cc 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -70,7 +70,9 @@ void common(void) {
pv_irq_ops.irq_disable);
OFFSET(PV_IRQ_irq_enable, paravirt_patch_template,
pv_irq_ops.irq_enable);
+#ifdef CONFIG_PARAVIRT_XXL
OFFSET(PV_CPU_iret, paravirt_patch_template, pv_cpu_ops.iret);
+#endif
OFFSET(PV_MMU_read_cr2, paravirt_patch_template, pv_mmu_ops.read_cr2);
#endif
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 2add567c1b2a..c98d2935af94 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -21,9 +21,11 @@ static char syscalls_ia32[] = {
int main(void)
{
#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
OFFSET(PV_CPU_usergs_sysret64, paravirt_patch_template,
pv_cpu_ops.usergs_sysret64);
OFFSET(PV_CPU_swapgs, paravirt_patch_template, pv_cpu_ops.swapgs);
+#endif
#ifdef CONFIG_DEBUG_ENTRY
OFFSET(PV_IRQ_save_fl, paravirt_patch_template, pv_irq_ops.save_fl);
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3893df059174..c97f4e0ebad6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1222,7 +1222,7 @@ static void generic_identify(struct cpuinfo_x86 *c)
* ESPFIX issue, we can change this.
*/
#ifdef CONFIG_X86_32
-# ifdef CONFIG_PARAVIRT
+# ifdef CONFIG_PARAVIRT_XXL
do {
extern void native_iret(void);
if (pv_ops.pv_cpu_ops.iret == native_iret)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 8344dd2f310a..e11b96b2dc6b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -31,6 +31,8 @@
#define GET_CR2_INTO(reg) GET_CR2_INTO_RAX ; movq %rax, reg
#else
#define GET_CR2_INTO(reg) movq %cr2, reg
+#endif
+#ifndef CONFIG_PARAVIRT_XXL
#define INTERRUPT_RETURN iretq
#endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 168901f4dc09..437be9454cab 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -100,6 +100,7 @@ static unsigned paravirt_patch_call(void *insnbuf, const void *target,
return 5;
}
+#ifdef CONFIG_PARAVIRT_XXL
static unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
unsigned long addr, unsigned len)
{
@@ -118,6 +119,7 @@ static unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
return 5;
}
+#endif
DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
@@ -149,10 +151,12 @@ unsigned paravirt_patch_default(u8 type, void *insnbuf,
else if (opfunc == _paravirt_ident_64)
ret = paravirt_patch_ident_64(insnbuf, len);
+#ifdef CONFIG_PARAVIRT_XXL
else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64))
/* If operation requires a jmp, then jmp */
ret = paravirt_patch_jmp(insnbuf, opfunc, addr, len);
+#endif
else
/* Otherwise call the function. */
ret = paravirt_patch_call(insnbuf, opfunc, addr, len);
@@ -261,6 +265,7 @@ void paravirt_flush_lazy_mmu(void)
preempt_enable();
}
+#ifdef CONFIG_PARAVIRT_XXL
void paravirt_start_context_switch(struct task_struct *prev)
{
BUG_ON(preemptible());
@@ -281,6 +286,7 @@ void paravirt_end_context_switch(struct task_struct *next)
if (test_and_clear_ti_thread_flag(task_thread_info(next), TIF_LAZY_MMU_UPDATES))
arch_enter_lazy_mmu_mode();
}
+#endif
enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
{
@@ -319,6 +325,9 @@ struct paravirt_patch_template pv_ops = {
.pv_time_ops.steal_clock = native_steal_clock,
/* Cpu ops. */
+ .pv_cpu_ops.io_delay = native_io_delay,
+
+#ifdef CONFIG_PARAVIRT_XXL
.pv_cpu_ops.cpuid = native_cpuid,
.pv_cpu_ops.get_debugreg = native_get_debugreg,
.pv_cpu_ops.set_debugreg = native_set_debugreg,
@@ -360,10 +369,10 @@ struct paravirt_patch_template pv_ops = {
.pv_cpu_ops.swapgs = native_swapgs,
.pv_cpu_ops.set_iopl_mask = native_set_iopl_mask,
- .pv_cpu_ops.io_delay = native_io_delay,
.pv_cpu_ops.start_context_switch = paravirt_nop,
.pv_cpu_ops.end_context_switch = paravirt_nop,
+#endif
/* Irq ops. */
.pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
@@ -462,10 +471,12 @@ struct paravirt_patch_template pv_ops = {
#endif
};
+#ifdef CONFIG_PARAVIRT_XXL
/* At this point, native_get/set_debugreg has real function entries */
NOKPROBE_SYMBOL(native_get_debugreg);
NOKPROBE_SYMBOL(native_set_debugreg);
NOKPROBE_SYMBOL(native_load_idt);
+#endif
EXPORT_SYMBOL_GPL(pv_ops);
EXPORT_SYMBOL_GPL(pv_info);
diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c
index e5c3a438149e..704ed7718062 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -5,7 +5,9 @@ DEF_NATIVE(pv_irq_ops, irq_disable, "cli");
DEF_NATIVE(pv_irq_ops, irq_enable, "sti");
DEF_NATIVE(pv_irq_ops, restore_fl, "push %eax; popf");
DEF_NATIVE(pv_irq_ops, save_fl, "pushf; pop %eax");
+#ifdef CONFIG_PARAVIRT_XXL
DEF_NATIVE(pv_cpu_ops, iret, "iret");
+#endif
DEF_NATIVE(pv_mmu_ops, read_cr2, "mov %cr2, %eax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
@@ -45,7 +47,9 @@ unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
PATCH_SITE(pv_irq_ops, irq_enable);
PATCH_SITE(pv_irq_ops, restore_fl);
PATCH_SITE(pv_irq_ops, save_fl);
+#ifdef CONFIG_PARAVIRT_XXL
PATCH_SITE(pv_cpu_ops, iret);
+#endif
PATCH_SITE(pv_mmu_ops, read_cr2);
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 893ef87eb268..131a7eb01a13 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -10,10 +10,12 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax");
DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
+#ifdef CONFIG_PARAVIRT_XXL
DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
DEF_NATIVE(pv_cpu_ops, usergs_sysret64, "swapgs; sysretq");
DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
+#endif
DEF_NATIVE(, mov32, "mov %edi, %eax");
DEF_NATIVE(, mov64, "mov %rdi, %rax");
@@ -53,12 +55,14 @@ unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
PATCH_SITE(pv_irq_ops, save_fl);
PATCH_SITE(pv_irq_ops, irq_enable);
PATCH_SITE(pv_irq_ops, irq_disable);
+#ifdef CONFIG_PARAVIRT_XXL
PATCH_SITE(pv_cpu_ops, usergs_sysret64);
PATCH_SITE(pv_cpu_ops, swapgs);
+ PATCH_SITE(pv_cpu_ops, wbinvd);
+#endif
PATCH_SITE(pv_mmu_ops, read_cr2);
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
- PATCH_SITE(pv_cpu_ops, wbinvd);
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
if (pv_is_native_spin_unlock()) {
--
2.13.7
* [PATCH 07/10] x86/paravirt: move items in pv_info under PARAVIRT_XXL umbrella
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
All items in pv_info except name are needed by Xen PV only. Define them
only when CONFIG_PARAVIRT_XXL is set.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
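Note, not part of the commit message: a minimal user-space sketch of the
pattern used here, where the Xen-PV-only members and their accessor exist
only when the config symbol is set, and everyone else gets a constant.
DEMO_XXL, struct demo_info and demo_get_kernel_rpl() are made-up names for
illustration; they only mimic CONFIG_PARAVIRT_XXL, struct pv_info and
get_kernel_rpl() and are not the kernel definitions.

```c
#include <stdio.h>

/* Uncomment to mimic a build with the (hypothetical) XXL option set. */
/* #define DEMO_XXL 1 */

/* Cut-down stand-in for struct pv_info. */
struct demo_info {
#ifdef DEMO_XXL
	unsigned int kernel_rpl;	/* only needed by the XXL user */
#endif
	const char *name;		/* always present */
};

struct demo_info demo_info = {
	.name = "bare hardware",
#ifdef DEMO_XXL
	.kernel_rpl = 0,
#endif
};

/* With the option unset the accessor collapses to a constant. */
#ifdef DEMO_XXL
#define demo_get_kernel_rpl()	(demo_info.kernel_rpl)
#else
#define demo_get_kernel_rpl()	0
#endif

/* Print the name and the privilege level as seen by this build. */
void demo_show(void)
{
	printf("%s: rpl %u\n", demo_info.name,
	       (unsigned int)demo_get_kernel_rpl());
}
```

In this sketch a build without DEMO_XXL never references the guarded
member, so the compiler can fold the accessor away and the structure
shrinks to the name pointer.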
arch/x86/include/asm/paravirt.h | 2 ++
arch/x86/include/asm/paravirt_types.h | 2 ++
arch/x86/include/asm/pgtable-3level_types.h | 2 +-
arch/x86/include/asm/ptrace.h | 3 ++-
arch/x86/include/asm/segment.h | 2 +-
arch/x86/kernel/paravirt.c | 2 ++
6 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 436d270e622b..afc0469979f7 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -104,7 +104,9 @@ static inline void wbinvd(void)
PVOP_VCALL0(pv_cpu_ops.wbinvd);
}
+#ifdef CONFIG_PARAVIRT_XXL
#define get_kernel_rpl() (pv_info.kernel_rpl)
+#endif
static inline u64 paravirt_read_msr(unsigned msr)
{
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index ed024e90b863..f1bdc4c9ff4c 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -65,12 +65,14 @@ struct paravirt_callee_save {
/* general info */
struct pv_info {
+#ifdef CONFIG_PARAVIRT_XXL
unsigned int kernel_rpl;
int shared_kernel_pmd;
#ifdef CONFIG_X86_64
u16 extra_user_64bit_cs; /* __USER_CS if none */
#endif
+#endif
const char *name;
};
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 6a59a6d0cc50..1aa68ca1907c 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -20,7 +20,7 @@ typedef union {
} pte_t;
#endif /* !__ASSEMBLY__ */
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_XXL
#define SHARED_KERNEL_PMD (pv_info.shared_kernel_pmd)
#else
#define SHARED_KERNEL_PMD 1
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6de1fd3d0097..c9ac6ff5f7d2 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -144,7 +144,8 @@ static inline int v8086_mode(struct pt_regs *regs)
static inline bool user_64bit_mode(struct pt_regs *regs)
{
#ifdef CONFIG_X86_64
-#ifndef CONFIG_PARAVIRT
+/* Early boot code has CONFIG_PARAVIRT undefined! */
+#if !defined(CONFIG_PARAVIRT) || !defined(CONFIG_PARAVIRT_XXL)
/*
* On non-paravirt systems, this is the only long mode CPL 3
* selector. We do not allow long mode selectors in the LDT.
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index e293c122d0d5..0ffbe9519e68 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -211,7 +211,7 @@
#endif
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_XXL
# define get_kernel_rpl() 0
#endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 40ec68135f7a..168901f4dc09 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -292,12 +292,14 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
struct pv_info pv_info = {
.name = "bare hardware",
+#ifdef CONFIG_PARAVIRT_XXL
.kernel_rpl = 0,
.shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
#ifdef CONFIG_X86_64
.extra_user_64bit_cs = __USER_CS,
#endif
+#endif
};
#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
--
2.13.7
* [PATCH 06/10] x86/paravirt: introduce new config option PARAVIRT_XXL
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
A large number of paravirt ops are used by Xen PV guests only. Add a new
config option PARAVIRT_XXL which is selected by XEN_PV. Later we can
put the Xen PV only paravirt ops under the PARAVIRT_XXL umbrella.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/Kconfig | 3 +++
arch/x86/xen/Kconfig | 1 +
2 files changed, 4 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 887d3a7bb646..3c967b803c21 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -754,6 +754,9 @@ config PARAVIRT
over full virtualization. However, when run without a hypervisor
the kernel is theoretically slower and slightly larger.
+config PARAVIRT_XXL
+ bool
+
config PARAVIRT_DEBUG
bool "paravirt-ops debugging"
depends on PARAVIRT && DEBUG_KERNEL
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index c1f98f32c45f..dd92d7bd3613 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -18,6 +18,7 @@ config XEN_PV
bool "Xen PV guest support"
default y
depends on XEN
+ select PARAVIRT_XXL
select XEN_HAVE_PVMMU
select XEN_HAVE_VPMU
help
--
2.13.7
* [PATCH 05/10] x86/paravirt: remove unused paravirt bits
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
The macros ENABLE_INTERRUPTS_SYSEXIT, GET_CR0_INTO_EAX and
PARAVIRT_ADJUST_EXCEPTION_FRAME are used nowhere. Remove their
definitions.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/irqflags.h | 4 ----
arch/x86/include/asm/paravirt.h | 9 +--------
arch/x86/kernel/asm-offsets.c | 1 -
3 files changed, 1 insertion(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index c4fc17220df9..b7a790d03229 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -132,8 +132,6 @@ static inline notrace unsigned long arch_local_irq_save(void)
*/
#define SWAPGS_UNSAFE_STACK swapgs
-#define PARAVIRT_ADJUST_EXCEPTION_FRAME /* */
-
#define INTERRUPT_RETURN jmp native_iret
#define USERGS_SYSRET64 \
swapgs; \
@@ -147,8 +145,6 @@ static inline notrace unsigned long arch_local_irq_save(void)
#endif
#else
#define INTERRUPT_RETURN iret
-#define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit
-#define GET_CR0_INTO_EAX movl %cr0, %eax
#endif
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 1b86bb319393..436d270e622b 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -904,14 +904,7 @@ extern void default_banner(void);
call PARA_INDIRECT(pv_ops+PV_IRQ_irq_enable); \
PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
-#ifdef CONFIG_X86_32
-#define GET_CR0_INTO_EAX \
- push %ecx; push %edx; \
- ANNOTATE_RETPOLINE_SAFE; \
- call PARA_INDIRECT(pv_ops+PV_CPU_read_cr0); \
- pop %edx; pop %ecx
-#else /* !CONFIG_X86_32 */
-
+#ifdef CONFIG_X86_64
/*
* If swapgs is used while the userspace stack is still current,
* there's no way to call a pvop. The PV replacement *must* be
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index bec9fc3498f8..d0f0348209cb 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -71,7 +71,6 @@ void common(void) {
OFFSET(PV_IRQ_irq_enable, paravirt_patch_template,
pv_irq_ops.irq_enable);
OFFSET(PV_CPU_iret, paravirt_patch_template, pv_cpu_ops.iret);
- OFFSET(PV_CPU_read_cr0, paravirt_patch_template, pv_cpu_ops.read_cr0);
OFFSET(PV_MMU_read_cr2, paravirt_patch_template, pv_mmu_ops.read_cr2);
#endif
--
2.13.7
* [PATCH 04/10] x86/paravirt: use a single ops structure
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
Instead of using six globally visible paravirt ops structures, combine
them in a single structure, keeping the original structures as
sub-structures.
This avoids the need to assemble struct paravirt_patch_template at
runtime on the stack each time apply_paravirt() is called (i.e. when
loading a module).
Signed-off-by: Juergen Gross <jgross@suse.com>
---
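Note, not part of the commit message: a minimal user-space sketch of the
lookup that becomes possible once all ops live in one global structure.
The struct layout, the demo_* names and DEMO_PATCH() are made up for
illustration; only the offsetof()/sizeof(void *) indexing mirrors
PARAVIRT_PATCH() from this patch. It assumes function pointers and
void * have the same size, as they do on the x86 ABIs, and it relies on
the function-pointer/void * casts that gcc and clang accept.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the pv ops sub-structures. */
struct demo_time_ops {
	long (*sched_clock)(void);
	long (*steal_clock)(void);
};

struct demo_cpu_ops {
	void (*io_delay)(void);
};

/* The single template holding all sub-structures. */
struct demo_patch_template {
	struct demo_time_ops time;
	struct demo_cpu_ops cpu;
};

/* Slot index of an op, counted in pointer-sized entries. */
#define DEMO_PATCH(x) \
	(offsetof(struct demo_patch_template, x) / sizeof(void *))

static long demo_sched_clock(void) { return 42; }
static long demo_steal_clock(void) { return 7; }
static void demo_io_delay(void) { }

/* One global object, so nothing has to be copied together at runtime. */
struct demo_patch_template demo_ops = {
	.time.sched_clock = demo_sched_clock,
	.time.steal_clock = demo_steal_clock,
	.cpu.io_delay = demo_io_delay,
};

/* Map a slot index back to the function stored in that slot. */
void *demo_call_destination(unsigned int type)
{
	return *((void **)&demo_ops + type);
}

/* Returns 1 if every slot index resolves to the expected function. */
int demo_check(void)
{
	return demo_call_destination(DEMO_PATCH(time.sched_clock)) ==
			(void *)demo_sched_clock &&
	       demo_call_destination(DEMO_PATCH(time.steal_clock)) ==
			(void *)demo_steal_clock &&
	       demo_call_destination(DEMO_PATCH(cpu.io_delay)) ==
			(void *)demo_io_delay;
}
```

Because the template is a real global object in this sketch, the slot
index can be applied to it directly and no temporary copy has to be
built on the stack for each lookup.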
arch/x86/hyperv/mmu.c | 4 +-
arch/x86/include/asm/paravirt.h | 51 ++++---
arch/x86/include/asm/paravirt_types.h | 13 +-
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/asm-offsets.c | 14 +-
arch/x86/kernel/asm-offsets_64.c | 7 +-
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/vmware.c | 4 +-
arch/x86/kernel/kvm.c | 18 ++-
arch/x86/kernel/kvmclock.c | 4 +-
arch/x86/kernel/paravirt-spinlocks.c | 15 +-
arch/x86/kernel/paravirt.c | 278 +++++++++++++++++-----------------
arch/x86/kernel/tsc.c | 2 +-
arch/x86/kernel/vsmp_64.c | 11 +-
arch/x86/xen/enlighten_pv.c | 32 ++--
arch/x86/xen/irq.c | 2 +-
arch/x86/xen/mmu_hvm.c | 2 +-
arch/x86/xen/mmu_pv.c | 28 ++--
arch/x86/xen/spinlock.c | 12 +-
arch/x86/xen/time.c | 4 +-
drivers/xen/time.c | 2 +-
21 files changed, 249 insertions(+), 258 deletions(-)
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index de27615c51ea..b34768cfb204 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -228,9 +228,9 @@ void hyperv_setup_mmu_ops(void)
if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED)) {
pr_info("Using hypercall for remote TLB flush\n");
- pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
+ pv_ops.pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
} else {
pr_info("Using ext hypercall for remote TLB flush\n");
- pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others_ex;
+ pv_ops.pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others_ex;
}
}
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 76b4b5c056f3..1b86bb319393 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -265,11 +265,11 @@ static inline void set_iopl_mask(unsigned mask)
/* The paravirtualized I/O functions */
static inline void slow_down_io(void)
{
- pv_cpu_ops.io_delay();
+ pv_ops.pv_cpu_ops.io_delay();
#ifdef REALLY_SLOW_IO
- pv_cpu_ops.io_delay();
- pv_cpu_ops.io_delay();
- pv_cpu_ops.io_delay();
+ pv_ops.pv_cpu_ops.io_delay();
+ pv_ops.pv_cpu_ops.io_delay();
+ pv_ops.pv_cpu_ops.io_delay();
#endif
}
@@ -432,7 +432,7 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long a
{
if (sizeof(pteval_t) > sizeof(long))
/* 5 arg words */
- pv_mmu_ops.ptep_modify_prot_commit(mm, addr, ptep, pte);
+ pv_ops.pv_mmu_ops.ptep_modify_prot_commit(mm, addr, ptep, pte);
else
PVOP_VCALL4(pv_mmu_ops.ptep_modify_prot_commit,
mm, addr, ptep, pte.pte);
@@ -453,7 +453,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
{
if (sizeof(pteval_t) > sizeof(long))
/* 5 arg words */
- pv_mmu_ops.set_pte_at(mm, addr, ptep, pte);
+ pv_ops.pv_mmu_ops.set_pte_at(mm, addr, ptep, pte);
else
PVOP_VCALL4(pv_mmu_ops.set_pte_at, mm, addr, ptep, pte.pte);
}
@@ -663,7 +663,7 @@ static inline void arch_flush_lazy_mmu_mode(void)
static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
phys_addr_t phys, pgprot_t flags)
{
- pv_mmu_ops.set_fixmap(idx, phys, flags);
+ pv_ops.pv_mmu_ops.set_fixmap(idx, phys, flags);
}
#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
@@ -694,6 +694,9 @@ static __always_inline bool pv_vcpu_is_preempted(long cpu)
return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
}
+void __raw_callee_save___native_queued_spin_unlock(struct qspinlock *lock);
+bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
+
#endif /* SMP && PARAVIRT_SPINLOCKS */
#ifdef CONFIG_X86_32
@@ -862,7 +865,7 @@ extern void default_banner(void);
COND_POP(set, CLBR_RCX, rcx); \
COND_POP(set, CLBR_RAX, rax)
-#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 8)
+#define PARA_PATCH(off) ((off) / 8)
#define PARA_SITE(ptype, ops) _PVSITE(ptype, ops, .quad, 8)
#define PARA_INDIRECT(addr) *addr(%rip)
#else
@@ -877,35 +880,35 @@ extern void default_banner(void);
COND_POP(set, CLBR_EDI, edi); \
COND_POP(set, CLBR_EAX, eax)
-#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 4)
+#define PARA_PATCH(off) ((off) / 4)
#define PARA_SITE(ptype, ops) _PVSITE(ptype, ops, .long, 4)
#define PARA_INDIRECT(addr) *%cs:addr
#endif
#define INTERRUPT_RETURN \
- PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), \
+ PARA_SITE(PARA_PATCH(PV_CPU_iret), \
ANNOTATE_RETPOLINE_SAFE; \
- jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_iret);)
+ jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
#define DISABLE_INTERRUPTS(clobbers) \
- PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), \
+ PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable), \
PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
ANNOTATE_RETPOLINE_SAFE; \
- call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_disable); \
+ call PARA_INDIRECT(pv_ops+PV_IRQ_irq_disable); \
PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
#define ENABLE_INTERRUPTS(clobbers) \
- PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_enable), \
+ PARA_SITE(PARA_PATCH(PV_IRQ_irq_enable), \
PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
ANNOTATE_RETPOLINE_SAFE; \
- call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_enable); \
+ call PARA_INDIRECT(pv_ops+PV_IRQ_irq_enable); \
PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
#ifdef CONFIG_X86_32
#define GET_CR0_INTO_EAX \
push %ecx; push %edx; \
ANNOTATE_RETPOLINE_SAFE; \
- call PARA_INDIRECT(pv_cpu_ops+PV_CPU_read_cr0); \
+ call PARA_INDIRECT(pv_ops+PV_CPU_read_cr0); \
pop %edx; pop %ecx
#else /* !CONFIG_X86_32 */
@@ -915,7 +918,7 @@ extern void default_banner(void);
* inlined, or the swapgs instruction must be trapped and emulated.
*/
#define SWAPGS_UNSAFE_STACK \
- PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), swapgs)
+ PARA_SITE(PARA_PATCH(PV_CPU_swapgs), swapgs)
/*
* Note: swapgs is very special, and in practise is either going to be
@@ -924,26 +927,26 @@ extern void default_banner(void);
* it.
*/
#define SWAPGS \
- PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), \
+ PARA_SITE(PARA_PATCH(PV_CPU_swapgs), \
ANNOTATE_RETPOLINE_SAFE; \
- call PARA_INDIRECT(pv_cpu_ops+PV_CPU_swapgs); \
+ call PARA_INDIRECT(pv_ops+PV_CPU_swapgs); \
)
#define GET_CR2_INTO_RAX \
ANNOTATE_RETPOLINE_SAFE; \
- call PARA_INDIRECT(pv_mmu_ops+PV_MMU_read_cr2);
+ call PARA_INDIRECT(pv_ops+PV_MMU_read_cr2);
#define USERGS_SYSRET64 \
- PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64), \
+ PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64), \
ANNOTATE_RETPOLINE_SAFE; \
- jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_usergs_sysret64);)
+ jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
#ifdef CONFIG_DEBUG_ENTRY
#define SAVE_FLAGS(clobbers) \
- PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_save_fl), \
+ PARA_SITE(PARA_PATCH(PV_IRQ_save_fl), \
PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
ANNOTATE_RETPOLINE_SAFE; \
- call PARA_INDIRECT(pv_irq_ops+PV_IRQ_save_fl); \
+ call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl); \
PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
#endif
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index b900088cd244..ed024e90b863 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -327,19 +327,14 @@ struct paravirt_patch_template {
} __no_randomize_layout;
extern struct pv_info pv_info;
-extern struct pv_init_ops pv_init_ops;
-extern struct pv_time_ops pv_time_ops;
-extern struct pv_cpu_ops pv_cpu_ops;
-extern struct pv_irq_ops pv_irq_ops;
-extern struct pv_mmu_ops pv_mmu_ops;
-extern struct pv_lock_ops pv_lock_ops;
+extern struct paravirt_patch_template pv_ops;
#define PARAVIRT_PATCH(x) \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
#define paravirt_type(op) \
[paravirt_typenum] "i" (PARAVIRT_PATCH(op)), \
- [paravirt_opptr] "i" (&(op))
+ [paravirt_opptr] "i" (&(pv_ops.op))
#define paravirt_clobber(clobber) \
[paravirt_clobber] "i" (clobber)
@@ -500,9 +495,9 @@ int paravirt_disable_iospace(void);
#endif /* CONFIG_X86_32 */
#ifdef CONFIG_PARAVIRT_DEBUG
-#define PVOP_TEST_NULL(op) BUG_ON(op == NULL)
+#define PVOP_TEST_NULL(op) BUG_ON(pv_ops.op == NULL)
#else
-#define PVOP_TEST_NULL(op) ((void)op)
+#define PVOP_TEST_NULL(op) ((void)pv_ops.op)
#endif
#define PVOP_RETMASK(rettype) \
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 9729cee11149..7cfeda749382 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -594,7 +594,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
BUG_ON(p->len > MAX_PATCH_LEN);
/* prep the buffer with the original instructions */
memcpy(insnbuf, p->instr, p->len);
- used = pv_init_ops.patch(p->instrtype, insnbuf,
+ used = pv_ops.pv_init_ops.patch(p->instrtype, insnbuf,
(unsigned long)p->instr, p->len);
BUG_ON(used > p->len);
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index dcb008c320fe..bec9fc3498f8 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -66,13 +66,13 @@ void common(void) {
#ifdef CONFIG_PARAVIRT
BLANK();
- OFFSET(PARAVIRT_PATCH_pv_cpu_ops, paravirt_patch_template, pv_cpu_ops);
- OFFSET(PARAVIRT_PATCH_pv_irq_ops, paravirt_patch_template, pv_irq_ops);
- OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
- OFFSET(PV_IRQ_irq_enable, pv_irq_ops, irq_enable);
- OFFSET(PV_CPU_iret, pv_cpu_ops, iret);
- OFFSET(PV_CPU_read_cr0, pv_cpu_ops, read_cr0);
- OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2);
+ OFFSET(PV_IRQ_irq_disable, paravirt_patch_template,
+ pv_irq_ops.irq_disable);
+ OFFSET(PV_IRQ_irq_enable, paravirt_patch_template,
+ pv_irq_ops.irq_enable);
+ OFFSET(PV_CPU_iret, paravirt_patch_template, pv_cpu_ops.iret);
+ OFFSET(PV_CPU_read_cr0, paravirt_patch_template, pv_cpu_ops.read_cr0);
+ OFFSET(PV_MMU_read_cr2, paravirt_patch_template, pv_mmu_ops.read_cr2);
#endif
#ifdef CONFIG_XEN
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index b2dcd161f514..2add567c1b2a 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -21,10 +21,11 @@ static char syscalls_ia32[] = {
int main(void)
{
#ifdef CONFIG_PARAVIRT
- OFFSET(PV_CPU_usergs_sysret64, pv_cpu_ops, usergs_sysret64);
- OFFSET(PV_CPU_swapgs, pv_cpu_ops, swapgs);
+ OFFSET(PV_CPU_usergs_sysret64, paravirt_patch_template,
+ pv_cpu_ops.usergs_sysret64);
+ OFFSET(PV_CPU_swapgs, paravirt_patch_template, pv_cpu_ops.swapgs);
#ifdef CONFIG_DEBUG_ENTRY
- OFFSET(PV_IRQ_save_fl, pv_irq_ops, save_fl);
+ OFFSET(PV_IRQ_save_fl, paravirt_patch_template, pv_irq_ops.save_fl);
#endif
BLANK();
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index eb4cb3efd20e..3893df059174 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1225,7 +1225,7 @@ static void generic_identify(struct cpuinfo_x86 *c)
# ifdef CONFIG_PARAVIRT
do {
extern void native_iret(void);
- if (pv_cpu_ops.iret == native_iret)
+ if (pv_ops.pv_cpu_ops.iret == native_iret)
set_cpu_bug(c, X86_BUG_ESPFIX);
} while (0);
# else
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 8e005329648b..f85302a308f2 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -97,14 +97,14 @@ static void __init vmware_sched_clock_setup(void)
d->cyc2ns_offset = mul_u64_u32_shr(tsc_now, d->cyc2ns_mul,
d->cyc2ns_shift);
- pv_time_ops.sched_clock = vmware_sched_clock;
+ pv_ops.pv_time_ops.sched_clock = vmware_sched_clock;
pr_info("using sched offset of %llu ns\n", d->cyc2ns_offset);
}
static void __init vmware_paravirt_ops_setup(void)
{
pv_info.name = "VMware hypervisor";
- pv_cpu_ops.io_delay = paravirt_nop;
+ pv_ops.pv_cpu_ops.io_delay = paravirt_nop;
if (vmware_tsc_khz && vmw_sched_clock)
vmware_sched_clock_setup();
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5b2300b818af..610da165aa26 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -292,7 +292,7 @@ static void __init paravirt_ops_setup(void)
pv_info.name = "KVM";
if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
- pv_cpu_ops.io_delay = kvm_io_delay;
+ pv_ops.pv_cpu_ops.io_delay = kvm_io_delay;
#ifdef CONFIG_X86_IO_APIC
no_timer_check = 1;
@@ -549,13 +549,13 @@ static void __init kvm_guest_init(void)
if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
has_steal_clock = 1;
- pv_time_ops.steal_clock = kvm_steal_clock;
+ pv_ops.pv_time_ops.steal_clock = kvm_steal_clock;
}
if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
kvm_para_has_feature(KVM_FEATURE_STEAL_TIME))
- pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+ pv_ops.pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
apic_set_eoi_write(kvm_guest_apic_eoi_write);
@@ -749,13 +749,15 @@ void __init kvm_spinlock_init(void)
return;
__pv_init_lock_hash();
- pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
- pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
- pv_lock_ops.wait = kvm_wait;
- pv_lock_ops.kick = kvm_kick_cpu;
+ pv_ops.pv_lock_ops.queued_spin_lock_slowpath =
+ __pv_queued_spin_lock_slowpath;
+ pv_ops.pv_lock_ops.queued_spin_unlock =
+ PV_CALLEE_SAVE(__pv_queued_spin_unlock);
+ pv_ops.pv_lock_ops.wait = kvm_wait;
+ pv_ops.pv_lock_ops.kick = kvm_kick_cpu;
if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
- pv_lock_ops.vcpu_is_preempted =
+ pv_ops.pv_lock_ops.vcpu_is_preempted =
PV_CALLEE_SAVE(__kvm_vcpu_is_preempted);
}
}
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 3b8e7c13c614..10a6071b209a 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -104,13 +104,13 @@ static u64 kvm_sched_clock_read(void)
static inline void kvm_sched_clock_init(bool stable)
{
if (!stable) {
- pv_time_ops.sched_clock = kvm_clock_read;
+ pv_ops.pv_time_ops.sched_clock = kvm_clock_read;
clear_sched_clock_stable();
return;
}
kvm_sched_clock_offset = kvm_clock_read();
- pv_time_ops.sched_clock = kvm_sched_clock_read;
+ pv_ops.pv_time_ops.sched_clock = kvm_sched_clock_read;
printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
kvm_sched_clock_offset);
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 71f2d1125ec0..9569481cadb3 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -17,7 +17,7 @@ PV_CALLEE_SAVE_REGS_THUNK(__native_queued_spin_unlock);
bool pv_is_native_spin_unlock(void)
{
- return pv_lock_ops.queued_spin_unlock.func ==
+ return pv_ops.pv_lock_ops.queued_spin_unlock.func ==
__raw_callee_save___native_queued_spin_unlock;
}
@@ -29,17 +29,6 @@ PV_CALLEE_SAVE_REGS_THUNK(__native_vcpu_is_preempted);
bool pv_is_native_vcpu_is_preempted(void)
{
- return pv_lock_ops.vcpu_is_preempted.func ==
+ return pv_ops.pv_lock_ops.vcpu_is_preempted.func ==
__raw_callee_save___native_vcpu_is_preempted;
}
-
-struct pv_lock_ops pv_lock_ops = {
-#ifdef CONFIG_SMP
- .queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
- .queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
- .wait = paravirt_nop,
- .kick = paravirt_nop,
- .vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted),
-#endif /* SMP */
-};
-EXPORT_SYMBOL(pv_lock_ops);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index f0c462fe2808..40ec68135f7a 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -127,29 +127,14 @@ void __init native_pv_lock_init(void)
static_branch_disable(&virt_spin_lock_key);
}
-/*
- * Neat trick to map patch type back to the call within the
- * corresponding structure.
- */
-static void *get_call_destination(u8 type)
-{
- struct paravirt_patch_template tmpl = {
- .pv_init_ops = pv_init_ops,
- .pv_time_ops = pv_time_ops,
- .pv_cpu_ops = pv_cpu_ops,
- .pv_irq_ops = pv_irq_ops,
- .pv_mmu_ops = pv_mmu_ops,
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
- .pv_lock_ops = pv_lock_ops,
-#endif
- };
- return *((void **)&tmpl + type);
-}
-
unsigned paravirt_patch_default(u8 type, void *insnbuf,
unsigned long addr, unsigned len)
{
- void *opfunc = get_call_destination(type);
+ /*
+ * Neat trick to map patch type back to the call within the
+ * corresponding structure.
+ */
+ void *opfunc = *((void **)&pv_ops + type);
unsigned ret;
if (opfunc == NULL)
@@ -315,77 +300,6 @@ struct pv_info pv_info = {
#endif
};
-struct pv_init_ops pv_init_ops = {
- .patch = native_patch,
-};
-
-struct pv_time_ops pv_time_ops = {
- .sched_clock = native_sched_clock,
- .steal_clock = native_steal_clock,
-};
-
-__visible struct pv_irq_ops pv_irq_ops = {
- .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
- .restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
- .irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
- .irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
- .safe_halt = native_safe_halt,
- .halt = native_halt,
-};
-
-__visible struct pv_cpu_ops pv_cpu_ops = {
- .cpuid = native_cpuid,
- .get_debugreg = native_get_debugreg,
- .set_debugreg = native_set_debugreg,
- .read_cr0 = native_read_cr0,
- .write_cr0 = native_write_cr0,
- .write_cr4 = native_write_cr4,
-#ifdef CONFIG_X86_64
- .read_cr8 = native_read_cr8,
- .write_cr8 = native_write_cr8,
-#endif
- .wbinvd = native_wbinvd,
- .read_msr = native_read_msr,
- .write_msr = native_write_msr,
- .read_msr_safe = native_read_msr_safe,
- .write_msr_safe = native_write_msr_safe,
- .read_pmc = native_read_pmc,
- .load_tr_desc = native_load_tr_desc,
- .set_ldt = native_set_ldt,
- .load_gdt = native_load_gdt,
- .load_idt = native_load_idt,
- .store_tr = native_store_tr,
- .load_tls = native_load_tls,
-#ifdef CONFIG_X86_64
- .load_gs_index = native_load_gs_index,
-#endif
- .write_ldt_entry = native_write_ldt_entry,
- .write_gdt_entry = native_write_gdt_entry,
- .write_idt_entry = native_write_idt_entry,
-
- .alloc_ldt = paravirt_nop,
- .free_ldt = paravirt_nop,
-
- .load_sp0 = native_load_sp0,
-
-#ifdef CONFIG_X86_64
- .usergs_sysret64 = native_usergs_sysret64,
-#endif
- .iret = native_iret,
- .swapgs = native_swapgs,
-
- .set_iopl_mask = native_set_iopl_mask,
- .io_delay = native_io_delay,
-
- .start_context_switch = paravirt_nop,
- .end_context_switch = paravirt_nop,
-};
-
-/* At this point, native_get/set_debugreg has real function entries */
-NOKPROBE_SYMBOL(native_get_debugreg);
-NOKPROBE_SYMBOL(native_set_debugreg);
-NOKPROBE_SYMBOL(native_load_idt);
-
#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
/* 32-bit pagetable entries */
#define PTE_IDENT __PV_IS_CALLEE_SAVE(_paravirt_ident_32)
@@ -394,84 +308,162 @@ NOKPROBE_SYMBOL(native_load_idt);
#define PTE_IDENT __PV_IS_CALLEE_SAVE(_paravirt_ident_64)
#endif
-struct pv_mmu_ops pv_mmu_ops __ro_after_init = {
-
- .read_cr2 = native_read_cr2,
- .write_cr2 = native_write_cr2,
- .read_cr3 = __native_read_cr3,
- .write_cr3 = native_write_cr3,
-
- .flush_tlb_user = native_flush_tlb,
- .flush_tlb_kernel = native_flush_tlb_global,
- .flush_tlb_one_user = native_flush_tlb_one_user,
- .flush_tlb_others = native_flush_tlb_others,
-
- .pgd_alloc = __paravirt_pgd_alloc,
- .pgd_free = paravirt_nop,
+struct paravirt_patch_template pv_ops = {
+ /* Init ops. */
+ .pv_init_ops.patch = native_patch,
+
+ /* Time ops. */
+ .pv_time_ops.sched_clock = native_sched_clock,
+ .pv_time_ops.steal_clock = native_steal_clock,
+
+ /* Cpu ops. */
+ .pv_cpu_ops.cpuid = native_cpuid,
+ .pv_cpu_ops.get_debugreg = native_get_debugreg,
+ .pv_cpu_ops.set_debugreg = native_set_debugreg,
+ .pv_cpu_ops.read_cr0 = native_read_cr0,
+ .pv_cpu_ops.write_cr0 = native_write_cr0,
+ .pv_cpu_ops.write_cr4 = native_write_cr4,
+#ifdef CONFIG_X86_64
+ .pv_cpu_ops.read_cr8 = native_read_cr8,
+ .pv_cpu_ops.write_cr8 = native_write_cr8,
+#endif
+ .pv_cpu_ops.wbinvd = native_wbinvd,
+ .pv_cpu_ops.read_msr = native_read_msr,
+ .pv_cpu_ops.write_msr = native_write_msr,
+ .pv_cpu_ops.read_msr_safe = native_read_msr_safe,
+ .pv_cpu_ops.write_msr_safe = native_write_msr_safe,
+ .pv_cpu_ops.read_pmc = native_read_pmc,
+ .pv_cpu_ops.load_tr_desc = native_load_tr_desc,
+ .pv_cpu_ops.set_ldt = native_set_ldt,
+ .pv_cpu_ops.load_gdt = native_load_gdt,
+ .pv_cpu_ops.load_idt = native_load_idt,
+ .pv_cpu_ops.store_tr = native_store_tr,
+ .pv_cpu_ops.load_tls = native_load_tls,
+#ifdef CONFIG_X86_64
+ .pv_cpu_ops.load_gs_index = native_load_gs_index,
+#endif
+ .pv_cpu_ops.write_ldt_entry = native_write_ldt_entry,
+ .pv_cpu_ops.write_gdt_entry = native_write_gdt_entry,
+ .pv_cpu_ops.write_idt_entry = native_write_idt_entry,
- .alloc_pte = paravirt_nop,
- .alloc_pmd = paravirt_nop,
- .alloc_pud = paravirt_nop,
- .alloc_p4d = paravirt_nop,
- .release_pte = paravirt_nop,
- .release_pmd = paravirt_nop,
- .release_pud = paravirt_nop,
- .release_p4d = paravirt_nop,
+ .pv_cpu_ops.alloc_ldt = paravirt_nop,
+ .pv_cpu_ops.free_ldt = paravirt_nop,
- .set_pte = native_set_pte,
- .set_pte_at = native_set_pte_at,
- .set_pmd = native_set_pmd,
+ .pv_cpu_ops.load_sp0 = native_load_sp0,
- .ptep_modify_prot_start = __ptep_modify_prot_start,
- .ptep_modify_prot_commit = __ptep_modify_prot_commit,
+#ifdef CONFIG_X86_64
+ .pv_cpu_ops.usergs_sysret64 = native_usergs_sysret64,
+#endif
+ .pv_cpu_ops.iret = native_iret,
+ .pv_cpu_ops.swapgs = native_swapgs,
+
+ .pv_cpu_ops.set_iopl_mask = native_set_iopl_mask,
+ .pv_cpu_ops.io_delay = native_io_delay,
+
+ .pv_cpu_ops.start_context_switch = paravirt_nop,
+ .pv_cpu_ops.end_context_switch = paravirt_nop,
+
+ /* Irq ops. */
+ .pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
+ .pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
+ .pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
+ .pv_irq_ops.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
+ .pv_irq_ops.safe_halt = native_safe_halt,
+ .pv_irq_ops.halt = native_halt,
+
+ /* Mmu ops. */
+ .pv_mmu_ops.read_cr2 = native_read_cr2,
+ .pv_mmu_ops.write_cr2 = native_write_cr2,
+ .pv_mmu_ops.read_cr3 = __native_read_cr3,
+ .pv_mmu_ops.write_cr3 = native_write_cr3,
+
+ .pv_mmu_ops.flush_tlb_user = native_flush_tlb,
+ .pv_mmu_ops.flush_tlb_kernel = native_flush_tlb_global,
+ .pv_mmu_ops.flush_tlb_one_user = native_flush_tlb_one_user,
+ .pv_mmu_ops.flush_tlb_others = native_flush_tlb_others,
+
+ .pv_mmu_ops.pgd_alloc = __paravirt_pgd_alloc,
+ .pv_mmu_ops.pgd_free = paravirt_nop,
+
+ .pv_mmu_ops.alloc_pte = paravirt_nop,
+ .pv_mmu_ops.alloc_pmd = paravirt_nop,
+ .pv_mmu_ops.alloc_pud = paravirt_nop,
+ .pv_mmu_ops.alloc_p4d = paravirt_nop,
+ .pv_mmu_ops.release_pte = paravirt_nop,
+ .pv_mmu_ops.release_pmd = paravirt_nop,
+ .pv_mmu_ops.release_pud = paravirt_nop,
+ .pv_mmu_ops.release_p4d = paravirt_nop,
+
+ .pv_mmu_ops.set_pte = native_set_pte,
+ .pv_mmu_ops.set_pte_at = native_set_pte_at,
+ .pv_mmu_ops.set_pmd = native_set_pmd,
+
+ .pv_mmu_ops.ptep_modify_prot_start = __ptep_modify_prot_start,
+ .pv_mmu_ops.ptep_modify_prot_commit = __ptep_modify_prot_commit,
#if CONFIG_PGTABLE_LEVELS >= 3
#ifdef CONFIG_X86_PAE
- .set_pte_atomic = native_set_pte_atomic,
- .pte_clear = native_pte_clear,
- .pmd_clear = native_pmd_clear,
+ .pv_mmu_ops.set_pte_atomic = native_set_pte_atomic,
+ .pv_mmu_ops.pte_clear = native_pte_clear,
+ .pv_mmu_ops.pmd_clear = native_pmd_clear,
#endif
- .set_pud = native_set_pud,
+ .pv_mmu_ops.set_pud = native_set_pud,
- .pmd_val = PTE_IDENT,
- .make_pmd = PTE_IDENT,
+ .pv_mmu_ops.pmd_val = PTE_IDENT,
+ .pv_mmu_ops.make_pmd = PTE_IDENT,
#if CONFIG_PGTABLE_LEVELS >= 4
- .pud_val = PTE_IDENT,
- .make_pud = PTE_IDENT,
+ .pv_mmu_ops.pud_val = PTE_IDENT,
+ .pv_mmu_ops.make_pud = PTE_IDENT,
- .set_p4d = native_set_p4d,
+ .pv_mmu_ops.set_p4d = native_set_p4d,
#if CONFIG_PGTABLE_LEVELS >= 5
- .p4d_val = PTE_IDENT,
- .make_p4d = PTE_IDENT,
+ .pv_mmu_ops.p4d_val = PTE_IDENT,
+ .pv_mmu_ops.make_p4d = PTE_IDENT,
- .set_pgd = native_set_pgd,
+ .pv_mmu_ops.set_pgd = native_set_pgd,
#endif /* CONFIG_PGTABLE_LEVELS >= 5 */
#endif /* CONFIG_PGTABLE_LEVELS >= 4 */
#endif /* CONFIG_PGTABLE_LEVELS >= 3 */
- .pte_val = PTE_IDENT,
- .pgd_val = PTE_IDENT,
+ .pv_mmu_ops.pte_val = PTE_IDENT,
+ .pv_mmu_ops.pgd_val = PTE_IDENT,
- .make_pte = PTE_IDENT,
- .make_pgd = PTE_IDENT,
+ .pv_mmu_ops.make_pte = PTE_IDENT,
+ .pv_mmu_ops.make_pgd = PTE_IDENT,
- .dup_mmap = paravirt_nop,
- .exit_mmap = paravirt_nop,
- .activate_mm = paravirt_nop,
+ .pv_mmu_ops.dup_mmap = paravirt_nop,
+ .pv_mmu_ops.exit_mmap = paravirt_nop,
+ .pv_mmu_ops.activate_mm = paravirt_nop,
- .lazy_mode = {
+ .pv_mmu_ops.lazy_mode = {
.enter = paravirt_nop,
.leave = paravirt_nop,
.flush = paravirt_nop,
},
- .set_fixmap = native_set_fixmap,
+ .pv_mmu_ops.set_fixmap = native_set_fixmap,
+
+#if defined(CONFIG_PARAVIRT_SPINLOCKS)
+ /* Lock ops. */
+#ifdef CONFIG_SMP
+ .pv_lock_ops.queued_spin_lock_slowpath =
+ native_queued_spin_lock_slowpath,
+ .pv_lock_ops.queued_spin_unlock =
+ PV_CALLEE_SAVE(__native_queued_spin_unlock),
+ .pv_lock_ops.wait = paravirt_nop,
+ .pv_lock_ops.kick = paravirt_nop,
+ .pv_lock_ops.vcpu_is_preempted =
+ PV_CALLEE_SAVE(__native_vcpu_is_preempted),
+#endif /* SMP */
+#endif
};
-EXPORT_SYMBOL_GPL(pv_time_ops);
-EXPORT_SYMBOL (pv_cpu_ops);
-EXPORT_SYMBOL (pv_mmu_ops);
+/* At this point, native_get/set_debugreg has real function entries */
+NOKPROBE_SYMBOL(native_get_debugreg);
+NOKPROBE_SYMBOL(native_set_debugreg);
+NOKPROBE_SYMBOL(native_load_idt);
+
+EXPORT_SYMBOL_GPL(pv_ops);
EXPORT_SYMBOL_GPL(pv_info);
-EXPORT_SYMBOL (pv_irq_ops);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 74392d9d51e0..42b936dd5846 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -230,7 +230,7 @@ unsigned long long sched_clock(void)
bool using_native_sched_clock(void)
{
- return pv_time_ops.sched_clock == native_sched_clock;
+ return pv_ops.pv_time_ops.sched_clock == native_sched_clock;
}
#else
unsigned long long
diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
index f194e5e1e95c..85425584f9a7 100644
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -111,11 +111,12 @@ static void __init set_vsmp_pv_ops(void)
if (cap & ctl & (1 << 4)) {
/* Setup irq ops and turn on vSMP IRQ fastpath handling */
- pv_irq_ops.irq_disable = PV_CALLEE_SAVE(vsmp_irq_disable);
- pv_irq_ops.irq_enable = PV_CALLEE_SAVE(vsmp_irq_enable);
- pv_irq_ops.save_fl = PV_CALLEE_SAVE(vsmp_save_fl);
- pv_irq_ops.restore_fl = PV_CALLEE_SAVE(vsmp_restore_fl);
- pv_init_ops.patch = vsmp_patch;
+ pv_ops.pv_irq_ops.irq_disable =
+ PV_CALLEE_SAVE(vsmp_irq_disable);
+ pv_ops.pv_irq_ops.irq_enable = PV_CALLEE_SAVE(vsmp_irq_enable);
+ pv_ops.pv_irq_ops.save_fl = PV_CALLEE_SAVE(vsmp_save_fl);
+ pv_ops.pv_irq_ops.restore_fl = PV_CALLEE_SAVE(vsmp_restore_fl);
+ pv_ops.pv_init_ops.patch = vsmp_patch;
ctl &= ~(1 << 4);
}
writel(ctl, address + 4);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 439a94bf89ad..dbb3a3b24cf8 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -998,11 +998,15 @@ void __ref xen_setup_vcpu_info_placement(void)
* percpu area for all cpus, so make use of it.
*/
if (xen_have_vcpu_info_placement) {
- pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
- pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
- pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
- pv_irq_ops.irq_enable = __PV_IS_CALLEE_SAVE(xen_irq_enable_direct);
- pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
+ pv_ops.pv_irq_ops.save_fl =
+ __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
+ pv_ops.pv_irq_ops.restore_fl =
+ __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
+ pv_ops.pv_irq_ops.irq_disable =
+ __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
+ pv_ops.pv_irq_ops.irq_enable =
+ __PV_IS_CALLEE_SAVE(xen_irq_enable_direct);
+ pv_ops.pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
}
}
@@ -1177,14 +1181,14 @@ static void __init xen_boot_params_init_edd(void)
*/
static void xen_setup_gdt(int cpu)
{
- pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
- pv_cpu_ops.load_gdt = xen_load_gdt_boot;
+ pv_ops.pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
+ pv_ops.pv_cpu_ops.load_gdt = xen_load_gdt_boot;
setup_stack_canary_segment(0);
switch_to_new_gdt(0);
- pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry;
- pv_cpu_ops.load_gdt = xen_load_gdt;
+ pv_ops.pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry;
+ pv_ops.pv_cpu_ops.load_gdt = xen_load_gdt;
}
static void __init xen_dom0_set_legacy_features(void)
@@ -1209,8 +1213,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
/* Install Xen paravirt ops */
pv_info = xen_info;
- pv_init_ops.patch = paravirt_patch_default;
- pv_cpu_ops = xen_cpu_ops;
+ pv_ops.pv_init_ops.patch = paravirt_patch_default;
+ pv_ops.pv_cpu_ops = xen_cpu_ops;
xen_init_irq_ops();
/*
@@ -1274,8 +1278,10 @@ asmlinkage __visible void __init xen_start_kernel(void)
#endif
if (xen_feature(XENFEAT_mmu_pt_update_preserve_ad)) {
- pv_mmu_ops.ptep_modify_prot_start = xen_ptep_modify_prot_start;
- pv_mmu_ops.ptep_modify_prot_commit = xen_ptep_modify_prot_commit;
+ pv_ops.pv_mmu_ops.ptep_modify_prot_start =
+ xen_ptep_modify_prot_start;
+ pv_ops.pv_mmu_ops.ptep_modify_prot_commit =
+ xen_ptep_modify_prot_commit;
}
machine_ops = xen_machine_ops;
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 7515a19fd324..2df69ffc33d6 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -128,6 +128,6 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
void __init xen_init_irq_ops(void)
{
- pv_irq_ops = xen_irq_ops;
+ pv_ops.pv_irq_ops = xen_irq_ops;
x86_init.irqs.intr_init = xen_init_IRQ;
}
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index dd2ad82eee80..5ef3ba3e748f 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -73,7 +73,7 @@ static int is_pagetable_dying_supported(void)
void __init xen_hvm_init_mmu_ops(void)
{
if (is_pagetable_dying_supported())
- pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
+ pv_ops.pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
#ifdef CONFIG_PROC_VMCORE
WARN_ON(register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram));
#endif
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index b7ec689320c7..9f46c67787bc 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2215,7 +2215,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
set_page_prot(initial_page_table, PAGE_KERNEL);
set_page_prot(initial_kernel_pmd, PAGE_KERNEL);
- pv_mmu_ops.write_cr3 = &xen_write_cr3;
+ pv_ops.pv_mmu_ops.write_cr3 = &xen_write_cr3;
}
/*
@@ -2364,27 +2364,27 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
static void __init xen_post_allocator_init(void)
{
- pv_mmu_ops.set_pte = xen_set_pte;
- pv_mmu_ops.set_pmd = xen_set_pmd;
- pv_mmu_ops.set_pud = xen_set_pud;
+ pv_ops.pv_mmu_ops.set_pte = xen_set_pte;
+ pv_ops.pv_mmu_ops.set_pmd = xen_set_pmd;
+ pv_ops.pv_mmu_ops.set_pud = xen_set_pud;
#ifdef CONFIG_X86_64
- pv_mmu_ops.set_p4d = xen_set_p4d;
+ pv_ops.pv_mmu_ops.set_p4d = xen_set_p4d;
#endif
/* This will work as long as patching hasn't happened yet
(which it hasn't) */
- pv_mmu_ops.alloc_pte = xen_alloc_pte;
- pv_mmu_ops.alloc_pmd = xen_alloc_pmd;
- pv_mmu_ops.release_pte = xen_release_pte;
- pv_mmu_ops.release_pmd = xen_release_pmd;
+ pv_ops.pv_mmu_ops.alloc_pte = xen_alloc_pte;
+ pv_ops.pv_mmu_ops.alloc_pmd = xen_alloc_pmd;
+ pv_ops.pv_mmu_ops.release_pte = xen_release_pte;
+ pv_ops.pv_mmu_ops.release_pmd = xen_release_pmd;
#ifdef CONFIG_X86_64
- pv_mmu_ops.alloc_pud = xen_alloc_pud;
- pv_mmu_ops.release_pud = xen_release_pud;
+ pv_ops.pv_mmu_ops.alloc_pud = xen_alloc_pud;
+ pv_ops.pv_mmu_ops.release_pud = xen_release_pud;
#endif
- pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
+ pv_ops.pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
#ifdef CONFIG_X86_64
- pv_mmu_ops.write_cr3 = &xen_write_cr3;
+ pv_ops.pv_mmu_ops.write_cr3 = &xen_write_cr3;
#endif
}
@@ -2471,7 +2471,7 @@ void __init xen_init_mmu_ops(void)
x86_init.paging.pagetable_init = xen_pagetable_init;
x86_init.hyper.init_after_bootmem = xen_after_bootmem;
- pv_mmu_ops = xen_mmu_ops;
+ pv_ops.pv_mmu_ops = xen_mmu_ops;
memset(dummy_mapping, 0xff, PAGE_SIZE);
}
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index cd97a62394e7..53b213af8c26 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -137,11 +137,13 @@ void __init xen_init_spinlocks(void)
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
__pv_init_lock_hash();
- pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
- pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
- pv_lock_ops.wait = xen_qlock_wait;
- pv_lock_ops.kick = xen_qlock_kick;
- pv_lock_ops.vcpu_is_preempted = PV_CALLEE_SAVE(xen_vcpu_stolen);
+ pv_ops.pv_lock_ops.queued_spin_lock_slowpath =
+ __pv_queued_spin_lock_slowpath;
+ pv_ops.pv_lock_ops.queued_spin_unlock =
+ PV_CALLEE_SAVE(__pv_queued_spin_unlock);
+ pv_ops.pv_lock_ops.wait = xen_qlock_wait;
+ pv_ops.pv_lock_ops.kick = xen_qlock_kick;
+ pv_ops.pv_lock_ops.vcpu_is_preempted = PV_CALLEE_SAVE(xen_vcpu_stolen);
}
static __init int xen_parse_nopvspin(char *arg)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index e0f1bcf01d63..66311c3e90b0 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -505,7 +505,7 @@ static void __init xen_time_init(void)
void __ref xen_init_time_ops(void)
{
- pv_time_ops = xen_time_ops;
+ pv_ops.pv_time_ops = xen_time_ops;
x86_init.timers.timer_init = xen_time_init;
x86_init.timers.setup_percpu_clockev = x86_init_noop;
@@ -547,7 +547,7 @@ void __init xen_hvm_init_time_ops(void)
return;
}
- pv_time_ops = xen_time_ops;
+ pv_ops.pv_time_ops = xen_time_ops;
x86_init.timers.setup_percpu_clockev = xen_time_init;
x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 3e741cd1409c..994fb1ae64b3 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -175,7 +175,7 @@ void __init xen_time_setup_guest(void)
xen_runstate_remote = !HYPERVISOR_vm_assist(VMASST_CMD_enable,
VMASST_TYPE_runstate_update_flag);
- pv_time_ops.steal_clock = xen_steal_clock;
+ pv_ops.pv_time_ops.steal_clock = xen_steal_clock;
static_key_slow_inc(&paravirt_steal_enabled);
if (xen_runstate_remote)
--
2.13.7
^ permalink raw reply related
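The "neat trick" kept in paravirt_patch_default() by the patch above can be illustrated outside the kernel. The following is only a minimal sketch, assuming every slot in the template is a function pointer of the same size; all demo_* names are hypothetical and are not kernel identifiers. The patch indexes pv_ops directly through a (void **) cast, whereas the sketch copies the slot out with memcpy() so that it stays within ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the per-area ops structs. */
struct demo_time_ops {
	long (*sched_clock)(void);
	long (*steal_clock)(void);
};

struct demo_irq_ops {
	void (*irq_disable)(void);
	void (*irq_enable)(void);
};

/* One template embedding every area, like the single pv_ops structure. */
struct demo_patch_template {
	struct demo_time_ops time;
	struct demo_irq_ops irq;
};

/* Generic function pointer type, used only to carry a slot's value. */
typedef void (*demo_fn)(void);

/* Slot number of a member: its byte offset divided by the pointer size. */
#define DEMO_PATCH(member) \
	(offsetof(struct demo_patch_template, member) / sizeof(demo_fn))
#define DEMO_NUM_SLOTS (sizeof(struct demo_patch_template) / sizeof(demo_fn))

static int demo_irq_state;

static long demo_native_clock(void) { return 1; }
static long demo_native_steal(void) { return 0; }
static void demo_native_irq_disable(void) { demo_irq_state = 0; }
static void demo_native_irq_enable(void) { demo_irq_state = 1; }
/* Hypothetical hypervisor-specific replacement for irq_enable. */
static void demo_hv_irq_enable(void) { demo_irq_state = 2; }

/* The single global instance, initialised with the native defaults. */
static struct demo_patch_template demo_ops = {
	.time.sched_clock = demo_native_clock,
	.time.steal_clock = demo_native_steal,
	.irq.irq_disable  = demo_native_irq_disable,
	.irq.irq_enable   = demo_native_irq_enable,
};

/* Map a patch type (slot number) back to the function stored in that slot. */
static demo_fn demo_call_destination(unsigned char type)
{
	demo_fn dest;

	assert(type < DEMO_NUM_SLOTS);
	memcpy(&dest, (const char *)&demo_ops + type * sizeof(demo_fn),
	       sizeof(dest));
	return dest;
}
```

Because the lookup reads the one global structure, an override such as demo_ops.irq.irq_enable = demo_hv_irq_enable is seen by the next lookup of DEMO_PATCH(irq.irq_enable) without first copying the individual structures into a temporary template, as the removed get_call_destination() did.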
* [PATCH 03/10] x86/paravirt: remove clobbers from struct paravirt_patch_site
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
There is no longer any need to store the clobbers in struct
paravirt_patch_site. Remove clobbers from the struct and from the
related macros.
While at it, fix some lines longer than 80 characters.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/paravirt.h | 33 +++++++++++++++------------------
arch/x86/include/asm/paravirt_types.h | 1 -
2 files changed, 15 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d49bbf4bb5c8..76b4b5c056f3 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -822,7 +822,7 @@ extern void default_banner(void);
#else /* __ASSEMBLY__ */
-#define _PVSITE(ptype, clobbers, ops, word, algn) \
+#define _PVSITE(ptype, ops, word, algn) \
771:; \
ops; \
772:; \
@@ -831,7 +831,6 @@ extern void default_banner(void);
word 771b; \
.byte ptype; \
.byte 772b-771b; \
- .short clobbers; \
.popsection
@@ -864,7 +863,7 @@ extern void default_banner(void);
COND_POP(set, CLBR_RAX, rax)
#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 8)
-#define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .quad, 8)
+#define PARA_SITE(ptype, ops) _PVSITE(ptype, ops, .quad, 8)
#define PARA_INDIRECT(addr) *addr(%rip)
#else
#define PV_SAVE_REGS(set) \
@@ -879,26 +878,26 @@ extern void default_banner(void);
COND_POP(set, CLBR_EAX, eax)
#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 4)
-#define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .long, 4)
+#define PARA_SITE(ptype, ops) _PVSITE(ptype, ops, .long, 4)
#define PARA_INDIRECT(addr) *%cs:addr
#endif
#define INTERRUPT_RETURN \
- PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), CLBR_NONE, \
- ANNOTATE_RETPOLINE_SAFE; \
+ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), \
+ ANNOTATE_RETPOLINE_SAFE; \
jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_iret);)
#define DISABLE_INTERRUPTS(clobbers) \
- PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \
+ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), \
PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
- ANNOTATE_RETPOLINE_SAFE; \
+ ANNOTATE_RETPOLINE_SAFE; \
call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_disable); \
PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
#define ENABLE_INTERRUPTS(clobbers) \
- PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_enable), clobbers, \
+ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_enable), \
PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
- ANNOTATE_RETPOLINE_SAFE; \
+ ANNOTATE_RETPOLINE_SAFE; \
call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_enable); \
PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
@@ -916,8 +915,7 @@ extern void default_banner(void);
* inlined, or the swapgs instruction must be trapped and emulated.
*/
#define SWAPGS_UNSAFE_STACK \
- PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE, \
- swapgs)
+ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), swapgs)
/*
* Note: swapgs is very special, and in practise is either going to be
@@ -926,8 +924,8 @@ extern void default_banner(void);
* it.
*/
#define SWAPGS \
- PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE, \
- ANNOTATE_RETPOLINE_SAFE; \
+ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), \
+ ANNOTATE_RETPOLINE_SAFE; \
call PARA_INDIRECT(pv_cpu_ops+PV_CPU_swapgs); \
)
@@ -937,15 +935,14 @@ extern void default_banner(void);
#define USERGS_SYSRET64 \
PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64), \
- CLBR_NONE, \
- ANNOTATE_RETPOLINE_SAFE; \
+ ANNOTATE_RETPOLINE_SAFE; \
jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_usergs_sysret64);)
#ifdef CONFIG_DEBUG_ENTRY
#define SAVE_FLAGS(clobbers) \
- PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_save_fl), clobbers, \
+ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_save_fl), \
PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
- ANNOTATE_RETPOLINE_SAFE; \
+ ANNOTATE_RETPOLINE_SAFE; \
call PARA_INDIRECT(pv_irq_ops+PV_IRQ_save_fl); \
PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
#endif
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index f6e24e78633b..b900088cd244 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -678,7 +678,6 @@ struct paravirt_patch_site {
u8 *instr; /* original instructions */
u8 instrtype; /* type of this instruction */
u8 len; /* length of original instruction */
- u16 clobbers; /* what registers you may clobber */
};
extern struct paravirt_patch_site __parainstructions[],
--
2.13.7
^ permalink raw reply related
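To show how a patch-site record without the clobbers field is consumed, here is a minimal user-space sketch of an apply loop in the spirit of apply_paravirt(). It rests on several assumptions: all demo_* names and DEMO_MAX_PATCH_LEN are hypothetical, asserts stand in for BUG_ON(), and the unused tail is padded with single-byte 0x90 NOPs, which is a simplification chosen for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PATCH_LEN 32	/* hypothetical upper bound for one site */
#define DEMO_NOP 0x90		/* single-byte x86 nop */

/* Patch-site record as it looks once the clobbers field is gone. */
struct demo_patch_site {
	uint8_t *instr;		/* original instructions */
	uint8_t instrtype;	/* type of this instruction */
	uint8_t len;		/* length of original instruction */
};

/* Same parameter list as the patch callback after the series. */
typedef unsigned (*demo_patch_fn)(uint8_t type, void *insnbuf,
				  unsigned long addr, unsigned len);

/* Hypothetical patcher: always emits one byte (0xfa is the x86 cli opcode). */
static unsigned demo_patch_one_byte(uint8_t type, void *insnbuf,
				    unsigned long addr, unsigned len)
{
	uint8_t *buf = insnbuf;

	(void)type;
	(void)addr;
	if (len < 1)
		return len;
	buf[0] = 0xfa;
	return 1;
}

/* Walk all sites, let the callback rewrite each one, nop-pad the rest. */
static void demo_apply_sites(struct demo_patch_site *start,
			     struct demo_patch_site *end,
			     demo_patch_fn patch)
{
	uint8_t insnbuf[DEMO_MAX_PATCH_LEN];
	struct demo_patch_site *p;

	for (p = start; p < end; p++) {
		unsigned used;

		assert(p->len <= DEMO_MAX_PATCH_LEN);
		/* prep the buffer with the original instructions */
		memcpy(insnbuf, p->instr, p->len);
		used = patch(p->instrtype, insnbuf,
			     (unsigned long)(uintptr_t)p->instr, p->len);
		assert(used <= p->len);
		/* pad whatever the callback did not fill */
		memset(insnbuf + used, DEMO_NOP, p->len - used);
		memcpy(p->instr, insnbuf, p->len);
	}
}
```

The callback only needs the site type, the buffer, the address and the length, so the record can get by with the three fields shown.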
* [PATCH 02/10] x86/paravirt: remove clobbers parameter from paravirt patch functions
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
The clobbers parameter of paravirt_patch_default() et al. is no longer
used. Remove it.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/paravirt_types.h | 7 +++----
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/paravirt.c | 14 +++++---------
arch/x86/kernel/paravirt_patch_32.c | 5 ++---
arch/x86/kernel/paravirt_patch_64.c | 5 ++---
arch/x86/kernel/vsmp_64.c | 6 +++---
6 files changed, 16 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 036b2f88f105..f6e24e78633b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -84,7 +84,7 @@ struct pv_init_ops {
* the number of bytes of code generated, as we nop pad the
* rest in generic code.
*/
- unsigned (*patch)(u8 type, u16 clobber, void *insnbuf,
+ unsigned (*patch)(u8 type, void *insnbuf,
unsigned long addr, unsigned len);
} __no_randomize_layout;
@@ -370,14 +370,13 @@ extern struct pv_lock_ops pv_lock_ops;
unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
-unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
+unsigned paravirt_patch_default(u8 type, void *insnbuf,
unsigned long addr, unsigned len);
unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
const char *start, const char *end);
-unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
- unsigned long addr, unsigned len);
+unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len);
int paravirt_disable_iospace(void);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index a481763a3776..9729cee11149 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -594,7 +594,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
BUG_ON(p->len > MAX_PATCH_LEN);
/* prep the buffer with the original instructions */
memcpy(insnbuf, p->instr, p->len);
- used = pv_init_ops.patch(p->instrtype, p->clobbers, insnbuf,
+ used = pv_init_ops.patch(p->instrtype, insnbuf,
(unsigned long)p->instr, p->len);
BUG_ON(used > p->len);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index ce560b916b1f..f0c462fe2808 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -80,10 +80,8 @@ struct branch {
u32 delta;
} __attribute__((packed));
-static unsigned paravirt_patch_call(void *insnbuf,
- const void *target, u16 tgt_clobbers,
- unsigned long addr, u16 site_clobbers,
- unsigned len)
+static unsigned paravirt_patch_call(void *insnbuf, const void *target,
+ unsigned long addr, unsigned len)
{
struct branch *b = insnbuf;
unsigned long delta = (unsigned long)target - (addr+5);
@@ -148,7 +146,7 @@ static void *get_call_destination(u8 type)
return *((void **)&tmpl + type);
}
-unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
+unsigned paravirt_patch_default(u8 type, void *insnbuf,
unsigned long addr, unsigned len)
{
void *opfunc = get_call_destination(type);
@@ -171,10 +169,8 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
/* If operation requires a jmp, then jmp */
ret = paravirt_patch_jmp(insnbuf, opfunc, addr, len);
else
- /* Otherwise call the function; assume target could
- clobber any caller-save reg */
- ret = paravirt_patch_call(insnbuf, opfunc, CLBR_ANY,
- addr, clobbers, len);
+ /* Otherwise call the function. */
+ ret = paravirt_patch_call(insnbuf, opfunc, addr, len);
return ret;
}
diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c
index 758e69d72ebf..e5c3a438149e 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -30,8 +30,7 @@ unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
extern bool pv_is_native_spin_unlock(void);
extern bool pv_is_native_vcpu_is_preempted(void);
-unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
- unsigned long addr, unsigned len)
+unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
{
const unsigned char *start, *end;
unsigned ret;
@@ -70,7 +69,7 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
default:
patch_default: __maybe_unused
- ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
+ ret = paravirt_patch_default(type, ibuf, addr, len);
break;
patch_site:
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 9edadabf04f6..893ef87eb268 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -38,8 +38,7 @@ unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
extern bool pv_is_native_spin_unlock(void);
extern bool pv_is_native_vcpu_is_preempted(void);
-unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
- unsigned long addr, unsigned len)
+unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
{
const unsigned char *start, *end;
unsigned ret;
@@ -80,7 +79,7 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
default:
patch_default: __maybe_unused
- ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
+ ret = paravirt_patch_default(type, ibuf, addr, len);
break;
patch_site:
diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
index 44685fb2a192..f194e5e1e95c 100644
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -69,7 +69,7 @@ asmlinkage __visible void vsmp_irq_enable(void)
}
PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_enable);
-static unsigned __init vsmp_patch(u8 type, u16 clobbers, void *ibuf,
+static unsigned __init vsmp_patch(u8 type, void *ibuf,
unsigned long addr, unsigned len)
{
switch (type) {
@@ -77,9 +77,9 @@ static unsigned __init vsmp_patch(u8 type, u16 clobbers, void *ibuf,
case PARAVIRT_PATCH(pv_irq_ops.irq_disable):
case PARAVIRT_PATCH(pv_irq_ops.save_fl):
case PARAVIRT_PATCH(pv_irq_ops.restore_fl):
- return paravirt_patch_default(type, clobbers, ibuf, addr, len);
+ return paravirt_patch_default(type, ibuf, addr, len);
default:
- return native_patch(type, clobbers, ibuf, addr, len);
+ return native_patch(type, ibuf, addr, len);
}
}
--
2.13.7
^ permalink raw reply related
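With the clobbers arguments gone, paravirt_patch_call() is left with the buffer, the target, the site address and the site length. A minimal sketch of what such a helper computes is given below. It assumes the 5-byte x86 near branch encoding (opcode 0xe8 for call or 0xe9 for jmp, followed by a 32-bit displacement relative to the end of the instruction, hence addr + 5). The demo_* names are hypothetical, and returning len for a site that is too small is the sketch's own choice.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CALL_OPCODE 0xe8	/* x86 near call with rel32 operand */
#define DEMO_JMP_OPCODE  0xe9	/* x86 near jmp with rel32 operand */
#define DEMO_BRANCH_LEN  5	/* one opcode byte plus four displacement bytes */

/* Write "opcode rel32" into insnbuf; return the number of bytes used. */
static unsigned demo_patch_branch(uint8_t opcode, void *insnbuf,
				  uint64_t target, uint64_t addr, unsigned len)
{
	uint8_t *buf = insnbuf;
	/* displacement is measured from the end of the 5-byte instruction */
	uint32_t delta = (uint32_t)(target - (addr + DEMO_BRANCH_LEN));

	if (len < DEMO_BRANCH_LEN)
		return len;	/* branch does not fit: leave the site alone */

	buf[0] = opcode;
	/* store the displacement in little-endian byte order */
	buf[1] = (uint8_t)(delta & 0xff);
	buf[2] = (uint8_t)((delta >> 8) & 0xff);
	buf[3] = (uint8_t)((delta >> 16) & 0xff);
	buf[4] = (uint8_t)((delta >> 24) & 0xff);
	return DEMO_BRANCH_LEN;
}

static unsigned demo_patch_call(void *insnbuf, uint64_t target,
				uint64_t addr, unsigned len)
{
	return demo_patch_branch(DEMO_CALL_OPCODE, insnbuf, target, addr, len);
}

static unsigned demo_patch_jmp(void *insnbuf, uint64_t target,
			       uint64_t addr, unsigned len)
{
	return demo_patch_branch(DEMO_JMP_OPCODE, insnbuf, target, addr, len);
}
```

For example, a call placed at address 0x2000 that targets 0x1000 gets the displacement 0x1000 - 0x2005, which wraps to 0xffffeffb in 32 bits.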
* [PATCH 01/10] x86/paravirt: make paravirt_patch_call() and paravirt_patch_jmp() static
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
In-Reply-To: <20180810115252.18213-1-jgross@suse.com>
paravirt_patch_call() and paravirt_patch_jmp() are used in paravirt.c
only. Convert them to static.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/paravirt_types.h | 6 ------
arch/x86/kernel/paravirt.c | 12 ++++++------
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 180bc0bff0fb..036b2f88f105 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -370,12 +370,6 @@ extern struct pv_lock_ops pv_lock_ops;
unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
-unsigned paravirt_patch_call(void *insnbuf,
- const void *target, u16 tgt_clobbers,
- unsigned long addr, u16 site_clobbers,
- unsigned len);
-unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
- unsigned long addr, unsigned len);
unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
unsigned long addr, unsigned len);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 930c88341e4e..ce560b916b1f 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -80,10 +80,10 @@ struct branch {
u32 delta;
} __attribute__((packed));
-unsigned paravirt_patch_call(void *insnbuf,
- const void *target, u16 tgt_clobbers,
- unsigned long addr, u16 site_clobbers,
- unsigned len)
+static unsigned paravirt_patch_call(void *insnbuf,
+ const void *target, u16 tgt_clobbers,
+ unsigned long addr, u16 site_clobbers,
+ unsigned len)
{
struct branch *b = insnbuf;
unsigned long delta = (unsigned long)target - (addr+5);
@@ -102,8 +102,8 @@ unsigned paravirt_patch_call(void *insnbuf,
return 5;
}
-unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
- unsigned long addr, unsigned len)
+static unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
+ unsigned long addr, unsigned len)
{
struct branch *b = insnbuf;
unsigned long delta = (unsigned long)target - (addr+5);
--
2.13.7
^ permalink raw reply related
* [PATCH 00/10] x86/paravirt: several cleanups
From: Juergen Gross @ 2018-08-10 11:52 UTC (permalink / raw)
To: linux-kernel, xen-devel, x86, virtualization
Cc: Juergen Gross, boris.ostrovsky, rusty, mingo, hpa, akataria, tglx
This series removes some no longer needed code from the paravirt
infrastructure and puts a large number of paravirt ops under a new
config option, PARAVIRT_XXL, which is selected by XEN_PV only.
A pvops kernel without XEN_PV being configured is about 2.5% smaller
with this series applied.
tip commit 5800dc5c19f34e6e03b5adab1282535cb102fafd ("x86/paravirt:
Fix spectre-v2 mitigations for paravirt guests") is a prerequisite
for this series.
The last 4 patches of this series require my Xen cleanup series
https://lore.kernel.org/lkml/20180717120113.12756-1-jgross@suse.com/
which hides more Xen PV-only code behind CONFIG_XEN_PV.
Juergen Gross (10):
x86/paravirt: make paravirt_patch_call() and paravirt_patch_jmp()
static
x86/paravirt: remove clobbers parameter from paravirt patch functions
x86/paravirt: remove clobbers from struct paravirt_patch_site
x86/paravirt: use a single ops structure
x86/paravirt: remove unused paravirt bits
x86/paravirt: introduce new config option PARAVIRT_XXL
x86/paravirt: move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: move the Xen-only pv_cpu_ops under the PARAVIRT_XXL
umbrella
x86/paravirt: move the Xen-only pv_irq_ops under the PARAVIRT_XXL
umbrella
x86/paravirt: move the Xen-only pv_mmu_ops under the PARAVIRT_XXL
umbrella
arch/x86/Kconfig | 3 +
arch/x86/hyperv/mmu.c | 4 +-
arch/x86/include/asm/debugreg.h | 2 +-
arch/x86/include/asm/desc.h | 4 +-
arch/x86/include/asm/fixmap.h | 2 +-
arch/x86/include/asm/irqflags.h | 56 +++---
arch/x86/include/asm/mmu_context.h | 4 +-
arch/x86/include/asm/msr.h | 4 +-
arch/x86/include/asm/paravirt.h | 183 +++++++++---------
arch/x86/include/asm/paravirt_types.h | 65 +++----
arch/x86/include/asm/pgalloc.h | 2 +-
arch/x86/include/asm/pgtable-3level_types.h | 2 +-
arch/x86/include/asm/pgtable.h | 7 +-
arch/x86/include/asm/processor.h | 4 +-
arch/x86/include/asm/ptrace.h | 3 +-
arch/x86/include/asm/segment.h | 2 +-
arch/x86/include/asm/special_insns.h | 4 +-
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/asm-offsets.c | 15 +-
arch/x86/kernel/asm-offsets_64.c | 9 +-
arch/x86/kernel/cpu/common.c | 4 +-
arch/x86/kernel/cpu/vmware.c | 4 +-
arch/x86/kernel/head_64.S | 2 +-
arch/x86/kernel/kvm.c | 18 +-
arch/x86/kernel/kvmclock.c | 4 +-
arch/x86/kernel/paravirt-spinlocks.c | 15 +-
arch/x86/kernel/paravirt.c | 290 ++++++++++++++--------------
arch/x86/kernel/paravirt_patch_32.c | 9 +-
arch/x86/kernel/paravirt_patch_64.c | 11 +-
arch/x86/kernel/tsc.c | 2 +-
arch/x86/kernel/vsmp_64.c | 17 +-
arch/x86/xen/Kconfig | 1 +
arch/x86/xen/enlighten_pv.c | 32 +--
arch/x86/xen/irq.c | 2 +-
arch/x86/xen/mmu_hvm.c | 2 +-
arch/x86/xen/mmu_pv.c | 28 +--
arch/x86/xen/spinlock.c | 12 +-
arch/x86/xen/time.c | 4 +-
drivers/xen/time.c | 2 +-
39 files changed, 430 insertions(+), 406 deletions(-)
--
2.13.7
^ permalink raw reply
* Re: [PATCH] drm: qxl: Fix error handling at qxl_device_init
From: Gerd Hoffmann @ 2018-08-10 6:03 UTC (permalink / raw)
To: Anton Vasilyev
Cc: ldv-project, David Airlie, linux-kernel, dri-devel,
virtualization, Dave Airlie
In-Reply-To: <20180727115440.11112-1-vasilyev@ispras.ru>
On Fri, Jul 27, 2018 at 02:54:40PM +0300, Anton Vasilyev wrote:
> If qxl_device_init fails on creating resources and does not report it,
> then qxl module will catch null pointer exception on remove, or on
> probe's error path.
>
> The patch adds error path with resources release into qxl_device_init.
>
> Found by Linux Driver Verification project (linuxtesting.org).
Pushed to drm-misc-next.
thanks,
Gerd
^ permalink raw reply
* Re: [PATCH] drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up
From: Gerd Hoffmann @ 2018-08-10 6:03 UTC (permalink / raw)
To: Thomas Zimmermann; +Cc: airlied, dri-devel, virtualization
In-Reply-To: <20180720112743.27159-1-tzimmermann@suse.de>
On Fri, Jul 20, 2018 at 01:27:43PM +0200, Thomas Zimmermann wrote:
> In the Cirrus driver, the regular clean-up code also performs the clean-up
> of a failed initialization. If the fbdev's framebuffer was not initialized,
> the clean-up will fail within drm_framebuffer_unregister_private. Booting
> with cirrus.bpp=16 triggers this bug.
>
> The framebuffer is currently stored directly within struct cirrus_fbdev. To
> fix the bug, we turn it into a pointer that is only set for initialized
> framebuffers. The fbdev's clean-up code skips uninitialized framebuffers.
>
> The memory for struct drm_framebuffer is allocated dynamically. This requires
> additional error handling within cirrusfb_create. The framebuffer clean-up is
> now performed by drm_framebuffer_put, which also frees the data structure's
> memory.
pushed to drm-misc-next (also the other ones, except the failing ttm_put
patches).
thanks,
Gerd
^ permalink raw reply
* Re: [PATCH] drm/qxl: Replace ttm_bo_unref with ttm_bo_put
From: Gerd Hoffmann @ 2018-08-09 15:30 UTC (permalink / raw)
To: Thomas Zimmermann; +Cc: airlied, dri-devel, virtualization
In-Reply-To: <20180731063559.11629-1-tzimmermann@suse.de>
> diff --git a/drivers/gpu/drm/qxl/qxl_gem.c b/drivers/gpu/drm/qxl/qxl_gem.c
> index f5c1e7872e92..89606c819d82 100644
> --- a/drivers/gpu/drm/qxl/qxl_gem.c
> +++ b/drivers/gpu/drm/qxl/qxl_gem.c
> @@ -40,7 +40,7 @@ void qxl_gem_object_free(struct drm_gem_object *gobj)
> qxl_surface_evict(qdev, qobj, false);
>
> tbo = &qobj->tbo;
> - ttm_bo_unref(&tbo);
> + ttm_bo_put(tbo);
Same here (using drm-misc-next btw).
cheers,
Gerd
^ permalink raw reply
* Re: [PATCH] drm/cirrus: Replace ttm_bo_unref with ttm_bo_put
From: Gerd Hoffmann @ 2018-08-09 15:29 UTC (permalink / raw)
To: Thomas Zimmermann; +Cc: airlied, dri-devel, virtualization
In-Reply-To: <20180731063128.11041-1-tzimmermann@suse.de>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
> drivers/gpu/drm/cirrus/cirrus_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/cirrus/cirrus_main.c b/drivers/gpu/drm/cirrus/cirrus_main.c
> index 60d54e10a34d..57f8fe6d020b 100644
> --- a/drivers/gpu/drm/cirrus/cirrus_main.c
> +++ b/drivers/gpu/drm/cirrus/cirrus_main.c
> @@ -269,7 +269,7 @@ static void cirrus_bo_unref(struct cirrus_bo **bo)
> return;
>
> tbo = &((*bo)->bo);
> - ttm_bo_unref(&tbo);
> + ttm_bo_put(tbo);
Fails to build too.
cheers,
Gerd
^ permalink raw reply
* Re: [PATCH] drm/bochs: Replace ttm_bo_unref with ttm_bo_put
From: Gerd Hoffmann @ 2018-08-09 15:27 UTC (permalink / raw)
To: Thomas Zimmermann; +Cc: dri-devel, virtualization
In-Reply-To: <20180731062851.10812-1-tzimmermann@suse.de>
> diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
> index 39cd08416773..c9c7097030ca 100644
> --- a/drivers/gpu/drm/bochs/bochs_mm.c
> +++ b/drivers/gpu/drm/bochs/bochs_mm.c
> @@ -430,7 +430,7 @@ static void bochs_bo_unref(struct bochs_bo **bo)
> return;
>
> tbo = &((*bo)->bo);
> - ttm_bo_unref(&tbo);
> + ttm_bo_put(tbo);
fails to build:
CC [M] drivers/gpu/drm/bochs/bochs_mm.o
/home/kraxel/projects/linux/drivers/gpu/drm/bochs/bochs_mm.c: In function ‘bochs_bo_unref’:
/home/kraxel/projects/linux/drivers/gpu/drm/bochs/bochs_mm.c:433:2: error: implicit declaration of function ‘ttm_bo_put’ [-Werror=implicit-function-declaration]
ttm_bo_put(tbo);
^
cc1: some warnings being treated as errors
cheers,
Gerd
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-08-09 5:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
linuxram, virtualization, Christoph Hellwig,
jean-philippe.brucker, paulus, marc.zyngier, joe, robin.murphy,
david, linuxppc-dev, elfring, haren, Anshuman Khandual
In-Reply-To: <98eb367ce322ad84baa31e3c7beffc4a42be8458.camel@kernel.crashing.org>
On Thu, Aug 09, 2018 at 08:13:32AM +1000, Benjamin Herrenschmidt wrote:
> > > - if (xen_domain())
> > > + if (xen_domain() || pseries_secure_vm())
> > > return true;
> >
> > I don't think it's pseries specific actually. E.g. I suspect AMD SEV
> > might benefit from the same kind of hack.
>
> As long as they can provide the same guarantee that the DMA ops are
> completely equivalent between virtio and other PCI devices, at least on
> the same bus, ie, we don't have to go hack special DMA ops.
>
> I think the latter is really what Christoph wants to avoid for good
> reasons.
Yes. I also generally want to avoid too much arch specific magic.
FYI, I'm off to a week-long vacation today, don't expect quick replies.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-09 2:00 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram,
virtualization, Christoph Hellwig, jean-philippe.brucker, paulus,
marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring,
haren, Anshuman Khandual
In-Reply-To: <98eb367ce322ad84baa31e3c7beffc4a42be8458.camel@kernel.crashing.org>
On Thu, 2018-08-09 at 08:13 +1000, Benjamin Herrenschmidt wrote:
> > For completeness, virtio could also have its own bounce buffer
> > outside of DMA API one. I don't see lots of benefits to this
> > though.
>
> Not fan of that either...
To elaborate a bit ...
For our secure VMs, we will need bounce buffering for everything
anyway. virtio, emulated PCI, or vfio.
By ensuring that we create an identity mapping in the IOMMU for
the bounce buffering pool, we enable virtio "legacy/direct" to
use the same mapping ops as things using the iommu.
That said, we still need somewhere in arch/powerpc a set of dma
ops which we'll attach to all PCI devices of a secure VM to force
bouncing always, rather than just based on address (which is what
the standard swiotlb ones do)... Unless we can tweak the swiotlb
"threshold" for example by using an empty mask.
We'll need the same set of DMA ops for VIO devices too, not just PCI.
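To illustrate the difference between bouncing based on address and bouncing
always, here is a small standalone model. It is not the swiotlb
implementation; the function name and the `force` flag are assumptions made
purely for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the bounce decision for one DMA mapping.
 *
 * Address-based policy: bounce only when the last byte of the buffer
 * lies above what the device can address (dma_mask).
 * Forced policy: bounce unconditionally, as a secure VM would need.
 *
 * Assumes size > 0 and that addr + size does not wrap around.
 */
static bool needs_bounce(uint64_t addr, uint64_t size,
                         uint64_t dma_mask, bool force)
{
    uint64_t last = addr + size - 1;

    if (force)
        return true;
    return last > dma_mask;
}
```

With dma_mask set to 0 (the "empty mask" idea), any buffer whose last byte
is above address 0 exceeds the limit, so in this model the address-based
test ends up bouncing essentially everything without a separate flag.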
Cheers,
Ben.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-08 22:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram,
virtualization, Christoph Hellwig, jean-philippe.brucker, paulus,
marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring,
haren, Anshuman Khandual
In-Reply-To: <20180808232210-mutt-send-email-mst@kernel.org>
On Wed, 2018-08-08 at 23:31 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
> > Sure, but all of this is just the configuration of the iommu. But I
> > think we agree here, and your point remains valid, indeed my proposed
> > hack:
> >
> > > if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> >
> > Will only work if the IOMMU and non-IOMMU path are completely equivalent.
> >
> > We can provide that guarantee for our secure VM case, but not generally so if
> > we were to go down the route of a quirk in virtio, it might be better to
> > make it painfully obvious that it's specific to that one case with a different
> > kind of turd:
> >
> > - if (xen_domain())
> > + if (xen_domain() || pseries_secure_vm())
> > return true;
>
> I don't think it's pseries specific actually. E.g. I suspect AMD SEV
> might benefit from the same kind of hack.
As long as they can provide the same guarantee that the DMA ops are
completely equivalent between virtio and other PCI devices, at least on
the same bus, ie, we don't have to go hack special DMA ops.
I think the latter is really what Christoph wants to avoid for good
reasons.
> > So to summarize, and make sure I'm not missing something, the three approaches
> > at hand are either:
> >
> > 1- The above, which is a one liner and contained in the guest, so that's nice, but
> > also means another turd in virtio which isn't ...
> >
> > 2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
> > architecture on our side that will force virtio to always go through an emulated
> > iommu, as pseries doesn't have the concept of a real bypass window, and thus will
> > impact performance for both secure and non-secure VMs.
> >
> > 3- Invent a property that can be put in selected PCI device tree nodes that
> > indicates that for that device specifically, the iommu can be bypassed, along with
> > a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
> > but its DT nodes would also have that property and Linux would notice it and turn
> > bypass on.
>
> For completeness, virtio could also have its own bounce buffer
> outside of DMA API one. I don't see lots of benefits to this
> though.
Not fan of that either...
> > The resulting properties of those options are:
> >
> > 1- Is what I want because it's the simplest, provides the best performance now,
> > and works without code changes to qemu or non-secure Linux. However it does
> > add a tiny turd to virtio which is annoying.
> >
> > 2- This works but it puts the iommu in the way always, thus reducing virtio performance
> > across the board for pseries unless we only do that for secure VMs but that is
> > difficult (as discussed earlier).
> >
> > 3- This would recover the performance lost in -2-, however it requires qemu *and*
> > guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
> > performance hit of -2- unless modified to call that 'enable bypass' call, which
> > isn't great.
> >
> > So imho we have to choose one of 3 not-great solutions here... Unless I missed
> > something in your ideas of course.
> >
^ permalink raw reply
* Re: [PATCH] vhost/scsi: increase VHOST_SCSI_PREALLOC_PROT_SGLS to 2048
From: Michael S. Tsirkin @ 2018-08-08 20:42 UTC (permalink / raw)
To: Greg Edwards; +Cc: pbonzini, virtualization
In-Reply-To: <20180808192955.1115-1-gedwards@ddn.com>
On Wed, Aug 08, 2018 at 01:29:55PM -0600, Greg Edwards wrote:
> The current value of VHOST_SCSI_PREALLOC_PROT_SGLS is too small to
> accommodate larger I/Os, e.g. 16-32 MiB, when the VIRTIO_SCSI_F_T10_PI
> feature bit is negotiated and the backing store supports T10 PI.
>
> vhost-scsi rejects the command with errors like:
>
> [ 59.581317] vhost_scsi_calc_sgls: requested sgl_count: 1820 exceeds pre-allocated max_sgls: 512
>
> Signed-off-by: Greg Edwards <gedwards@ddn.com>
> ---
> drivers/vhost/scsi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index 17fcd3b2e686..8c32cf58d6fa 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -56,7 +56,7 @@
> #define VHOST_SCSI_DEFAULT_TAGS 256
> #define VHOST_SCSI_PREALLOC_SGLS 2048
> #define VHOST_SCSI_PREALLOC_UPAGES 2048
> -#define VHOST_SCSI_PREALLOC_PROT_SGLS 512
> +#define VHOST_SCSI_PREALLOC_PROT_SGLS 2048
>
> struct vhost_scsi_inflight {
> /* Wait for the flush operation to finish */
I guess it's ok since PREALLOC_SGLS is already 2K ... or
am I missing something? Paolo, any input on this?
> --
> 2.17.1
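The failure mode in the log above comes down to a bounds check against a
preallocated limit. A minimal standalone sketch of that check follows; the
function name, macro names and return convention are hypothetical and this
is not the vhost-scsi code itself:

```c
#include <stdio.h>

/* Old and new limits from the patch, under illustrative names. */
#define OLD_PROT_SGLS 512
#define NEW_PROT_SGLS 2048

/*
 * Hypothetical model: reject a command whose scatterlist count exceeds
 * the number of entries preallocated per command.
 * Returns 0 when the request fits, -1 when it is rejected.
 */
static int check_sgl_count(unsigned int requested, unsigned int max_sgls)
{
    if (requested > max_sgls) {
        fprintf(stderr,
                "requested sgl_count: %u exceeds pre-allocated max_sgls: %u\n",
                requested, max_sgls);
        return -1;
    }
    return 0;
}
```

In this model the 1820-entry request from the log is rejected against the
old limit of 512 and accepted against the new limit of 2048.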
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-08 20:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram,
virtualization, Christoph Hellwig, jean-philippe.brucker, paulus,
marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring,
haren, Anshuman Khandual
In-Reply-To: <b8b9150a747453c070ad3b0e4c92d2b1b052ad06.camel@kernel.crashing.org>
On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
> Sure, but all of this is just the configuration of the iommu. But I
> think we agree here, and your point remains valid, indeed my proposed
> hack:
>
> > if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
>
> Will only work if the IOMMU and non-IOMMU path are completely equivalent.
>
> We can provide that guarantee for our secure VM case, but not generally so if
> we were to go down the route of a quirk in virtio, it might be better to
> make it painfully obvious that it's specific to that one case with a different
> kind of turd:
>
> - if (xen_domain())
> + if (xen_domain() || pseries_secure_vm())
> return true;
I don't think it's pseries specific actually. E.g. I suspect AMD SEV
might benefit from the same kind of hack.
> So to summarize, and make sure I'm not missing something, the three approaches
> at hand are either:
>
> 1- The above, which is a one liner and contained in the guest, so that's nice, but
> also means another turd in virtio which isn't ...
>
> 2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
> architecture on our side that will force virtio to always go through an emulated
> iommu, as pseries doesn't have the concept of a real bypass window, and thus will
> impact performance for both secure and non-secure VMs.
>
> 3- Invent a property that can be put in selected PCI device tree nodes that
> indicates that for that device specifically, the iommu can be bypassed, along with
> a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
> but its DT nodes would also have that property and Linux would notice it and turn
> bypass on.
For completeness, virtio could also have its own bounce buffer
outside of DMA API one. I don't see lots of benefits to this
though.
> The resulting properties of those options are:
>
> 1- Is what I want because it's the simplest, provides the best performance now,
> and works without code changes to qemu or non-secure Linux. However it does
> add a tiny turd to virtio which is annoying.
>
> 2- This works but it puts the iommu in the way always, thus reducing virtio performance
> across the board for pseries unless we only do that for secure VMs but that is
> difficult (as discussed earlier).
>
> 3- This would recover the performance lost in -2-, however it requires qemu *and*
> guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
> performance hit of -2- unless modified to call that 'enable bypass' call, which
> isn't great.
>
> So imho we have to choose one of 3 not-great solutions here... Unless I missed
> something in your ideas of course.
>
> Cheers,
> Ben.
>
>
^ permalink raw reply
* Re: [PATCH net] vhost: reset metadata cache when initializing new IOTLB
From: David Miller @ 2018-08-08 16:45 UTC (permalink / raw)
To: jasowang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1533699784-4950-1-git-send-email-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Wed, 8 Aug 2018 11:43:04 +0800
> We need to reset the metadata cache during new IOTLB initialization,
> otherwise the stale pointers to the previous IOTLB may still be accessed,
> which will lead to a use after free.
>
> Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com
> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Applied and queued up for -stable, thanks Jason.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Benjamin Herrenschmidt @ 2018-08-08 13:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
linuxram, virtualization, jean-philippe.brucker, paulus,
marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring,
haren, Anshuman Khandual
In-Reply-To: <20180808123036.GA2525@infradead.org>
On Wed, 2018-08-08 at 05:30 -0700, Christoph Hellwig wrote:
> On Wed, Aug 08, 2018 at 08:07:49PM +1000, Benjamin Herrenschmidt wrote:
> > Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag
> > is not set (default) but there's nothing in the device-tree to tell the
> > guest about this since it's a violation of our pseries architecture, so
> > we just rely on Linux virtio "knowing" that it happens. It's a bit
> > yucky but that's now history...
>
> That is ugly as hell, but it is how virtio works everywhere, so nothing
> special so far.
Yup.
> > Essentially pseries "architecturally" does not have the concept of not
> > having an iommu in the way and qemu violates that architecture today.
> >
> > (Remember it comes from pHyp, our proprietary HV, which we are somewhat
> > mimicking here).
>
> It shouldn't be too hard to have a dt property that communicates this,
> should it?
We could invent something I suppose. The additional problem then (yeah
I know ... what a mess) is that qemu doesn't create the DT for PCI
devices, the firmware (SLOF) inside the guest does using normal PCI
probing.
That said, that FW could know about all the virtio vendor/device IDs,
check the VIRTIO_F_IOMMU_PLATFORM and set that property accordingly...
messy but doable. It's not a bus property (see my other reply below as
this could complicate things with your bus mask).
But we are drifting from the problem at hand :-) You propose we do set
VIRTIO_F_IOMMU_PLATFORM so we aren't in the above case, and the bypass
stuff works, so no need to touch it.
See my recap at the end of the email to make sure I understand fully
what you suggest.
> > So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio
> > through that iommu and performance will suffer (esp vhost I suspect),
> > especially since adding/removing translations in the iommu is a
> > hypercall.
> Well, we'd need to make sure that for this particular bus we skip the
> actual iommu.
It's not a bus property. Qemu will happily mix up everything on the
same bus, that includes emulated devices that go through the emulated
iommu, real VFIO devices that go through an actual HW iommu and virtio
that bypasses everything.
This makes things tricky in general (not just in my powerpc secure VM
case) since, at least on powerpc but I suppose elsewhere too, iommu
related properties tend to be per "bus" while here, qemu will mix and
match.
But again, I think we are drifting away from the topic, see below
> > > It would not be the same effect. The problem with that is that you must
> > > now assume that your qemu knows that for example you might be passing
> > > a dma offset if the bus otherwise requires it.
> >
> > I would assume that arch_virtio_wants_dma_ops() only returns true when
> > no such offsets are involved, at least in our case that would be what
> > happens.
>
> That would work, but we're really piling hacks on top of hacks here.
Sort-of :-) At least none of what we are discussing now involves
touching the dma_ops themselves so we are not in the way of your big
cleanup operation here. But yeah, let's continue discussing your other
solution below.
> > > Or in other words:
> > > you potentially break the contract between qemu and the guest of always
> > > passing down physical addresses. If we explicitly change that contract
> > > through using a flag that says you pass bus address everything is fine.
> >
> > For us a "bus address" is behind the iommu so that's what
> > VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a
> > bus address that is different. I suppose it's an ARMism to have DMA
> > offsets that are separate from iommus ?
>
> No, a lot of platforms support a bus address that has an offset from
> the physical address, including a lot of power platforms:
Ok, just talking past each other :-) For all the powerpc ones, these
*do* go through the iommu, which is what I meant. It's just a window of
the iommu that provides some kind of direct mapping of memory.
For pseries, there is no such thing however. What we do to avoid
constant map/unmap of iommu PTEs in pseries guests is that we use
hypercalls to create a 64-bit window and populate all its PTEs with an
identity mapping. But that's not as efficient as a real bypass.
There are good historical reasons for that, since pseries is a guest
platform, its memory is never really where the guest thinks it is, so
you always need an iommu to remap. Even for virtual devices, since for
most of them, in the "IBM" pHyp model, the "peer" is actually another
partition, so the virtual iommu handles translating across the two
partitions.
Same goes with cell in HW, no real bypass, just the iommu being
configured with very large pages and a fixed mapping.
powernv has a separate physical window that can be configured as a real
bypass though, so does the U4 DART. Not sure about the FSL one.
But yeah, your point stands, this is just implementation details.
> arch/powerpc/kernel/pci-common.c: set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
> arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, cell_dma_nommu_offset);
> arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, addr);
> arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, pe->tce_bypass_base);
> arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, (1ULL << 32));
> arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&dev->dev, pe->tce_bypass_base);
> arch/powerpc/platforms/pseries/iommu.c: set_dma_offset(dev, dma_offset);
> arch/powerpc/sysdev/dart_iommu.c: set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
> arch/powerpc/sysdev/fsl_pci.c: set_dma_offset(dev, pci64_dma_offset);
>
> to make things worse some platforms (at least on arm/arm64/mips/x86) can
> also require additional banking where it isn't even a single linear map
> but multiples windows.
Sure, but all of this is just the configuration of the iommu. But I
think we agree here, and your point remains valid, indeed my proposed
hack:
> if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
Will only work if the IOMMU and non-IOMMU path are completely equivalent.
We can provide that guarantee for our secure VM case, but not generally so if
we were to go down the route of a quirk in virtio, it might be better to
make it painfully obvious that it's specific to that one case with a different
kind of turd:
- if (xen_domain())
+ if (xen_domain() || pseries_secure_vm())
return true;
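A minimal sketch of the check being discussed, written as a standalone
userspace model rather than the actual virtio code. The struct and function
names are hypothetical stand-ins; only xen_domain(), pseries_secure_vm() and
VIRTIO_F_IOMMU_PLATFORM come from this thread:

```c
#include <stdbool.h>

/* Hypothetical snapshot of the facts the check depends on. */
struct guest_env {
    bool iommu_platform;    /* device negotiated VIRTIO_F_IOMMU_PLATFORM */
    bool xen_domain;        /* models xen_domain() */
    bool pseries_secure_vm; /* models the proposed pseries_secure_vm() */
};

/*
 * Returns true when virtio should go through the DMA ops instead of
 * handing guest physical addresses directly to the device.
 */
static bool virtio_uses_dma_api(const struct guest_env *env)
{
    /* The device asked for platform IOMMU semantics. */
    if (env->iommu_platform)
        return true;

    /* Platform quirks: the existing Xen case plus the proposed one. */
    if (env->xen_domain || env->pseries_secure_vm)
        return true;

    return false;
}
```

In this model, a legacy virtio device in a secure pseries guest goes through
the DMA ops, while the same device in an ordinary guest keeps using guest
physical addresses directly. That is only safe under the guarantee stated
above, namely that the IOMMU and non-IOMMU paths are equivalent.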
So to summarize, and make sure I'm not missing something, the three approaches
at hand are either:
1- The above, which is a one liner and contained in the guest, so that's nice, but
also means another turd in virtio which isn't ...
2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
architecture on our side that will force virtio to always go through an emulated
iommu, as pseries doesn't have the concept of a real bypass window, and thus will
impact performance for both secure and non-secure VMs.
3- Invent a property that can be put in selected PCI device tree nodes that
indicates that for that device specifically, the iommu can be bypassed, along with
a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
but its DT nodes would also have that property and Linux would notice it and turn
bypass on.
The resulting properties of those options are:
1- Is what I want because it's the simplest, provides the best performance now,
and works without code changes to qemu or non-secure Linux. However it does
add a tiny turd to virtio which is annoying.
2- This works but it puts the iommu in the way always, thus reducing virtio performance
across the board for pseries unless we only do that for secure VMs but that is
difficult (as discussed earlier).
3- This would recover the performance lost in -2-, however it requires qemu *and*
guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
performance hit of -2- unless modified to call that 'enable bypass' call, which
isn't great.
So imho we have to choose one of 3 not-great solutions here... Unless I missed
something in your ideas of course.
Cheers,
Ben.
^ permalink raw reply