* [PATCH v3 1/8] mm, x86: Add HAVE_RCU_TABLE_FREE support
2012-07-31 10:47 [PATCH v3 0/8] KVM paravirt remote flush tlb Nikunj A. Dadhania
@ 2012-07-31 10:47 ` Nikunj A. Dadhania
2012-07-31 10:48 ` [PATCH v3 2/8] mm: Add missing TLB invalidate to RCU page-table freeing Nikunj A. Dadhania
` (6 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Nikunj A. Dadhania @ 2012-07-31 10:47 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Implement optional HAVE_RCU_TABLE_FREE support for x86.
This is useful for things like Xen and KVM, where a paravirt TLB flush
means software page-table walkers such as GUP-fast cannot rely on
IRQ disabling the way they can on regular x86.
Not for inclusion - is part of PeterZ's "Unify TLB gather implementations"
http://mid.gmane.org/20120627211540.459910855@chello.nl
Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-r106wg6t7crxxhva55jnacrj@git.kernel.org
---
arch/x86/include/asm/tlb.h | 1 +
arch/x86/mm/pgtable.c | 6 +++---
include/asm-generic/tlb.h | 9 +++++++++
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 4fef207..f5489f0 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -1,6 +1,7 @@
#ifndef _ASM_X86_TLB_H
#define _ASM_X86_TLB_H
+#define __tlb_remove_table(table) free_page_and_swap_cache(table)
#define tlb_start_vma(tlb, vma) do { } while (0)
#define tlb_end_vma(tlb, vma) do { } while (0)
#define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 8573b83..34fa168 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -51,21 +51,21 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
{
pgtable_page_dtor(pte);
paravirt_release_pte(page_to_pfn(pte));
- tlb_remove_page(tlb, pte);
+ tlb_remove_table(tlb, pte);
}
#if PAGETABLE_LEVELS > 2
void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
{
paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
- tlb_remove_page(tlb, virt_to_page(pmd));
+ tlb_remove_table(tlb, virt_to_page(pmd));
}
#if PAGETABLE_LEVELS > 3
void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
{
paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
- tlb_remove_page(tlb, virt_to_page(pud));
+ tlb_remove_table(tlb, virt_to_page(pud));
}
#endif /* PAGETABLE_LEVELS > 3 */
#endif /* PAGETABLE_LEVELS > 2 */
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index ed6642a..d382b22 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -19,6 +19,8 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page);
+
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
/*
* Semi RCU freeing of the page directories.
@@ -60,6 +62,13 @@ struct mmu_table_batch {
extern void tlb_table_flush(struct mmu_gather *tlb);
extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
+#else
+
+static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+ tlb_remove_page(tlb, table);
+}
+
#endif
/*
* [PATCH v3 2/8] mm: Add missing TLB invalidate to RCU page-table freeing
From: Nikunj A. Dadhania @ 2012-07-31 10:48 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
For normal systems we need a TLB invalidate before freeing the
page tables; the generic RCU-based page-table freeing code lacked
this.
This is because the code originally came from ppc, where the hardware
never walks the Linux page tables and thus this invalidate is not
required.
Others, notably s390, which ran into this problem in cd94154cc6a
("[S390] fix tlb flushing for page table pages"), do very much need
this TLB invalidation.
Therefore add it, with a Kconfig option to disable it so as not to
unduly slow down PPC and SPARC64, neither of which needs it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-z32nke0csqopykthsk1zjg8f@git.kernel.org
[Fix to check *batch is not NULL]
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
---
arch/Kconfig | 3 +++
arch/powerpc/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
mm/memory.c | 43 +++++++++++++++++++++++++++++++++++++------
4 files changed, 42 insertions(+), 6 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 8c3d957..fec1c9b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -231,6 +231,9 @@ config HAVE_ARCH_MUTEX_CPU_RELAX
config HAVE_RCU_TABLE_FREE
bool
+config STRICT_TLB_FILL
+ bool
+
config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9a5d3cd..fb70260 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -127,6 +127,7 @@ config PPC
select GENERIC_IRQ_SHOW_LEVEL
select IRQ_FORCED_THREADING
select HAVE_RCU_TABLE_FREE if SMP
+ select STRICT_TLB_FILL
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_BPF_JIT if PPC64
select HAVE_ARCH_JUMP_LABEL
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index e74ff13..126e500 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -52,6 +52,7 @@ config SPARC64
select HAVE_KRETPROBES
select HAVE_KPROBES
select HAVE_RCU_TABLE_FREE if SMP
+ select STRICT_TLB_FILL
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_SYSCALL_WRAPPERS
diff --git a/mm/memory.c b/mm/memory.c
index 91f6945..2ef9ce1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -332,12 +332,47 @@ static void tlb_remove_table_rcu(struct rcu_head *head)
free_page((unsigned long)batch);
}
+#ifdef CONFIG_STRICT_TLB_FILL
+/*
+ * Some architectures (sparc64, ppc) cannot refill TLBs after they've removed
+ * the PTE entries from their hash-table. Their hardware never looks at the
+ * linux page-table structures, so they don't need a hardware TLB invalidate
+ * when tearing down the page-table structure itself.
+ */
+static inline void tlb_table_flush_mmu(struct mmu_gather *tlb) { }
+
+/*
+ * When there's less than two users of this mm there cannot be
+ * a concurrent page-table walk.
+ */
+static inline bool tlb_table_fast(struct mmu_gather *tlb)
+{
+ return atomic_read(&tlb->mm->mm_users) < 2;
+}
+#else
+static inline void tlb_table_flush_mmu(struct mmu_gather *tlb)
+{
+ tlb_flush_mmu(tlb);
+}
+
+/*
+ * Even if there's only a single user, speculative TLB loads can
+ * wreck stuff.
+ */
+static inline bool tlb_table_fast(struct mmu_gather *tlb)
+{
+ return false;
+}
+#endif /* CONFIG_STRICT_TLB_FILL */
+
void tlb_table_flush(struct mmu_gather *tlb)
{
struct mmu_table_batch **batch = &tlb->batch;
if (*batch) {
- call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
+ tlb_table_flush_mmu(tlb);
+ if (*batch)
+ call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
*batch = NULL;
}
}
@@ -348,11 +383,7 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
tlb->need_flush = 1;
- /*
- * When there's less then two users of this mm there cannot be a
- * concurrent page-table walk.
- */
- if (atomic_read(&tlb->mm->mm_users) < 2) {
+ if (tlb_table_fast(tlb)) {
__tlb_remove_table(table);
return;
}
* [PATCH v3 3/8] KVM Guest: Add VCPU running/pre-empted state for guest
From: Nikunj A. Dadhania @ 2012-07-31 10:48 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
The patch adds guest code for an MSR shared between guest and
hypervisor. The MSR exports the VCPU running/pre-empted information
from the host to the guest. This enables the guest to send IPIs only
to running VCPUs and to set a flag for pre-empted VCPUs, preventing
waits on VCPUs that are not running.
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
---
arch/x86/include/asm/kvm_para.h | 13 +++++++++++++
arch/x86/kernel/kvm.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2f7712e..5dfb975 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -23,6 +23,7 @@
#define KVM_FEATURE_ASYNC_PF 4
#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_VCPU_STATE 7
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
@@ -39,6 +40,7 @@
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
#define MSR_KVM_STEAL_TIME 0x4b564d03
#define MSR_KVM_PV_EOI_EN 0x4b564d04
+#define MSR_KVM_VCPU_STATE 0x4b564d05
struct kvm_steal_time {
__u64 steal;
@@ -51,6 +53,17 @@ struct kvm_steal_time {
#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
+struct kvm_vcpu_state {
+ __u64 state;
+ __u32 pad[14];
+};
+/* bits in vcpu_state->state */
+#define KVM_VCPU_STATE_IN_GUEST_MODE 0
+#define KVM_VCPU_STATE_SHOULD_FLUSH 1
+
+#define KVM_VCPU_STATE_ALIGN_BITS 5
+#define KVM_VCPU_STATE_VALID_BITS ((-1ULL << (KVM_VCPU_STATE_ALIGN_BITS + 1)))
+
#define KVM_MAX_MMU_OP_BATCH 32
#define KVM_ASYNC_PF_ENABLED (1 << 0)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index c1d61ee..37e6599 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -66,6 +66,9 @@ static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
static int has_steal_clock = 0;
+DEFINE_PER_CPU(struct kvm_vcpu_state, vcpu_state) __aligned(64);
+static int has_vcpu_state;
+
/*
* No need for any "IO delay" on KVM
*/
@@ -302,6 +305,22 @@ static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
apic_write(APIC_EOI, APIC_EOI_ACK);
}
+static void kvm_register_vcpu_state(void)
+{
+ int cpu = smp_processor_id();
+ struct kvm_vcpu_state *v_state;
+
+ if (!has_vcpu_state)
+ return;
+
+ v_state = &per_cpu(vcpu_state, cpu);
+ memset(v_state, 0, sizeof(*v_state));
+
+ wrmsrl(MSR_KVM_VCPU_STATE, (__pa(v_state) | KVM_MSR_ENABLED));
+ printk(KERN_INFO "kvm-vcpustate: cpu %d, msr %lx\n",
+ cpu, __pa(v_state));
+}
+
void __cpuinit kvm_guest_cpu_init(void)
{
if (!kvm_para_available())
@@ -330,6 +349,9 @@ void __cpuinit kvm_guest_cpu_init(void)
if (has_steal_clock)
kvm_register_steal_time();
+
+ if (has_vcpu_state)
+ kvm_register_vcpu_state();
}
static void kvm_pv_disable_apf(void)
@@ -393,6 +415,14 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
}
+void kvm_disable_vcpu_state(void)
+{
+ if (!has_vcpu_state)
+ return;
+
+ wrmsr(MSR_KVM_VCPU_STATE, 0, 0);
+}
+
#ifdef CONFIG_SMP
static void __init kvm_smp_prepare_boot_cpu(void)
{
@@ -410,6 +440,7 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
static void kvm_guest_cpu_offline(void *dummy)
{
+ kvm_disable_vcpu_state();
kvm_disable_steal_time();
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
wrmsrl(MSR_KVM_PV_EOI_EN, 0);
@@ -469,6 +500,11 @@ void __init kvm_guest_init(void)
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
apic_set_eoi_write(kvm_guest_apic_eoi_write);
+#ifdef CONFIG_PARAVIRT_TLB_FLUSH
+ if (kvm_para_has_feature(KVM_FEATURE_VCPU_STATE))
+ has_vcpu_state = 1;
+#endif
+
#ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
register_cpu_notifier(&kvm_cpu_notifier);
* [PATCH v3 4/8] KVM-HV: Add VCPU running/pre-empted state for guest
From: Nikunj A. Dadhania @ 2012-07-31 10:48 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
Hypervisor code to indicate guest running/pre-empted status through
an MSR. The page is now pinned at MSR write time, and
kmap_atomic/kunmap_atomic are used to access the shared vcpu_state
area.
Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
---
arch/x86/include/asm/kvm_host.h | 7 ++++
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/x86.c | 71 ++++++++++++++++++++++++++++++++++++++-
3 files changed, 77 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..441348f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -429,6 +429,13 @@ struct kvm_vcpu_arch {
struct kvm_steal_time steal;
} st;
+ /* indicates vcpu is running or preempted */
+ struct {
+ u64 msr_val;
+ struct page *vs_page;
+ unsigned int vs_offset;
+ } v_state;
+
u64 last_guest_tsc;
u64 last_kernel_ns;
u64 last_host_tsc;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0595f13..37ab364 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -411,6 +411,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_CLOCKSOURCE2) |
(1 << KVM_FEATURE_ASYNC_PF) |
(1 << KVM_FEATURE_PV_EOI) |
+ (1 << KVM_FEATURE_VCPU_STATE) |
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
if (sched_info_on())
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59b5950..580abcf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -806,13 +806,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
* kvm-specific. Those are put in the beginning of the list.
*/
-#define KVM_SAVE_MSRS_BEGIN 9
+#define KVM_SAVE_MSRS_BEGIN 10
static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
- MSR_KVM_PV_EOI_EN,
+ MSR_KVM_VCPU_STATE, MSR_KVM_PV_EOI_EN,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
#ifdef CONFIG_X86_64
@@ -1557,6 +1557,53 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
}
+static void kvm_set_atomic(u64 *addr, u64 old, u64 new)
+{
+ int loop = 1000000;
+ while (1) {
+ if (cmpxchg(addr, old, new) == old)
+ break;
+ loop--;
+ if (!loop) {
+ pr_info("atomic cur: %lx old: %lx new: %lx\n",
+ *addr, old, new);
+ break;
+ }
+ }
+}
+
+static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_state *vs;
+ char *kaddr;
+
+ if (!(vcpu->arch.v_state.msr_val & KVM_MSR_ENABLED) ||
+ !vcpu->arch.v_state.vs_page)
+ return;
+
+ kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
+ kaddr += vcpu->arch.v_state.vs_offset;
+ vs = kaddr;
+ kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
+ kunmap_atomic(kaddr);
+}
+
+static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_state *vs;
+ char *kaddr;
+
+ if (!(vcpu->arch.v_state.msr_val & KVM_MSR_ENABLED) ||
+ !vcpu->arch.v_state.vs_page)
+ return;
+
+ kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
+ kaddr += vcpu->arch.v_state.vs_offset;
+ vs = kaddr;
+ kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
+ kunmap_atomic(kaddr);
+}
+
int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
bool pr = false;
@@ -1676,6 +1723,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
return 1;
break;
+ case MSR_KVM_VCPU_STATE:
+ vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
+ vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
+
+ if (is_error_page(vcpu->arch.v_state.vs_page)) {
+ kvm_release_page_clean(vcpu->arch.time_page);
+ vcpu->arch.v_state.vs_page = NULL;
+ pr_info("KVM: VCPU_STATE - Unable to pin the page\n");
+ }
+ vcpu->arch.v_state.msr_val = data;
+ break;
+
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
@@ -1996,6 +2055,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_KVM_STEAL_TIME:
data = vcpu->arch.st.msr_val;
break;
+ case MSR_KVM_VCPU_STATE:
+ data = vcpu->arch.v_state.msr_val;
+ break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
@@ -5312,6 +5374,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_load_guest_fpu(vcpu);
kvm_load_guest_xcr0(vcpu);
+ kvm_set_vcpu_state(vcpu);
+
vcpu->mode = IN_GUEST_MODE;
/* We should set ->mode before check ->requests,
@@ -5323,6 +5387,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
|| need_resched() || signal_pending(current)) {
+ kvm_clear_vcpu_state(vcpu);
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
local_irq_enable();
@@ -5361,6 +5426,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->arch.last_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu);
+ kvm_clear_vcpu_state(vcpu);
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
local_irq_enable();
@@ -6043,6 +6109,7 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_EVENT, vcpu);
vcpu->arch.apf.msr_val = 0;
vcpu->arch.st.msr_val = 0;
+ vcpu->arch.v_state.msr_val = 0;
kvmclock_reset(vcpu);
* Re: [PATCH v3 4/8] KVM-HV: Add VCPU running/pre-empted state for guest
From: Marcelo Tosatti @ 2012-08-02 19:56 UTC (permalink / raw)
To: Nikunj A. Dadhania; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Tue, Jul 31, 2012 at 04:18:41PM +0530, Nikunj A. Dadhania wrote:
> From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
>
> Hypervisor code to indicate guest running/pre-empteded status through
> msr. The page is now pinned during MSR write time and use
> kmap_atomic/kunmap_atomic to access the shared area vcpu_state area.
>
> Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> ---
> arch/x86/include/asm/kvm_host.h | 7 ++++
> arch/x86/kvm/cpuid.c | 1 +
> arch/x86/kvm/x86.c | 71 ++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 09155d6..441348f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -429,6 +429,13 @@ struct kvm_vcpu_arch {
> struct kvm_steal_time steal;
> } st;
>
> + /* indicates vcpu is running or preempted */
> + struct {
> + u64 msr_val;
> + struct page *vs_page;
> + unsigned int vs_offset;
> + } v_state;
> +
> u64 last_guest_tsc;
> u64 last_kernel_ns;
> u64 last_host_tsc;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0595f13..37ab364 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -411,6 +411,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> (1 << KVM_FEATURE_CLOCKSOURCE2) |
> (1 << KVM_FEATURE_ASYNC_PF) |
> (1 << KVM_FEATURE_PV_EOI) |
> + (1 << KVM_FEATURE_VCPU_STATE) |
> (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>
> if (sched_info_on())
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 59b5950..580abcf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -806,13 +806,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
> * kvm-specific. Those are put in the beginning of the list.
> */
>
> -#define KVM_SAVE_MSRS_BEGIN 9
> +#define KVM_SAVE_MSRS_BEGIN 10
> static u32 msrs_to_save[] = {
> MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
> MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
> HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
> HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> - MSR_KVM_PV_EOI_EN,
> + MSR_KVM_VCPU_STATE, MSR_KVM_PV_EOI_EN,
> MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
> MSR_STAR,
> #ifdef CONFIG_X86_64
> @@ -1557,6 +1557,53 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
> }
>
> +static void kvm_set_atomic(u64 *addr, u64 old, u64 new)
> +{
> + int loop = 1000000;
> + while (1) {
> + if (cmpxchg(addr, old, new) == old)
> + break;
> + loop--;
> + if (!loop) {
> + pr_info("atomic cur: %lx old: %lx new: %lx\n",
> + *addr, old, new);
> + break;
> + }
> + }
> +}
A generic "kvm_set_atomic" would need that loop, but in the particular
TLB flush case we know that the only information being transmitted is
a TLB flush.
So this idea should work:
old = *addr;
if (cmpxchg(addr, old, IN_GUEST_MODE) != old) {
	/* raced with a remote flusher: flush, then set unconditionally */
	kvm_x86_ops->tlb_flush(vcpu);
	atomic_set(addr, IN_GUEST_MODE);
} else if (old & TLB_SHOULD_FLUSH) {
	kvm_x86_ops->tlb_flush(vcpu);
}
(The actual pseudocode above is pretty ugly and must be improved,
but it should be enough to transmit the idea.)
Of course as long as you make sure the atomic_set does not
overwrite information.
> + char *kaddr;
> +
> + if (!(vcpu->arch.v_state.msr_val & KVM_MSR_ENABLED) ||
> + !vcpu->arch.v_state.vs_page)
> + return;
If its not enabled vs_page should be NULL?
> +
> + kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> + kaddr += vcpu->arch.v_state.vs_offset;
> + vs = kaddr;
> + kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
> + kunmap_atomic(kaddr);
> +}
> +
> +static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu_state *vs;
> + char *kaddr;
> +
> + if (!(vcpu->arch.v_state.msr_val & KVM_MSR_ENABLED) ||
> + !vcpu->arch.v_state.vs_page)
> + return;
Like above.
> + kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> + kaddr += vcpu->arch.v_state.vs_offset;
> + vs = kaddr;
> + kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
> + kunmap_atomic(kaddr);
> +}
> +
> int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> {
> bool pr = false;
> @@ -1676,6 +1723,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> return 1;
> break;
>
> + case MSR_KVM_VCPU_STATE:
> + vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> + vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
Assign vs_offset after success.
> +
> + if (is_error_page(vcpu->arch.v_state.vs_page)) {
> + kvm_release_page_clean(vcpu->arch.time_page);
> + vcpu->arch.v_state.vs_page = NULL;
> + pr_info("KVM: VCPU_STATE - Unable to pin the page\n");
Missing break or return;
> + }
> + vcpu->arch.v_state.msr_val = data;
> + break;
> +
> case MSR_IA32_MCG_CTL:
Please verify this code carefully again.
Also leaking the page reference.
> vcpu->arch.apf.msr_val = 0;
> vcpu->arch.st.msr_val = 0;
> + vcpu->arch.v_state.msr_val = 0;
Add a newline and comment (or even better a new helper).
>
> kvmclock_reset(vcpu);
* Re: [PATCH v3 4/8] KVM-HV: Add VCPU running/pre-empted state for guest
From: Nikunj A Dadhania @ 2012-08-03 5:17 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Thu, 2 Aug 2012 16:56:28 -0300, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Tue, Jul 31, 2012 at 04:18:41PM +0530, Nikunj A. Dadhania wrote:
> > From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> >
> > Hypervisor code to indicate guest running/pre-empteded status through
> > msr. The page is now pinned during MSR write time and use
> > kmap_atomic/kunmap_atomic to access the shared area vcpu_state area.
> >
> > Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
> > Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 7 ++++
> > arch/x86/kvm/cpuid.c | 1 +
> > arch/x86/kvm/x86.c | 71 ++++++++++++++++++++++++++++++++++++++-
> > 3 files changed, 77 insertions(+), 2 deletions(-)
[...]
> > +static void kvm_set_atomic(u64 *addr, u64 old, u64 new)
> > +{
> > + int loop = 1000000;
> > + while (1) {
> > + if (cmpxchg(addr, old, new) == old)
> > + break;
> > + loop--;
> > + if (!loop) {
> > + pr_info("atomic cur: %lx old: %lx new: %lx\n",
> > + *addr, old, new);
> > + break;
> > + }
> > + }
> > +}
>
> A generic "kvm_set_atomic" would need that loop, but in the particular
> TLB flush case we know that the only information being transmitted is
> a TLB flush.
>
Yes, the next patch gets rid of this in a neater way.
> So this idea should work:
>
> old = *addr;
> if (cmpxchg(addr, old, IN_GUEST_MODE) == FAILURE)
> kvm_x86_ops->tlb_flush()
> atomic_set(addr, IN_GUEST_MODE);
> } else if {
> if (old & TLB_SHOULD_FLUSH)
> kvm_x86_ops->tlb_flush()
> }
>
> (the actual pseucode above is pretty ugly and
> mus be improved but it should be enough to transmit
> the idea).
>
> Of course as long as you make sure the atomic_set does not
> overwrite information.
>
>
> > + char *kaddr;
> > +
> > + if (!(vcpu->arch.v_state.msr_val & KVM_MSR_ENABLED) ||
> > + !vcpu->arch.v_state.vs_page)
> > + return;
>
> If its not enabled vs_page should be NULL?
>
Yes, it should be:
if (!(enabled && vs_page))
return;
> > +
> > + kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > + kaddr += vcpu->arch.v_state.vs_offset;
> > + vs = kaddr;
> > + kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
> > + kunmap_atomic(kaddr);
> > +}
> > +
> > +static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_vcpu_state *vs;
> > + char *kaddr;
> > +
> > + if (!(vcpu->arch.v_state.msr_val & KVM_MSR_ENABLED) ||
> > + !vcpu->arch.v_state.vs_page)
> > + return;
>
> Like above.
>
> > + kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > + kaddr += vcpu->arch.v_state.vs_offset;
> > + vs = kaddr;
> > + kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
> > + kunmap_atomic(kaddr);
> > +}
> > +
> > int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> > {
> > bool pr = false;
> > @@ -1676,6 +1723,18 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> > return 1;
> > break;
> >
> > + case MSR_KVM_VCPU_STATE:
> > + vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> > + vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
>
> Assign vs_offset after success.
>
Will do that.
> > +
> > + if (is_error_page(vcpu->arch.v_state.vs_page)) {
> > + kvm_release_page_clean(vcpu->arch.time_page);
C&P error :(
kvm_release_page_clean(vcpu->arch.v_state.vs_page);
> > + vcpu->arch.v_state.vs_page = NULL;
> > + pr_info("KVM: VCPU_STATE - Unable to pin the page\n");
>
> Missing break or return;
>
Right
> > + }
> > + vcpu->arch.v_state.msr_val = data;
> > + break;
> > +
> > case MSR_IA32_MCG_CTL:
>
> Please verify this code carefully again.
>
> Also leaking the page reference.
>
> > vcpu->arch.apf.msr_val = 0;
> > vcpu->arch.st.msr_val = 0;
> > + vcpu->arch.v_state.msr_val = 0;
>
> Add a newline and comment (or even better a new helper).
>
Will do.
Thanks for the detailed review.
Nikunj
* Re: [PATCH v3 4/8] KVM-HV: Add VCPU running/pre-empted state for guest
From: Nikunj A Dadhania @ 2012-08-03 5:55 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Thu, 2 Aug 2012 16:56:28 -0300, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > + case MSR_KVM_VCPU_STATE:
> > + vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> > + vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
>
> Assign vs_offset after success.
>
> > +
> > + if (is_error_page(vcpu->arch.v_state.vs_page)) {
> > + kvm_release_page_clean(vcpu->arch.time_page);
> > + vcpu->arch.v_state.vs_page = NULL;
> > + pr_info("KVM: VCPU_STATE - Unable to pin the page\n");
>
> Missing break or return;
>
> > + }
> > + vcpu->arch.v_state.msr_val = data;
> > + break;
> > +
> > case MSR_IA32_MCG_CTL:
>
> Please verify this code carefully again.
>
> Also leaking the page reference.
>
> > vcpu->arch.apf.msr_val = 0;
> > vcpu->arch.st.msr_val = 0;
> > + vcpu->arch.v_state.msr_val = 0;
>
> Add a newline and comment (or even better a new helper).
> >
> > kvmclock_reset(vcpu);
>
How about something like the below? I have looked at time_page
for reference:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 580abcf..c82cc12 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1604,6 +1604,16 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
kunmap_atomic(kaddr);
}
+static void kvm_vcpu_state_reset(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.v_state.msr_val = 0;
+ vcpu->arch.v_state.vs_offset = 0;
+ if (vcpu->arch.v_state.vs_page) {
+ kvm_release_page_dirty(vcpu->arch.v_state.vs_page);
+ vcpu->arch.v_state.vs_page = NULL;
+ }
+}
+
int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
bool pr = false;
@@ -1724,14 +1734,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
break;
case MSR_KVM_VCPU_STATE:
+ kvm_vcpu_state_reset(vcpu);
+
vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
- vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
if (is_error_page(vcpu->arch.v_state.vs_page)) {
- kvm_release_page_clean(vcpu->arch.time_page);
+ kvm_release_page_clean(vcpu->arch.v_state.vs_page);
vcpu->arch.v_state.vs_page = NULL;
pr_info("KVM: VCPU_STATE - Unable to pin the page\n");
+ break;
}
+ vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
vcpu->arch.v_state.msr_val = data;
break;
@@ -6053,6 +6066,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
{
kvmclock_reset(vcpu);
+ kvm_vcpu_state_reset(vcpu);
free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
fx_free(vcpu);
@@ -6109,9 +6123,9 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_EVENT, vcpu);
vcpu->arch.apf.msr_val = 0;
vcpu->arch.st.msr_val = 0;
- vcpu->arch.v_state.msr_val = 0;
kvmclock_reset(vcpu);
+ kvm_vcpu_state_reset(vcpu);
kvm_clear_async_pf_completion_queue(vcpu);
kvm_async_pf_hash_reset(vcpu);
* Re: [PATCH v3 4/8] KVM-HV: Add VCPU running/pre-empted state for guest
From: Marcelo Tosatti @ 2012-08-03 17:31 UTC (permalink / raw)
To: Nikunj A Dadhania; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Fri, Aug 03, 2012 at 11:25:44AM +0530, Nikunj A Dadhania wrote:
> On Thu, 2 Aug 2012 16:56:28 -0300, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > + case MSR_KVM_VCPU_STATE:
> > > + vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> > > + vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
> >
> > Assign vs_offset after success.
> >
> > > +
> > > + if (is_error_page(vcpu->arch.v_state.vs_page)) {
> > > + kvm_release_page_clean(vcpu->arch.time_page);
> > > + vcpu->arch.v_state.vs_page = NULL;
> > > + pr_info("KVM: VCPU_STATE - Unable to pin the page\n");
> >
> > Missing break or return;
> >
> > > + }
> > > + vcpu->arch.v_state.msr_val = data;
> > > + break;
> > > +
> > > case MSR_IA32_MCG_CTL:
> >
> > Please verify this code carefully again.
> >
> > Also leaking the page reference.
> >
> > > vcpu->arch.apf.msr_val = 0;
> > > vcpu->arch.st.msr_val = 0;
> > > + vcpu->arch.v_state.msr_val = 0;
> >
> > Add a newline and comment (or even better a new helper).
> > >
> > > kvmclock_reset(vcpu);
> >
>
> How about something like the below. I have tried to look at time_page
> for reference:
>
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 580abcf..c82cc12 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1604,6 +1604,16 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> kunmap_atomic(kaddr);
> }
>
> +static void kvm_vcpu_state_reset(struct kvm_vcpu *vcpu)
> +{
> + vcpu->arch.v_state.msr_val = 0;
> + vcpu->arch.v_state.vs_offset = 0;
> + if (vcpu->arch.v_state.vs_page) {
> + kvm_release_page_dirty(vcpu->arch.v_state.vs_page);
> + vcpu->arch.v_state.vs_page = NULL;
> + }
> +}
> +
> int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> {
> bool pr = false;
> @@ -1724,14 +1734,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> break;
>
> case MSR_KVM_VCPU_STATE:
> + kvm_vcpu_state_reset(vcpu);
> +
> vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> - vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
Should also fail if it's not enabled (KVM_MSR_ENABLED bit).
What is the point of having non-NULL vs_page pointer if KVM_MSR_ENABLED
bit is not set?
The rest is fine, thanks.
* Re: [PATCH v3 4/8] KVM-HV: Add VCPU running/pre-empted state for guest
2012-08-03 17:31 ` Marcelo Tosatti
@ 2012-08-04 18:33 ` Nikunj A Dadhania
0 siblings, 0 replies; 18+ messages in thread
From: Nikunj A Dadhania @ 2012-08-04 18:33 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Fri, 3 Aug 2012 14:31:22 -0300, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Aug 03, 2012 at 11:25:44AM +0530, Nikunj A Dadhania wrote:
> > On Thu, 2 Aug 2012 16:56:28 -0300, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > >
> > > > + case MSR_KVM_VCPU_STATE:
> > > > + vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> > > > + vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
> > >
> > > Assign vs_offset after success.
> > >
> > > > +
> > > > + if (is_error_page(vcpu->arch.v_state.vs_page)) {
> > > > + kvm_release_page_clean(vcpu->arch.time_page);
> > > > + vcpu->arch.v_state.vs_page = NULL;
> > > > + pr_info("KVM: VCPU_STATE - Unable to pin the page\n");
> > >
> > > Missing break or return;
> > >
> > > > + }
> > > > + vcpu->arch.v_state.msr_val = data;
> > > > + break;
> > > > +
> > > > case MSR_IA32_MCG_CTL:
> > >
> > > Please verify this code carefully again.
> > >
> > > Also leaking the page reference.
> > >
> > > > vcpu->arch.apf.msr_val = 0;
> > > > vcpu->arch.st.msr_val = 0;
> > > > + vcpu->arch.v_state.msr_val = 0;
> > >
> > > Add a newline and comment (or even better a new helper).
> > > >
> > > > kvmclock_reset(vcpu);
> > >
> >
> > How about something like the below. I have tried to look at time_page
> > for reference:
> >
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 580abcf..c82cc12 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1604,6 +1604,16 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> > kunmap_atomic(kaddr);
> > }
> >
> > +static void kvm_vcpu_state_reset(struct kvm_vcpu *vcpu)
> > +{
> > + vcpu->arch.v_state.msr_val = 0;
> > + vcpu->arch.v_state.vs_offset = 0;
> > + if (vcpu->arch.v_state.vs_page) {
> > + kvm_release_page_dirty(vcpu->arch.v_state.vs_page);
> > + vcpu->arch.v_state.vs_page = NULL;
> > + }
> > +}
> > +
> > int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> > {
> > bool pr = false;
> > @@ -1724,14 +1734,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> > break;
> >
> > case MSR_KVM_VCPU_STATE:
> > + kvm_vcpu_state_reset(vcpu);
> > +
> > vcpu->arch.v_state.vs_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> > - vcpu->arch.v_state.vs_offset = data & ~(PAGE_MASK | KVM_MSR_ENABLED);
>
> Should also fail if it's not enabled (KVM_MSR_ENABLED bit).
>
> What is the point of having non-NULL vs_page pointer if KVM_MSR_ENABLED
> bit is not set?
>
Yes, will do that.
> The rest is fine, thanks.
>
Thanks
Nikunj
* [PATCH v3 5/8] KVM Guest: Add paravirt kvm_flush_tlb_others
2012-07-31 10:47 [PATCH v3 0/8] KVM paravirt remote flush tlb Nikunj A. Dadhania
` (3 preceding siblings ...)
2012-07-31 10:48 ` [PATCH v3 4/8] KVM-HV: " Nikunj A. Dadhania
@ 2012-07-31 10:48 ` Nikunj A. Dadhania
2012-07-31 10:49 ` [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter Nikunj A. Dadhania
` (2 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Nikunj A. Dadhania @ 2012-07-31 10:48 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
flush_tlb_others_ipi depends on a lot of statics in tlb.c. Replicate
flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
paravirtualization.
Use the vcpu state information inside kvm_flush_tlb_others to
avoid sending IPIs to pre-empted vcpus:
* Do not send IPIs to pre-empted vcpus; set their flush_on_enter flag instead
* For online vcpus: wait for them to clear the flag
The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
v3:
* use only one state variable for vcpu-running/flush_on_enter
* use cmpxchg to update the state
* adapt to Alex Shi's TLB flush optimization
v2:
* use ACCESS_ONCE so the value is not register cached
* Separate HV and Guest code
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
--
Pseudo Algo:
------------
Hypervisor
==========
guest_exit()
if (xchg(state, NOT_IN_GUEST) == SHOULD_FLUSH)
tlb_flush(vcpu);
guest_enter()
if (xchg(state, IN_GUEST) == SHOULD_FLUSH)
tlb_flush(vcpu);
Guest
=====
flushcpumask = cpumask;
for_each_cpu(i, flushmask) {
state = vs->state;
if(!test_bit(IN_GUEST_MODE, state)) {
if (cmpxchg(&vs->state, state,
state | (1 << SHOULD_FLUSH)) == SUCCESS)
cpumask_clear_cpu(i, flushmask)
}
}
if (!empty(flushmask))
smp_call_function_many(f->flushmask, flush_tlb_func)
---
arch/x86/include/asm/tlbflush.h | 11 +++++++++++
arch/x86/kernel/kvm.c | 4 +++-
arch/x86/mm/tlb.c | 37 +++++++++++++++++++++++++++++++++++++
3 files changed, 51 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 74a4433..0a343a1 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -119,6 +119,13 @@ static inline void native_flush_tlb_others(const struct cpumask *cpumask,
{
}
+static inline void kvm_flush_tlb_others(const struct cpumask *cpumask,
+ struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end)
+{
+}
+
static inline void reset_lazy_tlbstate(void)
{
}
@@ -153,6 +160,10 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm,
unsigned long start, unsigned long end);
+void kvm_flush_tlb_others(const struct cpumask *cpumask,
+ struct mm_struct *mm, unsigned long start,
+ unsigned long end);
+
#define TLBSTATE_OK 1
#define TLBSTATE_LAZY 2
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 37e6599..b538a31 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -501,8 +501,10 @@ void __init kvm_guest_init(void)
apic_set_eoi_write(kvm_guest_apic_eoi_write);
#ifdef CONFIG_PARAVIRT_TLB_FLUSH
- if (kvm_para_has_feature(KVM_FEATURE_VCPU_STATE))
+ if (kvm_para_has_feature(KVM_FEATURE_VCPU_STATE)) {
has_vcpu_state = 1;
+ pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+ }
#endif
#ifdef CONFIG_SMP
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 613cd83..2399013 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,6 +6,7 @@
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/cpu.h>
+#include <linux/kvm_para.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
@@ -119,6 +120,42 @@ static void flush_tlb_func(void *info)
}
+#ifdef CONFIG_KVM_GUEST
+
+DECLARE_PER_CPU(struct kvm_vcpu_state, vcpu_state) __aligned(64);
+
+void kvm_flush_tlb_others(const struct cpumask *cpumask,
+ struct mm_struct *mm, unsigned long start,
+ unsigned long end)
+{
+ struct flush_tlb_info info;
+ struct kvm_vcpu_state *v_state;
+ u64 state;
+ int cpu;
+ cpumask_t flushmask;
+
+ cpumask_copy(&flushmask, cpumask);
+ info.flush_mm = mm;
+ info.flush_start = start;
+ info.flush_end = end;
+ /*
+ * We have to call flush only on online vCPUs. And
+ * queue flush_on_enter for pre-empted vCPUs
+ */
+ for_each_cpu(cpu, to_cpumask(&flushmask)) {
+ v_state = &per_cpu(vcpu_state, cpu);
+ state = v_state->state;
+ if (!test_bit(KVM_VCPU_STATE_IN_GUEST_MODE, &state)) {
+ if (cmpxchg(&v_state->state, state, state | 1 << KVM_VCPU_STATE_SHOULD_FLUSH))
+ cpumask_clear_cpu(cpu, to_cpumask(&flushmask));
+ }
+ }
+
+ if (!cpumask_empty(to_cpumask(&flushmask)))
+ smp_call_function_many(&flushmask, flush_tlb_func, &info, 1);
+}
+#endif /* CONFIG_KVM_GUEST */
+
void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long start,
unsigned long end)
* [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter
2012-07-31 10:47 [PATCH v3 0/8] KVM paravirt remote flush tlb Nikunj A. Dadhania
` (4 preceding siblings ...)
2012-07-31 10:48 ` [PATCH v3 5/8] KVM Guest: Add paravirt kvm_flush_tlb_others Nikunj A. Dadhania
@ 2012-07-31 10:49 ` Nikunj A. Dadhania
2012-08-02 20:14 ` Marcelo Tosatti
2012-07-31 10:49 ` [PATCH v3 7/8] Enable HAVE_RCU_TABLE_FREE for kvm when PARAVIRT_TLB_FLUSH is enabled Nikunj A. Dadhania
2012-07-31 10:49 ` [PATCH v3 8/8] KVM-doc: Add paravirt tlb flush document Nikunj A. Dadhania
7 siblings, 1 reply; 18+ messages in thread
From: Nikunj A. Dadhania @ 2012-07-31 10:49 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
A PV-flush guest indicates, via the flush-on-enter flag, that a TLB
flush is pending; flush the TLB before entering and after exiting the
guest.
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
---
arch/x86/kvm/x86.c | 23 +++++++----------------
1 files changed, 7 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 580abcf..a67e971 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1557,20 +1557,9 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
}
-static void kvm_set_atomic(u64 *addr, u64 old, u64 new)
-{
- int loop = 1000000;
- while (1) {
- if (cmpxchg(addr, old, new) == old)
- break;
- loop--;
- if (!loop) {
- pr_info("atomic cur: %lx old: %lx new: %lx\n",
- *addr, old, new);
- break;
- }
- }
-}
+#define VS_NOT_IN_GUEST (0)
+#define VS_IN_GUEST (1 << KVM_VCPU_STATE_IN_GUEST_MODE)
+#define VS_SHOULD_FLUSH (1 << KVM_VCPU_STATE_SHOULD_FLUSH)
static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
{
@@ -1584,7 +1573,8 @@ static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
kaddr += vcpu->arch.v_state.vs_offset;
vs = kaddr;
- kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
+ if (xchg(&vs->state, VS_IN_GUEST) == VS_SHOULD_FLUSH)
+ kvm_x86_ops->tlb_flush(vcpu);
kunmap_atomic(kaddr);
}
@@ -1600,7 +1590,8 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
kaddr += vcpu->arch.v_state.vs_offset;
vs = kaddr;
- kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
+ if (xchg(&vs->state, VS_NOT_IN_GUEST) == VS_SHOULD_FLUSH)
+ kvm_x86_ops->tlb_flush(vcpu);
kunmap_atomic(kaddr);
}
* Re: [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter
2012-07-31 10:49 ` [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter Nikunj A. Dadhania
@ 2012-08-02 20:14 ` Marcelo Tosatti
2012-08-02 20:16 ` Marcelo Tosatti
0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2012-08-02 20:14 UTC (permalink / raw)
To: Nikunj A. Dadhania; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Tue, Jul 31, 2012 at 04:19:02PM +0530, Nikunj A. Dadhania wrote:
> From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
>
> PV-Flush guest would indicate to flush on enter, flush the TLB before
> entering and exiting the guest.
>
> Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> ---
> arch/x86/kvm/x86.c | 23 +++++++----------------
> 1 files changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 580abcf..a67e971 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1557,20 +1557,9 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
> }
>
> -static void kvm_set_atomic(u64 *addr, u64 old, u64 new)
> -{
> - int loop = 1000000;
> - while (1) {
> - if (cmpxchg(addr, old, new) == old)
> - break;
> - loop--;
> - if (!loop) {
> - pr_info("atomic cur: %lx old: %lx new: %lx\n",
> - *addr, old, new);
> - break;
> - }
> - }
> -}
> +#define VS_NOT_IN_GUEST (0)
> +#define VS_IN_GUEST (1 << KVM_VCPU_STATE_IN_GUEST_MODE)
> +#define VS_SHOULD_FLUSH (1 << KVM_VCPU_STATE_SHOULD_FLUSH)
>
> static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> {
> @@ -1584,7 +1573,8 @@ static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> kaddr += vcpu->arch.v_state.vs_offset;
> vs = kaddr;
> - kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
> + if (xchg(&vs->state, VS_IN_GUEST) == VS_SHOULD_FLUSH)
> + kvm_x86_ops->tlb_flush(vcpu);
> kunmap_atomic(kaddr);
> }
>
> @@ -1600,7 +1590,8 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> kaddr += vcpu->arch.v_state.vs_offset;
> vs = kaddr;
> - kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
> + if (xchg(&vs->state, VS_NOT_IN_GUEST) == VS_SHOULD_FLUSH)
> + kvm_x86_ops->tlb_flush(vcpu);
> kunmap_atomic(kaddr);
> }
Nevermind the early comment (the other comments on that message are
valid).
* Re: [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter
2012-08-02 20:14 ` Marcelo Tosatti
@ 2012-08-02 20:16 ` Marcelo Tosatti
2012-08-03 5:37 ` Nikunj A Dadhania
0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2012-08-02 20:16 UTC (permalink / raw)
To: Nikunj A. Dadhania; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Thu, Aug 02, 2012 at 05:14:02PM -0300, Marcelo Tosatti wrote:
> On Tue, Jul 31, 2012 at 04:19:02PM +0530, Nikunj A. Dadhania wrote:
> > From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> >
> > PV-Flush guest would indicate to flush on enter, flush the TLB before
> > entering and exiting the guest.
> >
> > Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> > ---
> > arch/x86/kvm/x86.c | 23 +++++++----------------
> > 1 files changed, 7 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 580abcf..a67e971 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1557,20 +1557,9 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> > &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
> > }
> >
> > -static void kvm_set_atomic(u64 *addr, u64 old, u64 new)
> > -{
> > - int loop = 1000000;
> > - while (1) {
> > - if (cmpxchg(addr, old, new) == old)
> > - break;
> > - loop--;
> > - if (!loop) {
> > - pr_info("atomic cur: %lx old: %lx new: %lx\n",
> > - *addr, old, new);
> > - break;
> > - }
> > - }
> > -}
> > +#define VS_NOT_IN_GUEST (0)
> > +#define VS_IN_GUEST (1 << KVM_VCPU_STATE_IN_GUEST_MODE)
> > +#define VS_SHOULD_FLUSH (1 << KVM_VCPU_STATE_SHOULD_FLUSH)
> >
> > static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> > {
> > @@ -1584,7 +1573,8 @@ static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> > kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > kaddr += vcpu->arch.v_state.vs_offset;
> > vs = kaddr;
> > - kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
> > + if (xchg(&vs->state, VS_IN_GUEST) == VS_SHOULD_FLUSH)
> > + kvm_x86_ops->tlb_flush(vcpu);
> > kunmap_atomic(kaddr);
> > }
> >
> > @@ -1600,7 +1590,8 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> > kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > kaddr += vcpu->arch.v_state.vs_offset;
> > vs = kaddr;
> > - kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
> > + if (xchg(&vs->state, VS_NOT_IN_GUEST) == VS_SHOULD_FLUSH)
> > + kvm_x86_ops->tlb_flush(vcpu);
> > kunmap_atomic(kaddr);
> > }
>
> Nevermind the early comment (the other comments on that message are
> valid).
Ah, so the pseudocode mentions flush-on-exit because we can be clearing
the flag on xchg. Setting KVM_REQ_TLB_FLUSH instead should be enough
(please verify).
* Re: [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter
2012-08-02 20:16 ` Marcelo Tosatti
@ 2012-08-03 5:37 ` Nikunj A Dadhania
2012-08-03 17:31 ` Marcelo Tosatti
0 siblings, 1 reply; 18+ messages in thread
From: Nikunj A Dadhania @ 2012-08-03 5:37 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Thu, 2 Aug 2012 17:16:41 -0300, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Aug 02, 2012 at 05:14:02PM -0300, Marcelo Tosatti wrote:
> > On Tue, Jul 31, 2012 at 04:19:02PM +0530, Nikunj A. Dadhania wrote:
> > > From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> > >
> > > static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> > > {
> > > @@ -1584,7 +1573,8 @@ static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> > > kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > > kaddr += vcpu->arch.v_state.vs_offset;
> > > vs = kaddr;
> > > - kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
> > > + if (xchg(&vs->state, VS_IN_GUEST) == VS_SHOULD_FLUSH)
> > > + kvm_x86_ops->tlb_flush(vcpu);
> > > kunmap_atomic(kaddr);
> > > }
> > >
> > > @@ -1600,7 +1590,8 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> > > kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > > kaddr += vcpu->arch.v_state.vs_offset;
> > > vs = kaddr;
> > > - kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
> > > + if (xchg(&vs->state, VS_NOT_IN_GUEST) == VS_SHOULD_FLUSH)
> > > + kvm_x86_ops->tlb_flush(vcpu);
> > > kunmap_atomic(kaddr);
> > > }
> >
> > Nevermind the early comment (the other comments on that message are
> > valid).
I assume the above is related to the kvm_set_atomic comment in [3/8]
>
> Ah, so the pseudocode mentions flush-on-exit because we can be clearing
> the flag on xchg. Setting KVM_REQ_TLB_FLUSH instead should be enough
> (please verify).
>
Yes, that will work while exiting.
In the vm_enter case, we need to do a kvm_x86_ops->tlb_flush(vcpu), as
we have already passed the phase of checking the request.
Nikunj
* Re: [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter
2012-08-03 5:37 ` Nikunj A Dadhania
@ 2012-08-03 17:31 ` Marcelo Tosatti
0 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2012-08-03 17:31 UTC (permalink / raw)
To: Nikunj A Dadhania; +Cc: peterz, avi, raghukt, alex.shi, mingo, kvm, hpa
On Fri, Aug 03, 2012 at 11:07:13AM +0530, Nikunj A Dadhania wrote:
> On Thu, 2 Aug 2012 17:16:41 -0300, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Thu, Aug 02, 2012 at 05:14:02PM -0300, Marcelo Tosatti wrote:
> > > On Tue, Jul 31, 2012 at 04:19:02PM +0530, Nikunj A. Dadhania wrote:
> > > > From: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
> > > >
> > > > static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> > > > {
> > > > @@ -1584,7 +1573,8 @@ static void kvm_set_vcpu_state(struct kvm_vcpu *vcpu)
> > > > kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > > > kaddr += vcpu->arch.v_state.vs_offset;
> > > > vs = kaddr;
> > > > - kvm_set_atomic(&vs->state, 0, 1 << KVM_VCPU_STATE_IN_GUEST_MODE);
> > > > + if (xchg(&vs->state, VS_IN_GUEST) == VS_SHOULD_FLUSH)
> > > > + kvm_x86_ops->tlb_flush(vcpu);
> > > > kunmap_atomic(kaddr);
> > > > }
> > > >
> > > > @@ -1600,7 +1590,8 @@ static void kvm_clear_vcpu_state(struct kvm_vcpu *vcpu)
> > > > kaddr = kmap_atomic(vcpu->arch.v_state.vs_page);
> > > > kaddr += vcpu->arch.v_state.vs_offset;
> > > > vs = kaddr;
> > > > - kvm_set_atomic(&vs->state, 1 << KVM_VCPU_STATE_IN_GUEST_MODE, 0);
> > > > + if (xchg(&vs->state, VS_NOT_IN_GUEST) == VS_SHOULD_FLUSH)
> > > > + kvm_x86_ops->tlb_flush(vcpu);
> > > > kunmap_atomic(kaddr);
> > > > }
> > >
> > > Nevermind the early comment (the other comments on that message are
> > > valid).
> I assume the above is related to the kvm_set_atomic comment in [3/8]
Yes.
> > Ah, so the pseudocode mentions flush-on-exit because we can be clearing
> > the flag on xchg. Setting KVM_REQ_TLB_FLUSH instead should be enough
> > (please verify).
> >
> Yes, that will work while exiting.
>
> In the vm_enter case, we need to do a kvm_x86_ops->tlb_flush(vcpu), as
> we have already passed the phase of checking the request.
Yes.
* [PATCH v3 7/8] Enable HAVE_RCU_TABLE_FREE for kvm when PARAVIRT_TLB_FLUSH is enabled
2012-07-31 10:47 [PATCH v3 0/8] KVM paravirt remote flush tlb Nikunj A. Dadhania
` (5 preceding siblings ...)
2012-07-31 10:49 ` [PATCH v3 6/8] KVM-HV: Add flush_on_enter before guest enter Nikunj A. Dadhania
@ 2012-07-31 10:49 ` Nikunj A. Dadhania
2012-07-31 10:49 ` [PATCH v3 8/8] KVM-doc: Add paravirt tlb flush document Nikunj A. Dadhania
7 siblings, 0 replies; 18+ messages in thread
From: Nikunj A. Dadhania @ 2012-07-31 10:49 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
---
arch/x86/Kconfig | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c70684f..354160d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -612,6 +612,17 @@ config PARAVIRT_SPINLOCKS
If you are unsure how to answer this question, answer N.
+config PARAVIRT_TLB_FLUSH
+ bool "Paravirtualization layer for TLB Flush"
+ depends on PARAVIRT && SMP && EXPERIMENTAL
+ select HAVE_RCU_TABLE_FREE
+ ---help---
Paravirtualized TLB flush replaces the native implementation
with something virtualization-friendly (for example, set a
flag for a sleeping vcpu and do not wait for it).
+
+ If you are unsure how to answer this question, answer N.
+
config PARAVIRT_CLOCK
bool
* [PATCH v3 8/8] KVM-doc: Add paravirt tlb flush document
2012-07-31 10:47 [PATCH v3 0/8] KVM paravirt remote flush tlb Nikunj A. Dadhania
` (6 preceding siblings ...)
2012-07-31 10:49 ` [PATCH v3 7/8] Enable HAVE_RCU_TABLE_FREE for kvm when PARAVIRT_TLB_FLUSH is enabled Nikunj A. Dadhania
@ 2012-07-31 10:49 ` Nikunj A. Dadhania
7 siblings, 0 replies; 18+ messages in thread
From: Nikunj A. Dadhania @ 2012-07-31 10:49 UTC (permalink / raw)
To: peterz, mtosatti, avi; +Cc: raghukt, alex.shi, mingo, kvm, hpa
Signed-off-by: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
---
Documentation/virtual/kvm/msr.txt | 4 ++
Documentation/virtual/kvm/paravirt-tlb-flush.txt | 53 ++++++++++++++++++++++
2 files changed, 57 insertions(+), 0 deletions(-)
create mode 100644 Documentation/virtual/kvm/paravirt-tlb-flush.txt
diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 7304710..92a6af6 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -256,3 +256,7 @@ MSR_KVM_EOI_EN: 0x4b564d04
guest must both read the least significant bit in the memory area and
clear it using a single CPU instruction, such as test and clear, or
compare and exchange.
+
+MSR_KVM_VCPU_STATE: 0x4b564d05
+
+Refer: Documentation/virtual/kvm/paravirt-tlb-flush.txt
diff --git a/Documentation/virtual/kvm/paravirt-tlb-flush.txt b/Documentation/virtual/kvm/paravirt-tlb-flush.txt
new file mode 100644
index 0000000..0eaabd7
--- /dev/null
+++ b/Documentation/virtual/kvm/paravirt-tlb-flush.txt
@@ -0,0 +1,53 @@
+KVM - Paravirt TLB Flush
+Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>, IBM, 2012
+========================================================
+
+Remote flushing APIs busy-wait, which is fine in a bare-metal
+scenario. Within a guest, however, the vcpus might have been
+pre-empted or blocked. In that case the initiator vcpu ends up
+busy-waiting for a long time.
+
+To decide whether to wait, the guest needs to know whether a target
+vcpu is currently running. The following MSR exposes vcpu running
+state information.
+
+Using this MSR we have implemented a para-virt TLB flush that does
+not wait for vcpus that are not running; their TLB flush is deferred
+and performed on guest enter.
+
+MSR_KVM_VCPU_STATE: 0x4b564d05
+
+ data: 64-byte aligned physical address of a memory area which must be
+ in guest RAM, plus an enable bit in bit 0. This memory is expected to
+ hold a copy of the following structure:
+
+ struct kvm_vcpu_state {
+ __u64 state;
+ __u32 pad[14];
+ }
+
+ whose data will be filled in by the hypervisor/guest. Only one
+ write, or registration, is needed for each VCPU. The interval
+ between updates of this structure is arbitrary and
+ implementation-dependent. The hypervisor may update this
+ structure at any time it sees fit until anything with bit0 ==
+ 0 is written to it. The guest is required to make sure this
+ structure is initialized to zero.
+
+ This enables a VCPU to know the running status of sibling
+ VCPUs. The information can further be used to determine whether
+ an IPI needs to be sent to a non-running VCPU at all, avoiding
+ an unnecessary wait. See flush_tlb_others_ipi, for example.
+
+ Fields have the following meanings:
+
+ state: has the following bit fields:
+
+ Bit 0 - vcpu running state. Set by the hypervisor:
+ value 1 means the vcpu is running, value 0
+ means the vcpu is pre-empted out.
+
+ Bit 1 - flush request. Set by an initiator vcpu when
+ the hypervisor should flush this vcpu's TLB;
+ handled during guest enter/exit.
+