public inbox for kvm@vger.kernel.org
* [PATCH V5 1/2] perf ignore LBR and extra_regs
@ 2014-07-10 10:59 kan.liang
  2014-07-10 10:59 ` [PATCH V5 2/2] kvm: ignore LBR and extra_reg kan.liang
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: kan.liang @ 2014-07-10 10:59 UTC (permalink / raw)
  To: peterz; +Cc: andi, linux-kernel, kvm, Kan Liang

From: Kan Liang <kan.liang@intel.com>

x86, perf: Protect LBR and extra_regs against KVM lying

With -cpu host, KVM reports LBR and extra_regs support if the host has
support. However, KVM doesn't handle LBR and extra_regs, so when the guest
perf driver tries to access an LBR or extra_regs MSR, the access #GPs.
Check access to the related MSRs once at initialization time, to avoid the
faulting accesses at runtime.

To reproduce the issue, build the host kernel with CONFIG_KVM_INTEL=y,
and the guest kernel with CONFIG_PARAVIRT=n and CONFIG_KVM_GUEST=n.
Start the guest with -cpu host.
Run perf record with --branch-any or --branch-filter in the guest to
trigger the LBR #GP.
Run perf stat on offcore events (e.g. LLC-loads/LLC-load-misses) in the
guest to trigger the offcore_rsp #GP.

Signed-off-by: Kan Liang <kan.liang@intel.com>

V2: Move the check code to initialization time.
V3: Add flag for each extra register.
    Check all LBR MSRs at initialization time.
V4: Remove lbr_msr_access. For LBR msr, simply set lbr_nr to 0 if check_msr failed.
    Disable all extra msrs in creation places if check_msr failed.
V5: Fix broken check_msr.
    Don't check any more MSRs after the first failure.
    Return an error when the check fails, to stop creating the event.
    Remove the checking code path which is never taken.
---
 arch/x86/kernel/cpu/perf_event.c       |  3 +++
 arch/x86/kernel/cpu/perf_event.h       | 45 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_intel.c | 38 +++++++++++++++++++++++++++-
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2bdfbff..a7c5e4b 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -118,6 +118,9 @@ static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
 			continue;
 		if (event->attr.config1 & ~er->valid_mask)
 			return -EINVAL;
+		/* Check if the extra msrs can be safely accessed */
+		if (!x86_pmu.extra_msr_access[er->idx])
+			return -EFAULT;
 
 		reg->idx = er->idx;
 		reg->config = event->attr.config1;
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3b2f9bd..992c678 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -464,6 +464,12 @@ struct x86_pmu {
 	 */
 	struct extra_reg *extra_regs;
 	unsigned int er_flags;
+	/*
+	 * Whether each extra-reg MSR can be safely accessed.
+	 * The extra registers are completely unrelated to each other,
+	 * so a flag is needed for each extra register.
+	 */
+	bool		extra_msr_access[EXTRA_REG_MAX];
 
 	/*
 	 * Intel host/guest support (KVM)
@@ -525,6 +531,45 @@ extern u64 __read_mostly hw_cache_extra_regs
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX];
 
+/*
+ * Under certain circumstances, accessing certain MSRs may cause a #GP.
+ * The function tests if the input MSR can be safely accessed.
+ */
+static inline bool check_msr(unsigned long msr)
+{
+	u64 val_old, val_new, val_tmp;
+
+	/*
+	 * Read the current value, change it and read it back to see if it
+	 * matches, this is needed to detect certain hardware emulators
+	 * (qemu/kvm) that don't trap on the MSR access and always return 0s.
+	 */
+	if (rdmsrl_safe(msr, &val_old))
+		goto msr_fail;
+	/*
+	 * Only change it slightly,
+	 * since the higher bits of some MSRs cannot be updated by wrmsrl.
+	 * E.g. MSR_LBR_TOS
+	 */
+	val_tmp = val_old ^ 0x3UL;
+	if (wrmsrl_safe(msr, val_tmp) ||
+		rdmsrl_safe(msr, &val_new))
+		goto msr_fail;
+
+	if (val_new != val_tmp)
+		goto msr_fail;
+
+	/*
+	 * Now it is certain that the MSR can be safely accessed.
+	 * Restore the old value and return.
+	 */
+	wrmsrl(msr, val_old);
+
+	return true;
+
+msr_fail:
+	return false;
+}
+
 u64 x86_perf_event_update(struct perf_event *event);
 
 static inline unsigned int x86_pmu_config_addr(int index)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index adb02aa..953b2b2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2262,7 +2262,8 @@ __init int intel_pmu_init(void)
 	union cpuid10_ebx ebx;
 	struct event_constraint *c;
 	unsigned int unused;
-	int version;
+	int version, i;
+	struct extra_reg *er;
 
 	if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
 		switch (boot_cpu_data.x86) {
@@ -2565,6 +2566,41 @@ __init int intel_pmu_init(void)
 		}
 	}
 
+	/*
+	 * Accessing LBR MSRs may cause a #GP under certain circumstances.
+	 * E.g. KVM doesn't support LBR MSRs.
+	 * Check all LBR MSRs here.
+	 * Disable LBR access if any LBR MSR cannot be accessed.
+	 */
+	if (x86_pmu.lbr_nr) {
+		if (check_msr(x86_pmu.lbr_tos)) {
+			for (i = 0; i < x86_pmu.lbr_nr; i++) {
+				if (!(check_msr(x86_pmu.lbr_from + i) &&
+				      check_msr(x86_pmu.lbr_to + i))) {
+					x86_pmu.lbr_nr = 0;
+					break;
+				}
+			}
+		} else
+			x86_pmu.lbr_nr = 0;
+	}
+	/*
+	 * Accessing extra MSRs may cause a #GP under certain circumstances.
+	 * E.g. KVM doesn't support offcore events.
+	 * Check all extra_regs here.
+	 */
+	if (x86_pmu.extra_regs) {
+		memset(x86_pmu.extra_msr_access,
+			true, sizeof(bool) * EXTRA_REG_MAX);
+		for (er = x86_pmu.extra_regs; er->msr; er++) {
+			x86_pmu.extra_msr_access[er->idx] =
+				check_msr(er->msr);
+		}
+		/* Disable LBR select mapping */
+		if (!x86_pmu.extra_msr_access[EXTRA_REG_LBR])
+			x86_pmu.lbr_sel_map = NULL;
+	}
+
 	/* Support full width counters using alternative MSR range */
 	if (x86_pmu.intel_cap.full_width_write) {
 		x86_pmu.max_period = x86_pmu.cntval_mask;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V5 2/2] kvm: ignore LBR and extra_reg
  2014-07-10 10:59 [PATCH V5 1/2] perf ignore LBR and extra_regs kan.liang
@ 2014-07-10 10:59 ` kan.liang
  2014-07-14 10:53 ` [PATCH V5 1/2] perf ignore LBR and extra_regs Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: kan.liang @ 2014-07-10 10:59 UTC (permalink / raw)
  To: peterz; +Cc: andi, linux-kernel, kvm, Kan Liang, Andi Kleen

From: Kan Liang <kan.liang@intel.com>

With -cpu host, KVM reports LBR and extra_regs support, so the guest perf
driver may access the LBR and extra_regs MSRs.
However, there is no LBR and extra_regs virtualization support yet, which
could cause the guest to crash.
As a workaround, KVM simply ignores accesses to the LBR and extra_regs
MSRs to lie to the guest.

To reproduce the issue, build the host kernel with CONFIG_KVM_INTEL=y,
and the guest kernel with CONFIG_PARAVIRT=n and CONFIG_KVM_GUEST=n.
Start the guest with -cpu host.
Run perf record with --branch-any or --branch-filter in the guest to
trigger the LBR #GP.
Run perf stat on offcore events (e.g. LLC-loads/LLC-load-misses) in the
guest to trigger the offcore_rsp #GP.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>

V3: add MSR_LBR_TOS
V4: add MSR_LBR_SELECT and MSR_PEBS_LD_LAT_THRESHOLD
V5: set_msr should return 0 to lie to the guest
---
 arch/x86/kvm/pmu.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index cbecaa9..5fd5b44 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -331,6 +331,18 @@ bool kvm_pmu_msr(struct kvm_vcpu *vcpu, u32 msr)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		ret = pmu->version > 1;
 		break;
+	case MSR_OFFCORE_RSP_0:
+	case MSR_OFFCORE_RSP_1:
+	case MSR_LBR_SELECT:
+	case MSR_PEBS_LD_LAT_THRESHOLD:
+	case MSR_LBR_TOS:
+	/* At most 8-deep LBR for core and atom */
+	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 7:
+	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 7:
+	/* 16-deep LBR for core i3/i5/i7 series processors */
+	case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 15:
+	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 15:
+		return 1; /* to avoid crashes */
 	default:
 		ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)
 			|| get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0)
@@ -358,6 +370,19 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		*data = pmu->global_ovf_ctrl;
 		return 0;
+	case MSR_OFFCORE_RSP_0:
+	case MSR_OFFCORE_RSP_1:
+	case MSR_LBR_SELECT:
+	case MSR_PEBS_LD_LAT_THRESHOLD:
+	case MSR_LBR_TOS:
+	/* At most 8-deep LBR for core and atom */
+	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 7:
+	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 7:
+	/* 16-deep LBR for core i3/i5/i7 series processors */
+	case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 15:
+	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 15:
+		*data = 0;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, index, MSR_IA32_PERFCTR0)) ||
 				(pmc = get_fixed_pmc(pmu, index))) {
@@ -409,6 +434,19 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		}
 		break;
+	case MSR_OFFCORE_RSP_0:
+	case MSR_OFFCORE_RSP_1:
+	case MSR_LBR_SELECT:
+	case MSR_PEBS_LD_LAT_THRESHOLD:
+	case MSR_LBR_TOS:
+	/* At most 8-deep LBR for core and atom */
+	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 7:
+	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 7:
+	/* 16-deep LBR for core i3/i5/i7 series processors */
+	case MSR_LBR_NHM_FROM ... MSR_LBR_NHM_FROM + 15:
+	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 15:
+		/* dummy for now */
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, index, MSR_IA32_PERFCTR0)) ||
 				(pmc = get_fixed_pmc(pmu, index))) {
-- 
1.8.3.1


* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-10 10:59 [PATCH V5 1/2] perf ignore LBR and extra_regs kan.liang
  2014-07-10 10:59 ` [PATCH V5 2/2] kvm: ignore LBR and extra_reg kan.liang
@ 2014-07-14 10:53 ` Peter Zijlstra
  2014-07-14 14:28   ` Liang, Kan
  2014-07-14 11:08 ` Peter Zijlstra
  2014-07-14 11:55 ` Paolo Bonzini
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-14 10:53 UTC (permalink / raw)
  To: kan.liang; +Cc: andi, linux-kernel, kvm

On Thu, Jul 10, 2014 at 03:59:43AM -0700, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> x86, perf: Protect LBR and extra_regs against KVM lying
> 
> With -cpu host, KVM reports LBR and extra_regs support, if the host has support.
> When the guest perf driver tries to access LBR or extra_regs MSR,
> it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
> So check the related MSRs access right once at initialization time to avoid the error access at runtime.
> 
> For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y (for host kernel).
> And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
> Start the guest with -cpu host.
> Run perf record with --branch-any or --branch-filter in guest to trigger LBR #GP.
> Run perf stat offcore events (E.g. LLC-loads/LLC-load-misses ...) in guest to trigger offcore_rsp #GP

This is still not properly wrapped at 78 chars.

> Signed-off-by: Kan Liang <kan.liang@intel.com>
> 
> V2: Move the check code to initialization time.
> V3: Add flag for each extra register.
>     Check all LBR MSRs at initialization time.
> V4: Remove lbr_msr_access. For LBR msr, simply set lbr_nr to 0 if check_msr failed.
>     Disable all extra msrs in creation places if check_msr failed.
> V5: Fix check_msr broken
>     Don't check any more MSRs after the first fail
>     Return error when checking fail to stop creating the event
>     Remove the checking code path which never get

These things should go below the --- so they get thrown away when
applying the patch, its of no relevance once applied.

> ---
>  arch/x86/kernel/cpu/perf_event.c       |  3 +++
>  arch/x86/kernel/cpu/perf_event.h       | 45 ++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/perf_event_intel.c | 38 +++++++++++++++++++++++++++-
>  3 files changed, 85 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 2bdfbff..a7c5e4b 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -118,6 +118,9 @@ static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
>  			continue;
>  		if (event->attr.config1 & ~er->valid_mask)
>  			return -EINVAL;
> +		/* Check if the extra msrs can be safely accessed*/
> +		if (!x86_pmu.extra_msr_access[er->idx])
> +			return -EFAULT;

This is not a correct usage of -EFAULT. Event creation did not fail
because we took a fault dereferencing a user provided pointer. Possibly
ENXIO is appropriate.

>  		reg->idx = er->idx;
>  		reg->config = event->attr.config1;
> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
> index 3b2f9bd..992c678 100644
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -464,6 +464,12 @@ struct x86_pmu {
>  	 */
>  	struct extra_reg *extra_regs;
>  	unsigned int er_flags;
> +	/*
> +	 * EXTRA REG MSR can be accessed
> +	 * The extra registers are completely unrelated to each other.
> +	 * So it needs a flag for each extra register.
> +	 */
> +	bool		extra_msr_access[EXTRA_REG_MAX];

So why not in struct extra_reg again? You didn't give a straight answer
there.

> +/*
> + * Under certain circumstances, access certain MSR may cause #GP.
> + * The function tests if the input MSR can be safely accessed.
> + */
> +static inline bool check_msr(unsigned long msr)
> +{

This reads like a generic function;

> +	u64 val_old, val_new, val_tmp;
> +
> +	/*
> +	 * Read the current value, change it and read it back to see if it
> +	 * matches, this is needed to detect certain hardware emulators
> +	 * (qemu/kvm) that don't trap on the MSR access and always return 0s.
> +	 */
> +	if (rdmsrl_safe(msr, &val_old))
> +		goto msr_fail;
> +	/*
> +	 * Only chagne it slightly,
> +	 * since the higher bits of some MSRs cannot be updated by wrmsrl.
> +	 * E.g. MSR_LBR_TOS
> +	 */
> +	val_tmp = val_old ^ 0x3UL;

but this is not generally true; not all MSRs can write the 2 LSB, can
they? One option would be to extend the function with a u64 mask.

> +	if (wrmsrl_safe(msr, val_tmp) ||
> +		rdmsrl_safe(msr, &val_new))
> +		goto msr_fail;
> +
> +	if (val_new != val_tmp)
> +		goto msr_fail;
> +
> +	/* Here it's sure that the MSR can be safely accessed.
> +	 * Restore the old value and return.
> +	 */
> +	wrmsrl(msr, val_old);
> +
> +	return true;
> +
> +msr_fail:
> +	return false;
> +}

Also, by now this function is far too large to be inline and in a
header.

> +	/*
> +	 * Access LBR MSR may cause #GP under certain circumstances.
> +	 * E.g. KVM doesn't support LBR MSR
> +	 * Check all LBT MSR here.
> +	 * Disable LBR access if any LBR MSRs can not be accessed.
> +	 */
> +	if (x86_pmu.lbr_nr) {
> +		if (check_msr(x86_pmu.lbr_tos)) {
> +			for (i = 0; i < x86_pmu.lbr_nr; i++) {
> +				if (!(check_msr(x86_pmu.lbr_from + i) &&
> +				      check_msr(x86_pmu.lbr_to + i))) {
> +					x86_pmu.lbr_nr = 0;
> +					break;
> +				}
> +			}
> +		} else
> +			x86_pmu.lbr_nr = 0;

That's needlessly complex and indented.

	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos)
		x86_pmu.lbr_nr = 0;

	for (i = 0; i < x86_pmu.lbr_nr; i++) {
		if (!(check_msr(x86_pmu.lbr_from + i) &&
		      check_msr(x86_pmu.lbr_to   + i)))
			x86_pmu.lbr_nr = 0;
	}

You don't need to wrap the for loop in a lbr_nr test and you don't need
a break to terminate. Once you set lbr_nr = 0, the for loop will
terminate on its own. If it was already 0 it would've never started.



* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-10 10:59 [PATCH V5 1/2] perf ignore LBR and extra_regs kan.liang
  2014-07-10 10:59 ` [PATCH V5 2/2] kvm: ignore LBR and extra_reg kan.liang
  2014-07-14 10:53 ` [PATCH V5 1/2] perf ignore LBR and extra_regs Peter Zijlstra
@ 2014-07-14 11:08 ` Peter Zijlstra
  2014-07-14 11:55 ` Paolo Bonzini
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-14 11:08 UTC (permalink / raw)
  To: kan.liang; +Cc: andi, linux-kernel, kvm

On Thu, Jul 10, 2014 at 03:59:43AM -0700, kan.liang@intel.com wrote:
> +/*
> + * Under certain circumstances, access certain MSR may cause #GP.
> + * The function tests if the input MSR can be safely accessed.
> + */
> +static inline bool check_msr(unsigned long msr)
> +{
> +	u64 val_old, val_new, val_tmp;
> +
> +	/*
> +	 * Read the current value, change it and read it back to see if it
> +	 * matches, this is needed to detect certain hardware emulators
> +	 * (qemu/kvm) that don't trap on the MSR access and always return 0s.
> +	 */
> +	if (rdmsrl_safe(msr, &val_old))
> +		goto msr_fail;
> +	/*
> +	 * Only chagne it slightly,
> +	 * since the higher bits of some MSRs cannot be updated by wrmsrl.
> +	 * E.g. MSR_LBR_TOS
> +	 */
> +	val_tmp = val_old ^ 0x3UL;
> +	if (wrmsrl_safe(msr, val_tmp) ||
> +		rdmsrl_safe(msr, &val_new))
> +		goto msr_fail;
> +
> +	if (val_new != val_tmp)
> +		goto msr_fail;
> +
> +	/* Here it's sure that the MSR can be safely accessed.
> +	 * Restore the old value and return.
> +	 */
> +	wrmsrl(msr, val_old);
> +
> +	return true;
> +
> +msr_fail:
> +	return false;
> +}

I don't think we need the msr_fail thing, there's no state to clean up,
you can return false at all places you now have goto.


* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-10 10:59 [PATCH V5 1/2] perf ignore LBR and extra_regs kan.liang
                   ` (2 preceding siblings ...)
  2014-07-14 11:08 ` Peter Zijlstra
@ 2014-07-14 11:55 ` Paolo Bonzini
  2014-07-14 12:09   ` Peter Zijlstra
  3 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2014-07-14 11:55 UTC (permalink / raw)
  To: kan.liang, peterz; +Cc: andi, linux-kernel, kvm

Il 10/07/2014 12:59, kan.liang@intel.com ha scritto:
> From: Kan Liang <kan.liang@intel.com>
>
> x86, perf: Protect LBR and extra_regs against KVM lying
>
> With -cpu host, KVM reports LBR and extra_regs support, if the host has support.
> When the guest perf driver tries to access LBR or extra_regs MSR,
> it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
> So check the related MSRs access right once at initialization time to avoid the error access at runtime.
>
> For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y (for host kernel).
> And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).

I'm not sure this is a useful patch.

This is #GP'ing just because of a limitation in the PMU; just compile 
the kernel with CONFIG_PARAVIRT, or split the "rdmsr is always 
rdmsr_safe" behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.

In fact there's no reason why LBR cannot be virtualized (though it does 
need support from the processor), and it may even be possible to support 
OFFCORE_RSP_X in the KVM virtual PMU.

Paolo


* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 11:55 ` Paolo Bonzini
@ 2014-07-14 12:09   ` Peter Zijlstra
  2014-07-14 12:40     ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-14 12:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kan.liang, andi, linux-kernel, kvm

On Mon, Jul 14, 2014 at 01:55:03PM +0200, Paolo Bonzini wrote:
> Il 10/07/2014 12:59, kan.liang@intel.com ha scritto:
> >From: Kan Liang <kan.liang@intel.com>
> >
> >x86, perf: Protect LBR and extra_regs against KVM lying
> >
> >With -cpu host, KVM reports LBR and extra_regs support, if the host has support.
> >When the guest perf driver tries to access LBR or extra_regs MSR,
> >it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
> >So check the related MSRs access right once at initialization time to avoid the error access at runtime.
> >
> >For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y (for host kernel).
> >And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
> 
> I'm not sure this is a useful patch.
> 
> This is #GP'ing just because of a limitation in the PMU; just compile the
> kernel with CONFIG_PARAVIRT

How's that going to help? If you run kvm -cpu host, the VM is lying through
its teeth and the kernel is going to assume all those MSRs are present;
PARAVIRT isn't going to help with this.

> , or split the "rdmsr is always rdmsr_safe"
> behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.

That's not useful either, because none of these code-paths are going to
check the return value.

> In fact there's no reason why LBR cannot be virtualized (though it does need
> support from the processor), and it may even be possible to support
> OFFCORE_RSP_X in the KVM virtual PMU.

But its not, so something needs to be done, right?


* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 12:09   ` Peter Zijlstra
@ 2014-07-14 12:40     ` Paolo Bonzini
  2014-07-14 12:48       ` Peter Zijlstra
  2014-07-14 13:36       ` Liang, Kan
  0 siblings, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2014-07-14 12:40 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: kan.liang, andi, linux-kernel, kvm

Il 14/07/2014 14:09, Peter Zijlstra ha scritto:
> On Mon, Jul 14, 2014 at 01:55:03PM +0200, Paolo Bonzini wrote:
>> Il 10/07/2014 12:59, kan.liang@intel.com ha scritto:
>>> From: Kan Liang <kan.liang@intel.com>
>>>
>>> x86, perf: Protect LBR and extra_regs against KVM lying
>>>
>>> With -cpu host, KVM reports LBR and extra_regs support, if the host has support.
>>> When the guest perf driver tries to access LBR or extra_regs MSR,
>>> it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
>>> So check the related MSRs access right once at initialization time to avoid the error access at runtime.
>>>
>>> For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y (for host kernel).
>>> And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
>>
>> I'm not sure this is a useful patch.
>>
>> This is #GP'ing just because of a limitation in the PMU; just compile the
>> kernel with CONFIG_PARAVIRT
>
> How's that going to help? If you run kvm -host the VM is lying through
> its teeth is the kernel is going to assume all those MSRs present,
> PARAVIRT isn't going to help with this.
>
>> , or split the "rdmsr is always rdmsr_safe"
>> behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.
>
> That's not useful either, because non of these code-paths are going to
> check the return value.

Hmmm, I thought rdmsr_safe was going to return zero, but it just returns 
whatever happened to be in edx:eax, which maybe should also be fixed.

Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or 
just no events reported?

>> In fact there's no reason why LBR cannot be virtualized (though it does need
>> support from the processor), and it may even be possible to support
>> OFFCORE_RSP_X in the KVM virtual PMU.
>
> But its not, so something needs to be done, right?

A first thing that could be done, is to have a way for the kernel to 
detect absence of LBR, for example an all-1s setting of the LBR format 
field of IA32_PERF_CAPABILITIES.  If Intel can tell us "all 1s will 
never be used", we can have KVM set the field that way.  The kernel then 
should disable LBR.

Paolo


* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 12:40     ` Paolo Bonzini
@ 2014-07-14 12:48       ` Peter Zijlstra
  2014-07-14 13:47         ` Paolo Bonzini
  2014-07-14 13:36       ` Liang, Kan
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-14 12:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kan.liang, andi, linux-kernel, kvm

On Mon, Jul 14, 2014 at 02:40:33PM +0200, Paolo Bonzini wrote:
> Hmmm, I thought rdmsr_safe was going to return zero, but it just returns
> whatever happened to be in edx:eax which maybe should also be fixed.
> 
> Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or just no
> events reported?

Either way; its not functioning according to 'spec'. Therefore we should
'disable' things.

> >>In fact there's no reason why LBR cannot be virtualized (though it does need
> >>support from the processor), and it may even be possible to support
> >>OFFCORE_RSP_X in the KVM virtual PMU.
> >
> >But its not, so something needs to be done, right?
> 
> A first thing that could be done, is to have a way for the kernel to detect
> absence of LBR

Which is exactly what this patch does, no?

> , for example an all-1s setting of the LBR format field of
> IA32_PERF_CAPABILITIES.  If Intel can tell us "all 1s will never be used",
> we can have KVM set the field that way.  The kernel then should disable LBR.

So what's wrong with testing if the MSRs that provide LBR actually work
or not? That doesn't require any magic co-operation of the host/vm
machinery and is therefore far more robust.


* RE: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 12:40     ` Paolo Bonzini
  2014-07-14 12:48       ` Peter Zijlstra
@ 2014-07-14 13:36       ` Liang, Kan
  2014-07-14 13:40         ` Paolo Bonzini
  1 sibling, 1 reply; 14+ messages in thread
From: Liang, Kan @ 2014-07-14 13:36 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Zijlstra
  Cc: andi@firstfloor.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org



> >>> For reproducing the issue, please build the kernel with
> CONFIG_KVM_INTEL = y (for host kernel).
> >>> And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest
> kernel).
> >>
> >> I'm not sure this is a useful patch.
> >>
> >> This is #GP'ing just because of a limitation in the PMU; just compile
> >> the kernel with CONFIG_PARAVIRT
> >
> > How's that going to help? If you run kvm -host the VM is lying through
> > its teeth is the kernel is going to assume all those MSRs present,
> > PARAVIRT isn't going to help with this.
> >
> >> , or split the "rdmsr is always rdmsr_safe"
> >> behavior out of CONFIG_PARAVIRT and into a new Kconfig symbol.
> >
> > That's not useful either, because non of these code-paths are going to
> > check the return value.
> 
> Hmmm, I thought rdmsr_safe was going to return zero, but it just returns
> whatever happened to be in edx:eax which maybe should also be fixed.
> 
> Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or
> just no events reported?
> 

Guest rdmsr/wrmsr will eventually call rdmsr_safe/wrmsr_safe, which handle the #GP. So there is no error reported in the guest.




* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 13:36       ` Liang, Kan
@ 2014-07-14 13:40         ` Paolo Bonzini
  2014-07-14 13:44           ` Liang, Kan
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2014-07-14 13:40 UTC (permalink / raw)
  To: Liang, Kan, Peter Zijlstra
  Cc: andi@firstfloor.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org

Il 14/07/2014 15:36, Liang, Kan ha scritto:
>> > Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage or
>> > just no events reported?
>> >
> Guest rdmsr/wrmsr will eventually call rdmsr_safe/wrmsr_safe. They will handle the #GP. So there is no error report in guest.

Yeah, but what's the effect on the output of perf?

Paolo



* RE: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 13:40         ` Paolo Bonzini
@ 2014-07-14 13:44           ` Liang, Kan
  0 siblings, 0 replies; 14+ messages in thread
From: Liang, Kan @ 2014-07-14 13:44 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Zijlstra
  Cc: andi@firstfloor.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, July 14, 2014 9:40 AM
> To: Liang, Kan; Peter Zijlstra
> Cc: andi@firstfloor.org; linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
> 
> Il 14/07/2014 15:36, Liang, Kan ha scritto:
> >> > Kan Liang, what happens if CONFIG_PARAVIRT=y?  Do you get garbage
> >> > or just no events reported?
> >> >
> > Guest rdmsr/wrmsr will eventually call rdmsr_safe/wrmsr_safe. They will
> handle the #GP. So there is no error report in guest.
> 
> Yeah, but what's the effect on the output of perf?

They can still be counted/sampled, but the results are garbage.

Kan

> 
> Paolo



* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 12:48       ` Peter Zijlstra
@ 2014-07-14 13:47         ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2014-07-14 13:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: kan.liang, andi, linux-kernel, kvm

Il 14/07/2014 14:48, Peter Zijlstra ha scritto:
>>>> In fact there's no reason why LBR cannot be virtualized (though it does need
>>>> support from the processor), and it may even be possible to support
>>>> OFFCORE_RSP_X in the KVM virtual PMU.
>>>
>>> But its not, so something needs to be done, right?
>>
>> A first thing that could be done, is to have a way for the kernel to detect
>> absence of LBR
>
> Which is exactly what this patch does, no?

A documented (and recommended) way for the kernel to detect it is 
superior though...

>> , for example an all-1s setting of the LBR format field of
>> IA32_PERF_CAPABILITIES.  If Intel can tell us "all 1s will never be used",
>> we can have KVM set the field that way.  The kernel then should disable LBR.
>
> So what's wrong with testing if the MSRs that provide LBR actually work
> or not? That doesn't require any magic co-operation of the host/vm
> machinery and is therefore far more robust.

... because the difference is that whenever we hack the kernel, Windows 
and vTune will remain broken forever.  Whenever we get Intel to make the 
hack part of the spec, there is some hope that Windows and vTune will 
eventually get fixed.

The hack certainly works, I'm not disputing that.

Paolo



* RE: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 10:53 ` [PATCH V5 1/2] perf ignore LBR and extra_regs Peter Zijlstra
@ 2014-07-14 14:28   ` Liang, Kan
  2014-07-14 16:28     ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Liang, Kan @ 2014-07-14 14:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: andi@firstfloor.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org



> > diff --git a/arch/x86/kernel/cpu/perf_event.h
> > b/arch/x86/kernel/cpu/perf_event.h
> > index 3b2f9bd..992c678 100644
> > --- a/arch/x86/kernel/cpu/perf_event.h
> > +++ b/arch/x86/kernel/cpu/perf_event.h
> > @@ -464,6 +464,12 @@ struct x86_pmu {
> >  	 */
> >  	struct extra_reg *extra_regs;
> >  	unsigned int er_flags;
> > +	/*
> > +	 * EXTRA REG MSR can be accessed
> > +	 * The extra registers are completely unrelated to each other.
> > +	 * So it needs a flag for each extra register.
> > +	 */
> > +	bool		extra_msr_access[EXTRA_REG_MAX];
> 
> So why not in struct extra_reg again? You didn't give a straight answer there.

I think I did, in the earlier email.
You mentioned that there are still (only) 4 empty bytes at the tail of extra_reg itself.
However, extra_reg_type may be extended in the near future,
so those spare bytes are not a good reason to move the flag into extra_reg.

Furthermore, if we move extra_msr_access into extra_reg,
we would have to modify all the related macros (e.g. EVENT_EXTRA_REG) to initialize the new member.
That could be a big change.

On the other hand, in the x86_pmu structure the extra_regs related members
are already defined under the comment "Extra registers for events",
and the holes there are enough for current usage and future extension.

So I think x86_pmu is a good place to store the availability of the registers.

         /* --- cacheline 6 boundary (384 bytes) --- */
         bool                       lbr_double_abort;     /*   384     1 */
 
         /* XXX 7 bytes hole, try to pack */
 
         struct extra_reg *         extra_regs;           /*   392     8 */
         unsigned int               er_flags;             /*   400     4 */
 
         /* XXX 4 bytes hole, try to pack */
 
         struct perf_guest_switch_msr * (*guest_get_msrs)(int *); /*   408     8 */
 
         /* size: 416, cachelines: 7, members: 64 */
         /* sum members: 391, holes: 6, sum holes: 25 */
         /* bit holes: 1, sum bit holes: 27 bits */
         /* last cacheline: 32 bytes */

> 
> > +/*
> > + * Under certain circumstances, accessing certain MSRs may cause a #GP.
> > + * This function tests whether the given MSR can be safely accessed.
> > + */
> > +static inline bool check_msr(unsigned long msr) {
> 
> This reads like a generic function;
> 
> > +	u64 val_old, val_new, val_tmp;
> > +
> > +	/*
> > +	 * Read the current value, change it and read it back to see if it
> > +	 * matches, this is needed to detect certain hardware emulators
> > +	 * (qemu/kvm) that don't trap on the MSR access and always return
> 0s.
> > +	 */
> > +	if (rdmsrl_safe(msr, &val_old))
> > +		goto msr_fail;
> > +	/*
> > +	 * Only change it slightly,
> > +	 * since the higher bits of some MSRs cannot be updated by wrmsrl.
> > +	 * E.g. MSR_LBR_TOS
> > +	 */
> > +	val_tmp = val_old ^ 0x3UL;
> 
> but this is not generally true; not all MSRs can write the 2 LSB, can they? One
> option would be to extend the function with a u64 mask.

Right, the function should be usable to check any MSR, not just the MSRs I tested.
I will pass a mask as a parameter to the function.

> 
> > +	if (wrmsrl_safe(msr, val_tmp) ||
> > +		rdmsrl_safe(msr, &val_new))
> > +		goto msr_fail;
> > +
> > +	if (val_new != val_tmp)
> > +		goto msr_fail;
> > +
> > +	/*
> > +	 * Here the MSR is known to be safely accessible.
> > +	 * Restore the old value and return.
> > +	 */
> > +	wrmsrl(msr, val_old);
> > +
> > +	return true;
> > +
> > +msr_fail:
> > +	return false;
> > +}
> 
> Also, by now this function is far too large to be inline and in a header.

OK. I will move it to perf_event_intel.c as a static function.



* Re: [PATCH V5 1/2] perf ignore LBR and extra_regs
  2014-07-14 14:28   ` Liang, Kan
@ 2014-07-14 16:28     ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2014-07-14 16:28 UTC (permalink / raw)
  To: Liang, Kan
  Cc: andi@firstfloor.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org



so once more; and then I'm going to route your emails to /dev/null, wrap
text at 78 chars.

On Mon, Jul 14, 2014 at 02:28:36PM +0000, Liang, Kan wrote:
> > > +++ b/arch/x86/kernel/cpu/perf_event.h
> > > @@ -464,6 +464,12 @@ struct x86_pmu {
> > >  	 */
> > >  	struct extra_reg *extra_regs;
> > >  	unsigned int er_flags;
> > > +	/*
> > > +	 * EXTRA REG MSR can be accessed
> > > +	 * The extra registers are completely unrelated to each other.
> > > +	 * So it needs a flag for each extra register.
> > > +	 */
> > > +	bool		extra_msr_access[EXTRA_REG_MAX];
> > 
> > So why not in struct extra_reg again? You didn't give a straight answer there.
> 
> I think I did in the email.
> You mentioned that there's still (only) 4 empty bytes at the tail of
> extra_reg itself.  However, the extra_reg_type may be extended in the
> near future.  So that may not be a reason to move to extra_reg.

Well, you can always grow. Also be explicit, 'may be' is an empty
statement.

> Furthermore, if we move extra_msr_access to extra_reg, I guess we have
> to modify all the related macros (e.g. EVENT_EXTRA_REG) to initialize
> the new items.  That could be a big change.

Nah, trivial stuff :-)

> On the other hand, in the x86_pmu structure, there are extra_regs related
> items defined under the comments "Extra registers for events".  And
> the bit holes are enough for current usage and future extension.
> 
> So I guess  x86_pmu should be a good place to store the availability
> of the reg.

It just doesn't make sense to me to have multiple arrays of the same
thing.




end of thread, other threads:[~2014-07-14 16:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-10 10:59 [PATCH V5 1/2] perf ignore LBR and extra_regs kan.liang
2014-07-10 10:59 ` [PATCH V5 2/2] kvm: ignore LBR and extra_reg kan.liang
2014-07-14 10:53 ` [PATCH V5 1/2] perf ignore LBR and extra_regs Peter Zijlstra
2014-07-14 14:28   ` Liang, Kan
2014-07-14 16:28     ` Peter Zijlstra
2014-07-14 11:08 ` Peter Zijlstra
2014-07-14 11:55 ` Paolo Bonzini
2014-07-14 12:09   ` Peter Zijlstra
2014-07-14 12:40     ` Paolo Bonzini
2014-07-14 12:48       ` Peter Zijlstra
2014-07-14 13:47         ` Paolo Bonzini
2014-07-14 13:36       ` Liang, Kan
2014-07-14 13:40         ` Paolo Bonzini
2014-07-14 13:44           ` Liang, Kan
