From: Gleb Natapov <gleb@redhat.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 5/5] fix kvm's use of __pa() on percpu areas
Date: Mon, 21 Jan 2013 21:02:07 +0200
Message-ID: <20130121190207.GB25818@redhat.com>
In-Reply-To: <08cba1bf-6476-4fad-8d29-e380ec7127ba@email.android.com>

On Mon, Jan 21, 2013 at 12:38:06PM -0600, H. Peter Anvin wrote:
> Final question: are any of these done in frequent paths?  (I believe no, but...)
> 
No, only during guest boot.

> Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> 
> >
> >In short, it is illegal to call __pa() on an address holding
> >a percpu variable.  The times when this actually matters are
> >pretty obscure (certain 32-bit NUMA systems), but it _does_
> >happen.  It is important to keep KVM guests working on these
> >systems because the real hardware is getting harder and
> >harder to find.
> >
> >This bug manifested first by me seeing a plain hang at boot
> >after this message:
> >
> >	CPU 0 irqstacks, hard=f3018000 soft=f301a000
> >
> >or, sometimes, it would actually make it out to the console:
> >
> >[    0.000000] BUG: unable to handle kernel paging request at ffffffff
> >
> >I eventually traced it down to the KVM async pagefault code.
> >This can be worked around by disabling that code either at
> >compile-time, or on the kernel command-line.
> >
> >The kvm async pagefault code was injecting page faults
> >into the guest which the guest misinterpreted because its
> >"reason" was not being properly sent from the host.
> >
> >The guest passes a physical address of a per-cpu async page
> >fault structure via an MSR to the host.  Since __pa() is
> >broken on percpu data, the physical address it sent was
> >basically bogus and the host went scribbling on random data.
> >The guest never saw the real reason for the page fault (it
> >was injected by the host), assumed that the kernel had taken
> >a _real_ page fault, and panic()'d.  The behavior varied,
> >though, depending on what got corrupted by the bad write.
> >
> >Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> >Acked-by: Rik van Riel <riel@redhat.com>
> >---
> >
> > linux-2.6.git-dave/arch/x86/kernel/kvm.c      |    9 +++++----
> > linux-2.6.git-dave/arch/x86/kernel/kvmclock.c |    4 ++--
> > 2 files changed, 7 insertions(+), 6 deletions(-)
> >
> >diff -puN arch/x86/kernel/kvm.c~fix-kvm-__pa-use-on-percpu-areas arch/x86/kernel/kvm.c
> >--- linux-2.6.git/arch/x86/kernel/kvm.c~fix-kvm-__pa-use-on-percpu-areas	2013-01-17 10:22:26.914436992 -0800
> >+++ linux-2.6.git-dave/arch/x86/kernel/kvm.c	2013-01-17 10:22:26.922437062 -0800
> >@@ -289,9 +289,9 @@ static void kvm_register_steal_time(void
> > 
> > 	memset(st, 0, sizeof(*st));
> > 
> >-	wrmsrl(MSR_KVM_STEAL_TIME, (__pa(st) | KVM_MSR_ENABLED));
> >+	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
> > 	printk(KERN_INFO "kvm-stealtime: cpu %d, msr %lx\n",
> >-		cpu, __pa(st));
> >+		cpu, slow_virt_to_phys(st));
> > }
> > 
> > static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> >@@ -316,7 +316,7 @@ void __cpuinit kvm_guest_cpu_init(void)
> > 		return;
> > 
> > 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
> >-		u64 pa = __pa(&__get_cpu_var(apf_reason));
> >+		u64 pa = slow_virt_to_phys(&__get_cpu_var(apf_reason));
> > 
> > #ifdef CONFIG_PREEMPT
> > 		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
> >@@ -332,7 +332,8 @@ void __cpuinit kvm_guest_cpu_init(void)
> > 		/* Size alignment is implied but just to make it explicit. */
> > 		BUILD_BUG_ON(__alignof__(kvm_apic_eoi) < 4);
> > 		__get_cpu_var(kvm_apic_eoi) = 0;
> >-		pa = __pa(&__get_cpu_var(kvm_apic_eoi)) | KVM_MSR_ENABLED;
> >+		pa = slow_virt_to_phys(&__get_cpu_var(kvm_apic_eoi))
> >+			| KVM_MSR_ENABLED;
> > 		wrmsrl(MSR_KVM_PV_EOI_EN, pa);
> > 	}
> > 
> >diff -puN arch/x86/kernel/kvmclock.c~fix-kvm-__pa-use-on-percpu-areas arch/x86/kernel/kvmclock.c
> >--- linux-2.6.git/arch/x86/kernel/kvmclock.c~fix-kvm-__pa-use-on-percpu-areas	2013-01-17 10:22:26.918437028 -0800
> >+++ linux-2.6.git-dave/arch/x86/kernel/kvmclock.c	2013-01-17 10:22:26.922437062 -0800
> >@@ -162,8 +162,8 @@ int kvm_register_clock(char *txt)
> > 	int low, high, ret;
> > 	struct pvclock_vcpu_time_info *src = &hv_clock[cpu].pvti;
> > 
> >-	low = (int)__pa(src) | 1;
> >-	high = ((u64)__pa(src) >> 32);
> >+	low = (int)slow_virt_to_phys(src) | 1;
> >+	high = ((u64)slow_virt_to_phys(src) >> 32);
> > 	ret = native_write_msr_safe(msr_kvm_system_time, low, high);
> > 	printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> > 	       cpu, high, low, txt);
> >_
> 
> -- 
> Sent from my mobile phone. Please excuse brevity and lack of formatting.

--
			Gleb.

Thread overview: 28+ messages
2013-01-21 17:52 [PATCH 0/5] fix illegal use of __pa() in KVM code Dave Hansen
2013-01-21 17:52 ` Dave Hansen
2013-01-21 17:52 ` [PATCH 1/5] make DEBUG_VIRTUAL work earlier in boot Dave Hansen
2013-01-21 17:52   ` Dave Hansen
2013-01-21 17:52 ` [PATCH 2/5] pagetable level size/shift/mask helpers Dave Hansen
2013-01-21 17:52   ` Dave Hansen
2013-01-21 17:52 ` [PATCH 3/5] use new pagetable helpers in try_preserve_large_page() Dave Hansen
2013-01-21 17:52   ` Dave Hansen
2013-01-21 17:52 ` [PATCH 4/5] create slow_virt_to_phys() Dave Hansen
2013-01-21 17:52   ` Dave Hansen
2013-01-21 18:08   ` H. Peter Anvin
2013-01-21 18:08     ` H. Peter Anvin
2013-01-21 18:18     ` Dave Hansen
2013-01-21 18:18       ` Dave Hansen
2013-01-21 17:52 ` [PATCH 5/5] fix kvm's use of __pa() on percpu areas Dave Hansen
2013-01-21 17:52   ` Dave Hansen
2013-01-21 18:38   ` H. Peter Anvin
2013-01-21 18:38     ` H. Peter Anvin
2013-01-21 18:59     ` Dave Hansen
2013-01-21 18:59       ` Dave Hansen
2013-01-21 19:22       ` H. Peter Anvin
2013-01-21 19:22         ` H. Peter Anvin
2013-01-21 19:02     ` Gleb Natapov [this message]
2013-01-21 19:02       ` Gleb Natapov
  -- strict thread matches above, loose matches on Subject: below --
2013-01-22 21:24 [PATCH 0/5] [v3] fix illegal use of __pa() in KVM code Dave Hansen
2013-01-22 21:24 ` [PATCH 5/5] fix kvm's use of __pa() on percpu areas Dave Hansen
2013-01-22 21:24   ` Dave Hansen
2013-01-23  0:08   ` Marcelo Tosatti
2013-01-23  0:08     ` Marcelo Tosatti
