* [PATCH] KVM: SVM: fix random segfaults with NPT enabled
@ 2008-08-27 12:18 Joerg Rodel
2008-08-27 13:11 ` Avi Kivity
0 siblings, 1 reply; 19+ messages in thread
From: Joerg Rodel @ 2008-08-27 12:18 UTC (permalink / raw)
To: avi; +Cc: kvm, Joerg Roedel, stable, Alexander Graf
From: Joerg Roedel <joerg.roedel@amd.com>
This patch introduces a guest TLB flush on every NPF exit in KVM. This fixes
random segfaults and #UD exceptions in the guest seen under some workloads
(e.g. long-running compile workloads or tbench). A kernbench run with and
without the fix showed a slowdown of less than 0.5%.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/x86/kvm/svm.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2d5aed4..980f140 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -59,6 +59,7 @@ static int npt = 1;
 module_param(npt, int, S_IRUGO);
 
 static void kvm_reput_irq(struct vcpu_svm *svm);
+static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 
 static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
@@ -1004,10 +1005,17 @@ static int pf_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 		KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code,
 			    (u32)fault_address, (u32)(fault_address >> 32),
 			    handler);
-	else
+	else {
 		KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code,
 			    (u32)fault_address, (u32)(fault_address >> 32),
 			    handler);
+		/*
+		 * FIXME: This shouldn't be necessary here, but there is a flush
+		 * missing in the MMU code. Until we find this bug, flush the
+		 * complete TLB here on an NPF
+		 */
+		svm_flush_tlb(&svm->vcpu);
+	}
 
 	if (event_injection)
 		kvm_mmu_unprotect_page_virt(&svm->vcpu, fault_address);
--
1.5.3.7
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 12:18 [PATCH] KVM: SVM: fix random segfaults with NPT enabled Joerg Rodel
@ 2008-08-27 13:11 ` Avi Kivity
2008-08-27 13:53 ` Avi Kivity
2008-08-27 13:53 ` Joerg Rodel
0 siblings, 2 replies; 19+ messages in thread
From: Avi Kivity @ 2008-08-27 13:11 UTC (permalink / raw)
To: Joerg Rodel; +Cc: kvm, stable, Alexander Graf
Joerg Rodel wrote:
> From: Joerg Roedel <joerg.roedel@amd.com>
>
> This patch introduces a guest TLB flush on every NPF exit in KVM. This fixes
> random segfaults and #UD exceptions in the guest seen under some workloads
> (e.g. long running compile workloads or tbench). A kernbench run with and
> without that fix showed that it has a slowdown lower than 0.5%
>
>
hm. tbench doesn't allocate memory, so there shouldn't be any npt
faults. I don't see how this can make a difference.
It can only change something if X is started and we're tracking writes
to the framebuffer. Is this the case?
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 13:11 ` Avi Kivity
@ 2008-08-27 13:53 ` Avi Kivity
2008-08-27 13:57 ` Joerg Rodel
2008-08-27 13:53 ` Joerg Rodel
1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-08-27 13:53 UTC (permalink / raw)
To: Joerg Rodel; +Cc: kvm, stable, Alexander Graf
Avi Kivity wrote:
> Joerg Rodel wrote:
>> From: Joerg Roedel <joerg.roedel@amd.com>
>>
>> This patch introduces a guest TLB flush on every NPF exit in KVM.
>> This fixes
>> random segfaults and #UD exceptions in the guest seen under some
>> workloads
>> (e.g. long running compile workloads or tbench). A kernbench run with
>> and
>> without that fix showed that it has a slowdown lower than 0.5%
>>
>>
>
> hm. tbench doesn't allocate memory, so there shouldn't be any npt
> faults. I don't see how this can make a difference.
>
> It can only change something if X is started and we're tracking writes
> to the framebuffer. Is this the case?
>
I fixed a missing flush in this area. Does it help? (I doubt it). Can
you post instructions on how to reproduce?
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 13:53 ` Avi Kivity
@ 2008-08-27 13:57 ` Joerg Rodel
2008-08-27 15:22 ` Avi Kivity
0 siblings, 1 reply; 19+ messages in thread
From: Joerg Rodel @ 2008-08-27 13:57 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, stable, Alexander Graf
On Wed, Aug 27, 2008 at 04:53:26PM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
> >Joerg Rodel wrote:
> >>From: Joerg Roedel <joerg.roedel@amd.com>
> >>
> >>This patch introduces a guest TLB flush on every NPF exit in KVM. This fixes
> >>random segfaults and #UD exceptions in the guest seen under some workloads
> >>(e.g. long running compile workloads or tbench). A kernbench run with and
> >>without that fix showed that it has a slowdown lower than 0.5%
> >>
> >>
> >
> >hm. tbench doesn't allocate memory, so there shouldn't be any npt faults. I don't
> >see how this can make a difference.
> >
> >It can only change something if X is started and we're tracking writes to the
> >framebuffer. Is this the case?
> >
>
> I fixed a missing flush in this area. Does it help? (I doubt it). Can you post
> instructions on how to reproduce?
I will test it. Is the fix in your latest kernel.org tree? Reproduce it
with a KVM guest and start tbench in it with around 100 clients
configured. The tbench-process will crash when the bug is hit.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 13:57 ` Joerg Rodel
@ 2008-08-27 15:22 ` Avi Kivity
2008-08-27 15:35 ` Joerg Roedel
0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-08-27 15:22 UTC (permalink / raw)
To: Joerg Rodel; +Cc: kvm, stable, Alexander Graf
Joerg Rodel wrote:
> I will test it. Is the fix in your latest kernel.org tree?
It is now. It doesn't fix the problem.
> Reproduce it
> with a KVM guest and start tbench in it with around 100 clients
> configured. The tbench-process will crash when the bug is hit.
>
Does it reproduce with uniprocessor guests?
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 15:22 ` Avi Kivity
@ 2008-08-27 15:35 ` Joerg Roedel
2008-08-27 15:50 ` Avi Kivity
0 siblings, 1 reply; 19+ messages in thread
From: Joerg Roedel @ 2008-08-27 15:35 UTC (permalink / raw)
To: Avi Kivity; +Cc: Joerg Rodel, kvm, stable, Alexander Graf
On Wed, Aug 27, 2008 at 06:22:14PM +0300, Avi Kivity wrote:
> Joerg Rodel wrote:
> >I will test it. Is the fix in your latest kernel.org tree?
>
> It is now. It doesn't fix the problem.
>
> >Reproduce it
> >with a KVM guest and start tbench in it with around 100 clients
> >configured. The tbench-process will crash when the bug is hit.
> >
>
> Does it reproduce with uniprocessor guests?
Don't know yet. We will try that.
Joerg
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 15:35 ` Joerg Roedel
@ 2008-08-27 15:50 ` Avi Kivity
2008-08-27 16:27 ` Joerg Rodel
0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-08-27 15:50 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Joerg Rodel, kvm, stable, Alexander Graf
Joerg Roedel wrote:
> On Wed, Aug 27, 2008 at 06:22:14PM +0300, Avi Kivity wrote:
>
>> Joerg Rodel wrote:
>>
>>> I will test it. Is the fix in your latest kernel.org tree?
>>>
>> It is now. It doesn't fix the problem.
>>
>>
>>> Reproduce it
>>> with a KVM guest and start tbench in it with around 100 clients
>>> configured. The tbench-process will crash when the bug is hit.
>>>
>>>
>> Does it reproduce with uniprocessor guests?
>>
>
> Don't know yet. We will try that.
>
>
It didn't reproduce here on uniprocessor, but I hadn't tried for long.
Some observations:
- tbench triggers many cases where we have concurrent faults on the same
address. These are serialized by mmu_lock. I tried to have
direct_map_entry() return if it detects a race. Didn't help.
- I instrumented set_shadow_pte() to warn if changing the pfn or
writeable bit. Didn't trip.
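A check of the kind described in the second observation could look roughly like the following user-space sketch. It is not the actual set_shadow_pte() instrumentation, which was not posted in this thread. The toy_* names and the warning counter are hypothetical, and the bit positions assume the standard x86-64 page-table entry layout (present = bit 0, writable = bit 1, page frame number in bits 12..51).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 pte layout; all names here are hypothetical. */
#define TOY_PTE_PRESENT   (1ULL << 0)
#define TOY_PTE_WRITABLE  (1ULL << 1)
#define TOY_PTE_PFN_MASK  0x000ffffffffff000ULL

/* Number of suspicious updates seen so far (stand-in for a printk/WARN). */
static unsigned toy_warnings;

/*
 * An update is flagged only when a present pte is replaced by another
 * present pte that points at a different frame or differs in the
 * writable bit. Installing a pte into an empty slot is not flagged.
 */
static bool toy_pte_change_is_suspicious(uint64_t old_pte, uint64_t new_pte)
{
	if (!(old_pte & TOY_PTE_PRESENT) || !(new_pte & TOY_PTE_PRESENT))
		return false;
	if ((old_pte ^ new_pte) & TOY_PTE_PFN_MASK)
		return true;
	return ((old_pte ^ new_pte) & TOY_PTE_WRITABLE) != 0;
}

/* Write a pte, counting a warning first if the change looks suspicious. */
static void toy_set_pte(uint64_t *ptep, uint64_t new_pte)
{
	assert(ptep);
	if (toy_pte_change_is_suspicious(*ptep, new_pte))
		toy_warnings++;
	*ptep = new_pte;
}
```

In this sketch, changes that touch only other bits (for example the accessed bit) do not count as warnings, which matches the narrow "pfn or writeable bit" wording above.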
Are there any rules for touching npt ptes concurrently?
Meanwhile, I applied the patch, but I'm very worried about this.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 15:50 ` Avi Kivity
@ 2008-08-27 16:27 ` Joerg Rodel
2008-08-27 16:49 ` Avi Kivity
0 siblings, 1 reply; 19+ messages in thread
From: Joerg Rodel @ 2008-08-27 16:27 UTC (permalink / raw)
To: Avi Kivity; +Cc: Joerg Roedel, kvm, stable, Alexander Graf
On Wed, Aug 27, 2008 at 06:50:27PM +0300, Avi Kivity wrote:
> Joerg Roedel wrote:
> >On Wed, Aug 27, 2008 at 06:22:14PM +0300, Avi Kivity wrote:
> >
> >>Joerg Rodel wrote:
> >>
> >>>I will test it. Is the fix in your latest kernel.org tree?
> >>It is now. It doesn't fix the problem.
> >>
> >>
> >>>Reproduce it
> >>>with a KVM guest and start tbench in it with around 100 clients
> >>>configured. The tbench-process will crash when the bug is hit.
> >>>
> >>Does it reproduce with uniprocessor guests?
> >>
> >
> >Don't know yet. We will try that.
> >
> >
>
> It didn't reproduce here on uniprocessor, but I hadn't tried for long.
We are still testing. At the moment it does not reproduce very fast, for
whatever reason...
>
> Some observations:
>
> - tbench triggers many cases where we have concurrent faults on the same address.
> These are serialized by mmu_lock. I tried to have direct_map_entry() return if
> it detects a race. Didn't help.
> - I instrumented set_shadow_pte() to warn if changing the pfn or writeable bit.
> Didn't trip.
>
> Are there any rules for touching npt ptes concurrently?
Hmm, not that I am aware of. I will ask the silicon guys if they know
something. But I don't think so.
> Meanwhile, I applied the patch, but I'm very worried about this.
Yes, we are also worried. Another question is why this only happens with
NPT. The SoftMMU code should also fail with shadow paging if there is a
bug.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 16:27 ` Joerg Rodel
@ 2008-08-27 16:49 ` Avi Kivity
2008-08-27 16:59 ` Avi Kivity
0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-08-27 16:49 UTC (permalink / raw)
To: Joerg Rodel; +Cc: Joerg Roedel, kvm, stable, Alexander Graf
Joerg Rodel wrote:
>> Meanwhile, I applied the patch, but I'm very worried about this.
>>
>
> Yes, we are also worried. Another question is why this only happens with
> NPT. The SoftMMU code should also fail with shadow paging if there is a
> bug.
>
Slightly different paths -- direct_map vs page_fault. Also, with npt,
all cpus will access the same pte that's being modified; without npt,
faults on the same page will result in different ptes being
instantiated, as each access will be from a different guest pte.
Maybe we should turn on the dirty bit in the instantiated ptes -- that
will reduce the processor's mucking about with them.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 16:49 ` Avi Kivity
@ 2008-08-27 16:59 ` Avi Kivity
2008-08-28 14:58 ` Joerg Rodel
0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-08-27 16:59 UTC (permalink / raw)
To: Joerg Rodel; +Cc: Joerg Roedel, kvm, stable, Alexander Graf
Avi Kivity wrote:
> Joerg Rodel wrote:
>>> Meanwhile, I applied the patch, but I'm very worried about this.
>>>
>>
>> Yes, we are also worried. Another question is why this only happens with
>> NPT. The SoftMMU code should also fail with shadow paging if there is a
>> bug.
>>
>
> Slightly different paths -- direct_map vs page_fault. Also, with npt,
> all cpus will access the same pte that's being modified; without npt,
> faults on the same page will result in different ptes being
> instantiated, as each access will be from a different guest pte.
>
> Maybe we should turn on the dirty bit in the instantiated ptes -- that
> will reduce the processor's mucking about with them.
>
I meant the accessed bit. The dirty bit is always set, but the accessed
bit is not, due to a bug. Fixing it doesn't help, though.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 16:59 ` Avi Kivity
@ 2008-08-28 14:58 ` Joerg Rodel
2008-08-28 15:15 ` Avi Kivity
0 siblings, 1 reply; 19+ messages in thread
From: Joerg Rodel @ 2008-08-28 14:58 UTC (permalink / raw)
To: Avi Kivity; +Cc: Joerg Roedel, kvm, stable, Alexander Graf
On Wed, Aug 27, 2008 at 07:59:24PM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
> >Joerg Rodel wrote:
> >>>Meanwhile, I applied the patch, but I'm very worried about this.
> >>>
> >>
> >>Yes, we are also worried. Another question is why this only happens with
> >>NPT. The SoftMMU code should also fail with shadow paging if there is a
> >>bug.
> >>
> >
> >Slightly different paths -- direct_map vs page_fault. Also, with npt, all cpus will access the same pte
> >that's being modified; without npt, faults on the same page will result in different ptes being instantiated,
> >as each access will be from a different guest pte.
> >
> >Maybe we should turn on the dirty bit in the instantiated ptes -- that will reduce the processor's mucking
> >about with them.
> >
>
> I meant the accessed bit. The dirty bit is always set, but the accessed bit is not, due to a bug. Fixing it
> doesn't help, though.
I meditated a bit on the softmmu code today. In the path of the
NPT fault the function kvm_mmu_free_some_pages() is called which itself
calls kvm_mmu_zap_page(). There the two functions
kvm_mmu_page_unlink_children() and kvm_mmu_unlink_parents() are called.
They both call mmu_page_remove_parent_pte() which modifies ptes. But
only the first function, kvm_mmu_page_unlink_children(), flushes remote
TLBs. The function kvm_mmu_unlink_parents() does not. Is this correct?
If yes, why?
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-28 14:58 ` Joerg Rodel
@ 2008-08-28 15:15 ` Avi Kivity
2008-08-28 15:19 ` Joerg Roedel
2008-08-28 15:29 ` Avi Kivity
0 siblings, 2 replies; 19+ messages in thread
From: Avi Kivity @ 2008-08-28 15:15 UTC (permalink / raw)
To: Joerg Rodel; +Cc: Joerg Roedel, kvm, stable, Alexander Graf
Joerg Rodel wrote:
> I meditated a bit on the softmmu code today. In the path of the
> NPT fault the function kvm_mmu_free_some_pages() is called which itself
> calls kvm_mmu_zap_page(). There the two functions
> kvm_mmu_page_unlink_children() and kvm_mmu_unlink_parents() are called.
> They both call mmu_page_remove_parent_pte() which modifies ptes. But
> only the first function, kvm_mmu_page_unlink_children(), flushes remote
> TLBs. The function kvm_mmu_unlink_parents() does not. Is this correct?
>
>
It isn't correct. I'll move the flush below. Good catch.
However, I can't believe this is responsible. There is very little page
zapping going on with npt.
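The ordering problem discussed above can be modelled in user space as follows. This is a toy sketch, not the KVM code: the toy_* names are hypothetical, and it assumes that "move the flush below" means issuing the remote TLB flush once, after both unlink steps, instead of between them.

```c
#include <assert.h>

/* Hypothetical toy state, standing in for struct kvm. */
struct toy_kvm {
	int ptes_modified;   /* pte changes not yet covered by a flush */
	int remote_flushes;  /* number of remote TLB flushes issued */
};

/* Stand-in for mmu_page_remove_parent_pte(): one pte gets modified. */
static void toy_remove_parent_pte(struct toy_kvm *kvm)
{
	assert(kvm);
	kvm->ptes_modified++;
}

/* A remote flush covers every pte change made before it. */
static void toy_flush_remote_tlbs(struct toy_kvm *kvm)
{
	kvm->remote_flushes++;
	kvm->ptes_modified = 0;
}

static void toy_unlink_children(struct toy_kvm *kvm)
{
	toy_remove_parent_pte(kvm);
}

static void toy_unlink_parents(struct toy_kvm *kvm)
{
	toy_remove_parent_pte(kvm);
}

/* Flush between the two steps: the second step's change stays uncovered. */
static void toy_zap_page_flush_early(struct toy_kvm *kvm)
{
	toy_unlink_children(kvm);
	toy_flush_remote_tlbs(kvm);
	toy_unlink_parents(kvm);
}

/* Flush after both steps: one flush covers all pte changes. */
static void toy_zap_page_flush_late(struct toy_kvm *kvm)
{
	toy_unlink_children(kvm);
	toy_unlink_parents(kvm);
	toy_flush_remote_tlbs(kvm);
}
```

Both variants issue exactly one flush, so the late placement costs nothing extra in this model. Only the early variant leaves a modified pte that other CPUs may still have cached.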
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-28 15:15 ` Avi Kivity
@ 2008-08-28 15:19 ` Joerg Roedel
2008-08-28 15:47 ` Avi Kivity
2008-08-28 15:29 ` Avi Kivity
1 sibling, 1 reply; 19+ messages in thread
From: Joerg Roedel @ 2008-08-28 15:19 UTC (permalink / raw)
To: Avi Kivity; +Cc: Joerg Rodel, kvm, stable, Alexander Graf
On Thu, Aug 28, 2008 at 06:15:57PM +0300, Avi Kivity wrote:
> Joerg Rodel wrote:
> > I meditated a bit on the softmmu code today. In the path of the
> > NPT fault the function kvm_mmu_free_some_pages() is called which itself
> > calls kvm_mmu_zap_page(). There the two functions
> > kvm_mmu_page_unlink_children() and kvm_mmu_unlink_parents() are called.
> > They both call mmu_page_remove_parent_pte() which modifies ptes. But
> > only the first function, kvm_mmu_page_unlink_children(), flushes remote
> > TLBs. The function kvm_mmu_unlink_parents() does not. Is this correct?
> >
> >
>
> It isn't correct. I'll move the flush below. Good catch.
>
> However, I can't believe this is responsible. There is very little page
> zapping going on with npt.
OK, cool. But the bug happens only rarely, so I think there is some
probability that this is the missing TLB flush. But let's see.
Joerg
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-28 15:19 ` Joerg Roedel
@ 2008-08-28 15:47 ` Avi Kivity
0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2008-08-28 15:47 UTC (permalink / raw)
To: Joerg Roedel; +Cc: Joerg Rodel, kvm, stable, Alexander Graf
Joerg Roedel wrote:
> OK, cool. But the bug happens only rarely, so I think there is some
> probability that this is the missing TLB flush. But let's see.
>
It reproduces rapidly for me, and it still did after adding the missing
flush to kvm_mmu_zap_page().
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-28 15:15 ` Avi Kivity
2008-08-28 15:19 ` Joerg Roedel
@ 2008-08-28 15:29 ` Avi Kivity
2008-08-28 15:58 ` Joerg Roedel
1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-08-28 15:29 UTC (permalink / raw)
To: Joerg Rodel; +Cc: Joerg Roedel, kvm, stable, Alexander Graf
Avi Kivity wrote:
> Joerg Rodel wrote:
>
>> I meditated a bit on the softmmu code today. In the path of the
>> NPT fault the function kvm_mmu_free_some_pages() is called which itself
>> calls kvm_mmu_zap_page(). There the two functions
>> kvm_mmu_page_unlink_children() and kvm_mmu_unlink_parents() are called.
>> They both call mmu_page_remove_parent_pte() which modifies ptes. But
>> only the first function, kvm_mmu_page_unlink_children(), flushes remote
>> TLBs. The function kvm_mmu_unlink_parents() does not. Is this correct?
>>
>>
>>
>
> It isn't correct. I'll move the flush below. Good catch.
>
> However, I can't believe this is responsible. There is very little page
> zapping going on with npt.
>
>
Indeed, the mmu_shadow_zapped counter for the guest I'm testing is zero,
so this code path was never hit.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-28 15:29 ` Avi Kivity
@ 2008-08-28 15:58 ` Joerg Roedel
0 siblings, 0 replies; 19+ messages in thread
From: Joerg Roedel @ 2008-08-28 15:58 UTC (permalink / raw)
To: Avi Kivity; +Cc: Joerg Rodel, kvm, stable, Alexander Graf
On Thu, Aug 28, 2008 at 06:29:19PM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
> > Joerg Rodel wrote:
> >
> >> I meditated a bit on the softmmu code today. In the path of the
> >> NPT fault the function kvm_mmu_free_some_pages() is called which itself
> >> calls kvm_mmu_zap_page(). There the two functions
> >> kvm_mmu_page_unlink_children() and kvm_mmu_unlink_parents() are called.
> >> They both call mmu_page_remove_parent_pte() which modifies ptes. But
> >> only the first function, kvm_mmu_page_unlink_children(), flushes remote
> >> TLBs. The function kvm_mmu_unlink_parents() does not. Is this correct?
> >>
> >>
> >>
> >
> > It isn't correct. I'll move the flush below. Good catch.
> >
> > However, I can't believe this is responsible. There is very little page
> > zapping going on with npt.
> >
> >
>
> Indeed, the mmu_shadow_zapped counter for the guest I'm testing is zero,
> so this code path was never hit.
Ok, but at least we found another missing flush :)
Joerg
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 13:11 ` Avi Kivity
2008-08-27 13:53 ` Avi Kivity
@ 2008-08-27 13:53 ` Joerg Rodel
2008-08-27 15:21 ` Avi Kivity
1 sibling, 1 reply; 19+ messages in thread
From: Joerg Rodel @ 2008-08-27 13:53 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm, stable, Alexander Graf
On Wed, Aug 27, 2008 at 04:11:02PM +0300, Avi Kivity wrote:
> Joerg Rodel wrote:
> >From: Joerg Roedel <joerg.roedel@amd.com>
> >
> >This patch introduces a guest TLB flush on every NPF exit in KVM. This fixes
> >random segfaults and #UD exceptions in the guest seen under some workloads
> >(e.g. long running compile workloads or tbench). A kernbench run with and
> >without that fix showed that it has a slowdown lower than 0.5%
> >
> >
>
> hm. tbench doesn't allocate memory, so there shouldn't be any npt faults. I don't
> see how this can make a difference.
The basis for the fix was this bug report:
http://sourceforge.net/tracker/index.php?func=detail&aid=2019053&group_id=180599&atid=893831
We found out that the same crash occurs on long-running compile
workloads and that stale TLB entries cause it. Until we find the real
location of the missing TLB flush in the MMU code, I think it's best to
flush the TLB every time the mapping/unmapping code for the nested page
table is executed. This at least fixes the crashes in the guest and has
only minimal performance impact.
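How a stale TLB entry produces a wrong translation, and why a flush after every mapping change hides it, can be shown with a small user-space model. This is only an illustrative sketch under simplifying assumptions (a single CPU, a direct-mapped cache of translations, 8 pages); none of the toy_* names exist in KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPAGES 8  /* toy address space: 8 pages */

/* Hypothetical toy model: a page table plus a cache of its entries. */
struct toy_mmu {
	uint64_t page_table[TOY_NPAGES];  /* authoritative page -> frame map */
	uint64_t tlb_pfn[TOY_NPAGES];     /* cached translations */
	bool     tlb_valid[TOY_NPAGES];
};

/* Drop every cached translation. */
static void toy_flush_tlb(struct toy_mmu *m)
{
	for (int i = 0; i < TOY_NPAGES; i++)
		m->tlb_valid[i] = false;
}

/* Translate through the TLB, walking the page table only on a miss. */
static uint64_t toy_translate(struct toy_mmu *m, unsigned vpn)
{
	assert(vpn < TOY_NPAGES);
	if (!m->tlb_valid[vpn]) {
		m->tlb_pfn[vpn] = m->page_table[vpn];
		m->tlb_valid[vpn] = true;
	}
	return m->tlb_pfn[vpn];
}

/* Change a mapping; the caller decides whether to flush afterwards. */
static void toy_remap(struct toy_mmu *m, unsigned vpn, uint64_t pfn, bool flush)
{
	assert(vpn < TOY_NPAGES);
	m->page_table[vpn] = pfn;
	if (flush)
		toy_flush_tlb(m);
}
```

If a page is translated once, then remapped with flush set to false, the next toy_translate() call still returns the old frame. Remapping with flush set to true makes the new frame visible. The blanket flush in the patch corresponds to always passing true, which hides the missing flush without locating it.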
> It can only change something if X is started and we're tracking writes to the
> framebuffer. Is this the case?
No, X is not running in the guest.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 13:53 ` Joerg Rodel
@ 2008-08-27 15:21 ` Avi Kivity
2008-08-27 15:32 ` Joerg Roedel
0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-08-27 15:21 UTC (permalink / raw)
To: Joerg Rodel; +Cc: kvm, stable, Alexander Graf
Joerg Rodel wrote:
>> hm. tbench doesn't allocate memory, so there shouldn't be any npt faults. I don't
>> see how this can make a difference.
>>
>
>
I reproduced it. There are a few npt faults as the guest has not
touched all of memory yet. If I force it to touch all of memory (dd <
/dev/hda), the problem appears to go away.
> Base for the fix was this bugreport:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=2019053&group_id=180599&atid=893831
>
> We found out that the same crash occurs on long-running compile
> workloads and that stale TLB entries cause it. Until we find the real
> location of the missing TLB flush in the MMU code, I think it's best to
> flush the TLB every time the mapping/unmapping code for the nested page
> table is executed. This fixes at least the crashes in the guest and has
> only minimal performance impact.
>
I'd like to try and find out what the problem is exactly. Otherwise we
may be only narrowing the window, not closing it.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] KVM: SVM: fix random segfaults with NPT enabled
2008-08-27 15:21 ` Avi Kivity
@ 2008-08-27 15:32 ` Joerg Roedel
0 siblings, 0 replies; 19+ messages in thread
From: Joerg Roedel @ 2008-08-27 15:32 UTC (permalink / raw)
To: Avi Kivity; +Cc: Joerg Rodel, kvm, stable, Alexander Graf
On Wed, Aug 27, 2008 at 06:21:40PM +0300, Avi Kivity wrote:
> Joerg Rodel wrote:
>
>
>
> >>hm. tbench doesn't allocate memory, so there shouldn't be any npt
> >>faults. I don't see how this can make a difference.
> >>
> >
> >
>
> I reproduced it. There are a few npt faults as the guest has not
> touched all of memory yet. If I force it to touch all of memory (dd <
> /dev/hda), the problem appears to go away.
>
> >Base for the fix was this bugreport:
> >
> >http://sourceforge.net/tracker/index.php?func=detail&aid=2019053&group_id=180599&atid=893831
> >
> >We found out that the same crash occurs on long-running compile
> >workloads and that stale TLB entries cause it. Until we find the real
> >location of the missing TLB flush in the MMU code, I think it's best to
> >flush the TLB every time the mapping/unmapping code for the nested page
> >table is executed. This fixes at least the crashes in the guest and has
> >only minimal performance impact.
> >
>
> I'd like to try and find out what the problem is exactly. Otherwise we
> may be only narrowing the window, not closing it.
Agreed. The fix I sent is only meant to be temporary until we find the
real root cause of the problem.
Joerg