* cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
@ 2009-02-23 0:33 Aurelien Jarno
2009-02-23 1:47 ` Marcelo Tosatti
2009-03-20 23:14 ` Marcelo Tosatti
0 siblings, 2 replies; 21+ messages in thread
From: Aurelien Jarno @ 2009-02-23 0:33 UTC (permalink / raw)
To: kvm; +Cc: Marcelo Tosatti
Hi,
Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
under high load (during a compilation for example) with the following
error message:
| Fatal trap 12: page fault while in kernel mode
| fault virtual address = 0x4
| fault code = supervisor read, page not present
| instruction pointer = 0x20:0xc0a4fc00
| stack pointer = 0x28:0xe66d7a70
| frame pointer = 0x28:0xe66d7a80
| code segment = base 0x0, limit 0xfffff, type 0x1b
| = DPL 0, pres 1, def32 1, gran 1
| processor eflags = interrupt enabled, resume, IOPL = 0
| current process = 24037 (bash)
| trap number = 12
| panic: page fault
| Uptime: 4m7s
| Cannot dump. No dump device defined.
| Automatic reboot in 15 seconds - press a key on the console to abort
I haven't tried yet with a plain FreeBSD guest, but I also expect it to
crash given the kernel (version 7.1) is almost the same. A closer
investigation has shown that the following commit is causing the
problem:
| commit 6364a3918cb5c28376849e7fca3e09bd66b859f3
| Author: Marcelo Tosatti <mtosatti@redhat.com>
| Date: Mon Dec 1 22:32:04 2008 -0200
|
| KVM: MMU: skip global pgtables on sync due to cr3 switch
|
| Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
| important for Linux 32-bit guests with PAE, where the kmap page is
| marked as global.
|
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
As expected, loading the KVM module with oos_shadow=0 workaround the
problem. Please note that the guest is running in 32-bit mode, does not
use PAE, and uses global pages. My host has an Intel Q9450 CPU, and the
problem appears with both a 2.6.26 and a 2.6.28 64-bit kernel.
Does anybody see any problem in this patch? How can I further
debug the problem?
Aurelien
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-02-23 0:33 cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest Aurelien Jarno
@ 2009-02-23 1:47 ` Marcelo Tosatti
2009-02-23 14:01 ` Aurelien Jarno
2009-03-20 23:14 ` Marcelo Tosatti
1 sibling, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2009-02-23 1:47 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: kvm
On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> Hi,
>
> Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> under high load (during a compilation for example) with the following
> error message:
>
> | Fatal trap 12: page fault while in kernel mode
> | fault virtual address = 0x4
> | fault code = supervisor read, page not present
> | instruction pointer = 0x20:0xc0a4fc00
> | stack pointer = 0x28:0xe66d7a70
> | frame pointer = 0x28:0xe66d7a80
> | code segment = base 0x0, limit 0xfffff, type 0x1b
> | = DPL 0, pres 1, def32 1, gran 1
> | processor eflags = interrupt enabled, resume, IOPL = 0
> | current process = 24037 (bash)
> | trap number = 12
> | panic: page fault
> | Uptime: 4m7s
> | Cannot dump. No dump device defined.
> | Automatic reboot in 15 seconds - press a key on the console to abort
>
> I haven't tried yet with a plain FreeBSD guest, but I also expect it to
> crash given the kernel (version 7.1) is almost the same. A closer
> investigation has shown that the following commit is causing the
> problem:
>
> | commit 6364a3918cb5c28376849e7fca3e09bd66b859f3
> | Author: Marcelo Tosatti <mtosatti@redhat.com>
> | Date: Mon Dec 1 22:32:04 2008 -0200
> |
> | KVM: MMU: skip global pgtables on sync due to cr3 switch
> |
> | Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
> | important for Linux 32-bit guests with PAE, where the kmap page is
> | marked as global.
> |
> | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> | Signed-off-by: Avi Kivity <avi@redhat.com>
>
> As expected, loading the KVM module with oos_shadow=0 workaround the
> problem. Please note that the guest is running in 32-bit mode, does not
> use PAE, and uses global pages. My host has an Intel Q9450 CPU, and the
> problem appears with both a 2.6.26 and a 2.6.28 64-bit kernel.
>
> Does anybody see any problem in this patch? How can I further
> debug the problem?
Aurelien,
Maybe there is a bug in the syncing code (eg: not all global pages are
sync'ed when the OS requests a global sync), or FreeBSD is "relying" on
invlpg/cr3 write to sync global pages (remember TLB entries can be
invalidated internally by CPU).
If you want to debug it, would suggest looping over all MMU pages in
mmu_sync_global, after the kvm_sync_page loop, and
WARN_ON(sp->unsync && sp->global);
If that fails, check if the unsync and global flags mean what they are
supposed to.
Sorry for the trouble and thanks for the detailed report, will take a
close look at it this week.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-02-23 1:47 ` Marcelo Tosatti
@ 2009-02-23 14:01 ` Aurelien Jarno
2009-02-23 14:52 ` Marcelo Tosatti
0 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2009-02-23 14:01 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
On Sun, Feb 22, 2009 at 10:47:13PM -0300, Marcelo Tosatti wrote:
> On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> > Hi,
> >
> > Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> > under high load (during a compilation for example) with the following
> > error message:
> >
> > | Fatal trap 12: page fault while in kernel mode
> > | fault virtual address = 0x4
> > | fault code = supervisor read, page not present
> > | instruction pointer = 0x20:0xc0a4fc00
> > | stack pointer = 0x28:0xe66d7a70
> > | frame pointer = 0x28:0xe66d7a80
> > | code segment = base 0x0, limit 0xfffff, type 0x1b
> > | = DPL 0, pres 1, def32 1, gran 1
> > | processor eflags = interrupt enabled, resume, IOPL = 0
> > | current process = 24037 (bash)
> > | trap number = 12
> > | panic: page fault
> > | Uptime: 4m7s
> > | Cannot dump. No dump device defined.
> > | Automatic reboot in 15 seconds - press a key on the console to abort
> >
> > I haven't tried yet with a plain FreeBSD guest, but I also expect it to
> > crash given the kernel (version 7.1) is almost the same. A closer
> > investigation has shown that the following commit is causing the
> > problem:
> >
> > | commit 6364a3918cb5c28376849e7fca3e09bd66b859f3
> > | Author: Marcelo Tosatti <mtosatti@redhat.com>
> > | Date: Mon Dec 1 22:32:04 2008 -0200
> > |
> > | KVM: MMU: skip global pgtables on sync due to cr3 switch
> > |
> > | Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
> > | important for Linux 32-bit guests with PAE, where the kmap page is
> > | marked as global.
> > |
> > | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > | Signed-off-by: Avi Kivity <avi@redhat.com>
> >
> > As expected, loading the KVM module with oos_shadow=0 workaround the
> > problem. Please note that the guest is running in 32-bit mode, does not
> > use PAE, and uses global pages. My host has an Intel Q9450 CPU, and the
> > problem appears with both a 2.6.26 and a 2.6.28 64-bit kernel.
> >
> > Does anybody see any problem in this patch? How can I further
> > debug the problem?
>
> Aurelien,
>
> Maybe there is a bug in the syncing code (eg: not all global pages are
> sync'ed when the OS requests a global sync), or FreeBSD is "relying" on
> invlpg/cr3 write to sync global pages (remember TLB entries can be
> invalidated internally by CPU).
As far as I understand the Intel IA32 manual, using invlpg to sync
global pages is correct. OTOH, cr3 is not. I think that if FreeBSD is
using cr3 to sync global pages that should also cause a problem on real
hardware sooner or later, right?
I'll try to look at the kernel code.
> If you want to debug it, would suggest looping over all MMU pages in
> mmu_sync_global, after the kvm_sync_page loop, and
>
> WARN_ON(sp->unsync && sp->global);
>
> If that fails, check if the unsync and global flags mean what they are
> supposed to.
This doesn't detect any problem, which means that all pages marked as
global are synced correctly.
I have also tried to call kvm_mmu_sync_global() in kvm_set_cr3(), and
as expected the guest works correctly in that case.
If I understand correctly, and if the problem is not on the FreeBSD
side, the only remaining possible problem is if normal pages are wrongly
marked as global, and thus should be synced with cr3, while they are
not. Am I right here?
> Sorry for the trouble and thanks for the detailed report, will take a
> close look at it this week.
>
Thanks for your fast answer and for your help for debugging.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-02-23 14:01 ` Aurelien Jarno
@ 2009-02-23 14:52 ` Marcelo Tosatti
2009-02-23 14:59 ` Avi Kivity
0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2009-02-23 14:52 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: kvm
On Mon, Feb 23, 2009 at 03:01:15PM +0100, Aurelien Jarno wrote:
> > Maybe there is a bug in the syncing code (eg: not all global pages are
> > sync'ed when the OS requests a global sync), or FreeBSD is "relying" on
> > invlpg/cr3 write to sync global pages (remember TLB entries can be
> > invalidated internally by CPU).
>
> As far as I understand the Intel IA32 manual, using invlpg to sync
> global pages is correct. OTOH, cr3 is not. I think that if FreeBSD is
> using cr3 to sync global pages that should also cause a problem on real
> hardware sooner or later, right?
From my understanding of the documentation, yes. Note:
5.1 Invalidation Instructions
The processor is always free to invalidate additional entries in the
TLBs and paging-structure caches. The following are some examples:
• INVLPG may invalidate TLB entries for pages other than the one
corresponding to its linear-address operand.
• MOV to CR3 may invalidate TLB entries for global pages.
^^^
• On a processor supporting Hyper-Threading Technology, invalidations
performed on one logical processor may invalidate entries in the TLBs
and paging-structure caches used by other logical processors.
> I'll try to look at the kernel code.
>
> > If you want to debug it, would suggest looping over all MMU pages in
> > mmu_sync_global, after the kvm_sync_page loop, and
> >
> > WARN_ON(sp->unsync && sp->global);
> >
> > If that fails, check if the unsync and global flags mean what they are
> > supposed to.
>
> This doesn't detect any problem, which means that all pages marked as
> global are synced correctly.
>
> I have also tried to call kvm_mmu_sync_global() in kvm_set_cr3(), and
> as expected the guest works correctly in that case.
>
> If I understand correctly, and if the problem is not on the FreeBSD
> side, the only remaining possible problem is if normal pages are wrongly
> marked as global, and thus should be synced with cr3, while they are
> not. Am I right here?
Yes, this is the most likely problematic scenario.
> > Sorry for the trouble and thanks for the detailed report, will take a
> > close look at it this week.
> >
>
> Thanks for your fast answer and for your help for debugging.
If you confirm that FreeBSD is indeed relying on cr3 to sync global
pages, it might be better to disable the optimization. Lets hope that is
not the case.
Thanks
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-02-23 14:52 ` Marcelo Tosatti
@ 2009-02-23 14:59 ` Avi Kivity
2009-02-23 15:06 ` Marcelo Tosatti
0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2009-02-23 14:59 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Aurelien Jarno, kvm
Marcelo Tosatti wrote:
>> Thanks for your fast answer and for your help for debugging.
>>
>
> If you confirm that FreeBSD is indeed relying on cr3 to sync global
> pages, it might be better to disable the optimization. Lets hope that is
> not the case.
>
cr3 writes explicitly do not flush global pages; otherwise what would be
the point of global pages at all?
In other words, the only difference between global and non-global pages
is visible via cr3 writes.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-02-23 14:59 ` Avi Kivity
@ 2009-02-23 15:06 ` Marcelo Tosatti
2009-02-23 15:16 ` Avi Kivity
0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2009-02-23 15:06 UTC (permalink / raw)
To: Avi Kivity; +Cc: Aurelien Jarno, kvm
On Mon, Feb 23, 2009 at 04:59:37PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> Thanks for your fast answer and for your help for debugging.
>>>
>>
>> If you confirm that FreeBSD is indeed relying on cr3 to sync global
>> pages, it might be better to disable the optimization. Lets hope that is
>> not the case.
>>
>
> cr3 writes explicitly do not flush global pages; otherwise what would be
> the point of global pages at all?
From the Intel TLB doc:
The processor is always free to invalidate additional entries in the TLBs
and paging-structure caches. The following are some examples:
• MOV to CR3 may invalidate TLB entries for global pages.
The reasoning was if an optimization breaks an important guest which
contains a bug that happens to not trigger on real HW due to positioning
of the stars, it is reasonable to disable that optimization.
> In other words, the only difference between global and non-global pages
> is visible via cr3 writes.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-02-23 15:06 ` Marcelo Tosatti
@ 2009-02-23 15:16 ` Avi Kivity
0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2009-02-23 15:16 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Aurelien Jarno, kvm
Marcelo Tosatti wrote:
> On Mon, Feb 23, 2009 at 04:59:37PM +0200, Avi Kivity wrote:
>
>> Marcelo Tosatti wrote:
>>
>>>> Thanks for your fast answer and for your help for debugging.
>>>>
>>>>
>>> If you confirm that FreeBSD is indeed relying on cr3 to sync global
>>> pages, it might be better to disable the optimization. Lets hope that is
>>> not the case.
>>>
>>>
>> cr3 writes explicitly do not flush global pages; otherwise what would be
>> the point of global pages at all?
>>
>
> From the Intel TLB doc:
>
> The processor is always free to invalidate additional entries in the TLBs
> and paging-structure caches. The following are some examples:
>
> • MOV to CR3 may invalidate TLB entries for global pages.
>
> The reasoning was if an optimization breaks an important guest which
> contains a bug that happens to not trigger on real HW due to positioning
> of the stars, it is reasonable to disable that optimization.
>
This means the OS may not rely on the TLB retaining its contents. For
example, you can't do
1. set pte to global+present
2. access through pte to load tlb entry
3. clear pte
4. switch cr3
5. access through same pte again, relying on tlb entry to service the
access
So the processor may choose to ignore the global bit on some or all tlb
entries, but software cannot assume that it does. Typically it will
honor the global bit since otherwise it's useless.
I don't think this is what is happening with FreeBSD. It may be that
spte population on invlpg is confusing the guest (though that is allowed
as a speculative read?). For example, the sequence:
1. invlpg
2. set pte to A (present+accessed)
3. set pte to B (present+accessed)
kvm behaves as if a speculative read always happens between 2 and 3,
which would be very rare on real hardware.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-02-23 0:33 cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest Aurelien Jarno
2009-02-23 1:47 ` Marcelo Tosatti
@ 2009-03-20 23:14 ` Marcelo Tosatti
2009-03-21 8:51 ` Aurelien Jarno
2009-03-22 9:35 ` Avi Kivity
1 sibling, 2 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2009-03-20 23:14 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: kvm
On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> Hi,
>
> Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> under high load (during a compilation for example) with the following
> error message:
>
> | Fatal trap 12: page fault while in kernel mode
> | fault virtual address = 0x4
> | fault code = supervisor read, page not present
> | instruction pointer = 0x20:0xc0a4fc00
> | stack pointer = 0x28:0xe66d7a70
> | frame pointer = 0x28:0xe66d7a80
> | code segment = base 0x0, limit 0xfffff, type 0x1b
> | = DPL 0, pres 1, def32 1, gran 1
> | processor eflags = interrupt enabled, resume, IOPL = 0
> | current process = 24037 (bash)
> | trap number = 12
> | panic: page fault
> | Uptime: 4m7s
> | Cannot dump. No dump device defined.
> | Automatic reboot in 15 seconds - press a key on the console to abort
Aurelien,
Can you try this patch please.
-------
KVM: MMU: do not allow unsync global pages to become unreachable
Section 5.1 of the Intel TLB doc:
"• INVLPG. This instruction takes a single operand, which is a linear
address. The instruction invalidates any TLB entries for the page number
of that linear address including those for global pages (see Section
7). INVLPG also invalidates all entries in all paging-structure caches
regardless of the linear addresses to which they correspond."
Section 7:
"The following items detail the invalidation of global TLB entries:
• An execution of INVLPG invalidates any TLB entries for the page
number of the linear address that is its operand, even if the entries
are global."
If an unsync global page becomes unreachable via the shadow tree, which
can happen if one its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.
So invalidate all unsync global entries when zapping a page. Another
possibility would be to cover global pages in the unsync bitmaps, but
that would increase overhead for mmu_sync_roots.
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1362,6 +1362,16 @@ static void kvm_mmu_unlink_parents(struc
}
}
+static void mmu_invalidate_unsync_global(struct kvm *kvm)
+{
+ struct kvm_mmu_page *sp, *n;
+
+ list_for_each_entry_safe(sp, n, &kvm->arch.oos_global_pages, oos_link) {
+ kvm_mmu_page_unlink_children(kvm, sp);
+ kvm_unlink_unsync_page(kvm, sp);
+ }
+}
+
static int mmu_zap_unsync_children(struct kvm *kvm,
struct kvm_mmu_page *parent)
{
@@ -1384,6 +1394,8 @@ static int mmu_zap_unsync_children(struc
kvm_mmu_pages_init(parent, &parents, &pages);
}
+ mmu_invalidate_unsync_global(kvm);
+
return zapped;
}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-03-20 23:14 ` Marcelo Tosatti
@ 2009-03-21 8:51 ` Aurelien Jarno
2009-03-22 9:35 ` Avi Kivity
1 sibling, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2009-03-21 8:51 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
On Fri, Mar 20, 2009 at 08:14:05PM -0300, Marcelo Tosatti wrote:
> On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> > Hi,
> >
> > Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> > under high load (during a compilation for example) with the following
> > error message:
> >
> > | Fatal trap 12: page fault while in kernel mode
> > | fault virtual address = 0x4
> > | fault code = supervisor read, page not present
> > | instruction pointer = 0x20:0xc0a4fc00
> > | stack pointer = 0x28:0xe66d7a70
> > | frame pointer = 0x28:0xe66d7a80
> > | code segment = base 0x0, limit 0xfffff, type 0x1b
> > | = DPL 0, pres 1, def32 1, gran 1
> > | processor eflags = interrupt enabled, resume, IOPL = 0
> > | current process = 24037 (bash)
> > | trap number = 12
> > | panic: page fault
> > | Uptime: 4m7s
> > | Cannot dump. No dump device defined.
> > | Automatic reboot in 15 seconds - press a key on the console to abort
>
> Aurelien,
>
> Can you try this patch please.
I have just tried it, and it works. Thanks a lot for your work!
> -------
>
> KVM: MMU: do not allow unsync global pages to become unreachable
>
> Section 5.1 of the Intel TLB doc:
>
> "• INVLPG. This instruction takes a single operand, which is a linear
> address. The instruction invalidates any TLB entries for the page number
> of that linear address including those for global pages (see Section
> 7). INVLPG also invalidates all entries in all paging-structure caches
> regardless of the linear addresses to which they correspond."
>
> Section 7:
>
> "The following items detail the invalidation of global TLB entries:
>
> • An execution of INVLPG invalidates any TLB entries for the page
> number of the linear address that is its operand, even if the entries
> are global."
>
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
>
> So invalidate all unsync global entries when zapping a page. Another
> possibility would be to cover global pages in the unsync bitmaps, but
> that would increase overhead for mmu_sync_roots.
>
> Index: kvm/arch/x86/kvm/mmu.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/mmu.c
> +++ kvm/arch/x86/kvm/mmu.c
> @@ -1362,6 +1362,16 @@ static void kvm_mmu_unlink_parents(struc
> }
> }
>
> +static void mmu_invalidate_unsync_global(struct kvm *kvm)
> +{
> + struct kvm_mmu_page *sp, *n;
> +
> + list_for_each_entry_safe(sp, n, &kvm->arch.oos_global_pages, oos_link) {
> + kvm_mmu_page_unlink_children(kvm, sp);
> + kvm_unlink_unsync_page(kvm, sp);
> + }
> +}
> +
> static int mmu_zap_unsync_children(struct kvm *kvm,
> struct kvm_mmu_page *parent)
> {
> @@ -1384,6 +1394,8 @@ static int mmu_zap_unsync_children(struc
> kvm_mmu_pages_init(parent, &parents, &pages);
> }
>
> + mmu_invalidate_unsync_global(kvm);
> +
> return zapped;
> }
>
>
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-03-20 23:14 ` Marcelo Tosatti
2009-03-21 8:51 ` Aurelien Jarno
@ 2009-03-22 9:35 ` Avi Kivity
2009-03-23 17:27 ` Marcelo Tosatti
1 sibling, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2009-03-22 9:35 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Aurelien Jarno, kvm
Marcelo Tosatti wrote:
> On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
>
>> Hi,
>>
>> Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
>> under high load (during a compilation for example) with the following
>> error message:
>>
>> | Fatal trap 12: page fault while in kernel mode
>> | fault virtual address = 0x4
>> | fault code = supervisor read, page not present
>> | instruction pointer = 0x20:0xc0a4fc00
>> | stack pointer = 0x28:0xe66d7a70
>> | frame pointer = 0x28:0xe66d7a80
>> | code segment = base 0x0, limit 0xfffff, type 0x1b
>> | = DPL 0, pres 1, def32 1, gran 1
>> | processor eflags = interrupt enabled, resume, IOPL = 0
>> | current process = 24037 (bash)
>> | trap number = 12
>> | panic: page fault
>> | Uptime: 4m7s
>> | Cannot dump. No dump device defined.
>> | Automatic reboot in 15 seconds - press a key on the console to abort
>>
>
> Aurelien,
>
> Can you try this patch please.
>
> -------
>
> KVM: MMU: do not allow unsync global pages to become unreachable
>
> Section 5.1 of the Intel TLB doc:
>
> "• INVLPG. This instruction takes a single operand, which is a linear
> address. The instruction invalidates any TLB entries for the page number
> of that linear address including those for global pages (see Section
> 7). INVLPG also invalidates all entries in all paging-structure caches
> regardless of the linear addresses to which they correspond."
>
> Section 7:
>
> "The following items detail the invalidation of global TLB entries:
>
> • An execution of INVLPG invalidates any TLB entries for the page
> number of the linear address that is its operand, even if the entries
> are global."
>
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
>
> So invalidate all unsync global entries when zapping a page. Another
> possibility would be to cover global pages in the unsync bitmaps, but
> that would increase overhead for mmu_sync_roots.
>
> Index: kvm/arch/x86/kvm/mmu.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/mmu.c
> +++ kvm/arch/x86/kvm/mmu.c
> @@ -1362,6 +1362,16 @@ static void kvm_mmu_unlink_parents(struc
> }
> }
>
> +static void mmu_invalidate_unsync_global(struct kvm *kvm)
> +{
> + struct kvm_mmu_page *sp, *n;
> +
> + list_for_each_entry_safe(sp, n, &kvm->arch.oos_global_pages, oos_link) {
> + kvm_mmu_page_unlink_children(kvm, sp);
> + kvm_unlink_unsync_page(kvm, sp);
> + }
> +}
> +
> static int mmu_zap_unsync_children(struct kvm *kvm,
> struct kvm_mmu_page *parent)
> {
> @@ -1384,6 +1394,8 @@ static int mmu_zap_unsync_children(struc
> kvm_mmu_pages_init(parent, &parents, &pages);
> }
>
> + mmu_invalidate_unsync_global(kvm);
> +
> return zapped;
> }
>
>
Good catch, indeed. But is it sufficient? We could unlink a page
through other means, for example by the guest zapping a page directory
entry. Maybe it's best to resync when relinking a global page?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-03-22 9:35 ` Avi Kivity
@ 2009-03-23 17:27 ` Marcelo Tosatti
2009-03-24 9:47 ` Avi Kivity
2009-03-24 10:39 ` Aurelien Jarno
0 siblings, 2 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2009-03-23 17:27 UTC (permalink / raw)
To: Avi Kivity; +Cc: Aurelien Jarno, kvm
On Sun, Mar 22, 2009 at 11:35:00AM +0200, Avi Kivity wrote:
> Good catch, indeed. But is it sufficient? We could unlink a page
> through other means, for example by the guest zapping a page directory
> entry.
Yep.
> Maybe it's best to resync when relinking a global page?
How about this. It will shorten the unsync period of global pages,
unfortunately.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a36f7f..bccdcc7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
kvm_mmu_mark_parents_unsync(vcpu, sp);
}
+ if (role.level != PT_PAGE_TABLE_LEVEL &&
+ !list_empty(&vcpu->kvm->arch.oos_global_pages))
+ set_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests);
+
pgprintk("%s: found\n", __func__);
return sp;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ea8262..48169d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
kvm_write_guest_time(vcpu);
if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
kvm_mmu_sync_roots(vcpu);
+ if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
+ kvm_mmu_sync_global(vcpu);
if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
kvm_x86_ops->tlb_flush(vcpu);
if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 11eb702..8efd6e3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -37,7 +37,8 @@
#define KVM_REQ_PENDING_TIMER 5
#define KVM_REQ_UNHALT 6
#define KVM_REQ_MMU_SYNC 7
-#define KVM_REQ_KVMCLOCK_UPDATE 8
+#define KVM_REQ_MMU_GLOBAL_SYNC 8
+#define KVM_REQ_KVMCLOCK_UPDATE 9
#define KVM_USERSPACE_IRQ_SOURCE_ID 0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-03-23 17:27 ` Marcelo Tosatti
@ 2009-03-24 9:47 ` Avi Kivity
2009-03-24 11:49 ` Marcelo Tosatti
2009-04-03 21:45 ` Marcelo Tosatti
2009-03-24 10:39 ` Aurelien Jarno
1 sibling, 2 replies; 21+ messages in thread
From: Avi Kivity @ 2009-03-24 9:47 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Aurelien Jarno, kvm
Marcelo Tosatti wrote:
>> Maybe it's best to resync when relinking a global page?
>>
>
> How about this. It will shorten the unsync period of global pages,
> unfortunately.
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2a36f7f..bccdcc7 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
> kvm_mmu_mark_parents_unsync(vcpu, sp);
> }
> + if (role.level != PT_PAGE_TABLE_LEVEL &&
> + !list_empty(&vcpu->kvm->arch.oos_global_pages))
> + set_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests);
> +
> pgprintk("%s: found\n", __func__);
> return sp;
> }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2ea8262..48169d7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> kvm_write_guest_time(vcpu);
> if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
> kvm_mmu_sync_roots(vcpu);
> + if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
> + kvm_mmu_sync_global(vcpu);
> if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
> kvm_x86_ops->tlb_flush(vcpu);
> if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
Windows will (I think) write a PDE on every context switch, so this
effectively disables global unsync for that guest.
What about recursively syncing the newly linked page in FNAME(fetch)()?
If the page isn't global, this becomes a no-op, so no new overhead. The
only question is the expense when linking a populated top-level page,
especially in long mode.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-03-23 17:27 ` Marcelo Tosatti
2009-03-24 9:47 ` Avi Kivity
@ 2009-03-24 10:39 ` Aurelien Jarno
1 sibling, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2009-03-24 10:39 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, kvm
On Mon, Mar 23, 2009 at 02:27:25PM -0300, Marcelo Tosatti wrote:
> On Sun, Mar 22, 2009 at 11:35:00AM +0200, Avi Kivity wrote:
> > Good catch, indeed. But is it sufficient? We could unlink a page
> > through other means, for example by the guest zapping a page directory
> > entry.
>
> Yep.
>
> > Maybe it's best to resync when relinking a global page?
>
> How about this. It will shorten the unsync period of global pages,
> unfortunately.
JFYI, it still fixes the problem seen with FreeBSD.
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2a36f7f..bccdcc7 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
> kvm_mmu_mark_parents_unsync(vcpu, sp);
> }
> + if (role.level != PT_PAGE_TABLE_LEVEL &&
> + !list_empty(&vcpu->kvm->arch.oos_global_pages))
> + set_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests);
> +
> pgprintk("%s: found\n", __func__);
> return sp;
> }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2ea8262..48169d7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> kvm_write_guest_time(vcpu);
> if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
> kvm_mmu_sync_roots(vcpu);
> + if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
> + kvm_mmu_sync_global(vcpu);
> if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
> kvm_x86_ops->tlb_flush(vcpu);
> if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 11eb702..8efd6e3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -37,7 +37,8 @@
> #define KVM_REQ_PENDING_TIMER 5
> #define KVM_REQ_UNHALT 6
> #define KVM_REQ_MMU_SYNC 7
> -#define KVM_REQ_KVMCLOCK_UPDATE 8
> +#define KVM_REQ_MMU_GLOBAL_SYNC 8
> +#define KVM_REQ_KVMCLOCK_UPDATE 9
>
> #define KVM_USERSPACE_IRQ_SOURCE_ID 0
>
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-03-24 9:47 ` Avi Kivity
@ 2009-03-24 11:49 ` Marcelo Tosatti
2009-04-03 21:45 ` Marcelo Tosatti
1 sibling, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2009-03-24 11:49 UTC (permalink / raw)
To: Avi Kivity; +Cc: Aurelien Jarno, kvm
On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> Maybe it's best to resync when relinking a global page?
>>>
>>
>> How about this. It will shorten the unsync period of global pages,
>> unfortunately.
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2a36f7f..bccdcc7 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>> set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
>> kvm_mmu_mark_parents_unsync(vcpu, sp);
>> }
>> + if (role.level != PT_PAGE_TABLE_LEVEL &&
>> + !list_empty(&vcpu->kvm->arch.oos_global_pages))
>> + set_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests);
>> +
>> pgprintk("%s: found\n", __func__);
>> return sp;
>> }
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2ea8262..48169d7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> kvm_write_guest_time(vcpu);
>> if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
>> kvm_mmu_sync_roots(vcpu);
>> + if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
>> + kvm_mmu_sync_global(vcpu);
>> if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
>> kvm_x86_ops->tlb_flush(vcpu);
>> if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
>
> Windows will (I think) write a PDE on every context switch, so this
> effectively disables global unsync for that guest.
>
> What about recursively syncing the newly linked page in FNAME(fetch)()?
> If the page isn't global, this becomes a no-op, so no new overhead. The
> only question is the expense when linking a populated top-level page,
> especially in long mode.
Yes, I started doing that but it touches the nice fastpath in fetch().
I'll see if I can come up with something and with numbers.
Thanks
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-03-24 9:47 ` Avi Kivity
2009-03-24 11:49 ` Marcelo Tosatti
@ 2009-04-03 21:45 ` Marcelo Tosatti
2009-04-04 10:37 ` Avi Kivity
2009-04-04 23:23 ` Aurelien Jarno
1 sibling, 2 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2009-04-03 21:45 UTC (permalink / raw)
To: Avi Kivity; +Cc: Aurelien Jarno, kvm
On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
>> index 2ea8262..48169d7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> kvm_write_guest_time(vcpu);
>> if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
>> kvm_mmu_sync_roots(vcpu);
>> + if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
>> + kvm_mmu_sync_global(vcpu);
>> if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
>> kvm_x86_ops->tlb_flush(vcpu);
>> if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
>
> Windows will (I think) write a PDE on every context switch, so this
> effectively disables global unsync for that guest.
>
> What about recursively syncing the newly linked page in FNAME(fetch)()?
> If the page isn't global, this becomes a no-op, so no new overhead. The
> only question is the expense when linking a populated top-level page,
> especially in long mode.
How about this?
KVM: MMU: sync global pages on fetch()
If an unsync global page becomes unreachable via the shadow tree, which
can happen if one its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.
So sync global pages in fetch().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 09782a9..728be72 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
break;
}
- if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+ if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
+ if (level-1 == PT_PAGE_TABLE_LEVEL) {
+ shadow_page = page_header(__pa(sptep));
+ if (shadow_page->unsync && shadow_page->global)
+ kvm_sync_page(vcpu, shadow_page);
+ }
continue;
+ }
if (is_large_pte(*sptep)) {
rmap_remove(vcpu->kvm, sptep);
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-04-03 21:45 ` Marcelo Tosatti
@ 2009-04-04 10:37 ` Avi Kivity
2009-04-04 17:01 ` Marcelo Tosatti
2009-04-04 23:23 ` Aurelien Jarno
1 sibling, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2009-04-04 10:37 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Aurelien Jarno, kvm
Marcelo Tosatti wrote:
> On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
>
>>> index 2ea8262..48169d7 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>> kvm_write_guest_time(vcpu);
>>> if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
>>> kvm_mmu_sync_roots(vcpu);
>>> + if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
>>> + kvm_mmu_sync_global(vcpu);
>>> if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
>>> kvm_x86_ops->tlb_flush(vcpu);
>>> if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
>>>
>> Windows will (I think) write a PDE on every context switch, so this
>> effectively disables global unsync for that guest.
>>
>> What about recursively syncing the newly linked page in FNAME(fetch)()?
>> If the page isn't global, this becomes a no-op, so no new overhead. The
>> only question is the expense when linking a populated top-level page,
>> especially in long mode.
>>
>
> How about this?
>
> KVM: MMU: sync global pages on fetch()
>
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
>
> So sync global pages in fetch().
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 09782a9..728be72 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
> break;
> }
>
> - if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
> + if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
> + if (level-1 == PT_PAGE_TABLE_LEVEL) {
> + shadow_page = page_header(__pa(sptep));
> + if (shadow_page->unsync && shadow_page->global)
> + kvm_sync_page(vcpu, shadow_page);
> + }
> continue;
> + }
>
> if (is_large_pte(*sptep)) {
> rmap_remove(vcpu->kvm, sptep);
>
But here the shadow page is already linked? Isn't the root cause that
an invlpg was called when the page wasn't linked, so it wasn't seen by
invlpg?
So I thought the best place would be in fetch(), after
kvm_mmu_get_page(). If we're linking a page which contains global ptes,
they might be unsynced due to invlpgs that we've missed.
Or am I missing something about the root cause?
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-04-04 10:37 ` Avi Kivity
@ 2009-04-04 17:01 ` Marcelo Tosatti
2009-04-05 8:41 ` Avi Kivity
0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2009-04-04 17:01 UTC (permalink / raw)
To: Avi Kivity; +Cc: Aurelien Jarno, kvm
On Sat, Apr 04, 2009 at 01:37:39PM +0300, Avi Kivity wrote:
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 09782a9..728be72 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>> break;
>> }
>> - if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
>> + if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
>> + if (level-1 == PT_PAGE_TABLE_LEVEL) {
>> + shadow_page = page_header(__pa(sptep));
>> + if (shadow_page->unsync && shadow_page->global)
>> + kvm_sync_page(vcpu, shadow_page);
>> + }
>> continue;
>> + }
>> if (is_large_pte(*sptep)) {
>> rmap_remove(vcpu->kvm, sptep);
>>
>
> But here the shadow page is already linked? Isn't the root cause that
> an invlpg was called when the page wasn't linked, so it wasn't seen by
> invlpg?
>
> So I thought the best place would be in fetch(), after
> kvm_mmu_get_page(). If we're linking a page which contains global ptes,
> they might be unsynced due to invlpgs that we've missed.
>
> Or am I missing something about the root cause?
The problem is when the page is unreachable due to a higher level path
being unlinked. Say:
level 4 -> level 3 . level 2 -> level 1 (global unsync)
The dot there means level 3 is not linked to level 2, so invlpg can't
reach the global unsync at level 1.
kvm_mmu_get_page does sync pages when it finds them, so the code is
already safe for the "linking a page which contains global ptes" case
you mention above.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-04-03 21:45 ` Marcelo Tosatti
2009-04-04 10:37 ` Avi Kivity
@ 2009-04-04 23:23 ` Aurelien Jarno
1 sibling, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2009-04-04 23:23 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, kvm
On Fri, Apr 03, 2009 at 06:45:48PM -0300, Marcelo Tosatti wrote:
> On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
> >> index 2ea8262..48169d7 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> >> kvm_write_guest_time(vcpu);
> >> if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
> >> kvm_mmu_sync_roots(vcpu);
> >> + if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, &vcpu->requests))
> >> + kvm_mmu_sync_global(vcpu);
> >> if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
> >> kvm_x86_ops->tlb_flush(vcpu);
> >> if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
> >
> > Windows will (I think) write a PDE on every context switch, so this
> > effectively disables global unsync for that guest.
> >
> > What about recursively syncing the newly linked page in FNAME(fetch)()?
> > If the page isn't global, this becomes a no-op, so no new overhead. The
> > only question is the expense when linking a populated top-level page,
> > especially in long mode.
>
> How about this?
>
> KVM: MMU: sync global pages on fetch()
>
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
>
> So sync global pages in fetch().
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
I have tried this patch, and unfortunately it does not solve the
original problem, while the previous one did.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-04-04 17:01 ` Marcelo Tosatti
@ 2009-04-05 8:41 ` Avi Kivity
2009-04-05 11:29 ` Marcelo Tosatti
0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2009-04-05 8:41 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Aurelien Jarno, kvm
Marcelo Tosatti wrote:
> The problem is when the page is unreachable due to a higher level path
> being unlinked. Say:
>
> level 4 -> level 3 . level 2 -> level 1 (global unsync)
>
> The dot there means level 3 is not linked to level 2, so invlpg can't
> reach the global unsync at level 1.
>
> kvm_mmu_get_page does sync pages when it finds them, so the code is
> already safe for the "linking a page which contains global ptes" case
> you mention above.
>
The recursive resync ignores global pages (or it would hit them on cr3
switch too).
But we have a bigger problem, invlpg can miss even if nothing is unlinked:
access address x through global pte
-> pte instantiated into spte
switch to cr3 where x is not mapped, or mapped differently
write to pte
-> no fault since pte is unsync
invlpg x
-> no way this can hit the spte
switch cr3 back
access address x
-> use old spte
Here's one way to make this work:
- add a hash of global pagetables, indexed by virtual address instead
of the pagetable's gfn
- invlpg checks this hash in addition to the recursive walk
We'd need to make the virtual address part of sp->role to avoid needing
to link the same page multiple times in the virtual address hash.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-04-05 8:41 ` Avi Kivity
@ 2009-04-05 11:29 ` Marcelo Tosatti
2009-04-05 11:41 ` Avi Kivity
0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2009-04-05 11:29 UTC (permalink / raw)
To: Avi Kivity; +Cc: Aurelien Jarno, kvm
On Sun, Apr 05, 2009 at 11:41:39AM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> The problem is when the page is unreachable due to a higher level path
>> being unlinked. Say:
>>
>> level 4 -> level 3 . level 2 -> level 1 (global unsync)
>>
>> The dot there means level 3 is not linked to level 2, so invlpg can't
>> reach the global unsync at level 1.
>>
>> kvm_mmu_get_page does sync pages when it finds them, so the code is
>> already safe for the "linking a page which contains global ptes" case
>> you mention above.
>>
>
> The recursive resync ignores global pages (or it would hit them on cr3
> switch too).
>
> But we have a bigger problem, invlpg can miss even if nothing is unlinked:
>
> access address x through global pte
> -> pte instantiated into spte
> switch to cr3 where x is not mapped, or mapped differently
> write to pte
> -> no fault since pte is unsync
> invlpg x
> -> no way this can hit the spte
> switch cr3 back
> access address x
> -> use old spte
>
> Here's one way to make this work:
>
> - add a hash of global pagetables, indexed by virtual address instead
> of the pagetable's gfn
> - invlpg checks this hash in addition to the recursive walk
>
> We'd need to make the virtual address part of sp->role to avoid needing
> to link the same page multiple times in the virtual address hash.
Humpf, yes. It seems its too expensive/complex to handle this, for such
small gain (~= 2% on AIM7 with RHEL3 guest).
Are you okay with just disabling the global pages optimization?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest
2009-04-05 11:29 ` Marcelo Tosatti
@ 2009-04-05 11:41 ` Avi Kivity
0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2009-04-05 11:41 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Aurelien Jarno, kvm
Marcelo Tosatti wrote:
>> Here's one way to make this work:
>>
>> - add a hash of global pagetables, indexed by virtual address instead
>> of the pagetable's gfn
>> - invlpg checks this hash in addition to the recursive walk
>>
>> We'd need to make the virtual address part of sp->role to avoid needing
>> to link the same page multiple times in the virtual address hash.
>>
>
> Humpf, yes. It seems its too expensive/complex to handle this, for such
> small gain (~= 2% on AIM7 with RHEL3 guest).
>
> Are you okay with just disabling the global pages optimization?
>
Definitely to plug the hole; and probably for later as well, unless
people cry out due to regressions.
Please send it in two patches: one a trivial one to disable global page
detection which can be sent to -stable as well, and a follow on which
rips out the global page machinery until (and if) we decide to
reimplement it correctly.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2009-04-05 11:41 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-23 0:33 cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest Aurelien Jarno
2009-02-23 1:47 ` Marcelo Tosatti
2009-02-23 14:01 ` Aurelien Jarno
2009-02-23 14:52 ` Marcelo Tosatti
2009-02-23 14:59 ` Avi Kivity
2009-02-23 15:06 ` Marcelo Tosatti
2009-02-23 15:16 ` Avi Kivity
2009-03-20 23:14 ` Marcelo Tosatti
2009-03-21 8:51 ` Aurelien Jarno
2009-03-22 9:35 ` Avi Kivity
2009-03-23 17:27 ` Marcelo Tosatti
2009-03-24 9:47 ` Avi Kivity
2009-03-24 11:49 ` Marcelo Tosatti
2009-04-03 21:45 ` Marcelo Tosatti
2009-04-04 10:37 ` Avi Kivity
2009-04-04 17:01 ` Marcelo Tosatti
2009-04-05 8:41 ` Avi Kivity
2009-04-05 11:29 ` Marcelo Tosatti
2009-04-05 11:41 ` Avi Kivity
2009-04-04 23:23 ` Aurelien Jarno
2009-03-24 10:39 ` Aurelien Jarno
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox