public inbox for kvm@vger.kernel.org
* Out of sync shadow core breaks Hurd
From: Aurelien Jarno @ 2008-11-12 19:00 UTC
  To: kvm; +Cc: Marcelo Tosatti

Hi,

Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
under KVM. The ext2fs translator issues a strange error message:

| Hurd server bootstrap: ext2fs.static[device:hd0s3] exec
| ext2fs.static: /build/buildd/hurd-20080607/build-tree/hurd/ext2fs/dir.c:494: dirscanblock: Assertion `dp->dn->dirents[idx] == -1 || dp->dn->dirents[idx] == nentries' failed.

Bisecting the problem, I have found that it comes from this patch:

| 641fb03992b20aa640781a245f6b7136f0b845e4 is first bad commit
| commit 641fb03992b20aa640781a245f6b7136f0b845e4
| Author: Marcelo Tosatti <mtosatti@redhat.com>
| Date:   Tue Sep 23 13:18:39 2008 -0300
| 
|     KVM: MMU: out of sync shadow core v2
| 
|     Allow guest pagetables to go out of sync.
| 
|     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|     Signed-off-by: Avi Kivity <avi@redhat.com>

The problem can be worked around by loading the kvm module with
oos_shadow=0.

The easiest way to reproduce the problem is to download a ready-to-use
Hurd image [1]. The error message from the ext2fs translator is not
exactly the same, but it still fails.

Aurelien

[1] http://ftp.debian-ports.org/debian-cd/hurd-i386/current/debian-hurd-k16-qemu.img.tar.gz

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

* Re: Out of sync shadow core breaks Hurd
From: Marcelo Tosatti @ 2008-11-15 12:15 UTC
  To: Aurelien Jarno; +Cc: kvm

On Wed, Nov 12, 2008 at 08:00:37PM +0100, Aurelien Jarno wrote:
> Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
> under KVM. [...]

Thanks Aurelien, I'll be looking at this next week.

* Re: Out of sync shadow core breaks Hurd
From: Marcelo Tosatti @ 2008-11-20  9:48 UTC
  To: Aurelien Jarno; +Cc: kvm

Hi Aurelien,

On Wed, Nov 12, 2008 at 08:00:37PM +0100, Aurelien Jarno wrote:
> Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
> under KVM. [...]

It seems Hurd does not always explicitly flush the TLB via cr0/cr3/cr4
writes or invlpg after updating pagetables. Debugging shows that OOS is
properly syncing the sptes wrt the guest pagetables, and that all pages
are synced before guest re-entry on TLB flush exits.

The Intel TLB doc says (5.1 "Invalidation Instructions"):

(Other instructions and operations may invalidate entries in the TLBs
and the paging structure caches, but the instructions identified above
are recommended.)
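
A deliberately simplified model of what is described above may help. It is a sketch under stated assumptions, not KVM's MMU code: the hardware is assumed to walk only the shadow table, a write-protected guest table is mirrored into the shadow on every guest store, and an out-of-sync (unprotected) guest table is copied into the shadow only when the guest executes an invalidation that exits (a cr3 write or invlpg). All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPTE 4 /* hypothetical: a page table with only four entries */

/* Hypothetical model of one guest page table and its shadow copy. */
struct mmu_model {
    unsigned guest_pt[NPTE];  /* entries as written by the guest kernel */
    unsigned shadow_pt[NPTE]; /* entries the hardware MMU actually walks */
    bool oos;                 /* true: guest table unprotected (out of sync) */
};

static void mmu_init(struct mmu_model *m, bool oos)
{
    memset(m, 0, sizeof(*m));
    m->oos = oos;
}

/* Guest store to one of its own page table entries. */
static void guest_write_pte(struct mmu_model *m, int idx, unsigned val)
{
    assert(idx >= 0 && idx < NPTE);
    m->guest_pt[idx] = val;
    /* Write-protected table: the store traps and the host updates the
     * shadow entry at once. Out-of-sync table: the store does not trap,
     * so the shadow keeps its old value for now. */
    if (!m->oos)
        m->shadow_pt[idx] = val;
}

/* Guest cr3 write or invlpg: exits to the host, which resyncs the shadow.
 * Simplification: the whole table is resynced, not a single entry. */
static void guest_flush_tlb(struct mmu_model *m)
{
    memcpy(m->shadow_pt, m->guest_pt, sizeof(m->shadow_pt));
}

/* Translation the guest actually gets: read from the shadow table. */
static unsigned guest_translate(const struct mmu_model *m, int idx)
{
    assert(idx >= 0 && idx < NPTE);
    return m->shadow_pt[idx];
}
```

In this model, a guest that stores a new PTE but never flushes keeps translating through the old shadow entry when `oos` is true, whereas with `oos` false (loosely what oos_shadow=0 restores) the store is visible immediately.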

As a test, syncing on every exit makes it happy:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7a2aeba..47e2550 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3052,6 +3052,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	kvm_lapic_sync_from_vapic(vcpu);
 
+	kvm_mmu_sync_roots(vcpu);
+
 	r = kvm_x86_ops->handle_exit(kvm_run, vcpu);
 out:
 	return r;

This would need to be confirmed by hacking Hurd to flush on every
pagetable update, perhaps with something like:

RCS file: /sources/hurd/gnumach/i386/intel/pmap.c,v
retrieving revision 1.4.2.22
diff -u -r1.4.2.22 pmap.c
--- pmap.c  11 Nov 2008 02:24:18 -0000  1.4.2.22
+++ pmap.c  20 Nov 2008 12:47:01 -0000
@@ -82,7 +82,7 @@
 #include <i386/proc_reg.h>
 #include <i386/locore.h>
 
-#define    WRITE_PTE(pte_p, pte_entry)     *(pte_p) = (pte_entry);
+#define    WRITE_PTE(pte_p, pte_entry)     *(pte_p) = (pte_entry); flush_tlb();
 
 /*
  * Private data structures.
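
As a side note on the proposed macro: it expands to two statements, so a use such as `if (cond) WRITE_PTE(p, e); else ...` would not compile, and without an `else` the flush would run unconditionally. A conventional fix is the `do { } while (0)` wrapper, sketched below assuming that callers end each use with a semicolon. `pt_entry_t`, `flush_tlb()` and `maybe_update()` are hypothetical user-space stand-ins here, not the GNU Mach definitions.

```c
#include <assert.h>

/* Stand-ins for this sketch only: in GNU Mach, pt_entry_t and flush_tlb()
 * come from the kernel headers; these hypothetical substitutes let the
 * example build in user space. */
typedef unsigned int pt_entry_t;
static int tlb_flushes; /* counts calls to the stand-in flush_tlb() */
static void flush_tlb(void) { tlb_flushes++; }

/* Store the PTE, then flush, as a single statement: the do { } while (0)
 * wrapper keeps both actions together when the macro is the body of an
 * if/else without braces. */
#define WRITE_PTE(pte_p, pte_entry)  \
    do {                             \
        *(pte_p) = (pte_entry);      \
        flush_tlb();                 \
    } while (0)

/* Hypothetical caller that updates a PTE only when cond is non-zero. */
static void maybe_update(pt_entry_t *pte, pt_entry_t val, int cond)
{
    if (cond)
        WRITE_PTE(pte, val);
    else
        *pte = 0;
}
```

With this form, the flush happens only on the branch that actually writes the PTE.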


* Re: Out of sync shadow core breaks Hurd
From: Aurelien Jarno @ 2008-11-25  9:57 UTC
  To: Marcelo Tosatti; +Cc: kvm

On Thu, Nov 20, 2008 at 10:48:21AM +0100, Marcelo Tosatti wrote:
> Hi Aurelien,
Hi,

> It seems Hurd does not always explicitly flush the TLB via cr0/cr3/cr4
> writes or invlpg after updating pagetables. Debugging shows that OOS is
> properly syncing the sptes wrt the guest pagetables, and that all pages
> are synced before guest re-entry on TLB flush exits.

Thanks for your investigation.

> It would be necessary to confirm this by hacking Hurd to flush on every
> pagetable update. Perhaps something like
> 
> RCS file: /sources/hurd/gnumach/i386/intel/pmap.c,v
> retrieving revision 1.4.2.22
> diff -u -r1.4.2.22 pmap.c
> --- pmap.c  11 Nov 2008 02:24:18 -0000  1.4.2.22
> +++ pmap.c  20 Nov 2008 12:47:01 -0000
> @@ -82,7 +82,7 @@
>  #include <i386/proc_reg.h>
>  #include <i386/locore.h>
>  
> -#define    WRITE_PTE(pte_p, pte_entry)     *(pte_p) = (pte_entry);
> +#define    WRITE_PTE(pte_p, pte_entry)     *(pte_p) = (pte_entry); flush_tlb();
>  
>  /*
>   * Private data structures.
> 
> 

I have tried this patch, but it doesn't change anything. I'll check
whether there are other places where the PTE is written.

* Re: Out of sync shadow core breaks Hurd
From: Aurelien Jarno @ 2008-11-25 16:52 UTC
  To: Marcelo Tosatti; +Cc: kvm

On Tue, Nov 25, 2008 at 10:57:17AM +0100, Aurelien Jarno wrote:
> On Thu, Nov 20, 2008 at 10:48:21AM +0100, Marcelo Tosatti wrote:
> > It seems Hurd does not always explicitly flush the TLB via cr0/cr3/cr4
> > writes or invlpg after updating pagetables. Debugging shows that OOS is
> > properly syncing the sptes wrt the guest pagetables, and that all pages
> > are synced before guest re-entry on TLB flush exits.
> 

Looking more closely at the code, Hurd (actually GNU Mach) flushes the
TLB via cr3, but just *before* updating the pagetables. I have no idea
why it is done that way, but it seems to be correct given the way the
Intel MMU works. However, it does not follow Intel's recommendations
("5.2 Recommended Invalidation"), which, if I understand correctly,
were taken as the basis for implementing the out-of-sync shadow code.

I have confirmed this by patching the GNU Mach code so that the TLB is
flushed both before and after modifying the pagetables.
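
The ordering problem can be sketched with a minimal model, assuming (as a simplification, not a description of KVM's or GNU Mach's actual code) that guest page table stores are not trapped and that the shadow copy the MMU uses is refreshed only when the guest flushes the TLB. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NPTE 4 /* hypothetical: a page table with only four entries */

/* Hypothetical model of an out-of-sync shadow page table: guest stores
 * go only to guest_pt; shadow_pt (what the MMU uses) is refreshed on a
 * guest TLB flush such as a cr3 reload. */
struct oos_model {
    unsigned guest_pt[NPTE];
    unsigned shadow_pt[NPTE];
};

/* Guest TLB flush: the host resyncs the shadow from the guest table. */
static void oos_flush(struct oos_model *m)
{
    memcpy(m->shadow_pt, m->guest_pt, sizeof(m->shadow_pt));
}

/* Order observed in GNU Mach: flush first, then modify the page table. */
static void update_flush_before(struct oos_model *m, int idx, unsigned val)
{
    assert(idx >= 0 && idx < NPTE);
    oos_flush(m);
    m->guest_pt[idx] = val; /* not seen by the shadow until the next flush */
}

/* Order used to confirm the diagnosis: flush before and after the store. */
static void update_flush_both(struct oos_model *m, int idx, unsigned val)
{
    assert(idx >= 0 && idx < NPTE);
    oos_flush(m);
    m->guest_pt[idx] = val;
    oos_flush(m); /* resyncs the shadow with the new entry */
}
```

Calling `update_flush_before()` on a zeroed model leaves `shadow_pt[2]` at 0 even though `guest_pt[2]` holds the new value, which is the stale state the guest would run with until its next flush; `update_flush_both()` leaves both entries equal.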
