* Out of sync shadow core breaks Hurd
@ 2008-11-12 19:00 Aurelien Jarno
  2008-11-15 12:15 ` Marcelo Tosatti
  2008-11-20  9:48 ` Marcelo Tosatti
  0 siblings, 2 replies; 5+ messages in thread
From: Aurelien Jarno @ 2008-11-12 19:00 UTC
  To: kvm; +Cc: Marcelo Tosatti

Hi,

Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
under KVM. The ext2fs translator issues a strange error message:

| Hurd server bootstrap: ext2fs.static[device:hd0s3] execext2fs.static: /build/bui
| ldd/hurd-20080607/build-tree/hurd/ext2fs/dir.c:494: dirscanblock: Assertion `dp-
| >dn->dirents[idx] == -1 || dp->dn->dirents[idx] == nentries' failed.

Bisecting the problem, I have found that it comes from this patch:

| 641fb03992b20aa640781a245f6b7136f0b845e4 is first bad commit
| commit 641fb03992b20aa640781a245f6b7136f0b845e4
| Author: Marcelo Tosatti <mtosatti@redhat.com>
| Date:   Tue Sep 23 13:18:39 2008 -0300
|
|     KVM: MMU: out of sync shadow core v2
|
|     Allow guest pagetables to go out of sync.
|
|     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|     Signed-off-by: Avi Kivity <avi@redhat.com>

The problem can be worked around by loading the kvm module with
oos_shadow=0.

The easiest way to reproduce the problem is to download a ready-to-use
Hurd image [1]. The error message from the ext2fs translator is not
exactly the same, but it still fails.

Aurelien

[1] http://ftp.debian-ports.org/debian-cd/hurd-i386/current/debian-hurd-k16-qemu.img.tar.gz

-- 
  .''`.  Aurelien Jarno             | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net

* Re: Out of sync shadow core breaks Hurd
  2008-11-12 19:00 Out of sync shadow core breaks Hurd Aurelien Jarno
@ 2008-11-15 12:15 ` Marcelo Tosatti
  2008-11-20  9:48 ` Marcelo Tosatti
  1 sibling, 0 replies; 5+ messages in thread
From: Marcelo Tosatti @ 2008-11-15 12:15 UTC
  To: Aurelien Jarno; +Cc: kvm

On Wed, Nov 12, 2008 at 08:00:37PM +0100, Aurelien Jarno wrote:
> Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
> under KVM. The ext2fs translator issues a strange error message:
>
> [...]
>
> The easiest way to reproduce the problem is to download a ready to use
> Hurd image [1]. The error message from the ext2fs translator is not
> exactly the same, but it still fails.

Thanks Aurelien, I'll be looking at this next week.

* Re: Out of sync shadow core breaks Hurd
  2008-11-12 19:00 Out of sync shadow core breaks Hurd Aurelien Jarno
  2008-11-15 12:15 ` Marcelo Tosatti
@ 2008-11-20  9:48 ` Marcelo Tosatti
  2008-11-25  9:57   ` Aurelien Jarno
  1 sibling, 1 reply; 5+ messages in thread
From: Marcelo Tosatti @ 2008-11-20  9:48 UTC
  To: Aurelien Jarno; +Cc: kvm

Hi Aurelien,

On Wed, Nov 12, 2008 at 08:00:37PM +0100, Aurelien Jarno wrote:
> Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
> under KVM. The ext2fs translator issues a strange error message:
>
> [...]
>
> Bisecting the problem, I have found that it comes from this patch:
>
> | 641fb03992b20aa640781a245f6b7136f0b845e4 is first bad commit
> |     KVM: MMU: out of sync shadow core v2
>
> [...]

It seems Hurd does not always explicitly flush the TLB via cr0/cr3/cr4
writes or invlpg after updating pagetables. Debugging shows that OOS is
properly syncing the sptes wrt the guest pagetables, and that all pages
are synced before guest re-entry on TLB flush exits.

The Intel TLB doc says (5.1 "Invalidation Instructions"):

  (Other instructions and operations may invalidate entries in the TLBs
  and the paging structure caches, but the instructions identified above
  are recommended.)

As a test, syncing on every exit makes it happy:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7a2aeba..47e2550 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3052,6 +3052,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	kvm_lapic_sync_from_vapic(vcpu);
 
+	kvm_mmu_sync_roots(vcpu);
+
 	r = kvm_x86_ops->handle_exit(kvm_run, vcpu);
 out:
 	return r;

It would be necessary to confirm this by hacking Hurd to flush on every
pagetable update. Perhaps something like

RCS file: /sources/hurd/gnumach/i386/intel/pmap.c,v
retrieving revision 1.4.2.22
diff -u -r1.4.2.22 pmap.c
--- pmap.c	11 Nov 2008 02:24:18 -0000	1.4.2.22
+++ pmap.c	20 Nov 2008 12:47:01 -0000
@@ -82,7 +82,7 @@
 #include <i386/proc_reg.h>
 #include <i386/locore.h>
 
-#define WRITE_PTE(pte_p, pte_entry) *(pte_p) = (pte_entry);
+#define WRITE_PTE(pte_p, pte_entry) *(pte_p) = (pte_entry); flush_tlb();
 
 /*
  * Private data structures.

* Re: Out of sync shadow core breaks Hurd
  2008-11-20  9:48 ` Marcelo Tosatti
@ 2008-11-25  9:57   ` Aurelien Jarno
  2008-11-25 16:52     ` Aurelien Jarno
  0 siblings, 1 reply; 5+ messages in thread
From: Aurelien Jarno @ 2008-11-25  9:57 UTC
  To: Marcelo Tosatti; +Cc: kvm

On Thu, Nov 20, 2008 at 10:48:21AM +0100, Marcelo Tosatti wrote:
> Hi Aurelien,

Hi,

> On Wed, Nov 12, 2008 at 08:00:37PM +0100, Aurelien Jarno wrote:
> > Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
> > under KVM. The ext2fs translator issues a strange error message:
> >
> > [...]
>
> It seems Hurd does not always explicitly flush the TLB via cr0/cr3/cr4
> writes or invlpg after updating pagetables. Debugging shows that OOS is
> properly syncing the sptes wrt the guest pagetables, and that all pages
> are synced before guest re-entry on TLB flush exits.

Thanks for your investigation.

> The Intel TLB doc says (5.1 "Invalidation Instructions"):
>
>   (Other instructions and operations may invalidate entries in the TLBs
>   and the paging structure caches, but the instructions identified above
>   are recommended.)
>
> As a test, syncing on every exit makes it happy:
>
> [...]
>
> It would be necessary to confirm this by hacking Hurd to flush on every
> pagetable update. Perhaps something like
>
> RCS file: /sources/hurd/gnumach/i386/intel/pmap.c,v
> retrieving revision 1.4.2.22
> diff -u -r1.4.2.22 pmap.c
> --- pmap.c	11 Nov 2008 02:24:18 -0000	1.4.2.22
> +++ pmap.c	20 Nov 2008 12:47:01 -0000
> @@ -82,7 +82,7 @@
>  #include <i386/proc_reg.h>
>  #include <i386/locore.h>
>
> -#define WRITE_PTE(pte_p, pte_entry) *(pte_p) = (pte_entry);
> +#define WRITE_PTE(pte_p, pte_entry) *(pte_p) = (pte_entry); flush_tlb();
>
>  /*
>   * Private data structures.

I have tried this patch, but it doesn't change anything. I'll try to see
if there are more places where the PTE is written.

Aurelien

* Re: Out of sync shadow core breaks Hurd
  2008-11-25  9:57   ` Aurelien Jarno
@ 2008-11-25 16:52     ` Aurelien Jarno
  0 siblings, 0 replies; 5+ messages in thread
From: Aurelien Jarno @ 2008-11-25 16:52 UTC
  To: Marcelo Tosatti; +Cc: kvm

On Tue, Nov 25, 2008 at 10:57:17AM +0100, Aurelien Jarno wrote:
> On Thu, Nov 20, 2008 at 10:48:21AM +0100, Marcelo Tosatti wrote:
> > On Wed, Nov 12, 2008 at 08:00:37PM +0100, Aurelien Jarno wrote:
> > > Starting with kvm-76 (and including kvm-79), Hurd does not boot anymore
> > > under KVM. The ext2fs translator issues a strange error message:
> > >
> > > [...]
> >
> > It seems Hurd does not always explicitly flush the TLB via cr0/cr3/cr4
> > writes or invlpg after updating pagetables.
> > Debugging shows that OOS is properly syncing the sptes wrt the guest
> > pagetables, and that all pages are synced before guest re-entry on TLB
> > flush exits.

Looking more closely at the code, Hurd (actually GNU Mach) flushes the
TLB via cr3, but just *before* updating the pagetables. I have no idea
why it is done that way, but it seems to be correct given the way the
Intel MMU works. However, it fails to comply with the recommendations
from Intel ("5.2 Recommended Invalidation"), which, if I understand
correctly, were taken as the basis for implementing out of sync shadow.

I have confirmed this by patching the GNU Mach code so that the TLB is
flushed both before and after modifying the pagetables.

Aurelien
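
The ordering problem described in the last message can be illustrated
with a toy model. This is only a sketch under loud assumptions: the
struct and function names (toy_vm, guest_write_pte, guest_flush_tlb and
so on) are hypothetical and are not KVM or GNU Mach code, the "page
table" is a flat array, and the non-OOS case is simplified to "every
guest PTE write is mirrored into the shadow immediately". It only shows
why a flush issued before the PTE update leaves the shadow stale once
the shadow is resynced solely on flushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4

/* Toy model only: illustrative names, not KVM or GNU Mach code. */
struct toy_vm {
	int guest_pt[NPAGES];  /* PTEs as the guest wrote them (vpage -> frame) */
	int shadow_pt[NPAGES]; /* shadow copy that translation actually uses */
	bool oos;              /* true: shadow may lag until the next flush */
};

void toy_vm_init(struct toy_vm *vm, bool oos)
{
	memset(vm, 0, sizeof(*vm));
	vm->oos = oos;
}

/*
 * Guest stores a PTE. Assumption: without OOS the write is trapped and
 * mirrored into the shadow at once; with OOS the shadow is left unsynced.
 */
void guest_write_pte(struct toy_vm *vm, int vpage, int frame)
{
	assert(vpage >= 0 && vpage < NPAGES);
	vm->guest_pt[vpage] = frame;
	if (!vm->oos)
		vm->shadow_pt[vpage] = frame;
}

/* Guest reloads cr3: the shadow is resynced before the guest runs again. */
void guest_flush_tlb(struct toy_vm *vm)
{
	memcpy(vm->shadow_pt, vm->guest_pt, sizeof(vm->shadow_pt));
}

/* Translation always goes through the shadow, never the guest table. */
int guest_translate(const struct toy_vm *vm, int vpage)
{
	assert(vpage >= 0 && vpage < NPAGES);
	return vm->shadow_pt[vpage];
}

/* Flush first, then update: the order reported for GNU Mach. */
int remap_flush_before(struct toy_vm *vm, int vpage, int frame)
{
	guest_flush_tlb(vm);
	guest_write_pte(vm, vpage, frame);
	return guest_translate(vm, vpage);
}

/* Update first, then flush: the order the Intel note recommends. */
int remap_flush_after(struct toy_vm *vm, int vpage, int frame)
{
	guest_write_pte(vm, vpage, frame);
	guest_flush_tlb(vm);
	return guest_translate(vm, vpage);
}
```

In this model, with oos set, remap_flush_before() returns the old frame
while remap_flush_after() returns the new one; with oos clear, both
return the new frame, which matches the observation that oos_shadow=0
hides the problem.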