* Debugging an inconsistent shadow page table
@ 2009-04-25 10:36 Jan Kiszka
2009-04-26 10:32 ` Avi Kivity
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2009-04-25 10:36 UTC (permalink / raw)
To: kvm-devel
Hi,
turning on MMU_DEBUG and AUDIT in arch/x86/kvm/mmu.c (and fixing a build
error, patch will follow) I got this (and then a #GP :( - patch will
follow):
...
kvm_mmu_get_page: looking gfn 0 role f0120
kvm_mmu_get_page: found
kvm_mmu_get_page: looking gfn 0 role f0220
kvm_mmu_get_page: found
kvm_mmu_get_page: looking gfn 0 role f0320
kvm_mmu_get_page: found
kvm_mmu_get_page: looking gfn e1f role e0044
kvm_mmu_get_page: adding gfn e1f role e0044
rmap_write_protect: spte ffff8100660a60f8 7ca98067
paging64_page_fault: addr 100105 err 19
audit_write_protection: (pre page fault) shadow page has writable mappings: gfn e1f role e0044
audit: (pre page fault) nontrapping pte in nonleaf level: levels 4 gva 8000000000 level 4 pte 0
Is the last message indicating a problem? I get it very early during
guest boot. oos_shadow is disabled.
I'm currently trying to understand an obvious inconsistency in the pte
describing a page of the virtio-net rx ring. On some guests with some
qemu (upstream) command lines I can trigger this with '-smb /some/path'
and then doing smbclient -L in the guest. Once the inconsistency slipped
in, host and guest see different page contents and virtio-net stops
working. Very strange, but fortunately easily reproducible here. Any hints
or debugging suggestions welcome!
Jan
* Re: Debugging an inconsistent shadow page table
2009-04-25 10:36 Debugging an inconsistent shadow page table Jan Kiszka
@ 2009-04-26 10:32 ` Avi Kivity
2009-04-26 11:11 ` Jan Kiszka
0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-04-26 10:32 UTC (permalink / raw)
To: Jan Kiszka; +Cc: kvm-devel
Jan Kiszka wrote:
> Hi,
>
> turning on MMU_DEBUG and AUDIT in arch/x86/kvm/mmu.c (and fixing a build
> error, patch will follow) I got this (and then a #GP :( - patch will
> follow):
>
> ...
> kvm_mmu_get_page: looking gfn 0 role f0120
> kvm_mmu_get_page: found
> kvm_mmu_get_page: looking gfn 0 role f0220
> kvm_mmu_get_page: found
> kvm_mmu_get_page: looking gfn 0 role f0320
> kvm_mmu_get_page: found
> kvm_mmu_get_page: looking gfn e1f role e0044
> kvm_mmu_get_page: adding gfn e1f role e0044
> rmap_write_protect: spte ffff8100660a60f8 7ca98067
> paging64_page_fault: addr 100105 err 19
> audit_write_protection: (pre page fault) shadow page has writable mappings: gfn e1f role e0044
> audit: (pre page fault) nontrapping pte in nonleaf level: levels 4 gva 8000000000 level 4 pte 0
>
> Is the last message indicating a problem? I get it very early during
> guest boot. oos_shadow is disabled.
>
Yes. It means the guest will receive a page fault if it accesses
anything this pte points to. Theoretically we could have made this
work, but we never did.
But the message is self-contradictory. Level 4 PTEs map 0.5TB each, and
the gva mentioned isn't 0.5TB aligned.
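
The alignment question can be checked with a little arithmetic. The sketch below assumes standard 4-level x86-64 paging (12 page-offset bits, 9 index bits per level) and reads the logged gva as hexadecimal; the helper names are made up for illustration and are not KVM functions.

```python
PAGE_SHIFT = 12      # 4 KiB pages
BITS_PER_LEVEL = 9   # 512 entries per table


def span_of_entry(level):
    """Bytes of virtual address space covered by one entry at `level` (1 = leaf)."""
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * (level - 1))


def is_aligned(gva, level):
    """True if gva sits on a boundary of the region one level-`level` entry maps."""
    return gva % span_of_entry(level) == 0


gva = 0x8000000000                 # value taken from the audit message
print(hex(span_of_entry(4)))       # size mapped by one level-4 entry
print(is_aligned(gva, 4))
```

Under these assumptions one level-4 entry spans 2^39 bytes (0.5TB), and the gva as printed in the log equals 2^39, so it is worth re-counting the digits in the real output before drawing conclusions from the alignment.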
> I'm currently trying to understand an obvious inconsistency in the pte
> describing a page of the virtio-net rx ring. On some guests with some
> qemu (upstream) command lines I can trigger this with '-smb /some/path'
> and then doing smbclient -L in the guest. Once the inconsistency slipped
> in, host and guest see different page contents and virtio-net stops
> working. Very strange, but fortunately easily reproducible here. Any hints
> or debugging suggestions welcome!
>
What type of inconsistency? pfn or flags?
--
error compiling committee.c: too many arguments to function
* Re: Debugging an inconsistent shadow page table
2009-04-26 10:32 ` Avi Kivity
@ 2009-04-26 11:11 ` Jan Kiszka
2009-04-26 11:27 ` Gleb Natapov
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2009-04-26 11:11 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel
Avi Kivity wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> turning on MMU_DEBUG and AUDIT in arch/x86/kvm/mmu.c (and fixing a build
>> error, patch will follow) I got this (and then a #GP :( - patch will
>> follow):
>>
>> ...
>> kvm_mmu_get_page: looking gfn 0 role f0120
>> kvm_mmu_get_page: found
>> kvm_mmu_get_page: looking gfn 0 role f0220
>> kvm_mmu_get_page: found
>> kvm_mmu_get_page: looking gfn 0 role f0320
>> kvm_mmu_get_page: found
>> kvm_mmu_get_page: looking gfn e1f role e0044
>> kvm_mmu_get_page: adding gfn e1f role e0044
>> rmap_write_protect: spte ffff8100660a60f8 7ca98067
>> paging64_page_fault: addr 100105 err 19
>> audit_write_protection: (pre page fault) shadow page has writable
>> mappings: gfn e1f role e0044
>> audit: (pre page fault) nontrapping pte in nonleaf level: levels 4 gva
>> 8000000000 level 4 pte 0
>>
>> Is the last message indicating a problem? I get it very early during
>> guest boot. oos_shadow is disabled.
>>
>
> Yes. It means the guest will receive a page fault if it accesses
> anything this pte points to. Theoretically we could have made this
> work, but we never did.
>
> But the message is self-contradictory. Level 4 PTEs map 0.5TB each, and
> the gva mentioned isn't 0.5TB aligned.
>
>> I'm currently trying to understand an obvious inconsistency in the pte
>> describing a page of the virtio-net rx ring. On some guests with some
>> qemu (upstream) command lines I can trigger this with '-smb /some/path'
>> and then doing smbclient -L in the guest. Once the inconsistency slipped
>> in, host and guest see different page contents and virtio-net stops
>> working. Very strange, but fortunately easily reproducible here. Any hints
>> or debugging suggestions welcome!
>>
>
> What type of inconsistency? pfn or flags?
>
The former. Here is a before-after walk of the shadow and the host page
table (format: <entry address> (<entry value>) ):
[good]
gva=ffff88001ef22000: 000000002fc57000 -> 000000002fc57880
(000000003d119027) -> 000000003d119000 (000000005d9bf027) ->
000000005d9bf7b8 (000000003d159027) -> 000000003d159910
(000000002fbce063) -> 000000002fbce000 = 01 00 07 00
hva=00007f4665cac000: 00000000701f4000 -> 00000000701f47f0
(000000005d925067) -> 000000005d9258c8 (000000005d8c5067) ->
000000005d8c5970 (000000003d2b4067) -> 000000003d2b4560
(800000002fbce067) -> 800000002fbce000 = 01 00 07 00
[bad]
gva=ffff88001ef22000: 000000002fc57000 -> 000000002fc57880
(000000003d119027) -> 000000003d119000 (000000005d9bf027) ->
000000005d9bf7b8 (000000003d159027) -> 000000003d159910
(000000002fbce063) -> 000000002fbce000 = 01 00 0a 00
hva=00007f4665cac000: 00000000701f4000 -> 00000000701f47f0
(000000005d925067) -> 000000005d9258c8 (000000005d8c5067) ->
000000005d8c5970 (000000003d2b4067) -> 000000003d2b4560
(800000006576d067) -> 800000006576d000 = 01 00 10 00
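
For readers following the dump, the entry addresses and the pfn mismatch can be reproduced with a short sketch. It assumes 4-level x86-64 paging with 8-byte entries and the frame address in bits 51:12; the function names are hypothetical, and the constants are copied from the walk above.

```python
ENTRY_SIZE = 8                   # bytes per 64-bit page-table entry
ADDR_MASK = 0x000FFFFFFFFFF000   # bits 51:12 hold the next table / frame address


def table_indices(va):
    """Return the four 9-bit table indices of a virtual address, level 4 first."""
    return [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]


def entry_address(table_base, index):
    """Address of entry `index` inside the table starting at `table_base`."""
    return table_base + index * ENTRY_SIZE


def frame_of(entry):
    """Frame address referenced by an entry, with flag bits and NX stripped."""
    return entry & ADDR_MASK


gva = 0xFFFF88001EF22000
shadow_root = 0x2FC57000
# First step of the shadow walk: root table plus level-4 index.
print(hex(entry_address(shadow_root, table_indices(gva)[0])))

shadow_leaf = 0x000000002FBCE063      # same in [good] and [bad]
host_leaf_good = 0x800000002FBCE067
host_leaf_bad = 0x800000006576D067
print(frame_of(shadow_leaf) == frame_of(host_leaf_good))
print(frame_of(shadow_leaf) == frame_of(host_leaf_bad))
```

The shadow leaf keeps pointing at frame 2fbce000 in both walks, while the host leaf moves to 6576d000 in the bad case; only the host side changed.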
That raises a question for a kvm-mmu newbie like me:
If a page of the qemu process gets pushed around (here likely due to
fork()->exec(smbd)->COW), how will kvm's shadow table catch up? Via
MMU_NOTIFIER?
I'm on a 2.6.25 kernel, and that means without CONFIG_MMU_NOTIFIER. So
far I assumed that kernels without this feature do not work optimally,
but they won't break my guests...
Jan
* Re: Debugging an inconsistent shadow page table
2009-04-26 11:11 ` Jan Kiszka
@ 2009-04-26 11:27 ` Gleb Natapov
2009-04-26 11:34 ` Avi Kivity
2009-04-26 11:36 ` Jan Kiszka
0 siblings, 2 replies; 9+ messages in thread
From: Gleb Natapov @ 2009-04-26 11:27 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Avi Kivity, kvm-devel
On Sun, Apr 26, 2009 at 01:11:40PM +0200, Jan Kiszka wrote:
> That raises a question for a kvm-mmu newbie like me:
>
> If a page of the qemu process gets pushed around (here likely due to
> fork()->exec(smbd)->COW), how will kvm's shadow table catch up? Via
> MMU_NOTIFIER?
>
> I'm on a 2.6.25 kernel, and that means without CONFIG_MMU_NOTIFIER. So
> far I assumed that kernels without this feature do not work optimally,
> but they won't break my guests...
>
Guest memory is not COWed on fork (madvise(MADV_DONTFORK))
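
For illustration, the effect of that advice can be sketched from user space. This uses Python's `mmap` module rather than qemu's allocator, and `alloc_guest_ram` is a made-up name; the sketch assumes Linux and Python 3.8 or later, where `mmap.madvise()` and `mmap.MADV_DONTFORK` are available.

```python
import mmap


def alloc_guest_ram(size):
    """Allocate private anonymous memory and ask the kernel to keep it out of children.

    Returns the mapping and a flag telling whether the advice was applied.
    """
    ram = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    dontfork = getattr(mmap, "MADV_DONTFORK", None)   # missing on non-Linux builds
    if dontfork is not None:
        ram.madvise(dontfork)                         # applies to the whole mapping
    return ram, dontfork is not None


ram, protected = alloc_guest_ram(4 * mmap.PAGESIZE)
ram[0:4] = b"\x01\x00\x07\x00"    # the parent can still use the memory normally
print(protected)
```

With the advice applied, the range is not made available to a child created by fork(), so a later write by the parent does not go through copy-on-write and the page keeps its physical location.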
--
Gleb.
* Re: Debugging an inconsistent shadow page table
2009-04-26 11:27 ` Gleb Natapov
@ 2009-04-26 11:34 ` Avi Kivity
2009-04-26 11:36 ` Jan Kiszka
1 sibling, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2009-04-26 11:34 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Jan Kiszka, kvm-devel
Gleb Natapov wrote:
> On Sun, Apr 26, 2009 at 01:11:40PM +0200, Jan Kiszka wrote:
>
>> That raises a question for a kvm-mmu newbie like me:
>>
>> If a page of the qemu process gets pushed around (here likely due to
>> fork()->exec(smbd)->COW), how will kvm's shadow table catch up? Via
>> MMU_NOTIFIER?
>>
>> I'm on a 2.6.25 kernel, and that means without CONFIG_MMU_NOTIFIER. So
>> far I assumed that kernels without this feature do not work optimally,
>> but they won't break my guests...
>>
>>
> Guest memory is not COWed on fork (madvise(MADV_DONTFORK))
>
>
Which isn't present in upstream.
--
error compiling committee.c: too many arguments to function
* Re: Debugging an inconsistent shadow page table
2009-04-26 11:27 ` Gleb Natapov
2009-04-26 11:34 ` Avi Kivity
@ 2009-04-26 11:36 ` Jan Kiszka
2009-04-26 11:39 ` Gleb Natapov
2009-04-26 11:42 ` Avi Kivity
1 sibling, 2 replies; 9+ messages in thread
From: Jan Kiszka @ 2009-04-26 11:36 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Avi Kivity, kvm-devel
Gleb Natapov wrote:
> On Sun, Apr 26, 2009 at 01:11:40PM +0200, Jan Kiszka wrote:
>> That raises a question for a kvm-mmu newbie like me:
>>
>> If a page of the qemu process gets pushed around (here likely due to
>> fork()->exec(smbd)->COW), how will kvm's shadow table catch up? Via
>> MMU_NOTIFIER?
>>
>> I'm on a 2.6.25 kernel, and that means without CONFIG_MMU_NOTIFIER. So
>> far I assumed that kernels without this feature do not work optimally,
>> but they won't break my guests...
>>
> Guest memory is not COWed on fork (madvise(MADV_DONTFORK))
Yeah... but that's missing upstream! Will cross-check and then post a
fix for qemu.
Out of curiosity: What's the mechanism to update the shadow table after
swap-out/swap-in?
Jan
* Re: Debugging an inconsistent shadow page table
2009-04-26 11:36 ` Jan Kiszka
@ 2009-04-26 11:39 ` Gleb Natapov
2009-04-26 11:41 ` Jan Kiszka
2009-04-26 11:42 ` Avi Kivity
1 sibling, 1 reply; 9+ messages in thread
From: Gleb Natapov @ 2009-04-26 11:39 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Avi Kivity, kvm-devel
On Sun, Apr 26, 2009 at 01:36:22PM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Sun, Apr 26, 2009 at 01:11:40PM +0200, Jan Kiszka wrote:
> >> That raises a question for a kvm-mmu newbie like me:
> >>
> >> If a page of the qemu process gets pushed around (here likely due to
> >> fork()->exec(smbd)->COW), how will kvm's shadow table catch up? Via
> >> MMU_NOTIFIER?
> >>
> >> I'm on a 2.6.25 kernel, and that means without CONFIG_MMU_NOTIFIER. So
> >> far I assumed that kernels without this feature do not work optimally,
> >> but they won't break my guests...
> >>
> > Guest memory is not COWed on fork (madvise(MADV_DONTFORK))
>
> Yeah... but that's missing upstream! Will cross-check and then post a
> fix for qemu.
>
> Out of curiosity: What's the mechanism to update the shadow table after
> swap-out/swap-in?
>
I don't think guest memory is swappable without mmu notifiers.
--
Gleb.
* Re: Debugging an inconsistent shadow page table
2009-04-26 11:39 ` Gleb Natapov
@ 2009-04-26 11:41 ` Jan Kiszka
0 siblings, 0 replies; 9+ messages in thread
From: Jan Kiszka @ 2009-04-26 11:41 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Avi Kivity, kvm-devel
Gleb Natapov wrote:
> On Sun, Apr 26, 2009 at 01:36:22PM +0200, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Sun, Apr 26, 2009 at 01:11:40PM +0200, Jan Kiszka wrote:
>>>> That raises a question for a kvm-mmu newbie like me:
>>>>
>>>> If a page of the qemu process gets pushed around (here likely due to
>>>> fork()->exec(smbd)->COW), how will kvm's shadow table catch up? Via
>>>> MMU_NOTIFIER?
>>>>
>>>> I'm on a 2.6.25 kernel, and that means without CONFIG_MMU_NOTIFIER. So
>>>> far I assumed that kernels without this feature do not work optimally,
>>>> but they won't break my guests...
>>>>
>>> Guest memory is not COWed on fork (madvise(MADV_DONTFORK))
>> Yeah... but that's missing upstream! Will cross-check and then post a
>> fix for qemu.
>>
>> Out of curiosity: What's the mechanism to update the shadow table after
>> swap-out/swap-in?
>>
> I don't think guest memory is swappable without mmu notifiers.
Given the experience with COW: How is this ensured, or where is this
done upstream?
Jan
* Re: Debugging an inconsistent shadow page table
2009-04-26 11:36 ` Jan Kiszka
2009-04-26 11:39 ` Gleb Natapov
@ 2009-04-26 11:42 ` Avi Kivity
1 sibling, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2009-04-26 11:42 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Gleb Natapov, kvm-devel
Jan Kiszka wrote:
> Out of curiosity: What's the mechanism to update the shadow table after
> swap-out/swap-in?
>
With mmu notifiers, the kernel informs kvm that a host pte has been
invalidated, kvm looks in its reverse mappings and drops any sptes that
correspond to the same hva.
Without mmu notifiers, the page reference count is kept elevated, so the
kernel can't swap anything which has an spte pointing to it.
In both cases, the spte is reestablished on first guest access (which
may or may not be immediately after swapin).
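
The notifier path described above can be pictured with a toy model. This is only a sketch: the class, its fields, and its methods are hypothetical and do not mirror KVM's data structures, and the addresses are borrowed from the walk earlier in the thread purely as sample values.

```python
class ToyShadowMMU:
    """Toy model of spte invalidation through a reverse map (not KVM code)."""

    def __init__(self, host_pages):
        self.host_pages = host_pages   # hva -> host pfn, stands in for the host page table
        self.sptes = {}                # gva -> (hva, pfn) shadow entries
        self.rmap = {}                 # hva -> set of gvas whose spte maps that hva

    def guest_access(self, gva, hva):
        """Return the pfn used for gva, (re)establishing the spte on a miss."""
        if gva not in self.sptes:
            pfn = self.host_pages[hva]
            self.sptes[gva] = (hva, pfn)
            self.rmap.setdefault(hva, set()).add(gva)
        return self.sptes[gva][1]

    def notify_invalidate(self, hva):
        """The host pte for hva was invalidated: drop every spte found via the rmap."""
        for gva in self.rmap.pop(hva, set()):
            del self.sptes[gva]


gva, hva = 0xFFFF88001EF22000, 0x7F4665CAC000
mmu = ToyShadowMMU({hva: 0x2FBCE})
mmu.guest_access(gva, hva)             # spte now maps pfn 0x2fbce

mmu.host_pages[hva] = 0x6576D          # the host moves the page
stale = mmu.guest_access(gva, hva)     # no notification yet: old pfn is still used

mmu.notify_invalidate(hva)
fresh = mmu.guest_access(gva, hva)     # spte rebuilt on first access after the drop
print(hex(stale), hex(fresh))
```

The model leaves out the case without notifiers, where the elevated reference count keeps the host from swapping the page out at all; it only shows why a host-side move that is never reported leaves a stale spte behind.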
--
error compiling committee.c: too many arguments to function
end of thread, other threads:[~2009-04-26 11:42 UTC | newest]
Thread overview: 9+ messages
2009-04-25 10:36 Debugging an inconsistent shadow page table Jan Kiszka
2009-04-26 10:32 ` Avi Kivity
2009-04-26 11:11 ` Jan Kiszka
2009-04-26 11:27 ` Gleb Natapov
2009-04-26 11:34 ` Avi Kivity
2009-04-26 11:36 ` Jan Kiszka
2009-04-26 11:39 ` Gleb Natapov
2009-04-26 11:41 ` Jan Kiszka
2009-04-26 11:42 ` Avi Kivity