* Invalid types between save and restore, Xen 3.1.4
@ 2008-12-04 17:26 Jean-Yves Migeon
From: Jean-Yves Migeon @ 2008-12-04 17:26 UTC
  To: xen-devel

Hi list,

I am currently in charge of implementing save/restore/migrate support
in NetBSD.

So far, my work does manage to save/restore a NetBSD domU, but I
intermittently (about one time in ten) hit page type validation and
pinning failures when cycling saves/restores.

For unknown reasons, the save operation works, but restore might fail, 
with xend reporting:

[2008-12-04 17:24:40 219] INFO (XendCheckpoint:370) Received all pages (0 races)
[2008-12-04 17:24:40 219] INFO (XendCheckpoint:370) ERROR Internal error: Failed to pin batch of 21 page tables
[2008-12-04 17:24:40 219] INFO (XendCheckpoint:370) Restore exit with rc=1

This is due to the hypervisor refusing some type validation when
xc_restore issues its xc_mmuext_op():

(XEN) mm.c:1842:d0 Bad type (saw 28000008 != exp e0000000) for mfn 1f16f (pfn 43e)
(XEN) mm.c:649:d0 Error getting mfn 1f16f (pfn 43e) from L1 entry 1f16f023 for dom13
(XEN) mm.c:916:d0 Failure in alloc_l1_table: entry 768
(XEN) mm.c:1863:d0 Error while validating mfn 1ee38 (pfn 775) for type 20000000: caf=80000003 taf=20000001
(XEN) mm.c:683: get_l2_linear_pagetable() ret: 0 (exp 1)
(XEN) mm.c:1091:d0 Failure in alloc_l2_table: entry 1007
(XEN) mm.c:1863:d0 Error while validating mfn 1efb4 (pfn 5f9) for type 40000000: caf=80000003 taf=40000001
(XEN) mm.c:2132:d0 Error while pinning mfn 1efb4

The failure is erratic and hard to reproduce. I suppose that I am facing
a race inside the VM code, but as I am not familiar with Xen's inner
workings with respect to the MMU, I am having a hard time tracking it down.

The L1 and L2 entries at fault are always the same. The 1007 L2 entry 
corresponds to an "alternative" recursive PD in our VM subsystem, and 
the L1 768 is the start of our kernel's virtual memory.

This is with Xen 3.1.4. NetBSD does not use writable mappings, and
manipulates the MMU only through the hypercall API. MFN manipulations are
suspended during a save, to avoid any incorrect ones after a restore.

What I would like to know is what kind of operations could result in
such a situation. Considering that the xentools should have an accurate
view of the pfn_types through the p2m table, how is it possible that,
between save and restore, the hypervisor refuses to validate pages,
given that mappings should not change after the call to HYPERVISOR_suspend()?

For example, why is Xen expecting a writable mapping while the page is 
validated as L1?

I was wondering if anyone could shed some light on this. Please correct
me if I am wrong.

Thanking you in advance for your help,

-- 
Jean-Yves Migeon
jeanyves.migeon@free.fr


* Re: Invalid types between save and restore, Xen 3.1.4
  2008-12-04 17:26 Invalid types between save and restore, Xen 3.1.4 Jean-Yves Migeon
@ 2008-12-04 18:12 ` Keir Fraser
From: Keir Fraser @ 2008-12-04 18:12 UTC
  To: Jean-Yves Migeon, xen-devel

On 04/12/2008 17:26, "Jean-Yves Migeon" <jeanyves.migeon@free.fr> wrote:

> What I would like to know is what kind of operations could result in
> such a situation. Considering that the xentools should have an accurate
> view of the pfn_types through the p2m table, how is it possible that,
> between save and restore, the hypervisor refuses to validate pages,
> given that mappings should not change after the call to HYPERVISOR_suspend()?
> 
> For example, why is Xen expecting a writable mapping while the page is
> validated as L1?

My guess is that Xen's existing save/restore code is not compatible with
your 'alternative' recursive PDs. For such a recursive PD to be detected,
the PD being mapped must *already* be validated as a PD. Otherwise (let's
assume 2-level pagetables for a moment, with levels called PD and PT) if the
mapped PD is not yet validated, it will by default get validated as a PT!

This explains what you see: the pages mapped by the PD are not interpreted
as PTs but as data pages (because Xen has erroneously decided that the PD is
a PT). Then it will try to validate them as writable data pages and get
confused because some of them are already validated as pagetable pages!

How to fix this... Well:
 (a) Hack xc_domain_save.c and xc_domain_restore.c a bunch. Not fun.
 (b) In the NetBSD kernel, zap alternative recursive PDs before suspending
yourself. If this is possible it will save you a headache. Perhaps you can
flush them somehow, or otherwise zap _PAGE_PRESENT and then reinstate it
yourself during resume?

If you have to go down route (a)... I'd have to think a bit about how best
to fix the issue.

Oh, I'll add that this whole issue will definitely not exist for *self*
recursive PDs. Those will work no problem.

 -- Keir
