* Xen 3.0.4 migration failures
@ 2007-01-06  0:52 John Byrne
  2007-01-06 10:15 ` Keir Fraser
  0 siblings, 1 reply; 3+ messages in thread
From: John Byrne @ 2007-01-06  0:52 UTC
  To: xen-devel


Hi,

With the Xen 3.0.4 I've built from source, migration is failing and the 
domain vanishes. (Either live/non-live.) The output from "xm dmesg" and 
xend.log follows. Any ideas?

Thanks,

John Byrne

xm dmesg reports:

(XEN) mm.c:551:d0 Bad L1 flags 80000000
(XEN) mm.c:828:d0 Failure in alloc_l1_table: entry 194
(XEN) mm.c:1685:d0 Error while validating mfn 12eeb3 (pfn 1608) for type 20000000: caf=80000002 taf=20000001
(XEN) mm.c:976:d0 Failure in alloc_l2_table: entry 64
(XEN) mm.c:1685:d0 Error while validating mfn 11470e (pfn 3ad) for type 40000000: caf=80000002 taf=40000001
(XEN) mm.c:1039:d0 Failure in alloc_l3_table: entry 0
(XEN) mm.c:1685:d0 Error while validating mfn 11470a (pfn 3b1) for type 60000000: caf=80000002 taf=60000001
(XEN) mm.c:1960:d0 Error while pinning mfn 11470a

xend.log on the target machine reports:

ib/xen/bin/xc_restore 4 1 133120 1 2
[2007-01-05 18:47:17 xend 3206] INFO (XendCheckpoint:247) xc_linux_restore start: max_pfn = 20800
[2007-01-05 18:47:17 xend 3206] INFO (XendCheckpoint:247) Increased domain reservation by 82000 KB
[2007-01-05 18:47:17 xend 3206] INFO (XendCheckpoint:247) Reloading memory pages:   0%
[2007-01-05 18:47:24 xend 3206] INFO (XendCheckpoint:247) Received all pages (0 races)
[2007-01-05 18:47:24 xend 3206] INFO (XendCheckpoint:247) ERROR Internal error: Failed to pin batch of 31 page tables
[2007-01-05 18:47:24 xend 3206] INFO (XendCheckpoint:247) Restore exit with rc=1
[2007-01-05 18:47:24 xend.XendDomainInfo 3206] DEBUG (XendDomainInfo:1483) XendDomainInfo.destroy: domid=1
[2007-01-05 18:47:24 xend.XendDomainInfo 3206] DEBUG (XendDomainInfo:1491) XendDomainInfo.destroyDomain(1)
[2007-01-05 18:47:24 xend.XendDomainInfo 3206] ERROR (XendDomainInfo:1500) XendDomainInfo.destroy: xc.domain_destroy failed.
Traceback (most recent call last):
  File "/disk2/xen/xen-3.0.4-testing.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 1495, in destroyDomain
    xc.domain_destroy(self.domid)
Error: (3, 'No such process')
[2007-01-05 18:47:24 xend 3206] ERROR (XendDomain:1001) Restore failed
Traceback (most recent call last):
  File "/disk2/xen/xen-3.0.4-testing.hg/dist/install/usr/lib/python/xen/xend/XendDomain.py", line 996, in domain_restore_fd
    return XendCheckpoint.restore(self, fd, paused=paused)
  File "/disk2/xen/xen-3.0.4-testing.hg/dist/install/usr/lib/python/xen/xend/XendCheckpoint.py", line 167, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/disk2/xen/xen-3.0.4-testing.hg/dist/install/usr/lib/python/xen/xend/XendCheckpoint.py", line 235, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen/bin/xc_restore 4 1 133120 1 2 failed


* Re: Xen 3.0.4 migration failures
  2007-01-06  0:52 Xen 3.0.4 migration failures John Byrne
@ 2007-01-06 10:15 ` Keir Fraser
  2007-01-08 19:20   ` John Byrne
  0 siblings, 1 reply; 3+ messages in thread
From: Keir Fraser @ 2007-01-06 10:15 UTC
  To: John Byrne, xen-devel

On 6/1/07 12:52 am, "John Byrne" <john.l.byrne@hp.com> wrote:

> With the Xen 3.0.4 I've built from source, migration is failing and the
> domain vanishes. (Either live/non-live.) The output from "xm dmesg" and
> xend.log follows. Any ideas?

PAE or 64-bit? Looks like the target is choking on _PAGE_NX. Probably the
target machine does not support it. We discussed silently dropping _PAGE_NX
on machines that don't support it but that seems a bit dubious to me. This
really needs to be solved by adding CPUID to the save format and doing
consistency checking at the target before committing to the migration. 3.0.5
will also have save/restore cancellation, so that at least if things do
screw up your guest won't /dev/null.

 -- Keir
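
The consistency check described above could look roughly like the sketch below: compare the CPU feature flags seen on the source with those on the target before committing to a migration, and name the missing feature instead of failing late. This is only an illustration, not how Xen or xend implements it. It assumes the flags are taken from the "flags" line of Linux's /proc/cpuinfo on each host rather than from CPUID data in the save format, and the function names and the `required` list are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of
    /proc/cpuinfo-style text, or an empty set if there is none."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_target(source_flags, target_flags, required=("nx",)):
    """List the required features that the source has but the target lacks.

    'required' is an assumption for this sketch: the features whose
    absence on the target should block the migration."""
    return sorted(f for f in required
                  if f in source_flags and f not in target_flags)


# Sample input standing in for the two hosts' /proc/cpuinfo contents.
source = cpu_flags("processor\t: 0\nflags\t\t: fpu pae nx lm\n")
target = cpu_flags("processor\t: 0\nflags\t\t: fpu pae\n")

missing = missing_on_target(source, target)
if missing:
    print("refusing to migrate: target lacks " + ", ".join(missing))
```

On a real host the text would come from reading /proc/cpuinfo on each side. Whether dom0 reports the same flags the hypervisor actually uses is a further assumption of this sketch.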


* Re: Xen 3.0.4 migration failures
  2007-01-06 10:15 ` Keir Fraser
@ 2007-01-08 19:20   ` John Byrne
  0 siblings, 0 replies; 3+ messages in thread
From: John Byrne @ 2007-01-08 19:20 UTC
  To: Keir Fraser; +Cc: xen-devel

Keir Fraser wrote:
> On 6/1/07 12:52 am, "John Byrne" <john.l.byrne@hp.com> wrote:
> 
>> With the Xen 3.0.4 I've built from source, migration is failing and the
>> domain vanishes. (Either live/non-live.) The output from "xm dmesg" and
>> xend.log follows. Any ideas?
> 
> PAE or 64-bit? Looks like the target is choking on _PAGE_NX. Probably the
> target machine does not support it. We discussed silently dropping _PAGE_NX
> on machines that don't support it but that seems a bit dubious to me. This
> really needs to be solved by adding CPUID to the save format and doing
> consistency checking at the target before committing to the migration. 3.0.5
> will also have save/restore cancellation, so that at least if things do
> screw up your guest won't /dev/null.
> 
>  -- Keir
> 
> 
> 

PAE. NX was disabled in the BIOS. Things work now. Thanks.

Getting rid of the "disappearing domain" issue will certainly be good, 
but mysterious refusals to migrate with unhelpful error messages will 
only be relatively better. Decent diagnostics for CPU incompatibilities 
are really important.

Thanks again,

John Byrne
