From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Novotny
Subject: Re: Re: Xen-unstable save error
Date: Mon, 21 Jun 2010 16:02:18 +0200
Message-ID: <4C1F70EA.9090408@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On 06/21/2010 03:45 PM, Keir Fraser wrote:
> On 21/06/2010 14:37, "Michal Novotny" wrote:
>
>> My guest is RHEL-5 i386 guest but this seems that the suspend port is
>> missing. AFAIK, you started using the SUSPEND_CANCEL some time ago which
>> requires the modified kernel.
>>
>> Isn't it possible that's the issue or how is it with the SUSPEND_CANCEL
>> functionality?
>>
> SUSPEND_CANCEL is a different thing. The suspend port is simply a quicker
> way for suspend notifications to be passed back and forth between the guest
> and the dom0 toolstack. We fall back okay if the guest kernel does not
> support the new faster method.
>
> I'm not sure why the domain restore operation fails. Unfortunately some
> error messages are now expected in the logs, since Remus functionality went
> into the tree. So it's hard to work out what the first error is.
>
> -- Keir
>

OK Keir, but what I don't understand is why there's nothing in
`/local/domain/%d/device/suspend/event-channel`. So is this OK?

For the restore functionality:

# ls -ahl rhel5-32fv.sav
-rwxr-xr-x 1 root root 53M Jun 21 2010 rhel5-32fv.sav

As you can see, the save file is only 53M, but the guest had 1G of memory,
and I think this is why the restore is failing.
You can also see here that it should have 1G of memory:

...
[2010-06-21 17:29:20 4305] DEBUG (XendDomainInfo:237) XendDomainInfo.restore(['domain', ['domid', '1'], ['cpu_weight', '256'], ['cpu_cap', '0'], ['on_crash', 'restart'], ['uuid', 'c91ec802-2015-cb49-80e5-810c808bf725'], ['bootloader_args'], ['pool_name', 'Pool-0'], ['vcpus', '1'], ['name', 'rhel5-32fv-stubdom'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['cpus', [[]]], ['description'], ['bootloader'], ['maxmem', '1024'],* ['memory', '1024'],* ['shadow_memory', '9'], ['vcpu_avail', '1'], ['features'], ['on_xend_start', 'ignore'], ['on_xend_stop', 'ignore'], ['start_time', '1277134046.11'], ['cpu_time', '1.550284835'], ['online_vcpus', '1'], ['image', ['hvm', ['kernel'], ['superpages', '0'], ['tsc_mode', '0'], ['videoram', '4'], ['hpet', '0'], ['boot', 'c'], ['loader', '/usr/lib/xen/boot/hvmloader'], ['serial', 'pty'], ['vpt_align', '1'], ['xen_platform_pci', '1'], ['opengl', '1'], ['vncunused', '1'], ['rtc_timeoffset', '0'], ['pci', []], ['pae', '1'], ['stdvga', '0'], ['hap', '1'], ['viridian', '0'], ['acpi', '1'], ['localtime', '0'], ['timer_mode', '1'], ['vnc', '1'], ['nographic', '0'], ['guest_os_type', 'default'], ['vncdisplay', '1'], ['pci_msitranslate', '1'], ['oos', '1'], ['apic', '1'], ['sdl', '0'], ['nomigrate', '0'], ['device_model', '/usr/lib/xen/bin/qemu-dm'], ['pci_power_mgmt', '0'], ['usb', '0'], ['xauthority', '/root/.Xauthority'], ['isa', '0'], ['display', 'localhost:10.0'], ['notes', ['SUSPEND_CANCEL', '1']]]], ['status', '2'], ['state', 'r-----'], ['store_mfn', '1044476'], ['device', ['vif', ['bridge', 'virbr0'], ['uuid', 'dcd99a20-2e8f-2692-8e56-dc4051579923'], ['script', '/etc/xen/scripts/vif-bridge'], ['mac', '00:16:3e:5b:bd:9c'], ['type', 'ioemu'], ['backend', '0']]], ['device', ['vbd', ['uuid', 'e7e07da9-c104-800d-ee3f-5fe9757167fd'], ['bootable', '1'], ['dev', 'hda:disk'], ['uname', 'file:/var/lib/xen/images/colossus/rhel5-32fv.img'], ['mode', 'w'], ['backend', '0'],
['VDI']]], ['device', ['vbd', ['uuid', '0180089b-8394-cbfa-0da4-b8c1fc688617'], ['bootable', '0'], ['dev', 'sda:disk'], ['uname', 'file:/home2/test.img'], ['mode', 'w'], ['backend', '0'], ['VDI']]], ['device', ['vfb', ['vncunused', '1'], ['location', '127.0.0.1:5901'], ['vnc', '1'], ['vncdisplay', '1'], ['uuid', '7fa1bcc0-797d-66ac-eb88-6ef15f1209f0']]], ['device', ['console', ['protocol', 'vt100'], ['location', '3'], ['uuid', 'd77b182b-4152-a4d2-f577-8b610b5cd6ff']]]])

The first error (Error when reading batch size (0 = Success): Internal error)
comes from the pagebuf_get_one() function in libxc/xc_domain_restore.c, which
contains:

...
    if ( RDEXACT(fd, &count, sizeof(count)) ) {
        PERROR("Error when reading batch size");
        return -1;
    }
...

So I guess the data was not written correctly for this guest (the file is
smaller than the original guest memory), and that's why the error occurs.

As you can see, there's nothing in xend.log except the "failed to get the
suspend evtchn port" message:

[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:126) [xc_save]: /usr/lib64/xen/bin/xc_save 56 5 0 0 4
[2010-06-21 15:59:55 4305] INFO (XendCheckpoint:410) xc_save: failed to get the suspend evtchn port
[2010-06-21 15:59:55 4305] INFO (XendCheckpoint:410)
[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:381) suspend
[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:129) In saveInputHandler suspend
[2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:131) Suspending 5 ...
[2010-06-21 15:59:55 4305] DEBUG (XendDomainInfo:521) XendDomainInfo.shutdown(suspend)
[2010-06-21 15:59:55 4305] DEBUG (XendDomainInfo:1877) XendDomainInfo.handleShutdownWatch
[2010-06-21 15:59:55 4305] INFO (XendDomainInfo:538) HVM save:remote shutdown dom 5!
[2010-06-21 15:59:55 4305] INFO (XendDomainInfo:2074) Domain has shutdown: name=migrating-rhel5-32fv-stubdom id=5 reason=suspend.
[2010-06-21 15:59:55 4305] INFO (XendCheckpoint:137) Domain 5 suspended.
[2010-06-21 15:59:56 4305] INFO (image:538) signalDeviceModel:restore dm state to running
[2010-06-21 15:59:56 4305] DEBUG (XendCheckpoint:146) Written done
[2010-06-21 16:00:02 4305] DEBUG (XendDomainInfo:3067) XendDomainInfo.destroy: domid=5
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2397) Destroying device model
[2010-06-21 16:00:03 4305] INFO (image:615) migrating-rhel5-32fv-stubdom device model terminated
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2404) Releasing devices
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vif/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vbd/768
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vbd/2048
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vfb/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing console/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0

Any ideas why the save file is that small? It should be at least 1024M, right?

Thanks,
Michal

-- 
Michal Novotny, RHCE
Virtualization Team (xen userspace), Red Hat
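
P.S. A note on the "(0 = Success)" part of the first error: an errno of 0
usually means that no system call actually failed, which would fit a read
that simply hit the end of the file. That is only a guess and assumes RDEXACT
treats a short read as a failure, which is not verified here. The Python
sketch below illustrates the idea and is not the libxc code: read_exact(),
read_batch_count(), describe() and the 4-byte native-endian count are all
hypothetical names and assumptions made for illustration.

```python
import io
import struct


def read_exact(stream, size):
    """Read exactly `size` bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # End of stream: nothing failed at the OS level, the data ran out.
            raise EOFError("stream ended %d bytes short" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_batch_count(stream):
    """Read one 32-bit signed count (assumed layout, stand-in for the batch size)."""
    (count,) = struct.unpack("=i", read_exact(stream, 4))
    return count


def describe(stream):
    """Return a short status string instead of raising, for easy comparison."""
    try:
        return "count=%d" % read_batch_count(stream)
    except EOFError as exc:
        return "truncated: %s" % exc


if __name__ == "__main__":
    # A complete 4-byte count versus a stream that stops after 2 bytes.
    print(describe(io.BytesIO(struct.pack("=i", 1024))))
    print(describe(io.BytesIO(b"\x01\x02")))
```

If the save image really is cut short, a reader of this kind fails at the
first field it cannot fill completely, without any OS-level error being
reported, which would match the message seen on restore.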