From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yufang Zhang <yuzhang@redhat.com>
Subject: elapse time computing when restarting VM?
Date: Sun, 15 Aug 2010 08:27:24 -0400 (EDT)
Message-ID: <1586042608.3892661281875244077.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
References: <1822033966.3892641281875172829.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <1822033966.3892641281875172829.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel <xen-devel@lists.xensource.com>
List-Id: xen-devel@lists.xenproject.org

Hi all,
Currently, before restarting a VM, xend computes the time elapsed since the
VM started. If the elapsed time is smaller than MINIMUM_RESTART_TIME (which
is 60s), xend refuses to restart the VM and destroys it instead, to avoid
restart loops. However, when a guest crashes at boot time and enable-dump is
enabled, a core dump is taken before the guest is restarted, and this may
take quite a while (depending on the memory size of the guest). In this
situation the computed elapsed time is inflated by the dump time, so xend
does not destroy the guest. The guest then falls into a
restart-crash-dumpcore loop, which wastes CPU time and, worse, *disk space*
on Domain0. I actually hit this problem when I upgraded a 2048M guest to a
problematic kernel. The guest crashed at boot time, a core dump was taken,
and then the guest rebooted and went through the same steps again. My
Domain0 filled up with core dump files from that guest. So does it make
sense to find a proper solution to this problem, rather than just enlarging
MINIMUM_RESTART_TIME? Is the following patch reasonable?

diff -r 774dfc178c39 tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py   Thu Aug 12 17:06:21 2010 +0100
+++ b/tools/python/xen/xend/XendDomainInfo.py   Mon Aug 16 12:16:45 2010 +0800
@@ -2060,7 +2060,7 @@
                 log.warn('Domain has crashed: name=%s id=%d.',
                          self.info['name_label'], self.domid)
                 self._writeVm(LAST_SHUTDOWN_REASON, 'crash')
-
+                self.info['crash_time'] = time.time()
                 restart_reason = 'crash'
                 self._stateSet(DOM_STATE_HALTED)

@@ -2188,7 +2188,12 @@
         old_domid = self.domid
         self._writeVm(RESTART_IN_PROGRESS, 'True')

-        elapse = time.time() - self.info['start_time']
+        if xoptions.get_enable_dump() or self.get_on_crash() \
+               in ['coredump_and_destroy', 'coredump_and_restart']:
+            elapse = self.info['crash_time'] - self.info['start_time']
+        else:
+            elapse = time.time() - self.info['start_time']
+
         if elapse < MINIMUM_RESTART_TIME:
             log.error('VM %s restarting too fast (Elapsed time: %f seconds). '
                       'Refusing to restart to avoid loops.',
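
To make the timing problem concrete, here is a simplified Python model of the
restart check before and after the patch. It is only a sketch, not xend code:
the function names are hypothetical, and the 120 s dump duration is an
assumed figure for illustration, not a measurement.

```python
# Hypothetical, simplified model of xend's restart guard; NOT actual xend code.
MINIMUM_RESTART_TIME = 60  # seconds, as described above


def elapsed_before_patch(start_time, now):
    # Current behaviour: everything since boot counts, including dump time.
    return now - start_time


def elapsed_after_patch(start_time, crash_time):
    # Patched behaviour when dumping is enabled: stop the clock at the crash.
    return crash_time - start_time


def refuses_restart(elapsed):
    # xend refuses to restart a VM that ran for less than the minimum.
    return elapsed < MINIMUM_RESTART_TIME


# Assumed timeline: guest boots at t=0, crashes at t=5, the core dump takes
# an assumed 120 s, so the restart check runs at t=125.
start, crash, check = 0.0, 5.0, 125.0

print(refuses_restart(elapsed_before_patch(start, check)))  # False: loop continues
print(refuses_restart(elapsed_after_patch(start, crash)))   # True: restart refused
```

The model only covers the crash path; in the patch, `crash_time` is recorded
only when the domain crashes.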

I have tested this scenario with the patch applied, and it works well when
the guest crashes at boot time.

Best Regards.

Yufang