From: Yufang Zhang
Subject: elapse time computing when restarting VM?
Date: Sun, 15 Aug 2010 08:27:24 -0400 (EDT)
Message-ID: <1586042608.3892661281875244077.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com>
To: xen-devel

Hi all,

Currently, before restarting a VM, xend computes the time elapsed since the VM started. If the elapsed time is smaller than MINIMUM_RESTART_TIME (60s), xend refuses to restart the VM and destroys it instead, to avoid restart loops. However, when a guest crashes at boot time and enable-dump is enabled, a core dump is taken before the guest is restarted, and this can take quite a while (depending on the guest's memory size). In this situation the computed elapsed time is inflated by the dump duration, so xend does not destroy the guest. The guest then falls into a restart-crash-dumpcore loop, which wastes CPU time and, worse, *disk space* in Domain0.

I actually hit this problem when I upgraded a 2048M guest to a problematic kernel. The guest crashed at boot time and a core dump was taken for it; the guest then rebooted and went through the same steps again. My Domain0 filled up with core dump files from that guest.

So does it make sense to find a real solution to this problem, rather than just enlarging MINIMUM_RESTART_TIME?
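To make the failure mode concrete, here is a small standalone Python model of the pre-patch check. It is only a sketch: `may_restart` is a hypothetical helper, not xend code, and the 5s crash time and 90s dump duration are illustrative assumptions, not measurements. The restart decision is taken after the dump finishes, so the dump time counts toward the elapsed time:

```python
# Hypothetical, standalone model of xend's restart throttle before the patch.
# MINIMUM_RESTART_TIME mirrors the 60s value described above; the timestamps
# below are illustrative numbers only.
MINIMUM_RESTART_TIME = 60  # seconds


def may_restart(start_time, now):
    """Return True if the pre-patch check would allow a restart.

    The elapsed time is measured up to the moment of the restart decision,
    so any time spent writing a core dump is included in it.
    """
    elapse = now - start_time
    # xend refuses when elapse < MINIMUM_RESTART_TIME, so it allows otherwise.
    return elapse >= MINIMUM_RESTART_TIME


start = 0.0
crash = start + 5.0        # assumed: guest crashes 5s into boot
dump_duration = 90.0       # assumed: time to dump a 2048M guest
decision = crash + dump_duration

print(may_restart(start, crash))     # False: a bare crash at 5s would be caught
print(may_restart(start, decision))  # True: the dump pushes elapsed time past 60s
```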
Is the following patch reasonable?

diff -r 774dfc178c39 tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py	Thu Aug 12 17:06:21 2010 +0100
+++ b/tools/python/xen/xend/XendDomainInfo.py	Mon Aug 16 12:16:45 2010 +0800
@@ -2060,7 +2060,7 @@
                 log.warn('Domain has crashed: name=%s id=%d.',
                          self.info['name_label'], self.domid)
                 self._writeVm(LAST_SHUTDOWN_REASON, 'crash')
-
+                self.info['crash_time'] = time.time()
                 restart_reason = 'crash'
                 self._stateSet(DOM_STATE_HALTED)
@@ -2188,7 +2188,12 @@
         old_domid = self.domid
         self._writeVm(RESTART_IN_PROGRESS, 'True')
 
-        elapse = time.time() - self.info['start_time']
+        if xoptions.get_enable_dump() or self.get_on_crash() \
+                in ['coredump_and_destroy', 'coredump_and_restart']:
+            elapse = self.info['crash_time'] - self.info['start_time']
+        else:
+            elapse = time.time() - self.info['start_time']
+
         if elapse < MINIMUM_RESTART_TIME:
             log.error('VM %s restarting too fast (Elapsed time: %f seconds). '
                       'Refusing to restart to avoid loops.',

I have tested this scenario with the patch, and it works well when the guest crashes at boot time.

Best regards,
Yufang
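For review, a standalone Python model of the patched computation may help. This is a hypothetical sketch: `elapsed_before_restart` is not a xend function, and `info` is a plain dict standing in for `self.info`. One difference from the posted patch is deliberate and labeled in the comments: the patch reads `self.info['crash_time']` whenever dumping is configured, presumably including a clean reboot where no crash was recorded, which may raise KeyError or pick up a stale value. The sketch falls back to the current time in that case.

```python
import time

MINIMUM_RESTART_TIME = 60  # seconds, the value described in the mail above

COREDUMP_ACTIONS = ('coredump_and_destroy', 'coredump_and_restart')


def elapsed_before_restart(info, dump_enabled, on_crash, now=None):
    """Hypothetical model of the patched elapsed-time computation.

    info         -- plain dict standing in for XendDomainInfo.info
    dump_enabled -- stands in for xoptions.get_enable_dump()
    on_crash     -- stands in for self.get_on_crash()
    now          -- current time; defaults to time.time()
    """
    if now is None:
        now = time.time()
    dumping = dump_enabled or on_crash in COREDUMP_ACTIONS
    crash_time = info.get('crash_time')
    # Assumption, not part of the posted patch: only trust crash_time if it
    # exists and belongs to the current run (not earlier than start_time).
    if dumping and crash_time is not None and crash_time >= info['start_time']:
        # Measure up to the crash, so time spent dumping core is excluded.
        return crash_time - info['start_time']
    return now - info['start_time']


if __name__ == '__main__':
    info = {'start_time': 0.0, 'crash_time': 5.0}
    elapse = elapsed_before_restart(info, True, 'restart', now=95.0)
    print(elapse, elapse < MINIMUM_RESTART_TIME)  # 5.0 True -> restart refused
```

With the crash recorded 5s after start, the 90s spent dumping no longer counts, so the check refuses the restart and the loop is broken.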