From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roger Pau Monné
Subject: Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
Date: Tue, 19 Nov 2013 09:03:27 +0100
Message-ID: <528B1B4F.2010102@citrix.com>
References: <528B017D.5020202@oracle.com>
In-Reply-To: <528B017D.5020202@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Joe Jin, Konrad Rzeszutek Wilk, ian.jackson@eu.citrix.com, Ian Campbell, Keir Fraser
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

On 19/11/13 07:13, Joe Jin wrote:
> When created new guest on NUMA server, xend tried to get the best node by
> calculated all vcpus info, the race is if other geust is rebooting, the
> guest in the list when entered find_relaxed_node(), but when call
> getVCPUInfo() the guest be terminated, then getVCPUInfo() will fail with
> below error:
>
> [2013-09-04 20:01:26 6254] ERROR (XendDomainInfo:496) VM start failed
> Traceback (most recent call last):
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 482, in start
>     XendTask.log_progress(31, 60, self._initDomain)
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendTask.py", line 209, in log_progress
>     retval = func(*args, **kwds)
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2918, in _initDomain
>     node = self._setCPUAffinity()
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2835, in _setCPUAffinity
>     best_node = find_relaxed_node(candidate_node_list)[0]
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2803, in find_relaxed_node
>     cpuinfo = dom.getVCPUInfo()
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1600, in getVCPUInfo
>     raise XendError(str(exn))
> XendError: (3, 'No such process')
>
> This patch will let find_relaxed_node() only polling the RUNNING or PAUSED
> guest vpus info to avoid the race.
>
> Signed-off-by: Joe Jin
> ---
>  tools/python/xen/xend/XendDomainInfo.py |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> index e9d3e7e..66e4b9f 100644
> --- a/tools/python/xen/xend/XendDomainInfo.py
> +++ b/tools/python/xen/xend/XendDomainInfo.py
> @@ -2734,6 +2734,8 @@ class XendDomainInfo:
>                  from xen.xend import XendDomain
>                  doms = XendDomain.instance().list('all')
>                  for dom in filter (lambda d: d.domid != self.domid, doms):
> +                    if dom._stateGet() not in (DOM_STATE_RUNNING,DOM_STATE_PAUSED):
> +                        continue

Isn't it possible that the domain has rebooted and is no longer there
between these two calls? IMHO it's very unlikely, but there's still a
window where getVCPUInfo() could fail.

>                      cpuinfo = dom.getVCPUInfo()
>                      for vcpu in sxp.children(cpuinfo, 'vcpu'):
>                          if sxp.child_value(vcpu, 'online') == 0: continue
>
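
One way to close that remaining window might be to catch the error
around the call itself, instead of (or in addition to) checking the
state first. A rough, untested sketch of the idea follows. FakeDomain,
collect_vcpu_info() and the XendError stand-in are made up here so the
example runs on its own; only getVCPUInfo() and the 'No such process'
error are taken from the traceback above.

```python
# Illustrative sketch only: the classes below are stand-ins, not xend code.


class XendError(Exception):
    """Stand-in for the XendError raised in the traceback above."""


class FakeDomain:
    """Hypothetical stub exposing the one method the loop relies on."""

    def __init__(self, domid, vcpus=None):
        self.domid = domid
        self._vcpus = vcpus  # None simulates a domain that has gone away

    def getVCPUInfo(self):
        if self._vcpus is None:
            raise XendError("(3, 'No such process')")
        return self._vcpus


def collect_vcpu_info(doms, own_domid):
    """Return {domid: vcpu info}, skipping domains that vanish mid-scan."""
    result = {}
    for dom in doms:
        if dom.domid == own_domid:
            continue
        try:
            result[dom.domid] = dom.getVCPUInfo()
        except XendError:
            # The domain was destroyed or rebooted after it was listed,
            # so there is nothing to account for: skip it.
            continue
    return result


doms = [FakeDomain(1, ["vcpu0"]), FakeDomain(2), FakeDomain(3, ["vcpu0", "vcpu1"])]
info = collect_vcpu_info(doms, own_domid=3)
```

With this shape a state check before the call would only be an
optimisation, since the except clause handles a domain that disappears
at any point before getVCPUInfo() returns.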