xen-devel.lists.xenproject.org archive mirror
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Joe Jin <joe.jin@oracle.com>, msw@amazon.com
Cc: "Keir Fraser" <keir@xen.org>, xen-devel <xen-devel@lists.xen.org>,
	ian.jackson@eu.citrix.com,
	"Ian Campbell" <ian.campbell@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
Date: Tue, 19 Nov 2013 09:06:51 -0500
Message-ID: <20131119140651.GC5332@phenom.dumpdata.com>
In-Reply-To: <528B4061.1000305@oracle.com>

On Tue, Nov 19, 2013 at 06:41:37PM +0800, Joe Jin wrote:
> On 11/19/13 16:03, Roger Pau Monné wrote:
> > On 19/11/13 07:13, Joe Jin wrote:
> >> When a new guest is created on a NUMA server, xend tries to find the best
> >> node by examining the vcpu info of all guests. This races with guests that
> >> are rebooting: a guest may still be in the list when find_relaxed_node()
> >> is entered but already be terminated by the time getVCPUInfo() is called,
> >> and getVCPUInfo() then fails with the error below:
> >>
> >> [2013-09-04 20:01:26 6254] ERROR (XendDomainInfo:496) VM start failed
> >> Traceback (most recent call last):
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 482, in start
> >>     XendTask.log_progress(31, 60, self._initDomain)
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendTask.py", line 209, in log_progress
> >>     retval = func(*args, **kwds)
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2918, in _initDomain
> >>     node = self._setCPUAffinity()
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2835, in _setCPUAffinity
> >>     best_node = find_relaxed_node(candidate_node_list)[0]
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2803, in find_relaxed_node
> >>     cpuinfo = dom.getVCPUInfo()
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1600, in getVCPUInfo
> >>     raise XendError(str(exn))
> >> XendError: (3, 'No such process')
> >>
> >> This patch makes find_relaxed_node() poll vcpu info only for guests in
> >> the RUNNING or PAUSED state, to avoid the race.
> >>
> >> Signed-off-by: Joe Jin <joe.jin@oracle.com>
> >> ---
> >>  tools/python/xen/xend/XendDomainInfo.py |    2 ++
> >>  1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> >> index e9d3e7e..66e4b9f 100644
> >> --- a/tools/python/xen/xend/XendDomainInfo.py
> >> +++ b/tools/python/xen/xend/XendDomainInfo.py
> >> @@ -2734,6 +2734,8 @@ class XendDomainInfo:
> >>                  from xen.xend import XendDomain
> >>                  doms = XendDomain.instance().list('all')
> >>                  for dom in filter (lambda d: d.domid != self.domid, doms):
> >> +                    if dom._stateGet() not in (DOM_STATE_RUNNING,DOM_STATE_PAUSED):
> >> +                        continue
> > 
> > Isn't it possible that the domain has rebooted and is no longer there
> > between these two calls?
> > 
> > IMHO it's very unlikely, but there's still a window where getVCPUInfo
> > could fail.
> > 
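Roger's point can be illustrated with a minimal sketch (Python 3 syntax, whereas xend itself is Python 2.4). FakeDomain, poll_with_state_filter(), die() and the on_checked hook are hypothetical stand-ins, not xend code; the state filter mirrors the first patch, and the hook simulates another guest dying between the state check and the getVCPUInfo() call.

```python
import errno

# Hypothetical stand-ins for xend's domain states; the values are illustrative.
DOM_STATE_RUNNING = "running"
DOM_STATE_PAUSED = "paused"
DOM_STATE_SHUTDOWN = "shutdown"


class FakeDomain:
    """Toy model of a domain whose state can change between two calls."""

    def __init__(self, domid, state):
        self.domid = domid
        self.state = state

    def _stateGet(self):
        return self.state

    def getVCPUInfo(self):
        if self.state not in (DOM_STATE_RUNNING, DOM_STATE_PAUSED):
            # Mimic the low-level failure seen in the traceback.
            raise RuntimeError(errno.ESRCH, "No such process")
        return ["domain", ["domid", self.domid]]


def poll_with_state_filter(doms, on_checked=None):
    """First patch's idea: skip domains that are not RUNNING or PAUSED."""
    results = []
    for dom in doms:
        if dom._stateGet() not in (DOM_STATE_RUNNING, DOM_STATE_PAUSED):
            continue
        if on_checked is not None:
            # Another actor may change the domain right here (check-then-use).
            on_checked(dom)
        results.append(dom.getVCPUInfo())
    return results


def die(dom):
    """Hypothetical hook: the domain goes away after passing the check."""
    dom.state = DOM_STATE_SHUTDOWN


doms = [FakeDomain(1, DOM_STATE_RUNNING), FakeDomain(2, DOM_STATE_SHUTDOWN)]
print(len(poll_with_state_filter(doms)))  # 1: domain 2 is skipped

try:
    poll_with_state_filter([FakeDomain(3, DOM_STATE_RUNNING)], on_checked=die)
except RuntimeError as exn:
    print(exn.args[0] == errno.ESRCH)  # True: the window is still open
```

The filter removes domains that are already down, but a domain that passes the check can still disappear before it is polled, so the failure becomes rarer rather than impossible.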
> 
> Yes, you're right; this patch only narrows the window.
> I have created a new patch for this, please comment!
> 
> [PATCH] xend: getVCPUInfo should handle died domain
> 
> When a new guest is created on a NUMA server, xend tries to find the best
> node by examining the vcpu info of all guests. This races with guests that
> are rebooting: a guest may still be in the list when find_relaxed_node()
> is entered but already be terminated by the time getVCPUInfo() is called,
> and getVCPUInfo() then fails with the error below:
> 
> [2013-09-04 20:01:26 6254] ERROR (XendDomainInfo:496) VM start failed
> Traceback (most recent call last):
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 482, in start
>     XendTask.log_progress(31, 60, self._initDomain)
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendTask.py", line 209, in log_progress
>     retval = func(*args, **kwds)
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2918, in _initDomain
>     node = self._setCPUAffinity()
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2835, in _setCPUAffinity
>     best_node = find_relaxed_node(candidate_node_list)[0]
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2803, in find_relaxed_node
>     cpuinfo = dom.getVCPUInfo()
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1600, in getVCPUInfo
>     raise XendError(str(exn))
> XendError: (3, 'No such process')
> 
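> This patch handles the situation: a RuntimeError carrying ESRCH ('No
> such process') is treated as a domain that has already died, and
> getVCPUInfo() returns sxpr instead of raising XendError.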
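The last line of the traceback, "(3, 'No such process')", suggests the low-level exception carries (errno, message) in its args, which is what an exn.args[0] == errno.ESRCH check relies on. A minimal sketch, with the RuntimeError constructed by hand for illustration rather than raised by xen.lowlevel.xc:

```python
import errno
import os

# Rebuilt by hand for illustration: an exception whose args are
# (errno, message), as the traceback's last line suggests.
exn = RuntimeError(errno.ESRCH, os.strerror(errno.ESRCH))

print(errno.ESRCH)                 # 3 on Linux
print(str(exn))                    # (3, 'No such process') on Linux
print(exn.args[0] == errno.ESRCH)  # True
```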
> 
> Signed-off-by: Joe Jin <joe.jin@oracle.com>
> ---
>  tools/python/xen/xend/XendDomainInfo.py |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> index e9d3e7e..c6414ed 100644
> --- a/tools/python/xen/xend/XendDomainInfo.py
> +++ b/tools/python/xen/xend/XendDomainInfo.py
> @@ -34,6 +34,7 @@ import os
>  import stat
>  import shutil
>  import traceback
> +import errno
>  from types import StringTypes
>  
>  import xen.lowlevel.xc
> @@ -1541,6 +1542,9 @@ class XendDomainInfo:
>              return sxpr
>  
>          except RuntimeError, exn:
> +            # Domain already died.
> +            if exn.args[0] == errno.ESRCH:
> +                return sxpr
>              raise XendError(str(exn))
>  
>  
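A minimal sketch of the second patch's approach. fake_vcpu_getinfo() is a hypothetical stand-in for the real xc call, XendError is a local class, and get_vcpu_info() is only a simplified model of the try/except shape of getVCPUInfo(), not the xend function itself. It uses Python 3 syntax (`except ... as exn`) rather than the Python 2.4 form in the patch.

```python
import errno


class XendError(Exception):
    """Local stand-in for xend's XendError."""


def fake_vcpu_getinfo(domid, vcpu):
    """Hypothetical xc stand-in: domid 7 dies after vcpu 0 has been read."""
    if domid == 7 and vcpu > 0:
        raise RuntimeError(errno.ESRCH, "No such process")
    return {"cpu": vcpu}


def get_vcpu_info(domid, vcpu_count, getinfo=fake_vcpu_getinfo):
    """Simplified model of getVCPUInfo() with the second patch applied."""
    sxpr = ["domain", ["domid", domid], ["vcpu_count", vcpu_count]]
    try:
        for i in range(vcpu_count):
            info = getinfo(domid, i)
            sxpr.append(["vcpu", ["number", i], ["cpu", info["cpu"]]])
        return sxpr
    except RuntimeError as exn:
        # Domain already died: return what has been collected so far.
        if exn.args and exn.args[0] == errno.ESRCH:
            return sxpr
        # Any other failure is still reported to the caller.
        raise XendError(str(exn))


def denied(domid, vcpu):
    """Hypothetical xc stand-in that fails with a non-ESRCH error."""
    raise RuntimeError(errno.EPERM, "Operation not permitted")


print(len(get_vcpu_info(1, 2)))  # 5: three header items plus two vcpus
print(len(get_vcpu_info(7, 2)))  # 4: only vcpu 0 was read before ESRCH

try:
    get_vcpu_info(9, 1, getinfo=denied)
except XendError as err:
    print("still an error:", err)
```

In this sketch a domain that dies mid-loop yields a shorter sxpr rather than an exception, so a caller such as find_relaxed_node() would still count the vcpus read before the domain disappeared. Errors other than ESRCH still surface as XendError.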

Adding Matt, as he has stepped up to be the bug-fix maintainer of Xend
(I think? Is that correct? Should that be reflected in the MAINTAINERS file?)
> -- 
> 1.7.1
> 
> 

Thread overview: 7+ messages
2013-11-19  6:13 [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED Joe Jin
2013-11-19  7:06 ` Dario Faggioli
2013-11-19  8:03 ` Roger Pau Monné
2013-11-19 10:41   ` Joe Jin
2013-11-19 14:06     ` Konrad Rzeszutek Wilk [this message]
2013-11-19 16:26       ` Matt Wilson
2013-11-20  2:27         ` Joe Jin
