* [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
@ 2013-11-19 6:13 Joe Jin
2013-11-19 7:06 ` Dario Faggioli
2013-11-19 8:03 ` Roger Pau Monné
0 siblings, 2 replies; 7+ messages in thread
From: Joe Jin @ 2013-11-19 6:13 UTC
To: Konrad Rzeszutek Wilk, ian.jackson, Ian Campbell, Keir Fraser; +Cc: xen-devel
When creating a new guest on a NUMA server, xend tries to pick the best node
by collecting the vcpu info of all guests. There is a race if another guest is
rebooting: the guest is still in the list when find_relaxed_node() is entered,
but it has been terminated by the time getVCPUInfo() is called, so
getVCPUInfo() fails with the error below:
[2013-09-04 20:01:26 6254] ERROR (XendDomainInfo:496) VM start failed
Traceback (most recent call last):
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 482, in start
XendTask.log_progress(31, 60, self._initDomain)
File "/usr/lib64/python2.4/site-packages/xen/xend/XendTask.py", line 209, in log_progress
retval = func(*args, **kwds)
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2918, in _initDomain
node = self._setCPUAffinity()
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2835, in _setCPUAffinity
best_node = find_relaxed_node(candidate_node_list)[0]
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2803, in find_relaxed_node
cpuinfo = dom.getVCPUInfo()
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1600, in getVCPUInfo
raise XendError(str(exn))
XendError: (3, 'No such process')
This patch makes find_relaxed_node() poll vcpu info only for guests in the
RUNNING or PAUSED state, to avoid the race.
Signed-off-by: Joe Jin <joe.jin@oracle.com>
---
tools/python/xen/xend/XendDomainInfo.py | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
index e9d3e7e..66e4b9f 100644
--- a/tools/python/xen/xend/XendDomainInfo.py
+++ b/tools/python/xen/xend/XendDomainInfo.py
@@ -2734,6 +2734,8 @@ class XendDomainInfo:
from xen.xend import XendDomain
doms = XendDomain.instance().list('all')
for dom in filter (lambda d: d.domid != self.domid, doms):
+ if dom._stateGet() not in (DOM_STATE_RUNNING,DOM_STATE_PAUSED):
+ continue
cpuinfo = dom.getVCPUInfo()
for vcpu in sxp.children(cpuinfo, 'vcpu'):
if sxp.child_value(vcpu, 'online') == 0: continue
--
1.7.1
* Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
2013-11-19 6:13 [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED Joe Jin
@ 2013-11-19 7:06 ` Dario Faggioli
2013-11-19 8:03 ` Roger Pau Monné
1 sibling, 0 replies; 7+ messages in thread
From: Dario Faggioli @ 2013-11-19 7:06 UTC
To: Joe Jin; +Cc: Keir Fraser, xen-devel, ian.jackson, Ian Campbell
On Tue, 2013-11-19 at 14:13 +0800, Joe Jin wrote:
> [...]
The idea looks ok. Unfortunately, I know nothing about xend, so I really
don't feel comfortable enough to provide a formal Ack.
Basically, I don't know whether this patch is the best way to fix the
issue, whether there are other ways, etc., but the problem certainly exists
and the solution sounds sound. :-)
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
2013-11-19 6:13 [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED Joe Jin
2013-11-19 7:06 ` Dario Faggioli
@ 2013-11-19 8:03 ` Roger Pau Monné
2013-11-19 10:41 ` Joe Jin
1 sibling, 1 reply; 7+ messages in thread
From: Roger Pau Monné @ 2013-11-19 8:03 UTC
To: Joe Jin, Konrad Rzeszutek Wilk, ian.jackson, Ian Campbell,
Keir Fraser
Cc: xen-devel
On 19/11/13 07:13, Joe Jin wrote:
> [...]
> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> index e9d3e7e..66e4b9f 100644
> --- a/tools/python/xen/xend/XendDomainInfo.py
> +++ b/tools/python/xen/xend/XendDomainInfo.py
> @@ -2734,6 +2734,8 @@ class XendDomainInfo:
> from xen.xend import XendDomain
> doms = XendDomain.instance().list('all')
> for dom in filter (lambda d: d.domid != self.domid, doms):
> + if dom._stateGet() not in (DOM_STATE_RUNNING,DOM_STATE_PAUSED):
> + continue
Isn't it possible that the domain has rebooted and is no longer there
between these two calls?
IMHO it's very unlikely, but there's still a window where getVCPUInfo
could fail.
> cpuinfo = dom.getVCPUInfo()
> for vcpu in sxp.children(cpuinfo, 'vcpu'):
> if sxp.child_value(vcpu, 'online') == 0: continue
>
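A minimal sketch of the window described above, in modern Python rather than the Python 2.4 that xend runs on. FakeDomain, DomainGone, is_running() and get_vcpu_info() are hypothetical stand-ins, not xend APIs; the point is only that a state check followed by a separate call can still fail, while handling the failure of the call itself cannot be raced.

```python
import errno


class DomainGone(Exception):
    """Hypothetical stand-in for the error seen when a domain has died."""

    def __init__(self):
        Exception.__init__(self, errno.ESRCH, 'No such process')


class FakeDomain(object):
    """Hypothetical domain that can vanish between two calls."""

    def __init__(self, name, vcpus, dies_after_check=False):
        self.name = name
        self.vcpus = vcpus
        self.alive = True
        self.dies_after_check = dies_after_check

    def is_running(self):
        running = self.alive
        if self.dies_after_check:
            # Simulate the guest rebooting right after the state check.
            self.alive = False
        return running

    def get_vcpu_info(self):
        if not self.alive:
            raise DomainGone()
        return self.vcpus


def count_vcpus_check_first(doms):
    """Check-then-use: still raises if a domain dies between the two calls."""
    total = 0
    for dom in doms:
        if not dom.is_running():
            continue
        total += dom.get_vcpu_info()
    return total


def count_vcpus_tolerant(doms):
    """Use-and-handle: a domain that is already gone is simply skipped."""
    total = 0
    for dom in doms:
        try:
            total += dom.get_vcpu_info()
        except DomainGone as exn:
            if exn.args[0] != errno.ESRCH:
                raise
    return total
```

In this sketch the check-first variant fails for a domain created with dies_after_check=True, whereas the tolerant variant only counts the domains that still answer.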
* Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
2013-11-19 8:03 ` Roger Pau Monné
@ 2013-11-19 10:41 ` Joe Jin
2013-11-19 14:06 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 7+ messages in thread
From: Joe Jin @ 2013-11-19 10:41 UTC
To: Roger Pau Monné, Konrad Rzeszutek Wilk, ian.jackson,
Ian Campbell, Keir Fraser
Cc: xen-devel
On 11/19/13 16:03, Roger Pau Monné wrote:
> On 19/11/13 07:13, Joe Jin wrote:
>> [...]
>> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
>> index e9d3e7e..66e4b9f 100644
>> --- a/tools/python/xen/xend/XendDomainInfo.py
>> +++ b/tools/python/xen/xend/XendDomainInfo.py
>> @@ -2734,6 +2734,8 @@ class XendDomainInfo:
>> from xen.xend import XendDomain
>> doms = XendDomain.instance().list('all')
>> for dom in filter (lambda d: d.domid != self.domid, doms):
>> + if dom._stateGet() not in (DOM_STATE_RUNNING,DOM_STATE_PAUSED):
>> + continue
>
> Isn't it possible that the domain has rebooted and is no longer there
> between these two calls?
>
> IMHO it's very unlikely, but there's still a window where getVCPUInfo
> could fail.
>
Yes, you're right, this patch just reduces the window.
I created a new patch for this; please comment!
[PATCH] xend: getVCPUInfo should handle died domain
When creating a new guest on a NUMA server, xend tries to pick the best node
by collecting the vcpu info of all guests. There is a race if another guest is
rebooting: the guest is still in the list when find_relaxed_node() is entered,
but it has already been terminated by the time getVCPUInfo() is called, so
getVCPUInfo() fails with the error below:
[2013-09-04 20:01:26 6254] ERROR (XendDomainInfo:496) VM start failed
Traceback (most recent call last):
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 482, in start
XendTask.log_progress(31, 60, self._initDomain)
File "/usr/lib64/python2.4/site-packages/xen/xend/XendTask.py", line 209, in log_progress
retval = func(*args, **kwds)
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2918, in _initDomain
node = self._setCPUAffinity()
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2835, in _setCPUAffinity
best_node = find_relaxed_node(candidate_node_list)[0]
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2803, in find_relaxed_node
cpuinfo = dom.getVCPUInfo()
File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1600, in getVCPUInfo
raise XendError(str(exn))
XendError: (3, 'No such process')
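This patch makes getVCPUInfo() return the sxpr built so far, instead of
raising XendError, when the error is ESRCH (the domain has already died).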
Signed-off-by: Joe Jin <joe.jin@oracle.com>
---
tools/python/xen/xend/XendDomainInfo.py | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
index e9d3e7e..c6414ed 100644
--- a/tools/python/xen/xend/XendDomainInfo.py
+++ b/tools/python/xen/xend/XendDomainInfo.py
@@ -34,6 +34,7 @@ import os
import stat
import shutil
import traceback
+import errno
from types import StringTypes
import xen.lowlevel.xc
@@ -1541,6 +1542,9 @@ class XendDomainInfo:
return sxpr
except RuntimeError, exn:
+ # Domain already died.
+ if exn.args[0] == errno.ESRCH:
+ return sxpr
raise XendError(str(exn))
--
1.7.1
* Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
2013-11-19 10:41 ` Joe Jin
@ 2013-11-19 14:06 ` Konrad Rzeszutek Wilk
2013-11-19 16:26 ` Matt Wilson
0 siblings, 1 reply; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-19 14:06 UTC
To: Joe Jin, msw
Cc: Keir Fraser, xen-devel, ian.jackson, Ian Campbell,
Roger Pau Monné
On Tue, Nov 19, 2013 at 06:41:37PM +0800, Joe Jin wrote:
> On 11/19/13 16:03, Roger Pau Monné wrote:
> > [...]
>
> Yes, you're right, this patch just reduces the window.
> I created a new patch for this; please comment!
>
> [PATCH] xend: getVCPUInfo should handle died domain
>
> [...]
>
> Signed-off-by: Joe Jin <joe.jin@oracle.com>
> ---
> tools/python/xen/xend/XendDomainInfo.py | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> index e9d3e7e..c6414ed 100644
> --- a/tools/python/xen/xend/XendDomainInfo.py
> +++ b/tools/python/xen/xend/XendDomainInfo.py
> @@ -34,6 +34,7 @@ import os
> import stat
> import shutil
> import traceback
> +import errno
> from types import StringTypes
>
> import xen.lowlevel.xc
> @@ -1541,6 +1542,9 @@ class XendDomainInfo:
> return sxpr
>
> except RuntimeError, exn:
> + # Domain already died.
> + if exn.args[0] == errno.ESRCH:
> + return sxpr
> raise XendError(str(exn))
>
>
Adding Matt as he has stepped up to be the bug-fix maintainer of Xend
(I think? Is that correct - should that be reflected in the MAINTAINERS file?)
> --
> 1.7.1
>
>
* Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
2013-11-19 14:06 ` Konrad Rzeszutek Wilk
@ 2013-11-19 16:26 ` Matt Wilson
2013-11-20 2:27 ` Joe Jin
0 siblings, 1 reply; 7+ messages in thread
From: Matt Wilson @ 2013-11-19 16:26 UTC
To: Konrad Rzeszutek Wilk
Cc: Keir Fraser, Ian Campbell, ian.jackson, Joe Jin, xen-devel, msw,
Roger Pau Monné
On Tue, Nov 19, 2013 at 09:06:51AM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 19, 2013 at 06:41:37PM +0800, Joe Jin wrote:
[...]
> >
> > Yes, you're right, this patch just reduces the window.
> > I created a new patch for this; please comment!
> >
> > [PATCH] xend: getVCPUInfo should handle died domain
> >
> > [...]
> >
> > Signed-off-by: Joe Jin <joe.jin@oracle.com>
> > ---
> > tools/python/xen/xend/XendDomainInfo.py | 4 ++++
> > 1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> > index e9d3e7e..c6414ed 100644
> > --- a/tools/python/xen/xend/XendDomainInfo.py
> > +++ b/tools/python/xen/xend/XendDomainInfo.py
> > @@ -34,6 +34,7 @@ import os
> > import stat
> > import shutil
> > import traceback
> > +import errno
> > from types import StringTypes
> >
> > import xen.lowlevel.xc
> > @@ -1541,6 +1542,9 @@ class XendDomainInfo:
> > return sxpr
> >
> > except RuntimeError, exn:
> > + # Domain already died.
> > + if exn.args[0] == errno.ESRCH:
> > + return sxpr
> > raise XendError(str(exn))
> >
> >
>
> Adding Matt as he has stepped up to be the bug-fix maintainer of Xend
> (I think? Is that correct - should that be reflected in the MAINTAINERS file?)
This should probably be handling xen.lowlevel.xc.Error. There's no
guarantee that a RuntimeError will have arguments, though
xen.lowlevel.xc.Error seems to always be constructed with arguments.
--msw
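A small sketch of the concern raised here, written in modern Python syntax rather than the Python 2.4 form used in the patch. XcError is a hypothetical stand-in for xen.lowlevel.xc.Error, assumed for illustration to derive from RuntimeError and to be constructed with (errno, message); error_code() and is_domain_gone() are illustrative helpers, not xend functions.

```python
import errno


class XcError(RuntimeError):
    """Hypothetical stand-in for xen.lowlevel.xc.Error, built with (errno, message)."""


def error_code(exn):
    """Return the first exception argument, or None when there are no arguments."""
    if exn.args:
        return exn.args[0]
    return None


def is_domain_gone(exn):
    """True only for the specific error type carrying ESRCH."""
    return isinstance(exn, XcError) and error_code(exn) == errno.ESRCH
```

Indexing exn.args[0] on a bare RuntimeError() raises IndexError inside the handler, which is why the sketch checks the exception type and the presence of arguments before comparing against errno.ESRCH.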
* Re: [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED
2013-11-19 16:26 ` Matt Wilson
@ 2013-11-20 2:27 ` Joe Jin
0 siblings, 0 replies; 7+ messages in thread
From: Joe Jin @ 2013-11-20 2:27 UTC
To: Matt Wilson, Konrad Rzeszutek Wilk
Cc: Keir Fraser, Ian Campbell, ian.jackson, xen-devel, msw,
Roger Pau Monné
On 11/20/13 00:26, Matt Wilson wrote:
> On Tue, Nov 19, 2013 at 09:06:51AM -0500, Konrad Rzeszutek Wilk wrote:
>> On Tue, Nov 19, 2013 at 06:41:37PM +0800, Joe Jin wrote:
> [...]
>>>
>>> Yes, you're right, this patch just reduces the window.
>>> I created a new patch for this; please comment!
>>>
>>> [PATCH] xend: getVCPUInfo should handle died domain
>>>
>>> [...]
>>>
>>> Signed-off-by: Joe Jin <joe.jin@oracle.com>
>>> ---
>>> tools/python/xen/xend/XendDomainInfo.py | 4 ++++
>>> 1 files changed, 4 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
>>> index e9d3e7e..c6414ed 100644
>>> --- a/tools/python/xen/xend/XendDomainInfo.py
>>> +++ b/tools/python/xen/xend/XendDomainInfo.py
>>> @@ -34,6 +34,7 @@ import os
>>> import stat
>>> import shutil
>>> import traceback
>>> +import errno
>>> from types import StringTypes
>>>
>>> import xen.lowlevel.xc
>>> @@ -1541,6 +1542,9 @@ class XendDomainInfo:
>>> return sxpr
>>>
>>> except RuntimeError, exn:
>>> + # Domain already died.
>>> + if exn.args[0] == errno.ESRCH:
>>> + return sxpr
>>> raise XendError(str(exn))
>>>
>>>
>>
>> Adding Matt as he has stepped up to be the bug-fix maintainer of Xend
>> (I think? Is that correct - should that be reflected in the MAINTAINERS file?)
>
> This should probably be handling xen.lowlevel.xc.Error. There's no
> guarantee that a RuntimeError will have arguments, though
> xen.lowlevel.xc.Error seems to always be constructed with arguments.
>
Do you mean that when ESRCH is returned to xc, we should generate fake vcpu
info rather than raise an exception?
I created a patch for this; can you please review?
Subject: xc: build fake vcpu info when domain already died
Signed-off-by: Joe Jin <joe.jin@oracle.com>
---
tools/python/xen/lowlevel/xc/xc.c | 26 ++++++++++++++++++++++----
1 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 2625fc4..5c40e37 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -384,6 +384,7 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self,
int rc, i;
xc_cpumap_t cpumap;
int nr_cpus;
+ int died = 0;
static char *kwd_list[] = { "domid", "vcpu", NULL };
@@ -396,8 +397,13 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self,
return pyxc_error_to_exception(self->xc_handle);
rc = xc_vcpu_getinfo(self->xc_handle, dom, vcpu, &info);
- if ( rc < 0 )
- return pyxc_error_to_exception(self->xc_handle);
+ if ( rc < 0)
+ {
+ if (errno == ESRCH)
+ died = 1;
+ else
+ return pyxc_error_to_exception(self->xc_handle);
+ }
cpumap = xc_cpumap_alloc(self->xc_handle);
if(cpumap == NULL)
@@ -406,8 +412,20 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self,
rc = xc_vcpu_getaffinity(self->xc_handle, dom, vcpu, cpumap);
if ( rc < 0 )
{
- free(cpumap);
- return pyxc_error_to_exception(self->xc_handle);
+ if (errno == ESRCH)
+ died = 1;
+ else
+ {
+ free(cpumap);
+ return pyxc_error_to_exception(self->xc_handle);
+ }
+ }
+
+ if (died)
+ {
+ memset(&info, 0, sizeof(info));
+ info.cpu_time = 0.0;
+ info.cpu = -1;
}
info_dict = Py_BuildValue("{s:i,s:i,s:i,s:L,s:i}",
--
1.7.1
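The control flow of the xc.c patch above can be modelled roughly as follows. This is only a Python sketch of the idea, not the binding itself: query is a hypothetical callable standing in for the libxc calls, and the dictionary keys are an assumption, since the patch context shows only the Py_BuildValue format string and not the key names.

```python
import errno


def placeholder_vcpu_info():
    """Zeroed record with cpu set to -1, like the memset plus override in the patch."""
    # Key names are assumed for illustration.
    return {'online': 0, 'blocked': 0, 'running': 0, 'cpu_time': 0, 'cpu': -1}


def vcpu_getinfo(query, domid, vcpu):
    """Call query(domid, vcpu); on ESRCH return the placeholder, otherwise re-raise."""
    try:
        return query(domid, vcpu)
    except OSError as exn:
        if exn.errno == errno.ESRCH:
            return placeholder_vcpu_info()
        raise
```

If the placeholder reports the vcpu as not online, a caller such as the find_relaxed_node() loop shown in the first patch, which skips vcpus whose 'online' value is 0, would presumably ignore it.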