* [PATCH] xend: Sleep before sending SIGKILL to device model
@ 2009-01-28 8:46 Yosuke Iwamatsu
2009-01-28 11:14 ` Ian Jackson
0 siblings, 1 reply; 5+ messages in thread
From: Yosuke Iwamatsu @ 2009-01-28 8:46 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 815 bytes --]
When we destroy a domain, xend sends SIGTERM to the device model and
waits in waitpid() until the device model process disappears.
If xend was restarted during the lifetime of the domain, waitpid() fails
because the device model is no longer a child of xend. In that case
xend gives up waiting for the process to shut down and sends it
SIGKILL immediately. This is problematic because in most cases the
device model will be forcibly killed by xend before it can shut itself
down.
This patch adds time.sleep before sending SIGKILL to the device model.
On my test box shutdown of a device model usually takes about 0.5 sec,
so waiting two seconds should be enough in most cases.
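For illustration, the failure described above can be reproduced outside
xend. The following is a minimal Python 3 sketch, not xend code; the
helper name is_reapable_child is hypothetical. It shows that
os.waitpid() raises OSError with errno ECHILD when the target process is
not a child of the caller, which is the situation a restarted xend is in.

```python
import errno
import os


def is_reapable_child(pid):
    """Return True if waitpid() accepts pid as our child, False on ECHILD.

    Hypothetical helper for illustration. Note that if pid is a child
    that has already exited, this call reaps it as a side effect.
    """
    try:
        os.waitpid(pid, os.WNOHANG)
        return True
    except OSError as exn:
        if exn.errno == errno.ECHILD:
            # The process is not our child (or does not exist), so we
            # cannot wait for it. A restarted xend sees this error for
            # device models started by the previous xend.
            return False
        raise
```

With such a check failing, the caller can still signal the process with
os.kill(), but it has no way to block until the process exits.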
Regards,
-----------------------
Yosuke Iwamatsu
NEC Corporation
Signed-off-by: Yosuke Iwamatsu <y-iwamatsu@ab.jp.nec.com>
[-- Attachment #2: xend_dm_sigkill.patch --]
[-- Type: all/allfiles, Size: 957 bytes --]
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: [PATCH] xend: Sleep before sending SIGKILL to device model
2009-01-28 8:46 [PATCH] xend: Sleep before sending SIGKILL to device model Yosuke Iwamatsu
@ 2009-01-28 11:14 ` Ian Jackson
2009-01-29 8:40 ` Yosuke Iwamatsu
0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2009-01-28 11:14 UTC (permalink / raw)
To: Yosuke Iwamatsu; +Cc: xen-devel
Yosuke Iwamatsu writes ("[Xen-devel] [PATCH] xend: Sleep before sending SIGKILL to device model"):
> When we destroy a domain, xend sends SIGTERM to the device model and
> waits in waitpid() until the device model process disappears.
> If xend was restarted during the lifetime of the domain, waitpid() fails
> because the device model is no longer a child of xend. In that case
> xend gives up waiting for the process to shut down and sends it
> SIGKILL immediately. This is problematic because in most cases the
> device model will be forcibly killed by xend before it can shut itself
> down.
The code already has a timeout to forcibly kill the device model after
(I think) 10 seconds. Surely we should reuse that code path (and the
same timeout value) ?
Restarting xend is not a usual thing to do and I think it's OK if
shutting down a domain started by a previous xend involves waiting for
this longer timeout. It's better to err on the side of safety.
Also, your patch was:
Content-Type: all/allfiles;
This is not a recognised content type and prevented both of my
mailreaders from displaying it to me. Can you please fix your MUA ?
Alternatively, just include the patch in the body of the mail rather
than attaching it.
Ian.
* [PATCH] xend: Sleep before sending SIGKILL to device model
2009-01-28 11:14 ` Ian Jackson
@ 2009-01-29 8:40 ` Yosuke Iwamatsu
2009-01-29 11:12 ` Ian Jackson
2009-02-04 6:14 ` Yosuke Iwamatsu
0 siblings, 2 replies; 5+ messages in thread
From: Yosuke Iwamatsu @ 2009-01-29 8:40 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 914 bytes --]
Ian Jackson wrote:
> The code already has a timeout to forcibly kill the device model after
> (I think) 10 seconds. Surely we should reuse that code path (and the
> same timeout value) ?
>
> Restarting xend is not a usual thing to do and I think it's OK if
> shutting down a domain started by a previous xend involves waiting for
> this longer timeout. It's better to err on the side of safety.
O.K. Attached is a revised patch which reuses the existing code path.
10 seconds seems a bit too long to me, but I agree that we had better
stay on the safe side.
> Also, your patch was:
> Content-Type: all/allfiles;
> This is not a recognised content type and prevented both of my
> mailreaders from displaying it to me. Can you please fix your MUA ?
Sorry for the inconvenience.
This time your mail client can recognize it, I think.
-- Yosuke
Signed-off-by: Yosuke Iwamatsu <y-iwamatsu@ab.jp.nec.com>
[-- Attachment #2: xend_dm_sigkill.txt --]
[-- Type: text/plain, Size: 2560 bytes --]
diff -r 39517e863cc8 tools/python/xen/xend/image.py
--- a/tools/python/xen/xend/image.py Mon Jan 26 16:19:42 2009 +0000
+++ b/tools/python/xen/xend/image.py Thu Jan 29 17:30:20 2009 +0900
@@ -558,24 +558,30 @@
                     os.kill(self.pid, signal.SIGHUP)
                 except OSError, exn:
                     log.exception(exn)
-                try:
-                    # Try to reap the child every 100ms for 10s. Then SIGKILL it.
-                    for i in xrange(100):
+                # Try to reap the child every 100ms for 10s. Then SIGKILL it.
+                for i in xrange(100):
+                    try:
                         (p, rv) = os.waitpid(self.pid, os.WNOHANG)
                         if p == self.pid:
                             break
-                        time.sleep(0.1)
-                    else:
-                        log.warning("DeviceModel %d took more than 10s "
-                                    "to terminate: sending SIGKILL" % self.pid)
+                    except OSError:
+                        # This is expected if Xend has been restarted within
+                        # the life of this domain. In this case, we can kill
+                        # the process, but we can't wait for it because it's
+                        # not our child. We continue this loop, and after it is
+                        # terminated make really sure the process is going away
+                        # (SIGKILL).
+                        pass
+                    time.sleep(0.1)
+                else:
+                    log.warning("DeviceModel %d took more than 10s "
+                                "to terminate: sending SIGKILL" % self.pid)
+                    try:
                         os.kill(self.pid, signal.SIGKILL)
                         os.waitpid(self.pid, 0)
-                except OSError, exn:
-                    # This is expected if Xend has been restarted within the
-                    # life of this domain. In this case, we can kill the process,
-                    # but we can't wait for it because it's not our child.
-                    # We just make really sure it's going away (SIGKILL) first.
-                    os.kill(self.pid, signal.SIGKILL)
+                    except OSError:
+                        # This happens if the process doesn't exist.
+                        pass
                 state = xstransact.Remove("/local/domain/0/device-model/%i"
                                           % self.vm.getDomid())
             finally:
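
The control flow of the revised patch can be summarised as a standalone
sketch. This is a Python 3 rewrite for illustration only, not the xend
code: the function name terminate_then_kill and its parameters are
hypothetical, and the defaults (SIGHUP, 100 ms polling, 10 s timeout) are
taken from the diff above.

```python
import os
import signal
import time


def terminate_then_kill(pid, timeout=10.0, interval=0.1, sig=signal.SIGHUP):
    """Ask pid to exit, poll for up to `timeout` seconds, then SIGKILL it.

    Returns True if the process was reaped within the timeout, and
    False if the loop ran out and the SIGKILL fallback was taken.
    """
    try:
        os.kill(pid, sig)
    except OSError:
        pass  # the process may already be gone

    for _ in range(max(1, round(timeout / interval))):
        try:
            reaped, _status = os.waitpid(pid, os.WNOHANG)
            if reaped == pid:
                return True
        except OSError:
            # Expected when pid is not our child: we cannot wait for
            # it, so keep sleeping until the timeout expires.
            pass
        time.sleep(interval)

    try:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    except OSError:
        # The process no longer exists, or it is not our child.
        pass
    return False
```

If pid is not a child of the caller, every waitpid() call raises, so the
loop always runs for the full timeout before SIGKILL is sent. That is
the longer wait that was accepted in the review.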
* Re: [PATCH] xend: Sleep before sending SIGKILL to device model
2009-01-29 8:40 ` Yosuke Iwamatsu
@ 2009-01-29 11:12 ` Ian Jackson
2009-02-04 6:14 ` Yosuke Iwamatsu
1 sibling, 0 replies; 5+ messages in thread
From: Ian Jackson @ 2009-01-29 11:12 UTC (permalink / raw)
To: Yosuke Iwamatsu; +Cc: xen-devel
Yosuke Iwamatsu writes ("[Xen-devel] [PATCH] xend: Sleep before sending SIGKILL to device model"):
> O.K. Attached is a revised patch which reuses the existing code path.
> 10 seconds seems a bit too long to me, but I agree that we had better
> stay on the safe side.
That looks reasonable to me, thanks. (Although I haven't tested it.)
> Signed-off-by: Yosuke Iwamatsu <y-iwamatsu@ab.jp.nec.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian.
* [PATCH] xend: Sleep before sending SIGKILL to device model
2009-01-29 8:40 ` Yosuke Iwamatsu
2009-01-29 11:12 ` Ian Jackson
@ 2009-02-04 6:14 ` Yosuke Iwamatsu
1 sibling, 0 replies; 5+ messages in thread
From: Yosuke Iwamatsu @ 2009-02-04 6:14 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
Keir,
Would you mind applying this?
-- Yosuke
Yosuke Iwamatsu wrote:
> Ian Jackson wrote:
>> The code already has a timeout to forcibly kill the device model after
>> (I think) 10 seconds. Surely we should reuse that code path (and the
>> same timeout value) ?
>>
>> Restarting xend is not a usual thing to do and I think it's OK if
>> shutting down a domain started by a previous xend involves waiting for
>> this longer timeout. It's better to err on the side of safety.
>
> O.K. Attached is a revised patch which reuses the existing code path.
> 10 seconds seems a bit too long to me, but I agree that we had better
> stay on the safe side.
>
>> Also, your patch was:
>> Content-Type: all/allfiles;
>> This is not a recognised content type and prevented both of my
>> mailreaders from displaying it to me. Can you please fix your MUA ?
>
> Sorry for the inconvenience.
> This time your mail client can recognize it, I think.
>
> -- Yosuke
>
> Signed-off-by: Yosuke Iwamatsu <y-iwamatsu@ab.jp.nec.com>