* SHUTDOWN_crash and vcpu deferrals
From: John Levon @ 2009-02-20 21:01 UTC
To: xen-devel
If an HVM guest is waiting for an ioemu assist, when qemu isn't running, and
domain_shutdown(SHUTDOWN_crash) is called, then the domain isn't crashed
properly:
    void domain_shutdown(struct domain *d, u8 reason)
    {
        ...
        for_each_vcpu ( d, v )
        {
            if ( v->defer_shutdown )
                continue;
Nothing will ever end the deferral. I added code to bust through the
deferral if SHUTDOWN_crash was the reason, and it seemed to help, but
I'm not sure it's the right fix.
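To make the failure mode concrete, here is a toy Python model of the behaviour described above. It is only a sketch, not Xen code: the Vcpu and Domain classes, the is_shut_down property and the bust_deferral_on_crash flag are names made up for illustration, and the model assumes that a domain only counts as shut down once every vcpu has been paused for shutdown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative reason codes for this model only; not the hypervisor's values.
SHUTDOWN_POWEROFF = "poweroff"
SHUTDOWN_CRASH = "crash"


@dataclass
class Vcpu:
    defer_shutdown: bool = False       # e.g. waiting on an ioemu assist
    paused_for_shutdown: bool = False


@dataclass
class Domain:
    vcpus: List[Vcpu] = field(default_factory=list)
    shutdown_reason: Optional[str] = None

    @property
    def is_shut_down(self) -> bool:
        # Assumption of this model: shutdown completes only when
        # every vcpu has been paused for shutdown.
        return self.shutdown_reason is not None and all(
            v.paused_for_shutdown for v in self.vcpus
        )


def domain_shutdown(d: Domain, reason: str,
                    bust_deferral_on_crash: bool = False) -> None:
    d.shutdown_reason = reason
    for v in d.vcpus:
        if v.defer_shutdown:
            if bust_deferral_on_crash and reason == SHUTDOWN_CRASH:
                # The workaround: a crash overrides the deferral.
                v.defer_shutdown = False
            else:
                # Skipped; if nothing ever clears the deferral,
                # this vcpu is never paused.
                continue
        v.paused_for_shutdown = True
```

With one deferring vcpu and the flag left off, is_shut_down stays False indefinitely in this model, which matches the hang; with the flag on, a crash completes.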
regards
john
* Re: SHUTDOWN_crash and vcpu deferrals
From: Keir Fraser @ 2009-02-20 21:35 UTC
To: John Levon, xen-devel@lists.xensource.com
On 20/02/2009 21:01, "John Levon" <levon@movementarian.org> wrote:
> If an HVM guest is waiting for an ioemu assist, when qemu isn't running, and
> domain_shutdown(SHUTDOWN_crash) is called, then the domain isn't crashed
> properly:
>
> Nothing will ever end the deferral. I added code to bust through the
> deferral if SHUTDOWN_crash was the reason, and it seemed to help, but
> I'm not sure it's the right fix.
Hm. If qemu is down you're kind of screwed anyway. Even a non-crashed guest
will likely hang. If you care about that eventuality (i.e., you believe qemu
problems are possible/likely and need to detect them, defend against them,
or whatever), would it be better to have tools try to detect it through
keepalives or something, and basically tackle that class of problem head on?
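A keepalive check of the kind suggested here might look roughly like the following. This is a sketch under assumptions: the KeepaliveMonitor class, its method names and the 10 second timeout are all hypothetical, and nothing in this thread says how qemu would actually report liveness to the tools.

```python
import time


class KeepaliveMonitor:
    """Tracks the last heartbeat per domain and reports stale ones."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence before "dead"
        self.clock = clock          # injectable so it can be tested
        self.last_seen = {}         # domid -> time of last heartbeat

    def heartbeat(self, domid):
        # Called whenever the device model for domid checks in.
        self.last_seen[domid] = self.clock()

    def dead_domains(self):
        # Domains whose device model has been silent longer than timeout.
        now = self.clock()
        return sorted(
            domid for domid, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

The toolstack would poll dead_domains() periodically and decide what to do with each entry, for example marking the domain as crashed.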
If you want the hack, I think what you're doing is probably about right. I'd
have to go back over that code again to be exactly sure though, since it's a
bit subtle.
Personally I think a dead qemu is pretty bad, and bugs leading to such
should simply be found and fixed (oh for a perfect world :-). That bad
things happen to a guest, like SHUTDOWN_crash hanging, after qemu is dead...
I'd just live with that -- a worse thing has *already* happened to that
guest's virtualisation environment.
-- Keir
* Re: SHUTDOWN_crash and vcpu deferrals
From: John Levon @ 2009-02-20 22:03 UTC
To: Keir Fraser; +Cc: xen-devel@lists.xensource.com
On Fri, Feb 20, 2009 at 09:35:16PM +0000, Keir Fraser wrote:
> Hm. If qemu is down you're kind of screwed anyway.
You're totally screwed. But what happens today is this: you get some
weird message about sentinels in xend.log (if you happen to read it),
and a domain state that looks like this:
domu-224 2 1024 1 ------ 0.0
which is not exactly very useful. But we detect qemu failures now in
xend. So we turn on this code:
# ideally we would like to forcibly crash the domain with
# something like
# xc.domain_shutdown(self.vm.getDomid(), DOMAIN_CRASH)
# but this can easily lead to very rapid restart loops against
# which we currently have no protection
(The comment being completely incorrect), but then the crash doesn't
work because of the bug I pointed out.
All I want to do is mark a domain without a qemu process as crashed. Is
that clearer?
And yes, it's pretty trivial to make qemu break. Most typically by
passing bogus parameters (say, a broken kernel image, an incorrect NIC,
etc.)
regards,
john
* Re: SHUTDOWN_crash and vcpu deferrals
From: Keir Fraser @ 2009-02-21 9:01 UTC
To: John Levon; +Cc: xen-devel@lists.xensource.com
On 20/02/2009 22:03, "John Levon" <levon@movementarian.org> wrote:
> All I want to do is mark a domain without a qemu process as crashed. Is
> that clearer?
>
> And yes, it's pretty trivial to make qemu break. Most typically by
> passing bogus parameters (say, a broken kernel image, an incorrect NIC,
> etc.)
Hmmmm.... Okay, I guess that is pretty reasonable. I'll sort out a patch
after the summit.
-- Keir
* Re: SHUTDOWN_crash and vcpu deferrals
From: Ian Jackson @ 2009-02-23 16:51 UTC
To: John Levon; +Cc: xen-devel@lists.xensource.com, Keir Fraser
John Levon writes ("Re: [Xen-devel] SHUTDOWN_crash and vcpu deferrals"):
> # ideally we would like to forcibly crash the domain with
> # something like
> # xc.domain_shutdown(self.vm.getDomid(), DOMAIN_CRASH)
> # but this can easily lead to very rapid restart loops against
> # which we currently have no protection
>
> (The comment being completely incorrect), but then the crash doesn't
> work because of the bug I pointed out.
I wrote that comment. I haven't been following this bit of xend. Do
you mean that nowadays if you say
on_crash = 'restart'
and the domain immediately crashes on boot, you don't get an infinite
restart loop ? One of the most common causes of qemu `crashing' is
that it wasn't able to open the dom0 device corresponding to some
emulated device for the guest's benefit and that obviously happens at
startup.
> All I want to do is mark a domain without a qemu process as crashed. Is
> that clearer?
I think that would be good, provided that we can prevent it restarting
rapidly.
> And yes, it's pretty trivial to make qemu break. Most typically by
> passing bogus parameters (say, a broken kernel image, an incorrect NIC,
> etc.)
As you say.
Ian.
* Re: SHUTDOWN_crash and vcpu deferrals
From: John Levon @ 2009-02-23 16:54 UTC
To: Ian Jackson; +Cc: xen-devel@lists.xensource.com, Keir Fraser
On Mon, Feb 23, 2009 at 04:51:10PM +0000, Ian Jackson wrote:
> > (The comment being completely incorrect), but then the crash doesn't
> > work because of the bug I pointed out.
>
> I wrote that comment. I haven't been following this bit of xend. Do
> you mean that nowadays if you say
> on_crash = 'restart'
> and the domain immediately crashes on boot, you don't get an infinite
> restart loop ? One of the most common causes of qemu `crashing' is
AFAIK this has been the case since forever:
    rst = self._readVm('xend/previous_restart_time')
    if rst:
        rst = float(rst)
        timeout = now - rst
        if timeout < MINIMUM_RESTART_TIME:
            log.error(
                'VM %s restarting too fast (%f seconds since the last '
                'restart). Refusing to restart to avoid loops.',
                self.info['name_label'], timeout)
            self.destroy()
            return
    self._writeVm('xend/previous_restart_time', str(now))
This is from 3.1.4. Perhaps it was broken when you tried it, but it
certainly seems to do its intended job on 3.3.2pre for me.
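The same check can be tried outside xend with a small standalone sketch. The RestartThrottle class, the in-memory dict standing in for the _readVm/_writeVm store, and the 60 second value for MINIMUM_RESTART_TIME are assumptions for illustration, not taken from the xend source.

```python
import time

MINIMUM_RESTART_TIME = 60.0  # placeholder value for this sketch


class RestartThrottle:
    """Refuses a restart that follows the previous one too closely."""

    def __init__(self, minimum_interval=MINIMUM_RESTART_TIME,
                 clock=time.time):
        self.minimum_interval = minimum_interval
        self.clock = clock
        self.previous_restart = {}   # vm name -> time of last restart

    def should_restart(self, vm_name):
        now = self.clock()
        previous = self.previous_restart.get(vm_name)
        if previous is not None and now - previous < self.minimum_interval:
            # Too fast: refuse, and leave the stored timestamp alone,
            # as the excerpt does when it destroys the VM and returns.
            return False
        self.previous_restart[vm_name] = now
        return True
```

A caller would destroy the domain when should_restart() returns False, so a guest that crashes immediately on boot gets at most one restart per interval rather than a tight loop.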
regards,
john
* Re: SHUTDOWN_crash and vcpu deferrals
From: Ian Jackson @ 2009-02-23 16:58 UTC
To: John Levon; +Cc: xen-devel@lists.xensource.com, Keir Fraser
John Levon writes ("Re: [Xen-devel] SHUTDOWN_crash and vcpu deferrals"):
> This is from 3.1.4. Perhaps it was broken when you tried it, but it
> certainly seems to do its intended job on 3.3.2pre for me.
Oh, great. I put the comment there because I remembered it happening
to me once (with some kind of pre-3.2 unstable tree I think) but
perhaps I misremembered or there was something else wrong. I didn't
try to reproduce it.
Well, in that case we should definitely fix Xen so that the guest can
be crashed and get rid of my bogus comment.
Regards,
Ian.