From: Matt Lupfer <mlupfer@ddn.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: pbonzini@redhat.com, QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] CentOS 5.x intermittently fails to boot on QEMU 1.7.0
Date: Fri, 21 Feb 2014 17:57:56 -0700
Message-ID: <5307F614.3060204@ddn.com>
In-Reply-To: <15BFE5D4-F5BD-4C70-970F-7F94B2AF8360@alex.org.uk>
On 02/21/2014 06:27 AM, Alex Bligh wrote:
>
> On 21 Feb 2014, at 04:34, Matt Lupfer wrote:
>
>>
>> This doesn't appear to be a solution, because with the timer rewrite, QEMU
>> moves its periodic (1 ms) qemu_notify_event() call to break out of
>> the main event loop from a SIGALRM handler to the rearm of a QEMU timer.
>> Presumably QEMU is counting on these generic callbacks.
>
> This is somewhat bizarre as the code you are reverting causes the main loop
> to be broken out of *more*.
>
> It's also happening only when someone calls qemu_mod_timer_ns. I'm
> not sure what precisely the kernel is doing there, but perhaps it
> is modifying a timer repeatedly and checking it fires within a given
> time?
>
Thanks for the response. The hpet_timer() callback calls timer_mod()
every 1 ms. That timerlist has no notify callback so it in turn calls
qemu_notify_event().
The guest kernel is only enabling the HPET timer and looking for
timer interrupts.
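To illustrate the notify path described above, here is a minimal toy model. It is not QEMU's source: every toy_* name is hypothetical, and it notifies on every modification, whereas the real timer code may only do so when the earliest deadline changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer list with an optional notify callback. */
typedef void (*toy_notify_cb)(void *opaque);

struct toy_timer_list {
    toy_notify_cb notify_cb;   /* NULL means "no notify callback" */
    void *notify_opaque;
    int64_t expire_ns;         /* single deadline, for simplicity */
};

/* Counts how often the (toy) main loop was asked to wake up. */
int toy_main_loop_kicks;

/* Stand-in for the generic "break out of the main loop" notification. */
void toy_notify_event(void)
{
    toy_main_loop_kicks++;
}

/* Use the list's own callback if it has one, else the generic fallback. */
void toy_timerlist_notify(struct toy_timer_list *tl)
{
    assert(tl != NULL);
    if (tl->notify_cb) {
        tl->notify_cb(tl->notify_opaque);
    } else {
        toy_notify_event();
    }
}

/* Modifying the deadline triggers a notification (simplified). */
void toy_timer_mod(struct toy_timer_list *tl, int64_t expire_ns)
{
    assert(tl != NULL);
    tl->expire_ns = expire_ns;
    toy_timerlist_notify(tl);
}
```

With no callback installed, each toy_timer_mod() call ends up in toy_notify_event(), which mirrors the once-per-millisecond main loop wakeup from the HPET callback.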
>> It appears that in QEMU 1.7.0, QEMU/KVM doesn't inject timer interrupts, or
>> alternatively the guest doesn't handle them, quickly enough to pass
>> the timer check in the guest kernel reliably.
>
> Yes that would suggest a latency type thing. The other thing that may
> have happened is that the work done is being reprioritised, so rather
> than respond to timer events immediately it's off doing some disk I/O
> or similar, though frankly that's hard to understand when the kernel
> is booting.
>
I did some more debugging and found that the problem is elsewhere. The
changed timer behavior is exposing a bug in the HPET implementation.
It's possible for the QEMU timer underlying the HPET to call the hpet_timer()
callback between when the timer is created and when the HPET device is
enabled (both actions initiated by the guest writing to HPET registers).
When this happens, the QEMU timer is rearmed to an expiration
time based on uninitialized values. That's preventing the system timer
interrupt from ticking in the guest during the timer check at boot.
The changes to the timer implementation just make this a lot more likely
to happen on CentOS 5.x kernels.
The fix looks straightforward. I'll send a patch to the list.
Matt