From: Anthony Liguori <aliguori@us.ibm.com>
To: Glauber Costa <gcosta@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
kvm-devel <kvm@vger.kernel.org>,
kraxel@redhat.com, chrisw@redhat.com
Subject: Re: kvm guest loops_per_jiffy miscalibration under host load
Date: Sun, 06 Jul 2008 20:56:27 -0500
Message-ID: <487177CB.60104@us.ibm.com>
In-Reply-To: <486CD151.8020004@redhat.com>

Glauber Costa wrote:
> Marcelo Tosatti wrote:
>> Hello,
>>
>> I have been discussing with Glauber and Gerd the problem where KVM
>> guests miscalibrate loops_per_jiffy if there's sufficient load on the
>> host.
>>
>> calibrate_delay_direct() failed to get a good estimate for
>> loops_per_jiffy.
>> Probably due to long platform interrupts. Consider using "lpj=" boot
>> option.
>> Calibrating delay loop... <3>107.00 BogoMIPS (lpj=214016)
>>
>> While this particular host calculates lpj=1597041.
>>
>> This means that udelay() can delay for less than what was asked for, with
>> fatal results such as:
>>
>> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>> Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the
>> 'noapic' kernel parameter
>>
>> This bug is easily triggered with a CPU hungry task on nice -20
>> running only during guest calibration (so that the timer check code in
>> io_apic_{32,64}.c fails to wait long enough for PIT interrupts to fire).
>>
>> The problem is that the calibration routines assume a stable relation
>> between timer interrupt frequency (PIT at this boot stage) and
>> TSC/execution frequency.
>>
>> The emulated timer frequency is based on the host system time and
>> therefore virtually resistant against heavy load, while the execution
>> of these routines on the guest is susceptible to scheduling of the QEMU
>> process.
>>
>> To fix this in a transparent way (without direct "lpj=" boot parameter
>> assignment or a paravirt equivalent), it would be necessary to base the
>> emulated timer frequency on guest execution time instead of host system
>> time. But this can introduce timekeeping issues (recent Linux guests
>> seem to handle lost/late interrupts fine as long as the clocksource is
>> reliable) and just sounds scary.
>>
>> Possible solutions:
>>
>> - Require the admin to preset "lpj=". Nasty, not user friendly.
>> - Pass the proper lpj value via a paravirt interface. Won't cover
>> fullvirt guests.
>> - Have the management app guarantee a minimum amount of CPU required
>> for proper calibration during guest initialization.
> I don't like any of these solutions, and won't defend any of them as
> "the one". So no hard feelings. But IMHO the least bad among them is
> the paravirt one. At least it goes in the general direction of
> "paravirt if you need to scale over xyz".

I agree. A paravirt solution solves the problem.

> I think passing lpj is out of the question, and giving the CPU resources
> for that time is kind of a kludge.

It's all heuristics, unfortunately.

> Or maybe we could put the timer expiration alone in a separate thread,
> with maximum priority (maybe rt priority)? dunno...

But then if you have high load because of a lot of guests running, you
defeat yourself. Any attempt to guarantee time to a guest will be
defeated by lots of guests all attempting calibration at the same time.

Regards,
Anthony Liguori
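
As a rough illustration of the numbers quoted in the original report, the
sketch below estimates how short a busy-wait delay becomes when its loop
count is derived from the miscalibrated lpj. It assumes the loop count
scales linearly with lpj and that both lpj values were measured with the
same HZ. The function name effective_delay_us is hypothetical and is not
a kernel API.

```python
def effective_delay_us(requested_us, measured_lpj, true_lpj):
    """Estimate how long a delay loop really runs.

    The loop count is computed from measured_lpj, but the CPU actually
    executes true_lpj loops per jiffy, so the elapsed time shrinks by
    the ratio measured_lpj / true_lpj (linear-scaling assumption).
    """
    return requested_us * measured_lpj / true_lpj


# Values taken from the report quoted above.
MEASURED_LPJ = 214016   # guest calibration result under host load
TRUE_LPJ = 1597041      # value the host calculates for itself

if __name__ == "__main__":
    requested = 1000  # microseconds asked of the delay loop
    actual = effective_delay_us(requested, MEASURED_LPJ, TRUE_LPJ)
    print(f"requested {requested} us, effective about {actual:.1f} us")
```

Under these assumptions a 1000 us request would busy-wait for only about
134 us, which is consistent with the timer check giving up before the PIT
interrupt arrives.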