From: Andre Przywara <andre.przywara@amd.com>
To: Zachary Amsden <zamsden@redhat.com>
Cc: Avi Kivity <avi@redhat.com>,
"glommer@redhat.com" <glommer@redhat.com>,
KVM list <kvm@vger.kernel.org>
Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests
Date: Mon, 2 Aug 2010 12:12:37 +0200 [thread overview]
Message-ID: <4C569A15.6020900@amd.com> (raw)
In-Reply-To: <4C53581E.2090208@redhat.com>
Zachary Amsden wrote:
> On 07/28/2010 02:25 AM, Andre Przywara wrote:
>> Andre Przywara wrote:
>>> Andre Przywara wrote:
>>>> Avi Kivity wrote:
>>>>> On 07/27/2010 04:48 PM, Andre Przywara wrote:
>>>>>>> Wierd. Maybe the clock goes crazy.
>>>>>>>
>>>>>>> Let's see if it jumps forward alot:
>>>>>>>
>>>>>>> } while (unlikely(last != ret));
>>>>>>> +
>>>>>>> + {
>>>>>>> + static u64 last_report;
>>>>>>> + if (ret > last_report + 10000) {
>>>>>>> + last_report = ret;
>>>>>>> + printk("kvmclock: %llx\n", ret);
>>>>>>> + }
>>>>>>> +
>>>>>>> + }
>>>>>>>
>>>>>>> return ret;
>>>>>>> }
>>>>>>>
>>>>>>> Worth updating the 'return last' to update ret and goto the new
>>>>>>> code, so we don't miss that path.
>>>>>> Did that. There is _a lot_ of output (about 350 lines per second
>>>>>> via the 115k serial console), both with smp=1 and smp=2.
>>>>>> The majority is differing about 2,000,000 (ticks?), but a handful
>>>>>> of them are in the range of 20 million.
>>>>> nanoseconds. So 2-20ms. Consistent with 350 lines/sec.
>>>>>
>>>>>> No difference between smp=2 and smp=1.
>>>>>> I also get some "BUG: recent printk recursion!" and I don't see
>>>>>> any kernel boot progress beyond outputting the BogoMIPS value.
>>>>> Right, printk() wants the time too.
>>>>>
>>>>>> BTW: I found two message from your earlier debug statement:
>>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock
>>>>>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock
>>>>> Those are from kvmclock initialization, not from the older patch.
>>>>>
>>>>> I'm completely confused, everything seems to be in order.
>>>>>
>>>>> Let's see. if you s/return last/return ret/ in the original, does
>>>>> this help things along? this makes pvclock drop the computation
>>>>> and should be exactly the same as before the patch.
>>>> Yes, this works, both smp version boot. I see a short very short
>>>> break after the line in question, but then it proceeds well.
>>>> Thanks for your help, now I got a much better insight into the
>>>> issue. I will see if I can find something more.
>>> Did some more investigations, some observations:
>>> - The cmpxchg does not seem to be a problem, I didn't see the loop
>>> iterated more than once.
>>> - Turning off printk-timestamps makes the bug go away. But I guess it
>>> is just hiding or deferring it, and it's no real workaround anyway.
>>> - I instrumented the "if (ret < last) return last;" statement, when
>>> the kernel hangs I get only printks from there, although it has hit
>>> before:
>>> ----------
>>> [ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>>> [ 0.820000] returning last instead (cnt=19001)
>>> [ 0.820000] returning last instead (cnt=20001)
>>> The last line repeats forever with the same timestamp, the counter
>>> (counting the number of "return last;") increments about 3500
>>> times/second.
>>>
>>> I will see if I find something more...
>> Added some more instrumentation, seems like the values read from the
>> pvclock is bogus *sometimes*:
>> returning last instead (2778021535795841, cnt=1, diff=1389078312510470)
>> This is from the first time the if-statement triggers. So I guess the
>> value read is ridiculously far in the future (multiple days), so next
>> calls to clocksource_read() will always return this bogus last value.
>> This means that the clock does not make progress (for several days)
>> and thus timing loops will never come to an end. I also instrumented
>> the serial driver, the last thing I saw was autoconfig_irq, where
>> obviously udelay() is called.
>>
>> Does that ring a bell with someone?
>>
>> I will now concentrate on the pvclock readout/HV write part to see
>> which of the values used here are wrong.
>
> Have you gotten any further results on this?
Somehow. I think my latest findings were more or less ghost bugs: since
prinkts contain a timestamp they interfere with the actual code. The
large gap I described above was only to be seen with these printks, it
is more or less double the real value (which is my host's uptime).
Sadly I cannot use debugfs to avoid the printks, since the kernel halts
and I don't get to userland.
On another try I managed to bisect the failure also in qemu-kvm. The bug
triggers only with "ebc4f45 turn off kvmclock when resetting cpu"
applied (_additionally_ to the kernel patch in question).
When I comment out the call to kvm_reset_msrs() in the master branch,
this also lets the bug vanish.
>
> I think the most likely explanation is that your host CPU has TSC out of
> sync, and somehow this leaks over to pvclock. Am I correct that it
> happens even with one guest VCPU? What if you disable secondary host CPUs?
I tried several ways to pin VCPUs to different host CPUs (cores and
sockets): both VCPUs on one core, both vCPUs on different cores on the
same socket and both vCPUs to different sockets/nodes. That all did not
make any difference, the kernel halted in either case.
I also tried booting the host with maxcpus=1, the error was still the
same: -smp 1 works, -smp 2 halts.
Btw.: the host uses clocksource acpi_pm. Also I noticed that sometimes
the guests gets very slow after having switched the clocksource to
kvmclock, it then eventually halts at the mentioned line.
Regards,
Andre.
--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
prev parent reply other threads:[~2010-08-02 10:17 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-22 12:53 2.6.35-rc1 regression with pvclock and smp guests Andre Przywara
2010-07-25 8:44 ` Avi Kivity
2010-07-26 8:47 ` Andre Przywara
2010-07-26 18:59 ` Arjan Koers
2010-07-27 21:00 ` Arjan Koers
2010-07-28 10:37 ` Avi Kivity
2010-07-31 0:34 ` Arjan Koers
2010-07-31 1:38 ` Zachary Amsden
2010-07-31 11:50 ` Arjan Koers
2010-07-31 2:39 ` Zachary Amsden
2010-07-31 11:53 ` Arjan Koers
2010-07-31 16:36 ` Arjan Koers
2010-07-31 19:45 ` Arjan Koers
2010-07-31 23:55 ` Zachary Amsden
2010-08-02 14:43 ` Glauber Costa
2010-08-02 16:16 ` Arjan Koers
2010-08-02 18:07 ` Glauber Costa
2010-08-02 20:26 ` Zachary Amsden
2010-08-02 21:10 ` Glauber Costa
2010-08-02 21:35 ` Arjan Koers
2010-08-03 0:00 ` Zachary Amsden
2010-09-28 11:16 ` Michael Tokarev
2010-09-29 8:12 ` Michael Tokarev
2010-09-29 8:28 ` Avi Kivity
2010-09-29 9:17 ` Michael Tokarev
2010-09-29 9:19 ` Michael Tokarev
2010-09-29 19:26 ` Arjan Koers
2010-09-30 7:55 ` Michael Tokarev
2010-09-30 9:59 ` Michael Tokarev
2010-09-30 13:54 ` Zachary Amsden
2010-09-30 15:12 ` Michael Tokarev
2010-09-30 15:32 ` Zachary Amsden
2010-09-30 18:49 ` Arjan Koers
2010-09-30 19:05 ` Marcelo Tosatti
2010-09-30 20:16 ` Arjan Koers
2010-09-30 23:02 ` Michael Tokarev
2010-09-30 23:07 ` Michael Tokarev
2010-10-01 1:13 ` Zachary Amsden
2010-10-02 5:35 ` Zachary Amsden
2010-10-02 7:35 ` Michael Tokarev
2010-10-02 7:40 ` Michael Tokarev
2010-10-02 7:50 ` Michael Tokarev
2010-10-02 16:10 ` Arjan Koers
2010-10-02 20:26 ` Michael Tokarev
2010-10-02 23:42 ` Zachary Amsden
2010-10-03 8:27 ` Michael Tokarev
2010-10-08 0:12 ` Arjan Koers
2010-10-08 2:47 ` Zachary Amsden
2010-10-08 22:06 ` Marcelo Tosatti
2010-10-09 1:10 ` Arjan Koers
2010-10-09 2:27 ` Zachary Amsden
2010-10-09 6:29 ` Michael Tokarev
2010-10-09 8:59 ` Arjan Koers
2010-10-11 20:47 ` Zachary Amsden
2010-10-13 12:18 ` Glauber Costa
2010-10-10 1:20 ` Arjan Koers
2010-10-11 17:53 ` Anthony Liguori
2010-10-11 18:36 ` Marcelo Tosatti
2010-10-09 2:29 ` Zachary Amsden
2010-10-10 1:26 ` Arjan Koers
2010-10-20 20:47 ` Arjan Koers
2010-10-09 7:59 ` Michael Tokarev
2010-10-09 8:31 ` Michael Tokarev
2010-10-02 21:55 ` Zachary Amsden
2010-10-03 8:16 ` Michael Tokarev
2010-10-03 8:22 ` Avi Kivity
2010-10-03 8:30 ` Michael Tokarev
2010-07-27 10:03 ` Avi Kivity
2010-07-27 11:49 ` Andre Przywara
2010-07-27 12:06 ` Avi Kivity
2010-07-27 12:21 ` Andre Przywara
2010-07-27 12:34 ` Avi Kivity
2010-07-27 13:48 ` Andre Przywara
2010-07-27 13:58 ` Avi Kivity
2010-07-27 14:55 ` Andre Przywara
2010-07-27 21:51 ` Andre Przywara
2010-07-28 3:00 ` Zachary Amsden
2010-07-28 7:55 ` Andre Przywara
2010-07-28 12:25 ` Andre Przywara
2010-07-30 22:54 ` Zachary Amsden
2010-08-02 10:12 ` Andre Przywara [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C569A15.6020900@amd.com \
--to=andre.przywara@amd.com \
--cc=avi@redhat.com \
--cc=glommer@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=zamsden@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.