From: Zachary Amsden <zamsden@redhat.com>
To: Arjan Koers <0h61vkll2ly8@xutrox.com>
Cc: kvm@vger.kernel.org, Marcelo Tosatti <mtosatti@redhat.com>,
Michael Tokarev <mjt@tls.msk.ru>, Avi Kivity <avi@redhat.com>,
Glauber Costa <glommer@redhat.com>,
Andre Przywara <andre.przywara@amd.com>
Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests
Date: Sat, 02 Oct 2010 13:42:04 -1000 [thread overview]
Message-ID: <4CA7C34C.4040000@redhat.com> (raw)
In-Reply-To: <4CA75969.1080405@xutrox.com>
On 10/02/2010 06:10 AM, Arjan Koers wrote:
> On 2010-10-02 09:35, Michael Tokarev wrote:
>
>> 02.10.2010 09:35, Zachary Amsden wrote:
>> []
>>
>>> Can you try this patch to see if it helps? I believe it is also safe
>>> for Xen, but cc'ing to double check.
>>>
>> It makes no visible difference.
>>
>> For some reason one of my test guests - 2.6.35.6 32bit kernel -
>> stopped booting completely, always handing at boot somewhere
>> unless I disable printk.time. Here's the typical boot messages,
>> up to the hang:
>>
>> [ 0.000000] Initializing cgroup subsys cpuset
>>
> ...
>
>> [ 0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
>> [ 0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>>
>> Note the time - it is constant after switching to kvmclock.
>>
> While CPU 1 is booting, pvclock_clocksource_read gets wrong data for that
> CPU and returns a value that's far into the future. On subsequent calls, it
> keeps returning that bogus 'last' value, because it has been made
> to never go backwards in time.
>
> I'm pretty sure that your kernel will boot with this debug patch (for
> 2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
> The patch sets the clock backwards if it detects that the previous
> value was far into the future. It also modifies printk to display some
> extra information. The DEBUG define was added to get extra calls to
> printk's where things can go wrong.
>
>
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 239427c..5eab569 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -120,12 +120,15 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
>
> static atomic64_t last_value = ATOMIC64_INIT(0);
>
> +int pvclock_backwards = 0;
> +
> cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> {
> struct pvclock_shadow_time shadow;
> unsigned version;
> cycle_t ret, offset;
> u64 last;
> + bool backwards;
>
> do {
> version = pvclock_get_time_values(&shadow, src);
> @@ -153,13 +156,26 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
> * updating at the same time, and one of them could be slightly behind,
> * making the assumption that last_value always go forward fail to hold.
> */
> + backwards = false;
> last = atomic64_read(&last_value);
> do {
> - if (ret< last)
> - return last;
> + if (ret< last) {
> + if ( last - ret< 25000000 )
> + return last;
> + else
> + /* The clock will go backwards instead of being stuck at last value for a very long time
> + * The return value of the previous call to pvclock_clocksource_read was most probably
> + * very far into te future because the clock for that CPU hadn't been setup yet
> + */
> + backwards = true;
> + }
> last = atomic64_cmpxchg(&last_value, last, ret);
> } while (unlikely(last != ret));
>
> + /* Increment outside of the while loop, because it always loops twice */
> + if (backwards)
> + pvclock_backwards++;
> +
> return ret;
> }
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 0bf2ece..d6dcd45 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1,3 +1,5 @@
> +#define DEBUG
> +
> /*
> * x86 SMP booting functions
> *
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 444b770..9608bec 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -687,6 +687,8 @@ static inline void printk_delay(void)
> }
> }
>
> +extern int pvclock_backwards;
> +
> asmlinkage int vprintk(const char *fmt, va_list args)
> {
> int printed_len = 0;
> @@ -778,9 +780,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
> unsigned long long t;
> unsigned long nanosec_rem;
>
> + int pvclock_backwards_prev = pvclock_backwards;
> t = cpu_clock(printk_cpu);
> nanosec_rem = do_div(t, 1000000000);
> - tlen = sprintf(tbuf, "[%5lu.%06lu] ",
> + tlen = sprintf(tbuf, "[%d;%d/%d:%5lu.%06lu] ",
> + printk_cpu,
> + pvclock_backwards_prev,
> + pvclock_backwards,
> (unsigned long) t,
> nanosec_rem / 1000);
>
>
>
>
> Partial output on my machine, where the clock is set backwards 4 times:
> ...
> [0;0/0: 0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
> [0;0/0: 0.124164] ++++++++++++++++++++=_---CPU UP 1
> [0;0/0: 0.124193] Booting Node 0, Processors #1 Ok.
> [0;0/0: 0.124602] Setting warm reset code and vector.
> [0;0/0: 0.124609] 1.
> [0;0/0: 0.124610] 2.
> [0;0/0: 0.124611] 3.
> [0;0/0: 0.124624] Asserting INIT.
> [0;0/0: 0.124634] Waiting for send to finish...
> [0;0/0: 0.134508] Deasserting INIT.
> [0;0/0: 0.134515] Waiting for send to finish...
> [0;0/0: 0.134519] #startup loops: 2.
> [0;0/0: 0.134521] Sending STARTUP #1.
> [0;0/0: 0.134527] After apic_write.
> [1;0/0: 0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
> [0;0/1: 0.134838] Startup point 1.
> [0;1/1: 0.134841] Waiting for send to finish...
> [0;1/1: 0.135049] Sending STARTUP #2.
> [0;1/1: 0.135055] After apic_write.
> [0;1/1: 0.135359] Startup point 1.
> [0;1/1: 0.135361] Waiting for send to finish...
> [0;1/1: 0.135568] After Startup.
> [0;1/1: 0.135569] Before Callout 1.
> [0;1/1: 0.135571] After Callout 1.
> [1;1/1: 0.008000] CALLIN, before setup_local_APIC().
> [1;2/2: 0.008000] Stack at about ffff88001f875f44
> [0;3/3: 0.136176] CPU1: has booted.
> [1;3/3: 0.008000] kvm-clock: cpu 1, msr 0:1511c41, secondary cpu clock
> [0;4/4: 0.136199] Brought up 2 CPUs
> [0;4/4: 0.136201] Boot done.
> [0;4/4: 0.136202] Before bogomips.
> [0;4/4: 0.136204] Total of 2 processors activated (11198.56 BogoMIPS).
> [0;4/4: 0.136205] Before bogocount - setting activated=1.
> [1;4/4: 0.140208] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [0;4/4: 0.142577] NET: Registered protocol family 16
> [0;4/4: 0.144263] PCI: Using configuration type 1 for base access
> [0;4/4: 0.144494] PCI: Using configuration type 1 for extended access
> [0;4/4: 0.144938] mtrr: your CPUs had inconsistent variable MTRR settings
> [0;4/4: 0.144938] mtrr: your CPUs had inconsistent MTRRdefType settings
> [0;4/4: 0.144938] mtrr: probably your BIOS does not setup all CPUs.
> [0;4/4: 0.148004] mtrr: corrected configuration.
> [0;4/4: 0.156040] bio: create slab<bio-0> at 0
> [0;4/4: 0.156602] vgaarb: loaded
> [0;4/4: 0.156602] PCI: Probing PCI hardware
> [0;4/4: 0.156602] PCI: Probing PCI hardware (bus 00)
> [0;4/4: 0.156703] pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
> [0;4/4: 0.160269] pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
> [0;4/4: 0.161055] pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
> [0;4/4: 0.164064] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
> [0;4/4: 0.164827] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
> [0;4/4: 0.169023] pci 0000:00:03.0: reg 10: [io 0xc020-0xc03f]
> [0;4/4: 0.170052] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
> [0;4/4: 0.170381] pci 0000:00:04.0: reg 10: [io 0xc040-0xc05f]
> [0;4/4: 0.170765] pci 0000:00:05.0: reg 10: [io 0xc080-0xc0bf]
> [0;4/4: 0.171023] pci 0000:00:06.0: reg 10: [io 0xc0c0-0xc0ff]
> [0;4/4: 0.172123] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [0;4/4: 0.172971] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> [0;4/4: 0.172971] PCI: pci_cache_line_size set to 64 bytes
> [0;4/4: 0.172971] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
> [0;4/4: 0.172971] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
> [0;4/4: 0.176175] Switching to clocksource kvm-clock
> [1;4/4: 0.212494] pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
> [1;4/4: 0.212500] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
> [1;4/4: 0.212828] NET: Registered protocol family 2
> [1;4/4: 0.213783] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
> ...
>
Umm... do you guys have this commit? This is supposed to address the
issue where the guest keeps resetting the TSC. A guest which does that
will break kvmclock. It only happens on SMP, and it's much worse on AMD
CPUs...
sound like your scenario.
commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
Author: Zachary Amsden <zamsden@redhat.com>
Date: Thu Aug 19 22:07:26 2010 -1000
KVM: x86: Robust TSC compensation
Make the match of TSC find TSC writes that are close to each other
instead of perfectly identical; this allows the compensator to also
work in migration / suspend scenarios.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
next prev parent reply other threads:[~2010-10-02 23:42 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-22 12:53 2.6.35-rc1 regression with pvclock and smp guests Andre Przywara
2010-07-25 8:44 ` Avi Kivity
2010-07-26 8:47 ` Andre Przywara
2010-07-26 18:59 ` Arjan Koers
2010-07-27 21:00 ` Arjan Koers
2010-07-28 10:37 ` Avi Kivity
2010-07-31 0:34 ` Arjan Koers
2010-07-31 1:38 ` Zachary Amsden
2010-07-31 11:50 ` Arjan Koers
2010-07-31 2:39 ` Zachary Amsden
2010-07-31 11:53 ` Arjan Koers
2010-07-31 16:36 ` Arjan Koers
2010-07-31 19:45 ` Arjan Koers
2010-07-31 23:55 ` Zachary Amsden
2010-08-02 14:43 ` Glauber Costa
2010-08-02 16:16 ` Arjan Koers
2010-08-02 18:07 ` Glauber Costa
2010-08-02 20:26 ` Zachary Amsden
2010-08-02 21:10 ` Glauber Costa
2010-08-02 21:35 ` Arjan Koers
2010-08-03 0:00 ` Zachary Amsden
2010-09-28 11:16 ` Michael Tokarev
2010-09-29 8:12 ` Michael Tokarev
2010-09-29 8:28 ` Avi Kivity
2010-09-29 9:17 ` Michael Tokarev
2010-09-29 9:19 ` Michael Tokarev
2010-09-29 19:26 ` Arjan Koers
2010-09-30 7:55 ` Michael Tokarev
2010-09-30 9:59 ` Michael Tokarev
2010-09-30 13:54 ` Zachary Amsden
2010-09-30 15:12 ` Michael Tokarev
2010-09-30 15:32 ` Zachary Amsden
2010-09-30 18:49 ` Arjan Koers
2010-09-30 19:05 ` Marcelo Tosatti
2010-09-30 20:16 ` Arjan Koers
2010-09-30 23:02 ` Michael Tokarev
2010-09-30 23:07 ` Michael Tokarev
2010-10-01 1:13 ` Zachary Amsden
2010-10-02 5:35 ` Zachary Amsden
2010-10-02 7:35 ` Michael Tokarev
2010-10-02 7:40 ` Michael Tokarev
2010-10-02 7:50 ` Michael Tokarev
2010-10-02 16:10 ` Arjan Koers
2010-10-02 20:26 ` Michael Tokarev
2010-10-02 23:42 ` Zachary Amsden [this message]
2010-10-03 8:27 ` Michael Tokarev
2010-10-08 0:12 ` Arjan Koers
2010-10-08 2:47 ` Zachary Amsden
2010-10-08 22:06 ` Marcelo Tosatti
2010-10-09 1:10 ` Arjan Koers
2010-10-09 2:27 ` Zachary Amsden
2010-10-09 6:29 ` Michael Tokarev
2010-10-09 8:59 ` Arjan Koers
2010-10-11 20:47 ` Zachary Amsden
2010-10-13 12:18 ` Glauber Costa
2010-10-10 1:20 ` Arjan Koers
2010-10-11 17:53 ` Anthony Liguori
2010-10-11 18:36 ` Marcelo Tosatti
2010-10-09 2:29 ` Zachary Amsden
2010-10-10 1:26 ` Arjan Koers
2010-10-20 20:47 ` Arjan Koers
2010-10-09 7:59 ` Michael Tokarev
2010-10-09 8:31 ` Michael Tokarev
2010-10-02 21:55 ` Zachary Amsden
2010-10-03 8:16 ` Michael Tokarev
2010-10-03 8:22 ` Avi Kivity
2010-10-03 8:30 ` Michael Tokarev
2010-07-27 10:03 ` Avi Kivity
2010-07-27 11:49 ` Andre Przywara
2010-07-27 12:06 ` Avi Kivity
2010-07-27 12:21 ` Andre Przywara
2010-07-27 12:34 ` Avi Kivity
2010-07-27 13:48 ` Andre Przywara
2010-07-27 13:58 ` Avi Kivity
2010-07-27 14:55 ` Andre Przywara
2010-07-27 21:51 ` Andre Przywara
2010-07-28 3:00 ` Zachary Amsden
2010-07-28 7:55 ` Andre Przywara
2010-07-28 12:25 ` Andre Przywara
2010-07-30 22:54 ` Zachary Amsden
2010-08-02 10:12 ` Andre Przywara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CA7C34C.4040000@redhat.com \
--to=zamsden@redhat.com \
--cc=0h61vkll2ly8@xutrox.com \
--cc=andre.przywara@amd.com \
--cc=avi@redhat.com \
--cc=glommer@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mjt@tls.msk.ru \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox