public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Zachary Amsden <zamsden@redhat.com>
To: Arjan Koers <0h61vkll2ly8@xutrox.com>
Cc: kvm@vger.kernel.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Michael Tokarev <mjt@tls.msk.ru>, Avi Kivity <avi@redhat.com>,
	Glauber Costa <glommer@redhat.com>,
	Andre Przywara <andre.przywara@amd.com>
Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests
Date: Sat, 02 Oct 2010 13:42:04 -1000	[thread overview]
Message-ID: <4CA7C34C.4040000@redhat.com> (raw)
In-Reply-To: <4CA75969.1080405@xutrox.com>

On 10/02/2010 06:10 AM, Arjan Koers wrote:
> On 2010-10-02 09:35, Michael Tokarev wrote:
>    
>> 02.10.2010 09:35, Zachary Amsden wrote:
>> []
>>      
>>> Can you try this patch to see if it helps?  I believe it is also safe
>>> for Xen, but cc'ing to double check.
>>>        
>> It makes no visible difference.
>>
>> For some reason one of my test guests - 2.6.35.6 32bit kernel -
>> stopped booting completely, always handing at boot somewhere
>> unless I disable printk.time.  Here's the typical boot messages,
>> up to the hang:
>>
>> [    0.000000] Initializing cgroup subsys cpuset
>>      
> ...
>    
>> [    0.259999] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
>> [    0.259999] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
>>
>> Note the time - it is constant after switching to kvmclock.
>>      
> While CPU 1 is booting, pvclock_clocksource_read gets wrong data for that
> CPU and returns a value that's far into the future. On subsequent calls, it
> keeps returning that bogus 'last' value, because it has been made
> to never go backwards in time.
>
> I'm pretty sure that your kernel will boot with this debug patch (for
> 2.6.35.7). It doesn't fix the problem, but corrects things afterwards.
> The patch sets the clock backwards if it detects that the previous
> value was far into the future. It also modifies printk to display some
> extra information. The DEBUG define was added to get extra calls to
> printk's where things can go wrong.
>
>
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 239427c..5eab569 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -120,12 +120,15 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
>
>   static atomic64_t last_value = ATOMIC64_INIT(0);
>
> +int pvclock_backwards = 0;
> +
>   cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>   {
>   	struct pvclock_shadow_time shadow;
>   	unsigned version;
>   	cycle_t ret, offset;
>   	u64 last;
> +	bool backwards;
>
>   	do {
>   		version = pvclock_get_time_values(&shadow, src);
> @@ -153,13 +156,26 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>   	 * updating at the same time, and one of them could be slightly behind,
>   	 * making the assumption that last_value always go forward fail to hold.
>   	 */
> +	backwards = false;
>   	last = atomic64_read(&last_value);
>   	do {
> -		if (ret<  last)
> -			return last;
> +		if (ret<  last) {
> +			if ( last - ret<  25000000 )
> +				return last;
> +			else
> +				/* The clock will go backwards instead of being stuck at last value for a very long time
> +				 * The return value of the previous call to pvclock_clocksource_read was most probably
> +				 * very far into te future because the clock for that CPU hadn't been setup yet
> +				 */
> +				backwards = true;
> +		}
>   		last = atomic64_cmpxchg(&last_value, last, ret);
>   	} while (unlikely(last != ret));
>
> +	/* Increment outside of the while loop, because it always loops twice */
> +	if (backwards)
> +		pvclock_backwards++;
> +
>   	return ret;
>   }
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 0bf2ece..d6dcd45 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1,3 +1,5 @@
> +#define DEBUG
> +
>   /*
>    *	x86 SMP booting functions
>    *
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 444b770..9608bec 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -687,6 +687,8 @@ static inline void printk_delay(void)
>   	}
>   }
>
> +extern int pvclock_backwards;
> +
>   asmlinkage int vprintk(const char *fmt, va_list args)
>   {
>   	int printed_len = 0;
> @@ -778,9 +780,13 @@ asmlinkage int vprintk(const char *fmt, va_list args)
>   				unsigned long long t;
>   				unsigned long nanosec_rem;
>
> +				int pvclock_backwards_prev = pvclock_backwards;
>   				t = cpu_clock(printk_cpu);
>   				nanosec_rem = do_div(t, 1000000000);
> -				tlen = sprintf(tbuf, "[%5lu.%06lu] ",
> +				tlen = sprintf(tbuf, "[%d;%d/%d:%5lu.%06lu] ",
> +						printk_cpu,
> +						pvclock_backwards_prev,
> +						pvclock_backwards,
>   						(unsigned long) t,
>   						nanosec_rem / 1000);
>
>
>
>
> Partial output on my machine, where the clock is set backwards 4 times:
> ...
> [0;0/0:    0.015662] CPU0: AMD Athlon(tm) II X2 240 Processor stepping 02
> [0;0/0:    0.124164] ++++++++++++++++++++=_---CPU UP  1
> [0;0/0:    0.124193] Booting Node   0, Processors  #1 Ok.
> [0;0/0:    0.124602] Setting warm reset code and vector.
> [0;0/0:    0.124609] 1.
> [0;0/0:    0.124610] 2.
> [0;0/0:    0.124611] 3.
> [0;0/0:    0.124624] Asserting INIT.
> [0;0/0:    0.124634] Waiting for send to finish...
> [0;0/0:    0.134508] Deasserting INIT.
> [0;0/0:    0.134515] Waiting for send to finish...
> [0;0/0:    0.134519] #startup loops: 2.
> [0;0/0:    0.134521] Sending STARTUP #1.
> [0;0/0:    0.134527] After apic_write.
> [1;0/0:    0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
> [0;0/1:    0.134838] Startup point 1.
> [0;1/1:    0.134841] Waiting for send to finish...
> [0;1/1:    0.135049] Sending STARTUP #2.
> [0;1/1:    0.135055] After apic_write.
> [0;1/1:    0.135359] Startup point 1.
> [0;1/1:    0.135361] Waiting for send to finish...
> [0;1/1:    0.135568] After Startup.
> [0;1/1:    0.135569] Before Callout 1.
> [0;1/1:    0.135571] After Callout 1.
> [1;1/1:    0.008000] CALLIN, before setup_local_APIC().
> [1;2/2:    0.008000] Stack at about ffff88001f875f44
> [0;3/3:    0.136176] CPU1: has booted.
> [1;3/3:    0.008000] kvm-clock: cpu 1, msr 0:1511c41, secondary cpu clock
> [0;4/4:    0.136199] Brought up 2 CPUs
> [0;4/4:    0.136201] Boot done.
> [0;4/4:    0.136202] Before bogomips.
> [0;4/4:    0.136204] Total of 2 processors activated (11198.56 BogoMIPS).
> [0;4/4:    0.136205] Before bogocount - setting activated=1.
> [1;4/4:    0.140208] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
> [0;4/4:    0.142577] NET: Registered protocol family 16
> [0;4/4:    0.144263] PCI: Using configuration type 1 for base access
> [0;4/4:    0.144494] PCI: Using configuration type 1 for extended access
> [0;4/4:    0.144938] mtrr: your CPUs had inconsistent variable MTRR settings
> [0;4/4:    0.144938] mtrr: your CPUs had inconsistent MTRRdefType settings
> [0;4/4:    0.144938] mtrr: probably your BIOS does not setup all CPUs.
> [0;4/4:    0.148004] mtrr: corrected configuration.
> [0;4/4:    0.156040] bio: create slab<bio-0>  at 0
> [0;4/4:    0.156602] vgaarb: loaded
> [0;4/4:    0.156602] PCI: Probing PCI hardware
> [0;4/4:    0.156602] PCI: Probing PCI hardware (bus 00)
> [0;4/4:    0.156703] pci 0000:00:01.1: reg 20: [io  0xc000-0xc00f]
> [0;4/4:    0.160269] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
> [0;4/4:    0.161055] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
> [0;4/4:    0.164064] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
> [0;4/4:    0.164827] pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
> [0;4/4:    0.169023] pci 0000:00:03.0: reg 10: [io  0xc020-0xc03f]
> [0;4/4:    0.170052] pci 0000:00:03.0: reg 14: [mem 0xf2001000-0xf2001fff]
> [0;4/4:    0.170381] pci 0000:00:04.0: reg 10: [io  0xc040-0xc05f]
> [0;4/4:    0.170765] pci 0000:00:05.0: reg 10: [io  0xc080-0xc0bf]
> [0;4/4:    0.171023] pci 0000:00:06.0: reg 10: [io  0xc0c0-0xc0ff]
> [0;4/4:    0.172123] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> [0;4/4:    0.172971] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
> [0;4/4:    0.172971] PCI: pci_cache_line_size set to 64 bytes
> [0;4/4:    0.172971] reserve RAM buffer: 000000000009bc00 - 000000000009ffff
> [0;4/4:    0.172971] reserve RAM buffer: 000000001fffd000 - 000000001fffffff
> [0;4/4:    0.176175] Switching to clocksource kvm-clock
> [1;4/4:    0.212494] pci_bus 0000:00: resource 0 [io  0x0000-0xffff]
> [1;4/4:    0.212500] pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
> [1;4/4:    0.212828] NET: Registered protocol family 2
> [1;4/4:    0.213783] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
> ...
>    

Umm...  do you guys have this commit?  This is supposed to address the 
issue where the guest keeps resetting the TSC.  A guest which does that 
will break kvmclock.  It only happens on SMP, and it's much worse on AMD 
CPUs...

sound like your scenario.

commit bd59fc8ff95126f27b7a0df1b6cc602aa428812d
Author: Zachary Amsden <zamsden@redhat.com>
Date:   Thu Aug 19 22:07:26 2010 -1000

     KVM: x86: Robust TSC compensation

     Make the match of TSC find TSC writes that are close to each other
     instead of perfectly identical; this allows the compensator to also
     work in migration / suspend scenarios.

     Signed-off-by: Zachary Amsden <zamsden@redhat.com>
     Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>



  parent reply	other threads:[~2010-10-02 23:42 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-22 12:53 2.6.35-rc1 regression with pvclock and smp guests Andre Przywara
2010-07-25  8:44 ` Avi Kivity
2010-07-26  8:47   ` Andre Przywara
2010-07-26 18:59     ` Arjan Koers
2010-07-27 21:00       ` Arjan Koers
2010-07-28 10:37         ` Avi Kivity
2010-07-31  0:34           ` Arjan Koers
2010-07-31  1:38             ` Zachary Amsden
2010-07-31 11:50               ` Arjan Koers
2010-07-31  2:39             ` Zachary Amsden
2010-07-31 11:53               ` Arjan Koers
2010-07-31 16:36                 ` Arjan Koers
2010-07-31 19:45                   ` Arjan Koers
2010-07-31 23:55                   ` Zachary Amsden
2010-08-02 14:43                     ` Glauber Costa
2010-08-02 16:16                       ` Arjan Koers
2010-08-02 18:07                         ` Glauber Costa
2010-08-02 20:26                       ` Zachary Amsden
2010-08-02 21:10                         ` Glauber Costa
2010-08-02 21:35                         ` Arjan Koers
2010-08-03  0:00                           ` Zachary Amsden
2010-09-28 11:16                           ` Michael Tokarev
2010-09-29  8:12                             ` Michael Tokarev
2010-09-29  8:28                           ` Avi Kivity
2010-09-29  9:17                             ` Michael Tokarev
2010-09-29  9:19                               ` Michael Tokarev
2010-09-29 19:26                                 ` Arjan Koers
2010-09-30  7:55                                   ` Michael Tokarev
2010-09-30  9:59                                     ` Michael Tokarev
2010-09-30 13:54                                       ` Zachary Amsden
2010-09-30 15:12                                         ` Michael Tokarev
2010-09-30 15:32                                           ` Zachary Amsden
2010-09-30 18:49                                             ` Arjan Koers
2010-09-30 19:05                                               ` Marcelo Tosatti
2010-09-30 20:16                                                 ` Arjan Koers
2010-09-30 23:02                                                 ` Michael Tokarev
2010-09-30 23:07                                                   ` Michael Tokarev
2010-10-01  1:13                                                     ` Zachary Amsden
2010-10-02  5:35                                                     ` Zachary Amsden
2010-10-02  7:35                                                       ` Michael Tokarev
2010-10-02  7:40                                                         ` Michael Tokarev
2010-10-02  7:50                                                           ` Michael Tokarev
2010-10-02 16:10                                                         ` Arjan Koers
2010-10-02 20:26                                                           ` Michael Tokarev
2010-10-02 23:42                                                           ` Zachary Amsden [this message]
2010-10-03  8:27                                                             ` Michael Tokarev
2010-10-08  0:12                                                             ` Arjan Koers
2010-10-08  2:47                                                               ` Zachary Amsden
2010-10-08 22:06                                                                 ` Marcelo Tosatti
2010-10-09  1:10                                                                   ` Arjan Koers
2010-10-09  2:27                                                                     ` Zachary Amsden
2010-10-09  6:29                                                                       ` Michael Tokarev
2010-10-09  8:59                                                                         ` Arjan Koers
2010-10-11 20:47                                                                           ` Zachary Amsden
2010-10-13 12:18                                                                             ` Glauber Costa
2010-10-10  1:20                                                                       ` Arjan Koers
2010-10-11 17:53                                                                       ` Anthony Liguori
2010-10-11 18:36                                                                         ` Marcelo Tosatti
2010-10-09  2:29                                                                     ` Zachary Amsden
2010-10-10  1:26                                                                     ` Arjan Koers
2010-10-20 20:47                                                                     ` Arjan Koers
2010-10-09  7:59                                                                   ` Michael Tokarev
2010-10-09  8:31                                                                     ` Michael Tokarev
2010-10-02 21:55                                                         ` Zachary Amsden
2010-10-03  8:16                                                           ` Michael Tokarev
2010-10-03  8:22                                                             ` Avi Kivity
2010-10-03  8:30                                                             ` Michael Tokarev
2010-07-27 10:03     ` Avi Kivity
2010-07-27 11:49       ` Andre Przywara
2010-07-27 12:06         ` Avi Kivity
2010-07-27 12:21           ` Andre Przywara
2010-07-27 12:34             ` Avi Kivity
2010-07-27 13:48               ` Andre Przywara
2010-07-27 13:58                 ` Avi Kivity
2010-07-27 14:55                   ` Andre Przywara
2010-07-27 21:51                     ` Andre Przywara
2010-07-28  3:00                       ` Zachary Amsden
2010-07-28  7:55                         ` Andre Przywara
2010-07-28 12:25                       ` Andre Przywara
2010-07-30 22:54                         ` Zachary Amsden
2010-08-02 10:12                           ` Andre Przywara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CA7C34C.4040000@redhat.com \
    --to=zamsden@redhat.com \
    --cc=0h61vkll2ly8@xutrox.com \
    --cc=andre.przywara@amd.com \
    --cc=avi@redhat.com \
    --cc=glommer@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mjt@tls.msk.ru \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox