From mboxrd@z Thu Jan 1 00:00:00 1970
From: Athanasius
Subject: Re: Clocksource tsc unstable (delta = -4398046474878 ns)
Date: Mon, 29 Mar 2010 11:31:13 +0100
Message-ID: <20100329103113.GP3910@miggy.org>
References: <20100328114635.401C730301D3@mail.linux-ag.de>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="U0B5otXy6WXfork9"
Cc: kvm@vger.kernel.org
To: Sebastian Hetze
Return-path:
Received: from lake.fysh.org ([81.94.195.195]:43841 "EHLO lake.fysh.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755168Ab0C2Ksf (ORCPT ); Mon, 29 Mar 2010 06:48:35 -0400
Content-Disposition: inline
In-Reply-To: <20100328114635.401C730301D3@mail.linux-ag.de>
Sender: kvm-owner@vger.kernel.org
List-ID:

--U0B5otXy6WXfork9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, Mar 28, 2010 at 01:46:35PM +0200, Sebastian Hetze wrote:
> this message appeared in the KVM guest kern.log last night:
>
> Mar 27 22:35:30 guest kernel: [260041.559462] Clocksource tsc unstable (delta = -4398046474878 ns)
>
> The guest is running a 2.6.31-20-generic-pae ubuntu kernel with
> hrtimer-tune-hrtimer_interrupt-hang-logic.patch applied.
>
> If I understand things correct, in kernel/time/clocksource.c
> clocksource_watchdog() checks all the
> /sys/devices/system/clocksource/clocksource0/available_clocksource
> every 0.5sec for an delta of more than 0.0625s. So the tsc must have
> changed more than one hour within two subsequent calls of
> clocksource_watchdog. No event in the host nor anything in the
> guest gives reasonable cause for this step.
>
> However, the number 4398046474878 is only 36226 ns away from
> 4*1024*1024*1024*1024

I didn't see any such messages but I've had a recent experience with
the time on one KVM host leaping *forwards* approx. 5 and 2.5 hours in
two separate incidents.
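As a sanity check on the quoted arithmetic, here is a minimal Python
sketch. It is my own illustration, not taken from any log or kernel
source, and the helper name nearest_power_of_two is purely hypothetical.
It finds the power of two nearest to the reported delta and prints what
2^42, 2^43 and 2^44 nanoseconds come to in seconds and hours:

```python
NS_PER_SEC = 10**9


def nearest_power_of_two(n):
    """Return (exponent, signed distance) for the power of two nearest to n > 0."""
    hi = n.bit_length()  # smallest exponent with 2**hi > n
    lo = hi - 1          # largest exponent with 2**lo <= n
    exp = lo if (n - 2**lo) <= (2**hi - n) else hi
    return exp, n - 2**exp


# Magnitude of the delta from the quoted watchdog message, in ns.
reported_delta_ns = 4398046474878

exp, distance_ns = nearest_power_of_two(reported_delta_ns)
print(f"nearest power of two: 2^{exp}, distance {distance_ns} ns")

# Express a few powers of two (taken as nanoseconds) in seconds and hours.
for e in (42, 43, 44):
    secs = 2**e / NS_PER_SEC
    print(f"2^{e} ns = {secs:.3f} s = {secs / 3600:.2f} h")
```

The distance comes out as -36226 ns from 2^42, which matches the
figure in the quoted message (4*1024*1024*1024*1024 is 2^42).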
Eerily, the exact jumps, as best I can tell from the logs, are of 17592
and 8796 seconds, give or take a second or two. If you look at these
as nanoseconds then that's 'exactly' 2^44 and 2^43 nanoseconds.

What I've done that seems to have avoided this happening again is drop
the KVM_CLOCK kernel option from the KVM guests' kernel.

This is with a Debian squeeze (testing) KVM host running 2.6.33 from
vanilla sources and my own config. The guests are Debian lenny (stable)
and were also running a 2.6.33 kernel from vanilla sources and my own
config (a different one, to match the virtual hardware in a KVM guest).
Both systems/kernels are 64-bit. The base machine is a Dell R210 with
an Intel Xeon X3450 quad-core CPU, with hyper-threading enabled to give
8 visible CPUs in Linux.

This only happened on one of the two guests, the much busier one (it
does shell accounts, email, IMAP/POP3, a small news server, and serves
web pages over NFS to the other guest, which only runs apache2 and
nagios3). It took around 2-3 days to see the problem both times.
Without KVM_CLOCK it's been up and stable for well over a week now.

Without KVM_CLOCK the only clocksource is acpi_pm and thus that is
being used. I didn't test forcing that with a boot-time parameter
while KVM_CLOCK was still enabled.

Given that turning KVM_CLOCK off fixed my problem, and that a repeat of
the problem causes all manner of trouble given how busy the machine
is, I'm not really willing to test alternative fixes.

-- 
- Athanasius = Athanasius(at)miggy.org / http://www.miggy.org/
                  Finger athan(at)fysh.org for PGP key
	   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence."
Paula Cole - ME

--U0B5otXy6WXfork9
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkuwgXAACgkQSEDmQuIYzh1EcQCbBEoNELRuM28SJ0g2To1y1ePA
1dIAn1u7abqaZVl32ByqOE1K5KTCzdKg
=u5qN
-----END PGP SIGNATURE-----

--U0B5otXy6WXfork9--