From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: pvops domu soft lockup under load (more logs)
Date: Fri, 16 Apr 2010 13:22:05 -0700
Message-ID: <4BC8C6ED.4020700@goop.org>
References: <2F17645D-999B-435C-97EE-508D39B71035@panelsix.com>
 <4BBF7550.6070807@goop.org>
 <1C3B7CF5-5772-4D88-9EBF-F7F71BBA710D@openpanel.com>
 <4BC5FFF6.3090703@goop.org>
 <5D538568-D29B-40DB-BEE8-240429C97044@panelsix.com>
 <4BC74B11.7080206@goop.org>
 <9ED98B64-9FF0-418E-8842-73518F3284B9@panelsix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <9ED98B64-9FF0-418E-8842-73518F3284B9@panelsix.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Pim van Riezen
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 04/16/2010 12:37 AM, Pim van Riezen wrote:
> Another datapoint. This customer has similarly loaded VPS machines on
> a number of different hardware nodes. Not all of them had the lockup
> problem. I applied the jiffies clocksource to all his machines,
> regardless of their current problem status.

Does the host hardware differ?  Are they multisocket?  Intel?  AMD?

> After a day without lockups, the customer complained about time drift
> (ntp was not activated). The guests that had experienced the soft
> lockups earlier had major clock drift and were way ahead:
>
> 16 Apr 09:29:26 ntpdate[11236]: step time server 194.109.22.18 offset -7337.731686 sec
>
> That's over 2 hours accumulated in less than 24 hours of uptime. The
> guests that hadn't been experiencing the lockup issues before
> switching to the jiffies clocksource hadn't drifted that much after
> the switch and were, at most, 120s behind after the same amount of
> runtime.
>

Jiffies is a horrible clocksource.  Its accuracy is highly dependent on
how busy the host is (not just a given guest).

    J
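
To put the quoted figures in perspective, here is a minimal sketch of the drift arithmetic. It assumes a full 24 hours of uptime; the report says "less than 24 hours", so the real rates would be at least this large. The names `drift_ppm` and `SECONDS_PER_DAY` are hypothetical helpers for illustration, not part of any tool mentioned in the thread.

```python
# Rough drift-rate arithmetic for the figures quoted in this thread.
# Assumption: 24 hours of uptime (the report says "less than 24 hours",
# so the rates computed here are lower bounds).

SECONDS_PER_DAY = 24 * 60 * 60


def drift_ppm(offset_seconds: float, elapsed_seconds: float) -> float:
    """Return the magnitude of clock drift in parts per million."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return abs(offset_seconds) / elapsed_seconds * 1_000_000


# ntpdate stepped the clock by -7337.731686 s, i.e. the guest ran fast.
fast_guest_ppm = drift_ppm(-7337.731686, SECONDS_PER_DAY)

# The guests without earlier lockups were at most 120 s behind.
slow_guest_ppm = drift_ppm(120.0, SECONDS_PER_DAY)

print(f"fast guest: {fast_guest_ppm:,.0f} ppm ({fast_guest_ppm / 10_000:.1f}%)")
print(f"slow guest: {slow_guest_ppm:,.0f} ppm ({slow_guest_ppm / 10_000:.2f}%)")
```

Under that assumption the previously locked-up guest gained roughly 8.5% of wall-clock time, while the other guests lost about 0.14%. Both are far outside what ntpd normally corrects by slewing, which fits the point that jiffies accuracy depends on host load.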