From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Dan Magenheimer"
Subject: RE: RE: [PATCH] record max stime skew (was RE: [PATCH] strictly increasing hvm guest time)
Date: Fri, 4 Jul 2008 09:11:55 -0600
Message-ID: <20080704091155093.00000003744@djm-pc>
References:
Reply-To: "dan.magenheimer@oracle.com"
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Return-path:
In-Reply-To:
List-Unsubscribe:
List-Post:
List-Help:
List-Subscribe:
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser, "Xen-Devel (E-mail)"
Cc: Dave Winchell
List-Id: xen-devel@lists.xenproject.org

> Skipping cpu0 makes no sense.

Oops, I misunderstood that for some reason. Here's a fixed
version. I also now preserve the "Platform timer is" line
since that can get flushed out of the dmesg buffer.

Any idea why the skew can get so bad?

Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, July 03, 2008 5:00 PM
> To: dan.magenheimer@oracle.com; Xen-Devel (E-mail)
> Cc: Dave Winchell
> Subject: Re: [Xen-devel] RE: [PATCH] record max stime skew (was RE:
> [PATCH] strictly increasing hvm guest time)
>
> Skipping cpu0 makes no sense. It's not the 'master'. master_stime is time
> calculated from the platform timer (hpet, pit, or whatever). All cpus are
> equal peers. Apart from that looks plausible to me.
>
> -- Keir
>
> On 3/7/08 21:03, "Dan Magenheimer" wrote:
>
> >>> IMHO, it would be nice to put this patch into the tree as it
> >>> will be good for helping to diagnose time skew problems
> >>> such as the one just reported on the list.
> >>
> >> Oops! Just after I sent the above email, I checked again and
> >> the same machine (no reboots, no guests ever launched) now reports
> >> a max stime skew of 4333ns!! Methinks there might be some
> >> periodic glitch in the calibration code?
> >
> > OK this version records not only max but also a distribution
> > of skew. (The code is a bit ugly... I thought about doing
> > something fancy with log-binary but decided a few base-10
> > ranges were clearer for a human to read.)
> >
> > With this, I use "watch -d 'xm debug-key t; xm dmesg | tail -3'"
> > and can observe that (on my single-socket two-core recent-vintage
> > Intel box) roughly three-quarters of the skew measurements are
> > between 10-100nsec, roughly one-quarter are between 100ns-1us,
> > a couple percent are between 1us-10us and a few are >10us.
> >
> > This represents an approximate distribution of how long an hvm
> > guest might observe time to be stopped (if it is able to repeatedly
> > read time values quickly enough).
> >
> > So on some machines, this might be substantially worse than the
> > old hvm-platform-timer-built-on-tsc mechanism (though we had
> > no monotonicity constraint built into that).
> >
> > I wonder if the >1us outliers are occurring only if the
> > processor has been idle for awhile, vs entirely random.
> >
> > Dan
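
For readers following the thread, the "max plus a few base-10 ranges"
bookkeeping described in the quoted message can be sketched roughly as
below. This is only an illustration in Python, not the code from the
patch (which lives in the Xen hypervisor): the names SkewStats, record
and summary are hypothetical, and the "<10ns" bucket and the exact
bucket boundaries are assumptions inferred from the ranges quoted above.

```python
# Illustrative sketch only: NOT the code from the patch under discussion.
# All names here are hypothetical; the "<10ns" bucket is an assumption.

BUCKET_LABELS = ["<10ns", "10-100ns", "100ns-1us", "1us-10us", ">10us"]
# Exclusive upper bounds (in ns) for every bucket except the last one.
BUCKET_UPPER_NS = [10, 100, 1_000, 10_000]


class SkewStats:
    """Track the maximum skew seen and a base-10 histogram of all samples."""

    def __init__(self):
        self.max_skew_ns = 0
        self.counts = [0] * len(BUCKET_LABELS)

    def record(self, skew_ns):
        """Account for one skew sample, given in nanoseconds."""
        skew_ns = abs(skew_ns)  # only the magnitude of the skew matters here
        self.max_skew_ns = max(self.max_skew_ns, skew_ns)
        for i, upper in enumerate(BUCKET_UPPER_NS):
            if skew_ns < upper:
                self.counts[i] += 1
                return
        self.counts[-1] += 1  # anything at or above 10us

    def summary(self):
        """Return a one-line, human-readable summary of the distribution."""
        total = sum(self.counts) or 1  # avoid division by zero when empty
        parts = [
            f"{label}: {100 * n // total}%"
            for label, n in zip(BUCKET_LABELS, self.counts)
        ]
        return f"max skew {self.max_skew_ns}ns; " + ", ".join(parts)


if __name__ == "__main__":
    stats = SkewStats()
    for sample in (5, 50, 99, 100, 4333, 20000):
        stats.record(sample)
    print(stats.summary())
```

Fixed decade buckets keep the printed summary easy to read at a glance,
which is the trade-off against log-binary binning mentioned in the
quoted message.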