From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752017Ab0HSIyC (ORCPT );
	Thu, 19 Aug 2010 04:54:02 -0400
Received: from casper.infradead.org ([85.118.1.10]:54290 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751530Ab0HSIx4 convert rfc822-to-8bit (ORCPT );
	Thu, 19 Aug 2010 04:53:56 -0400
Subject: Re: [patch 1/3] sched: init rt_avg stat whenever rq comes online
From: Peter Zijlstra
To: Suresh Siddha
Cc: "mingo@elte.hu" ,
	"linux-kernel@vger.kernel.org" ,
	"chris@frostnet.net" ,
	"debian00@aliceadsl.fr" ,
	"hpa@zytor.com" ,
	"jonathan.protzenko@gmail.com" ,
	"mans@mansr.com" ,
	"psastudio@mail.ru" ,
	"rjw@sisk.pl" ,
	"stephan.eicher@web.de" ,
	"sxxe@gmx.de" ,
	"thomas@archlinux.org" ,
	"venki@google.com" ,
	"wonghow@gmail.com" ,
	"stable@kernel.org" ,
	tglx
In-Reply-To: <1282177213.7801.17.camel@sbsiddha-MOBL3.sc.intel.com>
References: <20100813190539.410550989@sbsiddha-MOBL3.sc.intel.com>
	<20100813193911.827207098@sbsiddha-MOBL3.sc.intel.com>
	<1281944854.1926.948.camel@laptop>
	<1281980179.2676.22.camel@sbsiddha-MOBL3.sc.intel.com>
	<1281986708.1926.1877.camel@laptop>
	<1282035085.1926.2164.camel@laptop>
	<1282177213.7801.17.camel@sbsiddha-MOBL3.sc.intel.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
Date: Thu, 19 Aug 2010 10:53:26 +0200
Message-ID: <1282208006.1926.4517.camel@laptop>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.3
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2010-08-18 at 17:20 -0700, Suresh Siddha wrote:
> On Tue, 2010-08-17 at 01:51 -0700, Peter Zijlstra wrote:
> > On Mon, 2010-08-16 at 21:25 +0200, Peter Zijlstra wrote:
> > > You can use something like:
> > >
> > > suspend:
> > > 	__get_cpu_var(cyc2ns_suspend) = sched_clock();
> > >
> > > resume:
> > > 	for_each_possible_cpu(i)
> > > 		per_cpu(cyc2ns_offset, i) += per_cpu(cyc2ns_suspend);
> > >
> > > or something like that to keep sched_clock() stable, which is exactly
> > > what most (all?) of its users expect when we report the TSC is usable.
> >
> > That's actually broken, you only want a single offset, otherwise we
> > de-sync the TSC, which is bad.
> >
> > So simply store the sched_clock() value at suspend time on the single
> > CPU that is still running, then on resume make sure sched_clock()
> > continues there by adding that stamp to all CPU offsets.
> >
>
> Peter, That might not be enough. I should add that in my Lenovo T410
> (having 2 core wsm cpu), TSC's are somehow set to a strange big value
> (for example 0xfffffffebc22f02e) after resume from S3. It looks like
> bios might be writing TSC during resume. I am not sure if this is the
> case for other OEM laptops as well. I am checking.

ARGH, please kill all SMM support for future CPUs ;-)

Are the TSCs still sync'ed though? If so, we can still compute an offset
and continue with things, albeit it requires something like:

	local_irq_save(flags);
	__get_cpu_var(cyc2ns_offset) = 0;
	offset = cyc2ns_suspend - sched_clock();
	local_irq_restore(flags);

	for_each_possible_cpu(i)
		per_cpu(cyc2ns_offset, i) = offset;

Which would take the funny offset into account and make it resume at
where we left off.

If they got out of sync, we need to flip sched_clock_stable and work on
getting the sched_clock.c code to be monotonic over such a flip.

> So such large values of TSC (leading to a very big difference between
> rq->clock and rq->age_stamp) won't be correctly handled by
> scale_rt_power() either.

Still, we need to fix the clock, not fudge the users.
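
For illustration only, the resume-time offset computation above can be
modelled in plain userspace C. This is a rough sketch, not kernel code: it
assumes the TSCs are still in sync (so one shared raw reading stands in for
every CPU), it treats the raw reading as already scaled to nanoseconds, and
the names raw_ns, sched_clock_sim(), clock_suspend() and clock_resume() are
made up for the example. Only cyc2ns_offset and cyc2ns_suspend mirror the
names used in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2	/* hypothetical: the two-core laptop from the report */

/* Stand-ins for kernel state; all of this is illustrative, not kernel API. */
static uint64_t raw_ns;			/* synced TSC, assumed already scaled to ns */
static uint64_t cyc2ns_offset[NR_CPUS];	/* models the per-CPU cyc2ns_offset */
static uint64_t cyc2ns_suspend;		/* single stamp taken at suspend time */

/* Models sched_clock() on one CPU: scaled TSC plus that CPU's offset. */
static uint64_t sched_clock_sim(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return raw_ns + cyc2ns_offset[cpu];
}

/* Suspend: only one CPU is still running, so take one stamp there. */
static void clock_suspend(int cpu)
{
	cyc2ns_suspend = sched_clock_sim(cpu);
}

/* Resume: read the clock with a zero offset, then install one common offset. */
static void clock_resume(int cpu)
{
	uint64_t offset;
	int i;

	cyc2ns_offset[cpu] = 0;
	/* Unsigned subtraction wraps modulo 2^64, so a huge TSC value cancels out. */
	offset = cyc2ns_suspend - sched_clock_sim(cpu);

	for (i = 0; i < NR_CPUS; i++)
		cyc2ns_offset[i] = offset;
}
```

In this model, if the clock reads 1000 at suspend and the raw value comes
back as 0xfffffffebc22f02e after resume, sched_clock_sim() returns 1000 again
on both CPUs once clock_resume() has run, and advances normally from there.
The same offset is written to every CPU on purpose, so the CPUs stay in sync
with each other.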