From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760700AbZEMQnp (ORCPT ); Wed, 13 May 2009 12:43:45 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756659AbZEMQng (ORCPT ); Wed, 13 May 2009 12:43:36 -0400 Received: from e32.co.us.ibm.com ([32.97.110.150]:33736 "EHLO e32.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752003AbZEMQnf (ORCPT ); Wed, 13 May 2009 12:43:35 -0400 Subject: Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds From: John Stultz To: Jon Hunter Cc: Ingo Molnar , Thomas Gleixner , "linux-kernel@vger.kernel.org" In-Reply-To: <4A0AE3E3.5090304@ti.com> References: <49ECE615.2010800@ti.com> <20090421063523.GA8020@elte.hu> <1240345936.6080.6.camel@localhost> <49EE54B4.9020700@ti.com> <1240358525.6080.40.camel@localhost> <4A02F5A3.3050004@ti.com> <1241744048.7518.132.camel@localhost.localdomain> <4A04584E.4020307@ti.com> <1241830281.7297.21.camel@localhost.localdomain> <4A0A07D6.90408@ti.com> <1242172727.3462.55.camel@localhost> <4A0AE3E3.5090304@ti.com> Content-Type: text/plain Date: Wed, 13 May 2009 09:41:12 -0700 Message-Id: <1242232872.9110.98.camel@jstultz-laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.24.3 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2009-05-13 at 10:14 -0500, Jon Hunter wrote: > john stultz wrote: > > Well, the mult adjustments should be quite small, especially compared to > > the NSEC_PER_SEC/HZ adjustment. > > > > Hmm... Although, I guess we could get bitten if the max_deferment was > > like an hour, and the adjustment was enough that it scaled out to and we > > ended up being a second late or so. So you have a point. > > > > But since the clockevent driver is not scaled, we probably can get away > > with using the orig_mult value instead of mult, and be ok. > > > > Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the > > larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale > > up without a problem. > > Yes, may be this would be a safer option. Thinking about this I was > wondering if we should always use max_deferement/10, because I did not > think that there would ever be a case where NSEC_PER_SEC/HZ would be > greater. If NSEC_PER_SEC/HZ was greater than max_deferement/10 this > would imply that the clocksource would wrap after only 10 jiffies, if I > have the math right... Right, but even with such limitiations, if an arch can skip every 5 ticks, they probably will try, right? :) > > I suspect it would be tough to hit this issue though. > > Agree. > > > Two patches should be fine. > > Ok, I will re-post as two once we have the final version. > > > Looks good overall. We may want to add the -10% (or -5%) to be totally > > safe, but that's likely just me being paranoid. > > I am paranoid too! Do you care if we use 6.25% or 12.5% margin instead > of 10% or 5%? This way we can avoid a 64-bit division by using a simple > shift. See below. I have implemented a 6.25% margin for now. Let me know > your thoughts. That sounds reasonable to me. > One final question, I noticed in clocksource.h that the definition of > function cyc2ns returns a type of s64, however, in the function itself a > variable of type u64 is used and returned. Should this function be > modified as follows? > > static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles) > { > - u64 ret = (u64)cycles; > + s64 ret = (s64)cycles; > ret = (ret * cs->mult) >> cs->shift; > return ret; > } Damn. So this brings up an issue I had missed prior. Given that a clocksoure's shift value is calculated so that the cycles*mult multiplication won't overflow 64 bits, there may be inherent assumptions in the clocksource code that limit the results of timekeeping_max_deferrment() other then just the clocksource mask value. So even if the clocksource doesn't wrap over the interval, it could be large enough to cause the cyc2ns function to break given a large enough cycle interval. Right now that assumption is spread out in the individual clocksource drivers. I've got a calculate_shift() helper patch sitting around that will help unify that, but even so there still is a trade-off in that: 1) The larger the shift value, the finer grained (smoother) we can be in making NTP adjustments. 2) The smaller the shift value, the smaller the mult value, so the longer the cycle interval length can be before we overflow. So I'm not sure how we can better extend this in all cases. I'll have to think about how that would change timekeeping_max_deferment() and how we'd have to calculate a reasonable max efficiently. Other then this issue (which is my fault for not noticing it earlier), you're patch looks great. I just feel badly for making you rev this thing over and over. One option if you're itching to push it in and be done with it: Make timekeeping_max_deferment() return just 1 second for now. Your patch provides the right infrastructure for the timekeeping code to provide its limits to the clockevents code. So you can use a safe short constant value for now, and we can extend that out correctly in a future patch. Sorry again for not catching this until now. :( thanks -john