From mboxrd@z Thu Jan 1 00:00:00 1970 From: johnstul@us.ibm.com (john stultz) Date: Mon, 03 Jan 2011 11:47:29 -0800 Subject: [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks In-Reply-To: <20110103003718.GF17727@n2100.arm.linux.org.uk> References: <20101221105149.GO28157@n2100.arm.linux.org.uk> <20110103003718.GF17727@n2100.arm.linux.org.uk> Message-ID: <1294084049.2571.30.camel@work-vm> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Mon, 2011-01-03 at 00:37 +0000, Russell King - ARM Linux wrote: > On Wed, Dec 29, 2010 at 08:31:58PM -0500, Nicolas Pitre wrote: > > On Tue, 21 Dec 2010, Russell King - ARM Linux wrote: > > > > > On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote: > > > > > > > > The minsec argument to clocks_calc_mult_shift() [ which appears to be > > > > misnamed by the way ] is used to clamp the magnitude of the mult factor > > > > so that the given range won't overflow a 64 bit result. The mult factor > > > > is itself already limited to a maximum of 32 bits. Given that the largest > > > > clock tick range we may have is also 32 bits, there is no way our usage of > > > > any computed mult factor would ever overflow 64 bits. > > > > > > > > The choice of 60 seconds here for minsec is rather arbitrary. If the > > > > mult factor wasn't already limited to 32 bits, this value of 60 would > > > > create overflows for clocks which take more than 60 seconds to wrap. > > > > > > 60 seconds was arbitary, chosen from the selection of clock rates which > > > I had available at the time (the highest of which was 24MHz). > > > > Fair enough. It is just not universally optimal given the different > > clock ranges this code now covers. > > > > > > And for clocks which take less than 60 seconds to wrap then we do lose > > > > precision as the mult factor is made smaller (fewer bits) to be adjusted > > > > to a range which is larger than what we actually use. This currently > > > > affects clocks faster than 71MHz. > > > > > > > > We could use minsec = 0 instead of 60 and a good 32-bit mult factor would > > > > be computed in all cases. But let's be formal and provide the actual > > > > range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock() > > > > users are affected as they all are using clocks < 71MHz. > > > > > > Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead > > > of assuming 5 seconds? It also has access to the mask and rate. > > > > There is a comment to that effect right above the > > clocks_calc_mult_shift() indicating that this is a known issue in that > > case already. > > > > And looking at that code, it appears that there is a balance to be made > > between cs->max_idle_ns and cs->mult, the former currently being > > determined by the later. But if cs->mult is maximized to 32 bits, that > > leaves only 31 bits for cs->max_idle_ns which corresponds to approx 2 > > seconds only. > > > > So this is not clear to me what would be the optimal mult/shift values > > in the __clocksource_updatefreq_scale() context, while this is rather > > obvious in the init_sched_clock() context. Hence this patch. > > As clocksources are about precision (I believe it has been stated so in > the past) it seems that fudging the shift and multiplier to get an > extended period out of the clock is not the correct approach. > > Maybe Thomas or John can shed some light on this? Right, so there's conflicting goals of long idle times and fine-grained time accuracy that pull the clocksource mult/shift pair calculation in different directions. Originally, the advice was pick a mult/shift pair that allows for a few seconds of time to accrue before we overflow. However this was a little vague and as the number of clocksources grew, and the desired idle times were increasing, I started to see that the lack of consistency was going to give us troubles. So the clocksource_register_hz/khz interfaces try to consolidate that decision into one place, utilizing the MAX_UPDATE_LENGTH value (currently 5 seconds) as the max idle time. Clocksource driver writers don't have to worry about making the correct decision, the kernel should do it for you. Now, at some point I expect folks to get grumpy about the max idle time being limited to 5 seconds, and we'll have to either push it out for everyone (at the cost of less accurate NTP steering), or make it a architecture specific config option. Now, for sched_clock, there are a different set of expectations with regards to accuracy and expected idle times, and we'll probably need a similar consolidation effort to make sure the mult/shift calculations are correct and the resulting limits are taken into account by the scheduler when going into NOHZ mode. thanks -john