From: zhangfx
Date: Mon, 31 Oct 2011 21:59:11 +0800
To: John Stultz
CC: Chen Jie, Yong Zhang, linux-mips@linux-mips.org, LKML, tglx@linutronix.de, yanhua, 项宇, 孙海勇
Subject: Re: [MIPS]clocks_calc_mult_shift() may gen a too big mult value
Message-ID: <4EAEA9AF.1060904@lemote.com>
In-Reply-To: <1320066197.2266.11.camel@js-netbook>

Dear Sirs,

>> Thanks for the suggestion. And sorry, I didn't notice that the upstream
>> code has already switched to clocksource_register_hz() in csrc-r4k.c
>> (we're using the r4000 clock source).
>>
>> I'm afraid this still doesn't fix my case. Through
>> clocksource_register_hz()->__clocksource_register_scale()->__clocksource_updatefreq_scale(),
>> I got a calculated maxsec = (0xffffffff - (0xffffffff>>5))/250000500 =
>> 16 (assuming mips_hpt_frequency=250000500).
>>
>> With this maxsec, I got a mult of 0xffffde72, still too big.
>
> Hrmm. Yong Zhang is right to suggest clocksource_register_hz(), as the
> intention of that code is to try to avoid these sorts of issues.
>
> What is the corresponding shift value you're getting for the value
> above?
>
> Could you annotate clocks_calc_mult_shift() a little bit to see where
> things might be going wrong?

Let me give some real-world data. On one machine with a 500MHz
frequency, the calculated freq = 500084016, and clocks_calc_mult_shift()
gives

  mult = 4294245725, shift = 30

But by the 1785th call to update_wall_time(), the error-correction
algorithm has adjusted mult to 4293964632. In the next
update_wall_time(), ntp_error is 0x301c93b7927c, which leads to an adj
of 20 (an increment of 1<<20), and then mult overflows:

  mult = 4293964632 + (1<<20) = 45912

With this mult, anything calling timekeeping_get_ns() or other code that
uses mult gets a wildly wrong notion of time, so some sleeps will
(almost) never return, and the system virtually hangs.

We are not absolutely sure whether the source of this error is normal,
but in any case the code can overflow, and the overflow causes a hang.

In this case, timekeeping_bigadjust() should, thanks to its lookahead,
limit adj to a maximum of around 20 for any error. So if mult were
chosen at shift = 29, it would become 4294245725/2, and overflow would
no longer be possible.

In short, choosing a mult close to 2^32 is dangerous. But I don't know
the best way to avoid it in general, because I don't know how large the
error, and therefore adj, can be on different systems.

Regards
Yours
Fuxin Zhang

>
> thanks
> -john