From mboxrd@z Thu Jan  1 00:00:00 1970
From: arnd@arndb.de (Arnd Bergmann)
Date: Fri, 05 Dec 2014 11:08:07 +0100
Subject: [PATCH] optimize ktime_divns for constant divisors
In-Reply-To: <alpine.LFD.2.11.1412042056240.470@knanqh.ubzr>
References: <alpine.LFD.2.11.1412031424150.470@knanqh.ubzr>
 <OF0EDEDB1C.C03829F7-ON48257DA5.00062083-48257DA5.0007628B@zte.com.cn>
 <alpine.LFD.2.11.1412042056240.470@knanqh.ubzr>
Message-ID: <2145860.PBxl6kLNRF@wuerfel>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Thursday 04 December 2014 23:30:08 Nicolas Pitre wrote:
> >          res += (u64)x_lo * y_hi + (u64)x_hi * y_lo;
> 
> That, too, risks overflowing.
> 
> Let's say x_lo = 0xffffffff and x_hi = 0xffffffff.  You get:
> 
>         0xffffffff * 0x83126e97 ->  0x83126e967ced9169
>         0xffffffff * 0x8d4fdf3b ->  0x8d4fdf3a72b020c5
>                                    -------------------
>                                    0x110624dd0ef9db22e
> 
> Therefore the sum doesn't fit into a u64 variable.
> 
> It is possible to skip carry handling, but only when the MSB of each
> constant is zero.  That is not the case here.
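
For reference, the wrap-around in the example above can be reproduced in
user space with something like the sketch below. The helper name
cross_terms_overflow() is made up for illustration and is not part of
the patch; the split of the constant into y_hi = 0x83126e97 and
y_lo = 0x8d4fdf3b is my reading of the quoted numbers.

```c
#include <stdint.h>

/*
 * Illustration only (hypothetical helper): add the two cross terms of
 * a 64x64 multiply and report whether the 64-bit sum wrapped, i.e.
 * whether a carry out of bit 63 was lost.
 */
static int cross_terms_overflow(uint32_t x_lo, uint32_t x_hi,
				uint32_t y_lo, uint32_t y_hi,
				uint64_t *sum)
{
	uint64_t a = (uint64_t)x_lo * y_hi;
	uint64_t b = (uint64_t)x_hi * y_lo;

	*sum = a + b;		/* unsigned addition wraps modulo 2^64 */
	return *sum < a;	/* wrapped iff the result is below an operand */
}
```

With x_lo = x_hi = 0xffffffff and the two constant halves above, this
reports an overflow, and the truncated sum is the quoted total minus
its leading bit. With both constant halves below 0x80000000 each
product stays below 2^63, so the sum cannot wrap.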

If I understand this right, there are two possible optimizations to
avoid the overflow:

- for anything using monotonic time, or elapsed time, we can guarantee
  that the upper bits are zero. Relying on monotonic time is a bit
  dangerous, though: it would mean introducing an API that works
  with ktime_get() but not ktime_get_real(), which invites subtle
  bugs.
  However, ktime_us_delta() could be optimized, and we can introduce
  similar ktime_sec_delta() and ktime_ms_delta() functions with
  the same properties.

- one could always pre-shift the ktime_t value. For a division by
  1000 (= 8 * 125), we can shift right by 3 bits first, then multiply
  by the reciprocal of 125 and do another shift. Not sure if that
  helps at all or if the extra shift operation makes this
  counterproductive.
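
A rough user-space sketch of the pre-shift variant, to make the idea
concrete. Everything here is an assumption for illustration: the name
div1000_preshift() is hypothetical, the input is treated as an unsigned
64-bit value (negative ktime_t values are not handled), and the
reciprocal M = ceil(2^68 / 125) is my own choice of constant rather
than anything from the patch. Since 125 * M = 2^68 + 19, the result
should be exact for any pre-shifted input below 2^61.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: divide an unsigned 64-bit
 * nanosecond count by 1000 using 1000 = 8 * 125.
 *
 * M = ceil(2^68 / 125) = 0x20c49ba5e353f7cf has its top two bits clear
 * and n = x >> 3 is below 2^61, so the cross terms below cannot
 * overflow a 64-bit variable and no carry handling is needed.
 */
static uint64_t div1000_preshift(uint64_t x)
{
	const uint32_t m_hi = 0x20c49ba5, m_lo = 0xe353f7cf;
	uint64_t n = x >> 3;			/* floor(x / 8) */
	uint32_t n_lo = (uint32_t)n;
	uint32_t n_hi = (uint32_t)(n >> 32);	/* below 2^29 */
	uint64_t mid;

	/* below 2^32 + 2^62 + 2^61, so the sum fits in 64 bits */
	mid = (((uint64_t)n_lo * m_lo) >> 32)
	    + (uint64_t)n_lo * m_hi
	    + (uint64_t)n_hi * m_lo;

	/* bits 64..127 of n * M, then the remaining shift: 68 - 64 = 4 */
	return ((uint64_t)n_hi * m_hi + (mid >> 32)) >> 4;
}
```

The cost compared to the unshifted version is one extra shift up
front, in exchange for dropping the carry handling. Whether that is a
net win on 32-bit ARM would need measuring.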

	Arnd