From mboxrd@z Thu Jan 1 00:00:00 1970 From: Willy Tarreau Subject: Re: [PATCH] tcp_cubic: use 32 bit math Date: Sat, 10 Mar 2007 12:48:26 +0100 Message-ID: <20070310114826.GB1608@1wt.eu> References: <20070307170731.2b4397e3@freekitty> <20070307.185539.48527371.davem@davemloft.net> <45EF7EB7.8040806@linux-foundation.org> <20070307.195135.74749102.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: shemminger@linux-foundation.org, rkuhn@e18.physik.tu-muenchen.de, andi@firstfloor.org, dada1@cosmosbay.com, jengelh@linux01.gwdg.de, linux-kernel@vger.kernel.org, netdev@vger.kernel.org To: David Miller Return-path: Received: from 1wt.eu ([62.212.114.60]:1102 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1766661AbXCJLw6 (ORCPT ); Sat, 10 Mar 2007 06:52:58 -0500 Content-Disposition: inline In-Reply-To: <20070307.195135.74749102.davem@davemloft.net> Sender: netdev-owner@vger.kernel.org List-Id: netdev.vger.kernel.org On Wed, Mar 07, 2007 at 07:51:35PM -0800, David Miller wrote: > From: Stephen Hemminger > Date: Wed, 07 Mar 2007 19:10:47 -0800 > > > David Miller wrote: > > > What about Willy Tarreau's supposedly even faster variant? > > > Or does this incorporate that set of improvements? > > > > > That's what this is: > > x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3; > > Great, thanks for the clarification. Oh BTW, I have a newer version with a first approximation of the cbrt() before the div64_64, which allows us to reduce from 3 div64 to only 2 div64. This results in a version which is twice as fast as the initial one (ncubic), but with slightly less accuracy (0.286% compared to 0.247). But I see that other functions such as hcbrt() had a 1.5% avg error, so I think this is not dramatic. Also, I managed to remove all other divides, to be kind with CPUs having a slow divide instruction or no divide at all. Since we compute on limited range (22 bits), we can multiply then shift right. It shows me even slightly better time on pentium-m and athlon, with a slightly higher avg error (0.297% compared to 0.286%), and slightly smaller code. I just have to clean experiments from my code to provide a patch. David, Stephen, are you interested ? $ ./bictcp fls(0)=0, fls(1)=1, fls(256)=9 Calibrating Function clocks mean(us) max(us) std(us) Avg error bictcp 936 0.61 24.28 1.99 0.172% ocubic 886 0.57 23.51 3.18 0.274% ncubic 644 0.42 16.59 2.18 0.247% ncubic32 444 0.29 11.47 1.50 0.247% ncubic32_1 444 0.29 11.56 1.88 0.238% ncubic32b3 337 0.22 8.67 0.88 0.286% ncubic_ndiv3 329 0.21 8.46 0.69 0.297% acbrt 707 0.46 18.05 0.80 0.275% hcbrt 644 0.42 16.44 0.51 1.580% Regards, Willy