From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailhub1.si.c-s.fr (2.236.17.93.rev.sfr.net [93.17.236.2])
 by lists.ozlabs.org (Postfix) with ESMTP id D1B241A0372
 for ; Mon, 17 Aug 2015 23:05:44 +1000 (AEST)
Message-ID: <55D1DC24.2020407@c-s.fr>
Date: Mon, 17 Aug 2015 15:05:40 +0200
From: leroy christophe
MIME-Version: 1.0
To: Segher Boessenkool , Scott Wood
CC: linuxppc-dev@lists.ozlabs.org, Paul Mackerras , linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop
References: <67cf476f657e87b2ea586951a57ae3ba3c1e3c0c.1435655733.git.christophe.leroy@c-s.fr>
 <20150806003059.GD18479@gate.crashing.org>
 <1438828301.2097.126.camel@freescale.com>
 <20150806043938.GE18479@gate.crashing.org>
 <1438901145.2097.170.camel@freescale.com>
 <20150806232506.GB22196@gate.crashing.org>
 <55D1BDF0.4090008@c-s.fr>
 <55D1BED4.4040808@c-s.fr>
In-Reply-To: <55D1BED4.4040808@c-s.fr>
Content-Type: text/plain; charset=utf-8; format=flowed
List-Id: Linux on PowerPC Developers Mail List

On 17/08/2015 13:00, leroy christophe wrote:
>
>
> On 17/08/2015 12:56, leroy christophe wrote:
>>
>>
>> On 07/08/2015 01:25, Segher Boessenkool wrote:
>>> On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
>>>> If this makes performance non-negligibly worse on other 32-bit chips,
>>>> and is an important improvement on 8xx, then we can use an ifdef since
>>>> 8xx already requires its own kernel build. I'd prefer to see a
>>>> benchmark showing that it actually does make things worse on those
>>>> chips, though.
>>> And I'd like to see a benchmark that shows it *does not* hurt
>>> performance on most chips, and does improve things on 8xx, and by how
>>> much. But it isn't *me* who has to show that, it is not my patch.
>> Ok, following this discussion I made some additional measurements and
>> it looks like:
>> * There is almost no change on the 885
>> * There is a non-negligible degradation on the 8323 (19.5 tb ticks
>> instead of 15.3)
>>
>> Thanks for pointing this out, I think my patch is therefore not good.
>>
> Oops, I was talking about my other patch, the one that was to optimise
> ip_fast_csum.
> I still have to measure csum_partial
>

Now I have the results for csum_partial().

The measurement is done with mftbl() before and after calling the
function, with IRQs off to get a stable measurement.
The workload is a transfer of the vmlinux file, done 3 times via scp
toward the target. This gives approximately 50000 calls to
csum_partial().

On MPC885:
1/ Without the patchset, mean time spent in csum_partial() is 167 tb ticks.
2/ With the patchset, mean time is 150 tb ticks.

On MPC8323:
1/ Without the patchset, mean time is 287 tb ticks.
2/ With the patchset, mean time is 256 tb ticks.

The improvement is approximately 10% in both cases.

So, unlike my patch on ip_fast_csum(), this one is worth it.

Christophe
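
PS: for anyone who wants to redo the arithmetic, here is a minimal
sketch of how the mean and the improvement figures can be derived from
pairs of lower-timebase reads. It is plain userspace C, not the
instrumentation actually used in the kernel; struct tb_stats,
tb_stats_record(), tb_stats_mean() and improvement_percent() are
hypothetical names introduced only for this illustration. It assumes
each sample is a pair of 32-bit values such as mftbl() returns.

```c
#include <stdint.h>

/* Hypothetical accumulator for timebase deltas (illustration only). */
struct tb_stats {
	uint64_t total_ticks;	/* sum of all measured deltas */
	uint64_t calls;		/* number of measured calls */
};

/*
 * Record one measurement from two reads of the lower 32 bits of the
 * timebase. Unsigned 32-bit subtraction stays correct if the counter
 * wraps once between the two reads.
 */
static void tb_stats_record(struct tb_stats *s, uint32_t before,
			    uint32_t after)
{
	s->total_ticks += (uint32_t)(after - before);
	s->calls++;
}

/* Mean number of ticks per call, or 0 if nothing was recorded. */
static double tb_stats_mean(const struct tb_stats *s)
{
	return s->calls ? (double)s->total_ticks / (double)s->calls : 0.0;
}

/* Relative improvement in percent between two mean times. */
static double improvement_percent(double before, double after)
{
	return (before - after) * 100.0 / before;
}
```

With the means reported above, improvement_percent(167.0, 150.0) is
about 10.2 for the MPC885 and improvement_percent(287.0, 256.0) is
about 10.8 for the MPC8323, which is where the "approximately 10%"
comes from.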