From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932255AbYEULJR (ORCPT ); Wed, 21 May 2008 07:09:17 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756684AbYEULJB (ORCPT ); Wed, 21 May 2008 07:09:01 -0400
Received: from smtpq2.tilbu1.nb.home.nl ([213.51.146.201]:34950 "EHLO smtpq2.tilbu1.nb.home.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755675AbYEULJA (ORCPT ); Wed, 21 May 2008 07:09:00 -0400
Message-ID: <4834035D.5090703@keyaccess.nl>
Date: Wed, 21 May 2008 13:11:25 +0200
From: Rene Herman
User-Agent: Thunderbird 2.0.0.14 (X11/20080421)
MIME-Version: 1.0
To: Soumyadip Das Mahapatra
CC: Benoit Boissinot , Akinobu Mita , Harvey Harrison , linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] bitreversal program
References: <1211229736.5915.86.camel@brick> <961aa3350805200513i4e02716eh79da76345718c3b2@mail.gmail.com> <40f323d00805200847t77b2d875j451d0eb9758cf9ff@mail.gmail.com> <20080520163912.GP7567@pirzuine>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.0 (-)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 21-05-08 10:54, Soumyadip Das Mahapatra wrote:

> Sorry to disturb you again. But i tested my code against Akinobu's one
> and the test result shows my code takes less cpu time than that of
> Akinobu's.

The unfortunate thing about these kinds of changes is that they're not
all that easily tested.

Straightforwardness would suggest that obviously the current
table-driven method will be faster due to needing fewer code cycles.
Cache considerations add to that in the sense of instruction cache and
can (!) detract from it in the sense of data cache; sometimes
dramatically detract, due to cache misses basically dwarfing most
anything else.
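
For reference, the two kinds of approach being compared can be sketched
roughly as below. This is only a minimal illustration, not the kernel's
actual code and not the posted patch; the names rev8_table,
init_rev8_table, bitrev32_table and bitrev32_shift are made up here.

```c
#include <stdint.h>

/* Hypothetical names throughout; illustration only. */
static uint8_t rev8_table[256];

/* Fill the 256-byte table once: entry i is i with its 8 bits reversed. */
static void init_rev8_table(void)
{
	for (int i = 0; i < 256; i++) {
		uint8_t r = 0;

		for (int b = 0; b < 8; b++)
			if (i & (1 << b))
				r |= (uint8_t)(0x80 >> b);
		rev8_table[i] = r;
	}
}

/* Table-driven: four byte lookups, with the bytes swapped end for end. */
static uint32_t bitrev32_table(uint32_t x)
{
	return ((uint32_t)rev8_table[x & 0xff] << 24) |
	       ((uint32_t)rev8_table[(x >> 8) & 0xff] << 16) |
	       ((uint32_t)rev8_table[(x >> 16) & 0xff] << 8) |
	       (uint32_t)rev8_table[x >> 24];
}

/* Computed: swap ever larger groups of bits, no table access at all. */
static uint32_t bitrev32_shift(uint32_t x)
{
	x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
	x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
	x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
	x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
	return (x >> 16) | (x << 16);
}
```

The table version reads at most four bytes of data per call; the
computed version reads none but executes more instructions, which is
the trade-off between d-cache and i-cache discussed here.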

However, in this case the table is a tiny 256-byte one which isn't even
going to be pulled in completely in normal usage (just the cache-lines
needed) while on the other hand the extra i-cache pressure from the
increased code in your version is always there.

It's unexpected that you would get better results from your new code
(and I'm not; I took Benoit's posted test and get 15 seconds for your
version versus 9 for the original table-driven one) and in this case,
reality wouldn't contradict the micro-benchmark either. It's when the
table grows and, especially, more of it is needed on a regular basis
that you'd start to worry.

PS: If you're going to go really micro, there are even going to be
differences between bitreversing 0x00000000 which is just going to need
the first byte (hence cacheline) and say 0x004080c0 which is going to
occupy 4 cachelines. Again not in the isolated test though; the data in
this case is small enough that you should be having a hard time getting
your version to perform better -- forking off a competing process that
does its best to dirty cache might do it, but then you're in a situation
which is no longer real-world with respect to this "call once" bit of
API...

Rene.