From mboxrd@z Thu Jan 1 00:00:00 1970
From: Knut Petersen
Subject: Re: [PATCH 1/1 2.6.13] framebuffer: bit_putcs() optimization for 8x* fonts
Date: Tue, 30 Aug 2005 19:58:51 +0200
Message-ID: <43149E5B.7040006@t-online.de>
References: <43148610.70406@t-online.de>
Reply-To: linux-fbdev-devel@lists.sourceforge.net
Mime-Version: 1.0
Received: from sc8-sf-mx2-b.sourceforge.net ([10.3.1.92] helo=mail.sourceforge.net)
	by sc8-sf-list1.sourceforge.net with esmtp (Exim 4.30) id 1EAAKq-0004mW-SN
	for linux-fbdev-devel@lists.sourceforge.net; Tue, 30 Aug 2005 10:55:48 -0700
Received: from mailout10.sul.t-online.com ([194.25.134.21])
	by mail.sourceforge.net with esmtp (Exim 4.44) id 1EAAKp-0006PD-OD
	for linux-fbdev-devel@lists.sourceforge.net; Tue, 30 Aug 2005 10:55:49 -0700
Sender: linux-fbdev-devel-admin@lists.sourceforge.net
Errors-To: linux-fbdev-devel-admin@lists.sourceforge.net
Content-Type: text/plain; charset="iso-8859-1"; format="flowed"
To: linux-fbdev-devel@lists.sourceforge.net
Cc: Andrew Morton, "Antonino A. Daplas", Linux Kernel Development, Jochen Hein

>Probably you can make it even faster by avoiding the multiplication, like
>
>	unsigned int offset = 0;
>	for (i = 0; i < image.height; i++) {
>		dst[offset] = src[i];
>		offset += pitch;
>	}
>

More than two decades ago I learned to avoid mul and imul. Use shifts,
add and lea instead; that was the credo in those days. The name of the
game was CP/M 80/86, a86, d86 and ddt ;-)

But let's get serious again.

Your proposed change of the patch results in a 21 ms performance
decrease on my system. Yes, I do know that this is hard to believe.
I tested a similar variation before, and the results were even worse.

Avoiding mul is a good idea in assembly language today, but in C it is
often better to write a multiplication with the loop counter than to
introduce an extra variable. The compiler will optimize the code, and
that is easier for gcc without the extra variable.

A more interesting question is what should be done for idx==2 or
idx==3. Probably fb_pad_aligned_buffer() is also slower for those
cases. But does anybody use such fonts?

cu,
 knut
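
For readers following the thread, the two loop forms under discussion can be sketched side by side. This is only an illustration, not code from the patch: the function names copy_column_mul() and copy_column_add() are hypothetical, and the sketch assumes the patch indexes the destination as dst[i * pitch], which the message implies but does not quote.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Variant A (assumed form of the patch): the destination index is
 * computed from the loop counter with a multiplication. The compiler
 * is free to strength-reduce this into an addition on its own.
 */
void copy_column_mul(uint8_t *dst, const uint8_t *src,
		     unsigned int height, unsigned int pitch)
{
	unsigned int i;

	for (i = 0; i < height; i++)
		dst[i * pitch] = src[i];
}

/*
 * Variant B (the form suggested in the quoted reply): a separate
 * running offset replaces the multiplication by hand.
 */
void copy_column_add(uint8_t *dst, const uint8_t *src,
		     unsigned int height, unsigned int pitch)
{
	unsigned int i, offset = 0;

	for (i = 0; i < height; i++) {
		dst[offset] = src[i];
		offset += pitch;
	}
}
```

Both functions write the same bytes to the same positions, so any timing difference comes from the code the compiler generates. Which one is faster depends on the compiler version, flags and CPU; the message above reports variant B as slower on the author's system, and that is worth re-measuring rather than assuming.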