From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <1445571040.701.149.camel@freescale.com>
Subject: Re: [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial()
From: Scott Wood
To: Christophe Leroy
CC: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Date: Thu, 22 Oct 2015 22:30:40 -0500
In-Reply-To: <0a4e1624642137dc2b16bbb68ea87b1a479dfa34.1442876807.git.christophe.leroy@c-s.fr>
References: <0a4e1624642137dc2b16bbb68ea87b1a479dfa34.1442876807.git.christophe.leroy@c-s.fr>
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
> r5 does contain the value to be updated, so let's use r5 all the way
> through for that. It makes the code more readable.
>
> To avoid confusion, it is better to use adde instead of addc.
>
> The first addition is useless. Its only purpose is to clear carry.
> As r4 is a signed int that is always positive, this can be done by
> using srawi instead of srwi.
>
> Let's also remove the comment about bdnz having no overhead, as it
> is not correct on all powerpc, at least on MPC8xx.
>
> In the last part, in our situation, the remaining quantity of bytes
> to be processed is between 0 and 3. Therefore, we can base that part
> on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
> then proceeding with comparisons and subtractions.
>
> Signed-off-by: Christophe Leroy
> ---
>  arch/powerpc/lib/checksum_32.S | 37 +++++++++++++++++-------------------
>  1 file changed, 17 insertions(+), 20 deletions(-)

Do you have benchmarks for these optimizations?

-Scott

>
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index 3472372..9c12602 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -27,35 +27,32 @@
>   * csum_partial(buff, len, sum)
>   */
>  _GLOBAL(csum_partial)
> -	addic	r0,r5,0
>  	subi	r3,r3,4
> -	srwi.	r6,r4,2
> +	srawi.	r6,r4,2		/* Divide len by 4 and also clear carry */
>  	beq	3f		/* if we're doing < 4 bytes */
> -	andi.	r5,r3,2		/* Align buffer to longword boundary */
> +	andi.	r0,r3,2		/* Align buffer to longword boundary */
>  	beq+	1f
> -	lhz	r5,4(r3)	/* do 2 bytes to get aligned */
> -	addi	r3,r3,2
> +	lhz	r0,4(r3)	/* do 2 bytes to get aligned */
>  	subi	r4,r4,2
> -	addc	r0,r0,r5
> +	addi	r3,r3,2
>  	srwi.	r6,r4,2		/* # words to do */
> +	adde	r5,r5,r0
>  	beq	3f
>  1:	mtctr	r6
> -2:	lwzu	r5,4(r3)	/* the bdnz has zero overhead, so it should */
> -	adde	r0,r0,r5	/* be unnecessary to unroll this loop */
> +2:	lwzu	r0,4(r3)
> +	adde	r5,r5,r0
>  	bdnz	2b
> -	andi.	r4,r4,3
> -3:	cmpwi	0,r4,2
> -	blt+	4f
> -	lhz	r5,4(r3)
> +3:	andi.	r0,r4,2
> +	beq+	4f
> +	lhz	r0,4(r3)
>  	addi	r3,r3,2
> -	subi	r4,r4,2
> -	adde	r0,r0,r5
> -4:	cmpwi	0,r4,1
> -	bne+	5f
> -	lbz	r5,4(r3)
> -	slwi	r5,r5,8		/* Upper byte of word */
> -	adde	r0,r0,r5
> -5:	addze	r3,r0		/* add in final carry */
> +	adde	r5,r5,r0
> +4:	andi.	r0,r4,1
> +	beq+	5f
> +	lbz	r0,4(r3)
> +	slwi	r0,r0,8		/* Upper byte of word */
> +	adde	r5,r5,r0
> +5:	addze	r3,r5		/* add in final carry */
>  	blr
>
>  /*
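
For readers following the last part of the quoted patch, here is a rough C model of the two tail-handling variants. It is only a sketch, not the kernel code: the helper names (add_fold, load_be16, tail_old, tail_new, check_tails) are made up for illustration, the carry is folded back after every addition rather than being chained through adde/addze, and big-endian halfword loads are assumed, as on ppc32. It is meant to show why testing the two low bits of len gives the same result as masking with 3 and then comparing and subtracting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add v into sum and fold the carry back in (a model of the adde/addze chain). */
static uint32_t add_fold(uint32_t sum, uint32_t v)
{
    uint64_t t = (uint64_t)sum + v;
    return (uint32_t)t + (uint32_t)(t >> 32);
}

/* Big-endian halfword load, assumed to match lhz on ppc32. */
static uint32_t load_be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

/* Old tail: mask with 3, then compare and subtract. */
static uint32_t tail_old(const uint8_t *p, uint32_t len, uint32_t sum)
{
    uint32_t rem = len & 3;                 /* andi. r4,r4,3 */
    if (rem >= 2) {                         /* cmpwi 0,r4,2 ; blt+ 4f */
        sum = add_fold(sum, load_be16(p));
        p += 2;
        rem -= 2;                           /* subi r4,r4,2 */
    }
    if (rem == 1)                           /* cmpwi 0,r4,1 ; bne+ 5f */
        sum = add_fold(sum, (uint32_t)p[0] << 8);
    return sum;
}

/* New tail: test bit 30 (value 2) and bit 31 (value 1) of len directly. */
static uint32_t tail_new(const uint8_t *p, uint32_t len, uint32_t sum)
{
    if (len & 2) {                          /* andi. r0,r4,2 ; beq+ 4f */
        sum = add_fold(sum, load_be16(p));
        p += 2;
    }
    if (len & 1)                            /* andi. r0,r4,1 ; beq+ 5f */
        sum = add_fold(sum, (uint32_t)p[0] << 8);
    return sum;
}

/* Check that both tails agree for every residue and a few starting sums. */
static void check_tails(void)
{
    static const uint8_t tail[3] = { 0x12, 0x34, 0x56 };
    static const uint32_t sums[] = { 0, 1, 0xffffedcbu, 0xffffffffu };

    for (uint32_t len = 0; len < 64; len++)
        for (size_t i = 0; i < sizeof(sums) / sizeof(sums[0]); i++)
            assert(tail_old(tail, len, sums[i]) ==
                   tail_new(tail, len, sums[i]));
}
```

Both functions take a pointer to the first byte left over after the word loop. Since only the two low bits of len are ever examined, the new variant does not need the mask or the subtraction. Whether that is measurably faster is a separate question that this model says nothing about.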