From: David Howells
To: Linus Torvalds
Cc: dhowells@redhat.com, Eric Dumazet, adobriyan@gmail.com,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Reduce the number of expensive division instructions done by _parse_integer()
Date: Thu, 09 Feb 2012 17:46:07 +0000
Message-ID: <7095.1328809567@redhat.com>
References: <20120209154819.32070.93358.stgit@warthog.procyon.org.uk>
    <1328804920.6099.3.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

Linus Torvalds wrote:

> Looking at the code generated, the "val >> 60" thing actually does
> generate a shift, and at least on x86-64, the attached patch generates
> better code.

On fixed-size instruction arches, the runtime shift is probably the better
option, as simply loading a large 64-bit constant would likely take at
least four instructions - and might involve a shift anyway.

On the other hand, it seems the compiler can optimise your suggestion
fairly well. In both cases, on 32-bit arches the 64-bit arithmetic can be
reduced to 32-bit arithmetic on the MSW only.

With the shift, on x86_64 we have:

	  400649:	48 89 d8             	mov    %rbx,%rax
	  40064c:	48 c1 e8 3c          	shr    $0x3c,%rax
	  400650:	48 85 c0             	test   %rax,%rax
	  400653:	75 52                	jne    4006a7 <_parse_integer+0xa7>

And on i386 we have:

	 8048532:	8b 54 24 14          	mov    0x14(%esp),%edx
	...
	 8048538:	89 c7                	mov    %eax,%edi
	 804853a:	c1 ea 1c             	shr    $0x1c,%edx
	 804853d:	85 d2                	test   %edx,%edx
	 804853f:	75 79                	jne    80485ba <_parse_integer+0xda>

With your code, we have on x86_64:

	  40062d:	49 bf 00 00 00 00 00 	movabs $0xf000000000000000,%r15
	  400634:	00 00 f0
	...
	  400659:	4c 85 fb             	test   %r15,%rbx
	  40065c:	75 59                	jne    4006b7 <_parse_integer+0xb7>

And on i386:

	 804853c:	89 c7                	mov    %eax,%edi
	 804853e:	f7 44 24 1c 00 00 00 	testl  $0xf0000000,0x1c(%esp)
	 8048545:	f0
	 8048546:	75 79                	jne    80485c1 <_parse_integer+0xe1>

But it will work too. And I like the pointer indirection removal as well.

I'm not sure there's a lot to choose between them, though I prefer mine as
I think it produces slightly smaller code.

Want me to wrap these changes up with my patch description?

David
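
For illustration, the two checks compared above can be sketched in C as
below. This is only a minimal sketch, not the patch itself: the helper
names are hypothetical, and it assumes (going by the subject line) that the
check exists to skip the division-based overflow test when the top four
bits of the accumulator are clear, with base <= 16 and digit < base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; none of these are taken from the patch. */

/* Shift form: true iff any of the top four bits of val is set. */
static bool top_nibble_set_shift(uint64_t val)
{
	return (val >> 60) != 0;
}

/* Mask form: tests the same four bits against a 64-bit constant. */
static bool top_nibble_set_mask(uint64_t val)
{
	return (val & 0xf000000000000000ULL) != 0;
}

/*
 * Accumulate one digit, assuming base <= 16 and digit < base.
 *
 * If the top four bits are clear then *val <= 2^60 - 1, so
 * *val * base + digit <= (2^60 - 1) * 16 + 15 = 2^64 - 1, which cannot
 * overflow; the division is only needed on the rare slow path.
 *
 * Returns false (leaving *val untouched) if the result would overflow.
 */
static bool accumulate_digit(uint64_t *val, unsigned int base,
			     unsigned int digit)
{
	if (top_nibble_set_shift(*val)) {
		if (*val > (UINT64_MAX - digit) / base)
			return false;
	}
	*val = *val * base + digit;
	return true;
}
```

Both predicates return the same answer for every input, so the choice
between them only affects the generated code, as in the disassembly above.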