From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Cree Subject: Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Date: Mon, 16 Jul 2012 21:53:53 +1200 Message-ID: <5003E4B1.605@orcon.net.nz> References: <20120713233957.6928.87541.reportbug@electro.phys.waikato.ac.nz> <20120714002950.GA3159@burratino> <5000CBCA.8020607@orcon.net.nz> <20120714021856.GA3062@burratino> <50010B84.5030606@orcon.net.nz> <20120714075906.GD3693@burratino> <20120714205049.GA28502@burratino> <20120715212731.GG1986@burratino> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Linus Torvalds , git@vger.kernel.org, Nicolas Pitre To: Jonathan Nieder X-From: git-owner@vger.kernel.org Mon Jul 16 11:54:04 2012 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Sqi0B-0001Nt-TS for gcvg-git-2@plane.gmane.org; Mon, 16 Jul 2012 11:54:04 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752845Ab2GPJx7 (ORCPT ); Mon, 16 Jul 2012 05:53:59 -0400 Received: from nctlincom02.orcon.net.nz ([60.234.4.75]:59997 "EHLO nctlincom02.orcon.net.nz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751542Ab2GPJx5 (ORCPT ); Mon, 16 Jul 2012 05:53:57 -0400 Received: from mx3.orcon.net.nz (mx3.orcon.net.nz [219.88.242.53]) by nctlincom02.orcon.net.nz (8.14.3/8.14.3/Debian-9.4) with ESMTP id q6G9xxDW020992 for ; Mon, 16 Jul 2012 21:59:59 +1200 Received: from Debian-exim by mx3.orcon.net.nz with local (Exim 4.69) (envelope-from ) id 1Sqi02-0004Bl-L2 for git@vger.kernel.org; Mon, 16 Jul 2012 21:53:54 +1200 Received: from 60-234-221-162.bitstream.orcon.net.nz ([60.234.221.162] helo=[192.168.1.5]) by mx3.orcon.net.nz with esmtpsa (TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1Sqi02-0004BJ-Bn; Mon, 16 Jul 2012 21:53:54 +1200 User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20120613 Icedove/3.0.11 In-Reply-To: <20120715212731.GG1986@burratino> X-Enigmail-Version: 1.0.1 X-DSPAM-Check: by mx3.orcon.net.nz on Mon, 16 Jul 2012 21:53:54 +1200 X-DSPAM-Result: Innocent X-DSPAM-Processed: Mon Jul 16 21:53:54 2012 X-DSPAM-Confidence: 0.6524 X-DSPAM-Probability: 0.0000 X-Bayes-Prob: 0.0001 (Score 0, tokens from: @@RPTN, default) X-Spam-Score: 0.00 () [Hold at 5.50] X-CanIt-Geo: ip=60.234.221.162; country=NZ; region=E7; city=Auckland; latitude=-36.8667; longitude=174.7667; http://maps.google.com/maps?q=-36.8667,174.7667&z=6 X-CanItPRO-Stream: base:default X-Canit-Stats-ID: 05HyVXXUi - a900286e8408 - 20120716 X-Scanned-By: CanIt (www . roaringpenguin . com) on 172.16.100.175 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 16/07/12 09:27, Jonathan Nieder wrote: > Michael Cree wrote: > >> On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote: > >>> gcc takes full advantage by converting the get_be32 >>> calls back to a load and bswap and producing a whole bunch of >>> unaligned access traps. >> >> Alpha does not have a bswap (or similar) instruction. > [...] >> . If the pointer is unaligned >> then two neighbouring aligned 32bit loads are required to ensure >> that all four bytes are loaded. If the pointer is aligned then a >> single 32bit load gets all the four bytes. Having never looked at >> the generated assembler code, I nevertheless suspect that is the >> guts of the optimisation --- the compiler can eliminate an access to >> memory if it knows the pointer is aligned. > > How about: > > gcc takes full advantage by using a single 32-bit load, > resulting in a whole bunch of unaligned access traps. Yes, that's good. I just checked the generated assembler and I am correct except that I underestimated the number of memory accesses for the a priori known unaligned data case. It is actually doing four long-word memory access, i.e. a memory access for each individual byte despite that it only needs to do a maximum of two, so the optimisation for the a priori known aligned data saves three memory accesses, not just one. (And probably also saves on a load-load replay trap on the EV6 and later CPUs, i.e., a complete and very costly replay of the whole pipeline when the CPU realises just in the nick of time that it has cocked up the interrelationships between reordered load instructions.) Cheers Michael.