* [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems [not found] ` <50010B84.5030606@orcon.net.nz> @ 2012-07-14 7:59 ` Jonathan Nieder 2012-07-14 19:11 ` Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Jonathan Nieder @ 2012-07-14 7:59 UTC (permalink / raw) To: Michael Cree; +Cc: git, Nicolas Pitre, Linus Torvalds With 660231aa (block-sha1: support for architectures with memory alignment restrictions, 2009-08-12), blk_SHA1_Update was modified to work reasonably on RISC-y architectures by accessing 32-bit values one byte at a time on machines that prefer that: #define get_unaligned_be32(p) ( \ (*((unsigned char *)(p) + 0) << 24) | \ (*((unsigned char *)(p) + 1) << 16) | \ (*((unsigned char *)(p) + 2) << 8) | \ (*((unsigned char *)(p) + 3) << 0) ) Unfortunately, on big-endian architectures, if p is a pointer to unsigned int then current gcc assumes it is properly aligned and converts this construct to a 32-bit load. The result is the same alignment fault we want to avoid. Fix it by passing around 'data' as an unsigned char * instead of unsigned int * to more clearly document the intended alignment for compilers and humans. The patch renames 'data' to 'block' to make sure all references are updated to reflect its new type. Noticed on an alpha. Typical ARM systems and other little-endian arches don't hit this because with the endianness conversion gcc (rightly) prefers reading one byte at a time even if the pointer is aligned. Reported-and-tested-by: Michael Cree <mcree@orcon.net.nz> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Hi, Michael Cree wrote[1]: > On 14/07/12 14:18, Jonathan Nieder wrote: >> Does this patch help? > > Yes, it does! Unaligned accesses are no longer reported and git is > running noticeably faster. Thanks! Nico, Linus, what do you think? As Michael mentioned, in the long term it might make sense to tweak git to always pass aligned pointers and lengths in the 'data' and 'len' arguments to SHA1_Update, but that's an experiment for another day. [1] http://bugs.debian.org/681532 block-sha1/sha1.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c index d8934757..4d72e4bc 100644 --- a/block-sha1/sha1.c +++ b/block-sha1/sha1.c @@ -100,7 +100,7 @@ * Where do we get the source from? The first 16 iterations get it from * the input data, the next mix it from the 512-bit array. */ -#define SHA_SRC(t) get_be32(data + t) +#define SHA_SRC(t) get_be32(block + t*4) #define SHA_MIX(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1) #define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \ @@ -114,7 +114,7 @@ #define T_40_59(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, ((B&C)+(D&(B^C))) , 0x8f1bbcdc, A, B, C, D, E ) #define T_60_79(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (B^C^D) , 0xca62c1d6, A, B, C, D, E ) -static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data) +static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned char *block) { unsigned int A,B,C,D,E; unsigned int array[16]; @@ -125,7 +125,7 @@ static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data) D = ctx->H[3]; E = ctx->H[4]; - /* Round 1 - iterations 0-16 take their input from 'data' */ + /* Round 1 - iterations 0-16 take their input from 'block' */ T_0_15( 0, A, B, C, D, E); T_0_15( 1, E, A, B, C, D); T_0_15( 2, D, E, A, B, C); @@ -251,7 +251,7 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len) data = ((const char *)data + left); if (lenW) return; - blk_SHA1_Block(ctx, ctx->W); + blk_SHA1_Block(ctx, (const unsigned char *) ctx->W); } while (len >= 64) { blk_SHA1_Block(ctx, data); -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems 2012-07-14 7:59 ` [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems Jonathan Nieder @ 2012-07-14 19:11 ` Linus Torvalds 2012-07-14 19:50 ` Jonathan Nieder 2012-07-14 20:50 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Jonathan Nieder 0 siblings, 2 replies; 10+ messages in thread From: Linus Torvalds @ 2012-07-14 19:11 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Michael Cree, git, Nicolas Pitre On Sat, Jul 14, 2012 at 12:59 AM, Jonathan Nieder <jrnieder@gmail.com> wrote: > > Unfortunately, on big-endian architectures, if p is a pointer to > unsigned int then current gcc assumes it is properly aligned and > converts this construct to a 32-bit load. This patch seems to entirely depend on the location of the cast. And as far as I can tell, that workaround will in turn depend on just what gets inlined. My gut feel is that this makes the code uglier (the manual "multiply by four" being a prime example) while not really even addressing the problem. I think a much better approach would be to just mark the unsigned int data pointer as being unaligned, or add a "get_unaligned()" helper function (you have to do a structure member and mark the structure packed, I think - I don't think you can just mark an int pointer packed). Sure, that's compiler-dependent, but if a compiler does something like gcc apparently does, it had better support the notion of unaligned pointers. And then gcc might actually do the unaligned word read *optimally* on big-endian architectures, instead of that idiotic byte-at-a-time crap with shifting. Anyway, the whole "noticed on alpha" makes no sense, since alpha isn't even big-endian. So the commit log is insane and misleading too. Alpha is very much little-endian, but maybe gcc turns the thing into an unaligned load followed by a bswap. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems 2012-07-14 19:11 ` Linus Torvalds @ 2012-07-14 19:50 ` Jonathan Nieder 2012-07-14 19:55 ` Linus Torvalds 2012-07-14 20:50 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Jonathan Nieder 1 sibling, 1 reply; 10+ messages in thread From: Jonathan Nieder @ 2012-07-14 19:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Michael Cree, git, Nicolas Pitre Linus Torvalds wrote: > On Sat, Jul 14, 2012 at 12:59 AM, Jonathan Nieder <jrnieder@gmail.com> wrote: >> Unfortunately, on big-endian architectures, if p is a pointer to >> unsigned int then current gcc assumes it is properly aligned and >> converts this construct to a 32-bit load. > > This patch seems to entirely depend on the location of the cast. And > as far as I can tell, that workaround will in turn depend on just what > gets inlined. After the patch, what reason does gcc have to expect that 'block' is 32-bit aligned except when it is? The code (including the code I didn't touch) never casts from char * to int * except in get/put_be32 on arches that don't mind unaligned accesses. [...] > Anyway, the whole "noticed on alpha" makes no sense, since alpha isn't > even big-endian. So the commit log is insane and misleading too. Yes, true. Thanks for catching it. Jonathan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems 2012-07-14 19:50 ` Jonathan Nieder @ 2012-07-14 19:55 ` Linus Torvalds 0 siblings, 0 replies; 10+ messages in thread From: Linus Torvalds @ 2012-07-14 19:55 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Michael Cree, git, Nicolas Pitre On Sat, Jul 14, 2012 at 12:50 PM, Jonathan Nieder <jrnieder@gmail.com> wrote: > After the patch, what reason does gcc have to expect that 'block' is > 32-bit aligned except when it is? The code (including the code I > didn't touch) never casts from char * to int * except in get/put_be32 > on arches that don't mind unaligned accesses. Ahh. I was looking at the cast you added: blk_SHA1_Block(ctx, (const unsigned char *) ctx->W); but I guess that if gcc inlines that and sees the original int type, it doesn't matter, because that *is* aligned. So it's ok, but please just make blk_SHA1_Block() take a "const void *" instead and that cast can go away. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints 2012-07-14 19:11 ` Linus Torvalds 2012-07-14 19:50 ` Jonathan Nieder @ 2012-07-14 20:50 ` Jonathan Nieder 2012-07-14 20:56 ` Linus Torvalds 2012-07-15 20:58 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Michael Cree 1 sibling, 2 replies; 10+ messages in thread From: Jonathan Nieder @ 2012-07-14 20:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Michael Cree, git, Nicolas Pitre With 660231aa (block-sha1: support for architectures with memory alignment restrictions, 2009-08-12), blk_SHA1_Update was modified to access 32-bit chunks of memory one byte at a time on arches that prefer that: #define get_be32(p) ( \ (*((unsigned char *)(p) + 0) << 24) | \ (*((unsigned char *)(p) + 1) << 16) | \ (*((unsigned char *)(p) + 2) << 8) | \ (*((unsigned char *)(p) + 3) << 0) ) The code previously accessed these values by just using htonl(*p). Unfortunately, Michael noticed on an Alpha machine that git was using plain 32-bit reads anyway. As soon as we convert a pointer to int *, the compiler can assume that the object pointed to is correctly aligned as an int (C99 section 6.3.2.3 "pointer conversions" paragraph 7), and gcc takes full advantage by converting the get_be32 calls back to a load and bswap and producing a whole bunch of unaligned access traps. So we need to obey the alignment constraints even when only dealing with pointers instead of actual values. Do so by changing the type of 'data' to void *. This patch renames 'data' to 'block' at the same time to make sure all references are updated to reflect the new type. Reported-and-tested-by: Michael Cree <mcree@orcon.net.nz> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Linus Torvalds wrote: > Anyway, the whole "noticed on alpha" makes no sense, since alpha isn't > even big-endian. Thanks again for catching that. Here's a reroll. block-sha1/sha1.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c index d8934757..10fd94d1 100644 --- a/block-sha1/sha1.c +++ b/block-sha1/sha1.c @@ -100,7 +100,7 @@ * Where do we get the source from? The first 16 iterations get it from * the input data, the next mix it from the 512-bit array. */ -#define SHA_SRC(t) get_be32(data + t) +#define SHA_SRC(t) get_be32((unsigned char *) block + t*4) #define SHA_MIX(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1) #define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \ @@ -114,7 +114,7 @@ #define T_40_59(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, ((B&C)+(D&(B^C))) , 0x8f1bbcdc, A, B, C, D, E ) #define T_60_79(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (B^C^D) , 0xca62c1d6, A, B, C, D, E ) -static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data) +static void blk_SHA1_Block(blk_SHA_CTX *ctx, const void *block) { unsigned int A,B,C,D,E; unsigned int array[16]; @@ -125,7 +125,7 @@ static void blk_SHA1_Block(blk_SHA_CTX *ctx, const unsigned int *data) D = ctx->H[3]; E = ctx->H[4]; - /* Round 1 - iterations 0-16 take their input from 'data' */ + /* Round 1 - iterations 0-16 take their input from 'block' */ T_0_15( 0, A, B, C, D, E); T_0_15( 1, E, A, B, C, D); T_0_15( 2, D, E, A, B, C); -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints 2012-07-14 20:50 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Jonathan Nieder @ 2012-07-14 20:56 ` Linus Torvalds 2012-07-14 21:57 ` [PATCH] block-sha1: put macro arguments in parentheses Jonathan Nieder 2012-07-15 20:58 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Michael Cree 1 sibling, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2012-07-14 20:56 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Michael Cree, git, Nicolas Pitre Looks good to me. I'd suggest doing the macro argument expansion in #define SHA_SRC(t) get_be32((unsigned char *) block + t*4) with parenthesis around 't', but that's a fairly independent thing. The old code didn't do that either. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] block-sha1: put macro arguments in parentheses 2012-07-14 20:56 ` Linus Torvalds @ 2012-07-14 21:57 ` Jonathan Nieder 0 siblings, 0 replies; 10+ messages in thread From: Jonathan Nieder @ 2012-07-14 21:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Michael Cree, git, Nicolas Pitre 't' is currently always a numeric constant, but it can't hurt to prepare for the day that it becomes useful for a caller to pass in a more complex expression. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- block-sha1/sha1.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c index 10fd94d1..6f885c43 100644 --- a/block-sha1/sha1.c +++ b/block-sha1/sha1.c @@ -100,8 +100,8 @@ * Where do we get the source from? The first 16 iterations get it from * the input data, the next mix it from the 512-bit array. */ -#define SHA_SRC(t) get_be32((unsigned char *) block + t*4) -#define SHA_MIX(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1) +#define SHA_SRC(t) get_be32((unsigned char *) block + (t)*4) +#define SHA_MIX(t) SHA_ROL(W((t)+13) ^ W((t)+8) ^ W((t)+2) ^ W(t), 1); #define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \ unsigned int TEMP = input(t); setW(t, TEMP); \ -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints 2012-07-14 20:50 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Jonathan Nieder 2012-07-14 20:56 ` Linus Torvalds @ 2012-07-15 20:58 ` Michael Cree 2012-07-15 21:27 ` Jonathan Nieder 1 sibling, 1 reply; 10+ messages in thread From: Michael Cree @ 2012-07-15 20:58 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Linus Torvalds, git, Nicolas Pitre Jonathan, Thanks for acting so promptly on this. Just a minor point on the commit message below. On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote: > Unfortunately, Michael noticed on an Alpha machine that git was using > plain 32-bit reads anyway. As soon as we convert a pointer to int *, > the compiler can assume that the object pointed to is correctly > aligned as an int (C99 section 6.3.2.3 "pointer conversions" > paragraph 7), and gcc takes full advantage by converting the get_be32 > calls back to a load and bswap and producing a whole bunch of > unaligned access traps. Alpha does not have a bswap (or similar) instruction. I do recall reading somewhere that one can get halfway to swapping endianness by treating a number as a VAX float and converting to IEEE float, with only a couple or so more instructions to achieve the full swap. But I don't think that is available to us anyway because on Debian we compile for generic Alpha. I suspect that the crux of the matter is that compiling for generic Alpha also means that byte access instructions are not permitted. In other words, all memory accesses must be long words (32bits) or quad words (64bits). In the get_be32 routine the compiler _has_ to issue 32bit memory accesses whether the base pointer is char * or int *. If the pointer is unaligned then two neighbouring aligned 32bit loads are required to ensure that all four bytes are loaded. If the pointer is aligned then a single 32bit load gets all the four bytes. Having never looked at the generated assembler code, I nevertheless suspect that is the guts of the optimisation --- the compiler can eliminate an access to memory if it knows the pointer is aligned. Cheers Michael. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints 2012-07-15 20:58 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Michael Cree @ 2012-07-15 21:27 ` Jonathan Nieder 2012-07-16 9:53 ` Michael Cree 0 siblings, 1 reply; 10+ messages in thread From: Jonathan Nieder @ 2012-07-15 21:27 UTC (permalink / raw) To: Michael Cree; +Cc: Linus Torvalds, git, Nicolas Pitre Hi, Michael Cree wrote: > On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote: >> gcc takes full advantage by converting the get_be32 >> calls back to a load and bswap and producing a whole bunch of >> unaligned access traps. > > Alpha does not have a bswap (or similar) instruction. [...] > . If the pointer is unaligned > then two neighbouring aligned 32bit loads are required to ensure > that all four bytes are loaded. If the pointer is aligned then a > single 32bit load gets all the four bytes. Having never looked at > the generated assembler code, I nevertheless suspect that is the > guts of the optimisation --- the compiler can eliminate an access to > memory if it knows the pointer is aligned. How about: gcc takes full advantage by using a single 32-bit load, resulting in a whole bunch of unaligned access traps. You can see the generated assembler with "make block-sha1/sha1.s" and reading the resulting sha1.s file in the current working directory (oops). I don't have an alpha cross-compiler installed or access to an alpha to check for myself. -- >8 -- Subject: Makefile: fix location of listing produced by "make subdir/foo.s" When I invoke "make block-sha1/sha1.s", 'make' runs $(CC) -S without specifying where it should put its output and the output ends up in ./sha1.s. Confusing. Add an -o option to the .s rule to fix this. We were already doing that for most compiler invocations but had forgotten it for the assembler listings. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Thanks, Jonathan Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index dc709029..e8d2798f 100644 --- a/Makefile +++ b/Makefile @@ -2230,7 +2230,7 @@ $(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) endif %.s: %.c GIT-CFLAGS FORCE - $(QUIET_CC)$(CC) -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< + $(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $< ifdef USE_COMPUTED_HEADER_DEPENDENCIES # Take advantage of gcc's on-the-fly dependency generation -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints 2012-07-15 21:27 ` Jonathan Nieder @ 2012-07-16 9:53 ` Michael Cree 0 siblings, 0 replies; 10+ messages in thread From: Michael Cree @ 2012-07-16 9:53 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Linus Torvalds, git, Nicolas Pitre On 16/07/12 09:27, Jonathan Nieder wrote: > Michael Cree wrote: > >> On 15/07/2012, at 8:50 AM, Jonathan Nieder wrote: > >>> gcc takes full advantage by converting the get_be32 >>> calls back to a load and bswap and producing a whole bunch of >>> unaligned access traps. >> >> Alpha does not have a bswap (or similar) instruction. > [...] >> . If the pointer is unaligned >> then two neighbouring aligned 32bit loads are required to ensure >> that all four bytes are loaded. If the pointer is aligned then a >> single 32bit load gets all the four bytes. Having never looked at >> the generated assembler code, I nevertheless suspect that is the >> guts of the optimisation --- the compiler can eliminate an access to >> memory if it knows the pointer is aligned. > > How about: > > gcc takes full advantage by using a single 32-bit load, > resulting in a whole bunch of unaligned access traps. Yes, that's good. I just checked the generated assembler and I am correct except that I underestimated the number of memory accesses for the a priori known unaligned data case. It is actually doing four long-word memory access, i.e. a memory access for each individual byte despite that it only needs to do a maximum of two, so the optimisation for the a priori known aligned data saves three memory accesses, not just one. (And probably also saves on a load-load replay trap on the EV6 and later CPUs, i.e., a complete and very costly replay of the whole pipeline when the CPU realises just in the nick of time that it has cocked up the interrelationships between reordered load instructions.) Cheers Michael. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-07-16 9:54 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20120713233957.6928.87541.reportbug@electro.phys.waikato.ac.nz> [not found] ` <20120714002950.GA3159@burratino> [not found] ` <5000CBCA.8020607@orcon.net.nz> [not found] ` <20120714021856.GA3062@burratino> [not found] ` <50010B84.5030606@orcon.net.nz> 2012-07-14 7:59 ` [PATCH maint-1.6.5] block-sha1: avoid unaligned accesses on some big-endian systems Jonathan Nieder 2012-07-14 19:11 ` Linus Torvalds 2012-07-14 19:50 ` Jonathan Nieder 2012-07-14 19:55 ` Linus Torvalds 2012-07-14 20:50 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Jonathan Nieder 2012-07-14 20:56 ` Linus Torvalds 2012-07-14 21:57 ` [PATCH] block-sha1: put macro arguments in parentheses Jonathan Nieder 2012-07-15 20:58 ` [PATCH maint-1.6.5 v2] block-sha1: avoid pointer conversion that violates alignment constraints Michael Cree 2012-07-15 21:27 ` Jonathan Nieder 2012-07-16 9:53 ` Michael Cree
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).