From mboxrd@z Thu Jan 1 00:00:00 1970
From: "J. R. Okajima"
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp
Date: Wed, 15 Dec 2010 04:01:53 +0900
Message-ID: <12853.1292353313@jrobl>
References: <20101209070938.GA3949@amd> <19324.1291990997@jrobl> <20101213014553.GA6522@amd> <9580.1292225351@jrobl>
Cc: Nick Piggin, Linus Torvalds, linux-arch@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
To: Nick Piggin
Return-path:
In-Reply-To:
Sender: linux-arch-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Nick Piggin:
> Well, let's see what turns up. We certainly can try the long *
> approach. I suspect on architectures where byte loads are
> very slow, gcc will block the loop into larger loads, so it should
> be no worse than a normal memcmp call, but if we do explicit
> padding we can avoid all the problems associated with tail
> handling.

Thank you for your reply.
Unfortunately, I am afraid I could not clearly understand what you
wrote, due to my poor English. What I understood is:
- I suggested the 'long *' approach
- You wrote "not bad and possible, but may not be worth it"
- I agreed "the approach may not be effective"

You then considered it more deeply, but the conclusion is unchanged:
the 'long *' approach may not be worth it.
Am I right?

> In short, I think the change should be suitable for all x86 CPUs,
> but I would like to hear more opinions or see numbers for other
> cores.

I'd like to hear from other x86 experts too.
I also noticed that memcmp for x86_32 is defined as __builtin_memcmp
(for x86_64 it is "rep cmp"). Why doesn't x86_64 use __builtin_memcmp?
Is it really worse?

J. R. Okajima
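
For reference, a minimal sketch of the two comparison styles discussed
in this thread: a plain byte loop, and a 'long *' loop over explicitly
padded buffers. The function names (cmp_bytes, cmp_longs) and the
padding convention are assumptions made for illustration only; this is
not the kernel's actual code.

```c
#include <stddef.h>

/* Hypothetical helper: byte-at-a-time comparison, returns 0 when equal. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t len)
{
	while (len--) {
		if (*a++ != *b++)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical helper: word-at-a-time comparison, returns 0 when equal.
 * Assumes both buffers are long-aligned and zero-padded up to a multiple
 * of sizeof(long), so the loop needs no special handling for the tail.
 */
static int cmp_longs(const unsigned long *a, const unsigned long *b, size_t len)
{
	size_t n = (len + sizeof(long) - 1) / sizeof(long);

	while (n--) {
		if (*a++ != *b++)
			return 1;
	}
	return 0;
}
```

Without the padding assumption, cmp_longs would need a separate byte
loop (or masking) for the last partial word, which is the tail handling
mentioned in the quoted text.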