From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Piggin
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp
Date: Fri, 10 Dec 2010 15:27:52 +1100
Message-ID: <20101210042752.GA3144@amd>
References: <20101209070938.GA3949@amd>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Linus Torvalds, linux-arch@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
To: Nick Piggin
Return-path:
Content-Disposition: inline
In-Reply-To: <20101209070938.GA3949@amd>
Sender: linux-arch-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, Dec 09, 2010 at 06:09:38PM +1100, Nick Piggin wrote:
> So replace it with an open-coded byte comparison. This increases code
> size by 24 bytes in the critical __d_lookup_rcu function, but the

Actually, if the loop assumes len is non-zero (which is the case for
dentry compare), then the bloat is only 8 bytes, so not a problem.

Also got numbers versus vanilla kernel, out of interest.

> speedup is huge, averaging 10 runs of each:
>
> git diff st     user    sys  elapsed    CPU
vanilla           1.19   3.21     4.47   98.0
> before          1.15   2.57     3.82   97.1
> after           1.14   2.35     3.61   96.8
>
> git diff mt     user    sys  elapsed    CPU
vanilla           1.57  45.75     3.60   1312
> before          1.27   3.85     1.46    349
> after           1.26   3.54     1.43    333
>

Single thread elapsed time improvement, vanilla vs vfs: 19.23%. Not quite
as big as the AMD fam10h speedup, probably because Westmere does atomics
so damn quickly.

Multi thread numbers are no surprise.
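
For illustration only, here is a minimal sketch of the kind of open-coded
byte comparison discussed above, written so that the loop assumes len is
non-zero. The function name bytes_differ and its signature are hypothetical
and are not taken from the actual patch. The point is that a do/while loop
tests the length at the bottom, so no separate zero-length check is needed
on entry.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the kernel's actual code.
 *
 * Compare two byte strings of the same length one byte at a time,
 * instead of calling memcmp().
 *
 * Assumption: len > 0. The caller must guarantee this; with len == 0
 * the loop would read one byte from each buffer and len would wrap.
 *
 * Returns 0 if the strings are equal, 1 if they differ.
 */
static int bytes_differ(const unsigned char *a, const unsigned char *b,
			size_t len)
{
	do {
		if (*a != *b)
			return 1;
		a++;
		b++;
		len--;
	} while (len);	/* length is only tested after the first compare */

	return 0;
}
```

A caller would pass the two names and their shared, known non-zero length,
e.g. bytes_differ(name1, name2, len), and treat a return of 0 as a match.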