git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: David Turner <dturner@twopensource.com>,
	git@vger.kernel.org, gitster@pobox.com
Cc: David Turner <dturner@twitter.com>
Subject: Re: [PATCH] check_refname_component: Optimize
Date: Wed, 28 May 2014 23:44:32 +0200	[thread overview]
Message-ID: <538658C0.8050001@alum.mit.edu> (raw)
In-Reply-To: <1401311055-480-2-git-send-email-dturner@twitter.com>

On 05/28/2014 11:04 PM, David Turner wrote:
> In a repository with tens of thousands of refs, the command
> ~/git/git-diff-index --cached --quiet --ignore-submodules [revision]
> is a bit slow.  check_refname_component is a major contributor to the
> runtime.  Provide two optimized versions of this function: one for
> machines with SSE4.2, and one for machines without.  The speedup for
> this command on one particular repo (with about 60k refs) is about 12%
> for the SSE version and 8% for the non-SSE version.

Interesting.  Thanks for the patch.

Why do you use "git diff-index" to benchmark your change?  Is there
something particular about that command that leads to especially bad
performance with lots of references?

I assume that most of the time spent in check_refname_component() is
while reading the packed-refs file, right?  If so, there are probably
other git commands that would better measure that time, with less other
work.  For example, "git rev-parse refname" will read the packed-refs
file, too (assuming that "refname" is not a loose reference).  If you
test the speedup of a cheaper command like that, it seems to me that you
will get a better idea of how much you are speeding up the ref-reading
part.  It would also be interesting to see the difference in
milliseconds (rather than a percentage) for some specified number of
references.

I think it's worth using your LUT-based approach to save 8%.  I'm less
sure it's worth the complication and unreadability of using SSE to get
the next 4% speedup.  But the gains might be proven to be more
significant if you benchmark them differently.

I won't critique the code in detail until we have thrashed out the
metaissues, but I made a few comments below about nits that I noticed
when I scanned through.

> Signed-off-by: David Turner <dturner@twitter.com>
> ---
>  Makefile           |   6 +++
>  configure.ac       |   6 +++
>  refs.c             | 152 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  t/t5511-refspec.sh |  13 +++++
>  4 files changed, 172 insertions(+), 5 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index a53f3a8..123e2fc 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1326,6 +1326,11 @@ else
>  		COMPAT_OBJS += compat/win32mmap.o
>  	endif
>  endif
> +ifdef NO_SSE
> +	BASIC_CFLAGS += -DNO_SSE
> +else
> +	BASIC_CFLAGS += -msse4
> +endif
>  ifdef OBJECT_CREATION_USES_RENAMES
>  	COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1
>  endif
> @@ -2199,6 +2204,7 @@ GIT-BUILD-OPTIONS: FORCE
>  	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@
>  	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@
>  	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@
> +	@echo NO_SSE=\''$(subst ','\'',$(subst ','\'',$(NO_SSE)))'\' >>$@
>  ifdef TEST_OUTPUT_DIRECTORY
>  	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@
>  endif
> diff --git a/configure.ac b/configure.ac
> index b711254..71a9429 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -382,6 +382,11 @@ AS_HELP_STRING([],[Tcl/Tk interpreter will be found in a system.]),
>  GIT_PARSE_WITH(tcltk))
>  #
>  
> +# Declare the with-sse/without-sse options.
> +AC_ARG_WITH(sse,
> +AS_HELP_STRING([--with-sse],[use SSE instructions (default is YES)]),
> +GIT_PARSE_WITH(sse))
> +
>  
>  ## Checks for programs.
>  AC_MSG_NOTICE([CHECKS for programs])
> @@ -449,6 +454,7 @@ else
>    fi
>  fi
>  GIT_CONF_SUBST([TCLTK_PATH])
> +GIT_CONF_SUBST([NO_SSE])
>  AC_CHECK_PROGS(ASCIIDOC, [asciidoc])
>  if test -n "$ASCIIDOC"; then
>  	AC_MSG_CHECKING([for asciidoc version])
> diff --git a/refs.c b/refs.c
> index 28d5eca..8f0de04 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -5,6 +5,8 @@
>  #include "dir.h"
>  #include "string-list.h"
>  
> +#include <nmmintrin.h>
> +

You include this file unconditionally, but I doubt that it is present on
all platforms.  I guess it has to be protected with #ifdef or something
at a minimum.

>  /*
>   * Make sure "ref" is something reasonable to have under ".git/refs/";
>   * We do not like it if:
> @@ -29,30 +31,169 @@ static inline int bad_ref_char(int ch)
>  	return 0;
>  }
>  

Please add a comment here about what the values in refname_disposition
signify.  And maybe add some line comments explaining unusual values,
for people who haven't memorized the ASCII encoding; e.g., on the third
line,

/* SP -> 0, '.' -> 2, '-' -> 1 */

Also, this variable could be static.  And if you change your encoding to
use 0 instead of 9 for valid characters, then you could define the table
with an explicit length of 256 and omit the second half of the
initializers, letting those values be initialized to zero automatically.

> +char refname_disposition[] = {
> +       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +       0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 2, 1,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 9, 0,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 9, 0, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 3, 9, 9, 0, 0,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +       9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9
> +};
> +
>  /*
>   * Try to read one refname component from the front of refname.  Return
>   * the length of the component found, or -1 if the component is not
>   * legal.
>   */
> -static int check_refname_component(const char *refname, int flags)
> +static int check_refname_component_1(const char *refname, int flags)
>  {
>  	const char *cp;
>  	char last = '\0';
>  
>  	for (cp = refname; ; cp++) {
> -		char ch = *cp;
> -		if (ch == '\0' || ch == '/')
> +		unsigned char ch = (unsigned char) *cp;
> +		char disp = refname_disposition[ch];
> +		switch(disp) {
> +		case 0:
> +		       return -1;
> +		case 1:
> +			goto out;
> +		case 2:
> +			if (last == '.')
> +				return -1;
>  			break;
> -		if (bad_ref_char(ch))
> -			return -1; /* Illegal character in refname. */
> +		case 3:
> +		       if (last == '@')
> +			       return -1;
> +		       break;
> +	       }

You seem to have some indentation problems.  Please always indent with
TAB characters.

> +
>  		if (last == '.' && ch == '.')
>  			return -1; /* Refname contains "..". */
>  		if (last == '@' && ch == '{')
>  			return -1; /* Refname contains "@{". */
>  		last = ch;
>  	}
> +out:
> +	if (cp == refname)
> +		return 0; /* Component has zero length. */
> +
> +	if (refname[0] == '.') {
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1; /* Component starts with '.'. */
> +		/*
> +		 * Even if leading dots are allowed, don't allow "."
> +		 * as a component (".." is prevented by a rule above).
> +		 */
> +		if (refname[1] == '\0')
> +			return -1; /* Component equals ".". */
> +	}
> +	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> +		return -1; /* Refname ends with ".lock". */
> +	return cp - refname;
> +}
> +
> +#ifdef NO_SSE
> +#define check_refname_component check_refname_component_1
> +#else
> +
> +#ifndef _SIDD_UBYTE_OPS
> +#define _SIDD_UBYTE_OPS                 0x00
> +#define _SIDD_CMP_EQUAL_ANY             0x00
> +#define _SIDD_CMP_RANGES                0x04
> +#define _SIDD_CMP_EQUAL_ORDERED         0x0c
> +#define _SIDD_NEGATIVE_POLARITY         0x10
> +#endif
> +#ifndef PAGE_SIZE
> +#define PAGE_SIZE 4096
> +#endif
> +#define BLOCK_SIZE 16
> +
> +/* Vectorized version of check_refname_component */
> +static int check_refname_component(const char *refname, int flags)
> +{
> +	const __m128i *refname_vec = (__m128i*) refname;
> +
> +	/* Character ranges for characters forbidden in refs; see
> +	 * above */
> +	static const __v16qi bad = {
> +		0x01, 0x20,  0x7e, 0x7f,  0x5e, 0x5e,  0x3a, 0x3a,
> +		0x5b, 0x5c,  0x2a, 0x2a,  0x3f, 0x3f,  0x3f, 0x3f};
> +
> +	static const __v16qi nonslashes = {
> +		'\001', '/' -1, '/' + 1, 0xff,
> +	};
> +
> +	static const __v16qi dotdot = {'.','.',0};
> +	static const __v16qi atcurly = {'@','{',0};
> +
> +	const __m128i *vp;
> +	const char *cp = (const char *)refname_vec;
> +
> +
> +	int dotdotpos = BLOCK_SIZE, atcurlypos = BLOCK_SIZE;
> +	for (vp = refname_vec; ; vp++) {
> +		__m128i tmp;
> +		int endpos;
> +
> +		/* Handle case of forbidden substrings .. and @{ crossing
> +		 * sixteen-byte boudaries */
> +		if (dotdotpos == 15 && *cp == '.')
> +			return -1;
> +
> +		if (atcurlypos == 15 && *cp == '{')
> +			return -1;
> +
> +		if (((uintptr_t) vp & (PAGE_SIZE - 1)) > PAGE_SIZE - BLOCK_SIZE)
> +			/* End-of-page; fall back to slow method for
> +			 * this entire component. */
> +			return check_refname_component_1(refname, flags);
> +
> +		tmp = _mm_lddqu_si128(vp);
> +
> +		/* Find slashes or end-of-string. The double-negative
> +		 * (negative-polarity search for non-slashes) is
> +		 * necessary so that \0 will also be counted.  */
> +		endpos = _mm_cmpistri((__m128i) nonslashes, tmp,
> +				      _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES |
> +				      _SIDD_NEGATIVE_POLARITY);
> +
> +		if (_mm_cmpestrc((__m128i) bad, BLOCK_SIZE, tmp, endpos,
> +				 _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES))
> +			return -1;
> +
> +		dotdotpos = _mm_cmpestri((__m128i) dotdot, 2, tmp, endpos,
> +					 _SIDD_UBYTE_OPS |
> +					 _SIDD_CMP_EQUAL_ORDERED);
> +		if (dotdotpos < 15)
> +			return -1;
> +
> +		atcurlypos = _mm_cmpestri((__m128i) atcurly, 2, tmp, endpos,
> +					  _SIDD_UBYTE_OPS |
> +					  _SIDD_CMP_EQUAL_ORDERED);
> +		if (atcurlypos < 15)
> +			return -1;
> +
> +		if (endpos < BLOCK_SIZE) {
> +			cp = ((const char*) vp) + endpos;
> +			break;
> +		}
> +		cp = (const char*) vp + BLOCK_SIZE;
> +	}
> +
>  	if (cp == refname)
>  		return 0; /* Component has zero length. */
> +
>  	if (refname[0] == '.') {
>  		if (!(flags & REFNAME_DOT_COMPONENT))
>  			return -1; /* Component starts with '.'. */
> @@ -67,6 +208,7 @@ static int check_refname_component(const char *refname, int flags)
>  		return -1; /* Refname ends with ".lock". */
>  	return cp - refname;
>  }
> +#endif
>  
>  int check_refname_format(const char *refname, int flags)
>  {
> diff --git a/t/t5511-refspec.sh b/t/t5511-refspec.sh
> index c289322..0f03f9c 100755
> --- a/t/t5511-refspec.sh
> +++ b/t/t5511-refspec.sh
> @@ -84,4 +84,17 @@ test_refspec push 'refs/heads/*/*/for-linus:refs/remotes/mine/*' invalid
>  test_refspec fetch 'refs/heads/*/for-linus:refs/remotes/mine/*'
>  test_refspec push 'refs/heads/*/for-linus:refs/remotes/mine/*'
>  
> +test_refspec fetch 'refs/heads/a-very-long-refname'
> +test_refspec fetch 'refs/heads/.a-very-long-refname'		invalid
> +test_refspec fetch 'refs/heads/abcdefgh0123..'			invalid
> +test_refspec fetch 'refs/heads/abcdefgh01234..'			invalid
> +test_refspec fetch 'refs/heads/abcdefgh012345..'		invalid
> +test_refspec fetch 'refs/heads/abcdefgh0123456..'		invalid
> +test_refspec fetch 'refs/heads/abcdefgh01234567..'		invalid
> +test_refspec fetch 'refs/heads/abcdefgh0123.a'
> +test_refspec fetch 'refs/heads/abcdefgh01234.a'
> +test_refspec fetch 'refs/heads/abcdefgh012345.a'
> +test_refspec fetch 'refs/heads/abcdefgh0123456.a'
> +test_refspec fetch 'refs/heads/abcdefgh01234567.a'
> +
>  test_done
> 

Please mention in your commit message why you added these tests.  (Are
they to test peculiarities around 16-byte boundaries?)

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2014-05-28 21:44 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-28 21:04 [PATCH v2 0/1] check_refname_component: Optimize David Turner
2014-05-28 21:04 ` [PATCH] " David Turner
2014-05-28 21:44   ` Michael Haggerty [this message]
2014-05-28 23:49     ` David Turner
2014-05-29 13:41       ` Duy Nguyen
2014-05-29 16:36         ` Junio C Hamano
2014-05-29 23:24           ` Duy Nguyen
2014-05-29 23:41             ` Jeff King
2014-05-29 23:43               ` Duy Nguyen
2014-05-30  0:07                 ` Jeff King
2014-05-30  2:03                   ` Duy Nguyen
2014-05-30  9:47                   ` Michael Haggerty
2014-05-30 17:29                     ` Jeff King
2014-05-31 10:47                       ` Michael Haggerty
2014-05-31 11:21                   ` Duy Nguyen
  -- strict thread matches above, loose matches on Subject: below --
2014-05-28 19:57 David Turner
2014-05-28 21:24 ` Junio C Hamano
2014-05-29 12:19 ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=538658C0.8050001@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=dturner@twitter.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).