git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 1/2] refs.c: optimize check_refname_component()
@ 2014-06-04  3:38 David Turner
  2014-06-04  3:38 ` [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component David Turner
  0 siblings, 1 reply; 14+ messages in thread
From: David Turner @ 2014-06-04  3:38 UTC (permalink / raw)
  To: git, gitster, mhagger; +Cc: David Turner

In a repository with many refs, check_refname_component can be a major
contributor to the runtime of some git commands. One such command is

git rev-parse HEAD

Timings for one particular repo, with about 60k refs, almost all
packed, are:

Old: 35 ms
New: 29 ms

Many other commands which read refs are also sped up.

Signed-off-by: David Turner <dturner@twitter.com>
---
 refs.c             | 67 +++++++++++++++++++++++++++++++-----------------------
 t/t5511-refspec.sh |  6 ++++-
 2 files changed, 44 insertions(+), 29 deletions(-)

diff --git a/refs.c b/refs.c
index 28d5eca..46139d2 100644
--- a/refs.c
+++ b/refs.c
@@ -6,8 +6,29 @@
 #include "string-list.h"
 
 /*
- * Make sure "ref" is something reasonable to have under ".git/refs/";
- * We do not like it if:
+ * How to handle various characters in refnames:
+ * 0: An acceptable character for refs
+ * 1: End-of-component
+ * 2: ., look for a preceding . to reject .. in refs
+ * 3: {, look for a preceding @ to reject @{ in refs
+ * 4: A bad character: ASCII control characters, "~", "^", ":" or SP
+ */
+static unsigned char refname_disposition[256] = {
+	1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+	4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+	4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 2, 1,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 4,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0, 4, 0,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
+};
+
+/*
+ * Try to read one refname component from the front of refname.
+ * Return the length of the component found, or -1 if the component is
+ * not legal.  It is legal if it is something reasonable to have under
+ * ".git/refs/"; We do not like it if:
  *
  * - any path component of it begins with ".", or
  * - it has double dots "..", or
@@ -16,41 +37,31 @@
  * - it ends with ".lock"
  * - it contains a "\" (backslash)
  */
-
-/* Return true iff ch is not allowed in reference names. */
-static inline int bad_ref_char(int ch)
-{
-	if (((unsigned) ch) <= ' ' || ch == 0x7f ||
-	    ch == '~' || ch == '^' || ch == ':' || ch == '\\')
-		return 1;
-	/* 2.13 Pattern Matching Notation */
-	if (ch == '*' || ch == '?' || ch == '[') /* Unsupported */
-		return 1;
-	return 0;
-}
-
-/*
- * Try to read one refname component from the front of refname.  Return
- * the length of the component found, or -1 if the component is not
- * legal.
- */
 static int check_refname_component(const char *refname, int flags)
 {
 	const char *cp;
 	char last = '\0';
 
 	for (cp = refname; ; cp++) {
-		char ch = *cp;
-		if (ch == '\0' || ch == '/')
+		int ch = *cp & 255;
+		unsigned char disp = refname_disposition[ch];
+		switch(disp) {
+		case 1:
+			goto out;
+		case 2:
+			if (last == '.')
+				return -1; /* Refname contains "..". */
+			break;
+		case 3:
+			if (last == '@')
+				return -1; /* Refname contains "@{". */
 			break;
-		if (bad_ref_char(ch))
-			return -1; /* Illegal character in refname. */
-		if (last == '.' && ch == '.')
-			return -1; /* Refname contains "..". */
-		if (last == '@' && ch == '{')
-			return -1; /* Refname contains "@{". */
+		case 4:
+			return -1;
+		}
 		last = ch;
 	}
+out:
 	if (cp == refname)
 		return 0; /* Component has zero length. */
 	if (refname[0] == '.') {
diff --git a/t/t5511-refspec.sh b/t/t5511-refspec.sh
index c289322..de6db86 100755
--- a/t/t5511-refspec.sh
+++ b/t/t5511-refspec.sh
@@ -5,7 +5,6 @@ test_description='refspec parsing'
 . ./test-lib.sh
 
 test_refspec () {
-
 	kind=$1 refspec=$2 expect=$3
 	git config remote.frotz.url "." &&
 	git config --remove-section remote.frotz &&
@@ -84,4 +83,9 @@ test_refspec push 'refs/heads/*/*/for-linus:refs/remotes/mine/*' invalid
 test_refspec fetch 'refs/heads/*/for-linus:refs/remotes/mine/*'
 test_refspec push 'refs/heads/*/for-linus:refs/remotes/mine/*'
 
+good=$(printf '\303\204')
+test_refspec fetch "refs/heads/${good}"
+bad=$(printf '\011tab')
+test_refspec fetch "refs/heads/${bad}"				invalid
+
 test_done
-- 
2.0.0.rc1.18.gf763c0f

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04  3:38 [PATCH v6 1/2] refs.c: optimize check_refname_component() David Turner
@ 2014-06-04  3:38 ` David Turner
  2014-06-04  8:04   ` Torsten Bögershausen
  0 siblings, 1 reply; 14+ messages in thread
From: David Turner @ 2014-06-04  3:38 UTC (permalink / raw)
  To: git, gitster, mhagger; +Cc: David Turner

Optimize check_refname_component using SSE4.2, where available.

git rev-parse HEAD is a good test-case for this, since it does almost
nothing except parse refs.  For one particular repo with about 60k
refs, almost all packed, the timings are:

Look up table: 29 ms
SSE4.2:        25 ms

This is about a 15% improvement.

The configure.ac changes include code from the GNU C Library written
by Joseph S. Myers <joseph at codesourcery dot com>.

Signed-off-by: David Turner <dturner@twitter.com>
---
 Makefile           |   6 +++
 aclocal.m4         |   6 +++
 configure.ac       |  17 ++++++++
 git-compat-util.h  |  22 ++++++++++
 refs.c             | 117 ++++++++++++++++++++++++++++++++++++++++++++++-------
 t/t5511-refspec.sh |  13 ++++++
 6 files changed, 166 insertions(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index a53f3a8..dd2127a 100644
--- a/Makefile
+++ b/Makefile
@@ -1326,6 +1326,11 @@ else
 		COMPAT_OBJS += compat/win32mmap.o
 	endif
 endif
+ifdef NO_SSE42
+	BASIC_CFLAGS += -DNO_SSE42
+else
+	BASIC_CFLAGS += -msse4.2
+endif
 ifdef OBJECT_CREATION_USES_RENAMES
 	COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1
 endif
@@ -2199,6 +2204,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@
+	@echo NO_SSE42=\''$(subst ','\'',$(subst ','\'',$(NO_SSE42)))'\' >>$@
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@
 endif
diff --git a/aclocal.m4 b/aclocal.m4
index f11bc7e..d9f3f19 100644
--- a/aclocal.m4
+++ b/aclocal.m4
@@ -38,3 +38,9 @@ AC_DEFUN([TYPE_SOCKLEN_T],
       [#include <sys/types.h>
 #include <sys/socket.h>])
 ])
+
+dnl Test a compiler option or options with an empty input file.
+dnl LIBC_TRY_CC_OPTION([options], [action-if-true], [action-if-false])
+AC_DEFUN([LIBC_TRY_CC_OPTION],
+[AS_IF([AC_TRY_COMMAND([${CC-cc} $1 -xc /dev/null -S -o /dev/null])],
+	[$2], [$3])])
diff --git a/configure.ac b/configure.ac
index b711254..3a5bda9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -382,6 +382,23 @@ AS_HELP_STRING([],[Tcl/Tk interpreter will be found in a system.]),
 GIT_PARSE_WITH(tcltk))
 #
 
+# Declare the with-sse42/without-sse42 options.
+AC_ARG_WITH(sse42,
+AS_HELP_STRING([--with-sse42],[use SSE4.2 instructions])
+AS_HELP_STRING([],[(default is YES if your compiler supports -msse4.2)]),
+GIT_PARSE_WITH(sse42))
+
+if test "$NO_SSE42" != "YesPlease"; then
+   dnl Check if -msse4.2 works.
+   AC_CACHE_CHECK(for SSE4.2 support, cc_cv_sse42, [dnl
+   LIBC_TRY_CC_OPTION([-msse4.2], [cc_cv_sse42=yes], [cc_cv_sse42=no])
+   ])
+   if test $cc_cv_sse42 = no; then
+     NO_SSE42=1
+   fi
+fi
+
+GIT_CONF_SUBST([NO_SSE42])
 
 ## Checks for programs.
 AC_MSG_NOTICE([CHECKS for programs])
diff --git a/git-compat-util.h b/git-compat-util.h
index f6d3a46..218d510 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -668,6 +668,28 @@ void git_qsort(void *base, size_t nmemb, size_t size,
 #endif
 #endif
 
+#ifndef NO_SSE42
+#include <nmmintrin.h>
+/*
+ * Clang ships with a version of nmmintrin.h that's incomplete; if
+ * necessary, we define the constants that we're going to use.
+ */
+#ifndef _SIDD_UBYTE_OPS
+#define _SIDD_UBYTE_OPS                 0x00
+#define _SIDD_CMP_EQUAL_ANY             0x00
+#define _SIDD_CMP_RANGES                0x04
+#define _SIDD_CMP_EQUAL_ORDERED         0x0c
+#define _SIDD_NEGATIVE_POLARITY         0x10
+#endif
+
+/* This is the system memory page size; it's used so that we can read
+ * outside the bounds of an allocation without segfaulting.
+ */
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+#endif
+
 #ifdef UNRELIABLE_FSTAT
 #define fstat_is_reliable() 0
 #else
diff --git a/refs.c b/refs.c
index 46139d2..2fe0075 100644
--- a/refs.c
+++ b/refs.c
@@ -24,6 +24,25 @@ static unsigned char refname_disposition[256] = {
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
 };
 
+static int check_refname_component_trailer(const char *cp, const char *refname, int flags)
+{
+	if (cp == refname)
+		return 0; /* Component has zero length. */
+	if (refname[0] == '.') {
+		if (!(flags & REFNAME_DOT_COMPONENT))
+			return -1; /* Component starts with '.'. */
+		/*
+		 * Even if leading dots are allowed, don't allow "."
+		 * as a component (".." is prevented by a rule above).
+		 */
+		if (refname[1] == '\0')
+			return -1; /* Component equals ".". */
+	}
+	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
+		return -1; /* Refname ends with ".lock". */
+	return cp - refname;
+}
+
 /*
  * Try to read one refname component from the front of refname.
  * Return the length of the component found, or -1 if the component is
@@ -37,7 +56,7 @@ static unsigned char refname_disposition[256] = {
  * - it ends with ".lock"
  * - it contains a "\" (backslash)
  */
-static int check_refname_component(const char *refname, int flags)
+static int check_refname_component_1(const char *refname, int flags)
 {
 	const char *cp;
 	char last = '\0';
@@ -47,7 +66,7 @@ static int check_refname_component(const char *refname, int flags)
 		unsigned char disp = refname_disposition[ch];
 		switch(disp) {
 		case 1:
-			goto out;
+			return check_refname_component_trailer(cp, refname, flags);
 		case 2:
 			if (last == '.')
 				return -1; /* Refname contains "..". */
@@ -61,23 +80,91 @@ static int check_refname_component(const char *refname, int flags)
 		}
 		last = ch;
 	}
-out:
-	if (cp == refname)
-		return 0; /* Component has zero length. */
-	if (refname[0] == '.') {
-		if (!(flags & REFNAME_DOT_COMPONENT))
-			return -1; /* Component starts with '.'. */
+}
+
+#ifdef NO_SSE42
+#define check_refname_component check_refname_component_1
+#else
+#define SSE_VECTOR_BYTES 16
+
+/* Vectorized version of check_refname_component */
+static int check_refname_component(const char *refname, int flags)
+{
+	const __m128i *refname_vec = (__m128i*) refname;
+
+	/* Character ranges for characters forbidden in refs; see above */
+	static const __v16qi bad = {
+		0x01, 0x20,  0x7e, 0x7f,  0x5e, 0x5e,  0x3a, 0x3a,
+		0x5b, 0x5c,  0x2a, 0x2a,  0x3f, 0x3f,  0x3f, 0x3f};
+
+	static const __v16qi nonslashes = {
+		'\001', '/' -1, '/' + 1, 0xff,
+	};
+
+	static const __v16qi dotdot = {'.', '.', 0};
+	static const __v16qi atcurly = {'@', '{', 0};
+
+	const __m128i *vp;
+	const char *cp = (const char *)refname_vec;
+
+	int dotdotpos = SSE_VECTOR_BYTES, atcurlypos = SSE_VECTOR_BYTES;
+	for (vp = refname_vec; ; vp++) {
+		__m128i tmp;
+		int endpos;
+
 		/*
-		 * Even if leading dots are allowed, don't allow "."
-		 * as a component (".." is prevented by a rule above).
+		 * Handle case of forbidden substrings .. and @{ crossing
+		 * sixteen-byte boundaries
 		 */
-		if (refname[1] == '\0')
-			return -1; /* Component equals ".". */
+		if (dotdotpos == 15 && *cp == '.')
+			return -1;
+
+		if (atcurlypos == 15 && *cp == '{')
+			return -1;
+
+		if (((uintptr_t) vp % PAGE_SIZE) > PAGE_SIZE - SSE_VECTOR_BYTES)
+			/*
+			 * End-of-page; fall back to slow method for
+			 * this entire component.
+			 */
+			return check_refname_component_1(refname, flags);
+
+		tmp = _mm_lddqu_si128(vp);
+
+		/*
+		 * Find slashes or end-of-string. The double-negative
+		 * (negative-polarity search for non-slashes) is
+		 * necessary so that \0 will also be counted.
+		 */
+		endpos = _mm_cmpistri((__m128i) nonslashes, tmp,
+				      _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES |
+				      _SIDD_NEGATIVE_POLARITY);
+
+		if (_mm_cmpestrc((__m128i) bad, SSE_VECTOR_BYTES, tmp, endpos,
+				 _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES))
+			return -1;
+
+		dotdotpos = _mm_cmpestri((__m128i) dotdot, 2, tmp, endpos,
+					 _SIDD_UBYTE_OPS |
+					 _SIDD_CMP_EQUAL_ORDERED);
+		if (dotdotpos < 15)
+			return -1;
+
+		atcurlypos = _mm_cmpestri((__m128i) atcurly, 2, tmp, endpos,
+					  _SIDD_UBYTE_OPS |
+					  _SIDD_CMP_EQUAL_ORDERED);
+		if (atcurlypos < 15)
+			return -1;
+
+		if (endpos < SSE_VECTOR_BYTES) {
+			cp = ((const char*) vp) + endpos;
+			break;
+		}
+		cp = (const char*) vp + SSE_VECTOR_BYTES;
 	}
-	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
-		return -1; /* Refname ends with ".lock". */
-	return cp - refname;
+	return check_refname_component_trailer(cp, refname, flags);
 }
+#endif
 
 int check_refname_format(const char *refname, int flags)
 {
diff --git a/t/t5511-refspec.sh b/t/t5511-refspec.sh
index de6db86..7f1bd74 100755
--- a/t/t5511-refspec.sh
+++ b/t/t5511-refspec.sh
@@ -88,4 +88,17 @@ test_refspec fetch "refs/heads/${good}"
 bad=$(printf '\011tab')
 test_refspec fetch "refs/heads/${bad}"				invalid
 
+test_refspec fetch 'refs/heads/a-very-long-refname'
+test_refspec fetch 'refs/heads/.a-very-long-refname'		invalid
+test_refspec fetch 'refs/heads/abcdefgh0123..'			invalid
+test_refspec fetch 'refs/heads/abcdefgh01234..'			invalid
+test_refspec fetch 'refs/heads/abcdefgh012345..'		invalid
+test_refspec fetch 'refs/heads/abcdefgh0123456..'		invalid
+test_refspec fetch 'refs/heads/abcdefgh01234567..'		invalid
+test_refspec fetch 'refs/heads/abcdefgh0123.a'
+test_refspec fetch 'refs/heads/abcdefgh01234.a'
+test_refspec fetch 'refs/heads/abcdefgh012345.a'
+test_refspec fetch 'refs/heads/abcdefgh0123456.a'
+test_refspec fetch 'refs/heads/abcdefgh01234567.a'
+
 test_done
-- 
2.0.0.rc1.18.gf763c0f

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04  3:38 ` [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component David Turner
@ 2014-06-04  8:04   ` Torsten Bögershausen
  2014-06-04 11:21     ` Duy Nguyen
  2014-06-04 21:14     ` David Turner
  0 siblings, 2 replies; 14+ messages in thread
From: Torsten Bögershausen @ 2014-06-04  8:04 UTC (permalink / raw)
  To: David Turner, git, gitster, mhagger; +Cc: David Turner


On 2014-06-04 05.38, David Turner wrote:
[]
> []
> diff --git a/Makefile b/Makefile
> index a53f3a8..dd2127a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1326,6 +1326,11 @@ else
>  		COMPAT_OBJS += compat/win32mmap.o
>  	endif
>  endif
> +ifdef NO_SSE42
> +	BASIC_CFLAGS += -DNO_SSE42
> +else
> +	BASIC_CFLAGS += -msse4.2
> +endif
This does work for some people, but break for others, like the systems in my test-lab.
On 2 different systems the gcc has support for -msse4.2, but the processor has not,
and t5511 fails with "Illegal instruction".
How can that be?
The maintainer of a Linux distro wants to ship gcc with all possible features,
an the end-user can compile the code with all the features his very processor has.

On the other hand, a pre-compiled package like e.g. Git is compiled into a binary package
with all the latest features switched of, to be able to run the binary on as many different
processor variants as possible.

He already needs to make 3 binaries only for x86:

- the minimum version is a 32 bit processor like 486/586/686.
- a "medium" version for systems with 4GB RAM (or more) which have 32 bit processors with PAE (686-pae)
- a version for x86_64

E.g. a maintainer wants to have SSE42 enabled, when he builds Git for his system,
but disabled when he builds an RPM.
The people compiling Git need to know what the binary is used for, how about
using something like this in Makefile:

ifdef HAVE_SSE42
	BASIC_CFLAGS += -msse4.2 -DHAS_SSE42

+endif

>  ifdef OBJECT_CREATION_USES_RENAMES
>  	COMPAT_CFLAGS += -DOBJECT_CREATION_MODE=1
>  endif
> @@ -2199,6 +2204,7 @@ GIT-BUILD-OPTIONS: FORCE
>  	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@
>  	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@
>  	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@
> +	@echo NO_SSE42=\''$(subst ','\'',$(subst ','\'',$(NO_SSE42)))'\' >>$@
Same here: Use HAVE_SSE42 rather than NO_SSE42

> diff --git a/aclocal.m4 b/aclocal.m4
> index f11bc7e..d9f3f19 100644
> --- a/aclocal.m4
> +++ b/aclocal.m4
> @@ -38,3 +38,9 @@ AC_DEFUN([TYPE_SOCKLEN_T],
>        [#include <sys/types.h>
>  #include <sys/socket.h>])
>  ])
As the whole detection logic does not work as expected (we need to compile and test-run the code,
not only compile),
can we drop this part completely ? (at least for the first round)

> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -668,6 +668,28 @@ void git_qsort(void *base, size_t nmemb, size_t size,
>  #endif
>  #endif
>  
> +#ifndef NO_SSE42
> +#include <nmmintrin.h>
> +/*
> + * Clang ships with a version of nmmintrin.h that's incomplete; if
> + * necessary, we define the constants that we're going to use.
> + */
> +#ifndef _SIDD_UBYTE_OPS
> +#define _SIDD_UBYTE_OPS                 0x00
> +#define _SIDD_CMP_EQUAL_ANY             0x00
> +#define _SIDD_CMP_RANGES                0x04
> +#define _SIDD_CMP_EQUAL_ORDERED         0x0c
> +#define _SIDD_NEGATIVE_POLARITY         0x10
> +#endif
Why do this defines end up in git-compat-util.h when they are needed by one file?
(see even below)

> --- a/refs.c
> +++ b/refs.c
> @@ -24,6 +24,25 @@ static unsigned char refname_disposition[256] = {
>  	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
>  };
>  
> +static int check_refname_component_trailer(const char *cp, const char *refname, int flags)
> +{
> +	if (cp == refname)
> +		return 0; /* Component has zero length. */
> +	if (refname[0] == '.') {
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1; /* Component starts with '.'. */
> +		/*
> +		 * Even if leading dots are allowed, don't allow "."
> +		 * as a component (".." is prevented by a rule above).
> +		 */
> +		if (refname[1] == '\0')
> +			return -1; /* Component equals ".". */
> +	}
> +	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> +		return -1; /* Refname ends with ".lock". */
> +	return cp - refname;
> +}
> +
>  /*
>   * Try to read one refname component from the front of refname.
>   * Return the length of the component found, or -1 if the component is
> @@ -37,7 +56,7 @@ static unsigned char refname_disposition[256] = {
>   * - it ends with ".lock"
>   * - it contains a "\" (backslash)
>   */
> -static int check_refname_component(const char *refname, int flags)
> +static int check_refname_component_1(const char *refname, int flags)
The name check_refname_component_1() doesn't tell too much,
(check_refname_component_sse42()  or check_refname_component_nonsse42() say more)

can I suggest to move all SSE code out to a file under compat/,
like compat/refs_sse42.c, or something similar ?
(And here we need the missing sse4 defines from git-compat-util.h)
>  {
>  	const char *cp;
>  	char last = '\0';
> @@ -47,7 +66,7 @@ static int check_refname_component(const char *refname, int flags)
>  		unsigned char disp = refname_disposition[ch];
>  		switch(disp) {
>  		case 1:
> -			goto out;
> +			return check_refname_component_trailer(cp, refname, flags);
>  		case 2:
>  			if (last == '.')
>  				return -1; /* Refname contains "..". */
> @@ -61,23 +80,91 @@ static int check_refname_component(const char *refname, int flags)
>  		}
>  		last = ch;
>  	}
> -out:
> -	if (cp == refname)
> -		return 0; /* Component has zero length. */
> -	if (refname[0] == '.') {
> -		if (!(flags & REFNAME_DOT_COMPONENT))
> -			return -1; /* Component starts with '.'. */
> +}
> +
> +#ifdef NO_SSE42
> +#define check_refname_component check_refname_component_1
> +#else
> +#define SSE_VECTOR_BYTES 16
See above, all sse42 related stuff, should it be isolated in a seperate file?
> +
> +/* Vectorized version of check_refname_component */
> +static int check_refname_component(const char *refname, int flags)
> +{
[]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04  8:04   ` Torsten Bögershausen
@ 2014-06-04 11:21     ` Duy Nguyen
  2014-06-04 14:25       ` Torsten Bögershausen
  2014-06-04 21:14     ` David Turner
  1 sibling, 1 reply; 14+ messages in thread
From: Duy Nguyen @ 2014-06-04 11:21 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: David Turner, Git Mailing List, Junio C Hamano, Michael Haggerty,
	David Turner

On Wed, Jun 4, 2014 at 3:04 PM, Torsten Bögershausen <tboegi@web.de> wrote:
>
> On 2014-06-04 05.38, David Turner wrote:
> []
>> []
>> diff --git a/Makefile b/Makefile
>> index a53f3a8..dd2127a 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1326,6 +1326,11 @@ else
>>               COMPAT_OBJS += compat/win32mmap.o
>>       endif
>>  endif
>> +ifdef NO_SSE42
>> +     BASIC_CFLAGS += -DNO_SSE42
>> +else
>> +     BASIC_CFLAGS += -msse4.2
>> +endif
> This does work for some people, but break for others, like the systems in my test-lab.
> On 2 different systems the gcc has support for -msse4.2, but the processor has not,
> and t5511 fails with "Illegal instruction".
> How can that be?
> The maintainer of a Linux distro wants to ship gcc with all possible features,
> an the end-user can compile the code with all the features his very processor has.

I think glibc code uses cpuid instruction to decide whether to use
optimized version. May be we can do the same? If we go that route and
have a way to detect sse support from compiler, then we can drop
NO_SSE42, enable all and pick one at runtime.
-- 
Duy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04 11:21     ` Duy Nguyen
@ 2014-06-04 14:25       ` Torsten Bögershausen
  2014-06-04 21:16         ` David Turner
  0 siblings, 1 reply; 14+ messages in thread
From: Torsten Bögershausen @ 2014-06-04 14:25 UTC (permalink / raw)
  To: Duy Nguyen, Torsten Bögershausen
  Cc: David Turner, Git Mailing List, Junio C Hamano, Michael Haggerty,
	David Turner

On 2014-06-04 13.21, Duy Nguyen wrote:
> On Wed, Jun 4, 2014 at 3:04 PM, Torsten Bögershausen <tboegi@web.de> wrote:
>>
>> On 2014-06-04 05.38, David Turner wrote:
>> []
>>> []
>>> diff --git a/Makefile b/Makefile
>>> index a53f3a8..dd2127a 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -1326,6 +1326,11 @@ else
>>>               COMPAT_OBJS += compat/win32mmap.o
>>>       endif
>>>  endif
>>> +ifdef NO_SSE42
>>> +     BASIC_CFLAGS += -DNO_SSE42
>>> +else
>>> +     BASIC_CFLAGS += -msse4.2
>>> +endif
>> This does work for some people, but break for others, like the systems in my test-lab.
>> On 2 different systems the gcc has support for -msse4.2, but the processor has not,
>> and t5511 fails with "Illegal instruction".
>> How can that be?
>> The maintainer of a Linux distro wants to ship gcc with all possible features,
>> an the end-user can compile the code with all the features his very processor has.
> 
> I think glibc code uses cpuid instruction to decide whether to use
> optimized version. May be we can do the same? If we go that route and
> have a way to detect sse support from compiler, then we can drop
> NO_SSE42, enable all and pick one at runtime.
> 
Running make under a non-X86 processor like arm fails, as his gcc does not have -msse4.2

On the other hand, looking here: 
http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html
and looking into refs.c,
it seems as if we can try to run 
strcspn(refname, bad_characters)
and 
strstr(refname, "@{"
and 
strstr(refname, ".."
on each refname, instead of checking each char in a loop.
The library will pick the fastest version for strcspn() automatically.

David, the repo you run the tests on, is it public?
Or is there a public repo with this many refs ?
Or can you make a dummy repo with 60k refs ?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04  8:04   ` Torsten Bögershausen
  2014-06-04 11:21     ` Duy Nguyen
@ 2014-06-04 21:14     ` David Turner
  2014-06-04 21:46       ` Junio C Hamano
  1 sibling, 1 reply; 14+ messages in thread
From: David Turner @ 2014-06-04 21:14 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, gitster, mhagger, David Turner

On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote:
[snip discussion of compiler flags; I'll look into a cpuid approach]

> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -668,6 +668,28 @@ void git_qsort(void *base, size_t nmemb, size_t size,
> >  #endif
> >  #endif
> >  
> > +#ifndef NO_SSE42
> > +#include <nmmintrin.h>
> > +/*
> > + * Clang ships with a version of nmmintrin.h that's incomplete; if
> > + * necessary, we define the constants that we're going to use.
> > + */
> > +#ifndef _SIDD_UBYTE_OPS
> > +#define _SIDD_UBYTE_OPS                 0x00
> > +#define _SIDD_CMP_EQUAL_ANY             0x00
> > +#define _SIDD_CMP_RANGES                0x04
> > +#define _SIDD_CMP_EQUAL_ORDERED         0x0c
> > +#define _SIDD_NEGATIVE_POLARITY         0x10
> > +#endif
> Why do this defines end up in git-compat-util.h when they are needed by one file?
> (see even below)

Because Junio told me to:
"We would prefer not to add inclusion of any system header files in
random *.c files, as there often are system dependencies (order of
inclusion, definition of feature macros, etc.) we would rather want
to encapsulate in one place, that is git-compat-util.h."

> > --- a/refs.c
> > +++ b/refs.c
> > @@ -24,6 +24,25 @@ static unsigned char refname_disposition[256] = {
> >  	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
> >  };
> >  
> > +static int check_refname_component_trailer(const char *cp, const char *refname, int flags)
> > +{
> > +	if (cp == refname)
> > +		return 0; /* Component has zero length. */
> > +	if (refname[0] == '.') {
> > +		if (!(flags & REFNAME_DOT_COMPONENT))
> > +			return -1; /* Component starts with '.'. */
> > +		/*
> > +		 * Even if leading dots are allowed, don't allow "."
> > +		 * as a component (".." is prevented by a rule above).
> > +		 */
> > +		if (refname[1] == '\0')
> > +			return -1; /* Component equals ".". */
> > +	}
> > +	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> > +		return -1; /* Refname ends with ".lock". */
> > +	return cp - refname;
> > +}
> > +
> >  /*
> >   * Try to read one refname component from the front of refname.
> >   * Return the length of the component found, or -1 if the component is
> > @@ -37,7 +56,7 @@ static unsigned char refname_disposition[256] = {
> >   * - it ends with ".lock"
> >   * - it contains a "\" (backslash)
> >   */
> > -static int check_refname_component(const char *refname, int flags)
> > +static int check_refname_component_1(const char *refname, int flags)
> The name check_refname_component_1() doesn't tell too much,
> (check_refname_component_sse42()  or check_refname_component_nonsse42() say more)

I'll go with "_bytewise", since that's how it works.

> can I suggest to move all SSE code out to a file under compat/,
> like compat/refs_sse42.c, or something similar ?

Since this is a relatively small section of code, I think that would be
overkill.  Does anyone else have an opinion?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04 14:25       ` Torsten Bögershausen
@ 2014-06-04 21:16         ` David Turner
  2014-06-05 12:30           ` Torsten Bögershausen
  0 siblings, 1 reply; 14+ messages in thread
From: David Turner @ 2014-06-04 21:16 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Duy Nguyen, Git Mailing List, Junio C Hamano, Michael Haggerty,
	David Turner

On Wed, 2014-06-04 at 16:25 +0200, Torsten Bögershausen wrote:
> On the other hand, looking here: 
> http://sourceware.org/ml/libc-alpha/2009-10/msg00063.html
> and looking into refs.c,
> it seems as if we can try to run 
> strcspn(refname, bad_characters)
> and 
> strstr(refname, "@{"
> and 
> strstr(refname, ".."
> on each refname, instead of checking each char in a loop.
> The library will pick the fastest version for strcspn() automatically.

Yes, you could try that, but I worry that it would be less efficient,
because it duplicates the looping machinery.

> David, the repo you run the tests on, is it public?

Unfortunately, it is an internal Twitter repo.

> Or is there a public repo with this many refs ?

I do not know of one.

> Or can you make a dummy repo with 60k refs ?

Sure!  I actually went with > 120k to make measurement easier:
https://github.com/dturner-tw/many-refs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04 21:14     ` David Turner
@ 2014-06-04 21:46       ` Junio C Hamano
  2014-06-05 19:27         ` David Turner
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2014-06-04 21:46 UTC (permalink / raw)
  To: David Turner; +Cc: Torsten Bögershausen, git, mhagger, David Turner

David Turner <dturner@twopensource.com> writes:

> On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote:
> [snip discussion of compiler flags; I'll look into a cpuid approach]

Hmmmm, I am not sure if the complexity is really worth it.

In any case, [PATCH 1/2] is fairly uncontroversial, so I am inclined
to queue it by itself early without waiting for the discussion on
2/2 to settle.

>> The name check_refname_component_1() doesn't tell too much,
>> (check_refname_component_sse42()  or check_refname_component_nonsse42() say more)
>
> I'll go with "_bytewise", since that's how it works.

That naming assumes that there will never be any alternative
implementation of the bytewise checker other than the one that uses
sse42, no?

>> can I suggest to move all SSE code out to a file under compat/,
>> like compat/refs_sse42.c, or something similar ?
>
> Since this is a relatively small section of code, I think that would be
> overkill.  Does anyone else have an opinion?

If we foresee people on other architectures to invent different
vectorized implementations on their favourite archs, we may end up
separating it out into compat/.  I have no opinion on how likely
that will happen, though, and because this is a small piece of code
right now, it shouldn't be too painful to reorganize when the time
comes.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04 21:16         ` David Turner
@ 2014-06-05 12:30           ` Torsten Bögershausen
  2014-06-05 12:58             ` Ondřej Bílka
  2014-06-05 19:26             ` David Turner
  0 siblings, 2 replies; 14+ messages in thread
From: Torsten Bögershausen @ 2014-06-05 12:30 UTC (permalink / raw)
  To: David Turner, Torsten Bögershausen
  Cc: Duy Nguyen, Git Mailing List, Junio C Hamano, Michael Haggerty,
	David Turner

On 2014-06-04 23.16, David Turner wrote:
> 
> Sure!  I actually went with > 120k to make measurement easier:
> https://github.com/dturner-tw/many-refs
Hm, I didn't get so man

git remote -v
origin  https://github.com/dturner-tw/many-refs 

 wc .git/packed-refs 
     750    1130   38868 .git/packed-refs



time git rev-parse HEAD
7ac416f789fd4f1e398449113f6e1ec1d699141a

real    0m0.008s
user    0m0.002s
sys     0m0.004s

where only patch 1/2 doesn't seem to speed up things on my system:

time ~/projects/git/tb.140604_DavidTurner_SSE4/git rev-parse HEAD
7ac416f789fd4f1e398449113f6e1ec1d699141a

real    0m0.010s
user    0m0.002s
sys     0m0.005s



Intel Core Duo @ 2.4 Ghz, Mac OS

(and I get similar values under an AMD Dual core running 2Ghz under Linux)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-05 12:30           ` Torsten Bögershausen
@ 2014-06-05 12:58             ` Ondřej Bílka
  2014-06-05 19:26             ` David Turner
  1 sibling, 0 replies; 14+ messages in thread
From: Ondřej Bílka @ 2014-06-05 12:58 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: David Turner, Duy Nguyen, Git Mailing List, Junio C Hamano,
	Michael Haggerty, David Turner

On Thu, Jun 05, 2014 at 02:30:17PM +0200, Torsten Bögershausen wrote:
> On 2014-06-04 23.16, David Turner wrote:
> > 
> > Sure!  I actually went with > 120k to make measurement easier:
> > https://github.com/dturner-tw/many-refs
> Hm, I didn't get so man
> 
> git remote -v
> origin  https://github.com/dturner-tw/many-refs 
> 
>  wc .git/packed-refs 
>      750    1130   38868 .git/packed-refs
> 
> 
> 
> time git rev-parse HEAD
> 7ac416f789fd4f1e398449113f6e1ec1d699141a
> 
> real    0m0.008s
> user    0m0.002s
> sys     0m0.004s
> 
> where only patch 1/2 doesn't seem to speed up things on my system:
> 
> time ~/projects/git/tb.140604_DavidTurner_SSE4/git rev-parse HEAD
> 7ac416f789fd4f1e398449113f6e1ec1d699141a
> 
> real    0m0.010s
> user    0m0.002s
> sys     0m0.005s
> 
> 
Could you run it 100 times to get better resolution? This could be just
measurement error.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-05 12:30           ` Torsten Bögershausen
  2014-06-05 12:58             ` Ondřej Bílka
@ 2014-06-05 19:26             ` David Turner
  2014-06-05 21:42               ` Torsten Bögershausen
  1 sibling, 1 reply; 14+ messages in thread
From: David Turner @ 2014-06-05 19:26 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Duy Nguyen, Git Mailing List, Junio C Hamano, Michael Haggerty,
	David Turner

On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote:
> On 2014-06-04 23.16, David Turner wrote:
> > 
> > Sure!  I actually went with > 120k to make measurement easier:
> > https://github.com/dturner-tw/many-refs
> Hm, I didn't get so man
> 
> git remote -v
> origin  https://github.com/dturner-tw/many-refs 
> 
>  wc .git/packed-refs 
>      750    1130   38868 .git/packed-refs
> 

Oops.  It looks like I forgot to push all of the refs.  And when I try,
it fails with "fatal: cannot exec 'send-pack': Argument list too long"

I hacked git to send batches of 1000; maybe I'll actually make a real
patch with ARG_MAX at some point. Anyway, this is uploading now, but I
estimate that it will take at least five hours, because github is being
really slow about this.

...
> where only patch 1/2 doesn't seem to speed up things on my system:
...

I would not expect a noticeable change on a tiny number of refs; it's
only when ref parsing takes up a large percentage of runtime -- tens of
thousands of refs.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-04 21:46       ` Junio C Hamano
@ 2014-06-05 19:27         ` David Turner
  0 siblings, 0 replies; 14+ messages in thread
From: David Turner @ 2014-06-05 19:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, git, mhagger, David Turner

On Wed, 2014-06-04 at 14:46 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > On Wed, 2014-06-04 at 10:04 +0200, Torsten Bögershausen wrote:
> > [snip discussion of compiler flags; I'll look into a cpuid approach]
> 
> Hmmmm, I am not sure if the complexity is really worth it.
> 
> In any case, [PATCH 1/2] is fairly uncontroversial, so I am inclined
> to queue it by itself early without waiting for the discussion on
> 2/2 to settle.
> 
> >> The name check_refname_component_1() doesn't tell too much,
> >> (check_refname_component_sse42()  or check_refname_component_nonsse42() say more)
> >
> > I'll go with "_bytewise", since that's how it works.
> 
> That naming assumes that there will never be any alternative
> implementation of the bytewise checker other than the one that uses
> sse42, no?

check_refname_component_1 is the non-sse (LUT) one; I assume that there
will only be one implementation of that (and if there's later another
one we can rename it).  I guess this is strong evidence for _1 being a
bad name.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-05 19:26             ` David Turner
@ 2014-06-05 21:42               ` Torsten Bögershausen
  2014-06-05 22:02                 ` David Turner
  0 siblings, 1 reply; 14+ messages in thread
From: Torsten Bögershausen @ 2014-06-05 21:42 UTC (permalink / raw)
  To: David Turner, Torsten Bögershausen
  Cc: Duy Nguyen, Git Mailing List, Junio C Hamano, Michael Haggerty,
	David Turner

On 2014-06-05 21.26, David Turner wrote:
> On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote:
>> On 2014-06-04 23.16, David Turner wrote:
>>>
>>> Sure!  I actually went with > 120k to make measurement easier:
>>> https://github.com/dturner-tw/many-refs
>> Hm, I didn't get so man
>>
>> git remote -v
>> origin  https://github.com/dturner-tw/many-refs 
>>
>>  wc .git/packed-refs 
>>      750    1130   38868 .git/packed-refs
>>
> 
> Oops.  It looks like I forgot to push all of the refs.  And when I try,
> it fails with "fatal: cannot exec 'send-pack': Argument list too long"

I just noticed that I may be able to re-use code from t5551 to create a repo
with 50000 tags.


And how about renaming check_refname_component_1() into
check_refname_component_slow() ;-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component
  2014-06-05 21:42               ` Torsten Bögershausen
@ 2014-06-05 22:02                 ` David Turner
  0 siblings, 0 replies; 14+ messages in thread
From: David Turner @ 2014-06-05 22:02 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Duy Nguyen, Git Mailing List, Junio C Hamano, Michael Haggerty,
	David Turner

On Thu, 2014-06-05 at 23:42 +0200, Torsten Bögershausen wrote:
> On 2014-06-05 21.26, David Turner wrote:
> > On Thu, 2014-06-05 at 14:30 +0200, Torsten Bögershausen wrote:
> >> On 2014-06-04 23.16, David Turner wrote:
> >>>
> >>> Sure!  I actually went with > 120k to make measurement easier:
> >>> https://github.com/dturner-tw/many-refs
> >> Hm, I didn't get so man
> >>
> >> git remote -v
> >> origin  https://github.com/dturner-tw/many-refs 
> >>
> >>  wc .git/packed-refs 
> >>      750    1130   38868 .git/packed-refs
> >>
> > 
> > Oops.  It looks like I forgot to push all of the refs.  And when I try,
> > it fails with "fatal: cannot exec 'send-pack': Argument list too long"
> 
> I just noticed that I may be able to re-use code from t5551 to create a repo
> with 50000 tags.

That's good, because github really didn't like me having a repo with
thousands of refs.  Try 100k, tho, because the difference is easier to
see.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-06-05 22:02 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-04  3:38 [PATCH v6 1/2] refs.c: optimize check_refname_component() David Turner
2014-06-04  3:38 ` [PATCH v6 2/2] refs.c: SSE4.2 optimizations for check_refname_component David Turner
2014-06-04  8:04   ` Torsten Bögershausen
2014-06-04 11:21     ` Duy Nguyen
2014-06-04 14:25       ` Torsten Bögershausen
2014-06-04 21:16         ` David Turner
2014-06-05 12:30           ` Torsten Bögershausen
2014-06-05 12:58             ` Ondřej Bílka
2014-06-05 19:26             ` David Turner
2014-06-05 21:42               ` Torsten Bögershausen
2014-06-05 22:02                 ` David Turner
2014-06-04 21:14     ` David Turner
2014-06-04 21:46       ` Junio C Hamano
2014-06-05 19:27         ` David Turner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).