[PATCH] grep: load funcname patterns for -W

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] grep: load funcname patterns for -W
@ 2011-11-25 14:46 Thomas Rast
  2011-11-25 16:32 ` René Scharfe
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Rast @ 2011-11-25 14:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe

git-grep avoids loading the funcname patterns unless they are needed.
ba8ea74 (grep: add option to show whole function as context,
2011-08-01) forgot to extend this test also to the new funcbody
feature.  Do so.

The catch is that we also have to disable threading when using
userdiff, as explained in grep_threads_ok().  So we must be careful to
introduce the same test there.
---
 grep.c          |    7 ++++---
 t/t7810-grep.sh |   14 ++++++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/grep.c b/grep.c
index b29d09c..7a070e9 100644
--- a/grep.c
+++ b/grep.c
@@ -948,8 +948,8 @@ int grep_threads_ok(const struct grep_opt *opt)
 	 * machinery in grep_buffer_1. The attribute code is not
 	 * thread safe, so we disable the use of threads.
 	 */
-	if (opt->funcname && !opt->unmatch_name_only && !opt->status_only &&
-	    !opt->name_only)
+	if ((opt->funcname || opt->funcbody)
+	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
 		return 0;
 
 	return 1;
@@ -1008,7 +1008,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	}
 
 	memset(&xecfg, 0, sizeof(xecfg));
-	if (opt->funcname && !opt->unmatch_name_only && !opt->status_only &&
+	if ((opt->funcname || opt->funcbody)
+	    && !opt->unmatch_name_only && !opt->status_only &&
 	    !opt->name_only && !binary_match_only && !collect_hits) {
 		struct userdiff_driver *drv = userdiff_find_by_path(name);
 		if (drv && drv->funcname.pattern) {
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 81263b7..7ba5b16 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -523,6 +523,20 @@ test_expect_success 'grep -W' '
 	test_cmp expected actual
 '
 
+cat >expected <<EOF
+hello.c=	printf("Hello world.\n");
+hello.c:	return 0;
+hello.c-	/* char ?? */
+EOF
+
+test_expect_success 'grep -W with userdiff' '
+	test_when_finished "rm -f .gitattributes" &&
+	git config diff.custom.xfuncname "(printf.*|})$" &&
+	echo "hello.c diff=custom" >.gitattributes &&
+	git grep -W return >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
 	mkdir -p s &&
 	(
-- 
1.7.8.rc3.415.gbcbb2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH] grep: load funcname patterns for -W
  2011-11-25 14:46 [PATCH] grep: load funcname patterns for -W Thomas Rast
@ 2011-11-25 16:32 ` René Scharfe
  2011-11-26 12:15   ` [PATCH] grep: enable multi-threading for -p and -W René Scharfe
  0 siblings, 1 reply; 50+ messages in thread
From: René Scharfe @ 2011-11-25 16:32 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano

Am 25.11.2011 15:46, schrieb Thomas Rast:
> git-grep avoids loading the funcname patterns unless they are needed.
> ba8ea74 (grep: add option to show whole function as context,
> 2011-08-01) forgot to extend this test also to the new funcbody
> feature.  Do so.
> 
> The catch is that we also have to disable threading when using
> userdiff, as explained in grep_threads_ok().  So we must be careful to
> introduce the same test there.

Oops.  Thanks for catching this.

That reminds me to look into adding a thread-safe way to access
attributes again..

Thanks,
René

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] grep: enable multi-threading for -p and -W
  2011-11-25 16:32 ` René Scharfe
@ 2011-11-26 12:15   ` René Scharfe
  2011-11-29  9:54     ` Thomas Rast
  0 siblings, 1 reply; 50+ messages in thread
From: René Scharfe @ 2011-11-26 12:15 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano

Move attribute reading, which is not thread-safe, into add_work(), under
grep_mutex.  That way we can stop turning off multi-threading if -p or -W
is given, as the diff attribute for each file is gotten safely now.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Goes on top of your patch.  Needs testing..

 builtin/grep.c |   33 +++++++++++++++++++++++++--------
 grep.c         |   25 +++++++------------------
 grep.h         |    5 +++--
 revision.c     |    2 +-
 4 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3d7329d..5534111 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "dir.h"
 #include "thread-utils.h"
+#include "userdiff.h"
 
 static char const * const grep_usage[] = {
 	"git grep [options] [-e] <pattern> [<rev>...] [[--] <path>...]",
@@ -43,6 +44,7 @@ enum work_type {WORK_SHA1, WORK_FILE};
 struct work_item {
 	enum work_type type;
 	char *name;
+	struct userdiff_driver *userdiff_driver;
 
 	/* if type == WORK_SHA1, then 'identifier' is a SHA1,
 	 * otherwise type == WORK_FILE, and 'identifier' is a NUL
@@ -114,7 +116,17 @@ static pthread_cond_t cond_result;
 
 static int skip_first_line;
 
-static void add_work(enum work_type type, char *name, void *id)
+/* Reading attributes is not thread-safe, so callers need to lock. */
+static struct userdiff_driver *get_userdiff_driver(struct grep_opt *opt,
+						   const char *name)
+{
+	if (opt->funcbody || opt->funcname)
+		return userdiff_find_by_path(name);
+	return NULL;
+}
+
+static void add_work(struct grep_opt *opt, enum work_type type, char *name,
+		     void *id)
 {
 	grep_lock();
 
@@ -124,6 +136,7 @@ static void add_work(enum work_type type, char *name, void *id)
 
 	todo[todo_end].type = type;
 	todo[todo_end].name = name;
+	todo[todo_end].userdiff_driver = get_userdiff_driver(opt, name);
 	todo[todo_end].identifier = id;
 	todo[todo_end].done = 0;
 	strbuf_reset(&todo[todo_end].out);
@@ -158,13 +171,13 @@ static void grep_sha1_async(struct grep_opt *opt, char *name,
 	unsigned char *s;
 	s = xmalloc(20);
 	memcpy(s, sha1, 20);
-	add_work(WORK_SHA1, name, s);
+	add_work(opt, WORK_SHA1, name, s);
 }
 
 static void grep_file_async(struct grep_opt *opt, char *name,
 			    const char *filename)
 {
-	add_work(WORK_FILE, name, xstrdup(filename));
+	add_work(opt, WORK_FILE, name, xstrdup(filename));
 }
 
 static void work_done(struct work_item *w)
@@ -212,24 +225,26 @@ static void *run(void *arg)
 	struct grep_opt *opt = arg;
 
 	while (1) {
+		struct userdiff_driver *drv;
 		struct work_item *w = get_work();
 		if (!w)
 			break;
 
 		opt->output_priv = w;
+		drv = w->userdiff_driver;
 		if (w->type == WORK_SHA1) {
 			unsigned long sz;
 			void* data = load_sha1(w->identifier, &sz, w->name);
 
 			if (data) {
-				hit |= grep_buffer(opt, w->name, data, sz);
+				hit |= grep_buffer(opt, w->name, drv, data, sz);
 				free(data);
 			}
 		} else if (w->type == WORK_FILE) {
 			size_t sz;
 			void* data = load_file(w->identifier, &sz);
 			if (data) {
-				hit |= grep_buffer(opt, w->name, data, sz);
+				hit |= grep_buffer(opt, w->name, drv, data, sz);
 				free(data);
 			}
 		} else {
@@ -417,10 +432,11 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 		int hit;
 		unsigned long sz;
 		void *data = load_sha1(sha1, &sz, name);
+		struct userdiff_driver *drv = get_userdiff_driver(opt, name);
 		if (!data)
 			hit = 0;
 		else
-			hit = grep_buffer(opt, name, data, sz);
+			hit = grep_buffer(opt, name, drv, data, sz);
 
 		free(data);
 		free(name);
@@ -479,10 +495,11 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 		int hit;
 		size_t sz;
 		void *data = load_file(filename, &sz);
+		struct userdiff_driver *drv = get_userdiff_driver(opt, name);
 		if (!data)
 			hit = 0;
 		else
-			hit = grep_buffer(opt, name, data, sz);
+			hit = grep_buffer(opt, name, drv, data, sz);
 
 		free(data);
 		free(name);
@@ -1001,7 +1018,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		opt.regflags |= REG_ICASE;
 
 #ifndef NO_PTHREADS
-	if (online_cpus() == 1 || !grep_threads_ok(&opt))
+	if (online_cpus() == 1)
 		use_threads = 0;
 
 	if (use_threads) {
diff --git a/grep.c b/grep.c
index 7a070e9..ff14a98 100644
--- a/grep.c
+++ b/grep.c
@@ -942,25 +942,13 @@ static int look_ahead(struct grep_opt *opt,
 	return 0;
 }
 
-int grep_threads_ok(const struct grep_opt *opt)
-{
-	/* If this condition is true, then we may use the attribute
-	 * machinery in grep_buffer_1. The attribute code is not
-	 * thread safe, so we disable the use of threads.
-	 */
-	if ((opt->funcname || opt->funcbody)
-	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
-		return 0;
-
-	return 1;
-}
-
 static void std_output(struct grep_opt *opt, const void *buf, size_t size)
 {
 	fwrite(buf, size, 1, stdout);
 }
 
 static int grep_buffer_1(struct grep_opt *opt, const char *name,
+			 struct userdiff_driver *drv,
 			 char *buf, unsigned long size, int collect_hits)
 {
 	char *bol = buf;
@@ -1011,7 +999,6 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	if ((opt->funcname || opt->funcbody)
 	    && !opt->unmatch_name_only && !opt->status_only &&
 	    !opt->name_only && !binary_match_only && !collect_hits) {
-		struct userdiff_driver *drv = userdiff_find_by_path(name);
 		if (drv && drv->funcname.pattern) {
 			const struct userdiff_funcname *pe = &drv->funcname;
 			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
@@ -1167,23 +1154,25 @@ static int chk_hit_marker(struct grep_expr *x)
 	}
 }
 
-int grep_buffer(struct grep_opt *opt, const char *name, char *buf, unsigned long size)
+int grep_buffer(struct grep_opt *opt, const char *name,
+		struct userdiff_driver *userdiff_driver,
+		char *buf, unsigned long size)
 {
 	/*
 	 * we do not have to do the two-pass grep when we do not check
 	 * buffer-wide "all-match".
 	 */
 	if (!opt->all_match)
-		return grep_buffer_1(opt, name, buf, size, 0);
+		return grep_buffer_1(opt, name, userdiff_driver, buf, size, 0);
 
 	/* Otherwise the toplevel "or" terms hit a bit differently.
 	 * We first clear hit markers from them.
 	 */
 	clr_hit_marker(opt->pattern_expression);
-	grep_buffer_1(opt, name, buf, size, 1);
+	grep_buffer_1(opt, name, userdiff_driver, buf, size, 1);
 
 	if (!chk_hit_marker(opt->pattern_expression))
 		return 0;
 
-	return grep_buffer_1(opt, name, buf, size, 0);
+	return grep_buffer_1(opt, name, userdiff_driver, buf, size, 0);
 }
diff --git a/grep.h b/grep.h
index a652800..20224b5 100644
--- a/grep.h
+++ b/grep.h
@@ -121,14 +121,15 @@ struct grep_opt {
 	void *output_priv;
 };
 
+struct userdiff_driver;
+
 extern void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t);
 extern void append_grep_pattern(struct grep_opt *opt, const char *pat, const char *origin, int no, enum grep_pat_token t);
 extern void append_header_grep_pattern(struct grep_opt *, enum grep_header_field, const char *);
 extern void compile_grep_patterns(struct grep_opt *opt);
 extern void free_grep_patterns(struct grep_opt *opt);
-extern int grep_buffer(struct grep_opt *opt, const char *name, char *buf, unsigned long size);
+extern int grep_buffer(struct grep_opt *opt, const char *name, struct userdiff_driver *userdiff_driver, char *buf, unsigned long size);
 
 extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
-extern int grep_threads_ok(const struct grep_opt *opt);
 
 #endif
diff --git a/revision.c b/revision.c
index 8764dde..95cb8c2 100644
--- a/revision.c
+++ b/revision.c
@@ -2126,7 +2126,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 	if (!opt->grep_filter.pattern_list && !opt->grep_filter.header_list)
 		return 1;
 	return grep_buffer(&opt->grep_filter,
-			   NULL, /* we say nothing, not even filename */
+			   NULL, NULL, /* we say nothing, not even filename */
 			   commit->buffer, strlen(commit->buffer));
 }
 
-- 
1.7.7

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH] grep: enable multi-threading for -p and -W
  2011-11-26 12:15   ` [PATCH] grep: enable multi-threading for -p and -W René Scharfe
@ 2011-11-29  9:54     ` Thomas Rast
  2011-11-29 13:49       ` René Scharfe
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Rast @ 2011-11-29  9:54 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Junio C Hamano

René Scharfe wrote:
> Move attribute reading, which is not thread-safe, into add_work(), under
> grep_mutex.  That way we can stop turning off multi-threading if -p or -W
> is given, as the diff attribute for each file is gotten safely now.
> 
> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
> ---
> Goes on top of your patch.  Needs testing..

On a random old linux-2.6 clone at v2.6.37-rc2, I'm seeing (best of 5):

* none of the patches:
  git grep --cached INITRAMFS_ROOT_UID
    2.13user 0.14system 0:02.10elapsed
  git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
    2.23user 0.18system 0:02.21elapsed 

* my patch, but not yours:
  git grep --cached INITRAMFS_ROOT_UID
    2.21user 0.21system 0:02.27elapsed
  git grep --cached -W INITRAMFS_ROOT_UID
    3.01user 0.05system 0:03.07elapsed

* my patch + your patch:
  git grep --cached INITRAMFS_ROOT_UID
    2.22user 0.17system 0:02.22elapsed
  git grep --cached -W INITRAMFS_ROOT_UID
    4.45user 0.22system 0:02.67elapsed

So while it ends up being faster overall, it also uses way more CPU
and would presumably be *slower* on a single-processor system.
Apparently those attribute lookups really hurt.

So I had the following idea: if we load attributes only very lazily
(that is, when match_funcname() is first called), we can ask for them
much more rarely.  The revised timings:

* my patch + the new patch at the end:
  git grep --cached INITRAMFS_ROOT_UID
    2.15user 0.16system 0:02.15elapsed 107%CPU
  git grep --cached -W INITRAMFS_ROOT_UID
    2.20user 0.18system 0:02.24elapsed

However, locking on a specific lock relies on the fact that the
attributes are only read from the tree, but never from the object
store.  Perhaps it would be more futureproof to lock on
read_sha1_mutex instead.  Either way the lazy attributes lookup seems
a big win.

------ 8< ------ 8< ------
Subject: [PATCH] grep: fine-grained locking around attributes access

Lazily load the userdiff attributes in match_funcname().  Use a
separate mutex around this loading to protect the (not thread-safe)
attributes machinery.  This lets us re-enable threading with -p and
-W while reducing the overhead caused by looking up attributes.

diff --git a/builtin/grep.c b/builtin/grep.c
index 988ea1d..822b32f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt)
 
 	pthread_mutex_init(&grep_mutex, NULL);
 	pthread_mutex_init(&read_sha1_mutex, NULL);
+	pthread_mutex_init(&grep_attr_mutex, NULL);
 	pthread_cond_init(&cond_add, NULL);
 	pthread_cond_init(&cond_write, NULL);
 	pthread_cond_init(&cond_result, NULL);
@@ -303,6 +304,7 @@ static int wait_all(void)
 
 	pthread_mutex_destroy(&grep_mutex);
 	pthread_mutex_destroy(&read_sha1_mutex);
+	pthread_mutex_destroy(&grep_attr_mutex);
 	pthread_cond_destroy(&cond_add);
 	pthread_cond_destroy(&cond_write);
 	pthread_cond_destroy(&cond_result);
@@ -1002,9 +1004,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		opt.regflags |= REG_ICASE;
 
 #ifndef NO_PTHREADS
-	if (online_cpus() == 1 || !grep_threads_ok(&opt))
+	if (online_cpus() == 1)
 		use_threads = 0;
+#endif
+
+	opt.use_threads = use_threads;
 
+#ifndef NO_PTHREADS
 	if (use_threads) {
 		if (opt.pre_context || opt.post_context || opt.file_break ||
 		    opt.funcbody)
diff --git a/grep.c b/grep.c
index 7a070e9..e9c3df3 100644
--- a/grep.c
+++ b/grep.c
@@ -2,6 +2,7 @@
 #include "grep.h"
 #include "userdiff.h"
 #include "xdiff-interface.h"
+#include "thread-utils.h"
 
 void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
 {
@@ -806,10 +807,38 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
 	opt->output(opt, "\n", 1);
 }
 
-static int match_funcname(struct grep_opt *opt, char *bol, char *eol)
+#ifndef NO_PTHREADS
+pthread_mutex_t grep_attr_mutex;
+
+static inline void grep_attr_lock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_lock(&grep_attr_mutex);
+}
+
+static inline void grep_attr_unlock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_unlock(&grep_attr_mutex);
+}
+#endif
+
+static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol)
 {
 	xdemitconf_t *xecfg = opt->priv;
-	if (xecfg && xecfg->find_func) {
+	if (xecfg && !xecfg->find_func) {
+		grep_attr_lock(opt);
+		struct userdiff_driver *drv = userdiff_find_by_path(name);
+		grep_attr_unlock(opt);
+		if (drv && drv->funcname.pattern) {
+			const struct userdiff_funcname *pe = &drv->funcname;
+			xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);
+		} else {
+			xecfg = opt->priv = NULL;
+		}
+	}
+
+	if (xecfg) {
 		char buf[1];
 		return xecfg->find_func(bol, eol - bol, buf, 1,
 					xecfg->find_func_priv) >= 0;
@@ -835,7 +864,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name,
 		if (lno <= opt->last_shown)
 			break;
 
-		if (match_funcname(opt, bol, eol)) {
+		if (match_funcname(opt, name, bol, eol)) {
 			show_line(opt, bol, eol, name, lno, '=');
 			break;
 		}
@@ -848,7 +877,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 	unsigned cur = lno, from = 1, funcname_lno = 0;
 	int funcname_needed = !!opt->funcname;
 
-	if (opt->funcbody && !match_funcname(opt, bol, end))
+	if (opt->funcbody && !match_funcname(opt, name, bol, end))
 		funcname_needed = 2;
 
 	if (opt->pre_context < lno)
@@ -864,7 +893,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 		while (bol > buf && bol[-1] != '\n')
 			bol--;
 		cur--;
-		if (funcname_needed && match_funcname(opt, bol, eol)) {
+		if (funcname_needed && match_funcname(opt, name, bol, eol)) {
 			funcname_lno = cur;
 			funcname_needed = 0;
 		}
@@ -942,19 +971,6 @@ static int look_ahead(struct grep_opt *opt,
 	return 0;
 }
 
-int grep_threads_ok(const struct grep_opt *opt)
-{
-	/* If this condition is true, then we may use the attribute
-	 * machinery in grep_buffer_1. The attribute code is not
-	 * thread safe, so we disable the use of threads.
-	 */
-	if ((opt->funcname || opt->funcbody)
-	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
-		return 0;
-
-	return 1;
-}
-
 static void std_output(struct grep_opt *opt, const void *buf, size_t size)
 {
 	fwrite(buf, size, 1, stdout);
@@ -1011,12 +1027,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	if ((opt->funcname || opt->funcbody)
 	    && !opt->unmatch_name_only && !opt->status_only &&
 	    !opt->name_only && !binary_match_only && !collect_hits) {
-		struct userdiff_driver *drv = userdiff_find_by_path(name);
-		if (drv && drv->funcname.pattern) {
-			const struct userdiff_funcname *pe = &drv->funcname;
-			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
-			opt->priv = &xecfg;
-		}
+		opt->priv = &xecfg;
 	}
 	try_lookahead = should_lookahead(opt);
 
@@ -1093,7 +1104,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 				show_function = 1;
 			goto next_line;
 		}
-		if (show_function && match_funcname(opt, bol, eol))
+		if (show_function && match_funcname(opt, name, bol, eol))
 			show_function = 0;
 		if (show_function ||
 		    (last_hit && lno <= last_hit + opt->post_context)) {
diff --git a/grep.h b/grep.h
index a652800..15d227c 100644
--- a/grep.h
+++ b/grep.h
@@ -115,6 +115,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int use_threads;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -131,4 +132,10 @@ struct grep_opt {
 extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
 extern int grep_threads_ok(const struct grep_opt *opt);
 
+#ifndef NO_PTHREADS
+/* Mutex used around access to the attributes machinery if
+ * opt->use_threads.  Must be initialized/destroyed by callers! */
+extern pthread_mutex_t grep_attr_mutex;
+#endif
+
 #endif


-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH] grep: enable multi-threading for -p and -W
  2011-11-29  9:54     ` Thomas Rast
@ 2011-11-29 13:49       ` René Scharfe
  2011-11-29 14:07         ` Thomas Rast
  0 siblings, 1 reply; 50+ messages in thread
From: René Scharfe @ 2011-11-29 13:49 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano

Am 29.11.2011 10:54, schrieb Thomas Rast:
> René Scharfe wrote:
>> Move attribute reading, which is not thread-safe, into add_work(), under
>> grep_mutex.  That way we can stop turning off multi-threading if -p or -W
>> is given, as the diff attribute for each file is gotten safely now.
>>
>> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
>> ---
>> Goes on top of your patch.  Needs testing..
> 
> On a random old linux-2.6 clone at v2.6.37-rc2, I'm seeing (best of 5):
> 
> * none of the patches:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.13user 0.14system 0:02.10elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
>     2.23user 0.18system 0:02.21elapsed 

Broken is a tad too hard a word; IIUC -W just lacks support for function
patterns of different file types, which is unintended and surprising of
course.  For the default case it works correctly (and threaded).

> * my patch, but not yours:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.21user 0.21system 0:02.27elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID
>     3.01user 0.05system 0:03.07elapsed
> 
> * my patch + your patch:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.22user 0.17system 0:02.22elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID
>     4.45user 0.22system 0:02.67elapsed
> 
> So while it ends up being faster overall, it also uses way more CPU
> and would presumably be *slower* on a single-processor system.
> Apparently those attribute lookups really hurt.

Hmm, why are they *that* expensive?

callgrind tells me that userdiff_find_by_path() contributes only 0.18%
to the total cost with your first patch.  Timings in my virtual machine
are very volatile, but it seems that here the difference is in the
system time while user is basically the same for all combinations of
patches.

> So I had the following idea: if we load attributes only very lazily
> (that is, when match_funcname() is first called), we can ask for them
> much more rarely.  The revised timings:
> 
> * my patch + the new patch at the end:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.15user 0.16system 0:02.15elapsed 107%CPU
>   git grep --cached -W INITRAMFS_ROOT_UID
>     2.20user 0.18system 0:02.24elapsed

Nice.

I wonder what caused the slight speedup for the case without -W.  It's
probably just noise, though.

> However, locking on a specific lock relies on the fact that the
> attributes are only read from the tree, but never from the object
> store.  Perhaps it would be more futureproof to lock on
> read_sha1_mutex instead.  Either way the lazy attributes lookup seems
> a big win.

You could lock read_sha1_mutex after grep_attr_mutex.  With lazy loading
it should have a low impact most of the time.

> ------ 8< ------ 8< ------
> Subject: [PATCH] grep: fine-grained locking around attributes access
> 
> Lazily load the userdiff attributes in match_funcname().  Use a
> separate mutex around this loading to protect the (not thread-safe)
> attributes machinery.  This lets us re-enable threading with -p and
> -W while reducing the overhead caused by looking up attributes.
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 988ea1d..822b32f 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt)
>  
>  	pthread_mutex_init(&grep_mutex, NULL);
>  	pthread_mutex_init(&read_sha1_mutex, NULL);
> +	pthread_mutex_init(&grep_attr_mutex, NULL);
>  	pthread_cond_init(&cond_add, NULL);
>  	pthread_cond_init(&cond_write, NULL);
>  	pthread_cond_init(&cond_result, NULL);
> @@ -303,6 +304,7 @@ static int wait_all(void)
>  
>  	pthread_mutex_destroy(&grep_mutex);
>  	pthread_mutex_destroy(&read_sha1_mutex);
> +	pthread_mutex_destroy(&grep_attr_mutex);
>  	pthread_cond_destroy(&cond_add);
>  	pthread_cond_destroy(&cond_write);
>  	pthread_cond_destroy(&cond_result);
> @@ -1002,9 +1004,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		opt.regflags |= REG_ICASE;
>  
>  #ifndef NO_PTHREADS
> -	if (online_cpus() == 1 || !grep_threads_ok(&opt))
> +	if (online_cpus() == 1)
>  		use_threads = 0;
> +#endif
> +
> +	opt.use_threads = use_threads;
>  
> +#ifndef NO_PTHREADS
>  	if (use_threads) {
>  		if (opt.pre_context || opt.post_context || opt.file_break ||
>  		    opt.funcbody)
> diff --git a/grep.c b/grep.c
> index 7a070e9..e9c3df3 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -2,6 +2,7 @@
>  #include "grep.h"
>  #include "userdiff.h"
>  #include "xdiff-interface.h"
> +#include "thread-utils.h"
>  
>  void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
>  {
> @@ -806,10 +807,38 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
>  	opt->output(opt, "\n", 1);
>  }
>  
> -static int match_funcname(struct grep_opt *opt, char *bol, char *eol)
> +#ifndef NO_PTHREADS
> +pthread_mutex_t grep_attr_mutex;
> +
> +static inline void grep_attr_lock(struct grep_opt *opt)
> +{
> +	if (opt->use_threads)
> +		pthread_mutex_lock(&grep_attr_mutex);
> +}
> +
> +static inline void grep_attr_unlock(struct grep_opt *opt)
> +{
> +	if (opt->use_threads)
> +		pthread_mutex_unlock(&grep_attr_mutex);
> +}
> +#endif

We'd need stubs here for the case of NO_PTHREADS being defined.

> +
> +static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol)
>  {
>  	xdemitconf_t *xecfg = opt->priv;
> -	if (xecfg && xecfg->find_func) {
> +	if (xecfg && !xecfg->find_func) {
> +		grep_attr_lock(opt);
> +		struct userdiff_driver *drv = userdiff_find_by_path(name);
> +		grep_attr_unlock(opt);
> +		if (drv && drv->funcname.pattern) {
> +			const struct userdiff_funcname *pe = &drv->funcname;
> +			xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);
> +		} else {
> +			xecfg = opt->priv = NULL;
> +		}
> +	}

Perhaps leave the thread stuff in builtin/grep.c and export a function
from there that does the above, with locking and all?

> +
> +	if (xecfg) {
>  		char buf[1];
>  		return xecfg->find_func(bol, eol - bol, buf, 1,
>  					xecfg->find_func_priv) >= 0;
> @@ -835,7 +864,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name,
>  		if (lno <= opt->last_shown)
>  			break;
>  
> -		if (match_funcname(opt, bol, eol)) {
> +		if (match_funcname(opt, name, bol, eol)) {
>  			show_line(opt, bol, eol, name, lno, '=');
>  			break;
>  		}
> @@ -848,7 +877,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
>  	unsigned cur = lno, from = 1, funcname_lno = 0;
>  	int funcname_needed = !!opt->funcname;
>  
> -	if (opt->funcbody && !match_funcname(opt, bol, end))
> +	if (opt->funcbody && !match_funcname(opt, name, bol, end))
>  		funcname_needed = 2;
>  
>  	if (opt->pre_context < lno)
> @@ -864,7 +893,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
>  		while (bol > buf && bol[-1] != '\n')
>  			bol--;
>  		cur--;
> -		if (funcname_needed && match_funcname(opt, bol, eol)) {
> +		if (funcname_needed && match_funcname(opt, name, bol, eol)) {
>  			funcname_lno = cur;
>  			funcname_needed = 0;
>  		}
> @@ -942,19 +971,6 @@ static int look_ahead(struct grep_opt *opt,
>  	return 0;
>  }
>  
> -int grep_threads_ok(const struct grep_opt *opt)
> -{
> -	/* If this condition is true, then we may use the attribute
> -	 * machinery in grep_buffer_1. The attribute code is not
> -	 * thread safe, so we disable the use of threads.
> -	 */
> -	if ((opt->funcname || opt->funcbody)
> -	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
> -		return 0;
> -
> -	return 1;
> -}
> -
>  static void std_output(struct grep_opt *opt, const void *buf, size_t size)
>  {
>  	fwrite(buf, size, 1, stdout);
> @@ -1011,12 +1027,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>  	if ((opt->funcname || opt->funcbody)
>  	    && !opt->unmatch_name_only && !opt->status_only &&
>  	    !opt->name_only && !binary_match_only && !collect_hits) {
> -		struct userdiff_driver *drv = userdiff_find_by_path(name);
> -		if (drv && drv->funcname.pattern) {
> -			const struct userdiff_funcname *pe = &drv->funcname;
> -			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
> -			opt->priv = &xecfg;
> -		}
> +		opt->priv = &xecfg;
>  	}

opt->priv can be set unconditionally here now, since we only access it
from within match_funcname() and that function is not called if any of
the short-circuit options is set.

>  	try_lookahead = should_lookahead(opt);
>  
> @@ -1093,7 +1104,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>  				show_function = 1;
>  			goto next_line;
>  		}
> -		if (show_function && match_funcname(opt, bol, eol))
> +		if (show_function && match_funcname(opt, name, bol, eol))
>  			show_function = 0;
>  		if (show_function ||
>  		    (last_hit && lno <= last_hit + opt->post_context)) {
> diff --git a/grep.h b/grep.h
> index a652800..15d227c 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -115,6 +115,7 @@ struct grep_opt {
>  	int show_hunk_mark;
>  	int file_break;
>  	int heading;
> +	int use_threads;
>  	void *priv;
>  
>  	void (*output)(struct grep_opt *opt, const void *data, size_t size);
> @@ -131,4 +132,10 @@ struct grep_opt {
>  extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
>  extern int grep_threads_ok(const struct grep_opt *opt);
>  
> +#ifndef NO_PTHREADS
> +/* Mutex used around access to the attributes machinery if
> + * opt->use_threads.  Must be initialized/destroyed by callers! */
> +extern pthread_mutex_t grep_attr_mutex;
> +#endif
> +
>  #endif
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] grep: enable multi-threading for -p and -W
  2011-11-29 13:49       ` René Scharfe
@ 2011-11-29 14:07         ` Thomas Rast
  2011-12-02 13:07           ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Rast @ 2011-11-29 14:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Junio C Hamano

René Scharfe wrote:
> > * none of the patches:
> >   git grep --cached INITRAMFS_ROOT_UID
> >     2.13user 0.14system 0:02.10elapsed
> >   git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
> >     2.23user 0.18system 0:02.21elapsed 
> 
> Broken is a tad too hard a word

Sorry, I just wanted to mark it as: this is unattainable since we're
now doing extra work.

> > * my patch, but not yours:
> >   git grep --cached -W INITRAMFS_ROOT_UID
> >     3.01user 0.05system 0:03.07elapsed
> > 
> > * my patch + your patch:
> >   git grep --cached -W INITRAMFS_ROOT_UID
> >     4.45user 0.22system 0:02.67elapsed
> > 
> > So while it ends up being faster overall, it also uses way more CPU
> > and would presumably be *slower* on a single-processor system.
> > Apparently those attribute lookups really hurt.
> 
> Hmm, why are they *that* expensive?
> 
> callgrind tells me that userdiff_find_by_path() contributes only 0.18%
> to the total cost with your first patch.  Timings in my virtual machine
> are very volatile, but it seems that here the difference is in the
> system time while user is basically the same for all combinations of
> patches.

Not sure, perhaps callgrind does not model syscalls as expensive
enough.  I also suspect (from my very cursory look at the attributes
machinery) that loading attributes for all paths *in random order*
(i.e., threaded) causes a lot of discards of the existing attributes
data.  (The order is still random with my new patch, but we only load
them for files that have matches.)

> I wonder what caused the slight speedup for the case without -W.  It's
> probably just noise, though.

Yeah, it's very noisy, but I was too lazy for best-of-50 or something ;-)

[...]
> > +#ifndef NO_PTHREADS
> > +static inline void grep_attr_lock(struct grep_opt *opt)
[...]
> We'd need stubs here for the case of NO_PTHREADS being defined.

Right, thanks.  Sorry for not testing with NO_PTHREADS to begin with.

> Perhaps leave the thread stuff in builtin/grep.c and export a function
> from there that does [the userdiff lookup], with locking and all?

That seems like a layering violation to me.  Isn't builtin/grep.c
supposed to call out to grep.c, but not the other way around?

Maybe it would be less messy if we had a global (across all of git)
flag that says whether threads are on.  Then subsystems that are used
from threaded code, but cannot handle it, can learn to lock themselves
around their work.  But it would be pretty much the opposite of what
git-grep now does.

Thanks for the review.  I'll reroll as a proper patch later.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 0/3] grep multithreading and scaling
  2011-11-29 14:07         ` Thomas Rast
@ 2011-12-02 13:07           ` Thomas Rast
  2011-12-02 13:07             ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast
                               ` (4 more replies)
  0 siblings, 5 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano

[Eric, I measured some numbers that may be interesting to the
discussion about b2924dc.  See below.]

This round wraps up the original patch I posted, plus the draft patch
I posted inline the other day with René's review taken into account.
I also added a patch that rips out threading in the non-worktree case;
read on for the reasoning.

René Scharfe wrote:
> Hmm, why are [gitattributes lookups] that expensive?
> 
> callgrind tells me that userdiff_find_by_path() contributes only 0.18%
> to the total cost with your first patch.  Timings in my virtual machine
> are very volatile, but it seems that here the difference is in the
> system time while user is basically the same for all combinations of
> patches.

Well, turns out I was measuring something completely stupid.  I had

  git grep --cached -W INITRAMFS_ROOT_UID

where I put the --cached originally because that makes it independent
of the worktree (which in the very first measurements I still had
wiped, as I tend to do for this repo; I checked it out again after
that).  This in fact gives me (~/g/git-grep --cached
INITRAMFS_ROOT_UID, leaving aside -W; best of 10):

  THREADS=8:   2.88user 0.21system 0:02.94elapsed
  THREADS=4:   2.89user 0.29system 0:02.99elapsed
  THREADS=2:   2.83user 0.36system 0:02.87elapsed
  NO_PTHREADS: 2.16user 0.08system 0:02.25elapsed

Uhuh.  Doesn't scale so well after all.  But removing the --cached, as
most people probably would:

  THREADS=8:   0.19user 0.32system 0:00.16elapsed
  THREADS=4:   0.16user 0.34system 0:00.17elapsed
  THREADS=2:   0.18user 0.32system 0:00.26elapsed
  NO_PTHREADS: 0.12user 0.17system 0:00.31elapsed

So I conclude that during any grep that cannot use the worktree,
having any threads hurts.

In addition, during a grep that *can* use the worktree, THREADS=8
still helps somewhat on my dual-core i7, though it goes downhill from
there (12 is again as fast as 4; I verified these details using
best-of-50 timings, and it is reproducible.)

I have also run timings on a 2*6-core workstation running OS X, where
performance is best at 5 cores:

  2 threads:  0.96 real   0.41 user   1.27 sys
  3 threads:  0.68 real   0.41 user   1.30 sys
  4 threads:  0.54 real   0.43 user   1.63 sys
  5 threads:  0.50 real   0.41 user   1.51 sys
  6 threads:  0.54 real   0.43 user   1.63 sys
  7 threads:  0.86 real   0.49 user   1.93 sys
  8 threads:  0.98 real   0.51 user   2.07 sys

I kid you not.  That's best-of-50 and rather stable.  It's on the same
tree as the Linux machine too, except for the problem that the OS X FS
is set to case-insensitive and thus cannot represent the tree exactly.
So from git's POV, there are unstaged changes.

Sadly I do not have access to a Linux box having more than 2 physical
cores.  If you have one, please run some tests :-)

So based on my measurements, I would suggest that unless we have
evidence of it scaling beyond 8 cores on some machine, b2924dc (grep:
detect number of CPUs for thread spawning) be dropped.  For now I'm
ignoring the problem that on OS X it doesn't even scale to 8; I'd
rather check how it fares on Linux first.

I added a third patch on top that disables threading in any case that
does not hit the worktree.  I wonder if I missed something or if it
really is that simple.  The neat part is that it's also a reduction in
code required, and at the same time avoids any issues 2/3 might have
with a future attributes-from-trees implementation.

With this I get

  worktree, 8 threads: 0.15user 0.37system 0:00.17elapsed
  --cached, 8 threads: 2.18user 0.07system 0:02.27elapsed

Of course, we could probably gain a huge boost if the read_sha1
machinery could be made threaded, so that it can unpack several
objects at a time.  In addition, I can well imagine that there are
combinations of delta density, object size, and luck where it pays off
to grep in parallel.  Do we care?

Now I really should do something else than fretting over the
sub-second performance of git-grep...

Thomas Rast (3):
  grep: load funcname patterns for -W
  grep: enable threading with -p and -W using lazy attribute lookup
  grep: disable threading in all but worktree case

 builtin/grep.c  |  153 ++++++++++++++++--------------------------------------
 grep.c          |   73 ++++++++++++++++----------
 grep.h          |    7 +++
 t/t7810-grep.sh |   14 +++++
 4 files changed, 112 insertions(+), 135 deletions(-)

-- 
1.7.8.rc4.388.ge53ab

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 1/3] grep: load funcname patterns for -W
  2011-12-02 13:07           ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast
@ 2011-12-02 13:07             ` Thomas Rast
  2011-12-02 13:07             ` [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano

git-grep avoids loading the funcname patterns unless they are needed.
ba8ea74 (grep: add option to show whole function as context,
2011-08-01) forgot to extend this test also to the new funcbody
feature.  Do so.

The catch is that we also have to disable threading when using
userdiff, as explained in grep_threads_ok().  So we must be careful to
introduce the same test there.
---
 grep.c          |    7 ++++---
 t/t7810-grep.sh |   14 ++++++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/grep.c b/grep.c
index b29d09c..7a070e9 100644
--- a/grep.c
+++ b/grep.c
@@ -948,8 +948,8 @@ int grep_threads_ok(const struct grep_opt *opt)
 	 * machinery in grep_buffer_1. The attribute code is not
 	 * thread safe, so we disable the use of threads.
 	 */
-	if (opt->funcname && !opt->unmatch_name_only && !opt->status_only &&
-	    !opt->name_only)
+	if ((opt->funcname || opt->funcbody)
+	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
 		return 0;
 
 	return 1;
@@ -1008,7 +1008,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	}
 
 	memset(&xecfg, 0, sizeof(xecfg));
-	if (opt->funcname && !opt->unmatch_name_only && !opt->status_only &&
+	if ((opt->funcname || opt->funcbody)
+	    && !opt->unmatch_name_only && !opt->status_only &&
 	    !opt->name_only && !binary_match_only && !collect_hits) {
 		struct userdiff_driver *drv = userdiff_find_by_path(name);
 		if (drv && drv->funcname.pattern) {
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 81263b7..7ba5b16 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -523,6 +523,20 @@ test_expect_success 'grep -W' '
 	test_cmp expected actual
 '
 
+cat >expected <<EOF
+hello.c=	printf("Hello world.\n");
+hello.c:	return 0;
+hello.c-	/* char ?? */
+EOF
+
+test_expect_success 'grep -W with userdiff' '
+	test_when_finished "rm -f .gitattributes" &&
+	git config diff.custom.xfuncname "(printf.*|})$" &&
+	echo "hello.c diff=custom" >.gitattributes &&
+	git grep -W return >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
 	mkdir -p s &&
 	(
-- 
1.7.8.rc4.388.ge53ab

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup
  2011-12-02 13:07           ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast
  2011-12-02 13:07             ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast
@ 2011-12-02 13:07             ` Thomas Rast
  2011-12-02 13:07             ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano

Lazily load the userdiff attributes in match_funcname().  Use a
separate mutex around this loading to protect the (not thread-safe)
attributes machinery.  This lets us re-enable threading with -p and
-W while reducing the overhead caused by looking up attributes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 builtin/grep.c |   10 +++++++-
 grep.c         |   74 ++++++++++++++++++++++++++++++++++----------------------
 grep.h         |    7 +++++
 3 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 988ea1d..65b1ffe 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt)
 
 	pthread_mutex_init(&grep_mutex, NULL);
 	pthread_mutex_init(&read_sha1_mutex, NULL);
+	pthread_mutex_init(&grep_attr_mutex, NULL);
 	pthread_cond_init(&cond_add, NULL);
 	pthread_cond_init(&cond_write, NULL);
 	pthread_cond_init(&cond_result, NULL);
@@ -303,6 +304,7 @@ static int wait_all(void)
 
 	pthread_mutex_destroy(&grep_mutex);
 	pthread_mutex_destroy(&read_sha1_mutex);
+	pthread_mutex_destroy(&grep_attr_mutex);
 	pthread_cond_destroy(&cond_add);
 	pthread_cond_destroy(&cond_write);
 	pthread_cond_destroy(&cond_result);
@@ -1002,9 +1004,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		opt.regflags |= REG_ICASE;
 
 #ifndef NO_PTHREADS
-	if (online_cpus() == 1 || !grep_threads_ok(&opt))
+	if (online_cpus() == 1)
 		use_threads = 0;
+#else
+	use_threads = 0;
+#endif
 
+	opt.use_threads = use_threads;
+
+#ifndef NO_PTHREADS
 	if (use_threads) {
 		if (opt.pre_context || opt.post_context || opt.file_break ||
 		    opt.funcbody)
diff --git a/grep.c b/grep.c
index 7a070e9..4dd7da2 100644
--- a/grep.c
+++ b/grep.c
@@ -2,6 +2,7 @@
 #include "grep.h"
 #include "userdiff.h"
 #include "xdiff-interface.h"
+#include "thread-utils.h"
 
 void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
 {
@@ -806,10 +807,46 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
 	opt->output(opt, "\n", 1);
 }
 
-static int match_funcname(struct grep_opt *opt, char *bol, char *eol)
+#ifndef NO_PTHREADS
+/*
+ * This lock protects access to the gitattributes machinery, which is
+ * not thread-safe.
+ */
+pthread_mutex_t grep_attr_mutex;
+
+static inline void grep_attr_lock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_lock(&grep_attr_mutex);
+}
+
+static inline void grep_attr_unlock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_unlock(&grep_attr_mutex);
+}
+#else
+#define grep_attr_lock(opt)
+#define grep_attr_unlock(opt)
+#endif
+
+static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol)
 {
 	xdemitconf_t *xecfg = opt->priv;
-	if (xecfg && xecfg->find_func) {
+	if (xecfg && !xecfg->find_func) {
+		struct userdiff_driver *drv;
+		grep_attr_lock(opt);
+		drv = userdiff_find_by_path(name);
+		grep_attr_unlock(opt);
+		if (drv && drv->funcname.pattern) {
+			const struct userdiff_funcname *pe = &drv->funcname;
+			xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);
+		} else {
+			xecfg = opt->priv = NULL;
+		}
+	}
+
+	if (xecfg) {
 		char buf[1];
 		return xecfg->find_func(bol, eol - bol, buf, 1,
 					xecfg->find_func_priv) >= 0;
@@ -835,7 +872,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name,
 		if (lno <= opt->last_shown)
 			break;
 
-		if (match_funcname(opt, bol, eol)) {
+		if (match_funcname(opt, name, bol, eol)) {
 			show_line(opt, bol, eol, name, lno, '=');
 			break;
 		}
@@ -848,7 +885,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 	unsigned cur = lno, from = 1, funcname_lno = 0;
 	int funcname_needed = !!opt->funcname;
 
-	if (opt->funcbody && !match_funcname(opt, bol, end))
+	if (opt->funcbody && !match_funcname(opt, name, bol, end))
 		funcname_needed = 2;
 
 	if (opt->pre_context < lno)
@@ -864,7 +901,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 		while (bol > buf && bol[-1] != '\n')
 			bol--;
 		cur--;
-		if (funcname_needed && match_funcname(opt, bol, eol)) {
+		if (funcname_needed && match_funcname(opt, name, bol, eol)) {
 			funcname_lno = cur;
 			funcname_needed = 0;
 		}
@@ -942,19 +979,6 @@ static int look_ahead(struct grep_opt *opt,
 	return 0;
 }
 
-int grep_threads_ok(const struct grep_opt *opt)
-{
-	/* If this condition is true, then we may use the attribute
-	 * machinery in grep_buffer_1. The attribute code is not
-	 * thread safe, so we disable the use of threads.
-	 */
-	if ((opt->funcname || opt->funcbody)
-	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
-		return 0;
-
-	return 1;
-}
-
 static void std_output(struct grep_opt *opt, const void *buf, size_t size)
 {
 	fwrite(buf, size, 1, stdout);
@@ -1008,16 +1032,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	}
 
 	memset(&xecfg, 0, sizeof(xecfg));
-	if ((opt->funcname || opt->funcbody)
-	    && !opt->unmatch_name_only && !opt->status_only &&
-	    !opt->name_only && !binary_match_only && !collect_hits) {
-		struct userdiff_driver *drv = userdiff_find_by_path(name);
-		if (drv && drv->funcname.pattern) {
-			const struct userdiff_funcname *pe = &drv->funcname;
-			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
-			opt->priv = &xecfg;
-		}
-	}
+	opt->priv = &xecfg;
+
 	try_lookahead = should_lookahead(opt);
 
 	while (left) {
@@ -1093,7 +1109,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 				show_function = 1;
 			goto next_line;
 		}
-		if (show_function && match_funcname(opt, bol, eol))
+		if (show_function && match_funcname(opt, name, bol, eol))
 			show_function = 0;
 		if (show_function ||
 		    (last_hit && lno <= last_hit + opt->post_context)) {
diff --git a/grep.h b/grep.h
index a652800..15d227c 100644
--- a/grep.h
+++ b/grep.h
@@ -115,6 +115,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int use_threads;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -131,4 +132,10 @@ struct grep_opt {
 extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
 extern int grep_threads_ok(const struct grep_opt *opt);
 
+#ifndef NO_PTHREADS
+/* Mutex used around access to the attributes machinery if
+ * opt->use_threads.  Must be initialized/destroyed by callers! */
+extern pthread_mutex_t grep_attr_mutex;
+#endif
+
 #endif
-- 
1.7.8.rc4.388.ge53ab

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-02 13:07           ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast
  2011-12-02 13:07             ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast
  2011-12-02 13:07             ` [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast
@ 2011-12-02 13:07             ` Thomas Rast
  2011-12-02 16:15               ` René Scharfe
  2011-12-23 22:37               ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason
  2011-12-02 17:34             ` [PATCH v2 0/3] grep multithreading and scaling Jeff King
  2011-12-02 20:02             ` Eric Herman
  4 siblings, 2 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano

Measuring grep performance showed that in all but the worktree case
(as opposed to --cached, <committish> or <treeish>), threading
actually slows things down.  For example, on my dual-core
hyperthreaded i7 in a linux-2.6.git at v2.6.37-rc2, I got:

Threads       worktree case                 | --cached case
--------------------------------------------------------------------------
8 (default) | 2.17user 0.15sys 0:02.20real  | 0.11user 0.26sys 0:00.11real
4           | 2.06user 0.17sys 0:02.08real  | 0.11user 0.26sys 0:00.12real
2           | 2.02user 0.25sys 0:02.08real  | 0.15user 0.37sys 0:00.28real
NO_PTHREADS | 1.57user 0.05sys 0:01.64real  | 0.09user 0.12sys 0:00.22real

I conjecture that this is caused by contention on read_sha1_mutex.

So disable threading entirely when not scanning the worktree, to get
the NO_PTHREADS performance in that case.  This obsoletes all code
related to grep_sha1_async.  The thread startup must be delayed until
after all arguments have been parsed, but this does not have a
measurable effect.
---
 builtin/grep.c |  157 ++++++++++++++++----------------------------------------
 1 files changed, 44 insertions(+), 113 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 65b1ffe..edf6a31 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -34,21 +34,13 @@
 		       const char *name);
 static void *load_file(const char *filename, size_t *sz);
 
-enum work_type {WORK_SHA1, WORK_FILE};
-
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
  * consumers pick work items from the same array.
  */
 struct work_item {
-	enum work_type type;
 	char *name;
-
-	/* if type == WORK_SHA1, then 'identifier' is a SHA1,
-	 * otherwise type == WORK_FILE, and 'identifier' is a NUL
-	 * terminated filename.
-	 */
-	void *identifier;
+	char *filename;
 	char done;
 	struct strbuf out;
 };
@@ -86,21 +78,6 @@ static inline void grep_unlock(void)
 		pthread_mutex_unlock(&grep_mutex);
 }
 
-/* Used to serialize calls to read_sha1_file. */
-static pthread_mutex_t read_sha1_mutex;
-
-static inline void read_sha1_lock(void)
-{
-	if (use_threads)
-		pthread_mutex_lock(&read_sha1_mutex);
-}
-
-static inline void read_sha1_unlock(void)
-{
-	if (use_threads)
-		pthread_mutex_unlock(&read_sha1_mutex);
-}
-
 /* Signalled when a new work_item is added to todo. */
 static pthread_cond_t cond_add;
 
@@ -114,7 +91,7 @@ static inline void read_sha1_unlock(void)
 
 static int skip_first_line;
 
-static void add_work(enum work_type type, char *name, void *id)
+static void add_work(char *name, char *filename)
 {
 	grep_lock();
 
@@ -122,9 +99,8 @@ static void add_work(enum work_type type, char *name, void *id)
 		pthread_cond_wait(&cond_write, &grep_mutex);
 	}
 
-	todo[todo_end].type = type;
 	todo[todo_end].name = name;
-	todo[todo_end].identifier = id;
+	todo[todo_end].filename = filename;
 	todo[todo_end].done = 0;
 	strbuf_reset(&todo[todo_end].out);
 	todo_end = (todo_end + 1) % ARRAY_SIZE(todo);
@@ -152,19 +128,10 @@ static void add_work(enum work_type type, char *name, void *id)
 	return ret;
 }
 
-static void grep_sha1_async(struct grep_opt *opt, char *name,
-			    const unsigned char *sha1)
-{
-	unsigned char *s;
-	s = xmalloc(20);
-	memcpy(s, sha1, 20);
-	add_work(WORK_SHA1, name, s);
-}
-
 static void grep_file_async(struct grep_opt *opt, char *name,
 			    const char *filename)
 {
-	add_work(WORK_FILE, name, xstrdup(filename));
+	add_work(name, xstrdup(filename));
 }
 
 static void work_done(struct work_item *w)
@@ -194,7 +161,7 @@ static void work_done(struct work_item *w)
 			write_or_die(1, p, len);
 		}
 		free(w->name);
-		free(w->identifier);
+		free(w->filename);
 	}
 
 	if (old_done != todo_done)
@@ -213,29 +180,18 @@ static void work_done(struct work_item *w)
 
 	while (1) {
 		struct work_item *w = get_work();
+		size_t sz;
+		void* data;
+
 		if (!w)
 			break;
 
 		opt->output_priv = w;
-		if (w->type == WORK_SHA1) {
-			unsigned long sz;
-			void* data = load_sha1(w->identifier, &sz, w->name);
-
-			if (data) {
-				hit |= grep_buffer(opt, w->name, data, sz);
-				free(data);
-			}
-		} else if (w->type == WORK_FILE) {
-			size_t sz;
-			void* data = load_file(w->identifier, &sz);
-			if (data) {
-				hit |= grep_buffer(opt, w->name, data, sz);
-				free(data);
-			}
-		} else {
-			assert(0);
+		data = load_file(w->filename, &sz);
+		if (data) {
+			hit |= grep_buffer(opt, w->name, data, sz);
+			free(data);
 		}
-
 		work_done(w);
 	}
 	free_grep_patterns(arg);
@@ -255,7 +211,6 @@ static void start_threads(struct grep_opt *opt)
 	int i;
 
 	pthread_mutex_init(&grep_mutex, NULL);
-	pthread_mutex_init(&read_sha1_mutex, NULL);
 	pthread_mutex_init(&grep_attr_mutex, NULL);
 	pthread_cond_init(&cond_add, NULL);
 	pthread_cond_init(&cond_write, NULL);
@@ -303,7 +258,6 @@ static int wait_all(void)
 	}
 
 	pthread_mutex_destroy(&grep_mutex);
-	pthread_mutex_destroy(&read_sha1_mutex);
 	pthread_mutex_destroy(&grep_attr_mutex);
 	pthread_cond_destroy(&cond_add);
 	pthread_cond_destroy(&cond_write);
@@ -312,9 +266,6 @@ static int wait_all(void)
 	return hit;
 }
 #else /* !NO_PTHREADS */
-#define read_sha1_lock()
-#define read_sha1_unlock()
-
 static int wait_all(void)
 {
 	return 0;
@@ -371,21 +322,11 @@ static int grep_config(const char *var, const char *value, void *cb)
 	return 0;
 }
 
-static void *lock_and_read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
-{
-	void *data;
-
-	read_sha1_lock();
-	data = read_sha1_file(sha1, type, size);
-	read_sha1_unlock();
-	return data;
-}
-
 static void *load_sha1(const unsigned char *sha1, unsigned long *size,
 		       const char *name)
 {
 	enum object_type type;
-	void *data = lock_and_read_sha1_file(sha1, &type, size);
+	void *data = read_sha1_file(sha1, &type, size);
 
 	if (!data)
 		error(_("'%s': unable to read %s"), name, sha1_to_hex(sha1));
@@ -398,6 +339,9 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 {
 	struct strbuf pathbuf = STRBUF_INIT;
 	char *name;
+	int hit;
+	unsigned long sz;
+	void *data;
 
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, -1, &pathbuf,
@@ -409,25 +353,15 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 
 	name = strbuf_detach(&pathbuf, NULL);
 
-#ifndef NO_PTHREADS
-	if (use_threads) {
-		grep_sha1_async(opt, name, sha1);
-		return 0;
-	} else
-#endif
-	{
-		int hit;
-		unsigned long sz;
-		void *data = load_sha1(sha1, &sz, name);
-		if (!data)
-			hit = 0;
-		else
-			hit = grep_buffer(opt, name, data, sz);
+	data = load_sha1(sha1, &sz, name);
+	if (!data)
+		hit = 0;
+	else
+		hit = grep_buffer(opt, name, data, sz);
 
-		free(data);
-		free(name);
-		return hit;
-	}
+	free(data);
+	free(name);
+	return hit;
 }
 
 static void *load_file(const char *filename, size_t *sz)
@@ -586,7 +520,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			void *data;
 			unsigned long size;
 
-			data = lock_and_read_sha1_file(entry.sha1, &type, &size);
+			data = read_sha1_file(entry.sha1, &type, &size);
 			if (!data)
 				die(_("unable to read tree (%s)"),
 				    sha1_to_hex(entry.sha1));
@@ -616,10 +550,8 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		struct strbuf base;
 		int hit, len;
 
-		read_sha1_lock();
 		data = read_object_with_reference(obj->sha1, tree_type,
 						  &size, NULL);
-		read_sha1_unlock();
 
 		if (!data)
 			die(_("unable to read tree (%s)"), sha1_to_hex(obj->sha1));
@@ -1003,26 +935,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
 
-#ifndef NO_PTHREADS
-	if (online_cpus() == 1)
-		use_threads = 0;
-#else
-	use_threads = 0;
-#endif
-
-	opt.use_threads = use_threads;
-
-#ifndef NO_PTHREADS
-	if (use_threads) {
-		if (opt.pre_context || opt.post_context || opt.file_break ||
-		    opt.funcbody)
-			skip_first_line = 1;
-		start_threads(&opt);
-	}
-#else
-	use_threads = 0;
-#endif
-
 	compile_grep_patterns(&opt);
 
 	/* Check revs and then paths */
@@ -1044,6 +956,25 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		break;
 	}
 
+#ifndef NO_PTHREADS
+	if (online_cpus() == 1 || cached || list.nr)
+		use_threads = 0;
+#else
+	use_threads = 0;
+#endif
+
+	opt.use_threads = use_threads;
+
+#ifndef NO_PTHREADS
+	if (use_threads) {
+		opt.use_threads = use_threads;
+		if (opt.pre_context || opt.post_context || opt.file_break ||
+		    opt.funcbody)
+			skip_first_line = 1;
+		start_threads(&opt);
+	}
+#endif
+
 	/* The rest are paths */
 	if (!seen_dashdash) {
 		int j;
-- 
1.7.8.rc4.388.ge53ab

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-02 13:07             ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast
@ 2011-12-02 16:15               ` René Scharfe
  2011-12-05  9:02                 ` Thomas Rast
                                   ` (2 more replies)
  2011-12-23 22:37               ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason
  1 sibling, 3 replies; 50+ messages in thread
From: René Scharfe @ 2011-12-02 16:15 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Eric Herman, git, Junio C Hamano

Am 02.12.2011 14:07, schrieb Thomas Rast:
> Measuring grep performance showed that in all but the worktree case
> (as opposed to --cached,<committish>  or<treeish>), threading
> actually slows things down.  For example, on my dual-core
> hyperthreaded i7 in a linux-2.6.git at v2.6.37-rc2, I got:
>
> Threads       worktree case                 | --cached case
> --------------------------------------------------------------------------
> 8 (default) | 2.17user 0.15sys 0:02.20real  | 0.11user 0.26sys 0:00.11real
> 4           | 2.06user 0.17sys 0:02.08real  | 0.11user 0.26sys 0:00.12real
> 2           | 2.02user 0.25sys 0:02.08real  | 0.15user 0.37sys 0:00.28real
> NO_PTHREADS | 1.57user 0.05sys 0:01.64real  | 0.09user 0.12sys 0:00.22real

Are the columns mixed up?

> I conjecture that this is caused by contention on read_sha1_mutex.

Yeah, and I wonder why we need to have this lock in the first place. In 
theory, multiple readers shouldn't have to affect each other at all, 
right?  The lock could be pushed down into read_sha1_file(), or a 
thread-safe variant of the function added.

In pratice, however, the code in sha1_file.c etc. scares me. ;-)

> So disable threading entirely when not scanning the worktree, to get
> the NO_PTHREADS performance in that case.  This obsoletes all code
> related to grep_sha1_async.  The thread startup must be delayed until
> after all arguments have been parsed, but this does not have a
> measurable effect.

This is a bit radical.  I think the underlying issue that 
read_sha1_file() is not thread-safe can be solved eventually and then 
we'd need to readd that code.

How about adding a parameter to control the number of threads 
(--threads?) instead that defaults to eight (or five) for the worktree 
and one for the rest?  That would also make benchmarking easier.

René

PS: Patches one and three missed a signoff.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-02 16:15               ` René Scharfe
@ 2011-12-05  9:02                 ` Thomas Rast
  2011-12-06 22:48                 ` René Scharfe
  2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
  2 siblings, 0 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-05  9:02 UTC (permalink / raw)
  To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano

René Scharfe wrote:
> Am 02.12.2011 14:07, schrieb Thomas Rast:
> > Measuring grep performance showed that in all but the worktree case
> > (as opposed to --cached,<committish>  or<treeish>), threading
> > actually slows things down.  For example, on my dual-core
> > hyperthreaded i7 in a linux-2.6.git at v2.6.37-rc2, I got:
> >
> > Threads       worktree case                 | --cached case
> > --------------------------------------------------------------------------
> > 8 (default) | 2.17user 0.15sys 0:02.20real  | 0.11user 0.26sys 0:00.11real
> > 4           | 2.06user 0.17sys 0:02.08real  | 0.11user 0.26sys 0:00.12real
> > 2           | 2.02user 0.25sys 0:02.08real  | 0.15user 0.37sys 0:00.28real
> > NO_PTHREADS | 1.57user 0.05sys 0:01.64real  | 0.09user 0.12sys 0:00.22real
> 
> Are the columns mixed up?

Indeed, sorry.

In case you were wondering why this table is different from the
numbers given in the cover letter: I noticed at some point that I had
an incomplete checkout (apparently 'git checkout -- .' is really not
the same as 'git reset --hard'... sigh).  Then I saw that while the
numbers were different, the conclusion was not, so I forgot to update
it.

> This is a bit radical.  I think the underlying issue that 
> read_sha1_file() is not thread-safe can be solved eventually and then 
> we'd need to readd that code.

I'm also scared of sha1_file.c, especially when it gets down to
packfiles.  But perhaps it wouldn't be *too* hard to do it in parallel
iff the object can be read from the loose object store.

> PS: Patches one and three missed a signoff.

Oops, thanks, turns out I had a misconfigured alias ...

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-02 16:15               ` René Scharfe
  2011-12-05  9:02                 ` Thomas Rast
@ 2011-12-06 22:48                 ` René Scharfe
  2011-12-06 23:01                   ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe
                                     ` (2 more replies)
  2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
  2 siblings, 3 replies; 50+ messages in thread
From: René Scharfe @ 2011-12-06 22:48 UTC (permalink / raw)
  To: git; +Cc: Eric Herman, git, Junio C Hamano

Am 02.12.2011 17:15, schrieb René Scharfe:
> How about adding a parameter to control the number of threads 
> (--threads?) instead that defaults to eight (or five) for the worktree 
> and one for the rest? That would also make benchmarking easier.

Like this:

-- >8 --
Subject: grep: add parameter --threads

Allow the number of threads to be specified by the user.  This makes
benchmarking the performance impact of different numbers of threads
much easier.

Move the code for thread handling after argument parsing.  This allows
to change the default number of threads based on the kind of search
(worktree etc.) later on.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Applies on top of your second patch.

 Documentation/git-grep.txt |    4 ++
 builtin/grep.c             |   75 +++++++++++++++++++++++--------------------
 2 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 15d6711..47ac188 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -227,6 +227,10 @@ OPTIONS
 	Do not output matched lines; instead, exit with status 0 when
 	there is a match and with non-zero status when there isn't.
 
+--threads <n>::
+	Run <n> search threads in parallel.  Default is 8.  This option
+	is ignored if git was built without support for POSIX threads.
+
 <tree>...::
 	Instead of searching tracked files in the working tree, search
 	blobs in the given trees.
diff --git a/builtin/grep.c b/builtin/grep.c
index 65b1ffe..0bda900 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,10 @@ static char const * const grep_usage[] = {
 	NULL
 };
 
-static int use_threads = 1;
-
 #ifndef NO_PTHREADS
 #define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
+static int nr_threads = -1;
 
 static void *load_sha1(const unsigned char *sha1, unsigned long *size,
 		       const char *name);
@@ -76,13 +75,13 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-	if (use_threads)
+	if (nr_threads > 0)
 		pthread_mutex_lock(&grep_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-	if (use_threads)
+	if (nr_threads > 0)
 		pthread_mutex_unlock(&grep_mutex);
 }
 
@@ -91,13 +90,13 @@ static pthread_mutex_t read_sha1_mutex;
 
 static inline void read_sha1_lock(void)
 {
-	if (use_threads)
+	if (nr_threads > 0)
 		pthread_mutex_lock(&read_sha1_mutex);
 }
 
 static inline void read_sha1_unlock(void)
 {
-	if (use_threads)
+	if (nr_threads > 0)
 		pthread_mutex_unlock(&read_sha1_mutex);
 }
 
@@ -254,6 +253,8 @@ static void start_threads(struct grep_opt *opt)
 {
 	int i;
 
+	threads = xcalloc(nr_threads, sizeof(pthread_t));
+
 	pthread_mutex_init(&grep_mutex, NULL);
 	pthread_mutex_init(&read_sha1_mutex, NULL);
 	pthread_mutex_init(&grep_attr_mutex, NULL);
@@ -265,7 +266,7 @@ static void start_threads(struct grep_opt *opt)
 		strbuf_init(&todo[i].out, 0);
 	}
 
-	for (i = 0; i < ARRAY_SIZE(threads); i++) {
+	for (i = 0; i < nr_threads; i++) {
 		int err;
 		struct grep_opt *o = grep_opt_dup(opt);
 		o->output = strbuf_out;
@@ -296,7 +297,7 @@ static int wait_all(void)
 	pthread_cond_broadcast(&cond_add);
 	grep_unlock();
 
-	for (i = 0; i < ARRAY_SIZE(threads); i++) {
+	for (i = 0; i < nr_threads; i++) {
 		void *h;
 		pthread_join(threads[i], &h);
 		hit |= (int) (intptr_t) h;
@@ -309,6 +310,8 @@ static int wait_all(void)
 	pthread_cond_destroy(&cond_write);
 	pthread_cond_destroy(&cond_result);
 
+	free(threads);
+
 	return hit;
 }
 #else /* !NO_PTHREADS */
@@ -410,7 +413,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	name = strbuf_detach(&pathbuf, NULL);
 
 #ifndef NO_PTHREADS
-	if (use_threads) {
+	if (nr_threads > 0) {
 		grep_sha1_async(opt, name, sha1);
 		return 0;
 	} else
@@ -472,7 +475,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 	name = strbuf_detach(&buf, NULL);
 
 #ifndef NO_PTHREADS
-	if (use_threads) {
+	if (nr_threads > 0) {
 		grep_file_async(opt, name, filename);
 		return 0;
 	} else
@@ -895,6 +898,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			PARSE_OPT_OPTARG, NULL, (intptr_t)default_pager },
 		OPT_BOOLEAN(0, "ext-grep", &external_grep_allowed__ignored,
 			    "allow calling of grep(1) (ignored by this build)"),
+#ifdef NO_PTHREADS
+		OPT_INTEGER(0, "threads", &nr_threads,
+			"handle <n> files in parallel (ignored by this build)"),
+#else
+		OPT_INTEGER(0, "threads", &nr_threads,
+			"handle <n> files in parallel"),
+#endif
 		{ OPTION_CALLBACK, 0, "help-all", &options, NULL, "show usage",
 		  PARSE_OPT_HIDDEN | PARSE_OPT_NOARG, help_callback },
 		OPT_END()
@@ -995,7 +1005,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		opt.output_priv = &path_list;
 		opt.output = append_path;
 		string_list_append(&path_list, show_in_pager);
-		use_threads = 0;
+		nr_threads = 0;
 	}
 
 	if (!opt.pattern_list)
@@ -1003,28 +1013,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
 
-#ifndef NO_PTHREADS
-	if (online_cpus() == 1)
-		use_threads = 0;
-#else
-	use_threads = 0;
-#endif
-
-	opt.use_threads = use_threads;
-
-#ifndef NO_PTHREADS
-	if (use_threads) {
-		if (opt.pre_context || opt.post_context || opt.file_break ||
-		    opt.funcbody)
-			skip_first_line = 1;
-		start_threads(&opt);
-	}
-#else
-	use_threads = 0;
-#endif
-
-	compile_grep_patterns(&opt);
-
 	/* Check revs and then paths */
 	for (i = 0; i < argc; i++) {
 		const char *arg = argv[i];
@@ -1056,6 +1044,23 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+#ifdef NO_PTHREADS
+	nr_threads = 0;
+#else
+	if (nr_threads == -1)
+		nr_threads = (online_cpus() > 1) ? THREADS : 0;
+
+	if (nr_threads > 0) {
+		opt.use_threads = 1;
+		if (opt.pre_context || opt.post_context || opt.file_break ||
+		    opt.funcbody)
+			skip_first_line = 1;
+		start_threads(&opt);
+	}
+#endif
+
+	compile_grep_patterns(&opt);
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -1100,7 +1105,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		hit = grep_objects(&opt, &pathspec, &list);
 	}
 
-	if (use_threads)
+	if (nr_threads > 0)
 		hit |= wait_all();
 	if (hit && show_in_pager)
 		run_pager(&opt, prefix);
-- 
1.7.8

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-06 22:48                 ` René Scharfe
@ 2011-12-06 23:01                   ` René Scharfe
  2011-12-07  4:42                     ` Jeff King
  2011-12-07  8:12                     ` Thomas Rast
  2011-12-07  4:24                   ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King
  2011-12-07  8:11                   ` Thomas Rast
  2 siblings, 2 replies; 50+ messages in thread
From: René Scharfe @ 2011-12-06 23:01 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Eric Herman, Junio C Hamano

Reading of git objects needs to be protected by an exclusive lock
and cannot be parallelized.  Searching the read buffers can be done
in parallel, but for simple expressions threading is a net loss due
to its overhead, as measured by Thomas.  Turn it off unless we're
searching in the worktree.

Once the object store can be read safely by multiple threads in
parallel this patch should be reverted.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Goes on top of my earlier patch.  Could use a better commit message
with your (cleaned up) performance numbers.

 Documentation/git-grep.txt |    5 +++--
 builtin/grep.c             |    2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 47ac188..e981a9b 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -228,8 +228,9 @@ OPTIONS
 	there is a match and with non-zero status when there isn't.
 
 --threads <n>::
-	Run <n> search threads in parallel.  Default is 8.  This option
-	is ignored if git was built without support for POSIX threads.
+	Run <n> search threads in parallel.  Default is 8 when searching
+	the worktree and 0 otherwise.  This option is ignored if git was
+	built without support for POSIX threads.
 
 <tree>...::
 	Instead of searching tracked files in the working tree, search
diff --git a/builtin/grep.c b/builtin/grep.c
index 0bda900..f698642 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	nr_threads = 0;
 #else
 	if (nr_threads == -1)
-		nr_threads = (online_cpus() > 1) ? THREADS : 0;
+		nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0;
 
 	if (nr_threads > 0) {
 		opt.use_threads = 1;
-- 
1.7.8

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-06 23:01                   ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe
@ 2011-12-07  4:42                     ` Jeff King
  2011-12-07 17:11                       ` René Scharfe
  2011-12-07 20:11                       ` J. Bruce Fields
  2011-12-07  8:12                     ` Thomas Rast
  1 sibling, 2 replies; 50+ messages in thread
From: Jeff King @ 2011-12-07  4:42 UTC (permalink / raw)
  To: René Scharfe; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano

On Wed, Dec 07, 2011 at 12:01:37AM +0100, René Scharfe wrote:

> Reading of git objects needs to be protected by an exclusive lock
> and cannot be parallelized.  Searching the read buffers can be done
> in parallel, but for simple expressions threading is a net loss due
> to its overhead, as measured by Thomas.  Turn it off unless we're
> searching in the worktree.

Based on my earlier numbers, I was going to complain that we should
also be checking the "simple expressions" assumption here, as time spent
in the actual regex might be important.

However, after trying to repeat my experiment, I think the numbers I
posted earlier were misleading. For example, using my "more complex"
regex of 'a.*b':

  $ time git grep --threads=8 'a.*b' HEAD >/dev/null
  real    0m8.655s
  user    0m23.817s
  sys     0m0.480s

Look at that sweet, sweet parallelism. It's a quad-core with
hyperthreading, so we're not getting the 8x speedup we might hope for
(presumably due to lock contention on extracting blobs), but hey, 3x
isn't bad. Except, wait:

  $ time git grep --threads=0 'a.*b' HEAD >/dev/null
  real    0m7.651s
  user    0m7.600s
  sys     0m0.048s

We can get 1x on a single core, but the total time is lower! This
processor is an i7 with "turbo boost", which means it clocks higher in
single-core mode than when multiple cores are active. So the numbers I
posted earlier were misleading. Yes, we got parallelism, but at the cost
of knocking the clock speed down for a net loss.

The sweet spot for me seems to be:

  $ time git grep --threads=2 'a.*b' HEAD >/dev/null
  real    0m6.303s
  user    0m11.129s
  sys     0m0.220s

I'd be curious to see results from somebody with a quad-core (or more)
without turbo boost; I suspect that threading may have more benefit
there, even though we have some lock contention for blobs.

> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  	nr_threads = 0;
>  #else
>  	if (nr_threads == -1)
> -		nr_threads = (online_cpus() > 1) ? THREADS : 0;
> +		nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0;
>  
>  	if (nr_threads > 0) {
>  		opt.use_threads = 1;

This doesn't kick in for "--cached", which has the same performance
characteristics as grepping a tree. I think you want to add "&& !cached" to
the conditional.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-07  4:42                     ` Jeff King
@ 2011-12-07 17:11                       ` René Scharfe
  2011-12-07 18:28                         ` Jeff King
  2011-12-07 20:11                       ` J. Bruce Fields
  1 sibling, 1 reply; 50+ messages in thread
From: René Scharfe @ 2011-12-07 17:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano

Am 07.12.2011 05:42, schrieb Jeff King:
> On Wed, Dec 07, 2011 at 12:01:37AM +0100, René Scharfe wrote:
> 
>> Reading of git objects needs to be protected by an exclusive lock
>> and cannot be parallelized.  Searching the read buffers can be done
>> in parallel, but for simple expressions threading is a net loss due
>> to its overhead, as measured by Thomas.  Turn it off unless we're
>> searching in the worktree.
> 
> Based on my earlier numbers, I was going to complain that we should
> also be checking the "simple expressions" assumption here, as time spent
> in the actual regex might be important.
> 
> However, after trying to repeat my experiment, I think the numbers I
> posted earlier were misleading. For example, using my "more complex"
> regex of 'a.*b':
> 
>   $ time git grep --threads=8 'a.*b' HEAD >/dev/null
>   real    0m8.655s
>   user    0m23.817s
>   sys     0m0.480s
> 
> Look at that sweet, sweet parallelism. It's a quad-core with
> hyperthreading, so we're not getting the 8x speedup we might hope for
> (presumably due to lock contention on extracting blobs), but hey, 3x
> isn't bad. Except, wait:
> 
>   $ time git grep --threads=0 'a.*b' HEAD >/dev/null
>   real    0m7.651s
>   user    0m7.600s
>   sys     0m0.048s
> 
> We can get 1x on a single core, but the total time is lower! This
> processor is an i7 with "turbo boost", which means it clocks higher in
> single-core mode than when multiple cores are active. So the numbers I
> posted earlier were misleading. Yes, we got parallelism, but at the cost
> of knocking the clock speed down for a net loss.

Ugh, right, Turbo Boost complicates matters.

I don't understand the multiplied user time in the threaded case,
though.  Is it caused by busy-waiting?  Thomas reported similar numbers
earlier.

> The sweet spot for me seems to be:
> 
>   $ time git grep --threads=2 'a.*b' HEAD >/dev/null
>   real    0m6.303s
>   user    0m11.129s
>   sys     0m0.220s
> 
> I'd be curious to see results from somebody with a quad-core (or more)
> without turbo boost; I suspect that threading may have more benefit
> there, even though we have some lock contention for blobs.

It would be nice if we could come up with simple rules to calculate
defaults for the number of threads on a given run.  Users shouldn't have
to specify this option normally.  And it would be good if these rules
didn't require a list of all CPUs known to git. :)

>> --- a/builtin/grep.c
>> +++ b/builtin/grep.c
>> @@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>>  	nr_threads = 0;
>>  #else
>>  	if (nr_threads == -1)
>> -		nr_threads = (online_cpus() > 1) ? THREADS : 0;
>> +		nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0;
>>  
>>  	if (nr_threads > 0) {
>>  		opt.use_threads = 1;
> 
> This doesn't kick in for "--cached", which has the same performance
> characteristics as grepping a tree. I think you want to add "&& !cached" to
> the conditional.

Oh, yes.

René

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-07 17:11                       ` René Scharfe
@ 2011-12-07 18:28                         ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2011-12-07 18:28 UTC (permalink / raw)
  To: René Scharfe; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano

On Wed, Dec 07, 2011 at 06:11:47PM +0100, René Scharfe wrote:

> >   $ time git grep --threads=8 'a.*b' HEAD >/dev/null
> >   real    0m8.655s
> >   user    0m23.817s
> >   sys     0m0.480s
> [...]
>
> Ugh, right, Turbo Boost complicates matters.
> 
> I don't understand the multiplied user time in the threaded case,
> though.  Is it caused by busy-waiting?  Thomas reported similar numbers
> earlier.

I think it's mostly the clock speed. This processor runs at 1.86GHz but
boosts to 3.2GHz. So we'd expect just the actual work to take close to
twice as long. Plus it's a quad-core with hyperthreading, so 8 threads
is going to mean two threads sharing each core, including cache (i.e.,
hyperthreading a core does not let you double performance, even though
it presents itself as an extra core).

And then you have context switching and lock overhead. So I can believe
that it takes 3x the CPU time to accomplish the task. In an ideal world,
it would be mitigated by having 8x the threads, but in this case, lock
contention brings us down to less than 3x, and it's a slight net loss.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-07  4:42                     ` Jeff King
  2011-12-07 17:11                       ` René Scharfe
@ 2011-12-07 20:11                       ` J. Bruce Fields
  2011-12-07 20:45                         ` Jeff King
  1 sibling, 1 reply; 50+ messages in thread
From: J. Bruce Fields @ 2011-12-07 20:11 UTC (permalink / raw)
  To: Jeff King
  Cc: René Scharfe, Thomas Rast, git, Eric Herman, Junio C Hamano

On Tue, Dec 06, 2011 at 11:42:42PM -0500, Jeff King wrote:
> On Wed, Dec 07, 2011 at 12:01:37AM +0100, René Scharfe wrote:
> 
> > Reading of git objects needs to be protected by an exclusive lock
> > and cannot be parallelized.  Searching the read buffers can be done
> > in parallel, but for simple expressions threading is a net loss due
> > to its overhead, as measured by Thomas.  Turn it off unless we're
> > searching in the worktree.
> 
> Based on my earlier numbers, I was going to complain that we should
> also be checking the "simple expressions" assumption here, as time spent
> in the actual regex might be important.
> 
> However, after trying to repeat my experiment, I think the numbers I
> posted earlier were misleading. For example, using my "more complex"
> regex of 'a.*b':
> 
>   $ time git grep --threads=8 'a.*b' HEAD >/dev/null
>   real    0m8.655s
>   user    0m23.817s
>   sys     0m0.480s

Dumb question (I missed the beginning of the conversation): what kind of
storage are you using, and is the data already cached?

I seem to recall part of the motivation for the multithreading being
NFS, where the goal isn't so much to keep CPU's busy as it is to keep
the network busy.

Probably a bigger problem for something like "git status" which I think
ends up doing a series of stat's (which can each require a round trip to
the server in the NFS case), as it is a problem for something like
git-grep that's also doing reads.

Just a plea for considering the IO cost as well when making these kinds
of decisions....

(Which maybe you already do, apologies again for just naively dropping
into the middle of a thread.)

--b.

> 
> Look at that sweet, sweet parallelism. It's a quad-core with
> hyperthreading, so we're not getting the 8x speedup we might hope for
> (presumably due to lock contention on extracting blobs), but hey, 3x
> isn't bad. Except, wait:
> 
>   $ time git grep --threads=0 'a.*b' HEAD >/dev/null
>   real    0m7.651s
>   user    0m7.600s
>   sys     0m0.048s
> 
> We can get 1x on a single core, but the total time is lower! This
> processor is an i7 with "turbo boost", which means it clocks higher in
> single-core mode than when multiple cores are active. So the numbers I
> posted earlier were misleading. Yes, we got parallelism, but at the cost
> of knocking the clock speed down for a net loss.
> 
> The sweet spot for me seems to be:
> 
>   $ time git grep --threads=2 'a.*b' HEAD >/dev/null
>   real    0m6.303s
>   user    0m11.129s
>   sys     0m0.220s
> 
> I'd be curious to see results from somebody with a quad-core (or more)
> without turbo boost; I suspect that threading may have more benefit
> there, even though we have some lock contention for blobs.
> 
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
> >  	nr_threads = 0;
> >  #else
> >  	if (nr_threads == -1)
> > -		nr_threads = (online_cpus() > 1) ? THREADS : 0;
> > +		nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0;
> >  
> >  	if (nr_threads > 0) {
> >  		opt.use_threads = 1;
> 
> This doesn't kick in for "--cached", which has the same performance
> characteristics as grepping a tree. I think you want to add "&& !cached" to
> the conditional.
> 
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-07 20:11                       ` J. Bruce Fields
@ 2011-12-07 20:45                         ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2011-12-07 20:45 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: René Scharfe, Thomas Rast, git, Eric Herman, Junio C Hamano

On Wed, Dec 07, 2011 at 03:11:05PM -0500, J. Bruce Fields wrote:

> >   $ time git grep --threads=8 'a.*b' HEAD >/dev/null
> >   real    0m8.655s
> >   user    0m23.817s
> >   sys     0m0.480s
> 
> Dumb question (I missed the beginning of the conversation): what kind of
> storage are you using, and is the data already cached?

Sorry, I should have been clear: all of those numbers are with a warm
cache. So this is measuring only CPU.

> I seem to recall part of the motivation for the multithreading being
> NFS, where the goal isn't so much to keep CPU's busy as it is to keep
> the network busy.
> 
> Probably a bigger problem for something like "git status" which I think
> ends up doing a series of stat's (which can each require a round trip to
> the server in the NFS case), as it is a problem for something like
> git-grep that's also doing reads.
> 
> Just a plea for considering the IO cost as well when making these kinds
> of decisions....

This system has a decent-quality SSD, so the I/O timings are perhaps
not as interesting as they might otherwise be. But here are cold cache
numbers (each run after 'echo 3 >/proc/sys/vm/drop_caches'):

  HEAD, --threads=0: 4.956s
  HEAD, --threads=8: 9.917s
  working tree, --threads=0: 17.444s
  working tree, --threads=8: 6.462s

So when pulling from the object db, threads are still a huge loss
(because the data is compressed, the SSD is fast, and we spend a lot of
CPU time inflating; so it ends up close to the warm cache results). But
for the working tree, the I/O parallelism is a huge win.

So at least on my system, cold cache vs. warm cache leads to the same
conclusion. "git grep --threads=8 ... HEAD" might still be a win on slow
disks or NFS, though.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-06 23:01                   ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe
  2011-12-07  4:42                     ` Jeff King
@ 2011-12-07  8:12                     ` Thomas Rast
  2011-12-07 17:00                       ` René Scharfe
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Rast @ 2011-12-07  8:12 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano

René Scharfe wrote:
> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 47ac188..e981a9b 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -228,8 +228,9 @@ OPTIONS
>  	there is a match and with non-zero status when there isn't.
>  
>  --threads <n>::
> +	Run <n> search threads in parallel.  Default is 8 when searching
> +	the worktree and 0 otherwise.  This option is ignored if git was
> +	built without support for POSIX threads.
[...]
> -		nr_threads = (online_cpus() > 1) ? THREADS : 0;
> +		nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0;

It would be more consistent to stick to the pack.threads convention
where 0 means "all of my cores", so to disable threading the user
would set the number of threads to 1.  Or were you trying to measure
the contention between the worker thread and the add_work() thread?

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-07  8:12                     ` Thomas Rast
@ 2011-12-07 17:00                       ` René Scharfe
  2011-12-10 13:13                         ` Pete Wyckoff
  0 siblings, 1 reply; 50+ messages in thread
From: René Scharfe @ 2011-12-07 17:00 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Eric Herman, Junio C Hamano

Am 07.12.2011 09:12, schrieb Thomas Rast:
> René Scharfe wrote:
>> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
>> index 47ac188..e981a9b 100644
>> --- a/Documentation/git-grep.txt
>> +++ b/Documentation/git-grep.txt
>> @@ -228,8 +228,9 @@ OPTIONS
>>  	there is a match and with non-zero status when there isn't.
>>  
>>  --threads <n>::
>> +	Run <n> search threads in parallel.  Default is 8 when searching
>> +	the worktree and 0 otherwise.  This option is ignored if git was
>> +	built without support for POSIX threads.
> [...]
>> -		nr_threads = (online_cpus() > 1) ? THREADS : 0;
>> +		nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0;
> 
> It would be more consistent to stick to the pack.threads convention
> where 0 means "all of my cores", so to disable threading the user
> would set the number of threads to 1.  Or were you trying to measure
> the contention between the worker thread and the add_work() thread?

Yes, indeed, the cost for the threading overhead does interest me.  The
documentation should perhaps mention --no-threads explicitly to avoid
confusion.

Currently there is no way to specify "as many threads as there are
cores" here.  Previous measurements indicated that it wasn't too useful,
however, because I/O parallelism was beneficial even for machines with
less than eight cores and more threads didn't pay off.

René

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-07 17:00                       ` René Scharfe
@ 2011-12-10 13:13                         ` Pete Wyckoff
  2011-12-12 22:37                           ` René Scharfe
  0 siblings, 1 reply; 50+ messages in thread
From: Pete Wyckoff @ 2011-12-10 13:13 UTC (permalink / raw)
  To: René Scharfe; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano

rene.scharfe@lsrfire.ath.cx wrote on Wed, 07 Dec 2011 18:00 +0100:
> Am 07.12.2011 09:12, schrieb Thomas Rast:
> > René Scharfe wrote:
> >> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> >> index 47ac188..e981a9b 100644
> >> --- a/Documentation/git-grep.txt
> >> +++ b/Documentation/git-grep.txt
> >> @@ -228,8 +228,9 @@ OPTIONS
> >>  	there is a match and with non-zero status when there isn't.
> >>  
> >>  --threads <n>::
> >> +	Run <n> search threads in parallel.  Default is 8 when searching
> >> +	the worktree and 0 otherwise.  This option is ignored if git was
> >> +	built without support for POSIX threads.
> > [...]
> >> -		nr_threads = (online_cpus() > 1) ? THREADS : 0;
> >> +		nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0;
> > 
> > It would be more consistent to stick to the pack.threads convention
> > where 0 means "all of my cores", so to disable threading the user
> > would set the number of threads to 1.  Or were you trying to measure
> > the contention between the worker thread and the add_work() thread?
> 
> Yes, indeed, the cost for the threading overhead does interest me.  The
> documentation should perhaps mention --no-threads explicitly to avoid
> confusion.
> 
> Currently there is no way to specify "as many threads as there are
> cores" here.  Previous measurements indicated that it wasn't too useful,
> however, because I/O parallelism was beneficial even for machines with
> less than eight cores and more threads didn't pay off.

Right.  Even for single CPU machines this is true, so the
nr_threads calculation above should still use all 8 THREADS
regardless of the number of online_cpus().

		-- Pete

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/2] grep: turn off threading for non-worktree
  2011-12-10 13:13                         ` Pete Wyckoff
@ 2011-12-12 22:37                           ` René Scharfe
  0 siblings, 0 replies; 50+ messages in thread
From: René Scharfe @ 2011-12-12 22:37 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano

Am 10.12.2011 14:13, schrieb Pete Wyckoff:
> rene.scharfe@lsrfire.ath.cx wrote on Wed, 07 Dec 2011 18:00 +0100:
>> Am 07.12.2011 09:12, schrieb Thomas Rast:
>>> René Scharfe wrote:
>>>> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
>>>> index 47ac188..e981a9b 100644
>>>> --- a/Documentation/git-grep.txt
>>>> +++ b/Documentation/git-grep.txt
>>>> @@ -228,8 +228,9 @@ OPTIONS
>>>>   	there is a match and with non-zero status when there isn't.
>>>>
>>>>   --threads<n>::
>>>> +	Run<n>  search threads in parallel.  Default is 8 when searching
>>>> +	the worktree and 0 otherwise.  This option is ignored if git was
>>>> +	built without support for POSIX threads.
>>> [...]
>>>> -		nr_threads = (online_cpus()>  1) ? THREADS : 0;
>>>> +		nr_threads = (online_cpus()>  1&&  !list.nr) ? THREADS : 0;
>>>
>>> It would be more consistent to stick to the pack.threads convention
>>> where 0 means "all of my cores", so to disable threading the user
>>> would set the number of threads to 1.  Or were you trying to measure
>>> the contention between the worker thread and the add_work() thread?
>>
>> Yes, indeed, the cost for the threading overhead does interest me.  The
>> documentation should perhaps mention --no-threads explicitly to avoid
>> confusion.
>>
>> Currently there is no way to specify "as many threads as there are
>> cores" here.  Previous measurements indicated that it wasn't too useful,
>> however, because I/O parallelism was beneficial even for machines with
>> less than eight cores and more threads didn't pay off.
>
> Right.  Even for single CPU machines this is true, so the
> nr_threads calculation above should still use all 8 THREADS
> regardless of the number of online_cpus().

That makes sense.  However, in a quick test with a simple regex against 
a cache-warm Linux repo threading increased the runtime of git grep by 
30% on a single-core virtual machine.  Let's keep that check until we 
understand this better..

René

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-06 22:48                 ` René Scharfe
  2011-12-06 23:01                   ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe
@ 2011-12-07  4:24                   ` Jeff King
  2011-12-07 16:52                     ` René Scharfe
  2011-12-07  8:11                   ` Thomas Rast
  2 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2011-12-07  4:24 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano

On Tue, Dec 06, 2011 at 11:48:26PM +0100, René Scharfe wrote:

>  #ifndef NO_PTHREADS
> -	if (use_threads) {
> +	if (nr_threads > 0) {
>  		grep_sha1_async(opt, name, sha1);
>  		return 0;
>  	} else

Should this be "if (nr_threads > 1)"?

As a user, I would do:

  git grep --threads=1 ...

if I wanted a single-threaded process. Instead, we actually spawn a
sub-thread and do all of the locking, which has a measurable cost:

  $ time git grep --threads=0 SIMPLE HEAD >/dev/null
  real    0m2.994s
  user    0m2.932s
  sys     0m0.060s

  $ time git grep --threads=1 SIMPLE HEAD >/dev/null
  real    0m3.407s
  user    0m3.392s
  sys     0m0.140s

Should --threads=1 be equivalent to --threads=0?

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-07  4:24                   ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King
@ 2011-12-07 16:52                     ` René Scharfe
  2011-12-07 18:10                       ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: René Scharfe @ 2011-12-07 16:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Herman, Junio C Hamano

Am 07.12.2011 05:24, schrieb Jeff King:
> On Tue, Dec 06, 2011 at 11:48:26PM +0100, René Scharfe wrote:
> 
>>  #ifndef NO_PTHREADS
>> -	if (use_threads) {
>> +	if (nr_threads > 0) {
>>  		grep_sha1_async(opt, name, sha1);
>>  		return 0;
>>  	} else
> 
> Should this be "if (nr_threads > 1)"?
> 
> As a user, I would do:
> 
>   git grep --threads=1 ...
> 
> if I wanted a single-threaded process. Instead, we actually spawn a
> sub-thread and do all of the locking, which has a measurable cost:

Yes, the difference is measurable, and that's exactly how I like it to
be. :)  A user can turn off threading with --threads=0 or (more
intuitively) --no-threads.  And we can quantify the overhead.

>   $ time git grep --threads=0 SIMPLE HEAD >/dev/null
>   real    0m2.994s
>   user    0m2.932s
>   sys     0m0.060s
> 
>   $ time git grep --threads=1 SIMPLE HEAD >/dev/null
>   real    0m3.407s
>   user    0m3.392s
>   sys     0m0.140s
> 
> Should --threads=1 be equivalent to --threads=0?

We can do that if there's another way to calculate this difference, or
if it is not useful to know.  I find your results interesting at least,
though. :)

René

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-07 16:52                     ` René Scharfe
@ 2011-12-07 18:10                       ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2011-12-07 18:10 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano

On Wed, Dec 07, 2011 at 05:52:10PM +0100, René Scharfe wrote:

> > As a user, I would do:
> > 
> >   git grep --threads=1 ...
> > 
> > if I wanted a single-threaded process. Instead, we actually spawn a
> > sub-thread and do all of the locking, which has a measurable cost:
> 
> Yes, the difference is measurable, and that's exactly how I like it to
> be. :)  A user can turn off threading with --threads=0 or (more
> intuitively) --no-threads.  And we can quantify the overhead.

That seems acceptable to me if --threads is for speed-testing, but a
horrible interface if it is meant for end users who just want git to be
fast.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-06 22:48                 ` René Scharfe
  2011-12-06 23:01                   ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe
  2011-12-07  4:24                   ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King
@ 2011-12-07  8:11                   ` Thomas Rast
  2011-12-07 16:54                     ` René Scharfe
  2 siblings, 1 reply; 50+ messages in thread
From: Thomas Rast @ 2011-12-07  8:11 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano

René Scharfe wrote:
> Am 02.12.2011 17:15, schrieb René Scharfe:
> > How about adding a parameter to control the number of threads 
> > (--threads?) instead that defaults to eight (or five) for the worktree 
> > and one for the rest? That would also make benchmarking easier.
> 
> Like this:
> 
> -- >8 --
> Subject: grep: add parameter --threads
> 
> Allow the number of threads to be specified by the user.  This makes
> benchmarking the performance impact of different numbers of threads
> much easier.

Sounds good, though in the end we would also want to have a config
variable for the poor OS X users who have to tune their threads
*down*... :-)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-07  8:11                   ` Thomas Rast
@ 2011-12-07 16:54                     ` René Scharfe
  0 siblings, 0 replies; 50+ messages in thread
From: René Scharfe @ 2011-12-07 16:54 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Eric Herman, Junio C Hamano

Am 07.12.2011 09:11, schrieb Thomas Rast:
> René Scharfe wrote:
>> Am 02.12.2011 17:15, schrieb René Scharfe:
>>> How about adding a parameter to control the number of threads 
>>> (--threads?) instead that defaults to eight (or five) for the worktree 
>>> and one for the rest? That would also make benchmarking easier.
>>
>> Like this:
>>
>> -- >8 --
>> Subject: grep: add parameter --threads
>>
>> Allow the number of threads to be specified by the user.  This makes
>> benchmarking the performance impact of different numbers of threads
>> much easier.
> 
> Sounds good, though in the end we would also want to have a config
> variable for the poor OS X users who have to tune their threads
> *down*... :-)

We could set different defaults for different platforms..

René

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 0/3] grep attributes and multithreading
  2011-12-02 16:15               ` René Scharfe
  2011-12-05  9:02                 ` Thomas Rast
  2011-12-06 22:48                 ` René Scharfe
@ 2011-12-12 21:16                 ` Thomas Rast
  2011-12-12 21:16                   ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast
                                     ` (4 more replies)
  2 siblings, 5 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw)
  To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman

I think we should finish up these three patches for the next release.
The first two are unchanged from last time; nobody seemed to have any
objections.

On the third one I'm following René's argument:

> Am 02.12.2011 14:07, schrieb Thomas Rast:
> > So disable threading entirely when not scanning the worktree, to get
> > the NO_PTHREADS performance in that case.  This obsoletes all code
> > related to grep_sha1_async.  The thread startup must be delayed until
> > after all arguments have been parsed, but this does not have a
> > measurable effect.
> 
> This is a bit radical.  I think the underlying issue that 
> read_sha1_file() is not thread-safe can be solved eventually and then 
> we'd need to readd that code.

I already posted a bunch of POC patches, but doing the timing manually
has been getting on my nerves lately.  So I would first like to
formalize some of the performance testing, perhaps along lines already
drawn up by Michael Hagger, perhaps not.  Then we can revisit the
issue of grep performance.  But I would prefer not to block the -W fix
and two easy and confirmed speedups on that.

I dropped this part entirely:

> How about adding a parameter to control the number of threads 
> (--threads?) instead that defaults to eight (or five) for the worktree 
> and one for the rest?  That would also make benchmarking easier.

It does make testing a lot easier, but the interface is IMHO not fit
for users and I have a feeling that the "right" for-debugging
interface will end up falling out of the performance testing work
(probably an environment variable).  The end-user option should be a
config setting, if any.

Thomas Rast (3):
  grep: load funcname patterns for -W
  grep: enable threading with -p and -W using lazy attribute lookup
  grep: disable threading in non-worktree case

 builtin/grep.c  |   34 +++++++++++++++----------
 grep.c          |   73 ++++++++++++++++++++++++++++++++++---------------------
 grep.h          |    7 +++++
 t/t7810-grep.sh |   14 ++++++++++
 4 files changed, 86 insertions(+), 42 deletions(-)

-- 
1.7.8.431.g2abf2

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 1/3] grep: load funcname patterns for -W
  2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
@ 2011-12-12 21:16                   ` Thomas Rast
  2011-12-12 21:16                   ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast
                                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw)
  To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman

git-grep avoids loading the funcname patterns unless they are needed.
ba8ea74 (grep: add option to show whole function as context,
2011-08-01) forgot to extend this test also to the new funcbody
feature.  Do so.

The catch is that we also have to disable threading when using
userdiff, as explained in grep_threads_ok().  So we must be careful to
introduce the same test there.
---
 grep.c          |    7 ++++---
 t/t7810-grep.sh |   14 ++++++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/grep.c b/grep.c
index b29d09c..7a070e9 100644
--- a/grep.c
+++ b/grep.c
@@ -948,8 +948,8 @@ int grep_threads_ok(const struct grep_opt *opt)
 	 * machinery in grep_buffer_1. The attribute code is not
 	 * thread safe, so we disable the use of threads.
 	 */
-	if (opt->funcname && !opt->unmatch_name_only && !opt->status_only &&
-	    !opt->name_only)
+	if ((opt->funcname || opt->funcbody)
+	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
 		return 0;
 
 	return 1;
@@ -1008,7 +1008,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	}
 
 	memset(&xecfg, 0, sizeof(xecfg));
-	if (opt->funcname && !opt->unmatch_name_only && !opt->status_only &&
+	if ((opt->funcname || opt->funcbody)
+	    && !opt->unmatch_name_only && !opt->status_only &&
 	    !opt->name_only && !binary_match_only && !collect_hits) {
 		struct userdiff_driver *drv = userdiff_find_by_path(name);
 		if (drv && drv->funcname.pattern) {
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 81263b7..7ba5b16 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -523,6 +523,20 @@ test_expect_success 'grep -W' '
 	test_cmp expected actual
 '
 
+cat >expected <<EOF
+hello.c=	printf("Hello world.\n");
+hello.c:	return 0;
+hello.c-	/* char ?? */
+EOF
+
+test_expect_success 'grep -W with userdiff' '
+	test_when_finished "rm -f .gitattributes" &&
+	git config diff.custom.xfuncname "(printf.*|})$" &&
+	echo "hello.c diff=custom" >.gitattributes &&
+	git grep -W return >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
 	mkdir -p s &&
 	(
-- 
1.7.8.431.g2abf2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup
  2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
  2011-12-12 21:16                   ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast
@ 2011-12-12 21:16                   ` Thomas Rast
  2011-12-16  8:22                     ` Johannes Sixt
  2011-12-12 21:16                   ` [PATCH v3 3/3] grep: disable threading in non-worktree case Thomas Rast
                                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 50+ messages in thread
From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw)
  To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman

Lazily load the userdiff attributes in match_funcname().  Use a
separate mutex around this loading to protect the (not thread-safe)
attributes machinery.  This lets us re-enable threading with -p and
-W while reducing the overhead caused by looking up attributes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 builtin/grep.c |   12 +++++++--
 grep.c         |   74 ++++++++++++++++++++++++++++++++++----------------------
 grep.h         |    7 +++++
 3 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 988ea1d..bc23c3c 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt)
 
 	pthread_mutex_init(&grep_mutex, NULL);
 	pthread_mutex_init(&read_sha1_mutex, NULL);
+	pthread_mutex_init(&grep_attr_mutex, NULL);
 	pthread_cond_init(&cond_add, NULL);
 	pthread_cond_init(&cond_write, NULL);
 	pthread_cond_init(&cond_result, NULL);
@@ -303,6 +304,7 @@ static int wait_all(void)
 
 	pthread_mutex_destroy(&grep_mutex);
 	pthread_mutex_destroy(&read_sha1_mutex);
+	pthread_mutex_destroy(&grep_attr_mutex);
 	pthread_cond_destroy(&cond_add);
 	pthread_cond_destroy(&cond_write);
 	pthread_cond_destroy(&cond_result);
@@ -1002,17 +1004,21 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		opt.regflags |= REG_ICASE;
 
 #ifndef NO_PTHREADS
-	if (online_cpus() == 1 || !grep_threads_ok(&opt))
+	if (online_cpus() == 1)
 		use_threads = 0;
+#else
+	use_threads = 0;
+#endif
+
+	opt.use_threads = use_threads;
 
+#ifndef NO_PTHREADS
 	if (use_threads) {
 		if (opt.pre_context || opt.post_context || opt.file_break ||
 		    opt.funcbody)
 			skip_first_line = 1;
 		start_threads(&opt);
 	}
-#else
-	use_threads = 0;
 #endif
 
 	compile_grep_patterns(&opt);
diff --git a/grep.c b/grep.c
index 7a070e9..4dd7da2 100644
--- a/grep.c
+++ b/grep.c
@@ -2,6 +2,7 @@
 #include "grep.h"
 #include "userdiff.h"
 #include "xdiff-interface.h"
+#include "thread-utils.h"
 
 void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
 {
@@ -806,10 +807,46 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
 	opt->output(opt, "\n", 1);
 }
 
-static int match_funcname(struct grep_opt *opt, char *bol, char *eol)
+#ifndef NO_PTHREADS
+/*
+ * This lock protects access to the gitattributes machinery, which is
+ * not thread-safe.
+ */
+pthread_mutex_t grep_attr_mutex;
+
+static inline void grep_attr_lock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_lock(&grep_attr_mutex);
+}
+
+static inline void grep_attr_unlock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_unlock(&grep_attr_mutex);
+}
+#else
+#define grep_attr_lock(opt)
+#define grep_attr_unlock(opt)
+#endif
+
+static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol)
 {
 	xdemitconf_t *xecfg = opt->priv;
-	if (xecfg && xecfg->find_func) {
+	if (xecfg && !xecfg->find_func) {
+		struct userdiff_driver *drv;
+		grep_attr_lock(opt);
+		drv = userdiff_find_by_path(name);
+		grep_attr_unlock(opt);
+		if (drv && drv->funcname.pattern) {
+			const struct userdiff_funcname *pe = &drv->funcname;
+			xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);
+		} else {
+			xecfg = opt->priv = NULL;
+		}
+	}
+
+	if (xecfg) {
 		char buf[1];
 		return xecfg->find_func(bol, eol - bol, buf, 1,
 					xecfg->find_func_priv) >= 0;
@@ -835,7 +872,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name,
 		if (lno <= opt->last_shown)
 			break;
 
-		if (match_funcname(opt, bol, eol)) {
+		if (match_funcname(opt, name, bol, eol)) {
 			show_line(opt, bol, eol, name, lno, '=');
 			break;
 		}
@@ -848,7 +885,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 	unsigned cur = lno, from = 1, funcname_lno = 0;
 	int funcname_needed = !!opt->funcname;
 
-	if (opt->funcbody && !match_funcname(opt, bol, end))
+	if (opt->funcbody && !match_funcname(opt, name, bol, end))
 		funcname_needed = 2;
 
 	if (opt->pre_context < lno)
@@ -864,7 +901,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 		while (bol > buf && bol[-1] != '\n')
 			bol--;
 		cur--;
-		if (funcname_needed && match_funcname(opt, bol, eol)) {
+		if (funcname_needed && match_funcname(opt, name, bol, eol)) {
 			funcname_lno = cur;
 			funcname_needed = 0;
 		}
@@ -942,19 +979,6 @@ static int look_ahead(struct grep_opt *opt,
 	return 0;
 }
 
-int grep_threads_ok(const struct grep_opt *opt)
-{
-	/* If this condition is true, then we may use the attribute
-	 * machinery in grep_buffer_1. The attribute code is not
-	 * thread safe, so we disable the use of threads.
-	 */
-	if ((opt->funcname || opt->funcbody)
-	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
-		return 0;
-
-	return 1;
-}
-
 static void std_output(struct grep_opt *opt, const void *buf, size_t size)
 {
 	fwrite(buf, size, 1, stdout);
@@ -1008,16 +1032,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	}
 
 	memset(&xecfg, 0, sizeof(xecfg));
-	if ((opt->funcname || opt->funcbody)
-	    && !opt->unmatch_name_only && !opt->status_only &&
-	    !opt->name_only && !binary_match_only && !collect_hits) {
-		struct userdiff_driver *drv = userdiff_find_by_path(name);
-		if (drv && drv->funcname.pattern) {
-			const struct userdiff_funcname *pe = &drv->funcname;
-			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
-			opt->priv = &xecfg;
-		}
-	}
+	opt->priv = &xecfg;
+
 	try_lookahead = should_lookahead(opt);
 
 	while (left) {
@@ -1093,7 +1109,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 				show_function = 1;
 			goto next_line;
 		}
-		if (show_function && match_funcname(opt, bol, eol))
+		if (show_function && match_funcname(opt, name, bol, eol))
 			show_function = 0;
 		if (show_function ||
 		    (last_hit && lno <= last_hit + opt->post_context)) {
diff --git a/grep.h b/grep.h
index a652800..15d227c 100644
--- a/grep.h
+++ b/grep.h
@@ -115,6 +115,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int use_threads;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -131,4 +132,10 @@ struct grep_opt {
 extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
 extern int grep_threads_ok(const struct grep_opt *opt);
 
+#ifndef NO_PTHREADS
+/* Mutex used around access to the attributes machinery if
+ * opt->use_threads.  Must be initialized/destroyed by callers! */
+extern pthread_mutex_t grep_attr_mutex;
+#endif
+
 #endif
-- 
1.7.8.431.g2abf2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup
  2011-12-12 21:16                   ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast
@ 2011-12-16  8:22                     ` Johannes Sixt
  2011-12-16 17:34                       ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Johannes Sixt @ 2011-12-16  8:22 UTC (permalink / raw)
  To: Thomas Rast
  Cc: git, Junio C Hamano, René Scharfe, Jeff King, Eric Herman

Am 12/12/2011 22:16, schrieb Thomas Rast:
> diff --git a/grep.h b/grep.h
> index a652800..15d227c 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -115,6 +115,7 @@ struct grep_opt {
>  	int show_hunk_mark;
>  	int file_break;
>  	int heading;
> +	int use_threads;
>  	void *priv;
>  
>  	void (*output)(struct grep_opt *opt, const void *data, size_t size);
> @@ -131,4 +132,10 @@ struct grep_opt {
>  extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
>  extern int grep_threads_ok(const struct grep_opt *opt);
>  
> +#ifndef NO_PTHREADS
> +/* Mutex used around access to the attributes machinery if
> + * opt->use_threads.  Must be initialized/destroyed by callers! */
> +extern pthread_mutex_t grep_attr_mutex;
> +#endif
> +
>  #endif

This is the first time we use pthread_mutex_t in a header file. We need at
least the following squashed in. An alternative would be to include
"thread-utils.h", but thread-utils is really more about implementation
helpers functions, not about types, and therefore does not look right to
be #included in a header.

---
 grep.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/grep.h b/grep.h
index 15d227c..754b270 100644
--- a/grep.h
+++ b/grep.h
@@ -133,6 +133,7 @@ extern struct grep_opt *grep_opt_dup(const struct
grep_opt *opt);
 extern int grep_threads_ok(const struct grep_opt *opt);

 #ifndef NO_PTHREADS
+#include <pthread.h>
 /* Mutex used around access to the attributes machinery if
  * opt->use_threads.  Must be initialized/destroyed by callers! */
 extern pthread_mutex_t grep_attr_mutex;
-- 
1.7.8.1436.gb3021

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup
  2011-12-16  8:22                     ` Johannes Sixt
@ 2011-12-16 17:34                       ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2011-12-16 17:34 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Thomas Rast, git, Junio C Hamano, René Scharfe, Jeff King,
	Eric Herman

Johannes Sixt <j.sixt@viscovery.net> writes:

> This is the first time we use pthread_mutex_t in a header file. We need at
> least the following squashed in. An alternative would be to include
> "thread-utils.h", but thread-utils is really more about implementation
> helpers functions, not about types,...

builtin/grep.c already uses thread-utils.h since 5b594f4 (Threaded grep,
2010-01-25), so does builtin/pack-objects.c since 833e3df (pack-objects:
Add runtime detection of online CPU's, 2008-02-22), so it may be simpler
to do so in grep.h instead.

diff --git a/builtin/grep.c b/builtin/grep.c
index bc23c3c..6474eed 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -17,7 +17,6 @@
 #include "grep.h"
 #include "quote.h"
 #include "dir.h"
-#include "thread-utils.h"
 
 static char const * const grep_usage[] = {
 	"git grep [options] [-e] <pattern> [<rev>...] [[--] <path>...]",
diff --git a/grep.h b/grep.h
index 15d227c..dd4c65e 100644
--- a/grep.h
+++ b/grep.h
@@ -133,8 +133,11 @@ extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
 extern int grep_threads_ok(const struct grep_opt *opt);
 
 #ifndef NO_PTHREADS
-/* Mutex used around access to the attributes machinery if
- * opt->use_threads.  Must be initialized/destroyed by callers! */
+#include "thread-utils.h"
+/*
+ * Mutex used around access to the attributes machinery if
+ * opt->use_threads.  Must be initialized/destroyed by callers!
+ */
 extern pthread_mutex_t grep_attr_mutex;
 #endif
 

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 3/3] grep: disable threading in non-worktree case
  2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
  2011-12-12 21:16                   ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast
  2011-12-12 21:16                   ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast
@ 2011-12-12 21:16                   ` Thomas Rast
  2011-12-12 22:37                   ` [PATCH v3 0/3] grep attributes and multithreading René Scharfe
  2011-12-12 23:44                   ` Junio C Hamano
  4 siblings, 0 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw)
  To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman

Measurements by various people have shown that grepping in parallel is
not beneficial when the object store is involved.  For example, with a
simple regex:

  Threads     | --cached case            | worktree case
  ----------------------------------------------------------------
  8 (default) | 2.88u 0.21s 0:02.94real  | 0.19u 0.32s 0:00.16real
  4           | 2.89u 0.29s 0:02.99real  | 0.16u 0.34s 0:00.17real
  2           | 2.83u 0.36s 0:02.87real  | 0.18u 0.32s 0:00.26real
  NO_PTHREADS | 2.16u 0.08s 0:02.25real  | 0.12u 0.17s 0:00.31real

This happens because all the threads contend on read_sha1_mutex almost
all of the time.  A more complex regex allows the threads to do more
work in parallel, but as Jeff King found out, the "super boost" (much
higher clock when only one core is active) feature of recent CPUs
still causes the unthreaded case to win by a large margin.

So until the pack machinery allows unthreaded access, we disable
grep's threading in all but the worktree case.

Helped-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 builtin/grep.c |   36 ++++++++++++++++++------------------
 1 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index bc23c3c..7affbda 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1003,24 +1003,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
 
-#ifndef NO_PTHREADS
-	if (online_cpus() == 1)
-		use_threads = 0;
-#else
-	use_threads = 0;
-#endif
-
-	opt.use_threads = use_threads;
-
-#ifndef NO_PTHREADS
-	if (use_threads) {
-		if (opt.pre_context || opt.post_context || opt.file_break ||
-		    opt.funcbody)
-			skip_first_line = 1;
-		start_threads(&opt);
-	}
-#endif
-
 	compile_grep_patterns(&opt);
 
 	/* Check revs and then paths */
@@ -1042,6 +1024,24 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		break;
 	}
 
+#ifndef NO_PTHREADS
+	if (list.nr || cached || online_cpus() == 1)
+		use_threads = 0;
+#else
+	use_threads = 0;
+#endif
+
+	opt.use_threads = use_threads;
+
+#ifndef NO_PTHREADS
+	if (use_threads) {
+		if (opt.pre_context || opt.post_context || opt.file_break ||
+		    opt.funcbody)
+			skip_first_line = 1;
+		start_threads(&opt);
+	}
+#endif
+
 	/* The rest are paths */
 	if (!seen_dashdash) {
 		int j;
-- 
1.7.8.431.g2abf2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/3] grep attributes and multithreading
  2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
                                     ` (2 preceding siblings ...)
  2011-12-12 21:16                   ` [PATCH v3 3/3] grep: disable threading in non-worktree case Thomas Rast
@ 2011-12-12 22:37                   ` René Scharfe
  2011-12-12 23:44                   ` Junio C Hamano
  4 siblings, 0 replies; 50+ messages in thread
From: René Scharfe @ 2011-12-12 22:37 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano, Jeff King, Eric Herman

Am 12.12.2011 22:16, schrieb Thomas Rast:
> I think we should finish up these three patches for the next release.

> I already posted a bunch of POC patches, but doing the timing manually
> has been getting on my nerves lately.  So I would first like to
> formalize some of the performance testing, perhaps along lines already
> drawn up by Michael Hagger, perhaps not.  Then we can revisit the
> issue of grep performance.  But I would prefer not to block the -W fix
> and two easy and confirmed speedups on that.

Yes, that's a good idea.  The three patches are uncontroversial 
incremental improvements.

> I dropped this part entirely:
>
>> How about adding a parameter to control the number of threads
>> (--threads?) instead that defaults to eight (or five) for the worktree
>> and one for the rest?  That would also make benchmarking easier.
>
> It does make testing a lot easier, but the interface is IMHO not fit
> for users and I have a feeling that the "right" for-debugging
> interface will end up falling out of the performance testing work
> (probably an environment variable).  The end-user option should be a
> config setting, if any.

Agreed; users shouldn't need to specify such a parameter -- our 
heuristic should be good enough.

René

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/3] grep attributes and multithreading
  2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
                                     ` (3 preceding siblings ...)
  2011-12-12 22:37                   ` [PATCH v3 0/3] grep attributes and multithreading René Scharfe
@ 2011-12-12 23:44                   ` Junio C Hamano
  2011-12-13  8:44                     ` Thomas Rast
  4 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2011-12-12 23:44 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, René Scharfe, Jeff King, Eric Herman

Thanks; sign-off?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/3] grep attributes and multithreading
  2011-12-12 23:44                   ` Junio C Hamano
@ 2011-12-13  8:44                     ` Thomas Rast
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-13  8:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Jeff King, Eric Herman

Junio C Hamano wrote:
> Thanks; sign-off?

Ooops, not again!

> [PATCH v3 1/3] grep: load funcname patterns for -W

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-02 13:07             ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast
  2011-12-02 16:15               ` René Scharfe
@ 2011-12-23 22:37               ` Ævar Arnfjörð Bjarmason
  2011-12-23 22:49                 ` Thomas Rast
  1 sibling, 1 reply; 50+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-12-23 22:37 UTC (permalink / raw)
  To: Thomas Rast
  Cc: René Scharfe, Eric Herman, git, Junio C Hamano,
	Fredrik Kuivinen

On Fri, Dec 2, 2011 at 14:07, Thomas Rast <trast@student.ethz.ch> wrote:

> I conjecture that this is caused by contention on
> read_sha1_mutex. [...] So disable threading entirely when not
> scanning the worktree

Why does git-grep even need to keep a mutex to call read_sha1_file()?
It's inherently a read-only operation isn't it? If the lock is needed
because data is being shared between threads in sha1_file.c shouldn't
we tackle that instead of completely disabling threading?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-23 22:37               ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason
@ 2011-12-23 22:49                 ` Thomas Rast
  2011-12-24  1:39                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Rast @ 2011-12-23 22:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: René Scharfe, Eric Herman, git, Junio C Hamano,
	Fredrik Kuivinen

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Dec 2, 2011 at 14:07, Thomas Rast <trast@student.ethz.ch> wrote:
>
>> I conjecture that this is caused by contention on
>> read_sha1_mutex. [...] So disable threading entirely when not
>> scanning the worktree
>
> Why does git-grep even need to keep a mutex to call read_sha1_file()?
> It's inherently a read-only operation isn't it? If the lock is needed
> because data is being shared between threads in sha1_file.c shouldn't
> we tackle that instead of completely disabling threading?

The problem is that all sorts of data is shared.  See

  http://thread.gmane.org/gmane.comp.version-control.git/186618

But I need to go through it again, there are some races and double locks
in the posted version.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-23 22:49                 ` Thomas Rast
@ 2011-12-24  1:39                   ` Ævar Arnfjörð Bjarmason
  2011-12-24  7:07                     ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-12-24  1:39 UTC (permalink / raw)
  To: Thomas Rast
  Cc: René Scharfe, Eric Herman, git, Junio C Hamano,
	Fredrik Kuivinen

2011/12/23 Thomas Rast <trast@student.ethz.ch>:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Fri, Dec 2, 2011 at 14:07, Thomas Rast <trast@student.ethz.ch> wrote:
>>
>>> I conjecture that this is caused by contention on
>>> read_sha1_mutex. [...] So disable threading entirely when not
>>> scanning the worktree
>>
>> Why does git-grep even need to keep a mutex to call read_sha1_file()?
>> It's inherently a read-only operation isn't it? If the lock is needed
>> because data is being shared between threads in sha1_file.c shouldn't
>> we tackle that instead of completely disabling threading?
>
> The problem is that all sorts of data is shared.  See
>
>  http://thread.gmane.org/gmane.comp.version-control.git/186618
>
> But I need to go through it again, there are some races and double locks
> in the posted version.

I mentioned this on IRC, but I thought I'd bring it up here too.

Is the expensive part of git-grep all the setup work, or the actual
traversal and searching? I'm guessing it's the latter.

In that case an easy way to do git-grep in parallel would be to simply
spawn multiple sub-processes, e.g. if we had 1000 files and 4 cores:

 1. Split the 1000 into 4 parts 250 each.
 2. Spawn 4 processes as: git grep <pattern> -- <250 files>
 3. Aggregate all of the results in the parent process

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-24  1:39                   ` Ævar Arnfjörð Bjarmason
@ 2011-12-24  7:07                     ` Jeff King
  2011-12-24 10:49                       ` Nguyen Thai Ngoc Duy
                                         ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Jeff King @ 2011-12-24  7:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Thomas Rast, René Scharfe, Eric Herman, git, Junio C Hamano,
	Fredrik Kuivinen

On Sat, Dec 24, 2011 at 02:39:11AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Is the expensive part of git-grep all the setup work, or the actual
> traversal and searching? I'm guessing it's the latter.
> 
> In that case an easy way to do git-grep in parallel would be to simply
> spawn multiple sub-processes, e.g. if we had 1000 files and 4 cores:
> 
>  1. Split the 1000 into 4 parts 250 each.
>  2. Spawn 4 processes as: git grep <pattern> -- <250 files>
>  3. Aggregate all of the results in the parent process

That's an interesting idea. The expense of the traversal and searching
depends on two things:

  - how complex is your regex?

  - are you reading from objects (which need zlib inflated) or disk?

But you should be able to approximate it by compiling with NO_PTHREADS
and doing (assuming you have GNU xargs):

  # grep in working tree
  git ls-files | xargs -P 8 git grep "$re" --

  # grep tree-ish
  git ls-tree -r --name-only $tree | xargs -P 8 git grep "$re" $tree --

I tried to get some timings for this, but ran across some quite
surprising results. Here's a simple grep of the linux-2.6 working tree,
using a single-threaded grep:

  $ time git grep SIMPLE >/dev/null
  real    0m0.439s
  user    0m0.272s
  sys     0m0.160s

and then the same thing, via xargs, without even turning on
parallelization. This should give us a measurement of the overhead for
going through xargs at all. We'd expect it to be slower, but not too
much so:

  $ time git ls-files | xargs git grep SIMPLE -- >/dev/null
  real    0m11.989s
  user    0m11.769s
  sys     0m0.268s

Twenty-five times slower! Running 'perf' reports the culprit as pathspec
matching:

  +  63.23%    git  git                 [.] match_pathspec_depth
  +  28.60%    git  libc-2.13.so        [.] __strncmp_sse42
  +   2.22%    git  git                 [.] strncmp@plt
  +   1.67%    git  git                 [.] kwsexec

where the strncmps are called as part of match_pathspec_depth. So over
90% of the CPU time is spent on matching the pathspecs, compared to less
than 2% actually grepping.

Which really makes me wonder if our pathspec matching could stand to be
faster. True, giving a bunch of single files is the least efficient way
to use pathspecs, but that's pretty amazingly slow.

The case where we would most expect the setup cost to be drowned out is
using a more complex regex, grepping tree objects. There we have a
baseline of:

  $ time git grep 'a.*c' HEAD >/dev/null
  real    0m5.684s
  user    0m5.472s
  sys     0m0.196s

  $ time git ls-tree --name-only -r HEAD |
      xargs git grep 'a.*c' HEAD -- >/dev/null
  real    0m10.906s
  user    0m10.725s
  sys     0m0.240s

Here, we still almost double our time. It looks like we don't use the
same pathspec matching code in this case. But we do waste a lot of extra
time zlib-inflating the trees in "ls-tree", only to do it separately in
"grep".

Doing it in parallel yields:

  $ time git ls-tree --name-only -r HEAD |
      xargs -n 4000 -P 8 git grep 'a.*c' HEAD -- >/dev/null
  real    0m3.573s
  user    0m21.885s
  sys     0m0.400s

So that does at least yield a real speedup, albeit only by about half,
despite using over six times as much CPU (though my numbers are skewed
somewhat, as this is a quad i7 with hyperthreading and turbo boost).

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-24  7:07                     ` Jeff King
@ 2011-12-24 10:49                       ` Nguyen Thai Ngoc Duy
  2011-12-24 10:55                       ` Nguyen Thai Ngoc Duy
  2011-12-25  3:32                       ` Nguyen Thai Ngoc Duy
  2 siblings, 0 replies; 50+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-12-24 10:49 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe,
	Eric Herman, git, Junio C Hamano, Fredrik Kuivinen

On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote:
> I tried to get some timings for this, but ran across some quite
> surprising results. Here's a simple grep of the linux-2.6 working tree,
> using a single-threaded grep:
>
>  $ time git grep SIMPLE >/dev/null
>  real    0m0.439s
>  user    0m0.272s
>  sys     0m0.160s
>
> and then the same thing, via xargs, without even turning on
> parallelization. This should give us a measurement of the overhead for
> going through xargs at all. We'd expect it to be slower, but not too
> much so:
>
>  $ time git ls-files | xargs git grep SIMPLE -- >/dev/null
>  real    0m11.989s
>  user    0m11.769s
>  sys     0m0.268s
>
> Twenty-five times slower! Running 'perf' reports the culprit as pathspec
> matching:
>
>  +  63.23%    git  git                 [.] match_pathspec_depth
>  +  28.60%    git  libc-2.13.so        [.] __strncmp_sse42
>  +   2.22%    git  git                 [.] strncmp@plt
>  +   1.67%    git  git                 [.] kwsexec
>
> where the strncmps are called as part of match_pathspec_depth. So over
> 90% of the CPU time is spent on matching the pathspecs, compared to less
> than 2% actually grepping.
>
> Which really makes me wonder if our pathspec matching could stand to be
> faster. True, giving a bunch of single files is the least efficient way
> to use pathspecs, but that's pretty amazingly slow.

We could eliminate get_pathspec_depth() in grep_directory() when
read_directory() learns to filter path properly using (and at the cost
of) tree_entry_interesting(). The latter function has more
optimizaions built in and should be faster than the former. This is a
good test case for my read_directory() rewrite. Thanks.

get_pathspec_depth() can still use some optimizations though for
grep_cache() case.
-- 
Duy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-24  7:07                     ` Jeff King
  2011-12-24 10:49                       ` Nguyen Thai Ngoc Duy
@ 2011-12-24 10:55                       ` Nguyen Thai Ngoc Duy
  2011-12-24 13:38                         ` Jeff King
  2011-12-25  3:32                       ` Nguyen Thai Ngoc Duy
  2 siblings, 1 reply; 50+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-12-24 10:55 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe,
	Eric Herman, git, Junio C Hamano, Fredrik Kuivinen

(Sorry I replied without reading though the mail)

On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote:
> The case where we would most expect the setup cost to be drowned out is
> using a more complex regex, grepping tree objects. There we have a
> baseline of:
>
>  $ time git grep 'a.*c' HEAD >/dev/null
>  real    0m5.684s
>  user    0m5.472s
>  sys     0m0.196s
>
>  $ time git ls-tree --name-only -r HEAD |
>      xargs git grep 'a.*c' HEAD -- >/dev/null
>  real    0m10.906s
>  user    0m10.725s
>  sys     0m0.240s
>
> Here, we still almost double our time. It looks like we don't use the
> same pathspec matching code in this case. But we do waste a lot of extra
> time zlib-inflating the trees in "ls-tree", only to do it separately in
> "grep".

I assume this is gree_tree(), we have another form of pathspec
matching here: tree_entry_interesting() and it's still a bunch of
strcmp inside. Does strcmp show up in perf report?
-- 
Duy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-24 10:55                       ` Nguyen Thai Ngoc Duy
@ 2011-12-24 13:38                         ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2011-12-24 13:38 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe,
	Eric Herman, git, Junio C Hamano, Fredrik Kuivinen

On Sat, Dec 24, 2011 at 05:55:14PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote:
> > The case where we would most expect the setup cost to be drowned out is
> > using a more complex regex, grepping tree objects. There we have a
> > baseline of:
> >
> >  $ time git grep 'a.*c' HEAD >/dev/null
> >  real    0m5.684s
> >  user    0m5.472s
> >  sys     0m0.196s
> >
> >  $ time git ls-tree --name-only -r HEAD |
> >      xargs git grep 'a.*c' HEAD -- >/dev/null
> >  real    0m10.906s
> >  user    0m10.725s
> >  sys     0m0.240s
> >
> > Here, we still almost double our time. It looks like we don't use the
> > same pathspec matching code in this case. But we do waste a lot of extra
> > time zlib-inflating the trees in "ls-tree", only to do it separately in
> > "grep".
> 
> I assume this is gree_tree(), we have another form of pathspec
> matching here: tree_entry_interesting() and it's still a bunch of
> strcmp inside. Does strcmp show up in perf report?

Yes, but not nearly as high. The top of the report is:

  +  32.16%    git  libc-2.13.so        [.] re_search_internal
  +  17.82%    git  libz.so.1.2.3.4     [.] 0xe986
  +   7.81%    git  git                 [.] look_ahead
  +   6.24%    git  libc-2.13.so        [.] __strncmp_sse42
  +   4.08%    git  git                 [.] tree_entry_interesting
  +   3.27%    git  git                 [.] end_of_line
  +   2.63%    git  libz.so.1.2.3.4     [.] adler32
  +   1.93%    git  libz.so.1.2.3.4     [.] inflate

where the strncmps are from[1]:

  -   6.24%    git  libc-2.13.so        [.] __strncmp_sse42
     - __strncmp_sse42
        + 80.92% grep_tree
        + 19.08% tree_entry_interesting

So we're spending maybe 10% of our time on pathspecs, but most of it is
going to zlib and the actual regex search.

-Peff

[1] Note that this is with -O2, so some of that is from inlined calls.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case
  2011-12-24  7:07                     ` Jeff King
  2011-12-24 10:49                       ` Nguyen Thai Ngoc Duy
  2011-12-24 10:55                       ` Nguyen Thai Ngoc Duy
@ 2011-12-25  3:32                       ` Nguyen Thai Ngoc Duy
  2 siblings, 0 replies; 50+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-12-25  3:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe,
	Eric Herman, git, Junio C Hamano, Fredrik Kuivinen

On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote:
> The case where we would most expect the setup cost to be drowned out is
> using a more complex regex, grepping tree objects. There we have a
> baseline of:
>
>  $ time git grep 'a.*c' HEAD >/dev/null
>  real    0m5.684s
>  user    0m5.472s
>  sys     0m0.196s
>
>  $ time git ls-tree --name-only -r HEAD |
>      xargs git grep 'a.*c' HEAD -- >/dev/null
>  real    0m10.906s
>  user    0m10.725s
>  sys     0m0.240s
>
> Here, we still almost double our time. It looks like we don't use the
> same pathspec matching code in this case. But we do waste a lot of extra
> time zlib-inflating the trees in "ls-tree", only to do it separately in
> "grep".

Or you could pass blob SHA-1 to git grep to avoid reinflating trees

$ time git ls-tree -r HEAD|cut -c 13-52|xargs git grep 'a.*c' >/dev/null

Doing it in parallel does not seem to save time for me though.
-- 
Duy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/3] grep multithreading and scaling
  2011-12-02 13:07           ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast
                               ` (2 preceding siblings ...)
  2011-12-02 13:07             ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast
@ 2011-12-02 17:34             ` Jeff King
  2011-12-05  9:38               ` Thomas Rast
  2011-12-02 20:02             ` Eric Herman
  4 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2011-12-02 17:34 UTC (permalink / raw)
  To: Thomas Rast; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano

On Fri, Dec 02, 2011 at 02:07:45PM +0100, Thomas Rast wrote:

> where I put the --cached originally because that makes it independent
> of the worktree (which in the very first measurements I still had
> wiped, as I tend to do for this repo; I checked it out again after
> that).  This in fact gives me (~/g/git-grep --cached
> INITRAMFS_ROOT_UID, leaving aside -W; best of 10):
> 
>   THREADS=8:   2.88user 0.21system 0:02.94elapsed
>   THREADS=4:   2.89user 0.29system 0:02.99elapsed
>   THREADS=2:   2.83user 0.36system 0:02.87elapsed
>   NO_PTHREADS: 2.16user 0.08system 0:02.25elapsed
> 
> Uhuh.  Doesn't scale so well after all.  But removing the --cached, as
> most people probably would:
> 
>   THREADS=8:   0.19user 0.32system 0:00.16elapsed
>   THREADS=4:   0.16user 0.34system 0:00.17elapsed
>   THREADS=2:   0.18user 0.32system 0:00.26elapsed
>   NO_PTHREADS: 0.12user 0.17system 0:00.31elapsed
> 
> So I conclude that during any grep that cannot use the worktree,
> having any threads hurts.

Wow, that's horrible. Leaving aside the parallelism, it's just terrible
that reading from the cache is 20 times slower than the worktree. I get
similar results on my quad-core machine.

A quick perf run shows most of the time is spent inflating objects. The
diff code has a sneaky trick to re-use worktree files when we know they
are stat-clean (in diff's case it is to avoid writing a tempfile). I
wonder if we should use the same trick here.

It would hurt the cold cache case, though, as the compressed versions
require fewer disk accesses, of course.

-Peff

PS I suspect your timings are somewhat affected by the simplicity of the
   regex you are asking for. The time to inflate the blobs dominates,
   because the search is just a memmem(). On my quad-core w/
   hyperthreading (i.e., 8 apparent cores):

   [no caching, simple regex; we get some parallelism, but the regex
    task is just not that intensive]
   $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null
   0.42user 0.45system 0:00.15elapsed 578%CPU

   [no caching, harder regex; we get much higher CPU utilization]
   $ /usr/bin/time git grep 'a.*b' >/dev/null
   14.68user 0.50system 0:02.00elapsed 758%CPU

   [with caching, simple regex; we get almost _no_ parallelism because
    all of our time is spent deflating under a lock, and the regex task
    takes very little time]
   $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null
   7.64user 0.41system 0:07.61elapsed 105%CPU

   [with caching, harder regex; not as much parallelism as we hoped for,
    but still much more than before. Because there is actually work to
    parallelize in the regex]
   $ /usr/bin/time git grep --cached 'a.*b' >/dev/null
   23.46user 0.47system 0:08.42elapsed 284%CPU

   So I think there is value in parallelizing even --cached greps. But
   we could do so much better if blob inflation could be done in
   parallel.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/3] grep multithreading and scaling
  2011-12-02 17:34             ` [PATCH v2 0/3] grep multithreading and scaling Jeff King
@ 2011-12-05  9:38               ` Thomas Rast
  2011-12-05 20:16                 ` Thomas Rast
  2011-12-06  0:40                 ` Jeff King
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-05  9:38 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano

Jeff King wrote:
> 
> A quick perf run shows most of the time is spent inflating objects. The
> diff code has a sneaky trick to re-use worktree files when we know they
> are stat-clean (in diff's case it is to avoid writing a tempfile). I
> wonder if we should use the same trick here.
> 
> It would hurt the cold cache case, though, as the compressed versions
> require fewer disk accesses, of course.

I just found out that on Linux, there's mincore() that can tell us
(racily, but who cares) whether a given file mapping is in memory.  If
you would like to try it, see the source at the end, but I'm getting
things such as

  # in a random collection of files, none of which I have accessed lately
  $ ls -l
  -rw-r--r-- 1 thomas users    116534 Jul  4  2010 IMG_4884.JPG
  -rw-r--r-- 1 thomas users   7278081 Aug 25  2010 remoteserverrepo.zip
  $ ./mincore IMG_4884.JPG 
  00000000000000000000000000000
  $ cat IMG_4884.JPG > /dev/null 
  $ ./mincore IMG_4884.JPG 
  11111111111111111111111111111
  $ ./mincore remoteserverrepo.zip 
  0000000000000000000000[...]
  $ head -10 remoteserverrepo.zip >/dev/null
  $ ./mincore remoteserverrepo.zip 
  1111000000000000000000[...]

So that looks fairly promising, and the order would then be:

- if stat-clean, and we have mincore(), and it tells us we can do it
  cheaply: grab file from tree

- if it's a loose object: decompress it

- if stat-clean: grab file from tree

- access packs as usual

> PS I suspect your timings are somewhat affected by the simplicity of the
>    regex you are asking for. The time to inflate the blobs dominates,
>    because the search is just a memmem(). On my quad-core w/
>    hyperthreading (i.e., 8 apparent cores):
> 
>    $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null
>    0.42user 0.45system 0:00.15elapsed 578%CPU
>    $ /usr/bin/time git grep 'a.*b' >/dev/null
>    14.68user 0.50system 0:02.00elapsed 758%CPU
>    $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null
>    7.64user 0.41system 0:07.61elapsed 105%CPU
>    $ /usr/bin/time git grep --cached 'a.*b' >/dev/null
>    23.46user 0.47system 0:08.42elapsed 284%CPU
> 
>    So I think there is value in parallelizing even --cached greps. But
>    we could do so much better if blob inflation could be done in
>    parallel.

Ok, I see, I missed that part.  Perhaps the heuristic should then be
"if the regex boils down to memmem, disable threading", but let's see
what loose object decompression in parallel can give us.


---- 8< ---- mincore.c ---- 8< ----
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

void die(const char *s)
{
	perror(s);
	exit(1);
}

int main (int argc, char *argv[])
{
	void *mem;
	size_t len;
	struct stat st;
	int fd;
	unsigned char *vec;
	int vsize;
	int i;
	size_t page = sysconf(_SC_PAGESIZE);

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		exit(2);
	}

	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		die("open failed");
	if (fstat(fd, &st) == -1)
		die("fstat failed");
	mem = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (mem == (void*) -1)
		die("mmap failed");

	vsize = (st.st_size+page-1)/page;
	vec = malloc(vsize);
	if (!vec)
		die("malloc failed");
	if (mincore(mem, st.st_size, vec) == -1)
		die("mincore failed");
	for (i = 0; i < vsize; i++)
		printf("%d", (int) vec[i]);
	printf("\n");
	return 0;
}


-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/3] grep multithreading and scaling
  2011-12-05  9:38               ` Thomas Rast
@ 2011-12-05 20:16                 ` Thomas Rast
  2011-12-06  0:40                 ` Jeff King
  1 sibling, 0 replies; 50+ messages in thread
From: Thomas Rast @ 2011-12-05 20:16 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano

Thomas Rast wrote:
> 
> I just found out that on Linux, there's mincore() that can tell us
> (racily, but who cares) whether a given file mapping is in memory.
[...]
> So that looks fairly promising, and the order would then be:
> 
> - if stat-clean, and we have mincore(), and it tells us we can do it
>   cheaply: grab file from tree
> 
> - if it's a loose object: decompress it
> 
> - if stat-clean: grab file from tree
> 
> - access packs as usual

Just a small note, I tried two things:

* the simpler option of grabbing a loose object if it exists and is
  mincore() turns out to massively slow down 'git log HEAD', probably
  because only very few of these objects are loose in the first place

* doing this only under grep's use_threads, and dropping the lock
  around unpack_sha1_file() [i.e., zlib decompression] still results
  in a git-grep that is slower than without this, though not much

So no improvement here.  Will have to look into the worktree trick
though.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/3] grep multithreading and scaling
  2011-12-05  9:38               ` Thomas Rast
  2011-12-05 20:16                 ` Thomas Rast
@ 2011-12-06  0:40                 ` Jeff King
  1 sibling, 0 replies; 50+ messages in thread
From: Jeff King @ 2011-12-06  0:40 UTC (permalink / raw)
  To: Thomas Rast; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano

On Mon, Dec 05, 2011 at 10:38:16AM +0100, Thomas Rast wrote:

> I just found out that on Linux, there's mincore() that can tell us
> (racily, but who cares) whether a given file mapping is in memory.  If
> you would like to try it, see the source at the end, but I'm getting
> things such as

Neat, I didn't know about mincore.

> So that looks fairly promising, and the order would then be:
> 
> - if stat-clean, and we have mincore(), and it tells us we can do it
>   cheaply: grab file from tree
> 
> - if it's a loose object: decompress it
> 
> - if stat-clean: grab file from tree
> 
> - access packs as usual

I don't think your third one makes sense. If the working tree file isn't
stat clean, then either:

  1. the pack file is in cache, and it's way faster than faulting in the
     working tree file from disk

  2. the pack file is not in cache, and it's a toss-up whether it is
     faster to fault in the smaller compressed pack-file version and
     uncompress it, or to fault in the larger on-disk version. The
     exact result will depend on the ratio of CPU to disk speed, the
     quality of your filesystem, and the size and contents of your file.

     And possibly on the exact delta chains you have. Though this
     optimization only happens when the file is in the index, which
     usually means it's recent, which means it will tend to be at the
     head of the delta chain.

So it probably just makes sense to grab the working tree file only if
mincore() tells us we have all (or most) of it, and otherwise go to the
packfile.

> Ok, I see, I missed that part.  Perhaps the heuristic should then be
> "if the regex boils down to memmem, disable threading", but let's see
> what loose object decompression in parallel can give us.

Yeah. I'd really rather have parallel object decompression than some
complex Linux-only mincore optimization (even though that optimization
_could_ yield extra savings on top of properly threading, if the blob
retrieval is threaded, I think I'll care less about how much CPU time it
takes).

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/3] grep multithreading and scaling
  2011-12-02 13:07           ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast
                               ` (3 preceding siblings ...)
  2011-12-02 17:34             ` [PATCH v2 0/3] grep multithreading and scaling Jeff King
@ 2011-12-02 20:02             ` Eric Herman
  4 siblings, 0 replies; 50+ messages in thread
From: Eric Herman @ 2011-12-02 20:02 UTC (permalink / raw)
  To: Thomas Rast; +Cc: René Scharfe, git, Junio C Hamano

Hello Thomas,

Thanks for the work and the great info.
Some of the numbers are quite surprising.

I do, indeed, have a machine with more cores, but I have been either 
busy with out-of-town guests or generally plain lazy in the last couple 
of weeks. I intend to set aside some time to do some benchmarking this 
weekend.

I'll let you know what I find.

Cheers,
  -Eric

-- 
http://www.freesa.org/ -- mobile: +31 620719662
aim: ericigps -- skype: eric_herman -- jabber: eric.herman@gmail.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2011-12-25  3:33 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-25 14:46 [PATCH] grep: load funcname patterns for -W Thomas Rast
2011-11-25 16:32 ` René Scharfe
2011-11-26 12:15   ` [PATCH] grep: enable multi-threading for -p and -W René Scharfe
2011-11-29  9:54     ` Thomas Rast
2011-11-29 13:49       ` René Scharfe
2011-11-29 14:07         ` Thomas Rast
2011-12-02 13:07           ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast
2011-12-02 13:07             ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast
2011-12-02 13:07             ` [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast
2011-12-02 13:07             ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast
2011-12-02 16:15               ` René Scharfe
2011-12-05  9:02                 ` Thomas Rast
2011-12-06 22:48                 ` René Scharfe
2011-12-06 23:01                   ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe
2011-12-07  4:42                     ` Jeff King
2011-12-07 17:11                       ` René Scharfe
2011-12-07 18:28                         ` Jeff King
2011-12-07 20:11                       ` J. Bruce Fields
2011-12-07 20:45                         ` Jeff King
2011-12-07  8:12                     ` Thomas Rast
2011-12-07 17:00                       ` René Scharfe
2011-12-10 13:13                         ` Pete Wyckoff
2011-12-12 22:37                           ` René Scharfe
2011-12-07  4:24                   ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King
2011-12-07 16:52                     ` René Scharfe
2011-12-07 18:10                       ` Jeff King
2011-12-07  8:11                   ` Thomas Rast
2011-12-07 16:54                     ` René Scharfe
2011-12-12 21:16                 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast
2011-12-12 21:16                   ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast
2011-12-12 21:16                   ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast
2011-12-16  8:22                     ` Johannes Sixt
2011-12-16 17:34                       ` Junio C Hamano
2011-12-12 21:16                   ` [PATCH v3 3/3] grep: disable threading in non-worktree case Thomas Rast
2011-12-12 22:37                   ` [PATCH v3 0/3] grep attributes and multithreading René Scharfe
2011-12-12 23:44                   ` Junio C Hamano
2011-12-13  8:44                     ` Thomas Rast
2011-12-23 22:37               ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason
2011-12-23 22:49                 ` Thomas Rast
2011-12-24  1:39                   ` Ævar Arnfjörð Bjarmason
2011-12-24  7:07                     ` Jeff King
2011-12-24 10:49                       ` Nguyen Thai Ngoc Duy
2011-12-24 10:55                       ` Nguyen Thai Ngoc Duy
2011-12-24 13:38                         ` Jeff King
2011-12-25  3:32                       ` Nguyen Thai Ngoc Duy
2011-12-02 17:34             ` [PATCH v2 0/3] grep multithreading and scaling Jeff King
2011-12-05  9:38               ` Thomas Rast
2011-12-05 20:16                 ` Thomas Rast
2011-12-06  0:40                 ` Jeff King
2011-12-02 20:02             ` Eric Herman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).