* [PATCH] grep: load funcname patterns for -W @ 2011-11-25 14:46 Thomas Rast 2011-11-25 16:32 ` René Scharfe 0 siblings, 1 reply; 50+ messages in thread From: Thomas Rast @ 2011-11-25 14:46 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe git-grep avoids loading the funcname patterns unless they are needed. ba8ea74 (grep: add option to show whole function as context, 2011-08-01) forgot to extend this test also to the new funcbody feature. Do so. The catch is that we also have to disable threading when using userdiff, as explained in grep_threads_ok(). So we must be careful to introduce the same test there. --- grep.c | 7 ++++--- t/t7810-grep.sh | 14 ++++++++++++++ 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/grep.c b/grep.c index b29d09c..7a070e9 100644 --- a/grep.c +++ b/grep.c @@ -948,8 +948,8 @@ int grep_threads_ok(const struct grep_opt *opt) * machinery in grep_buffer_1. The attribute code is not * thread safe, so we disable the use of threads. */ - if (opt->funcname && !opt->unmatch_name_only && !opt->status_only && - !opt->name_only) + if ((opt->funcname || opt->funcbody) + && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) return 0; return 1; @@ -1008,7 +1008,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, } memset(&xecfg, 0, sizeof(xecfg)); - if (opt->funcname && !opt->unmatch_name_only && !opt->status_only && + if ((opt->funcname || opt->funcbody) + && !opt->unmatch_name_only && !opt->status_only && !opt->name_only && !binary_match_only && !collect_hits) { struct userdiff_driver *drv = userdiff_find_by_path(name); if (drv && drv->funcname.pattern) { diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 81263b7..7ba5b16 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -523,6 +523,20 @@ test_expect_success 'grep -W' ' test_cmp expected actual ' +cat >expected <<EOF +hello.c= printf("Hello world.\n"); +hello.c: return 0; +hello.c- /* char ?? */ +EOF + +test_expect_success 'grep -W with userdiff' ' + test_when_finished "rm -f .gitattributes" && + git config diff.custom.xfuncname "(printf.*|})$" && + echo "hello.c diff=custom" >.gitattributes && + git grep -W return >actual && + test_cmp expected actual +' + test_expect_success 'grep from a subdirectory to search wider area (1)' ' mkdir -p s && ( -- 1.7.8.rc3.415.gbcbb2 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH] grep: load funcname patterns for -W 2011-11-25 14:46 [PATCH] grep: load funcname patterns for -W Thomas Rast @ 2011-11-25 16:32 ` René Scharfe 2011-11-26 12:15 ` [PATCH] grep: enable multi-threading for -p and -W René Scharfe 0 siblings, 1 reply; 50+ messages in thread From: René Scharfe @ 2011-11-25 16:32 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Junio C Hamano Am 25.11.2011 15:46, schrieb Thomas Rast: > git-grep avoids loading the funcname patterns unless they are needed. > ba8ea74 (grep: add option to show whole function as context, > 2011-08-01) forgot to extend this test also to the new funcbody > feature. Do so. > > The catch is that we also have to disable threading when using > userdiff, as explained in grep_threads_ok(). So we must be careful to > introduce the same test there. Oops. Thanks for catching this. That reminds me to look into adding a thread-safe way to access attributes again.. Thanks, René ^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH] grep: enable multi-threading for -p and -W 2011-11-25 16:32 ` René Scharfe @ 2011-11-26 12:15 ` René Scharfe 2011-11-29 9:54 ` Thomas Rast 0 siblings, 1 reply; 50+ messages in thread From: René Scharfe @ 2011-11-26 12:15 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Junio C Hamano Move attribute reading, which is not thread-safe, into add_work(), under grep_mutex. That way we can stop turning off multi-threading if -p or -W is given, as the diff attribute for each file is gotten safely now. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- Goes on top of your patch. Needs testing.. builtin/grep.c | 33 +++++++++++++++++++++++++-------- grep.c | 25 +++++++------------------ grep.h | 5 +++-- revision.c | 2 +- 4 files changed, 36 insertions(+), 29 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 3d7329d..5534111 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -18,6 +18,7 @@ #include "quote.h" #include "dir.h" #include "thread-utils.h" +#include "userdiff.h" static char const * const grep_usage[] = { "git grep [options] [-e] <pattern> [<rev>...] [[--] <path>...]", @@ -43,6 +44,7 @@ enum work_type {WORK_SHA1, WORK_FILE}; struct work_item { enum work_type type; char *name; + struct userdiff_driver *userdiff_driver; /* if type == WORK_SHA1, then 'identifier' is a SHA1, * otherwise type == WORK_FILE, and 'identifier' is a NUL @@ -114,7 +116,17 @@ static pthread_cond_t cond_result; static int skip_first_line; -static void add_work(enum work_type type, char *name, void *id) +/* Reading attributes is not thread-safe, so callers need to lock. */ +static struct userdiff_driver *get_userdiff_driver(struct grep_opt *opt, + const char *name) +{ + if (opt->funcbody || opt->funcname) + return userdiff_find_by_path(name); + return NULL; +} + +static void add_work(struct grep_opt *opt, enum work_type type, char *name, + void *id) { grep_lock(); @@ -124,6 +136,7 @@ static void add_work(enum work_type type, char *name, void *id) todo[todo_end].type = type; todo[todo_end].name = name; + todo[todo_end].userdiff_driver = get_userdiff_driver(opt, name); todo[todo_end].identifier = id; todo[todo_end].done = 0; strbuf_reset(&todo[todo_end].out); @@ -158,13 +171,13 @@ static void grep_sha1_async(struct grep_opt *opt, char *name, unsigned char *s; s = xmalloc(20); memcpy(s, sha1, 20); - add_work(WORK_SHA1, name, s); + add_work(opt, WORK_SHA1, name, s); } static void grep_file_async(struct grep_opt *opt, char *name, const char *filename) { - add_work(WORK_FILE, name, xstrdup(filename)); + add_work(opt, WORK_FILE, name, xstrdup(filename)); } static void work_done(struct work_item *w) @@ -212,24 +225,26 @@ static void *run(void *arg) struct grep_opt *opt = arg; while (1) { + struct userdiff_driver *drv; struct work_item *w = get_work(); if (!w) break; opt->output_priv = w; + drv = w->userdiff_driver; if (w->type == WORK_SHA1) { unsigned long sz; void* data = load_sha1(w->identifier, &sz, w->name); if (data) { - hit |= grep_buffer(opt, w->name, data, sz); + hit |= grep_buffer(opt, w->name, drv, data, sz); free(data); } } else if (w->type == WORK_FILE) { size_t sz; void* data = load_file(w->identifier, &sz); if (data) { - hit |= grep_buffer(opt, w->name, data, sz); + hit |= grep_buffer(opt, w->name, drv, data, sz); free(data); } } else { @@ -417,10 +432,11 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1, int hit; unsigned long sz; void *data = load_sha1(sha1, &sz, name); + struct userdiff_driver *drv = get_userdiff_driver(opt, name); if (!data) hit = 0; else - hit = grep_buffer(opt, name, data, sz); + hit = grep_buffer(opt, name, drv, data, sz); free(data); free(name); @@ -479,10 +495,11 @@ static int grep_file(struct grep_opt *opt, const char *filename) int hit; size_t sz; void *data = load_file(filename, &sz); + struct userdiff_driver *drv = get_userdiff_driver(opt, name); if (!data) hit = 0; else - hit = grep_buffer(opt, name, data, sz); + hit = grep_buffer(opt, name, drv, data, sz); free(data); free(name); @@ -1001,7 +1018,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) opt.regflags |= REG_ICASE; #ifndef NO_PTHREADS - if (online_cpus() == 1 || !grep_threads_ok(&opt)) + if (online_cpus() == 1) use_threads = 0; if (use_threads) { diff --git a/grep.c b/grep.c index 7a070e9..ff14a98 100644 --- a/grep.c +++ b/grep.c @@ -942,25 +942,13 @@ static int look_ahead(struct grep_opt *opt, return 0; } -int grep_threads_ok(const struct grep_opt *opt) -{ - /* If this condition is true, then we may use the attribute - * machinery in grep_buffer_1. The attribute code is not - * thread safe, so we disable the use of threads. - */ - if ((opt->funcname || opt->funcbody) - && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) - return 0; - - return 1; -} - static void std_output(struct grep_opt *opt, const void *buf, size_t size) { fwrite(buf, size, 1, stdout); } static int grep_buffer_1(struct grep_opt *opt, const char *name, + struct userdiff_driver *drv, char *buf, unsigned long size, int collect_hits) { char *bol = buf; @@ -1011,7 +999,6 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, if ((opt->funcname || opt->funcbody) && !opt->unmatch_name_only && !opt->status_only && !opt->name_only && !binary_match_only && !collect_hits) { - struct userdiff_driver *drv = userdiff_find_by_path(name); if (drv && drv->funcname.pattern) { const struct userdiff_funcname *pe = &drv->funcname; xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags); @@ -1167,23 +1154,25 @@ static int chk_hit_marker(struct grep_expr *x) } } -int grep_buffer(struct grep_opt *opt, const char *name, char *buf, unsigned long size) +int grep_buffer(struct grep_opt *opt, const char *name, + struct userdiff_driver *userdiff_driver, + char *buf, unsigned long size) { /* * we do not have to do the two-pass grep when we do not check * buffer-wide "all-match". */ if (!opt->all_match) - return grep_buffer_1(opt, name, buf, size, 0); + return grep_buffer_1(opt, name, userdiff_driver, buf, size, 0); /* Otherwise the toplevel "or" terms hit a bit differently. * We first clear hit markers from them. */ clr_hit_marker(opt->pattern_expression); - grep_buffer_1(opt, name, buf, size, 1); + grep_buffer_1(opt, name, userdiff_driver, buf, size, 1); if (!chk_hit_marker(opt->pattern_expression)) return 0; - return grep_buffer_1(opt, name, buf, size, 0); + return grep_buffer_1(opt, name, userdiff_driver, buf, size, 0); } diff --git a/grep.h b/grep.h index a652800..20224b5 100644 --- a/grep.h +++ b/grep.h @@ -121,14 +121,15 @@ struct grep_opt { void *output_priv; }; +struct userdiff_driver; + extern void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t); extern void append_grep_pattern(struct grep_opt *opt, const char *pat, const char *origin, int no, enum grep_pat_token t); extern void append_header_grep_pattern(struct grep_opt *, enum grep_header_field, const char *); extern void compile_grep_patterns(struct grep_opt *opt); extern void free_grep_patterns(struct grep_opt *opt); -extern int grep_buffer(struct grep_opt *opt, const char *name, char *buf, unsigned long size); +extern int grep_buffer(struct grep_opt *opt, const char *name, struct userdiff_driver *userdiff_driver, char *buf, unsigned long size); extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); -extern int grep_threads_ok(const struct grep_opt *opt); #endif diff --git a/revision.c b/revision.c index 8764dde..95cb8c2 100644 --- a/revision.c +++ b/revision.c @@ -2126,7 +2126,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt) if (!opt->grep_filter.pattern_list && !opt->grep_filter.header_list) return 1; return grep_buffer(&opt->grep_filter, - NULL, /* we say nothing, not even filename */ + NULL, NULL, /* we say nothing, not even filename */ commit->buffer, strlen(commit->buffer)); } -- 1.7.7 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH] grep: enable multi-threading for -p and -W 2011-11-26 12:15 ` [PATCH] grep: enable multi-threading for -p and -W René Scharfe @ 2011-11-29 9:54 ` Thomas Rast 2011-11-29 13:49 ` René Scharfe 0 siblings, 1 reply; 50+ messages in thread From: Thomas Rast @ 2011-11-29 9:54 UTC (permalink / raw) To: René Scharfe; +Cc: git, Junio C Hamano René Scharfe wrote: > Move attribute reading, which is not thread-safe, into add_work(), under > grep_mutex. That way we can stop turning off multi-threading if -p or -W > is given, as the diff attribute for each file is gotten safely now. > > Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> > --- > Goes on top of your patch. Needs testing.. On a random old linux-2.6 clone at v2.6.37-rc2, I'm seeing (best of 5): * none of the patches: git grep --cached INITRAMFS_ROOT_UID 2.13user 0.14system 0:02.10elapsed git grep --cached -W INITRAMFS_ROOT_UID # note: broken! 2.23user 0.18system 0:02.21elapsed * my patch, but not yours: git grep --cached INITRAMFS_ROOT_UID 2.21user 0.21system 0:02.27elapsed git grep --cached -W INITRAMFS_ROOT_UID 3.01user 0.05system 0:03.07elapsed * my patch + your patch: git grep --cached INITRAMFS_ROOT_UID 2.22user 0.17system 0:02.22elapsed git grep --cached -W INITRAMFS_ROOT_UID 4.45user 0.22system 0:02.67elapsed So while it ends up being faster overall, it also uses way more CPU and would presumably be *slower* on a single-processor system. Apparently those attribute lookups really hurt. So I had the following idea: if we load attributes only very lazily (that is, when match_funcname() is first called), we can ask for them much more rarely. The revised timings: * my patch + the new patch at the end: git grep --cached INITRAMFS_ROOT_UID 2.15user 0.16system 0:02.15elapsed 107%CPU git grep --cached -W INITRAMFS_ROOT_UID 2.20user 0.18system 0:02.24elapsed However, locking on a specific lock relies on the fact that the attributes are only read from the tree, but never from the object store. Perhaps it would be more futureproof to lock on read_sha1_mutex instead. Either way the lazy attributes lookup seems a big win. ------ 8< ------ 8< ------ Subject: [PATCH] grep: fine-grained locking around attributes access Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. diff --git a/builtin/grep.c b/builtin/grep.c index 988ea1d..822b32f 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt) pthread_mutex_init(&grep_mutex, NULL); pthread_mutex_init(&read_sha1_mutex, NULL); + pthread_mutex_init(&grep_attr_mutex, NULL); pthread_cond_init(&cond_add, NULL); pthread_cond_init(&cond_write, NULL); pthread_cond_init(&cond_result, NULL); @@ -303,6 +304,7 @@ static int wait_all(void) pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&read_sha1_mutex); + pthread_mutex_destroy(&grep_attr_mutex); pthread_cond_destroy(&cond_add); pthread_cond_destroy(&cond_write); pthread_cond_destroy(&cond_result); @@ -1002,9 +1004,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix) opt.regflags |= REG_ICASE; #ifndef NO_PTHREADS - if (online_cpus() == 1 || !grep_threads_ok(&opt)) + if (online_cpus() == 1) use_threads = 0; +#endif + + opt.use_threads = use_threads; +#ifndef NO_PTHREADS if (use_threads) { if (opt.pre_context || opt.post_context || opt.file_break || opt.funcbody) diff --git a/grep.c b/grep.c index 7a070e9..e9c3df3 100644 --- a/grep.c +++ b/grep.c @@ -2,6 +2,7 @@ #include "grep.h" #include "userdiff.h" #include "xdiff-interface.h" +#include "thread-utils.h" void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat) { @@ -806,10 +807,38 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, opt->output(opt, "\n", 1); } -static int match_funcname(struct grep_opt *opt, char *bol, char *eol) +#ifndef NO_PTHREADS +pthread_mutex_t grep_attr_mutex; + +static inline void grep_attr_lock(struct grep_opt *opt) +{ + if (opt->use_threads) + pthread_mutex_lock(&grep_attr_mutex); +} + +static inline void grep_attr_unlock(struct grep_opt *opt) +{ + if (opt->use_threads) + pthread_mutex_unlock(&grep_attr_mutex); +} +#endif + +static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol) { xdemitconf_t *xecfg = opt->priv; - if (xecfg && xecfg->find_func) { + if (xecfg && !xecfg->find_func) { + grep_attr_lock(opt); + struct userdiff_driver *drv = userdiff_find_by_path(name); + grep_attr_unlock(opt); + if (drv && drv->funcname.pattern) { + const struct userdiff_funcname *pe = &drv->funcname; + xdiff_set_find_func(xecfg, pe->pattern, pe->cflags); + } else { + xecfg = opt->priv = NULL; + } + } + + if (xecfg) { char buf[1]; return xecfg->find_func(bol, eol - bol, buf, 1, xecfg->find_func_priv) >= 0; @@ -835,7 +864,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name, if (lno <= opt->last_shown) break; - if (match_funcname(opt, bol, eol)) { + if (match_funcname(opt, name, bol, eol)) { show_line(opt, bol, eol, name, lno, '='); break; } @@ -848,7 +877,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, unsigned cur = lno, from = 1, funcname_lno = 0; int funcname_needed = !!opt->funcname; - if (opt->funcbody && !match_funcname(opt, bol, end)) + if (opt->funcbody && !match_funcname(opt, name, bol, end)) funcname_needed = 2; if (opt->pre_context < lno) @@ -864,7 +893,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, while (bol > buf && bol[-1] != '\n') bol--; cur--; - if (funcname_needed && match_funcname(opt, bol, eol)) { + if (funcname_needed && match_funcname(opt, name, bol, eol)) { funcname_lno = cur; funcname_needed = 0; } @@ -942,19 +971,6 @@ static int look_ahead(struct grep_opt *opt, return 0; } -int grep_threads_ok(const struct grep_opt *opt) -{ - /* If this condition is true, then we may use the attribute - * machinery in grep_buffer_1. The attribute code is not - * thread safe, so we disable the use of threads. - */ - if ((opt->funcname || opt->funcbody) - && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) - return 0; - - return 1; -} - static void std_output(struct grep_opt *opt, const void *buf, size_t size) { fwrite(buf, size, 1, stdout); @@ -1011,12 +1027,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, if ((opt->funcname || opt->funcbody) && !opt->unmatch_name_only && !opt->status_only && !opt->name_only && !binary_match_only && !collect_hits) { - struct userdiff_driver *drv = userdiff_find_by_path(name); - if (drv && drv->funcname.pattern) { - const struct userdiff_funcname *pe = &drv->funcname; - xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags); - opt->priv = &xecfg; - } + opt->priv = &xecfg; } try_lookahead = should_lookahead(opt); @@ -1093,7 +1104,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, show_function = 1; goto next_line; } - if (show_function && match_funcname(opt, bol, eol)) + if (show_function && match_funcname(opt, name, bol, eol)) show_function = 0; if (show_function || (last_hit && lno <= last_hit + opt->post_context)) { diff --git a/grep.h b/grep.h index a652800..15d227c 100644 --- a/grep.h +++ b/grep.h @@ -115,6 +115,7 @@ struct grep_opt { int show_hunk_mark; int file_break; int heading; + int use_threads; void *priv; void (*output)(struct grep_opt *opt, const void *data, size_t size); @@ -131,4 +132,10 @@ struct grep_opt { extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); extern int grep_threads_ok(const struct grep_opt *opt); +#ifndef NO_PTHREADS +/* Mutex used around access to the attributes machinery if + * opt->use_threads. Must be initialized/destroyed by callers! */ +extern pthread_mutex_t grep_attr_mutex; +#endif + #endif -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH] grep: enable multi-threading for -p and -W 2011-11-29 9:54 ` Thomas Rast @ 2011-11-29 13:49 ` René Scharfe 2011-11-29 14:07 ` Thomas Rast 0 siblings, 1 reply; 50+ messages in thread From: René Scharfe @ 2011-11-29 13:49 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Junio C Hamano Am 29.11.2011 10:54, schrieb Thomas Rast: > René Scharfe wrote: >> Move attribute reading, which is not thread-safe, into add_work(), under >> grep_mutex. That way we can stop turning off multi-threading if -p or -W >> is given, as the diff attribute for each file is gotten safely now. >> >> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> >> --- >> Goes on top of your patch. Needs testing.. > > On a random old linux-2.6 clone at v2.6.37-rc2, I'm seeing (best of 5): > > * none of the patches: > git grep --cached INITRAMFS_ROOT_UID > 2.13user 0.14system 0:02.10elapsed > git grep --cached -W INITRAMFS_ROOT_UID # note: broken! > 2.23user 0.18system 0:02.21elapsed Broken is a tad too hard a word; IIUC -W just lacks support for function patterns of different file types, which is unintended and surprising of course. For the default case it works correctly (and threaded). > * my patch, but not yours: > git grep --cached INITRAMFS_ROOT_UID > 2.21user 0.21system 0:02.27elapsed > git grep --cached -W INITRAMFS_ROOT_UID > 3.01user 0.05system 0:03.07elapsed > > * my patch + your patch: > git grep --cached INITRAMFS_ROOT_UID > 2.22user 0.17system 0:02.22elapsed > git grep --cached -W INITRAMFS_ROOT_UID > 4.45user 0.22system 0:02.67elapsed > > So while it ends up being faster overall, it also uses way more CPU > and would presumably be *slower* on a single-processor system. > Apparently those attribute lookups really hurt. Hmm, why are they *that* expensive? callgrind tells me that userdiff_find_by_path() contributes only 0.18% to the total cost with your first patch. Timings in my virtual machine are very volatile, but it seems that here the difference is in the system time while user is basically the same for all combinations of patches. > So I had the following idea: if we load attributes only very lazily > (that is, when match_funcname() is first called), we can ask for them > much more rarely. The revised timings: > > * my patch + the new patch at the end: > git grep --cached INITRAMFS_ROOT_UID > 2.15user 0.16system 0:02.15elapsed 107%CPU > git grep --cached -W INITRAMFS_ROOT_UID > 2.20user 0.18system 0:02.24elapsed Nice. I wonder what caused the slight speedup for the case without -W. It's probably just noise, though. > However, locking on a specific lock relies on the fact that the > attributes are only read from the tree, but never from the object > store. Perhaps it would be more futureproof to lock on > read_sha1_mutex instead. Either way the lazy attributes lookup seems > a big win. You could lock read_sha1_mutex after grep_attr_mutex. With lazy loading it should have a low impact most of the time. > ------ 8< ------ 8< ------ > Subject: [PATCH] grep: fine-grained locking around attributes access > > Lazily load the userdiff attributes in match_funcname(). Use a > separate mutex around this loading to protect the (not thread-safe) > attributes machinery. This lets us re-enable threading with -p and > -W while reducing the overhead caused by looking up attributes. > > diff --git a/builtin/grep.c b/builtin/grep.c > index 988ea1d..822b32f 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt) > > pthread_mutex_init(&grep_mutex, NULL); > pthread_mutex_init(&read_sha1_mutex, NULL); > + pthread_mutex_init(&grep_attr_mutex, NULL); > pthread_cond_init(&cond_add, NULL); > pthread_cond_init(&cond_write, NULL); > pthread_cond_init(&cond_result, NULL); > @@ -303,6 +304,7 @@ static int wait_all(void) > > pthread_mutex_destroy(&grep_mutex); > pthread_mutex_destroy(&read_sha1_mutex); > + pthread_mutex_destroy(&grep_attr_mutex); > pthread_cond_destroy(&cond_add); > pthread_cond_destroy(&cond_write); > pthread_cond_destroy(&cond_result); > @@ -1002,9 +1004,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > opt.regflags |= REG_ICASE; > > #ifndef NO_PTHREADS > - if (online_cpus() == 1 || !grep_threads_ok(&opt)) > + if (online_cpus() == 1) > use_threads = 0; > +#endif > + > + opt.use_threads = use_threads; > > +#ifndef NO_PTHREADS > if (use_threads) { > if (opt.pre_context || opt.post_context || opt.file_break || > opt.funcbody) > diff --git a/grep.c b/grep.c > index 7a070e9..e9c3df3 100644 > --- a/grep.c > +++ b/grep.c > @@ -2,6 +2,7 @@ > #include "grep.h" > #include "userdiff.h" > #include "xdiff-interface.h" > +#include "thread-utils.h" > > void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat) > { > @@ -806,10 +807,38 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, > opt->output(opt, "\n", 1); > } > > -static int match_funcname(struct grep_opt *opt, char *bol, char *eol) > +#ifndef NO_PTHREADS > +pthread_mutex_t grep_attr_mutex; > + > +static inline void grep_attr_lock(struct grep_opt *opt) > +{ > + if (opt->use_threads) > + pthread_mutex_lock(&grep_attr_mutex); > +} > + > +static inline void grep_attr_unlock(struct grep_opt *opt) > +{ > + if (opt->use_threads) > + pthread_mutex_unlock(&grep_attr_mutex); > +} > +#endif We'd need stubs here for the case of NO_PTHREADS being defined. > + > +static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol) > { > xdemitconf_t *xecfg = opt->priv; > - if (xecfg && xecfg->find_func) { > + if (xecfg && !xecfg->find_func) { > + grep_attr_lock(opt); > + struct userdiff_driver *drv = userdiff_find_by_path(name); > + grep_attr_unlock(opt); > + if (drv && drv->funcname.pattern) { > + const struct userdiff_funcname *pe = &drv->funcname; > + xdiff_set_find_func(xecfg, pe->pattern, pe->cflags); > + } else { > + xecfg = opt->priv = NULL; > + } > + } Perhaps leave the thread stuff in builtin/grep.c and export a function from there that does the above, with locking and all? > + > + if (xecfg) { > char buf[1]; > return xecfg->find_func(bol, eol - bol, buf, 1, > xecfg->find_func_priv) >= 0; > @@ -835,7 +864,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name, > if (lno <= opt->last_shown) > break; > > - if (match_funcname(opt, bol, eol)) { > + if (match_funcname(opt, name, bol, eol)) { > show_line(opt, bol, eol, name, lno, '='); > break; > } > @@ -848,7 +877,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, > unsigned cur = lno, from = 1, funcname_lno = 0; > int funcname_needed = !!opt->funcname; > > - if (opt->funcbody && !match_funcname(opt, bol, end)) > + if (opt->funcbody && !match_funcname(opt, name, bol, end)) > funcname_needed = 2; > > if (opt->pre_context < lno) > @@ -864,7 +893,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, > while (bol > buf && bol[-1] != '\n') > bol--; > cur--; > - if (funcname_needed && match_funcname(opt, bol, eol)) { > + if (funcname_needed && match_funcname(opt, name, bol, eol)) { > funcname_lno = cur; > funcname_needed = 0; > } > @@ -942,19 +971,6 @@ static int look_ahead(struct grep_opt *opt, > return 0; > } > > -int grep_threads_ok(const struct grep_opt *opt) > -{ > - /* If this condition is true, then we may use the attribute > - * machinery in grep_buffer_1. The attribute code is not > - * thread safe, so we disable the use of threads. > - */ > - if ((opt->funcname || opt->funcbody) > - && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) > - return 0; > - > - return 1; > -} > - > static void std_output(struct grep_opt *opt, const void *buf, size_t size) > { > fwrite(buf, size, 1, stdout); > @@ -1011,12 +1027,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, > if ((opt->funcname || opt->funcbody) > && !opt->unmatch_name_only && !opt->status_only && > !opt->name_only && !binary_match_only && !collect_hits) { > - struct userdiff_driver *drv = userdiff_find_by_path(name); > - if (drv && drv->funcname.pattern) { > - const struct userdiff_funcname *pe = &drv->funcname; > - xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags); > - opt->priv = &xecfg; > - } > + opt->priv = &xecfg; > } opt->priv can be set unconditionally here now, since we only access it from within match_funcname() and that function is not called if any of the short-circuit options is set. > try_lookahead = should_lookahead(opt); > > @@ -1093,7 +1104,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, > show_function = 1; > goto next_line; > } > - if (show_function && match_funcname(opt, bol, eol)) > + if (show_function && match_funcname(opt, name, bol, eol)) > show_function = 0; > if (show_function || > (last_hit && lno <= last_hit + opt->post_context)) { > diff --git a/grep.h b/grep.h > index a652800..15d227c 100644 > --- a/grep.h > +++ b/grep.h > @@ -115,6 +115,7 @@ struct grep_opt { > int show_hunk_mark; > int file_break; > int heading; > + int use_threads; > void *priv; > > void (*output)(struct grep_opt *opt, const void *data, size_t size); > @@ -131,4 +132,10 @@ struct grep_opt { > extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); > extern int grep_threads_ok(const struct grep_opt *opt); > > +#ifndef NO_PTHREADS > +/* Mutex used around access to the attributes machinery if > + * opt->use_threads. Must be initialized/destroyed by callers! */ > +extern pthread_mutex_t grep_attr_mutex; > +#endif > + > #endif > > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH] grep: enable multi-threading for -p and -W 2011-11-29 13:49 ` René Scharfe @ 2011-11-29 14:07 ` Thomas Rast 2011-12-02 13:07 ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast 0 siblings, 1 reply; 50+ messages in thread From: Thomas Rast @ 2011-11-29 14:07 UTC (permalink / raw) To: René Scharfe; +Cc: git, Junio C Hamano René Scharfe wrote: > > * none of the patches: > > git grep --cached INITRAMFS_ROOT_UID > > 2.13user 0.14system 0:02.10elapsed > > git grep --cached -W INITRAMFS_ROOT_UID # note: broken! > > 2.23user 0.18system 0:02.21elapsed > > Broken is a tad too hard a word Sorry, I just wanted to mark it as: this is unattainable since we're now doing extra work. > > * my patch, but not yours: > > git grep --cached -W INITRAMFS_ROOT_UID > > 3.01user 0.05system 0:03.07elapsed > > > > * my patch + your patch: > > git grep --cached -W INITRAMFS_ROOT_UID > > 4.45user 0.22system 0:02.67elapsed > > > > So while it ends up being faster overall, it also uses way more CPU > > and would presumably be *slower* on a single-processor system. > > Apparently those attribute lookups really hurt. > > Hmm, why are they *that* expensive? > > callgrind tells me that userdiff_find_by_path() contributes only 0.18% > to the total cost with your first patch. Timings in my virtual machine > are very volatile, but it seems that here the difference is in the > system time while user is basically the same for all combinations of > patches. Not sure, perhaps callgrind does not model syscalls as expensive enough. I also suspect (from my very cursory look at the attributes machinery) that loading attributes for all paths *in random order* (i.e., threaded) causes a lot of discards of the existing attributes data. (The order is still random with my new patch, but we only load them for files that have matches.) > I wonder what caused the slight speedup for the case without -W. It's > probably just noise, though. Yeah, it's very noisy, but I was too lazy for best-of-50 or something ;-) [...] > > +#ifndef NO_PTHREADS > > +static inline void grep_attr_lock(struct grep_opt *opt) [...] > We'd need stubs here for the case of NO_PTHREADS being defined. Right, thanks. Sorry for not testing with NO_PTHREADS to begin with. > Perhaps leave the thread stuff in builtin/grep.c and export a function > from there that does [the userdiff lookup], with locking and all? That seems like a layering violation to me. Isn't builtin/grep.c supposed to call out to grep.c, but not the other way around? Maybe it would be less messy if we had a global (across all of git) flag that says whether threads are on. Then subsystems that are used from threaded code, but cannot handle it, can learn to lock themselves around their work. But it would be pretty much the opposite of what git-grep now does. Thanks for the review. I'll reroll as a proper patch later. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v2 0/3] grep multithreading and scaling 2011-11-29 14:07 ` Thomas Rast @ 2011-12-02 13:07 ` Thomas Rast 2011-12-02 13:07 ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast ` (4 more replies) 0 siblings, 5 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw) To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano [Eric, I measured some numbers that may be interesting to the discussion about b2924dc. See below.] This round wraps up the original patch I posted, plus the draft patch I posted inline the other day with René's review taken into account. I also added a patch that rips out threading in the non-worktree case; read on for the reasoning. René Scharfe wrote: > Hmm, why are [gitattributes lookups] that expensive? > > callgrind tells me that userdiff_find_by_path() contributes only 0.18% > to the total cost with your first patch. Timings in my virtual machine > are very volatile, but it seems that here the difference is in the > system time while user is basically the same for all combinations of > patches. Well, turns out I was measuring something completely stupid. I had git grep --cached -W INITRAMFS_ROOT_UID where I put the --cached originally because that makes it independent of the worktree (which in the very first measurements I still had wiped, as I tend to do for this repo; I checked it out again after that). This in fact gives me (~/g/git-grep --cached INITRAMFS_ROOT_UID, leaving aside -W; best of 10): THREADS=8: 2.88user 0.21system 0:02.94elapsed THREADS=4: 2.89user 0.29system 0:02.99elapsed THREADS=2: 2.83user 0.36system 0:02.87elapsed NO_PTHREADS: 2.16user 0.08system 0:02.25elapsed Uhuh. Doesn't scale so well after all. But removing the --cached, as most people probably would: THREADS=8: 0.19user 0.32system 0:00.16elapsed THREADS=4: 0.16user 0.34system 0:00.17elapsed THREADS=2: 0.18user 0.32system 0:00.26elapsed NO_PTHREADS: 0.12user 0.17system 0:00.31elapsed So I conclude that during any grep that cannot use the worktree, having any threads hurts. In addition, during a grep that *can* use the worktree, THREADS=8 still helps somewhat on my dual-core i7, though it goes downhill from there (12 is again as fast as 4; I verified these details using best-of-50 timings, and it is reproducible.) I have also run timings on a 2*6-core workstation running OS X, where performance is best at 5 cores: 2 threads: 0.96 real 0.41 user 1.27 sys 3 threads: 0.68 real 0.41 user 1.30 sys 4 threads: 0.54 real 0.43 user 1.63 sys 5 threads: 0.50 real 0.41 user 1.51 sys 6 threads: 0.54 real 0.43 user 1.63 sys 7 threads: 0.86 real 0.49 user 1.93 sys 8 threads: 0.98 real 0.51 user 2.07 sys I kid you not. That's best-of-50 and rather stable. It's on the same tree as the Linux machine too, except for the problem that the OS X FS is set to case-insensitive and thus cannot represent the tree exactly. So from git's POV, there are unstaged changes. Sadly I do not have access to a Linux box having more than 2 physical cores. If you have one, please run some tests :-) So based on my measurements, I would suggest that unless we have evidence of it scaling beyond 8 cores on some machine, b2924dc (grep: detect number of CPUs for thread spawning) be dropped. For now I'm ignoring the problem that on OS X it doesn't even scale to 8; I'd rather check how it fares on Linux first. I added a third patch on top that disables threading in any case that does not hit the worktree. I wonder if I missed something or if it really is that simple. The neat part is that it's also a reduction in code required, and at the same time avoids any issues 2/3 might have with a future attributes-from-trees implementation. With this I get worktree, 8 threads: 0.15user 0.37system 0:00.17elapsed --cached, 8 threads: 2.18user 0.07system 0:02.27elapsed Of course, we could probably gain a huge boost if the read_sha1 machinery could be made threaded, so that it can unpack several objects at a time. In addition, I can well imagine that there are combinations of delta density, object size, and luck where it pays off to grep in parallel. Do we care? Now I really should do something else than fretting over the sub-second performance of git-grep... Thomas Rast (3): grep: load funcname patterns for -W grep: enable threading with -p and -W using lazy attribute lookup grep: disable threading in all but worktree case builtin/grep.c | 153 ++++++++++++++++-------------------------------------- grep.c | 73 ++++++++++++++++---------- grep.h | 7 +++ t/t7810-grep.sh | 14 +++++ 4 files changed, 112 insertions(+), 135 deletions(-) -- 1.7.8.rc4.388.ge53ab ^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v2 1/3] grep: load funcname patterns for -W 2011-12-02 13:07 ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast @ 2011-12-02 13:07 ` Thomas Rast 2011-12-02 13:07 ` [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast ` (3 subsequent siblings) 4 siblings, 0 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw) To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano git-grep avoids loading the funcname patterns unless they are needed. ba8ea74 (grep: add option to show whole function as context, 2011-08-01) forgot to extend this test also to the new funcbody feature. Do so. The catch is that we also have to disable threading when using userdiff, as explained in grep_threads_ok(). So we must be careful to introduce the same test there. --- grep.c | 7 ++++--- t/t7810-grep.sh | 14 ++++++++++++++ 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/grep.c b/grep.c index b29d09c..7a070e9 100644 --- a/grep.c +++ b/grep.c @@ -948,8 +948,8 @@ int grep_threads_ok(const struct grep_opt *opt) * machinery in grep_buffer_1. The attribute code is not * thread safe, so we disable the use of threads. */ - if (opt->funcname && !opt->unmatch_name_only && !opt->status_only && - !opt->name_only) + if ((opt->funcname || opt->funcbody) + && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) return 0; return 1; @@ -1008,7 +1008,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, } memset(&xecfg, 0, sizeof(xecfg)); - if (opt->funcname && !opt->unmatch_name_only && !opt->status_only && + if ((opt->funcname || opt->funcbody) + && !opt->unmatch_name_only && !opt->status_only && !opt->name_only && !binary_match_only && !collect_hits) { struct userdiff_driver *drv = userdiff_find_by_path(name); if (drv && drv->funcname.pattern) { diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 81263b7..7ba5b16 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -523,6 +523,20 @@ test_expect_success 'grep -W' ' test_cmp expected actual ' +cat >expected <<EOF +hello.c= printf("Hello world.\n"); +hello.c: return 0; +hello.c- /* char ?? */ +EOF + +test_expect_success 'grep -W with userdiff' ' + test_when_finished "rm -f .gitattributes" && + git config diff.custom.xfuncname "(printf.*|})$" && + echo "hello.c diff=custom" >.gitattributes && + git grep -W return >actual && + test_cmp expected actual +' + test_expect_success 'grep from a subdirectory to search wider area (1)' ' mkdir -p s && ( -- 1.7.8.rc4.388.ge53ab ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup 2011-12-02 13:07 ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast 2011-12-02 13:07 ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast @ 2011-12-02 13:07 ` Thomas Rast 2011-12-02 13:07 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast ` (2 subsequent siblings) 4 siblings, 0 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw) To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- builtin/grep.c | 10 +++++++- grep.c | 74 ++++++++++++++++++++++++++++++++++---------------------- grep.h | 7 +++++ 3 files changed, 61 insertions(+), 30 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 988ea1d..65b1ffe 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt) pthread_mutex_init(&grep_mutex, NULL); pthread_mutex_init(&read_sha1_mutex, NULL); + pthread_mutex_init(&grep_attr_mutex, NULL); pthread_cond_init(&cond_add, NULL); pthread_cond_init(&cond_write, NULL); pthread_cond_init(&cond_result, NULL); @@ -303,6 +304,7 @@ static int wait_all(void) pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&read_sha1_mutex); + pthread_mutex_destroy(&grep_attr_mutex); pthread_cond_destroy(&cond_add); pthread_cond_destroy(&cond_write); pthread_cond_destroy(&cond_result); @@ -1002,9 +1004,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix) opt.regflags |= REG_ICASE; #ifndef NO_PTHREADS - if (online_cpus() == 1 || !grep_threads_ok(&opt)) + if (online_cpus() == 1) use_threads = 0; +#else + use_threads = 0; +#endif + opt.use_threads = use_threads; + +#ifndef NO_PTHREADS if (use_threads) { if (opt.pre_context || opt.post_context || opt.file_break || opt.funcbody) diff --git a/grep.c b/grep.c index 7a070e9..4dd7da2 100644 --- a/grep.c +++ b/grep.c @@ -2,6 +2,7 @@ #include "grep.h" #include "userdiff.h" #include "xdiff-interface.h" +#include "thread-utils.h" void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat) { @@ -806,10 +807,46 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, opt->output(opt, "\n", 1); } -static int match_funcname(struct grep_opt *opt, char *bol, char *eol) +#ifndef NO_PTHREADS +/* + * This lock protects access to the gitattributes machinery, which is + * not thread-safe. + */ +pthread_mutex_t grep_attr_mutex; + +static inline void grep_attr_lock(struct grep_opt *opt) +{ + if (opt->use_threads) + pthread_mutex_lock(&grep_attr_mutex); +} + +static inline void grep_attr_unlock(struct grep_opt *opt) +{ + if (opt->use_threads) + pthread_mutex_unlock(&grep_attr_mutex); +} +#else +#define grep_attr_lock(opt) +#define grep_attr_unlock(opt) +#endif + +static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol) { xdemitconf_t *xecfg = opt->priv; - if (xecfg && xecfg->find_func) { + if (xecfg && !xecfg->find_func) { + struct userdiff_driver *drv; + grep_attr_lock(opt); + drv = userdiff_find_by_path(name); + grep_attr_unlock(opt); + if (drv && drv->funcname.pattern) { + const struct userdiff_funcname *pe = &drv->funcname; + xdiff_set_find_func(xecfg, pe->pattern, pe->cflags); + } else { + xecfg = opt->priv = NULL; + } + } + + if (xecfg) { char buf[1]; return xecfg->find_func(bol, eol - bol, buf, 1, xecfg->find_func_priv) >= 0; @@ -835,7 +872,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name, if (lno <= opt->last_shown) break; - if (match_funcname(opt, bol, eol)) { + if (match_funcname(opt, name, bol, eol)) { show_line(opt, bol, eol, name, lno, '='); break; } @@ -848,7 +885,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, unsigned cur = lno, from = 1, funcname_lno = 0; int funcname_needed = !!opt->funcname; - if (opt->funcbody && !match_funcname(opt, bol, end)) + if (opt->funcbody && !match_funcname(opt, name, bol, end)) funcname_needed = 2; if (opt->pre_context < lno) @@ -864,7 +901,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, while (bol > buf && bol[-1] != '\n') bol--; cur--; - if (funcname_needed && match_funcname(opt, bol, eol)) { + if (funcname_needed && match_funcname(opt, name, bol, eol)) { funcname_lno = cur; funcname_needed = 0; } @@ -942,19 +979,6 @@ static int look_ahead(struct grep_opt *opt, return 0; } -int grep_threads_ok(const struct grep_opt *opt) -{ - /* If this condition is true, then we may use the attribute - * machinery in grep_buffer_1. The attribute code is not - * thread safe, so we disable the use of threads. - */ - if ((opt->funcname || opt->funcbody) - && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) - return 0; - - return 1; -} - static void std_output(struct grep_opt *opt, const void *buf, size_t size) { fwrite(buf, size, 1, stdout); @@ -1008,16 +1032,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, } memset(&xecfg, 0, sizeof(xecfg)); - if ((opt->funcname || opt->funcbody) - && !opt->unmatch_name_only && !opt->status_only && - !opt->name_only && !binary_match_only && !collect_hits) { - struct userdiff_driver *drv = userdiff_find_by_path(name); - if (drv && drv->funcname.pattern) { - const struct userdiff_funcname *pe = &drv->funcname; - xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags); - opt->priv = &xecfg; - } - } + opt->priv = &xecfg; + try_lookahead = should_lookahead(opt); while (left) { @@ -1093,7 +1109,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, show_function = 1; goto next_line; } - if (show_function && match_funcname(opt, bol, eol)) + if (show_function && match_funcname(opt, name, bol, eol)) show_function = 0; if (show_function || (last_hit && lno <= last_hit + opt->post_context)) { diff --git a/grep.h b/grep.h index a652800..15d227c 100644 --- a/grep.h +++ b/grep.h @@ -115,6 +115,7 @@ struct grep_opt { int show_hunk_mark; int file_break; int heading; + int use_threads; void *priv; void (*output)(struct grep_opt *opt, const void *data, size_t size); @@ -131,4 +132,10 @@ struct grep_opt { extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); extern int grep_threads_ok(const struct grep_opt *opt); +#ifndef NO_PTHREADS +/* Mutex used around access to the attributes machinery if + * opt->use_threads. Must be initialized/destroyed by callers! */ +extern pthread_mutex_t grep_attr_mutex; +#endif + #endif -- 1.7.8.rc4.388.ge53ab ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-02 13:07 ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast 2011-12-02 13:07 ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast 2011-12-02 13:07 ` [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast @ 2011-12-02 13:07 ` Thomas Rast 2011-12-02 16:15 ` René Scharfe 2011-12-23 22:37 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason 2011-12-02 17:34 ` [PATCH v2 0/3] grep multithreading and scaling Jeff King 2011-12-02 20:02 ` Eric Herman 4 siblings, 2 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-02 13:07 UTC (permalink / raw) To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano Measuring grep performance showed that in all but the worktree case (as opposed to --cached, <committish> or <treeish>), threading actually slows things down. For example, on my dual-core hyperthreaded i7 in a linux-2.6.git at v2.6.37-rc2, I got: Threads worktree case | --cached case -------------------------------------------------------------------------- 8 (default) | 2.17user 0.15sys 0:02.20real | 0.11user 0.26sys 0:00.11real 4 | 2.06user 0.17sys 0:02.08real | 0.11user 0.26sys 0:00.12real 2 | 2.02user 0.25sys 0:02.08real | 0.15user 0.37sys 0:00.28real NO_PTHREADS | 1.57user 0.05sys 0:01.64real | 0.09user 0.12sys 0:00.22real I conjecture that this is caused by contention on read_sha1_mutex. So disable threading entirely when not scanning the worktree, to get the NO_PTHREADS performance in that case. This obsoletes all code related to grep_sha1_async. The thread startup must be delayed until after all arguments have been parsed, but this does not have a measurable effect. --- builtin/grep.c | 157 ++++++++++++++++---------------------------------------- 1 files changed, 44 insertions(+), 113 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 65b1ffe..edf6a31 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -34,21 +34,13 @@ const char *name); static void *load_file(const char *filename, size_t *sz); -enum work_type {WORK_SHA1, WORK_FILE}; - /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the * consumers pick work items from the same array. */ struct work_item { - enum work_type type; char *name; - - /* if type == WORK_SHA1, then 'identifier' is a SHA1, - * otherwise type == WORK_FILE, and 'identifier' is a NUL - * terminated filename. - */ - void *identifier; + char *filename; char done; struct strbuf out; }; @@ -86,21 +78,6 @@ static inline void grep_unlock(void) pthread_mutex_unlock(&grep_mutex); } -/* Used to serialize calls to read_sha1_file. */ -static pthread_mutex_t read_sha1_mutex; - -static inline void read_sha1_lock(void) -{ - if (use_threads) - pthread_mutex_lock(&read_sha1_mutex); -} - -static inline void read_sha1_unlock(void) -{ - if (use_threads) - pthread_mutex_unlock(&read_sha1_mutex); -} - /* Signalled when a new work_item is added to todo. */ static pthread_cond_t cond_add; @@ -114,7 +91,7 @@ static inline void read_sha1_unlock(void) static int skip_first_line; -static void add_work(enum work_type type, char *name, void *id) +static void add_work(char *name, char *filename) { grep_lock(); @@ -122,9 +99,8 @@ static void add_work(enum work_type type, char *name, void *id) pthread_cond_wait(&cond_write, &grep_mutex); } - todo[todo_end].type = type; todo[todo_end].name = name; - todo[todo_end].identifier = id; + todo[todo_end].filename = filename; todo[todo_end].done = 0; strbuf_reset(&todo[todo_end].out); todo_end = (todo_end + 1) % ARRAY_SIZE(todo); @@ -152,19 +128,10 @@ static void add_work(enum work_type type, char *name, void *id) return ret; } -static void grep_sha1_async(struct grep_opt *opt, char *name, - const unsigned char *sha1) -{ - unsigned char *s; - s = xmalloc(20); - memcpy(s, sha1, 20); - add_work(WORK_SHA1, name, s); -} - static void grep_file_async(struct grep_opt *opt, char *name, const char *filename) { - add_work(WORK_FILE, name, xstrdup(filename)); + add_work(name, xstrdup(filename)); } static void work_done(struct work_item *w) @@ -194,7 +161,7 @@ static void work_done(struct work_item *w) write_or_die(1, p, len); } free(w->name); - free(w->identifier); + free(w->filename); } if (old_done != todo_done) @@ -213,29 +180,18 @@ static void work_done(struct work_item *w) while (1) { struct work_item *w = get_work(); + size_t sz; + void* data; + if (!w) break; opt->output_priv = w; - if (w->type == WORK_SHA1) { - unsigned long sz; - void* data = load_sha1(w->identifier, &sz, w->name); - - if (data) { - hit |= grep_buffer(opt, w->name, data, sz); - free(data); - } - } else if (w->type == WORK_FILE) { - size_t sz; - void* data = load_file(w->identifier, &sz); - if (data) { - hit |= grep_buffer(opt, w->name, data, sz); - free(data); - } - } else { - assert(0); + data = load_file(w->filename, &sz); + if (data) { + hit |= grep_buffer(opt, w->name, data, sz); + free(data); } - work_done(w); } free_grep_patterns(arg); @@ -255,7 +211,6 @@ static void start_threads(struct grep_opt *opt) int i; pthread_mutex_init(&grep_mutex, NULL); - pthread_mutex_init(&read_sha1_mutex, NULL); pthread_mutex_init(&grep_attr_mutex, NULL); pthread_cond_init(&cond_add, NULL); pthread_cond_init(&cond_write, NULL); @@ -303,7 +258,6 @@ static int wait_all(void) } pthread_mutex_destroy(&grep_mutex); - pthread_mutex_destroy(&read_sha1_mutex); pthread_mutex_destroy(&grep_attr_mutex); pthread_cond_destroy(&cond_add); pthread_cond_destroy(&cond_write); @@ -312,9 +266,6 @@ static int wait_all(void) return hit; } #else /* !NO_PTHREADS */ -#define read_sha1_lock() -#define read_sha1_unlock() - static int wait_all(void) { return 0; @@ -371,21 +322,11 @@ static int grep_config(const char *var, const char *value, void *cb) return 0; } -static void *lock_and_read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size) -{ - void *data; - - read_sha1_lock(); - data = read_sha1_file(sha1, type, size); - read_sha1_unlock(); - return data; -} - static void *load_sha1(const unsigned char *sha1, unsigned long *size, const char *name) { enum object_type type; - void *data = lock_and_read_sha1_file(sha1, &type, size); + void *data = read_sha1_file(sha1, &type, size); if (!data) error(_("'%s': unable to read %s"), name, sha1_to_hex(sha1)); @@ -398,6 +339,9 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1, { struct strbuf pathbuf = STRBUF_INIT; char *name; + int hit; + unsigned long sz; + void *data; if (opt->relative && opt->prefix_length) { quote_path_relative(filename + tree_name_len, -1, &pathbuf, @@ -409,25 +353,15 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1, name = strbuf_detach(&pathbuf, NULL); -#ifndef NO_PTHREADS - if (use_threads) { - grep_sha1_async(opt, name, sha1); - return 0; - } else -#endif - { - int hit; - unsigned long sz; - void *data = load_sha1(sha1, &sz, name); - if (!data) - hit = 0; - else - hit = grep_buffer(opt, name, data, sz); + data = load_sha1(sha1, &sz, name); + if (!data) + hit = 0; + else + hit = grep_buffer(opt, name, data, sz); - free(data); - free(name); - return hit; - } + free(data); + free(name); + return hit; } static void *load_file(const char *filename, size_t *sz) @@ -586,7 +520,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, void *data; unsigned long size; - data = lock_and_read_sha1_file(entry.sha1, &type, &size); + data = read_sha1_file(entry.sha1, &type, &size); if (!data) die(_("unable to read tree (%s)"), sha1_to_hex(entry.sha1)); @@ -616,10 +550,8 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, struct strbuf base; int hit, len; - read_sha1_lock(); data = read_object_with_reference(obj->sha1, tree_type, &size, NULL); - read_sha1_unlock(); if (!data) die(_("unable to read tree (%s)"), sha1_to_hex(obj->sha1)); @@ -1003,26 +935,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!opt.fixed && opt.ignore_case) opt.regflags |= REG_ICASE; -#ifndef NO_PTHREADS - if (online_cpus() == 1) - use_threads = 0; -#else - use_threads = 0; -#endif - - opt.use_threads = use_threads; - -#ifndef NO_PTHREADS - if (use_threads) { - if (opt.pre_context || opt.post_context || opt.file_break || - opt.funcbody) - skip_first_line = 1; - start_threads(&opt); - } -#else - use_threads = 0; -#endif - compile_grep_patterns(&opt); /* Check revs and then paths */ @@ -1044,6 +956,25 @@ int cmd_grep(int argc, const char **argv, const char *prefix) break; } +#ifndef NO_PTHREADS + if (online_cpus() == 1 || cached || list.nr) + use_threads = 0; +#else + use_threads = 0; +#endif + + opt.use_threads = use_threads; + +#ifndef NO_PTHREADS + if (use_threads) { + opt.use_threads = use_threads; + if (opt.pre_context || opt.post_context || opt.file_break || + opt.funcbody) + skip_first_line = 1; + start_threads(&opt); + } +#endif + /* The rest are paths */ if (!seen_dashdash) { int j; -- 1.7.8.rc4.388.ge53ab ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-02 13:07 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast @ 2011-12-02 16:15 ` René Scharfe 2011-12-05 9:02 ` Thomas Rast ` (2 more replies) 2011-12-23 22:37 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason 1 sibling, 3 replies; 50+ messages in thread From: René Scharfe @ 2011-12-02 16:15 UTC (permalink / raw) To: Thomas Rast; +Cc: Eric Herman, git, Junio C Hamano Am 02.12.2011 14:07, schrieb Thomas Rast: > Measuring grep performance showed that in all but the worktree case > (as opposed to --cached,<committish> or<treeish>), threading > actually slows things down. For example, on my dual-core > hyperthreaded i7 in a linux-2.6.git at v2.6.37-rc2, I got: > > Threads worktree case | --cached case > -------------------------------------------------------------------------- > 8 (default) | 2.17user 0.15sys 0:02.20real | 0.11user 0.26sys 0:00.11real > 4 | 2.06user 0.17sys 0:02.08real | 0.11user 0.26sys 0:00.12real > 2 | 2.02user 0.25sys 0:02.08real | 0.15user 0.37sys 0:00.28real > NO_PTHREADS | 1.57user 0.05sys 0:01.64real | 0.09user 0.12sys 0:00.22real Are the columns mixed up? > I conjecture that this is caused by contention on read_sha1_mutex. Yeah, and I wonder why we need to have this lock in the first place. In theory, multiple readers shouldn't have to affect each other at all, right? The lock could be pushed down into read_sha1_file(), or a thread-safe variant of the function added. In pratice, however, the code in sha1_file.c etc. scares me. ;-) > So disable threading entirely when not scanning the worktree, to get > the NO_PTHREADS performance in that case. This obsoletes all code > related to grep_sha1_async. The thread startup must be delayed until > after all arguments have been parsed, but this does not have a > measurable effect. This is a bit radical. I think the underlying issue that read_sha1_file() is not thread-safe can be solved eventually and then we'd need to readd that code. How about adding a parameter to control the number of threads (--threads?) instead that defaults to eight (or five) for the worktree and one for the rest? That would also make benchmarking easier. René PS: Patches one and three missed a signoff. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-02 16:15 ` René Scharfe @ 2011-12-05 9:02 ` Thomas Rast 2011-12-06 22:48 ` René Scharfe 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast 2 siblings, 0 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-05 9:02 UTC (permalink / raw) To: René Scharfe; +Cc: Eric Herman, git, Junio C Hamano René Scharfe wrote: > Am 02.12.2011 14:07, schrieb Thomas Rast: > > Measuring grep performance showed that in all but the worktree case > > (as opposed to --cached,<committish> or<treeish>), threading > > actually slows things down. For example, on my dual-core > > hyperthreaded i7 in a linux-2.6.git at v2.6.37-rc2, I got: > > > > Threads worktree case | --cached case > > -------------------------------------------------------------------------- > > 8 (default) | 2.17user 0.15sys 0:02.20real | 0.11user 0.26sys 0:00.11real > > 4 | 2.06user 0.17sys 0:02.08real | 0.11user 0.26sys 0:00.12real > > 2 | 2.02user 0.25sys 0:02.08real | 0.15user 0.37sys 0:00.28real > > NO_PTHREADS | 1.57user 0.05sys 0:01.64real | 0.09user 0.12sys 0:00.22real > > Are the columns mixed up? Indeed, sorry. In case you were wondering why this table is different from the numbers given in the cover letter: I noticed at some point that I had an incomplete checkout (apparently 'git checkout -- .' is really not the same as 'git reset --hard'... sigh). Then I saw that while the numbers were different, the conclusion was not, so I forgot to update it. > This is a bit radical. I think the underlying issue that > read_sha1_file() is not thread-safe can be solved eventually and then > we'd need to readd that code. I'm also scared of sha1_file.c, especially when it gets down to packfiles. But perhaps it wouldn't be *too* hard to do it in parallel iff the object can be read from the loose object store. > PS: Patches one and three missed a signoff. Oops, thanks, turns out I had a misconfigured alias ... -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-02 16:15 ` René Scharfe 2011-12-05 9:02 ` Thomas Rast @ 2011-12-06 22:48 ` René Scharfe 2011-12-06 23:01 ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe ` (2 more replies) 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast 2 siblings, 3 replies; 50+ messages in thread From: René Scharfe @ 2011-12-06 22:48 UTC (permalink / raw) To: git; +Cc: Eric Herman, git, Junio C Hamano Am 02.12.2011 17:15, schrieb René Scharfe: > How about adding a parameter to control the number of threads > (--threads?) instead that defaults to eight (or five) for the worktree > and one for the rest? That would also make benchmarking easier. Like this: -- >8 -- Subject: grep: add parameter --threads Allow the number of threads to be specified by the user. This makes benchmarking the performance impact of different numbers of threads much easier. Move the code for thread handling after argument parsing. This allows to change the default number of threads based on the kind of search (worktree etc.) later on. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- Applies on top of your second patch. Documentation/git-grep.txt | 4 ++ builtin/grep.c | 75 +++++++++++++++++++++++-------------------- 2 files changed, 44 insertions(+), 35 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 15d6711..47ac188 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -227,6 +227,10 @@ OPTIONS Do not output matched lines; instead, exit with status 0 when there is a match and with non-zero status when there isn't. +--threads <n>:: + Run <n> search threads in parallel. Default is 8. This option + is ignored if git was built without support for POSIX threads. + <tree>...:: Instead of searching tracked files in the working tree, search blobs in the given trees. diff --git a/builtin/grep.c b/builtin/grep.c index 65b1ffe..0bda900 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,10 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; - #ifndef NO_PTHREADS #define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; +static int nr_threads = -1; static void *load_sha1(const unsigned char *sha1, unsigned long *size, const char *name); @@ -76,13 +75,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (nr_threads > 0) pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (nr_threads > 0) pthread_mutex_unlock(&grep_mutex); } @@ -91,13 +90,13 @@ static pthread_mutex_t read_sha1_mutex; static inline void read_sha1_lock(void) { - if (use_threads) + if (nr_threads > 0) pthread_mutex_lock(&read_sha1_mutex); } static inline void read_sha1_unlock(void) { - if (use_threads) + if (nr_threads > 0) pthread_mutex_unlock(&read_sha1_mutex); } @@ -254,6 +253,8 @@ static void start_threads(struct grep_opt *opt) { int i; + threads = xcalloc(nr_threads, sizeof(pthread_t)); + pthread_mutex_init(&grep_mutex, NULL); pthread_mutex_init(&read_sha1_mutex, NULL); pthread_mutex_init(&grep_attr_mutex, NULL); @@ -265,7 +266,7 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < nr_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -296,7 +297,7 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < nr_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; @@ -309,6 +310,8 @@ static int wait_all(void) pthread_cond_destroy(&cond_write); pthread_cond_destroy(&cond_result); + free(threads); + return hit; } #else /* !NO_PTHREADS */ @@ -410,7 +413,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1, name = strbuf_detach(&pathbuf, NULL); #ifndef NO_PTHREADS - if (use_threads) { + if (nr_threads > 0) { grep_sha1_async(opt, name, sha1); return 0; } else @@ -472,7 +475,7 @@ static int grep_file(struct grep_opt *opt, const char *filename) name = strbuf_detach(&buf, NULL); #ifndef NO_PTHREADS - if (use_threads) { + if (nr_threads > 0) { grep_file_async(opt, name, filename); return 0; } else @@ -895,6 +898,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix) PARSE_OPT_OPTARG, NULL, (intptr_t)default_pager }, OPT_BOOLEAN(0, "ext-grep", &external_grep_allowed__ignored, "allow calling of grep(1) (ignored by this build)"), +#ifdef NO_PTHREADS + OPT_INTEGER(0, "threads", &nr_threads, + "handle <n> files in parallel (ignored by this build)"), +#else + OPT_INTEGER(0, "threads", &nr_threads, + "handle <n> files in parallel"), +#endif { OPTION_CALLBACK, 0, "help-all", &options, NULL, "show usage", PARSE_OPT_HIDDEN | PARSE_OPT_NOARG, help_callback }, OPT_END() @@ -995,7 +1005,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) opt.output_priv = &path_list; opt.output = append_path; string_list_append(&path_list, show_in_pager); - use_threads = 0; + nr_threads = 0; } if (!opt.pattern_list) @@ -1003,28 +1013,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!opt.fixed && opt.ignore_case) opt.regflags |= REG_ICASE; -#ifndef NO_PTHREADS - if (online_cpus() == 1) - use_threads = 0; -#else - use_threads = 0; -#endif - - opt.use_threads = use_threads; - -#ifndef NO_PTHREADS - if (use_threads) { - if (opt.pre_context || opt.post_context || opt.file_break || - opt.funcbody) - skip_first_line = 1; - start_threads(&opt); - } -#else - use_threads = 0; -#endif - - compile_grep_patterns(&opt); - /* Check revs and then paths */ for (i = 0; i < argc; i++) { const char *arg = argv[i]; @@ -1056,6 +1044,23 @@ int cmd_grep(int argc, const char **argv, const char *prefix) pathspec.max_depth = opt.max_depth; pathspec.recursive = 1; +#ifdef NO_PTHREADS + nr_threads = 0; +#else + if (nr_threads == -1) + nr_threads = (online_cpus() > 1) ? THREADS : 0; + + if (nr_threads > 0) { + opt.use_threads = 1; + if (opt.pre_context || opt.post_context || opt.file_break || + opt.funcbody) + skip_first_line = 1; + start_threads(&opt); + } +#endif + + compile_grep_patterns(&opt); + if (show_in_pager && (cached || list.nr)) die(_("--open-files-in-pager only works on the worktree")); @@ -1100,7 +1105,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) hit = grep_objects(&opt, &pathspec, &list); } - if (use_threads) + if (nr_threads > 0) hit |= wait_all(); if (hit && show_in_pager) run_pager(&opt, prefix); -- 1.7.8 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-06 22:48 ` René Scharfe @ 2011-12-06 23:01 ` René Scharfe 2011-12-07 4:42 ` Jeff King 2011-12-07 8:12 ` Thomas Rast 2011-12-07 4:24 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King 2011-12-07 8:11 ` Thomas Rast 2 siblings, 2 replies; 50+ messages in thread From: René Scharfe @ 2011-12-06 23:01 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Eric Herman, Junio C Hamano Reading of git objects needs to be protected by an exclusive lock and cannot be parallelized. Searching the read buffers can be done in parallel, but for simple expressions threading is a net loss due to its overhead, as measured by Thomas. Turn it off unless we're searching in the worktree. Once the object store can be read safely by multiple threads in parallel this patch should be reverted. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- Goes on top of my earlier patch. Could use a better commit message with your (cleaned up) performance numbers. Documentation/git-grep.txt | 5 +++-- builtin/grep.c | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 47ac188..e981a9b 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -228,8 +228,9 @@ OPTIONS there is a match and with non-zero status when there isn't. --threads <n>:: - Run <n> search threads in parallel. Default is 8. This option - is ignored if git was built without support for POSIX threads. + Run <n> search threads in parallel. Default is 8 when searching + the worktree and 0 otherwise. This option is ignored if git was + built without support for POSIX threads. <tree>...:: Instead of searching tracked files in the working tree, search diff --git a/builtin/grep.c b/builtin/grep.c index 0bda900..f698642 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) nr_threads = 0; #else if (nr_threads == -1) - nr_threads = (online_cpus() > 1) ? THREADS : 0; + nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0; if (nr_threads > 0) { opt.use_threads = 1; -- 1.7.8 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-06 23:01 ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe @ 2011-12-07 4:42 ` Jeff King 2011-12-07 17:11 ` René Scharfe 2011-12-07 20:11 ` J. Bruce Fields 2011-12-07 8:12 ` Thomas Rast 1 sibling, 2 replies; 50+ messages in thread From: Jeff King @ 2011-12-07 4:42 UTC (permalink / raw) To: René Scharfe; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano On Wed, Dec 07, 2011 at 12:01:37AM +0100, René Scharfe wrote: > Reading of git objects needs to be protected by an exclusive lock > and cannot be parallelized. Searching the read buffers can be done > in parallel, but for simple expressions threading is a net loss due > to its overhead, as measured by Thomas. Turn it off unless we're > searching in the worktree. Based on my earlier numbers, I was going to complain that we should also be checking the "simple expressions" assumption here, as time spent in the actual regex might be important. However, after trying to repeat my experiment, I think the numbers I posted earlier were misleading. For example, using my "more complex" regex of 'a.*b': $ time git grep --threads=8 'a.*b' HEAD >/dev/null real 0m8.655s user 0m23.817s sys 0m0.480s Look at that sweet, sweet parallelism. It's a quad-core with hyperthreading, so we're not getting the 8x speedup we might hope for (presumably due to lock contention on extracting blobs), but hey, 3x isn't bad. Except, wait: $ time git grep --threads=0 'a.*b' HEAD >/dev/null real 0m7.651s user 0m7.600s sys 0m0.048s We can get 1x on a single core, but the total time is lower! This processor is an i7 with "turbo boost", which means it clocks higher in single-core mode than when multiple cores are active. So the numbers I posted earlier were misleading. Yes, we got parallelism, but at the cost of knocking the clock speed down for a net loss. The sweet spot for me seems to be: $ time git grep --threads=2 'a.*b' HEAD >/dev/null real 0m6.303s user 0m11.129s sys 0m0.220s I'd be curious to see results from somebody with a quad-core (or more) without turbo boost; I suspect that threading may have more benefit there, even though we have some lock contention for blobs. > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > nr_threads = 0; > #else > if (nr_threads == -1) > - nr_threads = (online_cpus() > 1) ? THREADS : 0; > + nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0; > > if (nr_threads > 0) { > opt.use_threads = 1; This doesn't kick in for "--cached", which has the same performance characteristics as grepping a tree. I think you want to add "&& !cached" to the conditional. -Peff ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-07 4:42 ` Jeff King @ 2011-12-07 17:11 ` René Scharfe 2011-12-07 18:28 ` Jeff King 2011-12-07 20:11 ` J. Bruce Fields 1 sibling, 1 reply; 50+ messages in thread From: René Scharfe @ 2011-12-07 17:11 UTC (permalink / raw) To: Jeff King; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano Am 07.12.2011 05:42, schrieb Jeff King: > On Wed, Dec 07, 2011 at 12:01:37AM +0100, René Scharfe wrote: > >> Reading of git objects needs to be protected by an exclusive lock >> and cannot be parallelized. Searching the read buffers can be done >> in parallel, but for simple expressions threading is a net loss due >> to its overhead, as measured by Thomas. Turn it off unless we're >> searching in the worktree. > > Based on my earlier numbers, I was going to complain that we should > also be checking the "simple expressions" assumption here, as time spent > in the actual regex might be important. > > However, after trying to repeat my experiment, I think the numbers I > posted earlier were misleading. For example, using my "more complex" > regex of 'a.*b': > > $ time git grep --threads=8 'a.*b' HEAD >/dev/null > real 0m8.655s > user 0m23.817s > sys 0m0.480s > > Look at that sweet, sweet parallelism. It's a quad-core with > hyperthreading, so we're not getting the 8x speedup we might hope for > (presumably due to lock contention on extracting blobs), but hey, 3x > isn't bad. Except, wait: > > $ time git grep --threads=0 'a.*b' HEAD >/dev/null > real 0m7.651s > user 0m7.600s > sys 0m0.048s > > We can get 1x on a single core, but the total time is lower! This > processor is an i7 with "turbo boost", which means it clocks higher in > single-core mode than when multiple cores are active. So the numbers I > posted earlier were misleading. Yes, we got parallelism, but at the cost > of knocking the clock speed down for a net loss. Ugh, right, Turbo Boost complicates matters. I don't understand the multiplied user time in the threaded case, though. Is it caused by busy-waiting? Thomas reported similar numbers earlier. > The sweet spot for me seems to be: > > $ time git grep --threads=2 'a.*b' HEAD >/dev/null > real 0m6.303s > user 0m11.129s > sys 0m0.220s > > I'd be curious to see results from somebody with a quad-core (or more) > without turbo boost; I suspect that threading may have more benefit > there, even though we have some lock contention for blobs. It would be nice if we could come up with simple rules to calculate defaults for the number of threads on a given run. Users shouldn't have to specify this option normally. And it would be good if these rules didn't require a list of all CPUs known to git. :) >> --- a/builtin/grep.c >> +++ b/builtin/grep.c >> @@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) >> nr_threads = 0; >> #else >> if (nr_threads == -1) >> - nr_threads = (online_cpus() > 1) ? THREADS : 0; >> + nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0; >> >> if (nr_threads > 0) { >> opt.use_threads = 1; > > This doesn't kick in for "--cached", which has the same performance > characteristics as grepping a tree. I think you want to add "&& !cached" to > the conditional. Oh, yes. René ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-07 17:11 ` René Scharfe @ 2011-12-07 18:28 ` Jeff King 0 siblings, 0 replies; 50+ messages in thread From: Jeff King @ 2011-12-07 18:28 UTC (permalink / raw) To: René Scharfe; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano On Wed, Dec 07, 2011 at 06:11:47PM +0100, René Scharfe wrote: > > $ time git grep --threads=8 'a.*b' HEAD >/dev/null > > real 0m8.655s > > user 0m23.817s > > sys 0m0.480s > [...] > > Ugh, right, Turbo Boost complicates matters. > > I don't understand the multiplied user time in the threaded case, > though. Is it caused by busy-waiting? Thomas reported similar numbers > earlier. I think it's mostly the clock speed. This processor runs at 1.86GHz but boosts to 3.2GHz. So we'd expect just the actual work to take close to twice as long. Plus it's a quad-core with hyperthreading, so 8 threads is going to mean two threads sharing each core, including cache (i.e., hyperthreading a core does not let you double performance, even though it presents itself as an extra core). And then you have context switching and lock overhead. So I can believe that it takes 3x the CPU time to accomplish the task. In an ideal world, it would be mitigated by having 8x the threads, but in this case, lock contention brings us down to less than 3x, and it's a slight net loss. -Peff ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-07 4:42 ` Jeff King 2011-12-07 17:11 ` René Scharfe @ 2011-12-07 20:11 ` J. Bruce Fields 2011-12-07 20:45 ` Jeff King 1 sibling, 1 reply; 50+ messages in thread From: J. Bruce Fields @ 2011-12-07 20:11 UTC (permalink / raw) To: Jeff King Cc: René Scharfe, Thomas Rast, git, Eric Herman, Junio C Hamano On Tue, Dec 06, 2011 at 11:42:42PM -0500, Jeff King wrote: > On Wed, Dec 07, 2011 at 12:01:37AM +0100, René Scharfe wrote: > > > Reading of git objects needs to be protected by an exclusive lock > > and cannot be parallelized. Searching the read buffers can be done > > in parallel, but for simple expressions threading is a net loss due > > to its overhead, as measured by Thomas. Turn it off unless we're > > searching in the worktree. > > Based on my earlier numbers, I was going to complain that we should > also be checking the "simple expressions" assumption here, as time spent > in the actual regex might be important. > > However, after trying to repeat my experiment, I think the numbers I > posted earlier were misleading. For example, using my "more complex" > regex of 'a.*b': > > $ time git grep --threads=8 'a.*b' HEAD >/dev/null > real 0m8.655s > user 0m23.817s > sys 0m0.480s Dumb question (I missed the beginning of the conversation): what kind of storage are you using, and is the data already cached? I seem to recall part of the motivation for the multithreading being NFS, where the goal isn't so much to keep CPU's busy as it is to keep the network busy. Probably a bigger problem for something like "git status" which I think ends up doing a series of stat's (which can each require a round trip to the server in the NFS case), as it is a problem for something like git-grep that's also doing reads. Just a plea for considering the IO cost as well when making these kinds of decisions.... (Which maybe you already do, apologies again for just naively dropping into the middle of a thread.) --b. > > Look at that sweet, sweet parallelism. It's a quad-core with > hyperthreading, so we're not getting the 8x speedup we might hope for > (presumably due to lock contention on extracting blobs), but hey, 3x > isn't bad. Except, wait: > > $ time git grep --threads=0 'a.*b' HEAD >/dev/null > real 0m7.651s > user 0m7.600s > sys 0m0.048s > > We can get 1x on a single core, but the total time is lower! This > processor is an i7 with "turbo boost", which means it clocks higher in > single-core mode than when multiple cores are active. So the numbers I > posted earlier were misleading. Yes, we got parallelism, but at the cost > of knocking the clock speed down for a net loss. > > The sweet spot for me seems to be: > > $ time git grep --threads=2 'a.*b' HEAD >/dev/null > real 0m6.303s > user 0m11.129s > sys 0m0.220s > > I'd be curious to see results from somebody with a quad-core (or more) > without turbo boost; I suspect that threading may have more benefit > there, even though we have some lock contention for blobs. > > > --- a/builtin/grep.c > > +++ b/builtin/grep.c > > @@ -1048,7 +1048,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) > > nr_threads = 0; > > #else > > if (nr_threads == -1) > > - nr_threads = (online_cpus() > 1) ? THREADS : 0; > > + nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0; > > > > if (nr_threads > 0) { > > opt.use_threads = 1; > > This doesn't kick in for "--cached", which has the same performance > characteristics as grepping a tree. I think you want to add "&& !cached" to > the conditional. > > -Peff > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-07 20:11 ` J. Bruce Fields @ 2011-12-07 20:45 ` Jeff King 0 siblings, 0 replies; 50+ messages in thread From: Jeff King @ 2011-12-07 20:45 UTC (permalink / raw) To: J. Bruce Fields Cc: René Scharfe, Thomas Rast, git, Eric Herman, Junio C Hamano On Wed, Dec 07, 2011 at 03:11:05PM -0500, J. Bruce Fields wrote: > > $ time git grep --threads=8 'a.*b' HEAD >/dev/null > > real 0m8.655s > > user 0m23.817s > > sys 0m0.480s > > Dumb question (I missed the beginning of the conversation): what kind of > storage are you using, and is the data already cached? Sorry, I should have been clear: all of those numbers are with a warm cache. So this is measuring only CPU. > I seem to recall part of the motivation for the multithreading being > NFS, where the goal isn't so much to keep CPU's busy as it is to keep > the network busy. > > Probably a bigger problem for something like "git status" which I think > ends up doing a series of stat's (which can each require a round trip to > the server in the NFS case), as it is a problem for something like > git-grep that's also doing reads. > > Just a plea for considering the IO cost as well when making these kinds > of decisions.... This system has a decent-quality SSD, so the I/O timings are perhaps not as interesting as they might otherwise be. But here are cold cache numbers (each run after 'echo 3 >/proc/sys/vm/drop_caches'): HEAD, --threads=0: 4.956s HEAD, --threads=8: 9.917s working tree, --threads=0: 17.444s working tree, --threads=8: 6.462s So when pulling from the object db, threads are still a huge loss (because the data is compressed, the SSD is fast, and we spend a lot of CPU time inflating; so it ends up close to the warm cache results). But for the working tree, the I/O parallelism is a huge win. So at least on my system, cold cache vs. warm cache leads to the same conclusion. "git grep --threads=8 ... HEAD" might still be a win on slow disks or NFS, though. -Peff ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-06 23:01 ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe 2011-12-07 4:42 ` Jeff King @ 2011-12-07 8:12 ` Thomas Rast 2011-12-07 17:00 ` René Scharfe 1 sibling, 1 reply; 50+ messages in thread From: Thomas Rast @ 2011-12-07 8:12 UTC (permalink / raw) To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano René Scharfe wrote: > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 47ac188..e981a9b 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -228,8 +228,9 @@ OPTIONS > there is a match and with non-zero status when there isn't. > > --threads <n>:: > + Run <n> search threads in parallel. Default is 8 when searching > + the worktree and 0 otherwise. This option is ignored if git was > + built without support for POSIX threads. [...] > - nr_threads = (online_cpus() > 1) ? THREADS : 0; > + nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0; It would be more consistent to stick to the pack.threads convention where 0 means "all of my cores", so to disable threading the user would set the number of threads to 1. Or were you trying to measure the contention between the worker thread and the add_work() thread? -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-07 8:12 ` Thomas Rast @ 2011-12-07 17:00 ` René Scharfe 2011-12-10 13:13 ` Pete Wyckoff 0 siblings, 1 reply; 50+ messages in thread From: René Scharfe @ 2011-12-07 17:00 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Eric Herman, Junio C Hamano Am 07.12.2011 09:12, schrieb Thomas Rast: > René Scharfe wrote: >> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt >> index 47ac188..e981a9b 100644 >> --- a/Documentation/git-grep.txt >> +++ b/Documentation/git-grep.txt >> @@ -228,8 +228,9 @@ OPTIONS >> there is a match and with non-zero status when there isn't. >> >> --threads <n>:: >> + Run <n> search threads in parallel. Default is 8 when searching >> + the worktree and 0 otherwise. This option is ignored if git was >> + built without support for POSIX threads. > [...] >> - nr_threads = (online_cpus() > 1) ? THREADS : 0; >> + nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0; > > It would be more consistent to stick to the pack.threads convention > where 0 means "all of my cores", so to disable threading the user > would set the number of threads to 1. Or were you trying to measure > the contention between the worker thread and the add_work() thread? Yes, indeed, the cost for the threading overhead does interest me. The documentation should perhaps mention --no-threads explicitly to avoid confusion. Currently there is no way to specify "as many threads as there are cores" here. Previous measurements indicated that it wasn't too useful, however, because I/O parallelism was beneficial even for machines with less than eight cores and more threads didn't pay off. René ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-07 17:00 ` René Scharfe @ 2011-12-10 13:13 ` Pete Wyckoff 2011-12-12 22:37 ` René Scharfe 0 siblings, 1 reply; 50+ messages in thread From: Pete Wyckoff @ 2011-12-10 13:13 UTC (permalink / raw) To: René Scharfe; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano rene.scharfe@lsrfire.ath.cx wrote on Wed, 07 Dec 2011 18:00 +0100: > Am 07.12.2011 09:12, schrieb Thomas Rast: > > René Scharfe wrote: > >> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > >> index 47ac188..e981a9b 100644 > >> --- a/Documentation/git-grep.txt > >> +++ b/Documentation/git-grep.txt > >> @@ -228,8 +228,9 @@ OPTIONS > >> there is a match and with non-zero status when there isn't. > >> > >> --threads <n>:: > >> + Run <n> search threads in parallel. Default is 8 when searching > >> + the worktree and 0 otherwise. This option is ignored if git was > >> + built without support for POSIX threads. > > [...] > >> - nr_threads = (online_cpus() > 1) ? THREADS : 0; > >> + nr_threads = (online_cpus() > 1 && !list.nr) ? THREADS : 0; > > > > It would be more consistent to stick to the pack.threads convention > > where 0 means "all of my cores", so to disable threading the user > > would set the number of threads to 1. Or were you trying to measure > > the contention between the worker thread and the add_work() thread? > > Yes, indeed, the cost for the threading overhead does interest me. The > documentation should perhaps mention --no-threads explicitly to avoid > confusion. > > Currently there is no way to specify "as many threads as there are > cores" here. Previous measurements indicated that it wasn't too useful, > however, because I/O parallelism was beneficial even for machines with > less than eight cores and more threads didn't pay off. Right. Even for single CPU machines this is true, so the nr_threads calculation above should still use all 8 THREADS regardless of the number of online_cpus(). -- Pete ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 4/2] grep: turn off threading for non-worktree 2011-12-10 13:13 ` Pete Wyckoff @ 2011-12-12 22:37 ` René Scharfe 0 siblings, 0 replies; 50+ messages in thread From: René Scharfe @ 2011-12-12 22:37 UTC (permalink / raw) To: Pete Wyckoff; +Cc: Thomas Rast, git, Eric Herman, Junio C Hamano Am 10.12.2011 14:13, schrieb Pete Wyckoff: > rene.scharfe@lsrfire.ath.cx wrote on Wed, 07 Dec 2011 18:00 +0100: >> Am 07.12.2011 09:12, schrieb Thomas Rast: >>> René Scharfe wrote: >>>> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt >>>> index 47ac188..e981a9b 100644 >>>> --- a/Documentation/git-grep.txt >>>> +++ b/Documentation/git-grep.txt >>>> @@ -228,8 +228,9 @@ OPTIONS >>>> there is a match and with non-zero status when there isn't. >>>> >>>> --threads<n>:: >>>> + Run<n> search threads in parallel. Default is 8 when searching >>>> + the worktree and 0 otherwise. This option is ignored if git was >>>> + built without support for POSIX threads. >>> [...] >>>> - nr_threads = (online_cpus()> 1) ? THREADS : 0; >>>> + nr_threads = (online_cpus()> 1&& !list.nr) ? THREADS : 0; >>> >>> It would be more consistent to stick to the pack.threads convention >>> where 0 means "all of my cores", so to disable threading the user >>> would set the number of threads to 1. Or were you trying to measure >>> the contention between the worker thread and the add_work() thread? >> >> Yes, indeed, the cost for the threading overhead does interest me. The >> documentation should perhaps mention --no-threads explicitly to avoid >> confusion. >> >> Currently there is no way to specify "as many threads as there are >> cores" here. Previous measurements indicated that it wasn't too useful, >> however, because I/O parallelism was beneficial even for machines with >> less than eight cores and more threads didn't pay off. > > Right. Even for single CPU machines this is true, so the > nr_threads calculation above should still use all 8 THREADS > regardless of the number of online_cpus(). That makes sense. However, in a quick test with a simple regex against a cache-warm Linux repo threading increased the runtime of git grep by 30% on a single-core virtual machine. Let's keep that check until we understand this better.. René ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-06 22:48 ` René Scharfe 2011-12-06 23:01 ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe @ 2011-12-07 4:24 ` Jeff King 2011-12-07 16:52 ` René Scharfe 2011-12-07 8:11 ` Thomas Rast 2 siblings, 1 reply; 50+ messages in thread From: Jeff King @ 2011-12-07 4:24 UTC (permalink / raw) To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano On Tue, Dec 06, 2011 at 11:48:26PM +0100, René Scharfe wrote: > #ifndef NO_PTHREADS > - if (use_threads) { > + if (nr_threads > 0) { > grep_sha1_async(opt, name, sha1); > return 0; > } else Should this be "if (nr_threads > 1)"? As a user, I would do: git grep --threads=1 ... if I wanted a single-threaded process. Instead, we actually spawn a sub-thread and do all of the locking, which has a measurable cost: $ time git grep --threads=0 SIMPLE HEAD >/dev/null real 0m2.994s user 0m2.932s sys 0m0.060s $ time git grep --threads=1 SIMPLE HEAD >/dev/null real 0m3.407s user 0m3.392s sys 0m0.140s Should --threads=1 be equivalent to --threads=0? -Peff ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-07 4:24 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King @ 2011-12-07 16:52 ` René Scharfe 2011-12-07 18:10 ` Jeff King 0 siblings, 1 reply; 50+ messages in thread From: René Scharfe @ 2011-12-07 16:52 UTC (permalink / raw) To: Jeff King; +Cc: git, Eric Herman, Junio C Hamano Am 07.12.2011 05:24, schrieb Jeff King: > On Tue, Dec 06, 2011 at 11:48:26PM +0100, René Scharfe wrote: > >> #ifndef NO_PTHREADS >> - if (use_threads) { >> + if (nr_threads > 0) { >> grep_sha1_async(opt, name, sha1); >> return 0; >> } else > > Should this be "if (nr_threads > 1)"? > > As a user, I would do: > > git grep --threads=1 ... > > if I wanted a single-threaded process. Instead, we actually spawn a > sub-thread and do all of the locking, which has a measurable cost: Yes, the difference is measurable, and that's exactly how I like it to be. :) A user can turn off threading with --threads=0 or (more intuitively) --no-threads. And we can quantify the overhead. > $ time git grep --threads=0 SIMPLE HEAD >/dev/null > real 0m2.994s > user 0m2.932s > sys 0m0.060s > > $ time git grep --threads=1 SIMPLE HEAD >/dev/null > real 0m3.407s > user 0m3.392s > sys 0m0.140s > > Should --threads=1 be equivalent to --threads=0? We can do that if there's another way to calculate this difference, or if it is not useful to know. I find your results interesting at least, though. :) René ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-07 16:52 ` René Scharfe @ 2011-12-07 18:10 ` Jeff King 0 siblings, 0 replies; 50+ messages in thread From: Jeff King @ 2011-12-07 18:10 UTC (permalink / raw) To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano On Wed, Dec 07, 2011 at 05:52:10PM +0100, René Scharfe wrote: > > As a user, I would do: > > > > git grep --threads=1 ... > > > > if I wanted a single-threaded process. Instead, we actually spawn a > > sub-thread and do all of the locking, which has a measurable cost: > > Yes, the difference is measurable, and that's exactly how I like it to > be. :) A user can turn off threading with --threads=0 or (more > intuitively) --no-threads. And we can quantify the overhead. That seems acceptable to me if --threads is for speed-testing, but a horrible interface if it is meant for end users who just want git to be fast. -Peff ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-06 22:48 ` René Scharfe 2011-12-06 23:01 ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe 2011-12-07 4:24 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King @ 2011-12-07 8:11 ` Thomas Rast 2011-12-07 16:54 ` René Scharfe 2 siblings, 1 reply; 50+ messages in thread From: Thomas Rast @ 2011-12-07 8:11 UTC (permalink / raw) To: René Scharfe; +Cc: git, Eric Herman, Junio C Hamano René Scharfe wrote: > Am 02.12.2011 17:15, schrieb René Scharfe: > > How about adding a parameter to control the number of threads > > (--threads?) instead that defaults to eight (or five) for the worktree > > and one for the rest? That would also make benchmarking easier. > > Like this: > > -- >8 -- > Subject: grep: add parameter --threads > > Allow the number of threads to be specified by the user. This makes > benchmarking the performance impact of different numbers of threads > much easier. Sounds good, though in the end we would also want to have a config variable for the poor OS X users who have to tune their threads *down*... :-) -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-07 8:11 ` Thomas Rast @ 2011-12-07 16:54 ` René Scharfe 0 siblings, 0 replies; 50+ messages in thread From: René Scharfe @ 2011-12-07 16:54 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Eric Herman, Junio C Hamano Am 07.12.2011 09:11, schrieb Thomas Rast: > René Scharfe wrote: >> Am 02.12.2011 17:15, schrieb René Scharfe: >>> How about adding a parameter to control the number of threads >>> (--threads?) instead that defaults to eight (or five) for the worktree >>> and one for the rest? That would also make benchmarking easier. >> >> Like this: >> >> -- >8 -- >> Subject: grep: add parameter --threads >> >> Allow the number of threads to be specified by the user. This makes >> benchmarking the performance impact of different numbers of threads >> much easier. > > Sounds good, though in the end we would also want to have a config > variable for the poor OS X users who have to tune their threads > *down*... :-) We could set different defaults for different platforms.. René ^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 0/3] grep attributes and multithreading 2011-12-02 16:15 ` René Scharfe 2011-12-05 9:02 ` Thomas Rast 2011-12-06 22:48 ` René Scharfe @ 2011-12-12 21:16 ` Thomas Rast 2011-12-12 21:16 ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast ` (4 more replies) 2 siblings, 5 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw) To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman I think we should finish up these three patches for the next release. The first two are unchanged from last time; nobody seemed to have any objections. On the third one I'm following René's argument: > Am 02.12.2011 14:07, schrieb Thomas Rast: > > So disable threading entirely when not scanning the worktree, to get > > the NO_PTHREADS performance in that case. This obsoletes all code > > related to grep_sha1_async. The thread startup must be delayed until > > after all arguments have been parsed, but this does not have a > > measurable effect. > > This is a bit radical. I think the underlying issue that > read_sha1_file() is not thread-safe can be solved eventually and then > we'd need to readd that code. I already posted a bunch of POC patches, but doing the timing manually has been getting on my nerves lately. So I would first like to formalize some of the performance testing, perhaps along lines already drawn up by Michael Hagger, perhaps not. Then we can revisit the issue of grep performance. But I would prefer not to block the -W fix and two easy and confirmed speedups on that. I dropped this part entirely: > How about adding a parameter to control the number of threads > (--threads?) instead that defaults to eight (or five) for the worktree > and one for the rest? That would also make benchmarking easier. It does make testing a lot easier, but the interface is IMHO not fit for users and I have a feeling that the "right" for-debugging interface will end up falling out of the performance testing work (probably an environment variable). The end-user option should be a config setting, if any. Thomas Rast (3): grep: load funcname patterns for -W grep: enable threading with -p and -W using lazy attribute lookup grep: disable threading in non-worktree case builtin/grep.c | 34 +++++++++++++++---------- grep.c | 73 ++++++++++++++++++++++++++++++++++--------------------- grep.h | 7 +++++ t/t7810-grep.sh | 14 ++++++++++ 4 files changed, 86 insertions(+), 42 deletions(-) -- 1.7.8.431.g2abf2 ^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 1/3] grep: load funcname patterns for -W 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast @ 2011-12-12 21:16 ` Thomas Rast 2011-12-12 21:16 ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast ` (3 subsequent siblings) 4 siblings, 0 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw) To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman git-grep avoids loading the funcname patterns unless they are needed. ba8ea74 (grep: add option to show whole function as context, 2011-08-01) forgot to extend this test also to the new funcbody feature. Do so. The catch is that we also have to disable threading when using userdiff, as explained in grep_threads_ok(). So we must be careful to introduce the same test there. --- grep.c | 7 ++++--- t/t7810-grep.sh | 14 ++++++++++++++ 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/grep.c b/grep.c index b29d09c..7a070e9 100644 --- a/grep.c +++ b/grep.c @@ -948,8 +948,8 @@ int grep_threads_ok(const struct grep_opt *opt) * machinery in grep_buffer_1. The attribute code is not * thread safe, so we disable the use of threads. */ - if (opt->funcname && !opt->unmatch_name_only && !opt->status_only && - !opt->name_only) + if ((opt->funcname || opt->funcbody) + && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) return 0; return 1; @@ -1008,7 +1008,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, } memset(&xecfg, 0, sizeof(xecfg)); - if (opt->funcname && !opt->unmatch_name_only && !opt->status_only && + if ((opt->funcname || opt->funcbody) + && !opt->unmatch_name_only && !opt->status_only && !opt->name_only && !binary_match_only && !collect_hits) { struct userdiff_driver *drv = userdiff_find_by_path(name); if (drv && drv->funcname.pattern) { diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 81263b7..7ba5b16 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -523,6 +523,20 @@ test_expect_success 'grep -W' ' test_cmp expected actual ' +cat >expected <<EOF +hello.c= printf("Hello world.\n"); +hello.c: return 0; +hello.c- /* char ?? */ +EOF + +test_expect_success 'grep -W with userdiff' ' + test_when_finished "rm -f .gitattributes" && + git config diff.custom.xfuncname "(printf.*|})$" && + echo "hello.c diff=custom" >.gitattributes && + git grep -W return >actual && + test_cmp expected actual +' + test_expect_success 'grep from a subdirectory to search wider area (1)' ' mkdir -p s && ( -- 1.7.8.431.g2abf2 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast 2011-12-12 21:16 ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast @ 2011-12-12 21:16 ` Thomas Rast 2011-12-16 8:22 ` Johannes Sixt 2011-12-12 21:16 ` [PATCH v3 3/3] grep: disable threading in non-worktree case Thomas Rast ` (2 subsequent siblings) 4 siblings, 1 reply; 50+ messages in thread From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw) To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman Lazily load the userdiff attributes in match_funcname(). Use a separate mutex around this loading to protect the (not thread-safe) attributes machinery. This lets us re-enable threading with -p and -W while reducing the overhead caused by looking up attributes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- builtin/grep.c | 12 +++++++-- grep.c | 74 ++++++++++++++++++++++++++++++++++---------------------- grep.h | 7 +++++ 3 files changed, 61 insertions(+), 32 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 988ea1d..bc23c3c 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt) pthread_mutex_init(&grep_mutex, NULL); pthread_mutex_init(&read_sha1_mutex, NULL); + pthread_mutex_init(&grep_attr_mutex, NULL); pthread_cond_init(&cond_add, NULL); pthread_cond_init(&cond_write, NULL); pthread_cond_init(&cond_result, NULL); @@ -303,6 +304,7 @@ static int wait_all(void) pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&read_sha1_mutex); + pthread_mutex_destroy(&grep_attr_mutex); pthread_cond_destroy(&cond_add); pthread_cond_destroy(&cond_write); pthread_cond_destroy(&cond_result); @@ -1002,17 +1004,21 @@ int cmd_grep(int argc, const char **argv, const char *prefix) opt.regflags |= REG_ICASE; #ifndef NO_PTHREADS - if (online_cpus() == 1 || !grep_threads_ok(&opt)) + if (online_cpus() == 1) use_threads = 0; +#else + use_threads = 0; +#endif + + opt.use_threads = use_threads; +#ifndef NO_PTHREADS if (use_threads) { if (opt.pre_context || opt.post_context || opt.file_break || opt.funcbody) skip_first_line = 1; start_threads(&opt); } -#else - use_threads = 0; #endif compile_grep_patterns(&opt); diff --git a/grep.c b/grep.c index 7a070e9..4dd7da2 100644 --- a/grep.c +++ b/grep.c @@ -2,6 +2,7 @@ #include "grep.h" #include "userdiff.h" #include "xdiff-interface.h" +#include "thread-utils.h" void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat) { @@ -806,10 +807,46 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, opt->output(opt, "\n", 1); } -static int match_funcname(struct grep_opt *opt, char *bol, char *eol) +#ifndef NO_PTHREADS +/* + * This lock protects access to the gitattributes machinery, which is + * not thread-safe. + */ +pthread_mutex_t grep_attr_mutex; + +static inline void grep_attr_lock(struct grep_opt *opt) +{ + if (opt->use_threads) + pthread_mutex_lock(&grep_attr_mutex); +} + +static inline void grep_attr_unlock(struct grep_opt *opt) +{ + if (opt->use_threads) + pthread_mutex_unlock(&grep_attr_mutex); +} +#else +#define grep_attr_lock(opt) +#define grep_attr_unlock(opt) +#endif + +static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol) { xdemitconf_t *xecfg = opt->priv; - if (xecfg && xecfg->find_func) { + if (xecfg && !xecfg->find_func) { + struct userdiff_driver *drv; + grep_attr_lock(opt); + drv = userdiff_find_by_path(name); + grep_attr_unlock(opt); + if (drv && drv->funcname.pattern) { + const struct userdiff_funcname *pe = &drv->funcname; + xdiff_set_find_func(xecfg, pe->pattern, pe->cflags); + } else { + xecfg = opt->priv = NULL; + } + } + + if (xecfg) { char buf[1]; return xecfg->find_func(bol, eol - bol, buf, 1, xecfg->find_func_priv) >= 0; @@ -835,7 +872,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name, if (lno <= opt->last_shown) break; - if (match_funcname(opt, bol, eol)) { + if (match_funcname(opt, name, bol, eol)) { show_line(opt, bol, eol, name, lno, '='); break; } @@ -848,7 +885,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, unsigned cur = lno, from = 1, funcname_lno = 0; int funcname_needed = !!opt->funcname; - if (opt->funcbody && !match_funcname(opt, bol, end)) + if (opt->funcbody && !match_funcname(opt, name, bol, end)) funcname_needed = 2; if (opt->pre_context < lno) @@ -864,7 +901,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf, while (bol > buf && bol[-1] != '\n') bol--; cur--; - if (funcname_needed && match_funcname(opt, bol, eol)) { + if (funcname_needed && match_funcname(opt, name, bol, eol)) { funcname_lno = cur; funcname_needed = 0; } @@ -942,19 +979,6 @@ static int look_ahead(struct grep_opt *opt, return 0; } -int grep_threads_ok(const struct grep_opt *opt) -{ - /* If this condition is true, then we may use the attribute - * machinery in grep_buffer_1. The attribute code is not - * thread safe, so we disable the use of threads. - */ - if ((opt->funcname || opt->funcbody) - && !opt->unmatch_name_only && !opt->status_only && !opt->name_only) - return 0; - - return 1; -} - static void std_output(struct grep_opt *opt, const void *buf, size_t size) { fwrite(buf, size, 1, stdout); @@ -1008,16 +1032,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, } memset(&xecfg, 0, sizeof(xecfg)); - if ((opt->funcname || opt->funcbody) - && !opt->unmatch_name_only && !opt->status_only && - !opt->name_only && !binary_match_only && !collect_hits) { - struct userdiff_driver *drv = userdiff_find_by_path(name); - if (drv && drv->funcname.pattern) { - const struct userdiff_funcname *pe = &drv->funcname; - xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags); - opt->priv = &xecfg; - } - } + opt->priv = &xecfg; + try_lookahead = should_lookahead(opt); while (left) { @@ -1093,7 +1109,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, show_function = 1; goto next_line; } - if (show_function && match_funcname(opt, bol, eol)) + if (show_function && match_funcname(opt, name, bol, eol)) show_function = 0; if (show_function || (last_hit && lno <= last_hit + opt->post_context)) { diff --git a/grep.h b/grep.h index a652800..15d227c 100644 --- a/grep.h +++ b/grep.h @@ -115,6 +115,7 @@ struct grep_opt { int show_hunk_mark; int file_break; int heading; + int use_threads; void *priv; void (*output)(struct grep_opt *opt, const void *data, size_t size); @@ -131,4 +132,10 @@ struct grep_opt { extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); extern int grep_threads_ok(const struct grep_opt *opt); +#ifndef NO_PTHREADS +/* Mutex used around access to the attributes machinery if + * opt->use_threads. Must be initialized/destroyed by callers! */ +extern pthread_mutex_t grep_attr_mutex; +#endif + #endif -- 1.7.8.431.g2abf2 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup 2011-12-12 21:16 ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast @ 2011-12-16 8:22 ` Johannes Sixt 2011-12-16 17:34 ` Junio C Hamano 0 siblings, 1 reply; 50+ messages in thread From: Johannes Sixt @ 2011-12-16 8:22 UTC (permalink / raw) To: Thomas Rast Cc: git, Junio C Hamano, René Scharfe, Jeff King, Eric Herman Am 12/12/2011 22:16, schrieb Thomas Rast: > diff --git a/grep.h b/grep.h > index a652800..15d227c 100644 > --- a/grep.h > +++ b/grep.h > @@ -115,6 +115,7 @@ struct grep_opt { > int show_hunk_mark; > int file_break; > int heading; > + int use_threads; > void *priv; > > void (*output)(struct grep_opt *opt, const void *data, size_t size); > @@ -131,4 +132,10 @@ struct grep_opt { > extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); > extern int grep_threads_ok(const struct grep_opt *opt); > > +#ifndef NO_PTHREADS > +/* Mutex used around access to the attributes machinery if > + * opt->use_threads. Must be initialized/destroyed by callers! */ > +extern pthread_mutex_t grep_attr_mutex; > +#endif > + > #endif This is the first time we use pthread_mutex_t in a header file. We need at least the following squashed in. An alternative would be to include "thread-utils.h", but thread-utils is really more about implementation helpers functions, not about types, and therefore does not look right to be #included in a header. --- grep.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/grep.h b/grep.h index 15d227c..754b270 100644 --- a/grep.h +++ b/grep.h @@ -133,6 +133,7 @@ extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); extern int grep_threads_ok(const struct grep_opt *opt); #ifndef NO_PTHREADS +#include <pthread.h> /* Mutex used around access to the attributes machinery if * opt->use_threads. Must be initialized/destroyed by callers! */ extern pthread_mutex_t grep_attr_mutex; -- 1.7.8.1436.gb3021 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup 2011-12-16 8:22 ` Johannes Sixt @ 2011-12-16 17:34 ` Junio C Hamano 0 siblings, 0 replies; 50+ messages in thread From: Junio C Hamano @ 2011-12-16 17:34 UTC (permalink / raw) To: Johannes Sixt Cc: Thomas Rast, git, Junio C Hamano, René Scharfe, Jeff King, Eric Herman Johannes Sixt <j.sixt@viscovery.net> writes: > This is the first time we use pthread_mutex_t in a header file. We need at > least the following squashed in. An alternative would be to include > "thread-utils.h", but thread-utils is really more about implementation > helpers functions, not about types,... builtin/grep.c already uses thread-utils.h since 5b594f4 (Threaded grep, 2010-01-25), so does builtin/pack-objects.c since 833e3df (pack-objects: Add runtime detection of online CPU's, 2008-02-22), so it may be simpler to do so in grep.h instead. diff --git a/builtin/grep.c b/builtin/grep.c index bc23c3c..6474eed 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -17,7 +17,6 @@ #include "grep.h" #include "quote.h" #include "dir.h" -#include "thread-utils.h" static char const * const grep_usage[] = { "git grep [options] [-e] <pattern> [<rev>...] [[--] <path>...]", diff --git a/grep.h b/grep.h index 15d227c..dd4c65e 100644 --- a/grep.h +++ b/grep.h @@ -133,8 +133,11 @@ extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt); extern int grep_threads_ok(const struct grep_opt *opt); #ifndef NO_PTHREADS -/* Mutex used around access to the attributes machinery if - * opt->use_threads. Must be initialized/destroyed by callers! */ +#include "thread-utils.h" +/* + * Mutex used around access to the attributes machinery if + * opt->use_threads. Must be initialized/destroyed by callers! + */ extern pthread_mutex_t grep_attr_mutex; #endif ^ permalink raw reply related [flat|nested] 50+ messages in thread
* [PATCH v3 3/3] grep: disable threading in non-worktree case 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast 2011-12-12 21:16 ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast 2011-12-12 21:16 ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast @ 2011-12-12 21:16 ` Thomas Rast 2011-12-12 22:37 ` [PATCH v3 0/3] grep attributes and multithreading René Scharfe 2011-12-12 23:44 ` Junio C Hamano 4 siblings, 0 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-12 21:16 UTC (permalink / raw) To: git, Junio C Hamano; +Cc: René Scharfe, Jeff King, Eric Herman Measurements by various people have shown that grepping in parallel is not beneficial when the object store is involved. For example, with a simple regex: Threads | --cached case | worktree case ---------------------------------------------------------------- 8 (default) | 2.88u 0.21s 0:02.94real | 0.19u 0.32s 0:00.16real 4 | 2.89u 0.29s 0:02.99real | 0.16u 0.34s 0:00.17real 2 | 2.83u 0.36s 0:02.87real | 0.18u 0.32s 0:00.26real NO_PTHREADS | 2.16u 0.08s 0:02.25real | 0.12u 0.17s 0:00.31real This happens because all the threads contend on read_sha1_mutex almost all of the time. A more complex regex allows the threads to do more work in parallel, but as Jeff King found out, the "super boost" (much higher clock when only one core is active) feature of recent CPUs still causes the unthreaded case to win by a large margin. So until the pack machinery allows unthreaded access, we disable grep's threading in all but the worktree case. Helped-by: René Scharfe <rene.scharfe@lsrfire.ath.cx> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Thomas Rast <trast@student.ethz.ch> --- builtin/grep.c | 36 ++++++++++++++++++------------------ 1 files changed, 18 insertions(+), 18 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index bc23c3c..7affbda 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1003,24 +1003,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!opt.fixed && opt.ignore_case) opt.regflags |= REG_ICASE; -#ifndef NO_PTHREADS - if (online_cpus() == 1) - use_threads = 0; -#else - use_threads = 0; -#endif - - opt.use_threads = use_threads; - -#ifndef NO_PTHREADS - if (use_threads) { - if (opt.pre_context || opt.post_context || opt.file_break || - opt.funcbody) - skip_first_line = 1; - start_threads(&opt); - } -#endif - compile_grep_patterns(&opt); /* Check revs and then paths */ @@ -1042,6 +1024,24 @@ int cmd_grep(int argc, const char **argv, const char *prefix) break; } +#ifndef NO_PTHREADS + if (list.nr || cached || online_cpus() == 1) + use_threads = 0; +#else + use_threads = 0; +#endif + + opt.use_threads = use_threads; + +#ifndef NO_PTHREADS + if (use_threads) { + if (opt.pre_context || opt.post_context || opt.file_break || + opt.funcbody) + skip_first_line = 1; + start_threads(&opt); + } +#endif + /* The rest are paths */ if (!seen_dashdash) { int j; -- 1.7.8.431.g2abf2 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH v3 0/3] grep attributes and multithreading 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast ` (2 preceding siblings ...) 2011-12-12 21:16 ` [PATCH v3 3/3] grep: disable threading in non-worktree case Thomas Rast @ 2011-12-12 22:37 ` René Scharfe 2011-12-12 23:44 ` Junio C Hamano 4 siblings, 0 replies; 50+ messages in thread From: René Scharfe @ 2011-12-12 22:37 UTC (permalink / raw) To: Thomas Rast; +Cc: git, Junio C Hamano, Jeff King, Eric Herman Am 12.12.2011 22:16, schrieb Thomas Rast: > I think we should finish up these three patches for the next release. > I already posted a bunch of POC patches, but doing the timing manually > has been getting on my nerves lately. So I would first like to > formalize some of the performance testing, perhaps along lines already > drawn up by Michael Hagger, perhaps not. Then we can revisit the > issue of grep performance. But I would prefer not to block the -W fix > and two easy and confirmed speedups on that. Yes, that's a good idea. The three patches are uncontroversial incremental improvements. > I dropped this part entirely: > >> How about adding a parameter to control the number of threads >> (--threads?) instead that defaults to eight (or five) for the worktree >> and one for the rest? That would also make benchmarking easier. > > It does make testing a lot easier, but the interface is IMHO not fit > for users and I have a feeling that the "right" for-debugging > interface will end up falling out of the performance testing work > (probably an environment variable). The end-user option should be a > config setting, if any. Agreed; users shouldn't need to specify such a parameter -- our heuristic should be good enough. René ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 0/3] grep attributes and multithreading 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast ` (3 preceding siblings ...) 2011-12-12 22:37 ` [PATCH v3 0/3] grep attributes and multithreading René Scharfe @ 2011-12-12 23:44 ` Junio C Hamano 2011-12-13 8:44 ` Thomas Rast 4 siblings, 1 reply; 50+ messages in thread From: Junio C Hamano @ 2011-12-12 23:44 UTC (permalink / raw) To: Thomas Rast; +Cc: git, René Scharfe, Jeff King, Eric Herman Thanks; sign-off? ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 0/3] grep attributes and multithreading 2011-12-12 23:44 ` Junio C Hamano @ 2011-12-13 8:44 ` Thomas Rast 0 siblings, 0 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-13 8:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, Jeff King, Eric Herman Junio C Hamano wrote: > Thanks; sign-off? Ooops, not again! > [PATCH v3 1/3] grep: load funcname patterns for -W Signed-off-by: Thomas Rast <trast@student.ethz.ch> -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-02 13:07 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast 2011-12-02 16:15 ` René Scharfe @ 2011-12-23 22:37 ` Ævar Arnfjörð Bjarmason 2011-12-23 22:49 ` Thomas Rast 1 sibling, 1 reply; 50+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2011-12-23 22:37 UTC (permalink / raw) To: Thomas Rast Cc: René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen On Fri, Dec 2, 2011 at 14:07, Thomas Rast <trast@student.ethz.ch> wrote: > I conjecture that this is caused by contention on > read_sha1_mutex. [...] So disable threading entirely when not > scanning the worktree Why does git-grep even need to keep a mutex to call read_sha1_file()? It's inherently a read-only operation isn't it? If the lock is needed because data is being shared between threads in sha1_file.c shouldn't we tackle that instead of completely disabling threading? ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-23 22:37 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason @ 2011-12-23 22:49 ` Thomas Rast 2011-12-24 1:39 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 50+ messages in thread From: Thomas Rast @ 2011-12-23 22:49 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Fri, Dec 2, 2011 at 14:07, Thomas Rast <trast@student.ethz.ch> wrote: > >> I conjecture that this is caused by contention on >> read_sha1_mutex. [...] So disable threading entirely when not >> scanning the worktree > > Why does git-grep even need to keep a mutex to call read_sha1_file()? > It's inherently a read-only operation isn't it? If the lock is needed > because data is being shared between threads in sha1_file.c shouldn't > we tackle that instead of completely disabling threading? The problem is that all sorts of data is shared. See http://thread.gmane.org/gmane.comp.version-control.git/186618 But I need to go through it again, there are some races and double locks in the posted version. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-23 22:49 ` Thomas Rast @ 2011-12-24 1:39 ` Ævar Arnfjörð Bjarmason 2011-12-24 7:07 ` Jeff King 0 siblings, 1 reply; 50+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2011-12-24 1:39 UTC (permalink / raw) To: Thomas Rast Cc: René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen 2011/12/23 Thomas Rast <trast@student.ethz.ch>: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> On Fri, Dec 2, 2011 at 14:07, Thomas Rast <trast@student.ethz.ch> wrote: >> >>> I conjecture that this is caused by contention on >>> read_sha1_mutex. [...] So disable threading entirely when not >>> scanning the worktree >> >> Why does git-grep even need to keep a mutex to call read_sha1_file()? >> It's inherently a read-only operation isn't it? If the lock is needed >> because data is being shared between threads in sha1_file.c shouldn't >> we tackle that instead of completely disabling threading? > > The problem is that all sorts of data is shared. See > > http://thread.gmane.org/gmane.comp.version-control.git/186618 > > But I need to go through it again, there are some races and double locks > in the posted version. I mentioned this on IRC, but I thought I'd bring it up here too. Is the expensive part of git-grep all the setup work, or the actual traversal and searching? I'm guessing it's the latter. In that case an easy way to do git-grep in parallel would be to simply spawn multiple sub-processes, e.g. if we had 1000 files and 4 cores: 1. Split the 1000 into 4 parts 250 each. 2. Spawn 4 processes as: git grep <pattern> -- <250 files> 3. Aggregate all of the results in the parent process ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-24 1:39 ` Ævar Arnfjörð Bjarmason @ 2011-12-24 7:07 ` Jeff King 2011-12-24 10:49 ` Nguyen Thai Ngoc Duy ` (2 more replies) 0 siblings, 3 replies; 50+ messages in thread From: Jeff King @ 2011-12-24 7:07 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Thomas Rast, René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen On Sat, Dec 24, 2011 at 02:39:11AM +0100, Ævar Arnfjörð Bjarmason wrote: > Is the expensive part of git-grep all the setup work, or the actual > traversal and searching? I'm guessing it's the latter. > > In that case an easy way to do git-grep in parallel would be to simply > spawn multiple sub-processes, e.g. if we had 1000 files and 4 cores: > > 1. Split the 1000 into 4 parts 250 each. > 2. Spawn 4 processes as: git grep <pattern> -- <250 files> > 3. Aggregate all of the results in the parent process That's an interesting idea. The expense of the traversal and searching depends on two things: - how complex is your regex? - are you reading from objects (which need zlib inflated) or disk? But you should be able to approximate it by compiling with NO_PTHREADS and doing (assuming you have GNU xargs): # grep in working tree git ls-files | xargs -P 8 git grep "$re" -- # grep tree-ish git ls-tree -r --name-only $tree | xargs -P 8 git grep "$re" $tree -- I tried to get some timings for this, but ran across some quite surprising results. Here's a simple grep of the linux-2.6 working tree, using a single-threaded grep: $ time git grep SIMPLE >/dev/null real 0m0.439s user 0m0.272s sys 0m0.160s and then the same thing, via xargs, without even turning on parallelization. This should give us a measurement of the overhead for going through xargs at all. We'd expect it to be slower, but not too much so: $ time git ls-files | xargs git grep SIMPLE -- >/dev/null real 0m11.989s user 0m11.769s sys 0m0.268s Twenty-five times slower! Running 'perf' reports the culprit as pathspec matching: + 63.23% git git [.] match_pathspec_depth + 28.60% git libc-2.13.so [.] __strncmp_sse42 + 2.22% git git [.] strncmp@plt + 1.67% git git [.] kwsexec where the strncmps are called as part of match_pathspec_depth. So over 90% of the CPU time is spent on matching the pathspecs, compared to less than 2% actually grepping. Which really makes me wonder if our pathspec matching could stand to be faster. True, giving a bunch of single files is the least efficient way to use pathspecs, but that's pretty amazingly slow. The case where we would most expect the setup cost to be drowned out is using a more complex regex, grepping tree objects. There we have a baseline of: $ time git grep 'a.*c' HEAD >/dev/null real 0m5.684s user 0m5.472s sys 0m0.196s $ time git ls-tree --name-only -r HEAD | xargs git grep 'a.*c' HEAD -- >/dev/null real 0m10.906s user 0m10.725s sys 0m0.240s Here, we still almost double our time. It looks like we don't use the same pathspec matching code in this case. But we do waste a lot of extra time zlib-inflating the trees in "ls-tree", only to do it separately in "grep". Doing it in parallel yields: $ time git ls-tree --name-only -r HEAD | xargs -n 4000 -P 8 git grep 'a.*c' HEAD -- >/dev/null real 0m3.573s user 0m21.885s sys 0m0.400s So that does at least yield a real speedup, albeit only by about half, despite using over six times as much CPU (though my numbers are skewed somewhat, as this is a quad i7 with hyperthreading and turbo boost). -Peff ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-24 7:07 ` Jeff King @ 2011-12-24 10:49 ` Nguyen Thai Ngoc Duy 2011-12-24 10:55 ` Nguyen Thai Ngoc Duy 2011-12-25 3:32 ` Nguyen Thai Ngoc Duy 2 siblings, 0 replies; 50+ messages in thread From: Nguyen Thai Ngoc Duy @ 2011-12-24 10:49 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote: > I tried to get some timings for this, but ran across some quite > surprising results. Here's a simple grep of the linux-2.6 working tree, > using a single-threaded grep: > > $ time git grep SIMPLE >/dev/null > real 0m0.439s > user 0m0.272s > sys 0m0.160s > > and then the same thing, via xargs, without even turning on > parallelization. This should give us a measurement of the overhead for > going through xargs at all. We'd expect it to be slower, but not too > much so: > > $ time git ls-files | xargs git grep SIMPLE -- >/dev/null > real 0m11.989s > user 0m11.769s > sys 0m0.268s > > Twenty-five times slower! Running 'perf' reports the culprit as pathspec > matching: > > + 63.23% git git [.] match_pathspec_depth > + 28.60% git libc-2.13.so [.] __strncmp_sse42 > + 2.22% git git [.] strncmp@plt > + 1.67% git git [.] kwsexec > > where the strncmps are called as part of match_pathspec_depth. So over > 90% of the CPU time is spent on matching the pathspecs, compared to less > than 2% actually grepping. > > Which really makes me wonder if our pathspec matching could stand to be > faster. True, giving a bunch of single files is the least efficient way > to use pathspecs, but that's pretty amazingly slow. We could eliminate get_pathspec_depth() in grep_directory() when read_directory() learns to filter path properly using (and at the cost of) tree_entry_interesting(). The latter function has more optimizaions built in and should be faster than the former. This is a good test case for my read_directory() rewrite. Thanks. get_pathspec_depth() can still use some optimizations though for grep_cache() case. -- Duy ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-24 7:07 ` Jeff King 2011-12-24 10:49 ` Nguyen Thai Ngoc Duy @ 2011-12-24 10:55 ` Nguyen Thai Ngoc Duy 2011-12-24 13:38 ` Jeff King 2011-12-25 3:32 ` Nguyen Thai Ngoc Duy 2 siblings, 1 reply; 50+ messages in thread From: Nguyen Thai Ngoc Duy @ 2011-12-24 10:55 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen (Sorry I replied without reading though the mail) On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote: > The case where we would most expect the setup cost to be drowned out is > using a more complex regex, grepping tree objects. There we have a > baseline of: > > $ time git grep 'a.*c' HEAD >/dev/null > real 0m5.684s > user 0m5.472s > sys 0m0.196s > > $ time git ls-tree --name-only -r HEAD | > xargs git grep 'a.*c' HEAD -- >/dev/null > real 0m10.906s > user 0m10.725s > sys 0m0.240s > > Here, we still almost double our time. It looks like we don't use the > same pathspec matching code in this case. But we do waste a lot of extra > time zlib-inflating the trees in "ls-tree", only to do it separately in > "grep". I assume this is gree_tree(), we have another form of pathspec matching here: tree_entry_interesting() and it's still a bunch of strcmp inside. Does strcmp show up in perf report? -- Duy ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-24 10:55 ` Nguyen Thai Ngoc Duy @ 2011-12-24 13:38 ` Jeff King 0 siblings, 0 replies; 50+ messages in thread From: Jeff King @ 2011-12-24 13:38 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen On Sat, Dec 24, 2011 at 05:55:14PM +0700, Nguyen Thai Ngoc Duy wrote: > On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote: > > The case where we would most expect the setup cost to be drowned out is > > using a more complex regex, grepping tree objects. There we have a > > baseline of: > > > > $ time git grep 'a.*c' HEAD >/dev/null > > real 0m5.684s > > user 0m5.472s > > sys 0m0.196s > > > > $ time git ls-tree --name-only -r HEAD | > > xargs git grep 'a.*c' HEAD -- >/dev/null > > real 0m10.906s > > user 0m10.725s > > sys 0m0.240s > > > > Here, we still almost double our time. It looks like we don't use the > > same pathspec matching code in this case. But we do waste a lot of extra > > time zlib-inflating the trees in "ls-tree", only to do it separately in > > "grep". > > I assume this is gree_tree(), we have another form of pathspec > matching here: tree_entry_interesting() and it's still a bunch of > strcmp inside. Does strcmp show up in perf report? Yes, but not nearly as high. The top of the report is: + 32.16% git libc-2.13.so [.] re_search_internal + 17.82% git libz.so.1.2.3.4 [.] 0xe986 + 7.81% git git [.] look_ahead + 6.24% git libc-2.13.so [.] __strncmp_sse42 + 4.08% git git [.] tree_entry_interesting + 3.27% git git [.] end_of_line + 2.63% git libz.so.1.2.3.4 [.] adler32 + 1.93% git libz.so.1.2.3.4 [.] inflate where the strncmps are from[1]: - 6.24% git libc-2.13.so [.] __strncmp_sse42 - __strncmp_sse42 + 80.92% grep_tree + 19.08% tree_entry_interesting So we're spending maybe 10% of our time on pathspecs, but most of it is going to zlib and the actual regex search. -Peff [1] Note that this is with -O2, so some of that is from inlined calls. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 3/3] grep: disable threading in all but worktree case 2011-12-24 7:07 ` Jeff King 2011-12-24 10:49 ` Nguyen Thai Ngoc Duy 2011-12-24 10:55 ` Nguyen Thai Ngoc Duy @ 2011-12-25 3:32 ` Nguyen Thai Ngoc Duy 2 siblings, 0 replies; 50+ messages in thread From: Nguyen Thai Ngoc Duy @ 2011-12-25 3:32 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð, Thomas Rast, René Scharfe, Eric Herman, git, Junio C Hamano, Fredrik Kuivinen On Sat, Dec 24, 2011 at 2:07 PM, Jeff King <peff@peff.net> wrote: > The case where we would most expect the setup cost to be drowned out is > using a more complex regex, grepping tree objects. There we have a > baseline of: > > $ time git grep 'a.*c' HEAD >/dev/null > real 0m5.684s > user 0m5.472s > sys 0m0.196s > > $ time git ls-tree --name-only -r HEAD | > xargs git grep 'a.*c' HEAD -- >/dev/null > real 0m10.906s > user 0m10.725s > sys 0m0.240s > > Here, we still almost double our time. It looks like we don't use the > same pathspec matching code in this case. But we do waste a lot of extra > time zlib-inflating the trees in "ls-tree", only to do it separately in > "grep". Or you could pass blob SHA-1 to git grep to avoid reinflating trees $ time git ls-tree -r HEAD|cut -c 13-52|xargs git grep 'a.*c' >/dev/null Doing it in parallel does not seem to save time for me though. -- Duy ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 0/3] grep multithreading and scaling 2011-12-02 13:07 ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast ` (2 preceding siblings ...) 2011-12-02 13:07 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast @ 2011-12-02 17:34 ` Jeff King 2011-12-05 9:38 ` Thomas Rast 2011-12-02 20:02 ` Eric Herman 4 siblings, 1 reply; 50+ messages in thread From: Jeff King @ 2011-12-02 17:34 UTC (permalink / raw) To: Thomas Rast; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano On Fri, Dec 02, 2011 at 02:07:45PM +0100, Thomas Rast wrote: > where I put the --cached originally because that makes it independent > of the worktree (which in the very first measurements I still had > wiped, as I tend to do for this repo; I checked it out again after > that). This in fact gives me (~/g/git-grep --cached > INITRAMFS_ROOT_UID, leaving aside -W; best of 10): > > THREADS=8: 2.88user 0.21system 0:02.94elapsed > THREADS=4: 2.89user 0.29system 0:02.99elapsed > THREADS=2: 2.83user 0.36system 0:02.87elapsed > NO_PTHREADS: 2.16user 0.08system 0:02.25elapsed > > Uhuh. Doesn't scale so well after all. But removing the --cached, as > most people probably would: > > THREADS=8: 0.19user 0.32system 0:00.16elapsed > THREADS=4: 0.16user 0.34system 0:00.17elapsed > THREADS=2: 0.18user 0.32system 0:00.26elapsed > NO_PTHREADS: 0.12user 0.17system 0:00.31elapsed > > So I conclude that during any grep that cannot use the worktree, > having any threads hurts. Wow, that's horrible. Leaving aside the parallelism, it's just terrible that reading from the cache is 20 times slower than the worktree. I get similar results on my quad-core machine. A quick perf run shows most of the time is spent inflating objects. The diff code has a sneaky trick to re-use worktree files when we know they are stat-clean (in diff's case it is to avoid writing a tempfile). I wonder if we should use the same trick here. It would hurt the cold cache case, though, as the compressed versions require fewer disk accesses, of course. -Peff PS I suspect your timings are somewhat affected by the simplicity of the regex you are asking for. The time to inflate the blobs dominates, because the search is just a memmem(). On my quad-core w/ hyperthreading (i.e., 8 apparent cores): [no caching, simple regex; we get some parallelism, but the regex task is just not that intensive] $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null 0.42user 0.45system 0:00.15elapsed 578%CPU [no caching, harder regex; we get much higher CPU utilization] $ /usr/bin/time git grep 'a.*b' >/dev/null 14.68user 0.50system 0:02.00elapsed 758%CPU [with caching, simple regex; we get almost _no_ parallelism because all of our time is spent deflating under a lock, and the regex task takes very little time] $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null 7.64user 0.41system 0:07.61elapsed 105%CPU [with caching, harder regex; not as much parallelism as we hoped for, but still much more than before. Because there is actually work to parallelize in the regex] $ /usr/bin/time git grep --cached 'a.*b' >/dev/null 23.46user 0.47system 0:08.42elapsed 284%CPU So I think there is value in parallelizing even --cached greps. But we could do so much better if blob inflation could be done in parallel. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 0/3] grep multithreading and scaling 2011-12-02 17:34 ` [PATCH v2 0/3] grep multithreading and scaling Jeff King @ 2011-12-05 9:38 ` Thomas Rast 2011-12-05 20:16 ` Thomas Rast 2011-12-06 0:40 ` Jeff King 0 siblings, 2 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-05 9:38 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano Jeff King wrote: > > A quick perf run shows most of the time is spent inflating objects. The > diff code has a sneaky trick to re-use worktree files when we know they > are stat-clean (in diff's case it is to avoid writing a tempfile). I > wonder if we should use the same trick here. > > It would hurt the cold cache case, though, as the compressed versions > require fewer disk accesses, of course. I just found out that on Linux, there's mincore() that can tell us (racily, but who cares) whether a given file mapping is in memory. If you would like to try it, see the source at the end, but I'm getting things such as # in a random collection of files, none of which I have accessed lately $ ls -l -rw-r--r-- 1 thomas users 116534 Jul 4 2010 IMG_4884.JPG -rw-r--r-- 1 thomas users 7278081 Aug 25 2010 remoteserverrepo.zip $ ./mincore IMG_4884.JPG 00000000000000000000000000000 $ cat IMG_4884.JPG > /dev/null $ ./mincore IMG_4884.JPG 11111111111111111111111111111 $ ./mincore remoteserverrepo.zip 0000000000000000000000[...] $ head -10 remoteserverrepo.zip >/dev/null $ ./mincore remoteserverrepo.zip 1111000000000000000000[...] So that looks fairly promising, and the order would then be: - if stat-clean, and we have mincore(), and it tells us we can do it cheaply: grab file from tree - if it's a loose object: decompress it - if stat-clean: grab file from tree - access packs as usual > PS I suspect your timings are somewhat affected by the simplicity of the > regex you are asking for. The time to inflate the blobs dominates, > because the search is just a memmem(). On my quad-core w/ > hyperthreading (i.e., 8 apparent cores): > > $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null > 0.42user 0.45system 0:00.15elapsed 578%CPU > $ /usr/bin/time git grep 'a.*b' >/dev/null > 14.68user 0.50system 0:02.00elapsed 758%CPU > $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null > 7.64user 0.41system 0:07.61elapsed 105%CPU > $ /usr/bin/time git grep --cached 'a.*b' >/dev/null > 23.46user 0.47system 0:08.42elapsed 284%CPU > > So I think there is value in parallelizing even --cached greps. But > we could do so much better if blob inflation could be done in > parallel. Ok, I see, I missed that part. Perhaps the heuristic should then be "if the regex boils down to memmem, disable threading", but let's see what loose object decompression in parallel can give us. ---- 8< ---- mincore.c ---- 8< ---- #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/mman.h> #include <fcntl.h> void die(const char *s) { perror(s); exit(1); } int main (int argc, char *argv[]) { void *mem; size_t len; struct stat st; int fd; unsigned char *vec; int vsize; int i; size_t page = sysconf(_SC_PAGESIZE); if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); exit(2); } fd = open(argv[1], O_RDONLY); if (fd == -1) die("open failed"); if (fstat(fd, &st) == -1) die("fstat failed"); mem = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0); if (mem == (void*) -1) die("mmap failed"); vsize = (st.st_size+page-1)/page; vec = malloc(vsize); if (!vec) die("malloc failed"); if (mincore(mem, st.st_size, vec) == -1) die("mincore failed"); for (i = 0; i < vsize; i++) printf("%d", (int) vec[i]); printf("\n"); return 0; } -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 0/3] grep multithreading and scaling 2011-12-05 9:38 ` Thomas Rast @ 2011-12-05 20:16 ` Thomas Rast 2011-12-06 0:40 ` Jeff King 1 sibling, 0 replies; 50+ messages in thread From: Thomas Rast @ 2011-12-05 20:16 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano Thomas Rast wrote: > > I just found out that on Linux, there's mincore() that can tell us > (racily, but who cares) whether a given file mapping is in memory. [...] > So that looks fairly promising, and the order would then be: > > - if stat-clean, and we have mincore(), and it tells us we can do it > cheaply: grab file from tree > > - if it's a loose object: decompress it > > - if stat-clean: grab file from tree > > - access packs as usual Just a small note, I tried two things: * the simpler option of grabbing a loose object if it exists and is mincore() turns out to massively slow down 'git log HEAD', probably because only very few of these objects are loose in the first place * doing this only under grep's use_threads, and dropping the lock around unpack_sha1_file() [i.e., zlib decompression] still results in a git-grep that is slower than without this, though not much So no improvement here. Will have to look into the worktree trick though. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 0/3] grep multithreading and scaling 2011-12-05 9:38 ` Thomas Rast 2011-12-05 20:16 ` Thomas Rast @ 2011-12-06 0:40 ` Jeff King 1 sibling, 0 replies; 50+ messages in thread From: Jeff King @ 2011-12-06 0:40 UTC (permalink / raw) To: Thomas Rast; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano On Mon, Dec 05, 2011 at 10:38:16AM +0100, Thomas Rast wrote: > I just found out that on Linux, there's mincore() that can tell us > (racily, but who cares) whether a given file mapping is in memory. If > you would like to try it, see the source at the end, but I'm getting > things such as Neat, I didn't know about mincore. > So that looks fairly promising, and the order would then be: > > - if stat-clean, and we have mincore(), and it tells us we can do it > cheaply: grab file from tree > > - if it's a loose object: decompress it > > - if stat-clean: grab file from tree > > - access packs as usual I don't think your third one makes sense. If the working tree file isn't stat clean, then either: 1. the pack file is in cache, and it's way faster than faulting in the working tree file from disk 2. the pack file is not in cache, and it's a toss-up whether it is faster to fault in the smaller compressed pack-file version and uncompress it, or to fault in the larger on-disk version. The exact result will depend on the ratio of CPU to disk speed, the quality of your filesystem, and the size and contents of your file. And possibly on the exact delta chains you have. Though this optimization only happens when the file is in the index, which usually means it's recent, which means it will tend to be at the head of the delta chain. So it probably just makes sense to grab the working tree file only if mincore() tells us we have all (or most) of it, and otherwise go to the packfile. > Ok, I see, I missed that part. Perhaps the heuristic should then be > "if the regex boils down to memmem, disable threading", but let's see > what loose object decompression in parallel can give us. Yeah. I'd really rather have parallel object decompression than some complex Linux-only mincore optimization (even though that optimization _could_ yield extra savings on top of properly threading, if the blob retrieval is threaded, I think I'll care less about how much CPU time it takes). -Peff ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v2 0/3] grep multithreading and scaling 2011-12-02 13:07 ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast ` (3 preceding siblings ...) 2011-12-02 17:34 ` [PATCH v2 0/3] grep multithreading and scaling Jeff King @ 2011-12-02 20:02 ` Eric Herman 4 siblings, 0 replies; 50+ messages in thread From: Eric Herman @ 2011-12-02 20:02 UTC (permalink / raw) To: Thomas Rast; +Cc: René Scharfe, git, Junio C Hamano Hello Thomas, Thanks for the work and the great info. Some of the numbers are quite surprising. I do, indeed, have a machine with more cores, but I have been either busy with out-of-town guests or generally plain lazy in the last couple of weeks. I intend to set aside some time to do some benchmarking this weekend. I'll let you know what I find. Cheers, -Eric -- http://www.freesa.org/ -- mobile: +31 620719662 aim: ericigps -- skype: eric_herman -- jabber: eric.herman@gmail.com ^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2011-12-25 3:33 UTC | newest] Thread overview: 50+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-25 14:46 [PATCH] grep: load funcname patterns for -W Thomas Rast 2011-11-25 16:32 ` René Scharfe 2011-11-26 12:15 ` [PATCH] grep: enable multi-threading for -p and -W René Scharfe 2011-11-29 9:54 ` Thomas Rast 2011-11-29 13:49 ` René Scharfe 2011-11-29 14:07 ` Thomas Rast 2011-12-02 13:07 ` [PATCH v2 0/3] grep multithreading and scaling Thomas Rast 2011-12-02 13:07 ` [PATCH v2 1/3] grep: load funcname patterns for -W Thomas Rast 2011-12-02 13:07 ` [PATCH v2 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast 2011-12-02 13:07 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Thomas Rast 2011-12-02 16:15 ` René Scharfe 2011-12-05 9:02 ` Thomas Rast 2011-12-06 22:48 ` René Scharfe 2011-12-06 23:01 ` [PATCH 4/2] grep: turn off threading for non-worktree René Scharfe 2011-12-07 4:42 ` Jeff King 2011-12-07 17:11 ` René Scharfe 2011-12-07 18:28 ` Jeff King 2011-12-07 20:11 ` J. Bruce Fields 2011-12-07 20:45 ` Jeff King 2011-12-07 8:12 ` Thomas Rast 2011-12-07 17:00 ` René Scharfe 2011-12-10 13:13 ` Pete Wyckoff 2011-12-12 22:37 ` René Scharfe 2011-12-07 4:24 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Jeff King 2011-12-07 16:52 ` René Scharfe 2011-12-07 18:10 ` Jeff King 2011-12-07 8:11 ` Thomas Rast 2011-12-07 16:54 ` René Scharfe 2011-12-12 21:16 ` [PATCH v3 0/3] grep attributes and multithreading Thomas Rast 2011-12-12 21:16 ` [PATCH v3 1/3] grep: load funcname patterns for -W Thomas Rast 2011-12-12 21:16 ` [PATCH v3 2/3] grep: enable threading with -p and -W using lazy attribute lookup Thomas Rast 2011-12-16 8:22 ` Johannes Sixt 2011-12-16 17:34 ` Junio C Hamano 2011-12-12 21:16 ` [PATCH v3 3/3] grep: disable threading in non-worktree case Thomas Rast 2011-12-12 22:37 ` [PATCH v3 0/3] grep attributes and multithreading René Scharfe 2011-12-12 23:44 ` Junio C Hamano 2011-12-13 8:44 ` Thomas Rast 2011-12-23 22:37 ` [PATCH v2 3/3] grep: disable threading in all but worktree case Ævar Arnfjörð Bjarmason 2011-12-23 22:49 ` Thomas Rast 2011-12-24 1:39 ` Ævar Arnfjörð Bjarmason 2011-12-24 7:07 ` Jeff King 2011-12-24 10:49 ` Nguyen Thai Ngoc Duy 2011-12-24 10:55 ` Nguyen Thai Ngoc Duy 2011-12-24 13:38 ` Jeff King 2011-12-25 3:32 ` Nguyen Thai Ngoc Duy 2011-12-02 17:34 ` [PATCH v2 0/3] grep multithreading and scaling Jeff King 2011-12-05 9:38 ` Thomas Rast 2011-12-05 20:16 ` Thomas Rast 2011-12-06 0:40 ` Jeff King 2011-12-02 20:02 ` Eric Herman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).