* [PATCH 2/2] diff: add custom regular expressions for function names @ 2007-07-04 18:06 Johannes Schindelin 2007-07-04 18:41 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Johannes Schindelin @ 2007-07-04 18:06 UTC (permalink / raw) To: git, gitster For convenience, the hunk headers contain function names. The heuristics used for that in "git diff" is extremely simple, but works quite well for most C/C++ and script source code. However, it usually fails for Java source code. This patch introduces a config variable diff.functionNameRegexp which replaces the default heuristics. If the pattern contains a group, the match of this group is used for the hunk header instead of the whole match. You can give more than one regular expression (all of which have to match), by separating them with '\n'. By prefixing an expression with '!', it is negated, i.e. a non-match is required. The last expression is used for the hunk header, and therefore must not be negated. A special exception: since Java is so common nowadays, you can set the config variable to "java", and it will be substituted by a sensible heuristic. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- Yes, I know. .gitattributes integration would be nice. But for now I am done with it. I invested way too much time into this. Documentation/config.txt | 17 ++++++++++ diff.c | 4 ++ t/t4018-diff-funcname.sh | 60 +++++++++++++++++++++++++++++++++++ xdiff-interface.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++ xdiff-interface.h | 2 + xdiff/xdiff.h | 4 ++ xdiff/xemit.c | 37 +++++++++++++-------- 7 files changed, 188 insertions(+), 14 deletions(-) create mode 100644 t/t4018-diff-funcname.sh diff --git a/Documentation/config.txt b/Documentation/config.txt index 1d96adf..326e27b 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -390,6 +390,23 @@ diff.renames:: will enable basic rename detection. If set to "copies" or "copy", it will detect copies, as well. +diff.functionNameRegexp:: + Instead of the builtin heuristic to determine the function names + for the hunk headers, use the given regular expression to match + lines which contain the function name. If this regular expression + contains at least one group, the match of the first group is + used, rather than the whole line. ++ +Sometimes, one regular expression is not enough. Use '\n' to separate +multiple regular expressions. All of these expressions have to match, +and the last one determines the returned function name. Prefix the +expression with '!' to require a non-match instead of a match. ++ +As a special exception, the pattern 'java' is substituted with a preset +pattern which can be expressed as: 'At least two identifiers, followed by +a "(", and not followed by a ";", and the first identifier must not be a +reserved keyword such as "new", "return", etc.' + fetch.unpackLimit:: If the number of objects fetched over the git native transfer is below this diff --git a/diff.c b/diff.c index bea0163..24b66d0 100644 --- a/diff.c +++ b/diff.c @@ -131,6 +131,10 @@ int git_diff_ui_config(const char *var, const char *value) color_parse(value, var, diff_colors[slot]); return 0; } + if (!prefixcmp(var, "diff.functionnameregexp")) { + xdiff_set_find_func(&default_xecfg, value); + return 0; + } return git_default_config(var, value); } diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh new file mode 100644 index 0000000..90c6605 --- /dev/null +++ b/t/t4018-diff-funcname.sh @@ -0,0 +1,60 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test custom diff function name patterns' + +. ./test-lib.sh + +LF=' +' + +cat > Beer.java << EOF +public class Beer +{ + int special; + public static void main(String args[]) + { + String s=" "; + for(int x = 99; x > 0; x--) + { + System.out.print(x + " bottles of beer on the wall " + + x + " bottles of beer\n" + + "Take one down, pass it around, " + (x - 1) + + " bottles of beer on the wall.\n"); + } + System.out.print("Go to the store, buy some more,\n" + + "99 bottles of beer on the wall.\n"); + } +} +EOF + +sed 's/beer\\/beer,\\/' < Beer.java > Beer-correct.java + +test_expect_success 'default behaviour' ' + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ public class Beer" +' + +test_expect_success 'preset java pattern' ' + git config diff.functionnameregexp java && + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ public static void main(" +' + +git config diff.functionnameregexp '!static +!String +[^ ].*s.*' + +test_expect_success 'custom pattern' ' + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ int special;$" +' + +test_expect_success 'last regexp must not be negated' ' + git config diff.functionnameregexp "!static" && + ! git diff Beer.java Beer-correct.java +' + +test_done diff --git a/xdiff-interface.c b/xdiff-interface.c index e407cf1..b0cbd60 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -129,3 +129,81 @@ int buffer_is_binary(const char *ptr, unsigned long size) size = FIRST_FEW_BYTES; return !!memchr(ptr, 0, size); } + +struct ff_regs { + int nr; + struct ff_reg { + regex_t re; + int negate; + } *array; +}; + +static long ff_regexp(const char *line, long len, + char *buffer, long buffer_size, void *priv) +{ + char *line_buffer = xstrndup(line, len); /* make NUL terminated */ + struct ff_regs *regs = priv; + regmatch_t pmatch[2]; + int result = 0, i; + + for (i = 0; i < regs->nr; i++) { + struct ff_reg *reg = regs->array + i; + if (reg->negate ^ !!regexec(®->re, + line_buffer, 2, pmatch, 0)) { + free(line_buffer); + return -1; + } + } + i = pmatch[1].rm_so >= 0 ? 1 : 0; + line += pmatch[i].rm_so; + result = pmatch[i].rm_eo - pmatch[i].rm_so; + if (result > buffer_size) + result = buffer_size; + else + while (result > 0 && (isspace(line[result - 1]) || + line[result - 1] == '\n')) + result--; + memcpy(buffer, line, result); + free(line_buffer); + return result; +} + +void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value) +{ + int i; + struct ff_regs *regs; + + if (!strcmp(value, "java")) + value = "!^[ ]*\\(catch\\|do\\|for\\|if\\|instanceof\\|" + "new\\|return\\|switch\\|throw\\|while\\)\n" + "^[ ]*\\(\\([ ]*" + "[A-Za-z_][A-Za-z_0-9]*\\)\\{2,\\}" + "[ ]*([^;]*$\\)"; + + xecfg->find_func = ff_regexp; + regs = xecfg->find_func_priv = xmalloc(sizeof(struct ff_regs)); + for (i = 0, regs->nr = 1; value[i]; i++) + if (value[i] == '\n') + regs->nr++; + regs->array = xmalloc(regs->nr * sizeof(struct ff_reg)); + for (i = 0; i < regs->nr; i++) { + struct ff_reg *reg = regs->array + i; + const char *ep = strchr(value, '\n'), *expression; + char *buffer = NULL; + + reg->negate = (*value == '!'); + if (reg->negate && i == regs->nr - 1) + die("Last expression must not be negated: %s", value); + if (*value == '!') + value++; + if (ep) + expression = buffer = xstrndup(value, ep - value); + else + expression = value; + if (regcomp(®->re, expression, 0)) + die("Invalid diff.functionNameRegexp: %s", expression); + if (buffer) + free(buffer); + value = ep + 1; + } +} diff --git a/xdiff-interface.h b/xdiff-interface.h index 536f4e4..fb742db 100644 --- a/xdiff-interface.h +++ b/xdiff-interface.h @@ -20,4 +20,6 @@ int parse_hunk_header(char *line, int len, int read_mmfile(mmfile_t *ptr, const char *filename); int buffer_is_binary(const char *ptr, unsigned long size); +extern void xdiff_set_find_func(xdemitconf_t *xecfg, const char *line); + #endif diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 9402bb0..c00ddaa 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -73,9 +73,13 @@ typedef struct s_xdemitcb { int (*outf)(void *, mmbuffer_t *, int); } xdemitcb_t; +typedef long (*find_func_t)(const char *line, long line_len, char *buffer, long buffer_size, void *priv); + typedef struct s_xdemitconf { long ctxlen; unsigned long flags; + find_func_t find_func; + void *find_func_priv; } xdemitconf_t; typedef struct s_bdiffparam { diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 4b6e639..7237054 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -69,7 +69,24 @@ static xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg) { } -static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll) { +static long def_ff(const char *rec, long len, char *buf, long sz, void *priv) +{ + if (len > 0 && + (isalpha((unsigned char)*rec) || /* identifier? */ + *rec == '_' || /* also identifier? */ + *rec == '$')) { /* mysterious GNU diff's invention */ + if (len > sz) + len = sz; + while (0 < len && isspace((unsigned char)rec[len - 1])) + len--; + memcpy(buf, rec, len); + return len; + } + return -1; +} + +static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll, + find_func_t ff, void *ff_priv) { /* * Be quite stupid about this for now. Find a line in the old file @@ -80,22 +97,12 @@ static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll) { const char *rec; long len; - *ll = 0; while (i-- > 0) { len = xdl_get_rec(xf, i, &rec); - if (len > 0 && - (isalpha((unsigned char)*rec) || /* identifier? */ - *rec == '_' || /* also identifier? */ - *rec == '$')) { /* mysterious GNU diff's invention */ - if (len > sz) - len = sz; - while (0 < len && isspace((unsigned char)rec[len - 1])) - len--; - memcpy(buf, rec, len); - *ll = len; + if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0) return; - } } + *ll = 0; } @@ -120,6 +127,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, xdchange_t *xch, *xche; char funcbuf[80]; long funclen = 0; + find_func_t ff = xecfg->find_func ? xecfg->find_func : def_ff; if (xecfg->flags & XDL_EMIT_COMMON) return xdl_emit_common(xe, xscr, ecb, xecfg); @@ -143,7 +151,8 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, if (xecfg->flags & XDL_EMIT_FUNCNAMES) { xdl_find_func(&xe->xdf1, s1, funcbuf, - sizeof(funcbuf), &funclen); + sizeof(funcbuf), &funclen, + ff, xecfg->find_func_priv); } if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2, funcbuf, funclen, ecb) < 0) -- 1.5.3.rc0.804.g7068 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] diff: add custom regular expressions for function names 2007-07-04 18:06 [PATCH 2/2] diff: add custom regular expressions for function names Johannes Schindelin @ 2007-07-04 18:41 ` Linus Torvalds 2007-07-04 18:45 ` Johannes Schindelin 2007-07-04 20:44 ` Junio C Hamano 0 siblings, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2007-07-04 18:41 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, gitster On Wed, 4 Jul 2007, Johannes Schindelin wrote: > > This patch introduces a config variable diff.functionNameRegexp > which replaces the default heuristics. If the pattern contains > a group, the match of this group is used for the hunk header > instead of the whole match. Umm. Shouldn't it be a path-name based attribute instead? That way you can set it to be different things for different source files in the same repository.. IOW, think mixed Java/C codebases. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] diff: add custom regular expressions for function names 2007-07-04 18:41 ` Linus Torvalds @ 2007-07-04 18:45 ` Johannes Schindelin 2007-07-04 20:44 ` Junio C Hamano 1 sibling, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-04 18:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: git, gitster Hi, On Wed, 4 Jul 2007, Linus Torvalds wrote: > On Wed, 4 Jul 2007, Johannes Schindelin wrote: > > > > This patch introduces a config variable diff.functionNameRegexp > > which replaces the default heuristics. If the pattern contains > > a group, the match of this group is used for the hunk header > > instead of the whole match. > > Umm. Shouldn't it be a path-name based attribute instead? >From my original mail: Yes, I know. .gitattributes integration would be nice. But for now I am done with it. I invested way too much time into this. Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] diff: add custom regular expressions for function names 2007-07-04 18:41 ` Linus Torvalds 2007-07-04 18:45 ` Johannes Schindelin @ 2007-07-04 20:44 ` Junio C Hamano 2007-07-04 21:20 ` Johannes Schindelin 2007-07-05 5:00 ` Junio C Hamano 1 sibling, 2 replies; 29+ messages in thread From: Junio C Hamano @ 2007-07-04 20:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, git, gitster Linus Torvalds <torvalds@linux-foundation.org> writes: > On Wed, 4 Jul 2007, Johannes Schindelin wrote: >> >> This patch introduces a config variable diff.functionNameRegexp >> which replaces the default heuristics. If the pattern contains >> a group, the match of this group is used for the hunk header >> instead of the whole match. > > Umm. Shouldn't it be a path-name based attribute instead? > > That way you can set it to be different things for different source files > in the same repository.. IOW, think mixed Java/C codebases. Absolutely. I'll take it from there. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] diff: add custom regular expressions for function names 2007-07-04 20:44 ` Junio C Hamano @ 2007-07-04 21:20 ` Johannes Schindelin 2007-07-05 5:00 ` Junio C Hamano 1 sibling, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-04 21:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi, On Wed, 4 Jul 2007, Junio C Hamano wrote: > I'll take it from there. Thanks, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] diff: add custom regular expressions for function names 2007-07-04 20:44 ` Junio C Hamano 2007-07-04 21:20 ` Johannes Schindelin @ 2007-07-05 5:00 ` Junio C Hamano 2007-07-05 5:02 ` [PATCH 1/2] Future-proof source for changes in xdemitconf_t Junio C Hamano ` (3 more replies) 1 sibling, 4 replies; 29+ messages in thread From: Junio C Hamano @ 2007-07-05 5:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, git Junio C Hamano <gitster@pobox.com> writes: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> On Wed, 4 Jul 2007, Johannes Schindelin wrote: >>> >>> This patch introduces a config variable diff.functionNameRegexp >>> which replaces the default heuristics. If the pattern contains >>> a group, the match of this group is used for the hunk header >>> instead of the whole match. >> >> Umm. Shouldn't it be a path-name based attribute instead? >> >> That way you can set it to be different things for different source files >> in the same repository.. IOW, think mixed Java/C codebases. > > Absolutely. I'll take it from there. I'll follow-up this message with two patches. The first one is almost the same as Johannes's, but I got rid of the new global variable. The second one discards Johannes's use of a single config variable and replaces it with gitattribute access. The second one is still a WIP; currently you can mark the path to use java convention to find hunk header with: $ echo '*.java funcname=java' >.gitattributes If there are useful "hunk header regexp" to make built-in, you can add them to diff_hunk_header_regexp(). We also should allow user override or define custom regexp with a configuration section, perhaps like: [funcname] java = ... perl = ... default = ... There are enough existing code in diff and convert area that use (path -> attribute -> config) mapping in a very similar way and any git hacker wannabe should be able to mimick them to implement such a support, but tonight's WIP does not have it. *NOTE IN BIG RED LETTERS* I do not particularly like the way multiple regexps are used in Johannes's patch, but I left it as-is, as that part of the change is orthogonal from the support to use the gitattribute mechanism, and the primary reason I got involved in this topic is to give help around the latter area. I think using multiple regexp is cute, but if we do that, it should allow people to pick from: public class Beer { int special; public static void main(String args[]) { ... modified part is here ... with two regexp matches, say: /^(public|private|protectd) class (.*)/ then /^ +.* (\w*\(.*)$/ and define the hunk_header format as something like: "\[1,2]::\[2,1]" meaning, "pick the second capture group from the match data of the first regexp, followed by double-colon, and pick the first capture group from the match data of the second regexp", to result in "Beer::main(String args[])". You should be able to pick "/package (\w+);/ then /^sub (\w+)/" in Perl code using the same idea. I am not married to the syntax I used in the above examples, though. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/2] Future-proof source for changes in xdemitconf_t 2007-07-05 5:00 ` Junio C Hamano @ 2007-07-05 5:02 ` Junio C Hamano 2007-07-05 11:01 ` Johannes Schindelin 2007-07-05 5:02 ` [PATCH 2/2] WIP per-path attribute based hunk header selection Junio C Hamano ` (2 subsequent siblings) 3 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-07-05 5:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, git From: Johannes Schindelin <Johannes.Schindelin@gmx.de> Date: Wed, 4 Jul 2007 19:05:46 +0100 The instances of xdemitconf_t were initialized member by member. Instead, initialize them to all zero, so we do not have to update those places each time we introduce a new member. [jc: minimally fixed by getting rid of a new global] Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin-blame.c | 2 +- builtin-rerere.c | 2 +- combine-diff.c | 3 +-- diff.c | 10 +++++----- merge-file.c | 1 + merge-tree.c | 2 +- 6 files changed, 10 insertions(+), 10 deletions(-) diff --git a/builtin-blame.c b/builtin-blame.c index da23a6f..0519339 100644 --- a/builtin-blame.c +++ b/builtin-blame.c @@ -518,8 +518,8 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o, xdemitcb_t ecb; xpp.flags = xdl_opts; + memset(&xecfg, 0, sizeof(xecfg)); xecfg.ctxlen = context; - xecfg.flags = 0; ecb.outf = xdiff_outf; ecb.priv = &state; memset(&state, 0, sizeof(state)); diff --git a/builtin-rerere.c b/builtin-rerere.c index 29fb075..01f3848 100644 --- a/builtin-rerere.c +++ b/builtin-rerere.c @@ -282,8 +282,8 @@ static int diff_two(const char *file1, const char *label1, printf("--- a/%s\n+++ b/%s\n", label1, label2); fflush(stdout); xpp.flags = XDF_NEED_MINIMAL; + memset(&xecfg, 0, sizeof(xecfg)); xecfg.ctxlen = 3; - xecfg.flags = 0; ecb.outf = outf; xdl_diff(&minus, &plus, &xpp, &xecfg, &ecb); diff --git a/combine-diff.c b/combine-diff.c index ea3ca5f..ef62234 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -215,8 +215,7 @@ static void combine_diff(const unsigned char *parent, mmfile_t *result_file, parent_file.ptr = grab_blob(parent, &sz); parent_file.size = sz; xpp.flags = XDF_NEED_MINIMAL; - xecfg.ctxlen = 0; - xecfg.flags = 0; + memset(&xecfg, 0, sizeof(xecfg)); ecb.outf = xdiff_outf; ecb.priv = &state; memset(&state, 0, sizeof(state)); diff --git a/diff.c b/diff.c index 1958970..552f7c0 100644 --- a/diff.c +++ b/diff.c @@ -390,6 +390,7 @@ static void diff_words_show(struct diff_words_data *diff_words) mmfile_t minus, plus; int i; + memset(&xecfg, 0, sizeof(xecfg)); minus.size = diff_words->minus.text.size; minus.ptr = xmalloc(minus.size); memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size); @@ -408,7 +409,6 @@ static void diff_words_show(struct diff_words_data *diff_words) xpp.flags = XDF_NEED_MINIMAL; xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc; - xecfg.flags = 0; ecb.outf = xdiff_outf; ecb.priv = diff_words; diff_words->xm.consume = fn_out_diff_words_aux; @@ -1202,6 +1202,7 @@ static void builtin_diff(const char *name_a, xdemitcb_t ecb; struct emit_callback ecbdata; + memset(&xecfg, 0, sizeof(xecfg)); memset(&ecbdata, 0, sizeof(ecbdata)); ecbdata.label_path = lbl; ecbdata.color_diff = o->color_diff; @@ -1270,9 +1271,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, xdemitconf_t xecfg; xdemitcb_t ecb; + memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts; - xecfg.ctxlen = 0; - xecfg.flags = 0; ecb.outf = xdiff_outf; ecb.priv = diffstat; xdl_diff(&mf1, &mf2, &xpp, &xecfg, &ecb); @@ -1310,9 +1310,8 @@ static void builtin_checkdiff(const char *name_a, const char *name_b, xdemitconf_t xecfg; xdemitcb_t ecb; + memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = XDF_NEED_MINIMAL; - xecfg.ctxlen = 0; - xecfg.flags = 0; ecb.outf = xdiff_outf; ecb.priv = &data; xdl_diff(&mf1, &mf2, &xpp, &xecfg, &ecb); @@ -2764,6 +2763,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) struct diff_filepair *p = q->queue[i]; int len1, len2; + memset(&xecfg, 0, sizeof(xecfg)); if (p->status == 0) return error("internal diff status error"); if (p->status == DIFF_STATUS_UNKNOWN) diff --git a/merge-file.c b/merge-file.c index 748d15c..1e031ea 100644 --- a/merge-file.c +++ b/merge-file.c @@ -62,6 +62,7 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2) xdemitcb_t ecb; xpp.flags = XDF_NEED_MINIMAL; + memset(&xecfg, 0, sizeof(xecfg)); xecfg.ctxlen = 3; xecfg.flags = XDL_EMIT_COMMON; ecb.outf = common_outf; diff --git a/merge-tree.c b/merge-tree.c index 3b8d9e6..7d4f628 100644 --- a/merge-tree.c +++ b/merge-tree.c @@ -106,8 +106,8 @@ static void show_diff(struct merge_list *entry) xdemitcb_t ecb; xpp.flags = XDF_NEED_MINIMAL; + memset(&xecfg, 0, sizeof(xecfg)); xecfg.ctxlen = 3; - xecfg.flags = 0; ecb.outf = show_outf; ecb.priv = NULL; -- 1.5.3.rc0.818.ga2b52 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] Future-proof source for changes in xdemitconf_t 2007-07-05 5:02 ` [PATCH 1/2] Future-proof source for changes in xdemitconf_t Junio C Hamano @ 2007-07-05 11:01 ` Johannes Schindelin 0 siblings, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-05 11:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi, On Wed, 4 Jul 2007, Junio C Hamano wrote: > [jc: minimally fixed by getting rid of a new global] Funny. This is identical to my first version of that patch ;-) Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 2/2] WIP per-path attribute based hunk header selection. 2007-07-05 5:00 ` Junio C Hamano 2007-07-05 5:02 ` [PATCH 1/2] Future-proof source for changes in xdemitconf_t Junio C Hamano @ 2007-07-05 5:02 ` Junio C Hamano 2007-07-05 11:25 ` Johannes Schindelin 2007-07-05 8:24 ` [PATCH 2/2] diff: add custom regular expressions for function names Florian Weimer 2007-07-05 11:44 ` Johannes Schindelin 3 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-07-05 5:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, git This makes"diff -p" hunk headers customizable via gitattributes mechanism. It is based on Johannes's earlier patch that allowed to define a single regexp to be used for everything. The mechanism to arrive at the regexp that is used to define hunk header is the same as other use of gitattributes. You assign an attribute, funcname (because "diff -p" typically uses the name of the function the patch is about as the hunk header), a simple string value. This can be one of the names of built-in pattern (currently, "java" is defined) or a custom pattern name, to be looked up from the configuration file. (in .gitattributes) *.java funcname=java *.perl funcname=perl (in .git/config) [funcname] java = ... # ugly and complicated regexp to override the built-in one. perl = ... # another ugly and complicated regexp to define a new one. This is still a WIP in that the code to look-up the configuration is not there yet. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- diff.c | 121 +++++++++++++++++++++++++++++++++------------- diffcore-delta.c | 2 +- diffcore.h | 3 + t/t4018-diff-funcname.sh | 63 ++++++++++++++++++++++++ xdiff-interface.c | 71 +++++++++++++++++++++++++++ xdiff-interface.h | 2 + xdiff/xdiff.h | 4 ++ xdiff/xemit.c | 37 +++++++++----- 8 files changed, 254 insertions(+), 49 deletions(-) create mode 100644 t/t4018-diff-funcname.sh diff --git a/diff.c b/diff.c index 552f7c0..0c7d2c6 100644 --- a/diff.c +++ b/diff.c @@ -1101,31 +1101,91 @@ static void emit_binary_diff(mmfile_t *one, mmfile_t *two) static void setup_diff_attr_check(struct git_attr_check *check) { static struct git_attr *attr_diff; + static struct git_attr *attr_diff_func_name; - if (!attr_diff) + if (!attr_diff) { attr_diff = git_attr("diff", 4); - check->attr = attr_diff; + attr_diff_func_name = git_attr("funcname", 8); + } + check[0].attr = attr_diff; + check[1].attr = attr_diff_func_name; } -static int file_is_binary(struct diff_filespec *one) +static void diff_filespec_check_attr(struct diff_filespec *one) { - struct git_attr_check attr_diff_check; + struct git_attr_check attr_diff_check[2]; - setup_diff_attr_check(&attr_diff_check); - if (!git_checkattr(one->path, 1, &attr_diff_check)) { - const char *value = attr_diff_check.value; + if (one->checked_attr) + return; + + setup_diff_attr_check(attr_diff_check); + one->is_binary = 0; + one->hunk_header_ident = NULL; + + if (!git_checkattr(one->path, ARRAY_SIZE(attr_diff_check), attr_diff_check)) { + const char *value; + + /* binaryness */ + value = attr_diff_check[0].value; if (ATTR_TRUE(value)) - return 0; + ; else if (ATTR_FALSE(value)) - return 1; + one->is_binary = 1; + + /* hunk header ident */ + value = attr_diff_check[1].value; + if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value)) + ; + else + one->hunk_header_ident = value; } - if (!one->data) { - if (!DIFF_FILE_VALID(one)) - return 0; + if (!one->data && DIFF_FILE_VALID(one)) diff_populate_filespec(one, 0); + + if (one->data) + one->is_binary = buffer_is_binary(one->data, one->size); + +} + +int diff_filespec_is_binary(struct diff_filespec *one) +{ + diff_filespec_check_attr(one); + return one->is_binary; +} + +static const char *diff_hunk_header_regexp(struct diff_filespec *one) +{ + const char *ident; + + diff_filespec_check_attr(one); + ident = one->hunk_header_ident; + + if (!ident) { + /* + * If the config file has "funcname.default" defined, return + * that regexp here, instead of NULL. + */ + return NULL; } - return buffer_is_binary(one->data, one->size); + + /* + * FIXME: look up custom "funcname.$ident" regexp from config, + * and return it here. + */ + + + /* + * And define built-in fallback patterns here... + */ + if (!strcmp(ident, "java")) + return "!^[ ]*\\(catch\\|do\\|for\\|if\\|instanceof\\|" + "new\\|return\\|switch\\|throw\\|while\\)\n" + "^[ ]*\\(\\([ ]*" + "[A-Za-z_][A-Za-z_0-9]*\\)\\{2,\\}" + "[ ]*([^;]*$\\)"; + + return NULL; } static void builtin_diff(const char *name_a, @@ -1182,7 +1242,8 @@ static void builtin_diff(const char *name_a, if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) die("unable to read files to diff"); - if (!o->text && (file_is_binary(one) || file_is_binary(two))) { + if (!o->text && + (diff_filespec_is_binary(one) || diff_filespec_is_binary(two))) { /* Quite common confusing case */ if (mf1.size == mf2.size && !memcmp(mf1.ptr, mf2.ptr, mf1.size)) @@ -1201,6 +1262,11 @@ static void builtin_diff(const char *name_a, xdemitconf_t xecfg; xdemitcb_t ecb; struct emit_callback ecbdata; + const char *hunk_header_regexp; + + hunk_header_regexp = diff_hunk_header_regexp(one); + if (!hunk_header_regexp) + hunk_header_regexp = diff_hunk_header_regexp(two); memset(&xecfg, 0, sizeof(xecfg)); memset(&ecbdata, 0, sizeof(ecbdata)); @@ -1210,6 +1276,8 @@ static void builtin_diff(const char *name_a, xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts; xecfg.ctxlen = o->context; xecfg.flags = XDL_EMIT_FUNCNAMES; + if (hunk_header_regexp) + xdiff_set_find_func(&xecfg, hunk_header_regexp); if (!diffopts) ; else if (!prefixcmp(diffopts, "--unified=")) @@ -1261,7 +1329,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b, if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) die("unable to read files to diff"); - if (file_is_binary(one) || file_is_binary(two)) { + if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) { data->is_binary = 1; data->added = mf2.size; data->deleted = mf1.size; @@ -1302,7 +1370,7 @@ static void builtin_checkdiff(const char *name_a, const char *name_b, if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) die("unable to read files to diff"); - if (file_is_binary(two)) + if (diff_filespec_is_binary(two)) goto free_and_return; else { /* Crazy xdl interfaces.. */ @@ -1879,8 +1947,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o) if (o->binary) { mmfile_t mf; - if ((!fill_mmfile(&mf, one) && file_is_binary(one)) || - (!fill_mmfile(&mf, two) && file_is_binary(two))) + if ((!fill_mmfile(&mf, one) && diff_filespec_is_binary(one)) || + (!fill_mmfile(&mf, two) && diff_filespec_is_binary(two))) abbrev = 40; } len += snprintf(msg + len, sizeof(msg) - len, @@ -2783,7 +2851,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) return error("unable to read files to diff"); /* Maybe hash p->two? into the patch id? */ - if (file_is_binary(p->two)) + if (diff_filespec_is_binary(p->two)) continue; len1 = remove_space(p->one->path, strlen(p->one->path)); @@ -3011,21 +3079,6 @@ void diffcore_std(struct diff_options *options) if (options->quiet) return; - /* - * break/rename count similarity differently depending on - * the binary-ness. - */ - if ((options->break_opt != -1) || (options->detect_rename)) { - struct diff_queue_struct *q = &diff_queued_diff; - int i; - - for (i = 0; i < q->nr; i++) { - struct diff_filepair *p = q->queue[i]; - p->one->is_binary = file_is_binary(p->one); - p->two->is_binary = file_is_binary(p->two); - } - } - if (options->break_opt != -1) diffcore_break(options->break_opt); if (options->detect_rename) diff --git a/diffcore-delta.c b/diffcore-delta.c index a038b16..d9729e5 100644 --- a/diffcore-delta.c +++ b/diffcore-delta.c @@ -129,7 +129,7 @@ static struct spanhash_top *hash_chars(struct diff_filespec *one) struct spanhash_top *hash; unsigned char *buf = one->data; unsigned int sz = one->size; - int is_text = !one->is_binary; + int is_text = !diff_filespec_is_binary(one); i = INITIAL_HASH_SIZE; hash = xmalloc(sizeof(*hash) + sizeof(struct spanhash) * (1<<i)); diff --git a/diffcore.h b/diffcore.h index 0c8abb5..0598514 100644 --- a/diffcore.h +++ b/diffcore.h @@ -27,6 +27,7 @@ struct diff_filespec { char *path; void *data; void *cnt_data; + const void *hunk_header_ident; unsigned long size; int xfrm_flags; /* for use by the xfrm */ unsigned short mode; /* file mode */ @@ -37,6 +38,7 @@ struct diff_filespec { #define DIFF_FILE_VALID(spec) (((spec)->mode) != 0) unsigned should_free : 1; /* data should be free()'ed */ unsigned should_munmap : 1; /* data should be munmap()'ed */ + unsigned checked_attr : 1; unsigned is_binary : 1; /* data should be considered "binary" */ }; @@ -46,6 +48,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *, extern int diff_populate_filespec(struct diff_filespec *, int); extern void diff_free_filespec_data(struct diff_filespec *); +extern int diff_filespec_is_binary(struct diff_filespec *); struct diff_filepair { struct diff_filespec *one; diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh new file mode 100644 index 0000000..0689b2f --- /dev/null +++ b/t/t4018-diff-funcname.sh @@ -0,0 +1,63 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test custom diff function name patterns' + +. ./test-lib.sh + +LF=' +' + +cat > Beer.java << EOF +public class Beer +{ + int special; + public static void main(String args[]) + { + String s=" "; + for(int x = 99; x > 0; x--) + { + System.out.print(x + " bottles of beer on the wall " + + x + " bottles of beer\n" + + "Take one down, pass it around, " + (x - 1) + + " bottles of beer on the wall.\n"); + } + System.out.print("Go to the store, buy some more,\n" + + "99 bottles of beer on the wall.\n"); + } +} +EOF + +sed 's/beer\\/beer,\\/' < Beer.java > Beer-correct.java + +test_expect_success 'default behaviour' ' + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ public class Beer" +' + +test_expect_success 'preset java pattern' ' + echo "*.java funcname=java" >.gitattributes && + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ public static void main(" +' + +test_done + + +git config diff.functionnameregexp '!static +!String +[^ ].*s.*' + +test_expect_success 'custom pattern' ' + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ int special;$" +' + +test_expect_success 'last regexp must not be negated' ' + git config diff.functionnameregexp "!static" && + ! git diff Beer.java Beer-correct.java +' + +test_done diff --git a/xdiff-interface.c b/xdiff-interface.c index e407cf1..be866d1 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -129,3 +129,74 @@ int buffer_is_binary(const char *ptr, unsigned long size) size = FIRST_FEW_BYTES; return !!memchr(ptr, 0, size); } + +struct ff_regs { + int nr; + struct ff_reg { + regex_t re; + int negate; + } *array; +}; + +static long ff_regexp(const char *line, long len, + char *buffer, long buffer_size, void *priv) +{ + char *line_buffer = xstrndup(line, len); /* make NUL terminated */ + struct ff_regs *regs = priv; + regmatch_t pmatch[2]; + int result = 0, i; + + for (i = 0; i < regs->nr; i++) { + struct ff_reg *reg = regs->array + i; + if (reg->negate ^ !!regexec(®->re, + line_buffer, 2, pmatch, 0)) { + free(line_buffer); + return -1; + } + } + i = pmatch[1].rm_so >= 0 ? 1 : 0; + line += pmatch[i].rm_so; + result = pmatch[i].rm_eo - pmatch[i].rm_so; + if (result > buffer_size) + result = buffer_size; + else + while (result > 0 && (isspace(line[result - 1]) || + line[result - 1] == '\n')) + result--; + memcpy(buffer, line, result); + free(line_buffer); + return result; +} + +void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value) +{ + int i; + struct ff_regs *regs; + + xecfg->find_func = ff_regexp; + regs = xecfg->find_func_priv = xmalloc(sizeof(struct ff_regs)); + for (i = 0, regs->nr = 1; value[i]; i++) + if (value[i] == '\n') + regs->nr++; + regs->array = xmalloc(regs->nr * sizeof(struct ff_reg)); + for (i = 0; i < regs->nr; i++) { + struct ff_reg *reg = regs->array + i; + const char *ep = strchr(value, '\n'), *expression; + char *buffer = NULL; + + reg->negate = (*value == '!'); + if (reg->negate && i == regs->nr - 1) + die("Last expression must not be negated: %s", value); + if (*value == '!') + value++; + if (ep) + expression = buffer = xstrndup(value, ep - value); + else + expression = value; + if (regcomp(®->re, expression, 0)) + die("Invalid regexp to look for hunk header: %s", expression); + if (buffer) + free(buffer); + value = ep + 1; + } +} diff --git a/xdiff-interface.h b/xdiff-interface.h index 536f4e4..fb742db 100644 --- a/xdiff-interface.h +++ b/xdiff-interface.h @@ -20,4 +20,6 @@ int parse_hunk_header(char *line, int len, int read_mmfile(mmfile_t *ptr, const char *filename); int buffer_is_binary(const char *ptr, unsigned long size); +extern void xdiff_set_find_func(xdemitconf_t *xecfg, const char *line); + #endif diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 9402bb0..c00ddaa 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -73,9 +73,13 @@ typedef struct s_xdemitcb { int (*outf)(void *, mmbuffer_t *, int); } xdemitcb_t; +typedef long (*find_func_t)(const char *line, long line_len, char *buffer, long buffer_size, void *priv); + typedef struct s_xdemitconf { long ctxlen; unsigned long flags; + find_func_t find_func; + void *find_func_priv; } xdemitconf_t; typedef struct s_bdiffparam { diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 4b6e639..d3d9c84 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -69,7 +69,24 @@ static xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg) { } -static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll) { +static long def_ff(const char *rec, long len, char *buf, long sz, void *priv) +{ + if (len > 0 && + (isalpha((unsigned char)*rec) || /* identifier? */ + *rec == '_' || /* also identifier? */ + *rec == '$')) { /* identifiers from VMS and other esoterico */ + if (len > sz) + len = sz; + while (0 < len && isspace((unsigned char)rec[len - 1])) + len--; + memcpy(buf, rec, len); + return len; + } + return -1; +} + +static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll, + find_func_t ff, void *ff_priv) { /* * Be quite stupid about this for now. Find a line in the old file @@ -80,22 +97,12 @@ static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll) { const char *rec; long len; - *ll = 0; while (i-- > 0) { len = xdl_get_rec(xf, i, &rec); - if (len > 0 && - (isalpha((unsigned char)*rec) || /* identifier? */ - *rec == '_' || /* also identifier? */ - *rec == '$')) { /* mysterious GNU diff's invention */ - if (len > sz) - len = sz; - while (0 < len && isspace((unsigned char)rec[len - 1])) - len--; - memcpy(buf, rec, len); - *ll = len; + if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0) return; - } } + *ll = 0; } @@ -120,6 +127,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, xdchange_t *xch, *xche; char funcbuf[80]; long funclen = 0; + find_func_t ff = xecfg->find_func ? xecfg->find_func : def_ff; if (xecfg->flags & XDL_EMIT_COMMON) return xdl_emit_common(xe, xscr, ecb, xecfg); @@ -143,7 +151,8 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, if (xecfg->flags & XDL_EMIT_FUNCNAMES) { xdl_find_func(&xe->xdf1, s1, funcbuf, - sizeof(funcbuf), &funclen); + sizeof(funcbuf), &funclen, + ff, xecfg->find_func_priv); } if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2, funcbuf, funclen, ecb) < 0) -- 1.5.3.rc0.818.ga2b52 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] WIP per-path attribute based hunk header selection. 2007-07-05 5:02 ` [PATCH 2/2] WIP per-path attribute based hunk header selection Junio C Hamano @ 2007-07-05 11:25 ` Johannes Schindelin 2007-07-05 16:43 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-05 11:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, git Hi, On Wed, 4 Jul 2007, Junio C Hamano wrote: > diff --git a/diff.c b/diff.c > index 552f7c0..0c7d2c6 100644 > --- a/diff.c > +++ b/diff.c > @@ -3011,21 +3079,6 @@ void diffcore_std(struct diff_options *options) > if (options->quiet) > return; > > - /* > - * break/rename count similarity differently depending on > - * the binary-ness. > - */ > - if ((options->break_opt != -1) || (options->detect_rename)) { > - struct diff_queue_struct *q = &diff_queued_diff; > - int i; > - > - for (i = 0; i < q->nr; i++) { > - struct diff_filepair *p = q->queue[i]; > - p->one->is_binary = file_is_binary(p->one); > - p->two->is_binary = file_is_binary(p->two); > - } > - } > - > if (options->break_opt != -1) > diffcore_break(options->break_opt); > if (options->detect_rename) > diff --git a/diffcore-delta.c b/diffcore-delta.c > index a038b16..d9729e5 100644 > --- a/diffcore-delta.c > +++ b/diffcore-delta.c > @@ -129,7 +129,7 @@ static struct spanhash_top *hash_chars(struct diff_filespec *one) > struct spanhash_top *hash; > unsigned char *buf = one->data; > unsigned int sz = one->size; > - int is_text = !one->is_binary; > + int is_text = !diff_filespec_is_binary(one); > > i = INITIAL_HASH_SIZE; > hash = xmalloc(sizeof(*hash) + sizeof(struct spanhash) * (1<<i)); It is quite cute to hide this in the funcname patch... Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] WIP per-path attribute based hunk header selection. 2007-07-05 11:25 ` Johannes Schindelin @ 2007-07-05 16:43 ` Junio C Hamano 2007-07-06 8:37 ` [PATCH 1/3] Introduce diff_filespec_is_binary() Junio C Hamano 2007-07-06 8:39 ` [PATCH] Per-path attribute based hunk header selection Junio C Hamano 2 siblings, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2007-07-05 16:43 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Linus Torvalds, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi, > > On Wed, 4 Jul 2007, Junio C Hamano wrote: > >> diff --git a/diff.c b/diff.c >> index 552f7c0..0c7d2c6 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -3011,21 +3079,6 @@ void diffcore_std(struct diff_options *options) >> if (options->quiet) >> return; >> >> - /* >> - * break/rename count similarity differently depending on >> - * the binary-ness. >> - */ >> - if ((options->break_opt != -1) || (options->detect_rename)) { >> - struct diff_queue_struct *q = &diff_queued_diff; >> - int i; >> - >> - for (i = 0; i < q->nr; i++) { >> - struct diff_filepair *p = q->queue[i]; >> - p->one->is_binary = file_is_binary(p->one); >> - p->two->is_binary = file_is_binary(p->two); >> - } >> - } >> - >> if (options->break_opt != -1) >> diffcore_break(options->break_opt); >> if (options->detect_rename) >> diff --git a/diffcore-delta.c b/diffcore-delta.c >> index a038b16..d9729e5 100644 >> --- a/diffcore-delta.c >> +++ b/diffcore-delta.c >> @@ -129,7 +129,7 @@ static struct spanhash_top *hash_chars(struct diff_filespec *one) >> struct spanhash_top *hash; >> unsigned char *buf = one->data; >> unsigned int sz = one->size; >> - int is_text = !one->is_binary; >> + int is_text = !diff_filespec_is_binary(one); >> >> i = INITIAL_HASH_SIZE; >> hash = xmalloc(sizeof(*hash) + sizeof(struct spanhash) * (1<<i)); > > It is quite cute to hide this in the funcname patch... That is why it is marked as WIP. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] Introduce diff_filespec_is_binary() 2007-07-05 11:25 ` Johannes Schindelin 2007-07-05 16:43 ` Junio C Hamano @ 2007-07-06 8:37 ` Junio C Hamano 2007-07-06 12:20 ` Johannes Schindelin 2007-07-06 8:39 ` [PATCH] Per-path attribute based hunk header selection Junio C Hamano 2 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-07-06 8:37 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Linus Torvalds, git This replaces an explicit initialization of filespec->is_binary field used for rename/break followed by direct access to that field with a wrapper function that lazily iniaitlizes and accesses the field. We would add more attribute accesses for the use of diff routines, and it would be better to make this abstraction earlier. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > It is quite cute to hide this in the funcname patch... So this is the proper refactoring _before_ any of your topics. diff.c | 71 +++++++++++++++++++++++++++-------------------------- diffcore-delta.c | 2 +- diffcore.h | 2 + 3 files changed, 39 insertions(+), 36 deletions(-) diff --git a/diff.c b/diff.c index 1958970..16ea710 100644 --- a/diff.c +++ b/diff.c @@ -1102,30 +1102,45 @@ static void setup_diff_attr_check(struct git_attr_check *check) { static struct git_attr *attr_diff; - if (!attr_diff) + if (!attr_diff) { attr_diff = git_attr("diff", 4); - check->attr = attr_diff; + } + check[0].attr = attr_diff; } -static int file_is_binary(struct diff_filespec *one) +static void diff_filespec_check_attr(struct diff_filespec *one) { - struct git_attr_check attr_diff_check; + struct git_attr_check attr_diff_check[1]; - setup_diff_attr_check(&attr_diff_check); - if (!git_checkattr(one->path, 1, &attr_diff_check)) { - const char *value = attr_diff_check.value; + if (one->checked_attr) + return; + + setup_diff_attr_check(attr_diff_check); + one->is_binary = 0; + + if (!git_checkattr(one->path, ARRAY_SIZE(attr_diff_check), attr_diff_check)) { + const char *value; + + /* binaryness */ + value = attr_diff_check[0].value; if (ATTR_TRUE(value)) - return 0; + ; else if (ATTR_FALSE(value)) - return 1; + one->is_binary = 1; } - if (!one->data) { - if (!DIFF_FILE_VALID(one)) - return 0; + if (!one->data && DIFF_FILE_VALID(one)) diff_populate_filespec(one, 0); - } - return buffer_is_binary(one->data, one->size); + + if (one->data) + one->is_binary = buffer_is_binary(one->data, one->size); + +} + +int diff_filespec_is_binary(struct diff_filespec *one) +{ + diff_filespec_check_attr(one); + return one->is_binary; } static void builtin_diff(const char *name_a, @@ -1182,7 +1197,8 @@ static void builtin_diff(const char *name_a, if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) die("unable to read files to diff"); - if (!o->text && (file_is_binary(one) || file_is_binary(two))) { + if (!o->text && + (diff_filespec_is_binary(one) || diff_filespec_is_binary(two))) { /* Quite common confusing case */ if (mf1.size == mf2.size && !memcmp(mf1.ptr, mf2.ptr, mf1.size)) @@ -1260,7 +1276,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b, if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) die("unable to read files to diff"); - if (file_is_binary(one) || file_is_binary(two)) { + if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) { data->is_binary = 1; data->added = mf2.size; data->deleted = mf1.size; @@ -1302,7 +1318,7 @@ static void builtin_checkdiff(const char *name_a, const char *name_b, if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) die("unable to read files to diff"); - if (file_is_binary(two)) + if (diff_filespec_is_binary(two)) goto free_and_return; else { /* Crazy xdl interfaces.. */ @@ -1880,8 +1896,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o) if (o->binary) { mmfile_t mf; - if ((!fill_mmfile(&mf, one) && file_is_binary(one)) || - (!fill_mmfile(&mf, two) && file_is_binary(two))) + if ((!fill_mmfile(&mf, one) && diff_filespec_is_binary(one)) || + (!fill_mmfile(&mf, two) && diff_filespec_is_binary(two))) abbrev = 40; } len += snprintf(msg + len, sizeof(msg) - len, @@ -2783,7 +2799,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) return error("unable to read files to diff"); /* Maybe hash p->two? into the patch id? */ - if (file_is_binary(p->two)) + if (diff_filespec_is_binary(p->two)) continue; len1 = remove_space(p->one->path, strlen(p->one->path)); @@ -3011,21 +3027,6 @@ void diffcore_std(struct diff_options *options) if (options->quiet) return; - /* - * break/rename count similarity differently depending on - * the binary-ness. - */ - if ((options->break_opt != -1) || (options->detect_rename)) { - struct diff_queue_struct *q = &diff_queued_diff; - int i; - - for (i = 0; i < q->nr; i++) { - struct diff_filepair *p = q->queue[i]; - p->one->is_binary = file_is_binary(p->one); - p->two->is_binary = file_is_binary(p->two); - } - } - if (options->break_opt != -1) diffcore_break(options->break_opt); if (options->detect_rename) diff --git a/diffcore-delta.c b/diffcore-delta.c index a038b16..d9729e5 100644 --- a/diffcore-delta.c +++ b/diffcore-delta.c @@ -129,7 +129,7 @@ static struct spanhash_top *hash_chars(struct diff_filespec *one) struct spanhash_top *hash; unsigned char *buf = one->data; unsigned int sz = one->size; - int is_text = !one->is_binary; + int is_text = !diff_filespec_is_binary(one); i = INITIAL_HASH_SIZE; hash = xmalloc(sizeof(*hash) + sizeof(struct spanhash) * (1<<i)); diff --git a/diffcore.h b/diffcore.h index 0c8abb5..dcab7e2 100644 --- a/diffcore.h +++ b/diffcore.h @@ -37,6 +37,7 @@ struct diff_filespec { #define DIFF_FILE_VALID(spec) (((spec)->mode) != 0) unsigned should_free : 1; /* data should be free()'ed */ unsigned should_munmap : 1; /* data should be munmap()'ed */ + unsigned checked_attr : 1; unsigned is_binary : 1; /* data should be considered "binary" */ }; @@ -46,6 +47,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *, extern int diff_populate_filespec(struct diff_filespec *, int); extern void diff_free_filespec_data(struct diff_filespec *); +extern int diff_filespec_is_binary(struct diff_filespec *); struct diff_filepair { struct diff_filespec *one; -- 1.5.3.rc0.822.g38609 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] Introduce diff_filespec_is_binary() 2007-07-06 8:37 ` [PATCH 1/3] Introduce diff_filespec_is_binary() Junio C Hamano @ 2007-07-06 12:20 ` Johannes Schindelin 0 siblings, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-06 12:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi, On Fri, 6 Jul 2007, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > It is quite cute to hide this in the funcname patch... > > So this is the proper refactoring _before_ any of your topics. Thanks. And sorry for bugging you. Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] Per-path attribute based hunk header selection. 2007-07-05 11:25 ` Johannes Schindelin 2007-07-05 16:43 ` Junio C Hamano 2007-07-06 8:37 ` [PATCH 1/3] Introduce diff_filespec_is_binary() Junio C Hamano @ 2007-07-06 8:39 ` Junio C Hamano 2007-07-06 12:38 ` Johannes Schindelin 2007-07-06 17:59 ` Linus Torvalds 2 siblings, 2 replies; 29+ messages in thread From: Junio C Hamano @ 2007-07-06 8:39 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Linus Torvalds, git This makes"diff -p" hunk headers customizable via gitattributes mechanism. It is based on Johannes's earlier patch that allowed to define a single regexp to be used for everything. The mechanism to arrive at the regexp that is used to define hunk header is the same as other use of gitattributes. You assign an attribute, funcname (because "diff -p" typically uses the name of the function the patch is about as the hunk header), a simple string value. This can be one of the names of built-in pattern (currently, "java" is defined) or a custom pattern name, to be looked up from the configuration file. (in .gitattributes) *.java funcname=java *.perl funcname=perl (in .git/config) [funcname] java = ... # ugly and complicated regexp to override the built-in one. perl = ... # another ugly and complicated regexp to define a new one. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- This three patch series is lacking 2/3, as it is exactly the same as the earlier "1/2 Future-proof source for changes in xdemitconf_t" patch. diff.c | 96 +++++++++++++++++++++++++++++++++++++++++++++- diffcore.h | 1 + t/t4018-diff-funcname.sh | 60 ++++++++++++++++++++++++++++ xdiff-interface.c | 71 ++++++++++++++++++++++++++++++++++ xdiff-interface.h | 2 + xdiff/xdiff.h | 4 ++ xdiff/xemit.c | 37 +++++++++++------- 7 files changed, 256 insertions(+), 15 deletions(-) create mode 100644 t/t4018-diff-funcname.sh diff --git a/diff.c b/diff.c index d10e848..04e7e91 100644 --- a/diff.c +++ b/diff.c @@ -1101,22 +1101,26 @@ static void emit_binary_diff(mmfile_t *one, mmfile_t *two) static void setup_diff_attr_check(struct git_attr_check *check) { static struct git_attr *attr_diff; + static struct git_attr *attr_diff_func_name; if (!attr_diff) { attr_diff = git_attr("diff", 4); + attr_diff_func_name = git_attr("funcname", 8); } check[0].attr = attr_diff; + check[1].attr = attr_diff_func_name; } static void diff_filespec_check_attr(struct diff_filespec *one) { - struct git_attr_check attr_diff_check[1]; + struct git_attr_check attr_diff_check[2]; if (one->checked_attr) return; setup_diff_attr_check(attr_diff_check); one->is_binary = 0; + one->hunk_header_ident = NULL; if (!git_checkattr(one->path, ARRAY_SIZE(attr_diff_check), attr_diff_check)) { const char *value; @@ -1127,6 +1131,13 @@ static void diff_filespec_check_attr(struct diff_filespec *one) ; else if (ATTR_FALSE(value)) one->is_binary = 1; + + /* hunk header ident */ + value = attr_diff_check[1].value; + if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value)) + ; + else + one->hunk_header_ident = value; } if (!one->data && DIFF_FILE_VALID(one)) @@ -1143,6 +1154,82 @@ int diff_filespec_is_binary(struct diff_filespec *one) return one->is_binary; } +static struct hunk_header_regexp { + char *name; + char *regexp; + struct hunk_header_regexp *next; +} *hunk_header_regexp_list, **hunk_header_regexp_tail; + +static int hunk_header_config(const char *var, const char *value) +{ + static const char funcname[] = "funcname."; + struct hunk_header_regexp *hh; + + if (prefixcmp(var, funcname)) + return 0; + var += strlen(funcname); + for (hh = hunk_header_regexp_list; hh; hh = hh->next) + if (!strcmp(var, hh->name)) { + free(hh->regexp); + hh->regexp = xstrdup(value); + return 0; + } + hh = xcalloc(1, sizeof(*hh)); + hh->name = xstrdup(var); + hh->regexp = xstrdup(value); + hh->next = NULL; + *hunk_header_regexp_tail = hh; + return 0; +} + +static const char *hunk_header_regexp(const char *ident) +{ + struct hunk_header_regexp *hh; + + if (!hunk_header_regexp_tail) { + hunk_header_regexp_tail = &hunk_header_regexp_list; + git_config(hunk_header_config); + } + for (hh = hunk_header_regexp_list; hh; hh = hh->next) + if (!strcmp(ident, hh->name)) + return hh->regexp; + return NULL; +} + +static const char *diff_hunk_header_regexp(struct diff_filespec *one) +{ + const char *ident, *regexp; + + diff_filespec_check_attr(one); + ident = one->hunk_header_ident; + + if (!ident) + /* + * If the config file has "funcname.default" defined, that + * regexp is used; otherwise NULL is returned and xemit uses + * the built-in default. + */ + return hunk_header_regexp("default"); + + /* Look up custom "funcname.$ident" regexp from config. */ + regexp = hunk_header_regexp(ident); + if (regexp) + return regexp; + + /* + * And define built-in fallback patterns here. Note that + * these can be overriden by the user's config settings. + */ + if (!strcmp(ident, "java")) + return "!^[ ]*\\(catch\\|do\\|for\\|if\\|instanceof\\|" + "new\\|return\\|switch\\|throw\\|while\\)\n" + "^[ ]*\\(\\([ ]*" + "[A-Za-z_][A-Za-z_0-9]*\\)\\{2,\\}" + "[ ]*([^;]*$\\)"; + + return NULL; +} + static void builtin_diff(const char *name_a, const char *name_b, struct diff_filespec *one, @@ -1217,6 +1304,11 @@ static void builtin_diff(const char *name_a, xdemitconf_t xecfg; xdemitcb_t ecb; struct emit_callback ecbdata; + const char *hunk_header_regexp; + + hunk_header_regexp = diff_hunk_header_regexp(one); + if (!hunk_header_regexp) + hunk_header_regexp = diff_hunk_header_regexp(two); memset(&xecfg, 0, sizeof(xecfg)); memset(&ecbdata, 0, sizeof(ecbdata)); @@ -1226,6 +1318,8 @@ static void builtin_diff(const char *name_a, xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts; xecfg.ctxlen = o->context; xecfg.flags = XDL_EMIT_FUNCNAMES; + if (hunk_header_regexp) + xdiff_set_find_func(&xecfg, hunk_header_regexp); if (!diffopts) ; else if (!prefixcmp(diffopts, "--unified=")) diff --git a/diffcore.h b/diffcore.h index dcab7e2..0598514 100644 --- a/diffcore.h +++ b/diffcore.h @@ -27,6 +27,7 @@ struct diff_filespec { char *path; void *data; void *cnt_data; + const void *hunk_header_ident; unsigned long size; int xfrm_flags; /* for use by the xfrm */ unsigned short mode; /* file mode */ diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh new file mode 100644 index 0000000..dc7a47b --- /dev/null +++ b/t/t4018-diff-funcname.sh @@ -0,0 +1,60 @@ +#!/bin/sh +# +# Copyright (c) 2007 Johannes E. Schindelin +# + +test_description='Test custom diff function name patterns' + +. ./test-lib.sh + +LF=' +' + +cat > Beer.java << EOF +public class Beer +{ + int special; + public static void main(String args[]) + { + String s=" "; + for(int x = 99; x > 0; x--) + { + System.out.print(x + " bottles of beer on the wall " + + x + " bottles of beer\n" + + "Take one down, pass it around, " + (x - 1) + + " bottles of beer on the wall.\n"); + } + System.out.print("Go to the store, buy some more,\n" + + "99 bottles of beer on the wall.\n"); + } +} +EOF + +sed 's/beer\\/beer,\\/' < Beer.java > Beer-correct.java + +test_expect_success 'default behaviour' ' + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ public class Beer" +' + +test_expect_success 'preset java pattern' ' + echo "*.java funcname=java" >.gitattributes && + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ public static void main(" +' + +git config funcname.java '!static +!String +[^ ].*s.*' + +test_expect_success 'custom pattern' ' + git diff Beer.java Beer-correct.java | + grep "^@@.*@@ int special;$" +' + +test_expect_success 'last regexp must not be negated' ' + git config diff.functionnameregexp "!static" && + ! git diff Beer.java Beer-correct.java +' + +test_done diff --git a/xdiff-interface.c b/xdiff-interface.c index e407cf1..be866d1 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -129,3 +129,74 @@ int buffer_is_binary(const char *ptr, unsigned long size) size = FIRST_FEW_BYTES; return !!memchr(ptr, 0, size); } + +struct ff_regs { + int nr; + struct ff_reg { + regex_t re; + int negate; + } *array; +}; + +static long ff_regexp(const char *line, long len, + char *buffer, long buffer_size, void *priv) +{ + char *line_buffer = xstrndup(line, len); /* make NUL terminated */ + struct ff_regs *regs = priv; + regmatch_t pmatch[2]; + int result = 0, i; + + for (i = 0; i < regs->nr; i++) { + struct ff_reg *reg = regs->array + i; + if (reg->negate ^ !!regexec(®->re, + line_buffer, 2, pmatch, 0)) { + free(line_buffer); + return -1; + } + } + i = pmatch[1].rm_so >= 0 ? 1 : 0; + line += pmatch[i].rm_so; + result = pmatch[i].rm_eo - pmatch[i].rm_so; + if (result > buffer_size) + result = buffer_size; + else + while (result > 0 && (isspace(line[result - 1]) || + line[result - 1] == '\n')) + result--; + memcpy(buffer, line, result); + free(line_buffer); + return result; +} + +void xdiff_set_find_func(xdemitconf_t *xecfg, const char *value) +{ + int i; + struct ff_regs *regs; + + xecfg->find_func = ff_regexp; + regs = xecfg->find_func_priv = xmalloc(sizeof(struct ff_regs)); + for (i = 0, regs->nr = 1; value[i]; i++) + if (value[i] == '\n') + regs->nr++; + regs->array = xmalloc(regs->nr * sizeof(struct ff_reg)); + for (i = 0; i < regs->nr; i++) { + struct ff_reg *reg = regs->array + i; + const char *ep = strchr(value, '\n'), *expression; + char *buffer = NULL; + + reg->negate = (*value == '!'); + if (reg->negate && i == regs->nr - 1) + die("Last expression must not be negated: %s", value); + if (*value == '!') + value++; + if (ep) + expression = buffer = xstrndup(value, ep - value); + else + expression = value; + if (regcomp(®->re, expression, 0)) + die("Invalid regexp to look for hunk header: %s", expression); + if (buffer) + free(buffer); + value = ep + 1; + } +} diff --git a/xdiff-interface.h b/xdiff-interface.h index 536f4e4..fb742db 100644 --- a/xdiff-interface.h +++ b/xdiff-interface.h @@ -20,4 +20,6 @@ int parse_hunk_header(char *line, int len, int read_mmfile(mmfile_t *ptr, const char *filename); int buffer_is_binary(const char *ptr, unsigned long size); +extern void xdiff_set_find_func(xdemitconf_t *xecfg, const char *line); + #endif diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 9402bb0..c00ddaa 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -73,9 +73,13 @@ typedef struct s_xdemitcb { int (*outf)(void *, mmbuffer_t *, int); } xdemitcb_t; +typedef long (*find_func_t)(const char *line, long line_len, char *buffer, long buffer_size, void *priv); + typedef struct s_xdemitconf { long ctxlen; unsigned long flags; + find_func_t find_func; + void *find_func_priv; } xdemitconf_t; typedef struct s_bdiffparam { diff --git a/xdiff/xemit.c b/xdiff/xemit.c index 4b6e639..d3d9c84 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -69,7 +69,24 @@ static xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg) { } -static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll) { +static long def_ff(const char *rec, long len, char *buf, long sz, void *priv) +{ + if (len > 0 && + (isalpha((unsigned char)*rec) || /* identifier? */ + *rec == '_' || /* also identifier? */ + *rec == '$')) { /* identifiers from VMS and other esoterico */ + if (len > sz) + len = sz; + while (0 < len && isspace((unsigned char)rec[len - 1])) + len--; + memcpy(buf, rec, len); + return len; + } + return -1; +} + +static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll, + find_func_t ff, void *ff_priv) { /* * Be quite stupid about this for now. Find a line in the old file @@ -80,22 +97,12 @@ static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll) { const char *rec; long len; - *ll = 0; while (i-- > 0) { len = xdl_get_rec(xf, i, &rec); - if (len > 0 && - (isalpha((unsigned char)*rec) || /* identifier? */ - *rec == '_' || /* also identifier? */ - *rec == '$')) { /* mysterious GNU diff's invention */ - if (len > sz) - len = sz; - while (0 < len && isspace((unsigned char)rec[len - 1])) - len--; - memcpy(buf, rec, len); - *ll = len; + if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0) return; - } } + *ll = 0; } @@ -120,6 +127,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, xdchange_t *xch, *xche; char funcbuf[80]; long funclen = 0; + find_func_t ff = xecfg->find_func ? xecfg->find_func : def_ff; if (xecfg->flags & XDL_EMIT_COMMON) return xdl_emit_common(xe, xscr, ecb, xecfg); @@ -143,7 +151,8 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb, if (xecfg->flags & XDL_EMIT_FUNCNAMES) { xdl_find_func(&xe->xdf1, s1, funcbuf, - sizeof(funcbuf), &funclen); + sizeof(funcbuf), &funclen, + ff, xecfg->find_func_priv); } if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2, funcbuf, funclen, ecb) < 0) -- 1.5.3.rc0.822.g38609 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-06 8:39 ` [PATCH] Per-path attribute based hunk header selection Junio C Hamano @ 2007-07-06 12:38 ` Johannes Schindelin 2007-07-06 17:59 ` Linus Torvalds 1 sibling, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-06 12:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, git Hi, On Fri, 6 Jul 2007, Junio C Hamano wrote: > diff --git a/diff.c b/diff.c > @@ -1143,6 +1154,82 @@ int diff_filespec_is_binary(struct diff_filespec *one) > return one->is_binary; > } > > +static struct hunk_header_regexp { > + char *name; > + char *regexp; > + struct hunk_header_regexp *next; > +} *hunk_header_regexp_list, **hunk_header_regexp_tail; > + > +static int hunk_header_config(const char *var, const char *value) > +{ > + static const char funcname[] = "funcname."; > + struct hunk_header_regexp *hh; > + > + if (prefixcmp(var, funcname)) > + return 0; > + var += strlen(funcname); > + for (hh = hunk_header_regexp_list; hh; hh = hh->next) > + if (!strcmp(var, hh->name)) { > + free(hh->regexp); > + hh->regexp = xstrdup(value); > + return 0; > + } Heh. By reordering your code, you could say if ((hh = hunk_header_regexp(var))) { free(hh->regexp); hh->regexp = xstrdup(value); return 0; } > + hh = xcalloc(1, sizeof(*hh)); > + hh->name = xstrdup(var); > + hh->regexp = xstrdup(value); > + hh->next = NULL; > + *hunk_header_regexp_tail = hh; > + return 0; > +} Is that tail expansion not overly complex? Why not just set "hh->next = hunk_header_regexp_list; hunk_header_regexp_list = hh"; Yes, your code seems correct, but I took some extra cycles to get at that impression. A "static int parsed_config_for_hunk_headers" would have helped, instead of reusing _tail for two purposes. And this variable could be set at the beginning of hunk_header_config(), so that hunk_header_regexp() is usable from inside hunk_header_config(). > +static const char *hunk_header_regexp(const char *ident) > +{ > + struct hunk_header_regexp *hh; > + > + if (!hunk_header_regexp_tail) { > + hunk_header_regexp_tail = &hunk_header_regexp_list; > + git_config(hunk_header_config); > + } > + for (hh = hunk_header_regexp_list; hh; hh = hh->next) > + if (!strcmp(ident, hh->name)) > + return hh->regexp; > + return NULL; > +} Another thing. These long names are a bit inconsistent. In the config, you name it "funcname". In xdiff, we name them "FUNCNAMES". Yes, here they are hunk_headers. Also, since the expressions are not strictly regular expressions, but lists of them, and with your idea they are even more different, why not just go for "funcname_list"? It's easier to read, and static anyway. Rest looks fine to me... Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-06 8:39 ` [PATCH] Per-path attribute based hunk header selection Junio C Hamano 2007-07-06 12:38 ` Johannes Schindelin @ 2007-07-06 17:59 ` Linus Torvalds 2007-07-07 2:00 ` Junio C Hamano 2007-07-07 10:12 ` [PATCH] Fix configuration syntax to specify customized hunk header patterns Junio C Hamano 1 sibling, 2 replies; 29+ messages in thread From: Linus Torvalds @ 2007-07-06 17:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Schindelin, git On Fri, 6 Jul 2007, Junio C Hamano wrote: > > This makes"diff -p" hunk headers customizable via gitattributes mechanism. > It is based on Johannes's earlier patch that allowed to define a single > regexp to be used for everything. Ok, I think this is really nice, but I do wonder a bit about the syntax. In particular, the "funcname" thing is really a pretty ugly special-case approach. Wouldn't it be nicer to consider the "funcname=java" to be less of a "special case for the built-in diff", and instead think of it as a more generic issue of "how do we want to generate diffs for java files?" IOW, wouldn't this be much nicer to be thought about as a "custon diff driver" issue? So I like your patches, but dislike the config syntax, and would suggest something like In .gitattributes: *.java diff=java *.perl diff=perl *.doc diff=doc In .git/config [diff "java"] command = internal funcname = ... # ugly and complicated regexp to override the built-in one. [diff "perl"] command = internal funcname = ... [diff "doc"] command = ms-doc-diff Doesn't this make more sense and mesh much better with the already existing custom diff driver? (And yeah, maybe we could instead of "command=internal" just have the rule that "internal" is the default, and you'd not have a command at all when you want to run the internal diff. Just an idea. I don't have any code. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-06 17:59 ` Linus Torvalds @ 2007-07-07 2:00 ` Junio C Hamano 2007-07-07 10:08 ` しらいしななこ 2007-07-09 3:22 ` Nicolas Pitre 2007-07-07 10:12 ` [PATCH] Fix configuration syntax to specify customized hunk header patterns Junio C Hamano 1 sibling, 2 replies; 29+ messages in thread From: Junio C Hamano @ 2007-07-07 2:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, git Linus Torvalds <torvalds@linux-foundation.org> writes: > In .gitattributes: > > *.java diff=java > *.perl diff=perl > *.doc diff=doc > > In .git/config > > [diff "java"] > command = internal > funcname = ... # ugly and complicated regexp to override the built-in one. > > [diff "perl"] > command = internal > funcname = ... > > [diff "doc"] > command = ms-doc-diff > > Doesn't this make more sense and mesh much better with the already > existing custom diff driver? > > (And yeah, maybe we could instead of "command=internal" just have the rule > that "internal" is the default, and you'd not have a command at all when > you want to run the internal diff. > > Just an idea. I don't have any code. Yeah, I'd be lying if I said that this did not cross my mind when I saw existing diff.*.command handling. About the comment from Johannes regarding hunk_header vs funcname, I would actually prefer hunk_header, since that is what this is about ("funcname" and "find_func" were misnomer from the beginning), but I'd rename hunk_header to funcname for the sake of consistency and minimizing the diff. Will find time to look at this over the weekend. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-07 2:00 ` Junio C Hamano @ 2007-07-07 10:08 ` しらいしななこ 2007-07-07 10:22 ` Junio C Hamano 2007-07-09 3:22 ` Nicolas Pitre 1 sibling, 1 reply; 29+ messages in thread From: しらいしななこ @ 2007-07-07 10:08 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Johannes Schindelin, git Quoting Junio C Hamano <gitster@pobox.com>: > Yeah, I'd be lying if I said that this did not cross my mind > when I saw existing diff.*.command handling. > > About the comment from Johannes regarding hunk_header vs > funcname, I would actually prefer hunk_header, since that is > what this is about ("funcname" and "find_func" were misnomer > from the beginning), but I'd rename hunk_header to funcname for > the sake of consistency and minimizing the diff. I would love to see "diff=tex" attribute to work on my manuscripts, but I do not write C and do not understand the long length of backslashes very well. I guessed in the source file a backslash needs to be doubled, and what I want to match is "\section{", "\subsection{", and "\subsubsection{" at the beginning of lines, and attempted to do it like the patch at the end. It does not work well, however. It shows only part of lines. @@ -8,7 +8,8 @@ \section{ @@ -224,7 +225,7 @@ sub @@ -240,7 +241,7 @@ subsub I have no idea what am I doing wrong (truthfully, I do not know what I am doing, period). diff --git a/diff.c b/diff.c index 04e7e91..57f91b0 100644 --- a/diff.c +++ b/diff.c @@ -1226,6 +1226,8 @@ static const char *diff_hunk_header_regexp(struct diff_filespec *one) "^[ ]*\\(\\([ ]*" "[A-Za-z_][A-Za-z_0-9]*\\)\\{2,\\}" "[ ]*([^;]*$\\)"; + if (!strcmp(ident, "tex")) + return "^\\\\\\(sub\\)*section{"; return NULL; } -- Nanako Shiraishi http://ivory.ap.teacup.com/nanako3/ ---------------------------------------------------------------------- Finally - A spam blocker that actually works. http://www.bluebottle.com/tag/4 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-07 10:08 ` しらいしななこ @ 2007-07-07 10:22 ` Junio C Hamano 2007-07-07 12:17 ` Johannes Schindelin 0 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-07-07 10:22 UTC (permalink / raw) To: しらいしななこ Cc: Linus Torvalds, Johannes Schindelin, git しらいしななこ <nanako3@bluebottle.com> writes: > I would love to see "diff=tex" attribute to work on my manuscripts, but I > do not write C and do not understand the long length of backslashes very > well. I guessed in the source file a backslash needs to be doubled, and > what I want to match is "\section{", "\subsection{", and "\subsubsection{" > at the beginning of lines, and attempted to do it like the patch at the > end. Heh, I do not speak TeX very well, so we are even ;-) > It does not work well, however. It shows only part of lines. > > @@ -8,7 +8,8 @@ \section{ > @@ -224,7 +225,7 @@ sub > @@ -240,7 +241,7 @@ subsub > > I have no idea what am I doing wrong (truthfully, I do not know what I am > doing, period). > > diff --git a/diff.c b/diff.c > index 04e7e91..57f91b0 100644 > --- a/diff.c > +++ b/diff.c > @@ -1226,6 +1226,8 @@ static const char *diff_hunk_header_regexp(struct diff_filespec *one) > "^[ ]*\\(\\([ ]*" > "[A-Za-z_][A-Za-z_0-9]*\\)\\{2,\\}" > "[ ]*([^;]*$\\)"; > + if (!strcmp(ident, "tex")) > + return "^\\\\\\(sub\\)*section{"; > > return NULL; > } It's getting late so I won't be testing this myself tonight, but I think return "^\\(\\\\\\(sub\\)*section{.*\\)$"; would do the job. HOWEVER. Johannes, it strikes me that it is very odd having to add ".*$" at the end and to surround the whole thing in a parentheses. Shouldn't the ff_regexp() grabber simply pick the whole line? After all, that is what GNU "diff -p -F RE" does. Also this makes me realize that not all contents in the world are not programming language source files, and "funcname" is a misnomer. For this one, however, we _can_ blame GNU diff, as they call this --show-function-line option ;-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-07 10:22 ` Junio C Hamano @ 2007-07-07 12:17 ` Johannes Schindelin 2007-07-07 20:36 ` Junio C Hamano 2007-07-08 7:23 ` Junio C Hamano 0 siblings, 2 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-07 12:17 UTC (permalink / raw) To: Junio C Hamano Cc: しらいしななこ, Linus Torvalds, git [-- Attachment #1: Type: TEXT/PLAIN, Size: 2527 bytes --] Hi, On Sat, 7 Jul 2007, Junio C Hamano wrote: > しらいしななこ <nanako3@bluebottle.com> writes: > > > I would love to see "diff=tex" attribute to work on my manuscripts, > > [...] > > It does not work well, however. It shows only part of lines. > > > > @@ -8,7 +8,8 @@ \section{ > > @@ -224,7 +225,7 @@ sub > > @@ -240,7 +241,7 @@ subsub > > > > I have no idea what am I doing wrong (truthfully, I do not know what I > > am doing, period). Those are regular expressions. Read more about them here: http://en.wikipedia.org/wiki/Regular_expression > > + if (!strcmp(ident, "tex")) > > + return "^\\\\\\(sub\\)*section{"; It is always easier, and will never require C skills, to put this into the config. With Junio's current version: echo '*.tex funcname=tex' >> .gitattributes echo '[funcname] tex = ^\(\\\(sub\)*section{.*\)' >> .git/config The problem is of course that the backslashes have to be escaped _both_ in C and in regexps. You could write that much simpler, though: \\[a-z]*section.* It would work the same, in practice, because if something like \supercoolsection is defined, you are likely wanting to match that, too. > Johannes, it strikes me that it is very odd having to add ".*$" at the > end and to surround the whole thing in a parentheses. Shouldn't the > ff_regexp() grabber simply pick the whole line? After all, that is what > GNU "diff -p -F RE" does. Yes, but then you can forget about your hierarchical idea. Or maybe not, since I am still awaiting a sane syntax for that, which would probably have to solve just one more problem. Here is a patch for the rest of the line thing: diff --git a/xdiff-interface.c b/xdiff-interface.c index be866d1..9503dfb 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -156,7 +156,7 @@ static long ff_regexp(const char *line, long len, } i = pmatch[1].rm_so >= 0 ? 1 : 0; line += pmatch[i].rm_so; - result = pmatch[i].rm_eo - pmatch[i].rm_so; + result = (int)len - pmatch[i].rm_so; if (result > buffer_size) result = buffer_size; else > Also this makes me realize that not all contents in the world are not > programming language source files, and "funcname" is a misnomer. For > this one, however, we _can_ blame GNU diff, as they call this > --show-function-line option ;-) Okay. How about sorting out the other technical issues first (diff.<type>.funcname instead of funcname.<type>), leaving the names alone, and then rename the remaining references of funcname to hunkheader? Ciao, Dscho ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-07 12:17 ` Johannes Schindelin @ 2007-07-07 20:36 ` Junio C Hamano 2007-07-08 7:23 ` Junio C Hamano 1 sibling, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2007-07-07 20:36 UTC (permalink / raw) To: Johannes Schindelin Cc: しらいしななこ, Linus Torvalds, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> Johannes, it strikes me that it is very odd having to add ".*$" at the >> end and to surround the whole thing in a parentheses. Shouldn't the >> ff_regexp() grabber simply pick the whole line? After all, that is what >> GNU "diff -p -F RE" does. > > Yes, but then you can forget about your hierarchical idea. Yeah, I've brought it up just for discussion, but I no longer think the multi-line pattern that picks pieces and paste them together is worth it. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-07 12:17 ` Johannes Schindelin 2007-07-07 20:36 ` Junio C Hamano @ 2007-07-08 7:23 ` Junio C Hamano 2007-07-08 11:58 ` Johannes Schindelin 1 sibling, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-07-08 7:23 UTC (permalink / raw) To: Johannes Schindelin Cc: しらいしななこ, Linus Torvalds, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> > + if (!strcmp(ident, "tex")) >> > + return "^\\\\\\(sub\\)*section{"; > > It is always easier, and will never require C skills, to put this into the > config. With Junio's current version: > > echo '*.tex funcname=tex' >> .gitattributes > echo '[funcname] tex = ^\(\\\(sub\)*section{.*\)' >> .git/config > > The problem is of course that the backslashes have to be escaped _both_ in > C and in regexps. I think giving a reasonable set of basic language support as built-in patterns is important for usability. Otherwise the users end up needing to have them in their $HOME/.gitconfig. I am not sure if Java and LaTeX qualify as the first two most important cases, but they are what we already have demonstrated. How about doing something like this? -- >8 -- diff.c: make built-in hunk header pattern a separate table This would hopefully make it easier to maintain. Initially we would have "java" and "tex" defined, as they are the only ones we already have. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- diff --git a/diff.c b/diff.c index b8473f5..cd6b0c4 100644 --- a/diff.c +++ b/diff.c @@ -1216,9 +1216,22 @@ static const char *funcname_pattern(const char *ident) return NULL; } +static struct builtin_funcname_pattern { + const char *name; + const char *pattern; +} builtin_funcname_pattern[] = { + { "java", "!^[ ]*\\(catch\\|do\\|for\\|if\\|instanceof\\|" + "new\\|return\\|switch\\|throw\\|while\\)\n" + "^[ ]*\\(\\([ ]*" + "[A-Za-z_][A-Za-z_0-9]*\\)\\{2,\\}" + "[ ]*([^;]*$\\)" }, + { "tex", "^\\(\\\\\\(sub\\)*section{.*\\)$" }, +}; + static const char *diff_funcname_pattern(struct diff_filespec *one) { const char *ident, *pattern; + int i; diff_filespec_check_attr(one); ident = one->funcname_pattern_ident; @@ -1240,12 +1253,9 @@ static const char *diff_funcname_pattern(struct diff_filespec *one) * And define built-in fallback patterns here. Note that * these can be overriden by the user's config settings. */ - if (!strcmp(ident, "java")) - return "!^[ ]*\\(catch\\|do\\|for\\|if\\|instanceof\\|" - "new\\|return\\|switch\\|throw\\|while\\)\n" - "^[ ]*\\(\\([ ]*" - "[A-Za-z_][A-Za-z_0-9]*\\)\\{2,\\}" - "[ ]*([^;]*$\\)"; + for (i = 0; i < ARRAY_SIZE(builtin_funcname_pattern); i++) + if (!strcmp(ident, builtin_funcname_pattern[i].name)) + return builtin_funcname_pattern[i].pattern; return NULL; } ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-08 7:23 ` Junio C Hamano @ 2007-07-08 11:58 ` Johannes Schindelin 0 siblings, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-08 11:58 UTC (permalink / raw) To: Junio C Hamano Cc: しらいしななこ, Linus Torvalds, git Hi, On Sun, 8 Jul 2007, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > >> > + if (!strcmp(ident, "tex")) > >> > + return "^\\\\\\(sub\\)*section{"; > > > > It is always easier, and will never require C skills, to put this into the > > config. With Junio's current version: > > > > echo '*.tex funcname=tex' >> .gitattributes > > echo '[funcname] tex = ^\(\\\(sub\)*section{.*\)' >> .git/config > > > > The problem is of course that the backslashes have to be escaped _both_ in > > C and in regexps. > > I think giving a reasonable set of basic language support as > built-in patterns is important for usability. Otherwise the > users end up needing to have them in their $HOME/.gitconfig. I agree. > I am not sure if Java and LaTeX qualify as the first two most important > cases, but they are what we already have demonstrated. Java and LaTeX are the two languages I use, for which the inbuilt funcname default is absolutely unusable. Given that Java is the only language with a notable git-related code base, for which the default fails, Java is the prime (if not the only) reason I wanted this whole patch series in the first place. > How about doing something like this? I do not see how a table in diff.c could be easier to maintain than a small part in a small function in diff.c. It would make sense, though, to move the table to xdiff-interface.[ch]. Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-07 2:00 ` Junio C Hamano 2007-07-07 10:08 ` しらいしななこ @ 2007-07-09 3:22 ` Nicolas Pitre 2007-07-09 5:05 ` Junio C Hamano 1 sibling, 1 reply; 29+ messages in thread From: Nicolas Pitre @ 2007-07-09 3:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Johannes Schindelin, git On Fri, 6 Jul 2007, Junio C Hamano wrote: > About the comment from Johannes regarding hunk_header vs > funcname, I would actually prefer hunk_header, since that is > what this is about ("funcname" and "find_func" were misnomer > from the beginning), but I'd rename hunk_header to funcname for > the sake of consistency and minimizing the diff. I think "minimizing the diff" in this case is a bad reason. Using hunk_header is so much better than funcname IMHO. Nicolas ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-09 3:22 ` Nicolas Pitre @ 2007-07-09 5:05 ` Junio C Hamano 2007-07-09 11:40 ` Johannes Schindelin 0 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2007-07-09 5:05 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Linus Torvalds, Johannes Schindelin, git Nicolas Pitre <nico@cam.org> writes: > On Fri, 6 Jul 2007, Junio C Hamano wrote: > >> About the comment from Johannes regarding hunk_header vs >> funcname, I would actually prefer hunk_header, since that is >> what this is about ("funcname" and "find_func" were misnomer >> from the beginning), but I'd rename hunk_header to funcname for >> the sake of consistency and minimizing the diff. > > I think "minimizing the diff" in this case is a bad reason. Using > hunk_header is so much better than funcname IMHO. Well, even then it turns out to be a good reason, as the patch to rename function and field can be a separate patch. After adding that "latex pattern" stuff, I am even more inclined to rename them. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Per-path attribute based hunk header selection. 2007-07-09 5:05 ` Junio C Hamano @ 2007-07-09 11:40 ` Johannes Schindelin 0 siblings, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-09 11:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nicolas Pitre, Linus Torvalds, git Hi, On Sun, 8 Jul 2007, Junio C Hamano wrote: > Nicolas Pitre <nico@cam.org> writes: > > > On Fri, 6 Jul 2007, Junio C Hamano wrote: > > > >> About the comment from Johannes regarding hunk_header vs > >> funcname, I would actually prefer hunk_header, since that is > >> what this is about ("funcname" and "find_func" were misnomer > >> from the beginning), but I'd rename hunk_header to funcname for > >> the sake of consistency and minimizing the diff. > > > > I think "minimizing the diff" in this case is a bad reason. Using > > hunk_header is so much better than funcname IMHO. > > Well, even then it turns out to be a good reason, as the patch > to rename function and field can be a separate patch. After > adding that "latex pattern" stuff, I am even more inclined to > rename them. Not to mention that even the name "hunk_header_pattern_ident" would be a misnomer to begin with. It is the diff attribute we are storing there. Did you have any chance to look at http://thread.gmane.org/gmane.comp.version-control.git/51828 yet? That should clarify things, and http://article.gmane.org/gmane.comp.version-control.git/51829 on top of it should clarify things even more, besides making the code a little faster again. Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] Fix configuration syntax to specify customized hunk header patterns. 2007-07-06 17:59 ` Linus Torvalds 2007-07-07 2:00 ` Junio C Hamano @ 2007-07-07 10:12 ` Junio C Hamano 1 sibling, 0 replies; 29+ messages in thread From: Junio C Hamano @ 2007-07-07 10:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, git This updates the hunk header customization syntax. The special case 'funcname' attribute is gone. It also uses the name 'funcname" throughout, instead of "hunk_header". You assign the name of the type of contents to path's "diff" attribute as a string value in .gitattributes like this: *.java diff=java *.perl diff=perl *.doc diff=doc If you supply "diff.<name>.funcname" variable via the configuration mechanism (e.g. in $HOME/.gitconfig), the value is used as the regexp set to find the line to use for the hunk header (the variable is called "funcname" because such a line typically is the one that has the name of the function in programming language source text). If there is no such configuration, built-in default is used, if any. Currently there are two default patterns: default and java. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- diff.c | 147 +++++++++++++++++++++++++--------------------- diffcore.h | 2 +- t/t4018-diff-funcname.sh | 6 +- 3 files changed, 84 insertions(+), 71 deletions(-) diff --git a/diff.c b/diff.c index 04e7e91..21e61af 100644 --- a/diff.c +++ b/diff.c @@ -56,6 +56,14 @@ static struct ll_diff_driver { char *cmd; } *user_diff, **user_diff_tail; +static void read_config_if_needed(void) +{ + if (!user_diff_tail) { + user_diff_tail = &user_diff; + git_config(git_diff_ui_config); + } +} + /* * Currently there is only "diff.<drivername>.command" variable; * because there are "diff.color.<slot>" variables, we are parsing @@ -94,6 +102,45 @@ static int parse_lldiff_command(const char *var, const char *ep, const char *val } /* + * 'diff.<what>.funcname' attribute can be specified in the configuration + * to define a customized regexp to find the beginning of a function to + * be used for hunk header lines of "diff -p" style output. + */ +static struct funcname_pattern { + char *name; + char *pattern; + struct funcname_pattern *next; +} *funcname_pattern_list; + +static int parse_funcname_pattern(const char *var, const char *ep, const char *value) +{ + const char *name; + int namelen; + struct funcname_pattern *pp; + + name = var + 5; /* "diff." */ + namelen = ep - name; + + for (pp = funcname_pattern_list; pp; pp = pp->next) + if (!strncmp(pp->name, name, namelen) && !pp->name[namelen]) + break; + if (!pp) { + char *namebuf; + pp = xcalloc(1, sizeof(*pp)); + namebuf = xmalloc(namelen + 1); + memcpy(namebuf, name, namelen); + namebuf[namelen] = 0; + pp->name = namebuf; + pp->next = funcname_pattern_list; + funcname_pattern_list = pp; + } + if (pp->pattern) + free(pp->pattern); + pp->pattern = xstrdup(value); + return 0; +} + +/* * These are to give UI layer defaults. * The core-level commands such as git-diff-files should * never be affected by the setting of diff.renames @@ -122,8 +169,12 @@ int git_diff_ui_config(const char *var, const char *value) if (!prefixcmp(var, "diff.")) { const char *ep = strrchr(var, '.'); - if (ep != var + 4 && !strcmp(ep, ".command")) - return parse_lldiff_command(var, ep, value); + if (ep != var + 4) { + if (!strcmp(ep, ".command")) + return parse_lldiff_command(var, ep, value); + if (!strcmp(ep, ".funcname")) + return parse_funcname_pattern(var, ep, value); + } } if (!prefixcmp(var, "diff.color.") || !prefixcmp(var, "color.diff.")) { int slot = parse_diff_color_slot(var, 11); @@ -1101,43 +1152,39 @@ static void emit_binary_diff(mmfile_t *one, mmfile_t *two) static void setup_diff_attr_check(struct git_attr_check *check) { static struct git_attr *attr_diff; - static struct git_attr *attr_diff_func_name; if (!attr_diff) { attr_diff = git_attr("diff", 4); - attr_diff_func_name = git_attr("funcname", 8); } check[0].attr = attr_diff; - check[1].attr = attr_diff_func_name; } static void diff_filespec_check_attr(struct diff_filespec *one) { - struct git_attr_check attr_diff_check[2]; + struct git_attr_check attr_diff_check; if (one->checked_attr) return; - setup_diff_attr_check(attr_diff_check); + setup_diff_attr_check(&attr_diff_check); one->is_binary = 0; - one->hunk_header_ident = NULL; + one->funcname_pattern_ident = NULL; - if (!git_checkattr(one->path, ARRAY_SIZE(attr_diff_check), attr_diff_check)) { + if (!git_checkattr(one->path, 1, &attr_diff_check)) { const char *value; /* binaryness */ - value = attr_diff_check[0].value; + value = attr_diff_check.value; if (ATTR_TRUE(value)) ; else if (ATTR_FALSE(value)) one->is_binary = 1; - /* hunk header ident */ - value = attr_diff_check[1].value; + /* funcname pattern ident */ if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value)) ; else - one->hunk_header_ident = value; + one->funcname_pattern_ident = value; } if (!one->data && DIFF_FILE_VALID(one)) @@ -1154,54 +1201,23 @@ int diff_filespec_is_binary(struct diff_filespec *one) return one->is_binary; } -static struct hunk_header_regexp { - char *name; - char *regexp; - struct hunk_header_regexp *next; -} *hunk_header_regexp_list, **hunk_header_regexp_tail; - -static int hunk_header_config(const char *var, const char *value) +static const char *funcname_pattern(const char *ident) { - static const char funcname[] = "funcname."; - struct hunk_header_regexp *hh; + struct funcname_pattern *pp; - if (prefixcmp(var, funcname)) - return 0; - var += strlen(funcname); - for (hh = hunk_header_regexp_list; hh; hh = hh->next) - if (!strcmp(var, hh->name)) { - free(hh->regexp); - hh->regexp = xstrdup(value); - return 0; - } - hh = xcalloc(1, sizeof(*hh)); - hh->name = xstrdup(var); - hh->regexp = xstrdup(value); - hh->next = NULL; - *hunk_header_regexp_tail = hh; - return 0; -} - -static const char *hunk_header_regexp(const char *ident) -{ - struct hunk_header_regexp *hh; - - if (!hunk_header_regexp_tail) { - hunk_header_regexp_tail = &hunk_header_regexp_list; - git_config(hunk_header_config); - } - for (hh = hunk_header_regexp_list; hh; hh = hh->next) - if (!strcmp(ident, hh->name)) - return hh->regexp; + read_config_if_needed(); + for (pp = funcname_pattern_list; pp; pp = pp->next) + if (!strcmp(ident, pp->name)) + return pp->pattern; return NULL; } -static const char *diff_hunk_header_regexp(struct diff_filespec *one) +static const char *diff_funcname_pattern(struct diff_filespec *one) { - const char *ident, *regexp; + const char *ident, *pattern; diff_filespec_check_attr(one); - ident = one->hunk_header_ident; + ident = one->funcname_pattern_ident; if (!ident) /* @@ -1209,12 +1225,12 @@ static const char *diff_hunk_header_regexp(struct diff_filespec *one) * regexp is used; otherwise NULL is returned and xemit uses * the built-in default. */ - return hunk_header_regexp("default"); + return funcname_pattern("default"); /* Look up custom "funcname.$ident" regexp from config. */ - regexp = hunk_header_regexp(ident); - if (regexp) - return regexp; + pattern = funcname_pattern(ident); + if (pattern) + return pattern; /* * And define built-in fallback patterns here. Note that @@ -1304,11 +1320,11 @@ static void builtin_diff(const char *name_a, xdemitconf_t xecfg; xdemitcb_t ecb; struct emit_callback ecbdata; - const char *hunk_header_regexp; + const char *funcname_pattern; - hunk_header_regexp = diff_hunk_header_regexp(one); - if (!hunk_header_regexp) - hunk_header_regexp = diff_hunk_header_regexp(two); + funcname_pattern = diff_funcname_pattern(one); + if (!funcname_pattern) + funcname_pattern = diff_funcname_pattern(two); memset(&xecfg, 0, sizeof(xecfg)); memset(&ecbdata, 0, sizeof(ecbdata)); @@ -1318,8 +1334,8 @@ static void builtin_diff(const char *name_a, xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts; xecfg.ctxlen = o->context; xecfg.flags = XDL_EMIT_FUNCNAMES; - if (hunk_header_regexp) - xdiff_set_find_func(&xecfg, hunk_header_regexp); + if (funcname_pattern) + xdiff_set_find_func(&xecfg, funcname_pattern); if (!diffopts) ; else if (!prefixcmp(diffopts, "--unified=")) @@ -1862,10 +1878,7 @@ static const char *external_diff_attr(const char *name) !ATTR_UNSET(value)) { struct ll_diff_driver *drv; - if (!user_diff_tail) { - user_diff_tail = &user_diff; - git_config(git_diff_ui_config); - } + read_config_if_needed(); for (drv = user_diff; drv; drv = drv->next) if (!strcmp(drv->name, value)) return drv->cmd; diff --git a/diffcore.h b/diffcore.h index 0598514..eef17c4 100644 --- a/diffcore.h +++ b/diffcore.h @@ -27,7 +27,7 @@ struct diff_filespec { char *path; void *data; void *cnt_data; - const void *hunk_header_ident; + const char *funcname_pattern_ident; unsigned long size; int xfrm_flags; /* for use by the xfrm */ unsigned short mode; /* file mode */ diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh index dc7a47b..f9db81d 100644 --- a/t/t4018-diff-funcname.sh +++ b/t/t4018-diff-funcname.sh @@ -38,12 +38,12 @@ test_expect_success 'default behaviour' ' ' test_expect_success 'preset java pattern' ' - echo "*.java funcname=java" >.gitattributes && + echo "*.java diff=java" >.gitattributes && git diff Beer.java Beer-correct.java | grep "^@@.*@@ public static void main(" ' -git config funcname.java '!static +git config diff.java.funcname '!static !String [^ ].*s.*' @@ -53,7 +53,7 @@ test_expect_success 'custom pattern' ' ' test_expect_success 'last regexp must not be negated' ' - git config diff.functionnameregexp "!static" && + git config diff.java.funcname "!static" && ! git diff Beer.java Beer-correct.java ' -- 1.5.3.rc0.837.g4574c ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] diff: add custom regular expressions for function names 2007-07-05 5:00 ` Junio C Hamano 2007-07-05 5:02 ` [PATCH 1/2] Future-proof source for changes in xdemitconf_t Junio C Hamano 2007-07-05 5:02 ` [PATCH 2/2] WIP per-path attribute based hunk header selection Junio C Hamano @ 2007-07-05 8:24 ` Florian Weimer 2007-07-05 11:44 ` Johannes Schindelin 3 siblings, 0 replies; 29+ messages in thread From: Florian Weimer @ 2007-07-05 8:24 UTC (permalink / raw) To: git * Junio C. Hamano: > I think using multiple regexp is cute, but if we do that, it > should allow people to pick from: > > public class Beer > { > int special; > public static void main(String args[]) > { > ... modified part is here ... > > with two regexp matches, say: > > /^(public|private|protectd) class (.*)/ then > /^ +.* (\w*\(.*)$/ > > and define the hunk_header format as something like: > > "\[1,2]::\[2,1]" Even that doesn't work for C++ namespaces, or nested Java classes. 8-( If you want to do it right, you need to spawn some helper program. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/2] diff: add custom regular expressions for function names 2007-07-05 5:00 ` Junio C Hamano ` (2 preceding siblings ...) 2007-07-05 8:24 ` [PATCH 2/2] diff: add custom regular expressions for function names Florian Weimer @ 2007-07-05 11:44 ` Johannes Schindelin 3 siblings, 0 replies; 29+ messages in thread From: Johannes Schindelin @ 2007-07-05 11:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, git Hi, On Wed, 4 Jul 2007, Junio C Hamano wrote: > *NOTE IN BIG RED LETTERS* > > I do not particularly like the way multiple regexps are used in > Johannes's patch, but I left it as-is, as that part of the > change is orthogonal from the support to use the gitattribute > mechanism, and the primary reason I got involved in this topic > is to give help around the latter area. I do not like it particularly, either. Until two days ago, that POSIX regexps suck so much, in that they do not even have a look-ahead, let alone a negative look-ahead. _That_ is what I wanted. Alas, that is the first valid argument I encounter that would speak _for_ pcre. It speaks volumes that _nobody_ pointed this out before, but all pcre fans put out bogus arguments. > I think using multiple regexp is cute, but if we do that, it > should allow people to pick from: > > public class Beer > { > int special; > public static void main(String args[]) > { > ... modified part is here ... > > with two regexp matches, say: > > /^(public|private|protectd) class (.*)/ then > /^ +.* (\w*\(.*)$/ > > and define the hunk_header format as something like: > > "\[1,2]::\[2,1]" > > meaning, "pick the second capture group from the match data of > the first regexp, followed by double-colon, and pick the first > capture group from the match data of the second regexp", to > result in "Beer::main(String args[])". You should be able > to pick "/package (\w+);/ then /^sub (\w+)/" in Perl code using > the same idea. > > I am not married to the syntax I used in the above examples, > though. Is that really necessary? But yeah, it is possible. You'd have to have some state in the struct, and introduce yet another escape (probably in place of the first line of the regexp), similar to *[2,2]::[1,1] *keep-searching-after-match 1 ^ +.* (\w*\(.*)$ ^(public|private|protected) class (.*) Of course, there _has_ to be a way to handle exceptions, such as in Perl, where you do not necessarily have a package, and thus do not want that silly "::" in front of the rest. But frankly, I think this is too complicated. Maybe somebody more intelligent than me can actually come up with a sane syntax and implementation of this, but at some point I think it is Good Enough(tm). And IMHO it is already good enough matching just one line. Ciao, Dscho ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2007-07-09 11:48 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-07-04 18:06 [PATCH 2/2] diff: add custom regular expressions for function names Johannes Schindelin 2007-07-04 18:41 ` Linus Torvalds 2007-07-04 18:45 ` Johannes Schindelin 2007-07-04 20:44 ` Junio C Hamano 2007-07-04 21:20 ` Johannes Schindelin 2007-07-05 5:00 ` Junio C Hamano 2007-07-05 5:02 ` [PATCH 1/2] Future-proof source for changes in xdemitconf_t Junio C Hamano 2007-07-05 11:01 ` Johannes Schindelin 2007-07-05 5:02 ` [PATCH 2/2] WIP per-path attribute based hunk header selection Junio C Hamano 2007-07-05 11:25 ` Johannes Schindelin 2007-07-05 16:43 ` Junio C Hamano 2007-07-06 8:37 ` [PATCH 1/3] Introduce diff_filespec_is_binary() Junio C Hamano 2007-07-06 12:20 ` Johannes Schindelin 2007-07-06 8:39 ` [PATCH] Per-path attribute based hunk header selection Junio C Hamano 2007-07-06 12:38 ` Johannes Schindelin 2007-07-06 17:59 ` Linus Torvalds 2007-07-07 2:00 ` Junio C Hamano 2007-07-07 10:08 ` しらいしななこ 2007-07-07 10:22 ` Junio C Hamano 2007-07-07 12:17 ` Johannes Schindelin 2007-07-07 20:36 ` Junio C Hamano 2007-07-08 7:23 ` Junio C Hamano 2007-07-08 11:58 ` Johannes Schindelin 2007-07-09 3:22 ` Nicolas Pitre 2007-07-09 5:05 ` Junio C Hamano 2007-07-09 11:40 ` Johannes Schindelin 2007-07-07 10:12 ` [PATCH] Fix configuration syntax to specify customized hunk header patterns Junio C Hamano 2007-07-05 8:24 ` [PATCH 2/2] diff: add custom regular expressions for function names Florian Weimer 2007-07-05 11:44 ` Johannes Schindelin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).