* [PATCH 0/4] refactor the --color-words to make it more hackable
@ 2009-01-11 19:58 Johannes Schindelin
2009-01-11 19:59 ` [PATCH 1/4] Add color_fwrite(), a function coloring each line individually Johannes Schindelin
` (5 more replies)
0 siblings, 6 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 19:58 UTC (permalink / raw)
To: git, Thomas Rast
So the total change is pretty large, I have to admit.
But at least _I_ think it is easy to follow, and it actually makes the code
more readable/hackable. Correct me if I'm wrong.
The basic idea is to decouple the original text from the text that is
passed to libxdiff to find the word differences.
To that end, the words of the pre and post texts are put into two lists that
are fed to libxdiff. While the words are extracted, an array is created which
contains pointers back to the word boundaries in the original text.
To make the transition as easy to understand as possible, the code is first
refactored without actually changing what makes a word boundary.
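To illustrate the decoupling described above, here is a minimal stand-alone sketch in C. It is not the code from this series: `word_span` and `split_words` are names made up for the illustration, and words are assumed to be delimited by whitespace only. Each word is copied into a newline-separated list while its position in the original text is recorded.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: where one word sits in the original text. */
struct word_span {
	const char *begin, *end;	/* half-open range [begin, end) */
};

/*
 * Copy every whitespace-delimited word of text[0..len) into out,
 * one word per line, and record its original position in spans.
 * out must have room for len + 1 bytes, spans for max_spans entries.
 * Returns the number of words; *out_len is the number of bytes written.
 */
static size_t split_words(const char *text, size_t len, char *out,
			  size_t *out_len, struct word_span *spans,
			  size_t max_spans)
{
	size_t i = 0, nr = 0, used = 0;

	assert(text && out && out_len && spans);
	while (i < len && nr < max_spans) {
		size_t j;

		if (isspace((unsigned char)text[i])) {
			i++;
			continue;
		}
		for (j = i; j < len && !isspace((unsigned char)text[j]); j++)
			; /* find the end of the word */

		/* remember where the word came from */
		spans[nr].begin = text + i;
		spans[nr].end = text + j;
		nr++;

		/* append the word plus a newline separator */
		memcpy(out + used, text + i, j - i);
		used += j - i;
		out[used++] = '\n';
		i = j;
	}
	*out_len = used;
	return nr;
}
```

The newline-separated list is what a line-based diff would see, while the spans let the caller print the untouched original bytes, including whatever whitespace sat between the words.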
Johannes Schindelin (4):
Add color_fwrite(), a function coloring each line individually
color-words: refactor word splitting and use ALLOC_GROW()
color-words: refactor to allow for 0-character word boundaries
color-words: take an optional regular expression describing words
color.c | 24 ++++++++
color.h | 1 +
diff.c | 185 +++++++++++++++++++++++++++++++++++++++------------------------
diff.h | 1 +
4 files changed, 141 insertions(+), 70 deletions(-)
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH 1/4] Add color_fwrite(), a function coloring each line individually
2009-01-11 19:58 [PATCH 0/4] refactor the --color-words to make it more hackable Johannes Schindelin
@ 2009-01-11 19:59 ` Johannes Schindelin
2009-01-11 22:43 ` Junio C Hamano
2009-01-11 19:59 ` [PATCH 2/4] color-words: refactor word splitting and use ALLOC_GROW() Johannes Schindelin
` (4 subsequent siblings)
5 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 19:59 UTC (permalink / raw)
To: git, Thomas Rast
We have to set the color before every line and reset it before every
newline. Add a function color_fwrite() which does that for us.
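As a rough illustration of that behaviour, the following stand-alone sketch wraps each line of a buffer in a color sequence. It is not the patch's code: `write_colored_lines` and `SKETCH_RESET` are names invented here, and the reset sequence is an assumption (a plain SGR reset).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_RESET "\033[m"	/* assumed reset sequence for this sketch */

/*
 * Write buf[0..count) to fp, emitting color before each line and the
 * reset sequence before each newline.  Returns 0 on success, -1 on error.
 */
static int write_colored_lines(FILE *fp, const char *color,
			       const char *buf, size_t count)
{
	assert(fp && color && buf);
	while (count) {
		const char *nl = memchr(buf, '\n', count);
		size_t n = nl ? (size_t)(nl - buf) : count;

		/* skip fwrite() for empty lines: a zero-byte write returns 0 */
		if (fputs(color, fp) < 0 ||
		    (n && fwrite(buf, 1, n, fp) != n) ||
		    fputs(SKETCH_RESET, fp) < 0)
			return -1;
		if (!nl)
			return 0;	/* last line had no trailing newline */
		if (fputc('\n', fp) == EOF)
			return -1;
		count -= n + 1;
		buf = nl + 1;
	}
	return 0;
}
```

Resetting before the newline keeps the color from leaking into whatever the terminal or pager draws at the start of the next line.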
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
color.c | 24 ++++++++++++++++++++++++
color.h | 1 +
2 files changed, 25 insertions(+), 0 deletions(-)
diff --git a/color.c b/color.c
index fc0b72a..bff24ac 100644
--- a/color.c
+++ b/color.c
@@ -191,3 +191,27 @@ int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...)
va_end(args);
return r;
}
+
+/*
+ * This function splits the buffer by newlines and colors the lines individually.
+ */
+void color_fwrite(FILE *f, const char *color, size_t count, const char *buf)
+{
+ if (!*color) {
+ fwrite(buf, count, 1, f);
+ return;
+ }
+ while (count) {
+ char *p = memchr(buf, '\n', count);
+ fputs(color, f);
+ fwrite(buf, p ? p - buf : count, 1, f);
+ fputs(COLOR_RESET, f);
+ if (!p)
+ return;
+ fputc('\n', f);
+ count -= p + 1 - buf;
+ buf = p + 1;
+ }
+}
+
+
diff --git a/color.h b/color.h
index 6cf5c88..9fb58f5 100644
--- a/color.h
+++ b/color.h
@@ -19,5 +19,6 @@ int git_config_colorbool(const char *var, const char *value, int stdout_is_tty);
void color_parse(const char *var, const char *value, char *dst);
int color_fprintf(FILE *fp, const char *color, const char *fmt, ...);
int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...);
+void color_fwrite(FILE *f, const char *color, size_t count, const char *buf);
#endif /* COLOR_H */
--
1.6.1.186.g48f3bc4
* [PATCH 2/4] color-words: refactor word splitting and use ALLOC_GROW()
2009-01-11 19:58 [PATCH 0/4] refactor the --color-words to make it more hackable Johannes Schindelin
2009-01-11 19:59 ` [PATCH 1/4] Add color_fwrite(), a function coloring each line individually Johannes Schindelin
@ 2009-01-11 19:59 ` Johannes Schindelin
2009-01-11 19:59 ` [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries Johannes Schindelin
` (3 subsequent siblings)
5 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 19:59 UTC (permalink / raw)
To: git, Thomas Rast
Word splitting is now performed by the function diff_words_fill(),
avoiding having the same code twice.
In the same spirit, avoid duplicating the code of ALLOC_GROW().
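For readers unfamiliar with the idiom, the grow-on-demand pattern being centralized looks roughly like the sketch below. It is a stand-alone illustration with a made-up helper `grow_bytes`; it is not git's ALLOC_GROW() macro, and the growth factor is only an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Make sure *buf can hold at least `needed` bytes.  *alloc tracks the
 * current capacity.  Returns 0 on success, -1 if realloc() fails (in
 * which case *buf and *alloc are left untouched).
 */
static int grow_bytes(char **buf, size_t needed, size_t *alloc)
{
	size_t new_alloc;
	char *p;

	assert(buf && alloc);
	if (needed <= *alloc)
		return 0;	/* already large enough */

	/* grow by about 1.5x so repeated appends stay cheap on average */
	new_alloc = (*alloc + 16) * 3 / 2;
	if (new_alloc < needed)
		new_alloc = needed;

	p = realloc(*buf, new_alloc);
	if (!p)
		return -1;
	*buf = p;
	*alloc = new_alloc;
	return 0;
}
```

Keeping the capacity check and the reallocation in one helper means every caller gets the same growth policy, which is the point of not open-coding it at each call site.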
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
diff.c | 40 +++++++++++++++++++---------------------
1 files changed, 19 insertions(+), 21 deletions(-)
diff --git a/diff.c b/diff.c
index f67e0b2..6d87ea5 100644
--- a/diff.c
+++ b/diff.c
@@ -326,10 +326,7 @@ struct diff_words_buffer {
static void diff_words_append(char *line, unsigned long len,
struct diff_words_buffer *buffer)
{
- if (buffer->text.size + len > buffer->alloc) {
- buffer->alloc = (buffer->text.size + len) * 3 / 2;
- buffer->text.ptr = xrealloc(buffer->text.ptr, buffer->alloc);
- }
+ ALLOC_GROW(buffer->text.ptr, buffer->text.size + len, buffer->alloc);
line++;
len--;
memcpy(buffer->text.ptr + buffer->text.size, line, len);
@@ -398,6 +395,22 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+/*
+ * This function splits the words in buffer->text, and stores the list with
+ * newline separator into out.
+ */
+static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
+{
+ int i;
+ out->size = buffer->text.size;
+ out->ptr = xmalloc(out->size);
+ memcpy(out->ptr, buffer->text.ptr, out->size);
+ for (i = 0; i < out->size; i++)
+ if (isspace(out->ptr[i]))
+ out->ptr[i] = '\n';
+ buffer->current = 0;
+}
+
/* this executes the word diff on the accumulated buffers */
static void diff_words_show(struct diff_words_data *diff_words)
{
@@ -405,26 +418,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitconf_t xecfg;
xdemitcb_t ecb;
mmfile_t minus, plus;
- int i;
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
- minus.size = diff_words->minus.text.size;
- minus.ptr = xmalloc(minus.size);
- memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
- for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
- minus.ptr[i] = '\n';
- diff_words->minus.current = 0;
-
- plus.size = diff_words->plus.text.size;
- plus.ptr = xmalloc(plus.size);
- memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
- for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
- plus.ptr[i] = '\n';
- diff_words->plus.current = 0;
-
+ diff_words_fill(&diff_words->minus, &minus);
+ diff_words_fill(&diff_words->plus, &plus);
xpp.flags = XDF_NEED_MINIMAL;
xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
--
1.6.1.186.g48f3bc4
* [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries
2009-01-11 19:58 [PATCH 0/4] refactor the --color-words to make it more hackable Johannes Schindelin
2009-01-11 19:59 ` [PATCH 1/4] Add color_fwrite(), a function coloring each line individually Johannes Schindelin
2009-01-11 19:59 ` [PATCH 2/4] color-words: refactor word splitting and use ALLOC_GROW() Johannes Schindelin
@ 2009-01-11 19:59 ` Johannes Schindelin
2009-01-11 23:08 ` Junio C Hamano
2009-01-12 8:47 ` Thomas Rast
2009-01-11 20:00 ` [PATCH 4/4] color-words: take an optional regular expression describing words Johannes Schindelin
` (2 subsequent siblings)
5 siblings, 2 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 19:59 UTC (permalink / raw)
To: git, Thomas Rast
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, but then identified the text to
print out by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the wordlists, we parse the hunk headers, and
find the corresponding offsets, and then
- we print the removed/added parts in one go.
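The hunk-header step can be sketched as follows. This is an illustration only: `parse_hunk` and `word_range` are hypothetical helpers, not the functions used by the patch, and the sketch assumes the usual unified-diff header form `@@ -a,b +c,d @@`, where an omitted count means 1. Word number 0 is the fake empty word, so the line numbers reported by the diff can index the array directly.

```c
#include <assert.h>
#include <stdio.h>

struct word_span {
	const char *begin, *end;	/* half-open range in the original text */
};

/* Parse "@@ -a[,b] +c[,d] @@"; returns 0 on success, -1 otherwise. */
static int parse_hunk(const char *line, int *a, int *b, int *c, int *d)
{
	int used = 0;

	assert(line && a && b && c && d);
	*b = *d = 1;	/* an omitted count means one line */
	if (sscanf(line, "@@ -%d%n", a, &used) != 1)
		return -1;
	line += used;
	if (*line == ',') {
		if (sscanf(line, ",%d%n", b, &used) != 1)
			return -1;
		line += used;
	}
	if (sscanf(line, " +%d%n", c, &used) != 1)
		return -1;
	line += used;
	if (*line == ',' && sscanf(line, ",%d", d) != 1)
		return -1;
	return 0;
}

/*
 * Map `len` words starting at word number `first` to a byte range,
 * following the rule in the patch as posted: an empty range collapses
 * to the begin of word `first`.
 */
static void word_range(const struct word_span *orig, int first, int len,
		       const char **begin, const char **end)
{
	*begin = orig[first].begin;
	*end = len == 0 ? *begin : orig[first + len - 1].end;
}
```

Because each word occupies exactly one line of the list handed to the diff, a hunk's line numbers are word numbers, and the byte range to print follows from two array lookups.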
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
diff.c | 150 +++++++++++++++++++++++++++++++++++-----------------------------
1 files changed, 82 insertions(+), 68 deletions(-)
diff --git a/diff.c b/diff.c
index 6d87ea5..2a3d301 100644
--- a/diff.c
+++ b/diff.c
@@ -319,8 +319,10 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
struct diff_words_buffer {
mmfile_t text;
long alloc;
- long current; /* output pointer */
- int suppressed_newline;
+ struct diff_words_orig {
+ const char *begin, *end;
+ } *orig;
+ int orig_nr, orig_alloc;
};
static void diff_words_append(char *line, unsigned long len,
@@ -335,80 +337,79 @@ static void diff_words_append(char *line, unsigned long len,
struct diff_words_data {
struct diff_words_buffer minus, plus;
+ const char *current_plus;
FILE *file;
};
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
- int suppress_newline)
+static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
- const char *ptr;
- int eol = 0;
+ struct diff_words_data *diff_words = priv;
+ int minus_first, minus_len, plus_first, plus_len;
+ const char *minus_begin, *minus_end, *plus_begin, *plus_end;
- if (len == 0)
+ if (line[0] != '@' || parse_hunk_header(line, len,
+ &minus_first, &minus_len, &plus_first, &plus_len))
return;
- ptr = buffer->text.ptr + buffer->current;
- buffer->current += len;
-
- if (ptr[len - 1] == '\n') {
- eol = 1;
- len--;
- }
-
- fputs(diff_get_color(1, color), file);
- fwrite(ptr, len, 1, file);
- fputs(diff_get_color(1, DIFF_RESET), file);
-
- if (eol) {
- if (suppress_newline)
- buffer->suppressed_newline = 1;
- else
- putc('\n', file);
- }
-}
-
-static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
-{
- struct diff_words_data *diff_words = priv;
+ minus_begin = diff_words->minus.orig[minus_first].begin;
+ minus_end = minus_len == 0 ? minus_begin :
+ diff_words->minus.orig[minus_first + minus_len - 1].end;
+ plus_begin = diff_words->plus.orig[plus_first].begin;
+ plus_end = plus_len == 0 ? plus_begin :
+ diff_words->plus.orig[plus_first + plus_len - 1].end;
- if (diff_words->minus.suppressed_newline) {
- if (line[0] != '+')
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
+ if (diff_words->current_plus != plus_begin)
+ fwrite(diff_words->current_plus,
+ plus_begin - diff_words->current_plus, 1,
+ diff_words->file);
+ if (minus_begin != minus_end)
+ color_fwrite(diff_words->file, diff_get_color(1, DIFF_FILE_OLD),
+ minus_end - minus_begin, minus_begin);
+ if (plus_begin != plus_end)
+ color_fwrite(diff_words->file, diff_get_color(1, DIFF_FILE_NEW),
+ plus_end - plus_begin, plus_begin);
- len--;
- switch (line[0]) {
- case '-':
- print_word(diff_words->file,
- &diff_words->minus, len, DIFF_FILE_OLD, 1);
- break;
- case '+':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_FILE_NEW, 0);
- break;
- case ' ':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_PLAIN, 0);
- diff_words->minus.current += len;
- break;
- }
+ diff_words->current_plus = plus_end;
}
/*
- * This function splits the words in buffer->text, and stores the list with
- * newline separator into out.
+ * This function splits the words in buffer->text, stores the list with
+ * newline separator into out, and saves the offsets of the original words
+ * in buffer->orig.
*/
static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
{
- int i;
- out->size = buffer->text.size;
- out->ptr = xmalloc(out->size);
- memcpy(out->ptr, buffer->text.ptr, out->size);
- for (i = 0; i < out->size; i++)
- if (isspace(out->ptr[i]))
- out->ptr[i] = '\n';
- buffer->current = 0;
+ int i, j;
+
+ out->size = 0;
+ out->ptr = xmalloc(buffer->text.size);
+
+ /* fake an empty "0th" word */
+ ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
+ buffer->orig[0].begin = buffer->orig[0].end = buffer->text.ptr;
+ buffer->orig_nr = 1;
+
+ for (i = 0; i < buffer->text.size; i++) {
+ if (isspace(buffer->text.ptr[i]))
+ continue;
+ for (j = i + 1; j < buffer->text.size &&
+ !isspace(buffer->text.ptr[j]); j++)
+ ; /* find the end of the word */
+
+ /* store original boundaries */
+ ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
+ buffer->orig_alloc);
+ buffer->orig[buffer->orig_nr].begin = buffer->text.ptr + i;
+ buffer->orig[buffer->orig_nr].end = buffer->text.ptr + j;
+ buffer->orig_nr++;
+
+ /* store one word */
+ memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
+ out->ptr[out->size + j - i] = '\n';
+ out->size += j - i + 1;
+
+ i = j - 1;
+ }
}
/* this executes the word diff on the accumulated buffers */
@@ -419,22 +420,33 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitcb_t ecb;
mmfile_t minus, plus;
+ /* special case: only removal */
+ if (!diff_words->plus.text.size) {
+ color_fwrite(diff_words->file, diff_get_color(1, DIFF_FILE_OLD),
+ diff_words->minus.text.size, diff_words->minus.text.ptr);
+ diff_words->minus.text.size = 0;
+ return;
+ }
+
+ diff_words->current_plus = diff_words->plus.text.ptr;
+
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
diff_words_fill(&diff_words->minus, &minus);
diff_words_fill(&diff_words->plus, &plus);
xpp.flags = XDF_NEED_MINIMAL;
- xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
+ xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
&xpp, &xecfg, &ecb);
free(minus.ptr);
free(plus.ptr);
+ if (diff_words->current_plus != diff_words->plus.text.ptr +
+ diff_words->plus.text.size)
+ fwrite(diff_words->current_plus,
+ diff_words->plus.text.ptr + diff_words->plus.text.size
+ - diff_words->current_plus, 1,
+ diff_words->file);
diff_words->minus.text.size = diff_words->plus.text.size = 0;
-
- if (diff_words->minus.suppressed_newline) {
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
}
typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
@@ -458,7 +470,9 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
diff_words_show(ecbdata->diff_words);
free (ecbdata->diff_words->minus.text.ptr);
+ free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
+ free (ecbdata->diff_words->plus.orig);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
--
1.6.1.186.g48f3bc4
* [PATCH 4/4] color-words: take an optional regular expression describing words
2009-01-11 19:58 [PATCH 0/4] refactor the --color-words to make it more hackable Johannes Schindelin
` (2 preceding siblings ...)
2009-01-11 19:59 ` [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries Johannes Schindelin
@ 2009-01-11 20:00 ` Johannes Schindelin
2009-01-11 21:53 ` [PATCH 0/4] refactor the --color-words to make it more hackable Thomas Rast
2009-01-14 13:00 ` Santi Béjar
5 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 20:00 UTC (permalink / raw)
To: git, Thomas Rast
In some applications, words are not delimited by white space. To
allow for that, you can specify a regular expression describing
what makes a word with
git diff --color-words='^[A-Za-z0-9]*'
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
diff.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
diff.h | 1 +
2 files changed, 42 insertions(+), 8 deletions(-)
diff --git a/diff.c b/diff.c
index 2a3d301..d6bba72 100644
--- a/diff.c
+++ b/diff.c
@@ -333,12 +333,14 @@ static void diff_words_append(char *line, unsigned long len,
len--;
memcpy(buffer->text.ptr + buffer->text.size, line, len);
buffer->text.size += len;
+ buffer->text.ptr[buffer->text.size] = '\0';
}
struct diff_words_data {
struct diff_words_buffer minus, plus;
const char *current_plus;
FILE *file;
+ regex_t *word_regex;
};
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
@@ -372,17 +374,36 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
diff_words->current_plus = plus_end;
}
+static int find_word_boundary(mmfile_t *buffer, int i, regex_t *word_regex)
+{
+ if (i >= buffer->size)
+ return i;
+
+ if (word_regex) {
+ regmatch_t match[1];
+ if (!regexec(word_regex, buffer->ptr + i, 1, match, 0))
+ i += match[0].rm_eo;
+ }
+ else
+ while (i < buffer->size && !isspace(buffer->ptr[i]))
+ i++;
+
+ return i;
+}
+
/*
* This function splits the words in buffer->text, stores the list with
* newline separator into out, and saves the offsets of the original words
* in buffer->orig.
*/
-static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
+static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
+ regex_t *word_regex)
{
int i, j;
+ long alloc = 0;
out->size = 0;
- out->ptr = xmalloc(buffer->text.size);
+ out->ptr = NULL;
/* fake an empty "0th" word */
ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
@@ -390,11 +411,9 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr = 1;
for (i = 0; i < buffer->text.size; i++) {
- if (isspace(buffer->text.ptr[i]))
+ j = find_word_boundary(&buffer->text, i, word_regex);
+ if (i == j)
continue;
- for (j = i + 1; j < buffer->text.size &&
- !isspace(buffer->text.ptr[j]); j++)
- ; /* find the end of the word */
/* store original boundaries */
ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
@@ -404,6 +423,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr++;
/* store one word */
+ ALLOC_GROW(out->ptr, out->size + j - i + 1, alloc);
memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
out->ptr[out->size + j - i] = '\n';
out->size += j - i + 1;
@@ -432,8 +452,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
- diff_words_fill(&diff_words->minus, &minus);
- diff_words_fill(&diff_words->plus, &plus);
+ diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
+ diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
xpp.flags = XDF_NEED_MINIMAL;
xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
@@ -473,6 +493,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
free (ecbdata->diff_words->plus.orig);
+ free(ecbdata->diff_words->word_regex);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
@@ -1495,6 +1516,14 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
ecbdata.diff_words->file = o->file;
+ if (o->word_regex) {
+ ecbdata.diff_words->word_regex = (regex_t *)
+ xmalloc(sizeof(regex_t));
+ if (regcomp(ecbdata.diff_words->word_regex,
+ o->word_regex, REG_EXTENDED))
+ die ("Invalid regular expression: %s",
+ o->word_regex);
+ }
}
xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
&xpp, &xecfg, &ecb);
@@ -2510,6 +2539,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
DIFF_OPT_CLR(options, COLOR_DIFF);
else if (!strcmp(arg, "--color-words"))
options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ else if (!prefixcmp(arg, "--color-words=")) {
+ options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ options->word_regex = arg + 14;
+ }
else if (!strcmp(arg, "--exit-code"))
DIFF_OPT_SET(options, EXIT_WITH_STATUS);
else if (!strcmp(arg, "--quiet"))
diff --git a/diff.h b/diff.h
index 4d5a327..23cd90c 100644
--- a/diff.h
+++ b/diff.h
@@ -98,6 +98,7 @@ struct diff_options {
int stat_width;
int stat_name_width;
+ const char *word_regex;
/* this is set by diffcore for DIFF_FORMAT_PATCH */
int found_changes;
--
1.6.1.186.g48f3bc4
* Re: [PATCH 0/4] refactor the --color-words to make it more hackable
2009-01-11 19:58 [PATCH 0/4] refactor the --color-words to make it more hackable Johannes Schindelin
` (3 preceding siblings ...)
2009-01-11 20:00 ` [PATCH 4/4] color-words: take an optional regular expression describing words Johannes Schindelin
@ 2009-01-11 21:53 ` Thomas Rast
2009-01-11 23:02 ` Johannes Schindelin
2009-01-14 13:00 ` Santi Béjar
5 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-11 21:53 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Johannes Schindelin wrote:
>
> But at least _I_ think it is easy to follow, and it actually makes the code
> more readable/hackable. Correct me if I'm wrong.
It indeed seems a sane approach. However, the final result segfaults
and/or prints garbage (on apparently every commit except very small
changes) when using the regex '\S+', which IMHO should give exactly
the same result as not using a regex at all. In git.git:
$ ./git-show --color-words='\S+' 7eb5bbdb645
Segmentation fault
$ ./git-show --color-words='\S+' d3240d935c4
[...garbled output...]
Segmentation fault
Plain --color-words is not affected.
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: [PATCH 1/4] Add color_fwrite(), a function coloring each line individually
2009-01-11 19:59 ` [PATCH 1/4] Add color_fwrite(), a function coloring each line individually Johannes Schindelin
@ 2009-01-11 22:43 ` Junio C Hamano
2009-01-11 23:49 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2009-01-11 22:43 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Thomas Rast
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> +/*
> + * This function splits the buffer by newlines and colors the lines individually.
> + */
> +void color_fwrite(FILE *f, const char *color, size_t count, const char *buf)
Is it just me that this is grossly misnamed? It is not about fwrite of
count bytes starting at buf in the specified color. At least it should be
called color_fwrite_lines() or something like that.
> diff --git a/color.h b/color.h
> index 6cf5c88..9fb58f5 100644
> --- a/color.h
> +++ b/color.h
> @@ -19,5 +19,6 @@ int git_config_colorbool(const char *var, const char *value, int stdout_is_tty);
> void color_parse(const char *var, const char *value, char *dst);
> int color_fprintf(FILE *fp, const char *color, const char *fmt, ...);
> int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...);
> +void color_fwrite(FILE *f, const char *color, size_t count, const char *buf);
Also if other functions in the family all return int to indicate errors
and name the FILE * argument fp, I find it in very bad taste not to follow
their patterns without having a good reason (which I do not see).
* Re: [PATCH 0/4] refactor the --color-words to make it more hackable
2009-01-11 21:53 ` [PATCH 0/4] refactor the --color-words to make it more hackable Thomas Rast
@ 2009-01-11 23:02 ` Johannes Schindelin
2009-01-12 6:25 ` Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 23:02 UTC (permalink / raw)
To: Thomas Rast; +Cc: git
Hi,
On Sun, 11 Jan 2009, Thomas Rast wrote:
> Johannes Schindelin wrote:
> >
> > But at least _I_ think it is easy to follow, and it actually makes the code
> > more readable/hackable. Correct me if I'm wrong.
>
> It indeed seems a sane approach.
Thanks.
> However, the final result segfaults and/or prints garbage (on
> apparently every commit except very small changes) when using the regex
> '\S+', which IMHO should give exactly the same result as not using a
> regex at all.
No, it should not. The correct regex is '^\S+'.
As it happens, your regex matches _anything_ + non-whitespace.
Unfortunately, this includes a newline which utterly confuses the diff,
and therefore the code that tries to get the true offsets.
Consequently, it crashes.
> Plain --color-words is not affected.
Of course, I did not change anything outside the code path of
--color-words.
Ciao,
Dscho
-- snipsnap --
[PATCH] color-words: \n must not be a part of the word.
Allowing \n as part of a word is a pilot error, but that is not a
reason for the code to crash.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
diff.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/diff.c b/diff.c
index d6bba72..676eb79 100644
--- a/diff.c
+++ b/diff.c
@@ -381,8 +381,10 @@ static int find_word_boundary(mmfile_t *buffer, int i, regex_t *word_regex)
if (word_regex) {
regmatch_t match[1];
- if (!regexec(word_regex, buffer->ptr + i, 1, match, 0))
- i += match[0].rm_eo;
+ if (!regexec(word_regex, buffer->ptr + i, 1, match, 0)) {
+ char *p = memchr(buffer->ptr + i, '\n', match[0].rm_eo);
+ i = p ? p - buffer->ptr : match[0].rm_eo + i;
+ }
}
else
while (i < buffer->size && !isspace(buffer->ptr[i]))
* Re: [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries
2009-01-11 19:59 ` [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries Johannes Schindelin
@ 2009-01-11 23:08 ` Junio C Hamano
2009-01-11 23:38 ` Johannes Schindelin
2009-01-12 8:47 ` Thomas Rast
1 sibling, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2009-01-11 23:08 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Thomas Rast
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> This code was ugly, for a number of reasons:
> ...
> Fix all of these issues by processing the text such that
Looks much cleaner than the original. I didn't compare it with Thomas's,
but it seems he found some breakages, so I'd expect a second round
sometime in the future.
* Re: [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries
2009-01-11 23:08 ` Junio C Hamano
@ 2009-01-11 23:38 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 23:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Thomas Rast
Hi,
On Sun, 11 Jan 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > This code was ugly, for a number of reasons:
> > ...
> > Fix all of these issues by processing the text such that
>
> Looks much cleaner than the original. I didn't compare it with
> Thomas's, but it seems he found some breakages, so I'd expect a second
> round sometime in the future.
Certainly,
Dscho
* Re: [PATCH 1/4] Add color_fwrite(), a function coloring each line individually
2009-01-11 22:43 ` Junio C Hamano
@ 2009-01-11 23:49 ` Johannes Schindelin
2009-01-11 23:49 ` [PATCH v2 " Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 23:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Thomas Rast
Hi,
On Sun, 11 Jan 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > +/*
> > + * This function splits the buffer by newlines and colors the lines individually.
> > + */
> > +void color_fwrite(FILE *f, const char *color, size_t count, const char *buf)
>
> Is it just me that this is grossly misnamed? It is not about fwrite of
> count bytes starting at buf in the specified color. At least it should be
> called color_fwrite_lines() or something like that.
>
> > diff --git a/color.h b/color.h
> > index 6cf5c88..9fb58f5 100644
> > --- a/color.h
> > +++ b/color.h
> > @@ -19,5 +19,6 @@ int git_config_colorbool(const char *var, const char *value, int stdout_is_tty);
> > void color_parse(const char *var, const char *value, char *dst);
> > int color_fprintf(FILE *fp, const char *color, const char *fmt, ...);
> > int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...);
> > +void color_fwrite(FILE *f, const char *color, size_t count, const char *buf);
>
> Also if other functions in the family all return int to indicate errors
> and name the FILE * argument fp, I find it in very bad taste not to follow
> their patterns without having a good reason (which I do not see).
Valid points.
Sorry,
Dscho
* [PATCH v2 1/4] Add color_fwrite(), a function coloring each line individually
2009-01-11 23:49 ` Johannes Schindelin
@ 2009-01-11 23:49 ` Johannes Schindelin
2009-01-12 1:27 ` Jakub Narebski
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-11 23:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Thomas Rast
We have to set the color before every line and reset it before every
newline. Add a function color_fwrite() which does that for us.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
color.c | 28 ++++++++++++++++++++++++++++
color.h | 1 +
2 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/color.c b/color.c
index fc0b72a..b028880 100644
--- a/color.c
+++ b/color.c
@@ -191,3 +191,31 @@ int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...)
va_end(args);
return r;
}
+
+/*
+ * This function splits the buffer by newlines and colors the lines individually.
+ *
+ * Returns 0 on success.
+ */
+int color_fwrite_lines(FILE *fp, const char *color,
+ size_t count, const char *buf)
+{
+ if (!*color)
+ return fwrite(buf, count, 1, fp) != 1;
+ while (count) {
+ char *p = memchr(buf, '\n', count);
+ if (fputs(color, fp) < 0 ||
+ fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
+ fputs(COLOR_RESET, fp) < 0)
+ return -1;
+ if (!p)
+ return 0;
+ if (fputc('\n', fp) < 0)
+ return -1;
+ count -= p + 1 - buf;
+ buf = p + 1;
+ }
+ return 0;
+}
+
+
diff --git a/color.h b/color.h
index 6cf5c88..cd5c985 100644
--- a/color.h
+++ b/color.h
@@ -19,5 +19,6 @@ int git_config_colorbool(const char *var, const char *value, int stdout_is_tty);
void color_parse(const char *var, const char *value, char *dst);
int color_fprintf(FILE *fp, const char *color, const char *fmt, ...);
int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...);
+int color_fwrite_lines(FILE *fp, const char *color, size_t count, const char *buf);
#endif /* COLOR_H */
--
1.6.1.223.g50c8f
* Re: [PATCH v2 1/4] Add color_fwrite(), a function coloring each line individually
2009-01-11 23:49 ` [PATCH v2 " Johannes Schindelin
@ 2009-01-12 1:27 ` Jakub Narebski
0 siblings, 0 replies; 109+ messages in thread
From: Jakub Narebski @ 2009-01-12 1:27 UTC (permalink / raw)
To: git
Johannes Schindelin wrote:
> We have to set the color before every line and reset it before every
> newline. Add a function color_fwrite() which does that for us.
color_fwrite_lines(), but I guess Junio can correct this himself.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: [PATCH 0/4] refactor the --color-words to make it more hackable
2009-01-11 23:02 ` Johannes Schindelin
@ 2009-01-12 6:25 ` Thomas Rast
0 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-12 6:25 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Johannes Schindelin wrote:
> On Sun, 11 Jan 2009, Thomas Rast wrote:
> > However, the final result segfaults and/or prints garbage (on
> > apparently every commit except very small changes) when using the regex
> > '\S+', which IMHO should give exactly the same result as not using a
> > regex at all.
>
> No, it should not. The correct regex is '^\S+'.
>
> As it happens, your regex matches _anything_ + non-whitespace.
It definitely doesn't(*).
Given ' word rest', '^\S+' would not match at all, and '\S+' would
match 'word'. No space there. However, at a cursory glance your
patch seems to ignore the rm_so member of match[0], so it'll never
know the difference.
While it might arguably make sense to enforce that only isspace()
characters are whitespace and !isspace() is at least part of _some_
(possibly one-character) word, I do not think it is a good idea to
require the user to do the anchoring. If we need it, we must anchor the
match ourselves.
> Unfortunately, this includes a newline which utterly confuses the
> diff,
I do agree that matching a newline as part of a word is bad because we
need it for its diff separator semantics. Consider passing
REG_NEWLINE to regcomp() to reduce the risk of matching newlines via
things like [^\[:space:]].
(*) Well, modulo Junio's objection in the other thread that \S is
actually a PCRE extension. Substitute [^[:space:]] if your local
flavour doesn't understand it.
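To make the rm_so point concrete, here is a minimal sketch using the POSIX
regex API directly. It is not code from the patch: first_match() is a
hypothetical helper, and it passes REG_NEWLINE only because that flag is
suggested above. For the input ' word rest', the unanchored pattern reports a
match that starts at offset 1, while the anchored one does not match at all.

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper (not part of the patch): run one POSIX extended
 * regex against text and report the offsets of the first match.
 * Returns 0 on a match, 1 on no match, -1 if the pattern does not compile.
 */
int first_match(const char *pattern, const char *text, int *start, int *end)
{
	regex_t re;
	regmatch_t m[1];
	int ret;

	assert(pattern && text && start && end);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;
	ret = regexec(&re, text, 1, m, 0) ? 1 : 0;
	if (!ret) {
		/* rm_so may be > 0: the match need not begin at text[0] */
		*start = (int)m[0].rm_so;
		*end = (int)m[0].rm_eo;
	}
	regfree(&re);
	return ret;
}
```

A caller that only looks at rm_eo would treat the leading space as part of
the word; honouring rm_so is what distinguishes the two patterns.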
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries
2009-01-11 19:59 ` [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries Johannes Schindelin
2009-01-11 23:08 ` Junio C Hamano
@ 2009-01-12 8:47 ` Thomas Rast
2009-01-12 9:36 ` Junio C Hamano
1 sibling, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-12 8:47 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Junio C Hamano
As a side remark, this patch makes a good use-case for --patience, and
is not isomorphic to the other edit-and-move examples; rather it's a
delete-and-edit.
Johannes Schindelin wrote:
> Subject: [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries
I do not think the term "refactor" is accurate. Wikipedia roughly
defines it as a code change that preserves all external semantics by
some standard method, and lists methods such as variable renaming,
common code extraction, etc. You are actually completely replacing
the algorithm "under the hood" with a new one, so no such standard
method applies.
And there is also a tiny semantic change: compare
A: a b c
B: x y z
^^
The old version implicitly generated an empty line at the double
spaces (marked ^^), which subsequently became context and caused the
words to be printed as follows, where <..> is old and [..] is new:
<a b >[x y ] <c>[z]
Your patched version does not generate empty lines for any space
whatsoever, not even for newlines. Thus the result is
<a b c>[x y z]
I think this is actually a good change, since it results in longer
chunks for "entirely rewritten" parts of the diff. It also answers
Junio's question in the other thread:
Junio C Hamano wrote:
>>
>> What happens if the input "language" does not have any inter-word spacing
>> but its words can still be expressed by regexp patterns?
>>
>> ImagineALanguageThatAllowsYouToWriteSomethingLikeThis. Does the mechanism
>> help users who want to do word-diff files written in such a language by
>> outputting:
>>
>> ImagineALanguage<red>That</red><green>Which</green>AllowsYou...
>>
>> when '[A-Z][a-z]*' is given by the word pattern?
Your patch handles this as a side-effect *even if the lines are
indented*, since no sequence of spaces whatsoever is special. (Mine
would have given hard-to-predict results based on the number of
newlines between them, and xdiff's decision whether the newlines or
the words are more valuable as context.)
So I think this is actually an improvement, but the commit message
should point out the change in semantics.
> +static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
> {
> + if (line[0] != '@' || parse_hunk_header(line, len,
> + &minus_first, &minus_len, &plus_first, &plus_len))
It would be nice to have a comment here that points out that this
method crucially relies on having context length 0 (just as the old
one crucially relied on having the full text in a single hunk).
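To illustrate the dependency: with context length 0 a hunk contains only
changed words, so the header alone identifies the removed and added ranges.
The sketch below is a hypothetical stand-in, not git's parse_hunk_header();
it assumes the usual unified-diff header form "@@ -a[,b] +c[,d] @@", where an
omitted length means 1.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a hunk-header parser (not git's own).
 * Accepts "@@ -a[,b] +c[,d] @@"; an omitted length defaults to 1.
 * Returns 0 on success, -1 if the line is not a hunk header.
 */
int parse_header_sketch(const char *line,
			int *minus_first, int *minus_len,
			int *plus_first, int *plus_len)
{
	int n = 0;

	assert(line && minus_first && minus_len && plus_first && plus_len);
	*minus_len = *plus_len = 1;
	if (sscanf(line, "@@ -%d%n", minus_first, &n) != 1)
		return -1;
	line += n;
	if (sscanf(line, ",%d%n", minus_len, &n) == 1)
		line += n;	/* explicit length for the old side */
	if (sscanf(line, " +%d%n", plus_first, &n) != 1)
		return -1;
	line += n;
	if (sscanf(line, ",%d%n", plus_len, &n) == 1)
		line += n;	/* explicit length for the new side */
	return 0;
}
```

With any non-zero context, the ranges in the header would also cover
unchanged words, and the header alone would no longer be enough.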
> + for (i = 0; i < buffer->text.size; i++) {
> + if (isspace(buffer->text.ptr[i]))
> + continue;
I think it is this coupling of the loops to find a word, and to find a
word _beginning_, that comes back to haunt you in 4/4. If the outer
loop was strictly about the words, you could use the regex match info
to find the beginning in the regex case. This is probably cleaner
than attempting to force an anchored match, since at least the 'grep'
on my system takes '^^foo' to mean 'a "^foo" at the beginning of a
line', so you cannot just unconditionally insert a ^. (Conditionally
inserting one seems even harder.)
These remarks aside (and the last one is the only one of relevance to
the code), this patch would be a vast improvement of the code even if
we weren't discussing it in the context of the regex feature. So FWIW
Acked-by: Thomas Rast <trast@student.ethz.ch>
up to here. I hope we can agree on some sane regex semantics for
4/4...
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH 3/4] color-words: refactor to allow for 0-character word boundaries
2009-01-12 8:47 ` Thomas Rast
@ 2009-01-12 9:36 ` Junio C Hamano
0 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2009-01-12 9:36 UTC (permalink / raw)
To: Thomas Rast; +Cc: Johannes Schindelin, git
Thomas Rast <trast@student.ethz.ch> writes:
> These remarks aside (and the last one is the only one of relevance to
> the code), this patch would be a vast improvement of the code even if
> we weren't discussing it in the context of the regex feature. So FWIW
>
> Acked-by: Thomas Rast <trast@student.ethz.ch>
>
> up to here. I hope we can agree on some sane regex semantics for
> 4/4...
Ok, although I've already queued your series to 'pu' for the night, I'll
drop and replace it with the one from Dscho. After a few more iterations
hopefully we can get it into a reasonable shape.
Thanks, both.
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH 0/4] refactor the --color-words to make it more hackable
2009-01-11 19:58 [PATCH 0/4] refactor the --color-words to make it more hackable Johannes Schindelin
` (4 preceding siblings ...)
2009-01-11 21:53 ` [PATCH 0/4] refactor the --color-words to make it more hackable Thomas Rast
@ 2009-01-14 13:00 ` Santi Béjar
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
5 siblings, 1 reply; 109+ messages in thread
From: Santi Béjar @ 2009-01-14 13:00 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Thomas Rast
2009/1/11 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
>
[...]
> The basic idea is to decouple the original text from the text that is
> passed to libxdiff to find the word differences.
>
> To that end, the words of the pre and post texts are put into two lists that
> are fed to libxdiff. While the words are extracted, an array is created which
> contains pointers back to the word boundaries in the original text.
>
Thanks. With this I will no longer need to add some spurious spaces in
my latex files :-)
I've tested and it seems to work, but there are some corner cases that
it does not handle well. If you have these two files:
---8<--- pre
h(4)
a = b + c
---8<--- post
h(4),hh[44]
a = b + c
aa = a
aeff = aeff * ( aaa )
---8<---
The "git diff" is okay, but "git diff --color-words" is not: the
addition of "aeff = ..." is not shown.
Additionally, with "git diff --no-index --color-words='^[A-Za-z0-9]*'",
the ']' character is not shown as an addition, and instead of the
"aeff" line you get a ")" in green, as:
h(4),{GREEN}hh[44{ENDGREEN}]
a = b + c
{GREEN}aa = a
){ENDGREEN}
Also, if the lost text is at the end, the next "diff --git" line is
printed in red:
---8<---
#!/bin/bash
git init
cat > file <<EOF
a
aa
EOF
cat > gfile <<EOF
a
EOF
git add .
git commit -m "Initial import"
git rm file
cat > gfile <<EOF
b
EOF
git add gfile
git commit -m "changes"
git show --color-words
---8<---
Thanks,
Santi
P.S.: I've tested the version that is in 'pu'; it does not have the
patch to fix the segfault, but I've also tested with that patch applied.
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH take 3 0/4] color-words improvements
2009-01-14 13:00 ` Santi Béjar
@ 2009-01-14 17:49 ` Johannes Schindelin
2009-01-14 17:50 ` [PATCH 1/4] Add color_fwrite_lines(), a function coloring each line individually Johannes Schindelin
` (6 more replies)
0 siblings, 7 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 17:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, git, Santi Béjar
This series is getting bigger and bigger, unfortunately, just what I tried
to avoid.
But at least I am pretty comfortable with the readability of the result,
and it adds tests -- finally.
Changes relative to the last round: color_fwrite_lines() had problems with
empty lines, and find_word_boundary() was replaced by find_word_boundaries(),
which finds not only the end of the next word, but the start, too.
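As a sketch of the empty-line problem (my reading of it, stripped of the
color handling): fwrite() returns 0 when asked to write a zero-sized item, so
a check of the form "fwrite(...) != 1" reports an error for an empty line
unless that case is skipped. write_lines_sketch() and its guard_empty
parameter are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the per-line write step only (no colors); not the patch code.
 * Returns 0 on success, -1 on error. With guard_empty == 0 a zero-length
 * line is handed to fwrite(), whose return value of 0 for a zero size is
 * then mistaken for a write error.
 */
int write_lines_sketch(FILE *fp, const char *buf, size_t count,
		       int guard_empty)
{
	assert(fp && buf);
	while (count) {
		const char *p = memchr(buf, '\n', count);
		size_t len = p ? (size_t)(p - buf) : count;

		if ((!guard_empty || len) && fwrite(buf, len, 1, fp) != 1)
			return -1;
		if (!p)
			return 0;
		if (fputc('\n', fp) < 0)
			return -1;
		count -= len + 1;
		buf = p + 1;
	}
	return 0;
}
```

For the input "a\n\nb\n" the unguarded variant fails at the second line,
while the guarded one writes all three lines.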
The only "funny" thing I realized is that the lines which are output
by emit_line() add a RESET at the end of the line, and I do not do that
in color_fwrite_lines().
Can anybody think of undesired behavior as a consequence?
Johannes Schindelin (4):
Add color_fwrite_lines(), a function coloring each line individually
color-words: refactor word splitting and use ALLOC_GROW()
color-words: change algorithm to allow for 0-character word
boundaries
color-words: take an optional regular expression describing words
Documentation/diff-options.txt | 6 +-
color.c | 28 ++++++
color.h | 1 +
diff.c | 203 ++++++++++++++++++++++++++--------------
diff.h | 1 +
t/t4034-diff-words.sh | 86 +++++++++++++++++
6 files changed, 253 insertions(+), 72 deletions(-)
create mode 100755 t/t4034-diff-words.sh
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH 1/4] Add color_fwrite_lines(), a function coloring each line individually
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
@ 2009-01-14 17:50 ` Johannes Schindelin
2009-01-14 17:50 ` [PATCH 2/4] color-words: refactor word splitting and use ALLOC_GROW() Johannes Schindelin
` (5 subsequent siblings)
6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 17:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, git, Santi Béjar
We have to set the color before every line and reset it before every
newline. Add a function color_fwrite_lines() which does that for us.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
color.c | 28 ++++++++++++++++++++++++++++
color.h | 1 +
2 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/color.c b/color.c
index fc0b72a..d4ae83f 100644
--- a/color.c
+++ b/color.c
@@ -191,3 +191,31 @@ int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...)
va_end(args);
return r;
}
+
+/*
+ * This function splits the buffer by newlines and colors the lines individually.
+ *
+ * Returns 0 on success.
+ */
+int color_fwrite_lines(FILE *fp, const char *color,
+ size_t count, const char *buf)
+{
+ if (!*color)
+ return fwrite(buf, count, 1, fp) != 1;
+ while (count) {
+ char *p = memchr(buf, '\n', count);
+ if (p != buf && (fputs(color, fp) < 0 ||
+ fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
+ fputs(COLOR_RESET, fp) < 0))
+ return -1;
+ if (!p)
+ return 0;
+ if (fputc('\n', fp) < 0)
+ return -1;
+ count -= p + 1 - buf;
+ buf = p + 1;
+ }
+ return 0;
+}
+
+
diff --git a/color.h b/color.h
index 6cf5c88..cd5c985 100644
--- a/color.h
+++ b/color.h
@@ -19,5 +19,6 @@ int git_config_colorbool(const char *var, const char *value, int stdout_is_tty);
void color_parse(const char *var, const char *value, char *dst);
int color_fprintf(FILE *fp, const char *color, const char *fmt, ...);
int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...);
+int color_fwrite_lines(FILE *fp, const char *color, size_t count, const char *buf);
#endif /* COLOR_H */
--
1.6.1.243.g4c9c5a
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH 2/4] color-words: refactor word splitting and use ALLOC_GROW()
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
2009-01-14 17:50 ` [PATCH 1/4] Add color_fwrite_lines(), a function coloring each line individually Johannes Schindelin
@ 2009-01-14 17:50 ` Johannes Schindelin
2009-01-14 17:51 ` [PATCH 3/4] color-words: change algorithm to allow for 0-character word boundaries Johannes Schindelin
` (4 subsequent siblings)
6 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 17:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, git, Santi Béjar
Word splitting is now performed by the function diff_words_fill(),
avoiding having the same code twice.
In the same spirit, avoid duplicating the code of ALLOC_GROW().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
This has not changed, actually. Just for your convenience.
diff.c | 40 +++++++++++++++++++---------------------
1 files changed, 19 insertions(+), 21 deletions(-)
diff --git a/diff.c b/diff.c
index f67e0b2..6d87ea5 100644
--- a/diff.c
+++ b/diff.c
@@ -326,10 +326,7 @@ struct diff_words_buffer {
static void diff_words_append(char *line, unsigned long len,
struct diff_words_buffer *buffer)
{
- if (buffer->text.size + len > buffer->alloc) {
- buffer->alloc = (buffer->text.size + len) * 3 / 2;
- buffer->text.ptr = xrealloc(buffer->text.ptr, buffer->alloc);
- }
+ ALLOC_GROW(buffer->text.ptr, buffer->text.size + len, buffer->alloc);
line++;
len--;
memcpy(buffer->text.ptr + buffer->text.size, line, len);
@@ -398,6 +395,22 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+/*
+ * This function splits the words in buffer->text, and stores the list with
+ * newline separator into out.
+ */
+static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
+{
+ int i;
+ out->size = buffer->text.size;
+ out->ptr = xmalloc(out->size);
+ memcpy(out->ptr, buffer->text.ptr, out->size);
+ for (i = 0; i < out->size; i++)
+ if (isspace(out->ptr[i]))
+ out->ptr[i] = '\n';
+ buffer->current = 0;
+}
+
/* this executes the word diff on the accumulated buffers */
static void diff_words_show(struct diff_words_data *diff_words)
{
@@ -405,26 +418,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitconf_t xecfg;
xdemitcb_t ecb;
mmfile_t minus, plus;
- int i;
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
- minus.size = diff_words->minus.text.size;
- minus.ptr = xmalloc(minus.size);
- memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
- for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
- minus.ptr[i] = '\n';
- diff_words->minus.current = 0;
-
- plus.size = diff_words->plus.text.size;
- plus.ptr = xmalloc(plus.size);
- memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
- for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
- plus.ptr[i] = '\n';
- diff_words->plus.current = 0;
-
+ diff_words_fill(&diff_words->minus, &minus);
+ diff_words_fill(&diff_words->plus, &plus);
xpp.flags = XDF_NEED_MINIMAL;
xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
--
1.6.1.243.g4c9c5a
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH 3/4] color-words: change algorithm to allow for 0-character word boundaries
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
2009-01-14 17:50 ` [PATCH 1/4] Add color_fwrite_lines(), a function coloring each line individually Johannes Schindelin
2009-01-14 17:50 ` [PATCH 2/4] color-words: refactor word splitting and use ALLOC_GROW() Johannes Schindelin
@ 2009-01-14 17:51 ` Johannes Schindelin
2009-01-14 18:08 ` Johannes Schindelin
2009-01-14 17:51 ` [PATCH 4/4] color-words: take an optional regular expression describing words Johannes Schindelin
` (3 subsequent siblings)
6 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 17:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, git, Santi Béjar
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, and then identified the text to
output by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the wordlists, we parse the hunk headers, and
find the corresponding offsets, and then
- we print the removed/added parts in one go.
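
The first two steps can be sketched as follows. This is a simplified,
standalone illustration, not the code in the patch: struct word_span and
split_words_sketch() are hypothetical names, offsets are stored instead of
pointers, and the list is 0-based (the patch adds an empty "0th" word so that
the 1-based line numbers from the hunk headers index the array directly).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct word_span {
	size_t begin, end;	/* offsets into the original text */
};

/*
 * Sketch: split text[0..size) at whitespace. Allocates *out with the
 * words, each terminated by '\n', and *spans with the original offsets
 * of every word. Returns the number of words (0 on allocation failure,
 * with both pointers set to NULL). The caller frees *out and *spans.
 */
size_t split_words_sketch(const char *text, size_t size,
			  char **out, size_t *out_size,
			  struct word_span **spans)
{
	size_t i = 0, nr = 0;

	assert(text && out && out_size && spans);
	*out_size = 0;
	*out = malloc(size + 1);	/* worst case: "a b c" -> "a\nb\nc\n" */
	*spans = malloc((size / 2 + 1) * sizeof(**spans));
	if (!*out || !*spans) {
		free(*out);
		free(*spans);
		*out = NULL;
		*spans = NULL;
		return 0;
	}

	while (i < size) {
		size_t j;

		if (isspace((unsigned char)text[i])) {
			i++;
			continue;
		}
		for (j = i + 1; j < size && !isspace((unsigned char)text[j]); j++)
			; /* find the end of the word */

		/* remember where the word sits in the original text */
		(*spans)[nr].begin = i;
		(*spans)[nr].end = j;
		nr++;

		/* append the word plus a newline to the list for the diff */
		memcpy(*out + *out_size, text + i, j - i);
		(*out)[*out_size + j - i] = '\n';
		*out_size += j - i + 1;
		i = j;
	}
	return nr;
}
```

Because runs of whitespace never reach the word list, they cannot show up as
empty lines in the word diff; the spans are what map a changed line number
back to the bytes to print.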
The pre and post samples in the test were provided by Santi Béjar.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
diff.c | 153 +++++++++++++++++++++++++++----------------------
t/t4034-diff-words.sh | 62 ++++++++++++++++++++
2 files changed, 147 insertions(+), 68 deletions(-)
create mode 100755 t/t4034-diff-words.sh
diff --git a/diff.c b/diff.c
index 6d87ea5..fe8b1f0 100644
--- a/diff.c
+++ b/diff.c
@@ -319,8 +319,10 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
struct diff_words_buffer {
mmfile_t text;
long alloc;
- long current; /* output pointer */
- int suppressed_newline;
+ struct diff_words_orig {
+ const char *begin, *end;
+ } *orig;
+ int orig_nr, orig_alloc;
};
static void diff_words_append(char *line, unsigned long len,
@@ -335,80 +337,81 @@ static void diff_words_append(char *line, unsigned long len,
struct diff_words_data {
struct diff_words_buffer minus, plus;
+ const char *current_plus;
FILE *file;
};
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
- int suppress_newline)
-{
- const char *ptr;
- int eol = 0;
-
- if (len == 0)
- return;
-
- ptr = buffer->text.ptr + buffer->current;
- buffer->current += len;
-
- if (ptr[len - 1] == '\n') {
- eol = 1;
- len--;
- }
-
- fputs(diff_get_color(1, color), file);
- fwrite(ptr, len, 1, file);
- fputs(diff_get_color(1, DIFF_RESET), file);
-
- if (eol) {
- if (suppress_newline)
- buffer->suppressed_newline = 1;
- else
- putc('\n', file);
- }
-}
-
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words = priv;
+ int minus_first, minus_len, plus_first, plus_len;
+ const char *minus_begin, *minus_end, *plus_begin, *plus_end;
- if (diff_words->minus.suppressed_newline) {
- if (line[0] != '+')
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
+ if (line[0] != '@' || parse_hunk_header(line, len,
+ &minus_first, &minus_len, &plus_first, &plus_len))
+ return;
- len--;
- switch (line[0]) {
- case '-':
- print_word(diff_words->file,
- &diff_words->minus, len, DIFF_FILE_OLD, 1);
- break;
- case '+':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_FILE_NEW, 0);
- break;
- case ' ':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_PLAIN, 0);
- diff_words->minus.current += len;
- break;
- }
+ minus_begin = diff_words->minus.orig[minus_first].begin;
+ minus_end = minus_len == 0 ? minus_begin :
+ diff_words->minus.orig[minus_first + minus_len - 1].end;
+ plus_begin = diff_words->plus.orig[plus_first].begin;
+ plus_end = plus_len == 0 ? plus_begin :
+ diff_words->plus.orig[plus_first + plus_len - 1].end;
+
+ if (diff_words->current_plus != plus_begin)
+ fwrite(diff_words->current_plus,
+ plus_begin - diff_words->current_plus, 1,
+ diff_words->file);
+ if (minus_begin != minus_end)
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_OLD),
+ minus_end - minus_begin, minus_begin);
+ if (plus_begin != plus_end)
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_NEW),
+ plus_end - plus_begin, plus_begin);
+
+ diff_words->current_plus = plus_end;
}
/*
- * This function splits the words in buffer->text, and stores the list with
- * newline separator into out.
+ * This function splits the words in buffer->text, stores the list with
+ * newline separator into out, and saves the offsets of the original words
+ * in buffer->orig.
*/
static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
{
- int i;
- out->size = buffer->text.size;
- out->ptr = xmalloc(out->size);
- memcpy(out->ptr, buffer->text.ptr, out->size);
- for (i = 0; i < out->size; i++)
- if (isspace(out->ptr[i]))
- out->ptr[i] = '\n';
- buffer->current = 0;
+ int i, j;
+
+ out->size = 0;
+ out->ptr = xmalloc(buffer->text.size);
+
+ /* fake an empty "0th" word */
+ ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
+ buffer->orig[0].begin = buffer->orig[0].end = buffer->text.ptr;
+ buffer->orig_nr = 1;
+
+ for (i = 0; i < buffer->text.size; i++) {
+ if (isspace(buffer->text.ptr[i]))
+ continue;
+ for (j = i + 1; j < buffer->text.size &&
+ !isspace(buffer->text.ptr[j]); j++)
+ ; /* find the end of the word */
+
+ /* store original boundaries */
+ ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
+ buffer->orig_alloc);
+ buffer->orig[buffer->orig_nr].begin = buffer->text.ptr + i;
+ buffer->orig[buffer->orig_nr].end = buffer->text.ptr + j;
+ buffer->orig_nr++;
+
+ /* store one word */
+ memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
+ out->ptr[out->size + j - i] = '\n';
+ out->size += j - i + 1;
+
+ i = j - 1;
+ }
}
/* this executes the word diff on the accumulated buffers */
@@ -419,22 +422,34 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitcb_t ecb;
mmfile_t minus, plus;
+ /* special case: only removal */
+ if (!diff_words->plus.text.size) {
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_OLD),
+ diff_words->minus.text.size, diff_words->minus.text.ptr);
+ diff_words->minus.text.size = 0;
+ return;
+ }
+
+ diff_words->current_plus = diff_words->plus.text.ptr;
+
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
diff_words_fill(&diff_words->minus, &minus);
diff_words_fill(&diff_words->plus, &plus);
xpp.flags = XDF_NEED_MINIMAL;
- xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
+ xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
&xpp, &xecfg, &ecb);
free(minus.ptr);
free(plus.ptr);
+ if (diff_words->current_plus != diff_words->plus.text.ptr +
+ diff_words->plus.text.size)
+ fwrite(diff_words->current_plus,
+ diff_words->plus.text.ptr + diff_words->plus.text.size
+ - diff_words->current_plus, 1,
+ diff_words->file);
diff_words->minus.text.size = diff_words->plus.text.size = 0;
-
- if (diff_words->minus.suppressed_newline) {
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
}
typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
@@ -458,7 +473,9 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
diff_words_show(ecbdata->diff_words);
free (ecbdata->diff_words->minus.text.ptr);
+ free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
+ free (ecbdata->diff_words->plus.orig);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
new file mode 100755
index 0000000..b032bd3
--- /dev/null
+++ b/t/t4034-diff-words.sh
@@ -0,0 +1,62 @@
+#!/bin/sh
+
+test_description='word diff colors'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+
+ git config diff.color.old red
+ git config diff.color.new green
+
+'
+
+decrypt_color () {
+ sed \
+ -e 's/.\[1m/<WHITE>/g' \
+ -e 's/.\[31m/<RED>/g' \
+ -e 's/.\[32m/<GREEN>/g' \
+ -e 's/.\[36m/<BROWN>/g' \
+ -e 's/.\[m/<RESET>/g'
+}
+
+cat > pre <<\EOF
+h(4)
+
+a = b + c
+EOF
+
+cat > post <<\EOF
+h(4),hh[44]
+
+a = b + c
+
+aa = a
+
+aeff = aeff * ( aaa )
+EOF
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+test_expect_success 'word diff with runs of whitespace' '
+
+ test_must_fail git diff --no-index --color-words pre post > output &&
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect output.decrypted
+
+'
+
+test_done
--
1.6.1.243.g4c9c5a
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH 4/4] color-words: take an optional regular expression describing words
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
` (2 preceding siblings ...)
2009-01-14 17:51 ` [PATCH 3/4] color-words: change algorithm to allow for 0-character word boundaries Johannes Schindelin
@ 2009-01-14 17:51 ` Johannes Schindelin
2009-01-14 19:55 ` Thomas Rast
2009-01-14 18:54 ` [PATCH take 3 0/4] color-words improvements Teemu Likonen
` (2 subsequent siblings)
6 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 17:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, git, Santi Béjar
In some applications, words are not delimited by white space. To
allow for that, you can specify a regular expression describing
what makes a word with
git diff --color-words='[A-Za-z0-9]+'
Note that words cannot contain newline characters.
As suggested by Thomas Rast, the words are the exact matches of the
regular expression.
Note that a regular expression beginning with a '^' will match only
a word at the beginning of the hunk, not a word at the beginning of
a line, and is probably not what you want.
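
A rough standalone sketch of this matching scheme, under the assumption that
each search simply restarts right after the previous word (the names below
are hypothetical and this is not the code in the patch): every match of the
regular expression becomes one word, and a pattern starting with '^' can
only ever match at the very start of the text.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Sketch: count the words that pattern finds in text by matching
 * repeatedly, each time restarting right after the previous match.
 * Returns the number of words, or -1 if the pattern does not compile.
 */
int count_regex_words(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m[1];
	size_t pos = 0, len;
	int nr = 0;

	assert(pattern && text);
	len = strlen(text);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (pos < len && !regexec(&re, text + pos, 1, m, 0)) {
		if (m[0].rm_eo == m[0].rm_so)
			break;	/* empty match: stop rather than loop forever */
		nr++;
		pos += (size_t)m[0].rm_eo;
	}
	regfree(&re);
	return nr;
}
```

With '[A-Z][a-z]*', the text "ImagineALanguage" splits into three words
without any whitespace being involved.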
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Documentation/diff-options.txt | 6 +++-
diff.c | 64 ++++++++++++++++++++++++++++++++++-----
diff.h | 1 +
t/t4034-diff-words.sh | 24 +++++++++++++++
4 files changed, 85 insertions(+), 10 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 1f8ce97..e546bfa 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -94,8 +94,12 @@ endif::git-format-patch[]
Turn off colored diff, even when the configuration file
gives the default to color output.
---color-words::
+--color-words[=regex]::
Show colored word diff, i.e. color words which have changed.
++
+Optionally, you can pass a regular expression that tells Git what the
+words are that you are looking for; the default is to interpret any
+stretch of non-whitespace as a word.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index fe8b1f0..d5d7171 100644
--- a/diff.c
+++ b/diff.c
@@ -333,12 +333,14 @@ static void diff_words_append(char *line, unsigned long len,
len--;
memcpy(buffer->text.ptr + buffer->text.size, line, len);
buffer->text.size += len;
+ buffer->text.ptr[buffer->text.size] = '\0';
}
struct diff_words_data {
struct diff_words_buffer minus, plus;
const char *current_plus;
FILE *file;
+ regex_t *word_regex;
};
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
@@ -374,17 +376,49 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
diff_words->current_plus = plus_end;
}
+/* This function starts looking at *begin, and returns 0 iff a word was found. */
+static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
+ int *begin, int *end)
+{
+ if (word_regex && *begin < buffer->size) {
+ regmatch_t match[1];
+ if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+ char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
+ '\n', match[0].rm_eo - match[0].rm_so);
+ *end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
+ *begin += match[0].rm_so;
+ return *begin >= *end;
+ }
+ return -1;
+ }
+
+ /* find the next word */
+ while (*begin < buffer->size && isspace(buffer->ptr[*begin]))
+ (*begin)++;
+ if (*begin >= buffer->size)
+ return -1;
+
+ /* find the end of the word */
+ *end = *begin + 1;
+ while (*end < buffer->size && !isspace(buffer->ptr[*end]))
+ (*end)++;
+
+ return 0;
+}
+
/*
* This function splits the words in buffer->text, stores the list with
* newline separator into out, and saves the offsets of the original words
* in buffer->orig.
*/
-static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
+static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
+ regex_t *word_regex)
{
int i, j;
+ long alloc = 0;
out->size = 0;
- out->ptr = xmalloc(buffer->text.size);
+ out->ptr = NULL;
/* fake an empty "0th" word */
ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
@@ -392,11 +426,8 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr = 1;
for (i = 0; i < buffer->text.size; i++) {
- if (isspace(buffer->text.ptr[i]))
- continue;
- for (j = i + 1; j < buffer->text.size &&
- !isspace(buffer->text.ptr[j]); j++)
- ; /* find the end of the word */
+ if (find_word_boundaries(&buffer->text, word_regex, &i, &j))
+ return;
/* store original boundaries */
ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
@@ -406,6 +437,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr++;
/* store one word */
+ ALLOC_GROW(out->ptr, out->size + j - i + 1, alloc);
memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
out->ptr[out->size + j - i] = '\n';
out->size += j - i + 1;
@@ -435,9 +467,10 @@ static void diff_words_show(struct diff_words_data *diff_words)
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
- diff_words_fill(&diff_words->minus, &minus);
- diff_words_fill(&diff_words->plus, &plus);
+ diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
+ diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
xpp.flags = XDF_NEED_MINIMAL;
+ /* as only the hunk header will be parsed, we need a 0-context */
xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
&xpp, &xecfg, &ecb);
@@ -476,6 +509,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
free (ecbdata->diff_words->plus.orig);
+ free(ecbdata->diff_words->word_regex);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
@@ -1498,6 +1532,14 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
ecbdata.diff_words->file = o->file;
+ if (o->word_regex) {
+ ecbdata.diff_words->word_regex = (regex_t *)
+ xmalloc(sizeof(regex_t));
+ if (regcomp(ecbdata.diff_words->word_regex,
+ o->word_regex, REG_EXTENDED))
+ die ("Invalid regular expression: %s",
+ o->word_regex);
+ }
}
xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
&xpp, &xecfg, &ecb);
@@ -2513,6 +2555,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
DIFF_OPT_CLR(options, COLOR_DIFF);
else if (!strcmp(arg, "--color-words"))
options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ else if (!prefixcmp(arg, "--color-words=")) {
+ options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ options->word_regex = arg + 14;
+ }
else if (!strcmp(arg, "--exit-code"))
DIFF_OPT_SET(options, EXIT_WITH_STATUS);
else if (!strcmp(arg, "--quiet"))
diff --git a/diff.h b/diff.h
index 4d5a327..23cd90c 100644
--- a/diff.h
+++ b/diff.h
@@ -98,6 +98,7 @@ struct diff_options {
int stat_width;
int stat_name_width;
+ const char *word_regex;
/* this is set by diffcore for DIFF_FORMAT_PATCH */
int found_changes;
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index b032bd3..0ed7e53 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -59,4 +59,28 @@ test_expect_success 'word diff with runs of whitespace' '
'
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+h(4),<GREEN>hh<RESET>[44]
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+test_expect_success 'word diff with a regular expression' '
+
+ test_must_fail git diff --no-index --color-words="[a-z]+" \
+ pre post > output &&
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect output.decrypted
+
+'
+
test_done
--
1.6.1.243.g4c9c5a
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH 3/4] color-words: change algorithm to allow for 0-character word boundaries
2009-01-14 17:51 ` [PATCH 3/4] color-words: change algorithm to allow for 0-character word boundaries Johannes Schindelin
@ 2009-01-14 18:08 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 18:08 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, git, Santi Béjar
Hi,
On Wed, 14 Jan 2009, Johannes Schindelin wrote:
> +test_expect_success setup '
> +
> + git config diff.color.old red
> + git config diff.color.new green
> +
> +'
Oops. This should probably go...
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
` (3 preceding siblings ...)
2009-01-14 17:51 ` [PATCH 4/4] color-words: take an optional regular expression describing words Johannes Schindelin
@ 2009-01-14 18:54 ` Teemu Likonen
2009-01-14 18:57 ` Teemu Likonen
2009-01-14 19:58 ` [PATCH take 3 0/4] color-words improvements Thomas Rast
2009-01-14 19:46 ` [PATCH] color-words: make regex configurable via attributes Thomas Rast
2009-01-14 20:04 ` [PATCH take 3 0/4] color-words improvements Thomas Rast
6 siblings, 2 replies; 109+ messages in thread
From: Teemu Likonen @ 2009-01-14 18:54 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Thomas Rast, git, Santi Béjar
Johannes Schindelin (2009-01-14 18:49 +0100) wrote:
> Can anybody think of undesired behavior as a consequence?
>
> Johannes Schindelin (4):
> Add color_fwrite_lines(), a function coloring each line individually
> color-words: refactor word splitting and use ALLOC_GROW()
> color-words: change algorithm to allow for 0-character word
> boundaries
> color-words: take an optional regular expression describing words
There is something I don't understand. Maybe it's a bug or maybe it's my
limitation. I'd appreciate it if you could explain the reason for the
following output. Suppose we have two files and the line diff looks like
this:
--- 1/a
+++ 2/b
@@ -1 +1 @@
-aaa (aaa)
+aaa (aaa) aaa
With --color-diff=a+ it looks like
aaa (aaa)aaa) aaa
^^^^~~~~ ~~~
^ = red, ~ = green
Why show changes in the "aaa)" part when it didn't actually change?
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 18:54 ` [PATCH take 3 0/4] color-words improvements Teemu Likonen
@ 2009-01-14 18:57 ` Teemu Likonen
2009-01-14 19:28 ` Johannes Schindelin
2009-01-14 19:58 ` [PATCH take 3 0/4] color-words improvements Thomas Rast
1 sibling, 1 reply; 109+ messages in thread
From: Teemu Likonen @ 2009-01-14 18:57 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
Teemu Likonen (2009-01-14 20:54 +0200) wrote:
> With --color-diff=a+ it looks like
Obviously I meant --color-words=a+
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 18:57 ` Teemu Likonen
@ 2009-01-14 19:28 ` Johannes Schindelin
2009-01-14 19:32 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 19:28 UTC (permalink / raw)
To: Teemu Likonen; +Cc: git
Hi,
On Wed, 14 Jan 2009, Teemu Likonen wrote:
> Teemu Likonen (2009-01-14 20:54 +0200) wrote:
>
> > With --color-diff=a+ it looks like
>
> Obviously I meant --color-words=a+
Heh, I missed that, even... Thanks for the report!
-- snipsnap --
[WILL BE SQUASHED INTO 4/4] Fix find_word_boundaries()
Since newlines cannot be part of words, we have to stop at newlines even
if the regular expression's match contains one.
Of course, I fscked up the range where to look for the newline when I
changed the function from find_word_boundary().
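
To illustrate the off-by-range mistake described above: memchr() takes a
byte count, not an end offset, so the search for a newline has to be limited
to the length of the match. The helper below is only a sketch; the name
clip_match_at_newline() and its absolute-offset interface are assumptions
made for this example and do not appear in diff.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: given a match that spans
 * the byte offsets [so, eo) of text, return the offset at which the
 * word ends, cutting the match short at the first newline inside it.
 */
size_t clip_match_at_newline(const char *text, size_t so, size_t eo)
{
	const char *p;

	assert(so <= eo);
	/*
	 * The third argument of memchr() is a length. Passing eo here
	 * instead of eo - so would scan past the end of the match and
	 * could pick up a newline that belongs to later text.
	 */
	p = memchr(text + so, '\n', eo - so);
	return p ? (size_t)(p - text) : eo;
}
```

With the text "aaa (aaa)\n" and the second match of a+ at offsets 5 to 8, a
search that runs too far finds the newline at offset 9 and stretches the word
to "aaa)", which seems to be what Teemu observed.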
---
diff.c | 2 +-
t/t4034-diff-words.sh | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 1 deletions(-)
diff --git a/diff.c b/diff.c
index d5d7171..1408717 100644
--- a/diff.c
+++ b/diff.c
@@ -384,7 +384,7 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
regmatch_t match[1];
if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
- '\n', match[0].rm_eo);
+ '\n', match[0].rm_eo - match[0].rm_so);
*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
*begin += match[0].rm_so;
return *begin >= *end;
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 0ed7e53..1137131 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -83,4 +83,24 @@ test_expect_success 'word diff with a regular expression' '
'
+echo 'aaa (aaa)' > pre
+echo 'aaa (aaa) aaa' > post
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index c29453b..be22f37 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1 +1 @@<RESET>
+aaa (aaa)<GREEN> aaa<RESET>
+EOF
+
+test_expect_success "Teemu's example" '
+
+ test_must_fail git diff --no-index --color-words="a+" pre post > output &&
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect output.decrypted
+
+'
+
test_done
--
1.6.1.295.gb16478
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 19:28 ` Johannes Schindelin
@ 2009-01-14 19:32 ` Johannes Schindelin
2009-01-14 20:44 ` [PATCH replacement for take 3 3/4] color-words: change algorithm to allow for 0-character word boundaries Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 19:32 UTC (permalink / raw)
To: Teemu Likonen; +Cc: git
Hi,
On Wed, 14 Jan 2009, Johannes Schindelin wrote:
> +aaa (aaa)<GREEN> aaa<RESET>
Of course, the space must be on the other side of the <GREEN>... All this
will be fixed, and more.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH] color-words: make regex configurable via attributes
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
` (4 preceding siblings ...)
2009-01-14 18:54 ` [PATCH take 3 0/4] color-words improvements Teemu Likonen
@ 2009-01-14 19:46 ` Thomas Rast
2009-01-14 20:12 ` Johannes Schindelin
2009-01-14 20:04 ` [PATCH take 3 0/4] color-words improvements Thomas Rast
6 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 19:46 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Santi Béjar, Junio C Hamano
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute. The user can then set the
driver on a file in .gitattributes. If a regex is given on the
command line, it overrides the driver's setting.
We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/C++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
This is the old 3/4 combined with a test similar to the one it had in
the old 4/4, built on top of Dscho's take 3. I researched the
operators for each language, but the identifier and number formats may
be off in some cases.
Documentation/diff-options.txt | 3 +
Documentation/gitattributes.txt | 21 ++++++++++
diff.c | 10 +++++
t/t4034-diff-words.sh | 40 ++++++++++++++++++++
userdiff.c | 78 +++++++++++++++++++++++++++++++-------
userdiff.h | 1 +
6 files changed, 138 insertions(+), 15 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 2c1fa4b..ef0e2f5 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -97,6 +97,9 @@ endif::git-format-patch[]
Optionally, you can pass a regular expression that tells Git what the
words are that you are looking for; The default is to interpret any
stretch of non-whitespace as a word.
+The regex can also be set via a diff driver, see
+linkgit:gitattributes[1]; giving it explicitly overrides any diff
+driver setting.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8af22ec..17707ba 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -317,6 +317,8 @@ patterns are available:
- `bibtex` suitable for files with BibTeX coded references.
+- `cpp` suitable for source code in the C and C++ languages.
+
- `html` suitable for HTML/XHTML documents.
- `java` suitable for source code in the Java language.
@@ -334,6 +336,25 @@ patterns are available:
- `tex` suitable for source code for LaTeX documents.
+Customizing word diff
+^^^^^^^^^^^^^^^^^^^^^
+
+You can customize the rules that `git diff --color-words` uses to
+split words in a line, by specifying an appropriate regular expression
+in the "diff.*.wordregex" configuration variable. For example, in TeX
+a backslash followed by a sequence of letters forms a command, but
+several such commands can be run together without intervening
+whitespace. To separate them, use a regular expression such as
+
+------------------------
+[diff "tex"]
+ wordregex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
+------------------------
+
+A built-in pattern is provided for all languages listed in the last
+section.
+
+
Performing text diffs of binary files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/diff.c b/diff.c
index eb67431..08bdc86 100644
--- a/diff.c
+++ b/diff.c
@@ -1372,6 +1372,12 @@ int diff_filespec_is_binary(struct diff_filespec *one)
return one->driver->funcname.pattern ? &one->driver->funcname : NULL;
}
+static const char *userdiff_word_regex(struct diff_filespec *one)
+{
+ diff_filespec_load_driver(one);
+ return one->driver->word_regex;
+}
+
void diff_set_mnemonic_prefix(struct diff_options *options, const char *a, const char *b)
{
if (!options->a_prefix)
@@ -1532,6 +1538,10 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
ecbdata.diff_words->file = o->file;
+ if (!o->word_regex)
+ o->word_regex = userdiff_word_regex(one);
+ if (!o->word_regex)
+ o->word_regex = userdiff_word_regex(two);
if (o->word_regex) {
ecbdata.diff_words->word_regex = (regex_t *)
xmalloc(sizeof(regex_t));
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 0ed7e53..d6731d1 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -83,4 +83,44 @@ test_expect_success 'word diff with a regular expression' '
'
+cat > expect-by-chars <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+test_expect_success 'set a diff driver' '
+ git config diff.testdriver.wordregex "[^[:space:]]" &&
+ cat <<EOF > .gitattributes
+test_* diff=testdriver
+EOF
+'
+
+test_expect_success 'use default supplied by driver' '
+
+ test_must_fail git diff --no-index --color-words \
+ pre post > output &&
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect-by-chars output.decrypted
+
+'
+
+test_expect_success 'option overrides default' '
+
+ test_must_fail git diff --no-index --color-words="[a-z]+" \
+ pre post > output &&
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect output.decrypted
+
+'
+
test_done
diff --git a/userdiff.c b/userdiff.c
index 3681062..79f9cb9 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -6,14 +6,20 @@
static int ndrivers;
static int drivers_alloc;
-#define FUNCNAME(name, pattern) \
- { name, NULL, -1, { pattern, REG_EXTENDED } }
+#define PATTERNS(name, pattern, wordregex) \
+ { name, NULL, -1, { pattern, REG_EXTENDED }, NULL, wordregex }
static struct userdiff_driver builtin_drivers[] = {
-FUNCNAME("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$"),
-FUNCNAME("java",
+PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
+ "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("java",
"!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
- "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$"),
-FUNCNAME("objc",
+ "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$",
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]="
+ "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("objc",
/* Negate C statements that can look like functions */
"!^[ \t]*(do|for|if|else|return|switch|while)\n"
/* Objective-C methods */
@@ -21,20 +27,60 @@
/* C functions */
"^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$\n"
/* Objective-C class/protocol definitions */
- "^(@(implementation|interface|protocol)[ \t].*)$"),
-FUNCNAME("pascal",
+ "^(@(implementation|interface|protocol)[ \t].*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("pascal",
"^((procedure|function|constructor|destructor|interface|"
"implementation|initialization|finalization)[ \t]*.*)$"
"\n"
- "^(.*=[ \t]*(class|record).*)$"),
-FUNCNAME("php", "^[\t ]*((function|class).*)"),
-FUNCNAME("python", "^[ \t]*((class|def)[ \t].*)$"),
-FUNCNAME("ruby", "^[ \t]*((class|module|def)[ \t].*)$"),
-FUNCNAME("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$"),
-FUNCNAME("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"),
+ "^(.*=[ \t]*(class|record).*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
+ "|<>|<=|>=|:=|\\.\\."
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("php", "^[\t ]*((function|class).*)",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
+ "|[-+*/<>%&^|=!.]=|--|\\+\\+|<<=?|>>=?|===|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"
+ "|[^[:space:]]|[\x80-\xff]+"),
+ /* -- */
+PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
+ /* -- */
+ "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
+ "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
+ "[={}\"]|[^={}\" \t]+"),
+PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
+ "\\\\[a-zA-Z@]+|[{}]|\\\\.|[^\\{} \t]+"),
+PATTERNS("cpp",
+ /* Jump targets or access declarations */
+ "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
+ /* C functions at top level */
+ "^([A-Za-z_][A-Za-z_0-9]*([ \t]+[A-Za-z_][A-Za-z_0-9]*){1,}[ \t]*\\([^;]*)$\n"
+ /* compound type at top level */
+ "^((struct|class|enum)[^;]*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
{ "default", NULL, -1, { NULL, 0 } },
};
-#undef FUNCNAME
+#undef PATTERNS
static struct userdiff_driver driver_true = {
"diff=true",
@@ -134,6 +180,8 @@ int userdiff_config(const char *k, const char *v)
return parse_string(&drv->external, k, v);
if ((drv = parse_driver(k, v, "textconv")))
return parse_string(&drv->textconv, k, v);
+ if ((drv = parse_driver(k, v, "wordregex")))
+ return parse_string(&drv->word_regex, k, v);
return 0;
}
diff --git a/userdiff.h b/userdiff.h
index ba29457..2aab13e 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -12,6 +12,7 @@ struct userdiff_driver {
int binary;
struct userdiff_funcname funcname;
const char *textconv;
+ const char *word_regex;
};
int userdiff_config(const char *k, const char *v);
--
1.6.1.140.ge720e.dirty
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH 4/4] color-words: take an optional regular expression describing words
2009-01-14 17:51 ` [PATCH 4/4] color-words: take an optional regular expression describing words Johannes Schindelin
@ 2009-01-14 19:55 ` Thomas Rast
0 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 19:55 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git, Santi Béjar
Johannes Schindelin wrote:
> ---color-words::
> +--color-words[=regex]::
> Show colored word diff, i.e. color words which have changed.
> ++
> +Optionally, you can pass a regular expression that tells Git what the
> +words are that you are looking for; The default is to interpret any
> +stretch of non-whitespace as a word.
Perhaps you could resurrect the documentation from my series, adjusted
for the different newline rule:
--color-words[=<regex>]::
Show colored word diff, i.e., color words which have changed.
By default, a new word only starts at whitespace, so that a
'word' is defined as a maximal sequence of non-whitespace
characters. The optional argument <regex> can be used to
configure this. It can also be set via a diff driver, see
linkgit:gitattributes[1]; if a <regex> is given explicitly, it
overrides any diff driver setting.
+
The <regex> must be an (extended) regular expression. When set, every
non-overlapping match of the <regex> is considered a word. Anything
between these matches is considered whitespace and ignored for the
purposes of finding differences. You may want to append
`|[^[:space:]]` to your regular expression to make sure that it
matches all non-whitespace characters. A match that contains a
newline is silently truncated at the newline.
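
The semantics described above can be sketched in a few lines of C using
POSIX regcomp()/regexec(). This is only an illustration under assumptions:
split_words() and struct word_span are made-up names for this example, and
the real code in diff.c is organized differently.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

struct word_span {
	size_t begin, end;	/* byte offsets into the text, end exclusive */
};

/*
 * Hypothetical helper: record up to max non-overlapping matches of
 * pattern in text. Whatever lies between two matches is treated as
 * whitespace. Returns the number of words, or -1 if the pattern does
 * not compile.
 */
int split_words(const char *text, const char *pattern,
		struct word_span *out, int max)
{
	regex_t re;
	regmatch_t m[1];
	size_t pos = 0, len = strlen(text);
	int nr = 0;

	assert(max >= 0);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	/* REG_NOTBOL: a restarted search is not at the beginning of a line */
	while (pos < len && nr < max &&
	       !regexec(&re, text + pos, 1, m, pos ? REG_NOTBOL : 0)) {
		size_t so = pos + (size_t)m[0].rm_so;
		size_t eo = pos + (size_t)m[0].rm_eo;
		const char *nl = memchr(text + so, '\n', eo - so);

		if (nl)
			eo = (size_t)(nl - text);	/* truncate at the newline */
		if (eo == so) {
			pos = so + 1;	/* empty word: step past it and retry */
			continue;
		}
		out[nr].begin = so;
		out[nr].end = eo;
		nr++;
		pos = eo;
	}
	regfree(&re);
	return nr;
}
```

For the text "aaa (aaa) aaa", the pattern a+ yields three words and leaves
the parentheses in the ignored gaps, whereas a+|[^[:space:]] also turns each
parenthesis into a word of its own.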
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 18:54 ` [PATCH take 3 0/4] color-words improvements Teemu Likonen
2009-01-14 18:57 ` Teemu Likonen
@ 2009-01-14 19:58 ` Thomas Rast
2009-01-14 22:06 ` Johannes Schindelin
1 sibling, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 19:58 UTC (permalink / raw)
To: Teemu Likonen; +Cc: Johannes Schindelin, Junio C Hamano, git, Santi Béjar
Teemu Likonen wrote:
> -aaa (aaa)
> +aaa (aaa) aaa
Bug aside, examples like this one make me wonder if we should force a
"last resort" match for `[^[:space:]]`. For example,
-aaa [aaa]
+aaa (aaa) aaa
would still give you
aaa (aaa)<GREEN> aaa<RESET>
which may be unexpected.
Of course, when diffing a language where something other than the
"usual" whitespace should be ignored, this behaviour would be useful.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 17:49 ` [PATCH take 3 0/4] color-words improvements Johannes Schindelin
` (5 preceding siblings ...)
2009-01-14 19:46 ` [PATCH] color-words: make regex configurable via attributes Thomas Rast
@ 2009-01-14 20:04 ` Thomas Rast
2009-01-14 21:07 ` Johannes Schindelin
6 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 20:04 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git, Santi Béjar
Johannes Schindelin wrote:
>
> The only "funny" thing I realized is that the lines which are output
> by emit_line() add a RESET at the end of the line, and I do not do that
> in color_fwrite_lines().
Umm.... but you seem to do?
Ack on the new regex semantics, though I'd have implemented it via
dying on '\n' instead of silently splitting there (and restarting a
new match!). [I actually _have_ implemented it, but your patch beat
me to it. :-)]
Thus, Ack on 4/4 once the boundary bug is fixed. Thanks for your
work!
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] color-words: make regex configurable via attributes
2009-01-14 19:46 ` [PATCH] color-words: make regex configurable via attributes Thomas Rast
@ 2009-01-14 20:12 ` Johannes Schindelin
2009-01-14 20:17 ` Thomas Rast
` (4 more replies)
0 siblings, 5 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 20:12 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano
Hi,
On Wed, 14 Jan 2009, Thomas Rast wrote:
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 2c1fa4b..ef0e2f5 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -97,6 +97,9 @@ endif::git-format-patch[]
> Optionally, you can pass a regular expression that tells Git what the
> words are that you are looking for; The default is to interpret any
> stretch of non-whitespace as a word.
> +The regex can also be set via a diff driver, see
> +linkgit:gitattributes[1]; giving it explicitly overrides any diff
> +driver setting.
How about making this an extra paragraph?
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 8af22ec..17707ba 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -317,6 +317,8 @@ patterns are available:
>
> - `bibtex` suitable for files with BibTeX coded references.
>
> +- `cpp` suitable for source code in the C and C++ languages.
> +
How about "written in C or C++"?
> +A built-in pattern is provided for all languages listed in the last
> +section.
Wow. But how about "previous section"?
> diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
> index 0ed7e53..d6731d1 100755
> --- a/t/t4034-diff-words.sh
> +++ b/t/t4034-diff-words.sh
That was fast!
> +test_expect_success 'use default supplied by driver' '
> +
> + test_must_fail git diff --no-index --color-words \
> + pre post > output &&
> + decrypt_color < output > output.decrypted &&
> + test_cmp expect-by-chars output.decrypted
> +
> +'
I am actually just about to post new revisions of the last two patches
where this would read
test_expect_success 'use default supplied by driver' '
word_diff --color-words
'
instead...
I don't want to get bitten by stupid mistakes again, though, so I let it
run with valgrind while glancing over the code. Stay tuned.
> +#define PATTERNS(name, pattern, wordregex) \
> + { name, NULL, -1, { pattern, REG_EXTENDED }, NULL, wordregex }
You could get rid of that NULL if...
> diff --git a/userdiff.h b/userdiff.h
> index ba29457..2aab13e 100644
> --- a/userdiff.h
> +++ b/userdiff.h
> @@ -12,6 +12,7 @@ struct userdiff_driver {
> int binary;
> struct userdiff_funcname funcname;
> const char *textconv;
> + const char *word_regex;
> };
... you inserted word_regex before textconv. In a way, I find this more
logical, since both funcname and word_regex have sensible defaults
(provided by you), whereas textconv is strictly a user's option.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] color-words: make regex configurable via attributes
2009-01-14 20:12 ` Johannes Schindelin
@ 2009-01-14 20:17 ` Thomas Rast
2009-01-14 22:26 ` [PATCH 1/4] color-words: fix quoting in t4034 Thomas Rast
` (3 subsequent siblings)
4 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 20:17 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Santi Béjar, Junio C Hamano
Johannes Schindelin wrote:
> How about making this an extra paragraph?
Sure, why not. Though I'm still in favour of taking some longer
version, possibly from my old series.
> On Wed, 14 Jan 2009, Thomas Rast wrote:
> > +- `cpp` suitable for source code in the C and C++ languages.
> > +
>
> How about "written in C or C++"?
I was just trying to be consistent with all other items; all
programming languages are listed as "Foo language".
> > +A built-in pattern is provided for all languages listed in the last
> > +section.
>
> Wow. But how about "previous section"?
Indeed, thanks.
> > +#define PATTERNS(name, pattern, wordregex) \
> > + { name, NULL, -1, { pattern, REG_EXTENDED }, NULL, wordregex }
>
> You could get rid of that NULL if...
[...]
> ... you inserted word_regex before textconv. In a way, I find this more
> logical, since both funcname and word_regex have sensible defaults
> (provided by you), whereas textconv is strictly a user's option.
Ok, I'll do that.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH replacement for take 3 3/4] color-words: change algorithm to allow for 0-character word boundaries
2009-01-14 19:32 ` Johannes Schindelin
@ 2009-01-14 20:44 ` Johannes Schindelin
2009-01-14 20:46 ` [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 20:44 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, and then identified the text to
output by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the wordlists, we parse the hunk headers, and
find the corresponding offsets, and then
- we print the removed/added parts in one go.
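
As a rough, stand-alone sketch of the first two steps listed above (building
a newline-separated word list while remembering where each word came from),
assuming whitespace-delimited words as in this patch. The names struct
word_list and word_list_fill() are made up for the example; the patch itself
does this in diff_words_fill() with ALLOC_GROW() and xmalloc().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct word_orig {
	const char *begin, *end;	/* boundaries in the original text */
};

struct word_list {
	char *text;		/* the words, each followed by '\n' */
	size_t size;
	struct word_orig *orig;	/* orig[0] is an empty "0th" word */
	size_t nr;
};

/* Hypothetical stand-alone variant; returns 0 on success, -1 on failure. */
int word_list_fill(struct word_list *wl, const char *src, size_t len)
{
	size_t i, j;

	assert(wl && src);
	/* every word costs its length plus one '\n': at most len + 1 bytes */
	wl->text = malloc(len + 1);
	/* at most (len + 1) / 2 words, plus the faked 0th entry */
	wl->orig = malloc((len / 2 + 2) * sizeof(*wl->orig));
	if (!wl->text || !wl->orig) {
		free(wl->text);
		free(wl->orig);
		return -1;
	}
	wl->size = 0;

	/* fake an empty "0th" word, as the patch does */
	wl->orig[0].begin = wl->orig[0].end = src;
	wl->nr = 1;

	for (i = 0; i < len; i = j) {
		if (isspace((unsigned char)src[i])) {
			j = i + 1;
			continue;
		}
		for (j = i + 1; j < len && !isspace((unsigned char)src[j]); j++)
			; /* find the end of the word */

		/* remember where the word sits in the original text */
		wl->orig[wl->nr].begin = src + i;
		wl->orig[wl->nr].end = src + j;
		wl->nr++;

		/* store the word itself, terminated by a newline */
		memcpy(wl->text + wl->size, src + i, j - i);
		wl->text[wl->size + j - i] = '\n';
		wl->size += j - i + 1;
	}
	return 0;
}
```

Since line N of the word list corresponds to orig[N], the numbers parsed from
a hunk header can presumably be used as array indices directly, which would
explain the faked 0th entry. The caller is expected to free() both text and
orig.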
The pre and post samples in the test were provided by Santi Béjar.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
I changed the test script to avoid repeating the same three-command
mantra all the time.
diff.c | 153 +++++++++++++++++++++++++++----------------------
t/t4034-diff-words.sh | 66 +++++++++++++++++++++
2 files changed, 151 insertions(+), 68 deletions(-)
create mode 100755 t/t4034-diff-words.sh
diff --git a/diff.c b/diff.c
index 6d87ea5..fe8b1f0 100644
--- a/diff.c
+++ b/diff.c
@@ -319,8 +319,10 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
struct diff_words_buffer {
mmfile_t text;
long alloc;
- long current; /* output pointer */
- int suppressed_newline;
+ struct diff_words_orig {
+ const char *begin, *end;
+ } *orig;
+ int orig_nr, orig_alloc;
};
static void diff_words_append(char *line, unsigned long len,
@@ -335,80 +337,81 @@ static void diff_words_append(char *line, unsigned long len,
struct diff_words_data {
struct diff_words_buffer minus, plus;
+ const char *current_plus;
FILE *file;
};
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
- int suppress_newline)
-{
- const char *ptr;
- int eol = 0;
-
- if (len == 0)
- return;
-
- ptr = buffer->text.ptr + buffer->current;
- buffer->current += len;
-
- if (ptr[len - 1] == '\n') {
- eol = 1;
- len--;
- }
-
- fputs(diff_get_color(1, color), file);
- fwrite(ptr, len, 1, file);
- fputs(diff_get_color(1, DIFF_RESET), file);
-
- if (eol) {
- if (suppress_newline)
- buffer->suppressed_newline = 1;
- else
- putc('\n', file);
- }
-}
-
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words = priv;
+ int minus_first, minus_len, plus_first, plus_len;
+ const char *minus_begin, *minus_end, *plus_begin, *plus_end;
- if (diff_words->minus.suppressed_newline) {
- if (line[0] != '+')
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
+ if (line[0] != '@' || parse_hunk_header(line, len,
+ &minus_first, &minus_len, &plus_first, &plus_len))
+ return;
- len--;
- switch (line[0]) {
- case '-':
- print_word(diff_words->file,
- &diff_words->minus, len, DIFF_FILE_OLD, 1);
- break;
- case '+':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_FILE_NEW, 0);
- break;
- case ' ':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_PLAIN, 0);
- diff_words->minus.current += len;
- break;
- }
+ minus_begin = diff_words->minus.orig[minus_first].begin;
+ minus_end = minus_len == 0 ? minus_begin :
+ diff_words->minus.orig[minus_first + minus_len - 1].end;
+ plus_begin = diff_words->plus.orig[plus_first].begin;
+ plus_end = plus_len == 0 ? plus_begin :
+ diff_words->plus.orig[plus_first + plus_len - 1].end;
+
+ if (diff_words->current_plus != plus_begin)
+ fwrite(diff_words->current_plus,
+ plus_begin - diff_words->current_plus, 1,
+ diff_words->file);
+ if (minus_begin != minus_end)
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_OLD),
+ minus_end - minus_begin, minus_begin);
+ if (plus_begin != plus_end)
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_NEW),
+ plus_end - plus_begin, plus_begin);
+
+ diff_words->current_plus = plus_end;
}
/*
- * This function splits the words in buffer->text, and stores the list with
- * newline separator into out.
+ * This function splits the words in buffer->text, stores the list with
+ * newline separator into out, and saves the offsets of the original words
+ * in buffer->orig.
*/
static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
{
- int i;
- out->size = buffer->text.size;
- out->ptr = xmalloc(out->size);
- memcpy(out->ptr, buffer->text.ptr, out->size);
- for (i = 0; i < out->size; i++)
- if (isspace(out->ptr[i]))
- out->ptr[i] = '\n';
- buffer->current = 0;
+ int i, j;
+
+ out->size = 0;
+ out->ptr = xmalloc(buffer->text.size);
+
+ /* fake an empty "0th" word */
+ ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
+ buffer->orig[0].begin = buffer->orig[0].end = buffer->text.ptr;
+ buffer->orig_nr = 1;
+
+ for (i = 0; i < buffer->text.size; i++) {
+ if (isspace(buffer->text.ptr[i]))
+ continue;
+ for (j = i + 1; j < buffer->text.size &&
+ !isspace(buffer->text.ptr[j]); j++)
+ ; /* find the end of the word */
+
+ /* store original boundaries */
+ ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
+ buffer->orig_alloc);
+ buffer->orig[buffer->orig_nr].begin = buffer->text.ptr + i;
+ buffer->orig[buffer->orig_nr].end = buffer->text.ptr + j;
+ buffer->orig_nr++;
+
+ /* store one word */
+ memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
+ out->ptr[out->size + j - i] = '\n';
+ out->size += j - i + 1;
+
+ i = j - 1;
+ }
}
/* this executes the word diff on the accumulated buffers */
@@ -419,22 +422,34 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitcb_t ecb;
mmfile_t minus, plus;
+ /* special case: only removal */
+ if (!diff_words->plus.text.size) {
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_OLD),
+ diff_words->minus.text.size, diff_words->minus.text.ptr);
+ diff_words->minus.text.size = 0;
+ return;
+ }
+
+ diff_words->current_plus = diff_words->plus.text.ptr;
+
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
diff_words_fill(&diff_words->minus, &minus);
diff_words_fill(&diff_words->plus, &plus);
xpp.flags = XDF_NEED_MINIMAL;
- xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
+ xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
&xpp, &xecfg, &ecb);
free(minus.ptr);
free(plus.ptr);
+ if (diff_words->current_plus != diff_words->plus.text.ptr +
+ diff_words->plus.text.size)
+ fwrite(diff_words->current_plus,
+ diff_words->plus.text.ptr + diff_words->plus.text.size
+ - diff_words->current_plus, 1,
+ diff_words->file);
diff_words->minus.text.size = diff_words->plus.text.size = 0;
-
- if (diff_words->minus.suppressed_newline) {
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
}
typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
@@ -458,7 +473,9 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
diff_words_show(ecbdata->diff_words);
free (ecbdata->diff_words->minus.text.ptr);
+ free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
+ free (ecbdata->diff_words->plus.orig);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
new file mode 100755
index 0000000..b22195f
--- /dev/null
+++ b/t/t4034-diff-words.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='word diff colors'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+
+ git config diff.color.old red
+ git config diff.color.new green
+
+'
+
+decrypt_color () {
+ sed \
+ -e 's/.\[1m/<WHITE>/g' \
+ -e 's/.\[31m/<RED>/g' \
+ -e 's/.\[32m/<GREEN>/g' \
+ -e 's/.\[36m/<BROWN>/g' \
+ -e 's/.\[m/<RESET>/g'
+}
+
+word_diff () {
+ test_must_fail git diff --no-index "$@" pre post > output &&
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect output.decrypted
+}
+
+cat > pre <<\EOF
+h(4)
+
+a = b + c
+EOF
+
+cat > post <<\EOF
+h(4),hh[44]
+
+a = b + c
+
+aa = a
+
+aeff = aeff * ( aaa )
+EOF
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+test_expect_success 'word diff with runs of whitespace' '
+
+ word_diff --color-words
+
+'
+
+test_done
--
1.6.1.295.g5d331
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words
2009-01-14 20:44 ` [PATCH replacement for take 3 3/4] color-words: change algorithm to allow for 0-character word boundaries Johannes Schindelin
@ 2009-01-14 20:46 ` Johannes Schindelin
2009-01-15 0:32 ` Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 20:46 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
In some applications, words are not delimited by white space. To
allow for that, you can specify a regular expression describing
what makes a word with
git diff --color-words='[A-Za-z0-9]+'
Note that words cannot contain newline characters.
As suggested by Thomas Rast, the words are the exact matches of the
regular expression.
Note that a regular expression beginning with a '^' will match only
a word at the beginning of the hunk, not a word at the beginning of
a line, and is probably not what you want.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
This basically contains the fix I sent earlier.
As for the documentation, I would not have any issue with your
patch replacing my documentation in favor of yours.
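
For anyone who wants to play with the matching semantics outside of diff.c, here is a minimal standalone sketch. It is not the code from this patch: split_words() and struct word_span are names made up for illustration, the input is assumed to be NUL-terminated, and an empty match is simply stepped over instead of ending the scan. It records the begin/end offsets of every non-overlapping match and cuts a match off at the first newline it contains.

```c
#include <regex.h>
#include <string.h>

/* hypothetical helper type: byte offsets into the original text, end exclusive */
struct word_span {
	size_t begin, end;
};

/*
 * Collect up to max non-overlapping matches of pattern in the
 * NUL-terminated text.  A match that contains '\n' is truncated at
 * the newline.  Returns the number of words stored in out, or -1 if
 * the pattern does not compile.
 */
static int split_words(const char *text, const char *pattern,
		       struct word_span *out, int max)
{
	regex_t re;
	regmatch_t m;
	size_t pos = 0, len = strlen(text);
	int n = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (pos < len && n < max && !regexec(&re, text + pos, 1, &m, 0)) {
		size_t begin = pos + (size_t)m.rm_so;
		size_t end = pos + (size_t)m.rm_eo;
		const char *nl = memchr(text + begin, '\n', end - begin);

		if (nl)
			end = (size_t)(nl - text);	/* truncate at the newline */
		if (end <= begin) {
			pos = begin + 1;	/* empty word: step over one byte */
			continue;
		}
		out[n].begin = begin;
		out[n].end = end;
		n++;
		pos = end;	/* the next match starts after this word */
	}
	regfree(&re);
	return n;
}
```

Called on "h(4),hh[44]" with "[a-z]+", this should report the two words "h" and "hh"; everything between them is what the word diff would treat as ignorable.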
Documentation/diff-options.txt | 6 +++-
diff.c | 64 ++++++++++++++++++++++++++++++++++-----
diff.h | 1 +
t/t4034-diff-words.sh | 39 ++++++++++++++++++++++++
4 files changed, 100 insertions(+), 10 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 1f8ce97..e546bfa 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -94,8 +94,12 @@ endif::git-format-patch[]
Turn off colored diff, even when the configuration file
gives the default to color output.
---color-words::
+--color-words[=regex]::
Show colored word diff, i.e. color words which have changed.
++
+Optionally, you can pass a regular expression that tells Git what the
+words are that you are looking for; The default is to interpret any
+stretch of non-whitespace as a word.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index fe8b1f0..1408717 100644
--- a/diff.c
+++ b/diff.c
@@ -333,12 +333,14 @@ static void diff_words_append(char *line, unsigned long len,
len--;
memcpy(buffer->text.ptr + buffer->text.size, line, len);
buffer->text.size += len;
+ buffer->text.ptr[buffer->text.size] = '\0';
}
struct diff_words_data {
struct diff_words_buffer minus, plus;
const char *current_plus;
FILE *file;
+ regex_t *word_regex;
};
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
@@ -374,17 +376,49 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
diff_words->current_plus = plus_end;
}
+/* This function starts looking at *begin, and returns 0 iff a word was found. */
+static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
+ int *begin, int *end)
+{
+ if (word_regex && *begin < buffer->size) {
+ regmatch_t match[1];
+ if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+ char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
+ '\n', match[0].rm_eo - match[0].rm_so);
+ *end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
+ *begin += match[0].rm_so;
+ return *begin >= *end;
+ }
+ return -1;
+ }
+
+ /* find the next word */
+ while (*begin < buffer->size && isspace(buffer->ptr[*begin]))
+ (*begin)++;
+ if (*begin >= buffer->size)
+ return -1;
+
+ /* find the end of the word */
+ *end = *begin + 1;
+ while (*end < buffer->size && !isspace(buffer->ptr[*end]))
+ (*end)++;
+
+ return 0;
+}
+
/*
* This function splits the words in buffer->text, stores the list with
* newline separator into out, and saves the offsets of the original words
* in buffer->orig.
*/
-static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
+static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
+ regex_t *word_regex)
{
int i, j;
+ long alloc = 0;
out->size = 0;
- out->ptr = xmalloc(buffer->text.size);
+ out->ptr = NULL;
/* fake an empty "0th" word */
ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
@@ -392,11 +426,8 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr = 1;
for (i = 0; i < buffer->text.size; i++) {
- if (isspace(buffer->text.ptr[i]))
- continue;
- for (j = i + 1; j < buffer->text.size &&
- !isspace(buffer->text.ptr[j]); j++)
- ; /* find the end of the word */
+ if (find_word_boundaries(&buffer->text, word_regex, &i, &j))
+ return;
/* store original boundaries */
ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
@@ -406,6 +437,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr++;
/* store one word */
+ ALLOC_GROW(out->ptr, out->size + j - i + 1, alloc);
memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
out->ptr[out->size + j - i] = '\n';
out->size += j - i + 1;
@@ -435,9 +467,10 @@ static void diff_words_show(struct diff_words_data *diff_words)
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
- diff_words_fill(&diff_words->minus, &minus);
- diff_words_fill(&diff_words->plus, &plus);
+ diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
+ diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
xpp.flags = XDF_NEED_MINIMAL;
+ /* as only the hunk header will be parsed, we need a 0-context */
xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
&xpp, &xecfg, &ecb);
@@ -476,6 +509,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
free (ecbdata->diff_words->plus.orig);
+ free(ecbdata->diff_words->word_regex);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
@@ -1498,6 +1532,14 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
ecbdata.diff_words->file = o->file;
+ if (o->word_regex) {
+ ecbdata.diff_words->word_regex = (regex_t *)
+ xmalloc(sizeof(regex_t));
+ if (regcomp(ecbdata.diff_words->word_regex,
+ o->word_regex, REG_EXTENDED))
+ die ("Invalid regular expression: %s",
+ o->word_regex);
+ }
}
xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
&xpp, &xecfg, &ecb);
@@ -2513,6 +2555,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
DIFF_OPT_CLR(options, COLOR_DIFF);
else if (!strcmp(arg, "--color-words"))
options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ else if (!prefixcmp(arg, "--color-words=")) {
+ options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ options->word_regex = arg + 14;
+ }
else if (!strcmp(arg, "--exit-code"))
DIFF_OPT_SET(options, EXIT_WITH_STATUS);
else if (!strcmp(arg, "--quiet"))
diff --git a/diff.h b/diff.h
index 4d5a327..23cd90c 100644
--- a/diff.h
+++ b/diff.h
@@ -98,6 +98,7 @@ struct diff_options {
int stat_width;
int stat_name_width;
+ const char *word_regex;
/* this is set by diffcore for DIFF_FORMAT_PATCH */
int found_changes;
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index b22195f..f4810e9 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -63,4 +63,43 @@ test_expect_success 'word diff with runs of whitespace' '
'
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+h(4),<GREEN>hh<RESET>[44]
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa<RESET> )
+EOF
+
+test_expect_success 'word diff with a regular expression' '
+
+ word_diff --color-words='[a-z]+'
+
+'
+
+echo 'aaa (aaa)' > pre
+echo 'aaa (aaa) aaa' > post
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index c29453b..be22f37 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1 +1 @@<RESET>
+aaa (aaa) <GREEN>aaa<RESET>
+EOF
+
+test_expect_success "test parsing words for newline" '
+
+ word_diff --color-words='a+'
+
+'
+
test_done
--
1.6.1.295.g5d331
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 20:04 ` [PATCH take 3 0/4] color-words improvements Thomas Rast
@ 2009-01-14 21:07 ` Johannes Schindelin
2009-01-14 22:37 ` Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 21:07 UTC (permalink / raw)
To: Thomas Rast; +Cc: Junio C Hamano, git, Santi Béjar
Hi,
On Wed, 14 Jan 2009, Thomas Rast wrote:
> Johannes Schindelin wrote:
> >
> > The only "funny" thing I realized is that the lines which are output
> > by emit_line() add a RESET at the end of the line, and I do not do that
> > in color_fwrite_lines().
>
> Umm.... but you seem to do?
Oh, right! I think the culprit is in fn_out_diff_words_aux(), which calls
fwrite() directly for the common words.
> Ack on the new regex semantics, though I'd have implemented it via dying
> on '\n' instead of silently splitting there (and restarting a new
> match!).
Hmm. I'd rather not die() in the middle of it.
Maybe we can even handle newlines correctly by replacing them with NULs
which libxdiff handles just fine?
> Thus, Ack on 4/4 once the boundary bug is fixed. Thanks for your work!
Phew. I was almost convinced you would hate me for my criticism.
Thanks,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 19:58 ` [PATCH take 3 0/4] color-words improvements Thomas Rast
@ 2009-01-14 22:06 ` Johannes Schindelin
2009-01-14 22:11 ` Thomas Rast
` (2 more replies)
0 siblings, 3 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 22:06 UTC (permalink / raw)
To: Thomas Rast; +Cc: Teemu Likonen, Junio C Hamano, git, Santi Béjar
Hi,
On Wed, 14 Jan 2009, Thomas Rast wrote:
> Teemu Likonen wrote:
> > -aaa (aaa)
> > +aaa (aaa) aaa
>
> Bug aside, examples like this one make me wonder if we should force a
> "last resort" match for `[^[:space:]]`. For example,
>
> -aaa [aaa]
> +aaa (aaa) aaa
>
> would still give you
>
> aaa (aaa)<GREEN> aaa<RESET>
>
> which may be unexpected.
But why should it be unexpected? If people say that every length of "a"
makes a word, and consequently everything else is clutter, then that's
that, no?
So people might be surprised, but then they should have said something
like
[-.+#@"'$%^&*([{<>~|]*[A-Za-z][A-Za-z0-9]*[-.+#@"'$%&*)\]}>|]*
instead.
Although I have to say that for some applications, it is a pity that
even POSIX extended regular expressions know neither lookahead nor
lookbehind.
Which reminds me... should we activate REG_EXTENDED by default?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 22:06 ` Johannes Schindelin
@ 2009-01-14 22:11 ` Thomas Rast
2009-01-14 22:24 ` Boyd Stephen Smith Jr.
2009-01-15 4:56 ` Teemu Likonen
2 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 22:11 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Teemu Likonen, Junio C Hamano, git, Santi Béjar
Johannes Schindelin wrote:
> Which reminds me... should we activate REG_EXTENDED by default?
We (you :-) do, and I think so. Consider that funcname is not even
documented any more, in favour of xfuncname.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 22:06 ` Johannes Schindelin
2009-01-14 22:11 ` Thomas Rast
@ 2009-01-14 22:24 ` Boyd Stephen Smith Jr.
2009-01-15 4:56 ` Teemu Likonen
2 siblings, 0 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-14 22:24 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Thomas Rast, Teemu Likonen, Junio C Hamano, git, Santi Béjar
On Wednesday 2009 January 14 16:06:48 Johannes Schindelin wrote:
>On Wed, 14 Jan 2009, Thomas Rast wrote:
>> Bug aside, examples like this one make me wonder if we should force a
>> "last resort" match for `[^[:space:]]`. For example,
>>
>> -aaa [aaa]
>> +aaa (aaa) aaa
>>
>> would still give you
>>
>> aaa (aaa)<GREEN> aaa<RESET>
>>
>> which may be unexpected.
>
>But why should it be unexpected? If people say that every length of "a"
>makes a word, and consequently everything else is clutter, then that's
>that, no?
I think some people are going to have problems with the strict dichotomy
between "part of a word" and "ignorable whitespace" that is being set up.
It makes sense technically, but it could confuse.
Imagine with --color-words=[A-Z][A-Za-z]* and the following change:
-To be Or Not To be.
+To ignore Or Not To treat whitespace differently.
I think there is value in being able to ignore anything that's not a word,
so the documentation that mentions adding '|[^[:space:]]' to your regex
seems sufficient to me.
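
To put a number on how much text such a regex leaves unmatched, here is a small standalone sketch (count_ignored() is a made-up helper, not part of any patch in this thread). It counts the non-whitespace bytes that no match covers, which under the proposed semantics are the bytes the word diff would silently ignore.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/*
 * Count the non-whitespace bytes of the NUL-terminated text that are
 * not covered by any match of pattern.  Returns -1 if the pattern does
 * not compile.
 */
static int count_ignored(const char *text, const char *pattern)
{
	regex_t re;
	regmatch_t m;
	size_t pos = 0, len = strlen(text), i;
	int ignored = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;
	while (pos < len) {
		size_t begin = len, end = len;	/* no match: the rest is uncovered */

		if (!regexec(&re, text + pos, 1, &m, 0)) {
			begin = pos + (size_t)m.rm_so;
			end = pos + (size_t)m.rm_eo;
		}
		if (end == begin && begin < len)
			begin = end = begin + 1;	/* empty match covers nothing */
		for (i = pos; i < begin; i++)
			if (!isspace((unsigned char)text[i]))
				ignored++;
		pos = end;
	}
	regfree(&re);
	return ignored;
}
```

For "To be Or Not To be." and "[A-Z][A-Za-z]*" the uncovered bytes are "b", "e", "b", "e" and ".", and both the old and the new line above reduce to the same word list (To, Or, Not, To). With '|[^[:space:]]' appended, nothing non-whitespace is left uncovered.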
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH 1/4] color-words: fix quoting in t4034
2009-01-14 20:12 ` Johannes Schindelin
2009-01-14 20:17 ` Thomas Rast
@ 2009-01-14 22:26 ` Thomas Rast
2009-01-14 22:41 ` Johannes Schindelin
2009-01-14 22:26 ` [PATCH 2/4] color-words: enable REG_NEWLINE to help user Thomas Rast
` (2 subsequent siblings)
4 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 22:26 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Santi Béjar, Junio C Hamano
Since the single quotes match the ones used to quote the test text
itself, they'd be dropped. Use double quotes instead.
---
I'd squash this into Dscho's 4/4, so no SoB.
t/t4034-diff-words.sh | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f4810e9..6ad1c1f 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -80,7 +80,7 @@ EOF
test_expect_success 'word diff with a regular expression' '
- word_diff --color-words='[a-z]+'
+ word_diff --color-words="[a-z]+"
'
@@ -98,7 +98,7 @@ EOF
test_expect_success "test parsing words for newline" '
- word_diff --color-words='a+'
+ word_diff --color-words="a+"
'
--
1.6.1.142.ge070e
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH 2/4] color-words: enable REG_NEWLINE to help user
2009-01-14 20:12 ` Johannes Schindelin
2009-01-14 20:17 ` Thomas Rast
2009-01-14 22:26 ` [PATCH 1/4] color-words: fix quoting in t4034 Thomas Rast
@ 2009-01-14 22:26 ` Thomas Rast
2009-01-14 22:26 ` [PATCH 3/4] color-words: expand docs with precise semantics Thomas Rast
2009-01-14 22:26 ` [PATCH 4/4] color-words: make regex configurable via attributes Thomas Rast
4 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 22:26 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Santi Béjar, Junio C Hamano
We silently truncate a match at the newline, which may lead to
unexpected behaviour, e.g., when matching "<[^>]*>" against
<foo
bar>
since then "<foo" becomes a word (and "bar>" doesn't!) even though the
regex said only angle-bracket-delimited things can be words.
To alleviate the problem slightly, use REG_NEWLINE so that negated
classes can't match a newline. Of course newlines can still be
matched explicitly.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
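
A quick way to see the effect of the flag in isolation is the following sketch (first_match_len() is only an illustrative helper, not something in diff.c). It compiles a pattern with REG_EXTENDED plus any extra flags and returns the length of the first match.

```c
#include <regex.h>

/*
 * Return the length of the first match of pattern in text when the
 * pattern is compiled with REG_EXTENDED | extra_flags, or -1 if there
 * is no match or the pattern does not compile.
 */
static long first_match_len(const char *pattern, const char *text,
			    int extra_flags)
{
	regex_t re;
	regmatch_t m;
	long len = -1;

	if (regcomp(&re, pattern, REG_EXTENDED | extra_flags))
		return -1;
	if (!regexec(&re, text, 1, &m, 0))
		len = (long)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

With "<[^>]*>" and the two-line input from the commit message, passing 0 as extra_flags should give a match spanning the newline, while passing REG_NEWLINE should give no match at all, because the negated class can no longer consume the newline. A newline written literally in the pattern still matches.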
diff.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/diff.c b/diff.c
index cc42adf..3f07ac1 100644
--- a/diff.c
+++ b/diff.c
@@ -1536,7 +1536,8 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words->word_regex = (regex_t *)
xmalloc(sizeof(regex_t));
if (regcomp(ecbdata.diff_words->word_regex,
- o->word_regex, REG_EXTENDED))
+ o->word_regex,
+ REG_EXTENDED | REG_NEWLINE))
die ("Invalid regular expression: %s",
o->word_regex);
}
--
1.6.1.142.ge070e
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH 3/4] color-words: expand docs with precise semantics
2009-01-14 20:12 ` Johannes Schindelin
` (2 preceding siblings ...)
2009-01-14 22:26 ` [PATCH 2/4] color-words: enable REG_NEWLINE to help user Thomas Rast
@ 2009-01-14 22:26 ` Thomas Rast
2009-01-14 22:26 ` [PATCH 4/4] color-words: make regex configurable via attributes Thomas Rast
4 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 22:26 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Santi Béjar, Junio C Hamano
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
Documentation/diff-options.txt | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 2c1fa4b..8689a92 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -91,12 +91,17 @@ endif::git-format-patch[]
Turn off colored diff, even when the configuration file
gives the default to color output.
---color-words[=regex]::
- Show colored word diff, i.e. color words which have changed.
+--color-words[=<regex>]::
+ Show colored word diff, i.e., color words which have changed.
+ By default, words are separated by whitespace.
+
-Optionally, you can pass a regular expression that tells Git what the
-words are that you are looking for; The default is to interpret any
-stretch of non-whitespace as a word.
+When a <regex> is specified, every non-overlapping match of the
+<regex> is considered a word. Anything between these matches is
+considered whitespace and ignored(!) for the purposes of finding
+differences. You may want to append `|[^[:space:]]` to your regular
+expression to make sure that it matches all non-whitespace characters.
+A match that contains a newline is silently truncated(!) at the
+newline.
--no-renames::
Turn off rename detection, even when the configuration
--
1.6.1.142.ge070e
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH 4/4] color-words: make regex configurable via attributes
2009-01-14 20:12 ` Johannes Schindelin
` (3 preceding siblings ...)
2009-01-14 22:26 ` [PATCH 3/4] color-words: expand docs with precise semantics Thomas Rast
@ 2009-01-14 22:26 ` Thomas Rast
2009-01-15 1:33 ` Johannes Schindelin
4 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 22:26 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Santi Béjar, Junio C Hamano
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute. The user can then set the
driver on a file in .gitattributes. If a regex is given on the
command line, it overrides the driver's setting.
We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
Incorporates the last round of Dscho's suggestions.
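
The intended precedence can be written down as a tiny standalone helper. This is a sketch only: pick_word_regex() does not exist in the patch, which instead assigns to o->word_regex inside builtin_diff(), so unlike this helper it keeps the chosen value in the options.

```c
#include <stddef.h>

/*
 * Pick the word regex to use: an explicit command-line value wins,
 * then the diff driver of the first file, then that of the second.
 * A NULL result means "fall back to splitting on whitespace".
 */
static const char *pick_word_regex(const char *cli,
				   const char *driver_one,
				   const char *driver_two)
{
	if (cli)
		return cli;
	if (driver_one)
		return driver_one;
	return driver_two;
}
```

This matches what the new tests check: the driver's "[^[:space:]]" is used when --color-words is given without a value, and --color-words="[a-z]+" overrides it.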
Documentation/diff-options.txt | 4 ++
Documentation/gitattributes.txt | 21 ++++++++++
diff.c | 10 +++++
t/t4034-diff-words.sh | 49 ++++++++++++++++++++++--
userdiff.c | 78 +++++++++++++++++++++++++++++++-------
userdiff.h | 1 +
6 files changed, 144 insertions(+), 19 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 8689a92..1edb82e 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -102,6 +102,10 @@ differences. You may want to append `|[^[:space:]]` to your regular
expression to make sure that it matches all non-whitespace characters.
A match that contains a newline is silently truncated(!) at the
newline.
++
+The regex can also be set via a diff driver, see
+linkgit:gitattributes[1]; giving it explicitly overrides any diff
+driver setting.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8af22ec..17707ba 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -317,6 +317,8 @@ patterns are available:
- `bibtex` suitable for files with BibTeX coded references.
+- `cpp` suitable for source code in the C and C++ languages.
+
- `html` suitable for HTML/XHTML documents.
- `java` suitable for source code in the Java language.
@@ -334,6 +336,25 @@ patterns are available:
- `tex` suitable for source code for LaTeX documents.
+Customizing word diff
+^^^^^^^^^^^^^^^^^^^^^
+
+You can customize the rules that `git diff --color-words` uses to
+split words in a line, by specifying an appropriate regular expression
+in the "diff.*.wordregex" configuration variable. For example, in TeX
+a backslash followed by a sequence of letters forms a command, but
+several such commands can be run together without intervening
+whitespace. To separate them, use a regular expression such as
+
+------------------------
+[diff "tex"]
+ wordregex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
+------------------------
+
+A built-in pattern is provided for all languages listed in the last
+section.
+
+
Performing text diffs of binary files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/diff.c b/diff.c
index 3f07ac1..0e82e18 100644
--- a/diff.c
+++ b/diff.c
@@ -1372,6 +1372,12 @@ int diff_filespec_is_binary(struct diff_filespec *one)
return one->driver->funcname.pattern ? &one->driver->funcname : NULL;
}
+static const char *userdiff_word_regex(struct diff_filespec *one)
+{
+ diff_filespec_load_driver(one);
+ return one->driver->word_regex;
+}
+
void diff_set_mnemonic_prefix(struct diff_options *options, const char *a, const char *b)
{
if (!options->a_prefix)
@@ -1532,6 +1538,10 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
ecbdata.diff_words->file = o->file;
+ if (!o->word_regex)
+ o->word_regex = userdiff_word_regex(one);
+ if (!o->word_regex)
+ o->word_regex = userdiff_word_regex(two);
if (o->word_regex) {
ecbdata.diff_words->word_regex = (regex_t *)
xmalloc(sizeof(regex_t));
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 6ad1c1f..631ca44 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -22,8 +22,10 @@ decrypt_color () {
word_diff () {
test_must_fail git diff --no-index "$@" pre post > output &&
- decrypt_color < output > output.decrypted &&
- test_cmp expect output.decrypted
+ decrypt_color < output > output.decrypted
+}
+word_diff_check () {
+ test_cmp "$1" output.decrypted
}
cat > pre <<\EOF
@@ -80,7 +82,45 @@ EOF
test_expect_success 'word diff with a regular expression' '
- word_diff --color-words="[a-z]+"
+ word_diff --color-words="[a-z]+" &&
+ word_diff_check expect
+
+'
+
+cat > expect-by-chars <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+h(4)<GREEN>,hh[44]<RESET>
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+test_expect_success 'set a diff driver' '
+ git config diff.testdriver.wordregex "[^[:space:]]" &&
+ cat <<EOF > .gitattributes
+pre diff=testdriver
+post diff=testdriver
+EOF
+'
+
+test_expect_success 'use default supplied by driver' '
+
+ word_diff --color-words &&
+ word_diff_check expect-by-chars
+
+'
+
+test_expect_success 'option overrides default' '
+
+ word_diff --color-words="[a-z]+" &&
+ word_diff_check expect
'
@@ -98,7 +138,8 @@ EOF
test_expect_success "test parsing words for newline" '
- word_diff --color-words="a+"
+ word_diff --color-words="a+" &&
+ word_diff_check expect
'
diff --git a/userdiff.c b/userdiff.c
index 3681062..dbfda6d 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -6,14 +6,20 @@
static int ndrivers;
static int drivers_alloc;
-#define FUNCNAME(name, pattern) \
- { name, NULL, -1, { pattern, REG_EXTENDED } }
+#define PATTERNS(name, pattern, wordregex) \
+ { name, NULL, -1, { pattern, REG_EXTENDED }, wordregex }
static struct userdiff_driver builtin_drivers[] = {
-FUNCNAME("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$"),
-FUNCNAME("java",
+PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
+ "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("java",
"!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
- "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$"),
-FUNCNAME("objc",
+ "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$",
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]="
+ "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("objc",
/* Negate C statements that can look like functions */
"!^[ \t]*(do|for|if|else|return|switch|while)\n"
/* Objective-C methods */
@@ -21,20 +27,60 @@
/* C functions */
"^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$\n"
/* Objective-C class/protocol definitions */
- "^(@(implementation|interface|protocol)[ \t].*)$"),
-FUNCNAME("pascal",
+ "^(@(implementation|interface|protocol)[ \t].*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("pascal",
"^((procedure|function|constructor|destructor|interface|"
"implementation|initialization|finalization)[ \t]*.*)$"
"\n"
- "^(.*=[ \t]*(class|record).*)$"),
-FUNCNAME("php", "^[\t ]*((function|class).*)"),
-FUNCNAME("python", "^[ \t]*((class|def)[ \t].*)$"),
-FUNCNAME("ruby", "^[ \t]*((class|module|def)[ \t].*)$"),
-FUNCNAME("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$"),
-FUNCNAME("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"),
+ "^(.*=[ \t]*(class|record).*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
+ "|<>|<=|>=|:=|\\.\\."
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("php", "^[\t ]*((function|class).*)",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
+ "|[-+*/<>%&^|=!.]=|--|\\+\\+|<<=?|>>=?|===|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"
+ "|[^[:space:]]|[\x80-\xff]+"),
+ /* -- */
+PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
+ /* -- */
+ "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
+ "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
+ "[={}\"]|[^={}\" \t]+"),
+PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
+ "\\\\[a-zA-Z@]+|[{}]|\\\\.|[^\\{} \t]+"),
+PATTERNS("cpp",
+ /* Jump targets or access declarations */
+ "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
+ /* C functions at top level */
+ "^([A-Za-z_][A-Za-z_0-9]*([ \t]+[A-Za-z_][A-Za-z_0-9]*){1,}[ \t]*\\([^;]*)$\n"
+ /* compound type at top level */
+ "^((struct|class|enum)[^;]*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
{ "default", NULL, -1, { NULL, 0 } },
};
-#undef FUNCNAME
+#undef PATTERNS
static struct userdiff_driver driver_true = {
"diff=true",
@@ -134,6 +180,8 @@ int userdiff_config(const char *k, const char *v)
return parse_string(&drv->external, k, v);
if ((drv = parse_driver(k, v, "textconv")))
return parse_string(&drv->textconv, k, v);
+ if ((drv = parse_driver(k, v, "wordregex")))
+ return parse_string(&drv->word_regex, k, v);
return 0;
}
diff --git a/userdiff.h b/userdiff.h
index ba29457..c315159 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -11,6 +11,7 @@ struct userdiff_driver {
const char *external;
int binary;
struct userdiff_funcname funcname;
+ const char *word_regex;
const char *textconv;
};
--
1.6.1.142.ge070e
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 21:07 ` Johannes Schindelin
@ 2009-01-14 22:37 ` Thomas Rast
0 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-14 22:37 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git, Santi Béjar
Johannes Schindelin wrote:
> > Ack on the new regex semantics, though I'd have implemented it via dying
> > on '\n' instead of silently splitting there (and restarting a new
> > match!).
>
> Hmm. I'd rather not die() in the middle of it.
>
> Maybe we can even handle newlines correctly by replacing them with NULs
> which libxdiff handles just fine?
I'm not sure it's worth the effort---anyone who wants words to stick
together across newlines probably doesn't put a newline there in the
first place, do they? (And it just shifts the problem to another
special character.)
> Phew. I was almost convinced you would hate me for my criticism.
Let's say I wasn't too happy when you asked for two rounds of
improvements and _then_ rejected. But the end result certainly turned
out better, so the criticism was justified.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH 1/4] color-words: fix quoting in t4034
2009-01-14 22:26 ` [PATCH 1/4] color-words: fix quoting in t4034 Thomas Rast
@ 2009-01-14 22:41 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-14 22:41 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano
Hi,
On Wed, 14 Jan 2009, Thomas Rast wrote:
> Since the single quotes match the ones used to quote the test text
> itself, they'd be dropped. Use double quotes instead.
See, I suck with quoting.
> ---
>
> I'd squash this into Dscho's 4/4, so no SoB.
Sure, done.
Thanks,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words
2009-01-14 20:46 ` [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words Johannes Schindelin
@ 2009-01-15 0:32 ` Thomas Rast
2009-01-15 1:12 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-15 0:32 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
Johannes Schindelin wrote:
> This basically contains the fix I sent earlier.
Unfortunately I found another case where it breaks. It even comes
with a fairly neat test case:
$ g diff --no-index test_a test_b
diff --git 1/test_a 2/test_b
index 289cb9d..2d06f37 100644
--- 1/test_a
+++ 2/test_b
@@ -1 +1 @@
-(:
+(
$ g diff --no-index --color-words='.' test_a test_b
diff --git 1/test_a 2/test_b
index 289cb9d..2d06f37 100644
--- 1/test_a
+++ 2/test_b
@@ -1 +1 @@
:(
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words
2009-01-15 0:32 ` Thomas Rast
@ 2009-01-15 1:12 ` Johannes Schindelin
2009-01-15 1:36 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-15 1:12 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
Hi,
On Thu, 15 Jan 2009, Thomas Rast wrote:
> Johannes Schindelin wrote:
> > This basically contains the fix I sent earlier.
>
> Unfortunately I found another case where it breaks. It even comes
> with a fairly neat test case:
>
> $ g diff --no-index test_a test_b
> diff --git 1/test_a 2/test_b
> index 289cb9d..2d06f37 100644
> --- 1/test_a
> +++ 2/test_b
> @@ -1 +1 @@
> -(:
> +(
The diff of the words would look like this:
diff --git a/a1 b/a2
index 8309acb..2d06f37 100644
--- a/a1
+++ b/a2
@@ -2 +1,0 @@
-:
Notice the "+1,0"? I fully expected this to be "+2,0", but apparently I
was mistaken...
Can anybody explain to me why this is so?
Ciao,
Dscho
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH 4/4] color-words: make regex configurable via attributes
2009-01-14 22:26 ` [PATCH 4/4] color-words: make regex configurable via attributes Thomas Rast
@ 2009-01-15 1:33 ` Johannes Schindelin
2009-01-15 1:43 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-15 1:33 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano
Hi Thomas,
could you please squash this in?
-- snipsnap --
[PATCH to be squashed into the attributes patch] Decomplicate t4034 again
---
t/t4034-diff-words.sh | 50 ++++++++++++++++++++++--------------------------
1 files changed, 23 insertions(+), 27 deletions(-)
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 631ca44..07e48d1 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -22,10 +22,8 @@ decrypt_color () {
word_diff () {
test_must_fail git diff --no-index "$@" pre post > output &&
- decrypt_color < output > output.decrypted
-}
-word_diff_check () {
- test_cmp "$1" output.decrypted
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect output.decrypted
}
cat > pre <<\EOF
@@ -82,12 +80,25 @@ EOF
test_expect_success 'word diff with a regular expression' '
- word_diff --color-words="[a-z]+" &&
- word_diff_check expect
+ word_diff --color-words="[a-z]+"
'
-cat > expect-by-chars <<\EOF
+test_expect_success 'set a diff driver' '
+ git config diff.testdriver.wordregex "[^[:space:]]" &&
+ cat <<EOF > .gitattributes
+pre diff=testdriver
+post diff=testdriver
+EOF
+'
+
+test_expect_success 'option overrides default' '
+
+ word_diff --color-words="[a-z]+"
+
+'
+
+cat > expect <<\EOF
<WHITE>diff --git a/pre b/post<RESET>
<WHITE>index 330b04f..5ed8eff 100644<RESET>
<WHITE>--- a/pre<RESET>
@@ -102,25 +113,9 @@ a = b + c<RESET>
<GREEN>aeff = aeff * ( aaa )<RESET>
EOF
-test_expect_success 'set a diff driver' '
- git config diff.testdriver.wordregex "[^[:space:]]" &&
- cat <<EOF > .gitattributes
-pre diff=testdriver
-post diff=testdriver
-EOF
-'
-
test_expect_success 'use default supplied by driver' '
- word_diff --color-words &&
- word_diff_check expect-by-chars
-
-'
-
-test_expect_success 'option overrides default' '
-
- word_diff --color-words="[a-z]+" &&
- word_diff_check expect
+ word_diff --color-words
'
@@ -136,10 +131,11 @@ cat > expect <<\EOF
aaa (aaa) <GREEN>aaa<RESET>
EOF
-test_expect_success "test parsing words for newline" '
+test_expect_success 'test parsing words for newline' '
+
+ word_diff --color-words="a+"
- word_diff --color-words="a+" &&
- word_diff_check expect
+ word_diff --color-words=.
'
--
1.6.1.300.gbc493
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words
2009-01-15 1:12 ` Johannes Schindelin
@ 2009-01-15 1:36 ` Johannes Schindelin
2009-01-15 8:30 ` Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-15 1:36 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
Hi,
On Thu, 15 Jan 2009, Johannes Schindelin wrote:
> On Thu, 15 Jan 2009, Thomas Rast wrote:
>
> > Johannes Schindelin wrote:
> > > This basically contains the fix I sent earlier.
> >
> > Unfortunately I found another case where it breaks. It even comes
> > with a fairly neat test case:
> >
> > $ g diff --no-index test_a test_b
> > diff --git 1/test_a 2/test_b
> > index 289cb9d..2d06f37 100644
> > --- 1/test_a
> > +++ 2/test_b
> > @@ -1 +1 @@
> > -(:
> > +(
>
> The diff of the words would look like this:
>
> diff --git a/a1 b/a2
> index 8309acb..2d06f37 100644
> --- a/a1
> +++ b/a2
> @@ -2 +1,0 @@
> -:
>
>
> Notice the "+1,0"? I fully expected this to be "+2,0", but apparently I
> was mistaken...
>
> Can anybody explain to me why this is so?
[PATCH to be squashed into the word regex patch] Fix for strange '@@ -2 +1,0 @@' hunk header
If a hunk header '@@ -2 +1,0 @@' is found that logically should be
'@@ -2 +2,0 @@', diff_words got confused.
It would be squashed into 4/4.
This might be a libxdiff issue, though.
Not sure yet.
---
diff.c | 18 ++++++++++++++++++
t/t4034-diff-words.sh | 16 ++++++++++++++++
2 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/diff.c b/diff.c
index c5f7c57..3709651 100644
--- a/diff.c
+++ b/diff.c
@@ -360,6 +360,24 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
plus_end = plus_len == 0 ? plus_begin :
diff_words->plus.orig[plus_first + plus_len - 1].end;
+ /*
+ * since this is a --unified=0 diff, it can result in a single hunk
+ * with a header like this: @@ -2 +1,0 @@
+ *
+ * This breaks the assumption that minus_first == plus_first.
+ *
+ * So we have to fix it: whenever we reach the end of pre and post
+ * texts, but nothing was added, we need to shift the plus part
+ * to the end of the buffer.
+ *
+ * It is only necessary for the plus part, as we show the common
+ * words from that buffer.
+ */
+ if (plus_len == 0 && minus_first + minus_len
+ == diff_words->minus.orig_nr)
+ plus_begin = plus_end =
+ diff_words->plus.orig[diff_words->plus.orig_nr - 1].end;
+
if (diff_words->current_plus != plus_begin)
fwrite(diff_words->current_plus,
plus_begin - diff_words->current_plus, 1,
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 07e48d1..817fba6 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -135,6 +135,22 @@ test_expect_success 'test parsing words for newline' '
word_diff --color-words="a+"
+'
+
+echo '(:' > pre
+echo '(' > post
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 289cb9d..2d06f37 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1 +1 @@<RESET>
+(<RED>:<RESET>
+EOF
+
+test_expect_success 'test when words are only removed at the end' '
+
word_diff --color-words=.
'
--
1.6.1.300.gbc493
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH 4/4] color-words: make regex configurable via attributes
2009-01-15 1:33 ` Johannes Schindelin
@ 2009-01-15 1:43 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-15 1:43 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano
Hi,
On Thu, 15 Jan 2009, Johannes Schindelin wrote:
> @@ -136,10 +131,11 @@ cat > expect <<\EOF
> aaa (aaa) <GREEN>aaa<RESET>
> EOF
>
> -test_expect_success "test parsing words for newline" '
> +test_expect_success 'test parsing words for newline' '
> +
> + word_diff --color-words="a+"
>
> - word_diff --color-words="a+" &&
> - word_diff_check expect
> + word_diff --color-words=.
>
> '
D'oh. Please remove the last word_diff; it comes from my "fix" for your
smiley issue.
Ciao,
Dscho "off to bed"
--
"The night was so dark that he hardly coulx srr tje keuboarf."
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-14 22:06 ` Johannes Schindelin
2009-01-14 22:11 ` Thomas Rast
2009-01-14 22:24 ` Boyd Stephen Smith Jr.
@ 2009-01-15 4:56 ` Teemu Likonen
2009-01-15 12:41 ` Johannes Schindelin
2 siblings, 1 reply; 109+ messages in thread
From: Teemu Likonen @ 2009-01-15 4:56 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Thomas Rast, Junio C Hamano, git, Santi Béjar
Johannes Schindelin (2009-01-14 23:06 +0100) wrote:
> On Wed, 14 Jan 2009, Thomas Rast wrote:
>> -aaa [aaa]
>> +aaa (aaa) aaa
>>
>> would still give you
>>
>> aaa (aaa)<GREEN> aaa<RESET>
>>
>> which may be unexpected.
>
> But why should it be unexpected? If people say that every length of "a"
> makes a word, and consequently everything else is clutter, then that's
> that, no?
It works logically but I'd very much like to see some kind of advice
in the man page. I already faced this (unexpected) situation and wasn't
able to fix the regexp myself.
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words
2009-01-15 1:36 ` Johannes Schindelin
@ 2009-01-15 8:30 ` Thomas Rast
2009-01-15 10:40 ` Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-15 8:30 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
Johannes Schindelin wrote:
> If a hunk header '@@ -2 +1,0 @@' is found that logically should be
> '@@ -2 +2,0 @@', diff_words got confused.
[...]
> This might be a libxdiff issue, though.
Looks like it's just bug-for-bug compatible with diff. At least my
GNU diffutils 2.8.7 shows the same behaviour.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words
2009-01-15 8:30 ` Thomas Rast
@ 2009-01-15 10:40 ` Thomas Rast
2009-01-15 12:54 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-15 10:40 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
Thomas Rast wrote:
> Johannes Schindelin wrote:
> > If a hunk header '@@ -2 +1,0 @@' is found that logically should be
> > '@@ -2 +2,0 @@', diff_words got confused.
> [...]
> > This might be a libxdiff issue, though.
>
> Looks like it's just bug-for-bug compatible with diff. At least my
> GNU diffutils 2.8.7 show the same behaviour.
I think the culprit is in
commit ca557afff9f7dad7a8739cd193ac0730d872e282
Author: Davide Libenzi <davidel@xmailserver.org>
Date: Mon Apr 3 18:47:55 2006 -0700
Clean-up trivially redundant diff.
Also corrects the line numbers in unified output when using
zero lines context.
[...]
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
[...]
@@ -244,7 +257,7 @@ int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
memcpy(buf, "@@ -", 4);
nb += 4;
- nb += xdl_num_out(buf + nb, c1 ? s1: 0);
+ nb += xdl_num_out(buf + nb, c1 ? s1: s1 - 1);
if (c1 != 1) {
memcpy(buf + nb, ",", 1);
@@ -256,7 +269,7 @@ int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
memcpy(buf + nb, " +", 2);
nb += 2;
- nb += xdl_num_out(buf + nb, c2 ? s2: 0);
+ nb += xdl_num_out(buf + nb, c2 ? s2: s2 - 1);
if (c2 != 1) {
memcpy(buf + nb, ",", 1);
Note how (for some reason I don't quite understand yet) "correcting"
the offsets involves subtracting 1 if there were no changes on that
side.
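The rule in that hunk can be restated as a small standalone function. This is only a sketch: `format_range` is a made-up name, not an xdiff function, and it formats a single side of the header instead of the whole "@@ ... @@" line. It assumes 1-based start values, as in the hunk headers quoted earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one side of a unified hunk header (hypothetical helper).
 * An empty range (count == 0) is reported with start - 1, and the
 * ",count" part is left out only when count == 1.
 */
int format_range(char *buf, size_t size, long start, long count)
{
	long shown = count ? start : start - 1;

	assert(buf && size > 0);
	if (count == 1)
		return snprintf(buf, size, "%ld", shown);
	return snprintf(buf, size, "%ld,%ld", shown, count);
}
```

With the "(:" versus "(" example, the removed word is the second one on the minus side (start 2, count 1), and the plus side is an empty range at position 2, so the two sides come out as "2" and "1,0", matching the '@@ -2 +1,0 @@' header seen above.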
But skipping ahead to the end doesn't work if there are several such
instances where nothing was added. So I think it must be fixed as
follows.
---- 8< ----
diff --git a/diff.c b/diff.c
index 4174d88..d7bbf74 100644
--- a/diff.c
+++ b/diff.c
@@ -361,8 +361,9 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
diff_words->plus.orig[plus_first + plus_len - 1].end;
/*
- * since this is a --unified=0 diff, it can result in a single hunk
- * with a header like this: @@ -2 +1,0 @@
+ * libxdiff subtracts one from the offset if the corresponding
+ * length is 0. (This can only happen because we use
+ * --unified=0.)
*
* This breaks the assumption that minus_first == plus_first.
*
@@ -373,10 +374,9 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
* It is only necessary for the plus part, as we show the common
* words from that buffer.
*/
- if (plus_len == 0 && minus_first + minus_len
- == diff_words->minus.orig_nr)
+ if (plus_len == 0)
plus_begin = plus_end =
- diff_words->plus.orig[diff_words->plus.orig_nr - 1].end;
+ diff_words->plus.orig[plus_first + plus_len].end;
if (diff_words->current_plus != plus_begin)
fwrite(diff_words->current_plus,
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 744221b..875b464 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -156,4 +156,40 @@ test_expect_success 'test when words are only removed at the end' '
'
+echo 'abcd(Xefghijklmn(YZopqrst' > pre
+echo 'abcd(efghijklmn(opqrst' > post
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 434ff54..c4bb9f1 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1 +1 @@<RESET>
+abcd(<RED>X<RESET>efghijklmn(<RED>YZ<RESET>opqrst
+EOF
+
+test_expect_success 'no added words' '
+
+ word_diff --color-words=.
+
+'
+
+echo 'abcd(efghijklmn(opqrst' > pre
+echo 'abcd(Xefghijklmn(YZopqrst' > post
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index c4bb9f1..434ff54 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1 +1 @@<RESET>
+abcd(<GREEN>X<RESET>efghijklmn(<GREEN>YZ<RESET>opqrst
+EOF
+
+test_expect_success 'no removed words' '
+
+ word_diff --color-words=.
+
+'
+
test_done
--
1.6.1.283.g653b2
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-15 4:56 ` Teemu Likonen
@ 2009-01-15 12:41 ` Johannes Schindelin
2009-01-15 13:03 ` Teemu Likonen
2009-01-15 18:15 ` Junio C Hamano
0 siblings, 2 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-15 12:41 UTC (permalink / raw)
To: Teemu Likonen; +Cc: Thomas Rast, Junio C Hamano, git, Santi Béjar
Hi,
On Thu, 15 Jan 2009, Teemu Likonen wrote:
> Johannes Schindelin (2009-01-14 23:06 +0100) wrote:
>
> > On Wed, 14 Jan 2009, Thomas Rast wrote:
> >> -aaa [aaa]
> >> +aaa (aaa) aaa
> >>
> >> would still give you
> >>
> >> aaa (aaa)<GREEN> aaa<RESET>
> >>
> >> which may be unexpected.
> >
> > But why should it be unexpected? If people say that every length of "a"
> > makes a word, and consequently everything else is clutter, then that's
> > that, no?
>
> It works logically but I'd very much like to see some kind of advice
> in the man page. I already faced this (unexpected) situation and wasn't
> able to fix the regexp myself.
Exactly because it works logically, I do not want to change it. This is
what the user said, and for a change, it could be what the user meant.
You'll have to come up with a method to describe exactly what you want.
So what is it exactly? What would you want in such a situation? You
asked for words that consist solely of the letter 'a'. Now, the
surrounding stuff differs. What should Git do?
BTW this gets even worse when you compare the following:
bbb aaa
ccc aaa
--color-words=a+ will show
ccc aaa
(!!!)
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH replacement for take 3 4/4] color-words: take an optional regular expression describing words
2009-01-15 10:40 ` Thomas Rast
@ 2009-01-15 12:54 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-15 12:54 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Santi Béjar, Junio C Hamano, Teemu Likonen
Hi,
On Thu, 15 Jan 2009, Thomas Rast wrote:
> Thomas Rast wrote:
> > Johannes Schindelin wrote:
> > > If a hunk header '@@ -2 +1,0 @@' is found that logically should be
> > > '@@ -2 +2,0 @@', diff_words got confused.
> > [...]
> > > This might be a libxdiff issue, though.
> >
> > Looks like it's just bug-for-bug compatible with diff. At least my
> > GNU diffutils 2.8.7 show the same behaviour.
>
> I think the culprit is in
>
> commit ca557afff9f7dad7a8739cd193ac0730d872e282
> Author: Davide Libenzi <davidel@xmailserver.org>
> Date: Mon Apr 3 18:47:55 2006 -0700
>
> Clean-up trivially redundant diff.
>
> Also corrects the line numbers in unified output when using
> zero lines context.
> [...]
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> [...]
> @@ -244,7 +257,7 @@ int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
> memcpy(buf, "@@ -", 4);
> nb += 4;
>
> - nb += xdl_num_out(buf + nb, c1 ? s1: 0);
> + nb += xdl_num_out(buf + nb, c1 ? s1: s1 - 1);
Junio mentioned some POSIX document in which this behavior is actually
required. So I'll fix my code thusly:
-- snipsnap --
diff --git a/diff.c b/diff.c
index 3709651..219a242 100644
--- a/diff.c
+++ b/diff.c
@@ -353,30 +353,20 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
&minus_first, &minus_len, &plus_first, &plus_len))
return;
- minus_begin = diff_words->minus.orig[minus_first].begin;
- minus_end = minus_len == 0 ? minus_begin :
- diff_words->minus.orig[minus_first + minus_len - 1].end;
- plus_begin = diff_words->plus.orig[plus_first].begin;
- plus_end = plus_len == 0 ? plus_begin :
- diff_words->plus.orig[plus_first + plus_len - 1].end;
+ /* POSIX requires that first be decremented by one if len == 0... */
+ if (minus_len) {
+ minus_begin = diff_words->minus.orig[minus_first].begin;
+ minus_end =
+ diff_words->minus.orig[minus_first + minus_len - 1].end;
+ } else
+ minus_begin = minus_end =
+ diff_words->minus.orig[minus_first].end;
- /*
- * since this is a --unified=0 diff, it can result in a single hunk
- * with a header like this: @@ -2 +1,0 @@
- *
- * This breaks the assumption that minus_first == plus_first.
- *
- * So we have to fix it: whenever we reach the end of pre and post
- * texts, but nothing was added, we need to shift the plus part
- * to the end of the buffer.
- *
- * It is only necessary for the plus part, as we show the common
- * words from that buffer.
- */
- if (plus_len == 0 && minus_first + minus_len
- == diff_words->minus.orig_nr)
- plus_begin = plus_end =
- diff_words->plus.orig[diff_words->plus.orig_nr - 1].end;
+ if (plus_len) {
+ plus_begin = diff_words->plus.orig[plus_first].begin;
+ plus_end = diff_words->plus.orig[plus_first + plus_len - 1].end;
+ } else
+ plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
if (diff_words->current_plus != plus_begin)
fwrite(diff_words->current_plus,
--
1.6.1.300.gbc493
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-15 12:41 ` Johannes Schindelin
@ 2009-01-15 13:03 ` Teemu Likonen
2009-01-15 13:27 ` Thomas Rast
2009-01-15 18:15 ` Junio C Hamano
1 sibling, 1 reply; 109+ messages in thread
From: Teemu Likonen @ 2009-01-15 13:03 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Thomas Rast, Junio C Hamano, git, Santi Béjar
Johannes Schindelin (2009-01-15 13:41 +0100) wrote:
> Exactly because it works logically, I do not want to change it. This is
> what the user said, and for a change, it could be what the user meant.
I'm just saying that it would be helpful (to me at least) if the man
page included this advice. Thomas Rast already suggested this in his
version of the man page change:
You may want to append `|\S` to your regular expression to make sure
that it matches all non-whitespace characters.
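To see what the extra alternative buys, here is a rough sketch that merely counts regex matches the way a word splitter might. `count_words` is a hypothetical helper, not part of Git. `\S` is not part of POSIX extended regular expressions, so the sketch spells it `[^[:space:]]` to stay portable.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Count the non-empty matches of the extended regex `pattern` in `text`. */
int count_words(const char *text, const char *pattern)
{
	regex_t re;
	regmatch_t m;
	size_t pos = 0, len;
	int n = 0;

	assert(text && pattern);
	len = strlen(text);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (pos < len &&
	       !regexec(&re, text + pos, 1, &m, pos ? REG_NOTBOL : 0)) {
		if (m.rm_eo > m.rm_so) {
			n++;
			pos += m.rm_eo;
		} else {
			pos += m.rm_so + 1;	/* step over an empty match */
		}
	}
	regfree(&re);
	return n;
}
```

For the line "aaa (aaa) aaa", the pattern "a+" yields three words and leaves the parentheses in the background, while "a+|[^[:space:]]" yields five, because "(" and ")" become one-character words of their own and so take part in the comparison.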
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-15 13:03 ` Teemu Likonen
@ 2009-01-15 13:27 ` Thomas Rast
0 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-15 13:27 UTC (permalink / raw)
To: Teemu Likonen; +Cc: Johannes Schindelin, Junio C Hamano, git, Santi Béjar
Teemu Likonen wrote:
> Johannes Schindelin (2009-01-15 13:41 +0100) wrote:
>
> > Exactly because it works logically, I do not want to change it. This is
> > what the user said, and for a change, it could be what the user meant.
>
> I'm just saying that it would be helpful (to me at least) if the man
> page included this advice. Thomas Rast already suggested this in his
> version of the man page change:
>
> You may want to append `|\S` to your regular expression to make sure
> that it matches all non-whitespace characters.
Dscho requested that I put the extended docs in one of my patches, so
it's currently in
http://article.gmane.org/gmane.comp.version-control.git/105716
Comments welcome of course.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-15 12:41 ` Johannes Schindelin
2009-01-15 13:03 ` Teemu Likonen
@ 2009-01-15 18:15 ` Junio C Hamano
2009-01-15 19:25 ` Johannes Schindelin
1 sibling, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2009-01-15 18:15 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Teemu Likonen, Thomas Rast, git, Santi Béjar
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> BTW this gets even worse when you compare the following:
>
> bbb aaa
> ccc aaa
>
> --color-words=a+ will show
>
> ccc aaa
Naive question. What is the expected output?
The user defines that "a", "aa", "aaa",... are words and everything else
is the background that the words float on, and asks --color-words to color
code where the words differ. The way to show them is to have the words in
red (if it comes from preimage) or in green (if it comes from postimage) on
top of some background.
In this case, there is no difference in words, and the only difference is
the background. Should we still see any output? Shouldn't it behave more
like "diff -w" that suppresses lines that differ only in whitespace?
I didn't see the semantics of color-words documented in the original
either, and I think it should be described in a way humans would
understand (in other words, "here is what we do internally, splitting
words into lines, running diff between them and coalescing the result in
this and that way, and whatever happens to be output is what you get" is
not the semantics that is explained in a way humans would understand).
The above "The way to show them is to have the words in red (if it comes
from preimage) or in green (if it comes from postimage) on top of some
background." was my attempt to describe an easier half of the semantics,
but I am not sure what definition of "some background" the current draft
code is designed around; I think the original's definition was "we discard
the background from either preimage or postimage and insert whitespace
ourselves between the words we output; the only exception is the
end-of-line that appears in the postimage which we try to keep" or
something like that, but that is not written in the documentation either.
How should the background be computed to draw the result on? If a
corresponding background portion appears in both the preimage and the
postimage, we use the one from the postimage? That justifies why bbb is
not shown but ccc is, when you compare these two:
bbb aaa
ccc aa
What happens if a portion of background is only in the preimage?
E.g. when these two are compared:
bbb aaa bb aa b
ccc aaa cc
what should happen? We would want to say "aa" was removed by showing it
in red, but on what background should it be displayed? cc <red>aa</red>
b?
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-15 18:15 ` Junio C Hamano
@ 2009-01-15 19:25 ` Johannes Schindelin
2009-01-16 0:10 ` Santi Béjar
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-15 19:25 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Teemu Likonen, Thomas Rast, git, Santi Béjar
Hi,
On Thu, 15 Jan 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> I didn't see the semantics of color-words documented in the original
> either,
Yeah, my bad. Will try to fix it with this round of patches.
Actually, I'll give a quick outline right here:
Idea: the idea of word diff is to show the differences on a word level
instead of line level. To make it easier for humans (albeit we studiously
exclude color-blind people with our defaults), we do not show "+" and "-" as the
standard diff does, but use colors to designate if the words were removed
or added.
Now, the thing is that the inter-word parts _can_ differ. The idea here
is to show the part of the postimage and drop the preimage under the
table.
Method: We use libxdiff as the real workhorse. First, we let it generate
a line diff.
Then we reconstruct the preimage and postimage for each hunk, process both
into new images that have at most one word (in the new code exactly one
word) per line, and feed the new preimage/postimage pair to libxdiff.
From the output of libxdiff, we reconstruct which words were actually
removed and which were added. Then -- like the line based diff -- we
combine the runs of common words, removed words and added words, and show
them.
The algorithm I implemented in the new patch series is actually much
cleaner than the old one:
- it feeds images to libxdiff which contain _exactly_ one word per line,
decoupling the word offsets in the original image from the offsets in
the processed image,
- this decoupling allows for arbitrary word boundaries, even 0-character
ones,
- it parses the hunk headers of the libxdiff output instead of the "-",
"+" and " " lines, and therefore does not have to play tricks with the
newline character in the middle of a run of removed words.
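The word-extraction step of this outline could be sketched as follows. It is an illustration only: `struct word_span` and `split_words` are invented names rather than the identifiers used in diff.c, a POSIX extended regex stands in for the word definition, and newlines get no special treatment here.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record pointing back into the original text. */
struct word_span {
	size_t begin, end;	/* byte offsets, end is exclusive */
};

/*
 * Record the offsets of every non-empty match of `pattern` in `text`.
 * Returns the number of words stored in `out` (at most `max`), or -1
 * if the pattern does not compile.
 */
int split_words(const char *text, const char *pattern,
		struct word_span *out, int max)
{
	regex_t re;
	regmatch_t m;
	size_t pos = 0, len;
	int n = 0;

	assert(text && pattern && out);
	len = strlen(text);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (pos < len && n < max &&
	       !regexec(&re, text + pos, 1, &m, pos ? REG_NOTBOL : 0)) {
		if (m.rm_eo == m.rm_so) {	/* empty match: move on */
			pos += m.rm_so + 1;
			continue;
		}
		out[n].begin = pos + m.rm_so;
		out[n].end = pos + m.rm_eo;
		n++;
		pos += m.rm_eo;
	}
	regfree(&re);
	return n;
}
```

Each recorded word would then be written on a line of its own for libxdiff, and the word numbers in the resulting hunk headers index back into the `out` array to recover the boundaries in the original text. Because the boundaries are stored separately, two adjacent words can touch (the end of one equals the begin of the next), which is the 0-character word boundary case.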
> What happens if a portion of background is only in the preimage?
If it is in a run of words that were removed, i.e. that are only in the
preimage, then it is shown in that part. Otherwise, the background of the
preimage is never shown.
> E.g. when these two are compared:
>
> bbb aaa bb aa b
> ccc aaa cc
>
> what should happen? We would want to say "aa" was removed by showing it
> in red, but on what background should it be displayed? cc <red>aa</red>
> b?
If we are only ever interested in the 'a's, I'd say that the output should
only reflect that. In other words, what the current code does (ccc
aaa<red>aa</red> cc) is okay IMHO. After all, we said we're interested in
the 'a's, so we should not complain that it did not show us the removal of
'b's.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-15 19:25 ` Johannes Schindelin
@ 2009-01-16 0:10 ` Santi Béjar
2009-01-16 1:37 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 109+ messages in thread
From: Santi Béjar @ 2009-01-16 0:10 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Teemu Likonen, Thomas Rast, git
2009/1/15 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> On Thu, 15 Jan 2009, Junio C Hamano wrote:
>> I didn't see the semantics of color-words documented in the original
>> either,
>
[...]
>> E.g. when these two are compared:
>>
>> bbb aaa bb aa b
>> ccc aaa cc
>>
>> what should happen? We would want to say "aa" was removed by showing it
>> in red, but on what background should it be displayed? cc <red>aa</red>
>> b?
>
> If we are only ever interested in the 'a's, I'd say that the output should
> only reflect that. In other words, what the current code does (ccc
> aaa<red>aa</red> cc) is okay IMHO. After all, we said we're interested in
> the 'a's, so we should not complain that it did not show us the removal of
> 'b's.
It may be OK and logical, but for me it is not what I want. Maybe I
don't really understand what I want, or it is a crazy idea, but here it is
anyway:
Take a simple case with these two lines:
matrix[a,b,c]
matrix{d,b,c}
there is no space so the standard color-words does not help to
visualize that matrix, the b and c are not changed.
What I currently do is to add some spaces:
matrix[ a, b, c ]
matrix{ d, b, c }
then the color-words at least says that "b, c" is unchanged.
What I would like is for --color-words to act as if these spaces had been
added automatically (and even one after "matrix").
Or another way to think it could be:
a) primary words are those with alphanumerics (or a regex)
b) secondary "words" are the other non-whitespaces characters (in this
case "[]{} and ,"
c) whitespaces are cruft.
(having two regexp to specify what is a words but they cannot mix).
If everything works as I think (it's late night :-) with the above two lines:
matrix[a,b,c]
matrix{d,b,c}
the word diff would be
matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>
Santi
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 0:10 ` Santi Béjar
@ 2009-01-16 1:37 ` Junio C Hamano
2009-01-16 1:42 ` Boyd Stephen Smith Jr.
2009-01-16 1:55 ` Johannes Schindelin
2 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2009-01-16 1:37 UTC (permalink / raw)
To: Santi Béjar; +Cc: Johannes Schindelin, Teemu Likonen, Thomas Rast, git
Santi Béjar <santi@agolina.net> writes:
> It may be ok and logical, but for me it is not what I want. Maybe I
> don't really understand what I want or is a crazy idea but here it is
> anyway:
>
> Take a simple case with this two lines :
>
> matrix[a,b,c]
> matrix{d,b,c}
>
> there is no space so the standard color-words does not help to
> visualize that matrix, the b and c are not changed.
>
> What I currently do is to add some spaces:
>
> matrix[ a, b, c ]
> matrix{ d, b, c }
>
> then the color-words at least says that "b, c" is unchanged.
>
> What I would like is that --color-words would act as adding this
> spaces automatically (and even one after "matrix").
>
> Or another way to think it could be:
>
> a) primary words are those with alphanumerics (or a regex)
> b) secondary "words" are the other non-whitespaces characters (in this
> case "[]{} and ,"
> c) whitespaces are cruft.
Dscho and Thomas discussed and designed a way to say "words look like
this" (anything that is not a word is cruft), and Dscho further
argues that it is OK to discard cruft (which I think is fine).
What you seem to want in this example is "there is no cruft other than
whitespace, but there are different kinds of words". I do not think that is
incompatible with the way cruft is discarded, but it may be incompatible
with the way words are identified.
I would expect something like:
[a-zA-Z0-9]+|[^ a-zA-Z0-9]+
should define your "two kinds of words". That is, a run of alnums is a
word, and a run of non-alnums is a word, but "matrix[a" is not a word (it
is a sequence of three words "matrix", "[" and "a").
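To make the "sequence of three words" concrete, here is a minimal sketch
(not code from the series) that feeds the pattern above to POSIX
regcomp()/regexec() and collects the successive matches. The helper name
split_words() and the MAX_WORD_LEN limit are made up for illustration; the
actual word splitting in diff.c may differ in detail.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_WORD_LEN 63

/*
 * Hypothetical helper (not part of git): copy each successive match of
 * the POSIX extended regex `pattern` in `text` into `words`, skipping
 * whatever lies between matches.  Returns the number of words stored,
 * or -1 if the pattern does not compile.
 */
static int split_words(const char *text, const char *pattern,
		       char words[][MAX_WORD_LEN + 1], int max_words)
{
	regex_t re;
	regmatch_t m;
	int n = 0;

	assert(text && pattern && words);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (n < max_words && *text && !regexec(&re, text, 1, &m, 0)) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		if (!len) {		/* empty match: step over one byte */
			text++;
			continue;
		}
		if (len > MAX_WORD_LEN)	/* truncate overlong words */
			len = MAX_WORD_LEN;
		memcpy(words[n], text + m.rm_so, len);
		words[n][len] = '\0';
		n++;
		text += m.rm_eo;	/* continue after this word */
	}
	regfree(&re);
	return n;
}
```

Called with "matrix[a" and the pattern above, this stores "matrix", "["
and "a"; with "matrix[a,b,c]" it stores eight words, so "b" and "c" become
words of their own that a word diff can report as unchanged.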
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 0:10 ` Santi Béjar
2009-01-16 1:37 ` Junio C Hamano
@ 2009-01-16 1:42 ` Boyd Stephen Smith Jr.
2009-01-16 1:55 ` Johannes Schindelin
2 siblings, 0 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-16 1:42 UTC (permalink / raw)
To: Santi Béjar
Cc: Johannes Schindelin, Junio C Hamano, Teemu Likonen, Thomas Rast,
git
On Thursday 15 January 2009, Santi Béjar <santi@agolina.net> wrote
about 'Re: [PATCH take 3 0/4] color-words improvements':
>It may be ok and logical, but for me it is not what I want. Maybe I
>don't really understand what I want or is a crazy idea but here it is
>anyway:
The discussion above is mildly theoretical. I don't imagine someone is
going to intentionally mark 98% of a file as non-words, which is basically
what you are doing with a regex of "a+".
>a) primary words are those with alphanumerics (or a regex)
regex: [[:alnum:]]+
example words: matrix ball I a
example non-words: don't haven't
>b) secondary "words" are the other non-whitespaces characters (in this
>case "[]{} and ,"
regex: []{}[,]
example words: [ , }
example non-words: [] ball 147
>c) whitespaces are cruft.
>
>(having two regexp to specify what is a words but they cannot mix).
Combine regex with '|' to get:
[[:alnum:]]+|[]{}[,]
>If everything works as I think (it's late night :-) with the above two
> lines:
>
>matrix[a,b,c]
>matrix{d,b,c}
>
>the word diff would be
>
>matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>
For this specific case, the regex "[^[:space:]]" by itself should work,
although it would end up being a character-by-character diff.
The regex you built from your description "[[:alnum:]]+|[]}{[,]" would also
give the same diff. However:
-dont
+don't
gives a word diff of:
don't
not:
don<RED>'<RESET>t
because "'" is not recognized as part of any word, so it is considered
ignorable.
There was a patch that included documentation that most users should add
"|[^[:space:]]" to the end of their regex, to capture all non-whitespace
characters that are not otherwise part of a word as individual,
single-character "words".
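The effect of that suffix can be checked with a small sketch (not code from
any of the patches): it counts the non-whitespace bytes that no match of
the regex covers, i.e. the bytes that would be treated as ignorable. The
helper name count_ignored() is hypothetical, and the loop only uses POSIX
regcomp()/regexec().

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): count the non-whitespace bytes
 * of `text` that no match of the POSIX extended regex `pattern` covers.
 * Returns -1 if the pattern does not compile.
 */
static int count_ignored(const char *text, const char *pattern)
{
	regex_t re;
	regmatch_t m;
	int ignored = 0;

	assert(text && pattern);
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (*text) {
		size_t gap, len;

		if (regexec(&re, text, 1, &m, 0)) {
			gap = strlen(text);	/* no further word */
			len = 0;
		} else {
			gap = (size_t)m.rm_so;
			len = (size_t)(m.rm_eo - m.rm_so);
			if (!len && text[gap])	/* empty match: skip a byte */
				gap++;
		}
		/* bytes in the gap belong to no word */
		for (; gap; gap--, text++)
			if (!isspace((unsigned char)*text))
				ignored++;
		text += len;
	}
	regfree(&re);
	return ignored;
}
```

For "don't", the regex "[[:alnum:]]+|[]{}[,]" leaves one byte (the
apostrophe) uncovered, while the same regex with "|[^[:space:]]" appended
leaves none.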
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 0:10 ` Santi Béjar
2009-01-16 1:37 ` Junio C Hamano
2009-01-16 1:42 ` Boyd Stephen Smith Jr.
@ 2009-01-16 1:55 ` Johannes Schindelin
2009-01-16 9:02 ` Santi Béjar
2 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-16 1:55 UTC (permalink / raw)
To: Santi Béjar; +Cc: Junio C Hamano, Teemu Likonen, Thomas Rast, git
Hi,
On Fri, 16 Jan 2009, Santi Béjar wrote:
> If everything works as I think (it's late night :-) with the above two lines:
>
> matrix[a,b,c]
> matrix{d,b,c}
>
> the word diff would be
>
> matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>
So I guess that you want something like
[A-Za-z0-9]+|[^A-Za-z0-9 \t]+
Note: I only want to help you find what you actually want; I am not
trying to find it for you.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 1:55 ` Johannes Schindelin
@ 2009-01-16 9:02 ` Santi Béjar
2009-01-16 11:57 ` Johannes Schindelin
` (2 more replies)
0 siblings, 3 replies; 109+ messages in thread
From: Santi Béjar @ 2009-01-16 9:02 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Junio C Hamano, Boyd Stephen Smith Jr., Teemu Likonen,
Thomas Rast, git
2009/1/16 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> Hi,
>
> On Fri, 16 Jan 2009, Santi Béjar wrote:
>
>> If everything works as I think (it's late night :-) with the above two lines:
>>
>> matrix[a,b,c]
>> matrix{d,b,c}
>>
>> the word diff would be
>>
>> matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>
>
> So I guess that you want something like
>
> [A-Za-z0-9]+|[^A-Za-z0-9 \t]+
>
> Note: I only want to help you finding what you actually want, I am not
> trying to find it for you.
>
Thanks all for the answers.
So, I see, it is a matter of finding the right regexp.
But this is the only kind of use case I have, and I think the same holds
for others. So maybe an easier way to specify it would be worthwhile. But
I'll write an alias, as this is the only regexp I would use apart from
the default word diff.
Thanks,
Santi
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 9:02 ` Santi Béjar
@ 2009-01-16 11:57 ` Johannes Schindelin
2009-01-16 12:01 ` Santi Béjar
2009-01-16 16:11 ` [PATCH take 3 0/4] color-words improvements Boyd Stephen Smith Jr.
2 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-16 11:57 UTC (permalink / raw)
To: Santi Béjar
Cc: Junio C Hamano, Boyd Stephen Smith Jr., Teemu Likonen,
Thomas Rast, git
Hi,
On Fri, 16 Jan 2009, Santi Béjar wrote:
> 2009/1/16 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> >
> > On Fri, 16 Jan 2009, Santi Béjar wrote:
> >
> >> If everything works as I think (it's late night :-) with the above
> >> two lines:
> >>
> >> matrix[a,b,c]
> >> matrix{d,b,c}
> >>
> >> the word diff would be
> >>
> >> matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>
> >
> > So I guess that you want something like
> >
> > [A-Za-z0-9]+|[^A-Za-z0-9 \t]+
> >
>
> So, I see, it is a matter of finding the right regexp.
>
> But the only use case for me is of this kind, and I think for the
> others too. So maybe an easier way to specify it could be worth.
Sure. If you can come up with a nice name for it, we could add special
handling for something like "[[:words:]]" expanding into said regexp.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 9:02 ` Santi Béjar
2009-01-16 11:57 ` Johannes Schindelin
@ 2009-01-16 12:01 ` Santi Béjar
2009-01-16 12:40 ` Johannes Schindelin
2009-01-16 19:04 ` Thomas Rast
2009-01-16 16:11 ` [PATCH take 3 0/4] color-words improvements Boyd Stephen Smith Jr.
2 siblings, 2 replies; 109+ messages in thread
From: Santi Béjar @ 2009-01-16 12:01 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Junio C Hamano, Boyd Stephen Smith Jr., Teemu Likonen,
Thomas Rast, git
Hi,
Could you both provide a public repository, so that the latest version
can be tested without having to search for the patches and apply them?
Thanks,
Santi
P.S.: I know it will be rebased.
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 12:01 ` Santi Béjar
@ 2009-01-16 12:40 ` Johannes Schindelin
2009-01-16 19:04 ` Thomas Rast
1 sibling, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-16 12:40 UTC (permalink / raw)
To: Santi Béjar
Cc: Junio C Hamano, Boyd Stephen Smith Jr., Teemu Likonen,
Thomas Rast, git
Hi,
On Fri, 16 Jan 2009, Santi Béjar wrote:
> can you both provide a public repository to be able to test the
> latest version without having to search and apply them?
You will always find my latest version in git://repo.or.cz/git/dscho.git.
Hth,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 9:02 ` Santi Béjar
2009-01-16 11:57 ` Johannes Schindelin
2009-01-16 12:01 ` Santi Béjar
@ 2009-01-16 16:11 ` Boyd Stephen Smith Jr.
2 siblings, 0 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-16 16:11 UTC (permalink / raw)
To: Santi Béjar
Cc: Johannes Schindelin, Junio C Hamano, Teemu Likonen, Thomas Rast,
git
On Friday 2009 January 16 03:02:33 Santi Béjar wrote:
> 2009/1/16 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
>> Hi,
>>
>> On Fri, 16 Jan 2009, Santi Béjar wrote:
>>> If everything works as I think (it's late night :-) with the above two
>>> lines:
>>>
>>> matrix[a,b,c]
>>> matrix{d,b,c}
>>>
>>> the word diff would be
>>>
>>> matrix<RED>[<GREEN>{<RED>a<GREEN>d<RESET>,b,c<RED>]<GREEN>}<RED>
>
>So, I see, it is a matter of finding the right regexp.
>
>But the only use case for me is of this kind, and I think for the
>others too. So maybe an easier way to specify it could be worth. But
>I'll write an alias as this is the only regexp I would use, apart from
>the default word diff.
I think that the C/C++ language word-diff driver would work here, and there
should be a shortcut for that.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 12:01 ` Santi Béjar
2009-01-16 12:40 ` Johannes Schindelin
@ 2009-01-16 19:04 ` Thomas Rast
2009-01-16 21:09 ` Johannes Schindelin
1 sibling, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-16 19:04 UTC (permalink / raw)
To: Santi Béjar
Cc: Johannes Schindelin, Junio C Hamano, Boyd Stephen Smith Jr.,
Teemu Likonen, git
Santi Béjar wrote:
> Hi,
>
> can you both provide a public repository to be able to test the
> latest version without having to search and apply them?
I set up a clone at
git://repo.or.cz/git/trast.git
The respective topics are js/word-diff-p1 and tr/word-diff-p2. For
your testing convenience, there are master/next branches that merge
tr/word-diff-p2 to Junio's master/next. I *think* I should have
gathered all squashes from Dscho, too.
The tip commit has some tweaks to the builtin regexes that aren't in
any mailed version yet; I'll resend RSN.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH take 3 0/4] color-words improvements
2009-01-16 19:04 ` Thomas Rast
@ 2009-01-16 21:09 ` Johannes Schindelin
2009-01-17 16:29 ` [PATCH v4 0/7] customizable --color-words Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-16 21:09 UTC (permalink / raw)
To: Thomas Rast
Cc: Santi Béjar, Junio C Hamano, Boyd Stephen Smith Jr.,
Teemu Likonen, git
Hi,
On Fri, 16 Jan 2009, Thomas Rast wrote:
> The tip commit has some tweaks to the builtin regexes that aren't in any
> mailed version yet; I'll resend RSN.
Note that I applied the "better" fix for the @@ -2 +1,0 @@ issue, but
haven't sent out a redone series.
Thomas, could you pick up the patches from my 'my-next' branch and
maintain an "official" topic branch?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH v4 0/7] customizable --color-words
2009-01-16 21:09 ` Johannes Schindelin
@ 2009-01-17 16:29 ` Thomas Rast
2009-01-17 16:29 ` [PATCH v4 1/7] Add color_fwrite_lines(), a function coloring each line individually Thomas Rast
` (2 more replies)
0 siblings, 3 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
Johannes Schindelin wrote:
> Thomas, could you pick up the patches from my 'my-next' branch and
> maintain an "official" topic branch?
I cherry-picked the three commits you had there, and rebuilt on top.
I pushed them to
git://repo.or.cz/git/trast.git tr/word-diff-p2
again (js/word-diff-p1 again points directly at your half).
The changes on your side since my last push (that is, since the last
patches and squashes you sent, which I had collected) were only a pair of
quotes changed from double to single.
On my side I mainly tweaked the TeX pattern, since I noticed that it
didn't match many non-alnums such as (), and therefore declared them
unchanged:
- "\\\\[a-zA-Z@]+|[][{}]|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
+ "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+|[^[:space:]]"),
I also added a clause to the C++ pattern to allow it to match
declarations such as
int Foo::bar(...)
(it would give up on the :: before).
Johannes Schindelin (4):
Add color_fwrite_lines(), a function coloring each line individually
color-words: refactor word splitting and use ALLOC_GROW()
color-words: change algorithm to allow for 0-character word
boundaries
color-words: take an optional regular expression describing words
Thomas Rast (3):
color-words: enable REG_NEWLINE to help user
color-words: expand docs with precise semantics
color-words: make regex configurable via attributes
Documentation/diff-options.txt | 17 +++-
Documentation/gitattributes.txt | 21 ++++
color.c | 28 +++++
color.h | 1 +
diff.c | 222 ++++++++++++++++++++++++++-------------
diff.h | 1 +
t/t4034-diff-words.sh | 159 ++++++++++++++++++++++++++++
userdiff.c | 78 +++++++++++---
userdiff.h | 1 +
9 files changed, 440 insertions(+), 88 deletions(-)
create mode 100755 t/t4034-diff-words.sh
^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH v4 1/7] Add color_fwrite_lines(), a function coloring each line individually
2009-01-17 16:29 ` [PATCH v4 0/7] customizable --color-words Thomas Rast
@ 2009-01-17 16:29 ` Thomas Rast
2009-01-17 16:29 ` [PATCH v4 2/7] color-words: refactor word splitting and use ALLOC_GROW() Thomas Rast
2009-01-18 15:05 ` [PATCH v4 0/7] customizable --color-words Santi Béjar
2009-01-19 22:47 ` Santi Béjar
2 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
From: Johannes Schindelin <johannes.schindelin@gmx.de>
We have to set the color before every line and reset it before every
newline. Add a function color_fwrite_lines() which does that for us.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
color.c | 28 ++++++++++++++++++++++++++++
color.h | 1 +
2 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/color.c b/color.c
index fc0b72a..d4ae83f 100644
--- a/color.c
+++ b/color.c
@@ -191,3 +191,31 @@ int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...)
va_end(args);
return r;
}
+
+/*
+ * This function splits the buffer by newlines and colors the lines individually.
+ *
+ * Returns 0 on success.
+ */
+int color_fwrite_lines(FILE *fp, const char *color,
+ size_t count, const char *buf)
+{
+ if (!*color)
+ return fwrite(buf, count, 1, fp) != 1;
+ while (count) {
+ char *p = memchr(buf, '\n', count);
+ if (p != buf && (fputs(color, fp) < 0 ||
+ fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
+ fputs(COLOR_RESET, fp) < 0))
+ return -1;
+ if (!p)
+ return 0;
+ if (fputc('\n', fp) < 0)
+ return -1;
+ count -= p + 1 - buf;
+ buf = p + 1;
+ }
+ return 0;
+}
+
+
diff --git a/color.h b/color.h
index 6cf5c88..cd5c985 100644
--- a/color.h
+++ b/color.h
@@ -19,5 +19,6 @@
void color_parse(const char *var, const char *value, char *dst);
int color_fprintf(FILE *fp, const char *color, const char *fmt, ...);
int color_fprintf_ln(FILE *fp, const char *color, const char *fmt, ...);
+int color_fwrite_lines(FILE *fp, const char *color, size_t count, const char *buf);
#endif /* COLOR_H */
--
1.6.1.315.g92577
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH v4 2/7] color-words: refactor word splitting and use ALLOC_GROW()
2009-01-17 16:29 ` [PATCH v4 1/7] Add color_fwrite_lines(), a function coloring each line individually Thomas Rast
@ 2009-01-17 16:29 ` Thomas Rast
2009-01-17 16:29 ` [PATCH v4 3/7] color-words: change algorithm to allow for 0-character word boundaries Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Word splitting is now performed by the function diff_words_fill(),
avoiding having the same code twice.
In the same spirit, avoid duplicating the code of ALLOC_GROW().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
diff.c | 40 +++++++++++++++++++---------------------
1 files changed, 19 insertions(+), 21 deletions(-)
diff --git a/diff.c b/diff.c
index d235482..c111eef 100644
--- a/diff.c
+++ b/diff.c
@@ -326,10 +326,7 @@ struct diff_words_buffer {
static void diff_words_append(char *line, unsigned long len,
struct diff_words_buffer *buffer)
{
- if (buffer->text.size + len > buffer->alloc) {
- buffer->alloc = (buffer->text.size + len) * 3 / 2;
- buffer->text.ptr = xrealloc(buffer->text.ptr, buffer->alloc);
- }
+ ALLOC_GROW(buffer->text.ptr, buffer->text.size + len, buffer->alloc);
line++;
len--;
memcpy(buffer->text.ptr + buffer->text.size, line, len);
@@ -398,6 +395,22 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+/*
+ * This function splits the words in buffer->text, and stores the list with
+ * newline separator into out.
+ */
+static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
+{
+ int i;
+ out->size = buffer->text.size;
+ out->ptr = xmalloc(out->size);
+ memcpy(out->ptr, buffer->text.ptr, out->size);
+ for (i = 0; i < out->size; i++)
+ if (isspace(out->ptr[i]))
+ out->ptr[i] = '\n';
+ buffer->current = 0;
+}
+
/* this executes the word diff on the accumulated buffers */
static void diff_words_show(struct diff_words_data *diff_words)
{
@@ -405,26 +418,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitconf_t xecfg;
xdemitcb_t ecb;
mmfile_t minus, plus;
- int i;
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
- minus.size = diff_words->minus.text.size;
- minus.ptr = xmalloc(minus.size);
- memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
- for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
- minus.ptr[i] = '\n';
- diff_words->minus.current = 0;
-
- plus.size = diff_words->plus.text.size;
- plus.ptr = xmalloc(plus.size);
- memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
- for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
- plus.ptr[i] = '\n';
- diff_words->plus.current = 0;
-
+ diff_words_fill(&diff_words->minus, &minus);
+ diff_words_fill(&diff_words->plus, &plus);
xpp.flags = XDF_NEED_MINIMAL;
xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
--
1.6.1.315.g92577
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH v4 3/7] color-words: change algorithm to allow for 0-character word boundaries
2009-01-17 16:29 ` [PATCH v4 2/7] color-words: refactor word splitting and use ALLOC_GROW() Thomas Rast
@ 2009-01-17 16:29 ` Thomas Rast
2009-01-17 16:29 ` [PATCH v4 4/7] color-words: take an optional regular expression describing words Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Up until now, the color-words code assumed that word boundaries are
identical to white space characters.
Therefore, it could get away with a very simple scheme: it copied the
hunks, substituted newlines for each white space character, called
libxdiff with the processed text, and then identified the text to
output by the offsets (which agreed since the original text had the
same length).
This code was ugly, for a number of reasons:
- it was impossible to introduce 0-character word boundaries,
- we had to print everything word by word, and
- the code needed extra special handling of newlines in the removed part.
Fix all of these issues by processing the text such that
- we build word lists, separated by newlines,
- we remember the original offsets for every word, and
- after calling libxdiff on the wordlists, we parse the hunk headers, and
find the corresponding offsets, and then
- we print the removed/added parts in one go.
The pre and post samples in the test were provided by Santi Béjar.
Note that there is some strange special handling of hunk headers where
one line range is 0 due to POSIX: in this case, the start is one too
low. In other words a hunk header '@@ -1,0 +2 @@' actually means that
the line must be added after the _second_ line of the pre text, _not_
the first.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
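For readers who want to experiment with headers such as '@@ -1,0 +2 @@',
the following is a rough, self-contained stand-in for the parsing step. It
is not the parse_hunk_header() the patch calls; the names parse_range()
and parse_header() are hypothetical, and it only assumes the usual unified
diff convention that a range without ",count" has a count of 1.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one range ("-1,0" or "+2") at *p and
 * advance *p past it.  A missing ",count" means a count of 1.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_range(const char **p, char sign, int *first, int *count)
{
	int used = 0;

	if (**p != sign || sscanf(*p + 1, "%d%n", first, &used) != 1)
		return -1;
	*p += 1 + used;
	*count = 1;
	if (**p == ',') {
		if (sscanf(*p + 1, "%d%n", count, &used) != 1)
			return -1;
		*p += 1 + used;
	}
	return 0;
}

/*
 * Hypothetical stand-in for a hunk header parser: fill in the start and
 * length of the old (ob, on) and new (nb, nn) ranges.
 * Returns 0 on success, -1 if `line` is not a hunk header.
 */
static int parse_header(const char *line, int *ob, int *on, int *nb, int *nn)
{
	assert(line && ob && on && nb && nn);
	if (line[0] != '@' || line[1] != '@' || line[2] != ' ')
		return -1;
	line += 3;
	if (parse_range(&line, '-', ob, on) || *line++ != ' ')
		return -1;
	return parse_range(&line, '+', nb, nn);
}
```

With this sketch, '@@ -1,0 +2 @@' yields an old range starting at 1 with
length 0 and a new range starting at 2 with length 1; the zero-length case
is the one the callback has to treat specially.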
diff.c | 161 ++++++++++++++++++++++++++++---------------------
t/t4034-diff-words.sh | 66 ++++++++++++++++++++
2 files changed, 159 insertions(+), 68 deletions(-)
create mode 100755 t/t4034-diff-words.sh
diff --git a/diff.c b/diff.c
index c111eef..37c886a 100644
--- a/diff.c
+++ b/diff.c
@@ -319,8 +319,10 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
struct diff_words_buffer {
mmfile_t text;
long alloc;
- long current; /* output pointer */
- int suppressed_newline;
+ struct diff_words_orig {
+ const char *begin, *end;
+ } *orig;
+ int orig_nr, orig_alloc;
};
static void diff_words_append(char *line, unsigned long len,
@@ -335,80 +337,89 @@ static void diff_words_append(char *line, unsigned long len,
struct diff_words_data {
struct diff_words_buffer minus, plus;
+ const char *current_plus;
FILE *file;
};
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
- int suppress_newline)
+static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
- const char *ptr;
- int eol = 0;
+ struct diff_words_data *diff_words = priv;
+ int minus_first, minus_len, plus_first, plus_len;
+ const char *minus_begin, *minus_end, *plus_begin, *plus_end;
- if (len == 0)
+ if (line[0] != '@' || parse_hunk_header(line, len,
+ &minus_first, &minus_len, &plus_first, &plus_len))
return;
- ptr = buffer->text.ptr + buffer->current;
- buffer->current += len;
+ /* POSIX requires that first be decremented by one if len == 0... */
+ if (minus_len) {
+ minus_begin = diff_words->minus.orig[minus_first].begin;
+ minus_end =
+ diff_words->minus.orig[minus_first + minus_len - 1].end;
+ } else
+ minus_begin = minus_end =
+ diff_words->minus.orig[minus_first].end;
- if (ptr[len - 1] == '\n') {
- eol = 1;
- len--;
- }
+ if (plus_len) {
+ plus_begin = diff_words->plus.orig[plus_first].begin;
+ plus_end = diff_words->plus.orig[plus_first + plus_len - 1].end;
+ } else
+ plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
- fputs(diff_get_color(1, color), file);
- fwrite(ptr, len, 1, file);
- fputs(diff_get_color(1, DIFF_RESET), file);
+ if (diff_words->current_plus != plus_begin)
+ fwrite(diff_words->current_plus,
+ plus_begin - diff_words->current_plus, 1,
+ diff_words->file);
+ if (minus_begin != minus_end)
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_OLD),
+ minus_end - minus_begin, minus_begin);
+ if (plus_begin != plus_end)
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_NEW),
+ plus_end - plus_begin, plus_begin);
- if (eol) {
- if (suppress_newline)
- buffer->suppressed_newline = 1;
- else
- putc('\n', file);
- }
-}
-
-static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
-{
- struct diff_words_data *diff_words = priv;
-
- if (diff_words->minus.suppressed_newline) {
- if (line[0] != '+')
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
-
- len--;
- switch (line[0]) {
- case '-':
- print_word(diff_words->file,
- &diff_words->minus, len, DIFF_FILE_OLD, 1);
- break;
- case '+':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_FILE_NEW, 0);
- break;
- case ' ':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_PLAIN, 0);
- diff_words->minus.current += len;
- break;
- }
+ diff_words->current_plus = plus_end;
}
/*
- * This function splits the words in buffer->text, and stores the list with
- * newline separator into out.
+ * This function splits the words in buffer->text, stores the list with
+ * newline separator into out, and saves the offsets of the original words
+ * in buffer->orig.
*/
static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
{
- int i;
- out->size = buffer->text.size;
- out->ptr = xmalloc(out->size);
- memcpy(out->ptr, buffer->text.ptr, out->size);
- for (i = 0; i < out->size; i++)
- if (isspace(out->ptr[i]))
- out->ptr[i] = '\n';
- buffer->current = 0;
+ int i, j;
+
+ out->size = 0;
+ out->ptr = xmalloc(buffer->text.size);
+
+ /* fake an empty "0th" word */
+ ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
+ buffer->orig[0].begin = buffer->orig[0].end = buffer->text.ptr;
+ buffer->orig_nr = 1;
+
+ for (i = 0; i < buffer->text.size; i++) {
+ if (isspace(buffer->text.ptr[i]))
+ continue;
+ for (j = i + 1; j < buffer->text.size &&
+ !isspace(buffer->text.ptr[j]); j++)
+ ; /* find the end of the word */
+
+ /* store original boundaries */
+ ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
+ buffer->orig_alloc);
+ buffer->orig[buffer->orig_nr].begin = buffer->text.ptr + i;
+ buffer->orig[buffer->orig_nr].end = buffer->text.ptr + j;
+ buffer->orig_nr++;
+
+ /* store one word */
+ memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
+ out->ptr[out->size + j - i] = '\n';
+ out->size += j - i + 1;
+
+ i = j - 1;
+ }
}
/* this executes the word diff on the accumulated buffers */
@@ -419,22 +430,34 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitcb_t ecb;
mmfile_t minus, plus;
+ /* special case: only removal */
+ if (!diff_words->plus.text.size) {
+ color_fwrite_lines(diff_words->file,
+ diff_get_color(1, DIFF_FILE_OLD),
+ diff_words->minus.text.size, diff_words->minus.text.ptr);
+ diff_words->minus.text.size = 0;
+ return;
+ }
+
+ diff_words->current_plus = diff_words->plus.text.ptr;
+
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
diff_words_fill(&diff_words->minus, &minus);
diff_words_fill(&diff_words->plus, &plus);
xpp.flags = XDF_NEED_MINIMAL;
- xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
+ xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
&xpp, &xecfg, &ecb);
free(minus.ptr);
free(plus.ptr);
+ if (diff_words->current_plus != diff_words->plus.text.ptr +
+ diff_words->plus.text.size)
+ fwrite(diff_words->current_plus,
+ diff_words->plus.text.ptr + diff_words->plus.text.size
+ - diff_words->current_plus, 1,
+ diff_words->file);
diff_words->minus.text.size = diff_words->plus.text.size = 0;
-
- if (diff_words->minus.suppressed_newline) {
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
}
typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
@@ -458,7 +481,9 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
diff_words_show(ecbdata->diff_words);
free (ecbdata->diff_words->minus.text.ptr);
+ free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
+ free (ecbdata->diff_words->plus.orig);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
new file mode 100755
index 0000000..b22195f
--- /dev/null
+++ b/t/t4034-diff-words.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+
+test_description='word diff colors'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+
+ git config diff.color.old red
+ git config diff.color.new green
+
+'
+
+decrypt_color () {
+ sed \
+ -e 's/.\[1m/<WHITE>/g' \
+ -e 's/.\[31m/<RED>/g' \
+ -e 's/.\[32m/<GREEN>/g' \
+ -e 's/.\[36m/<BROWN>/g' \
+ -e 's/.\[m/<RESET>/g'
+}
+
+word_diff () {
+ test_must_fail git diff --no-index "$@" pre post > output &&
+ decrypt_color < output > output.decrypted &&
+ test_cmp expect output.decrypted
+}
+
+cat > pre <<\EOF
+h(4)
+
+a = b + c
+EOF
+
+cat > post <<\EOF
+h(4),hh[44]
+
+a = b + c
+
+aa = a
+
+aeff = aeff * ( aaa )
+EOF
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+<RED>h(4)<RESET><GREEN>h(4),hh[44]<RESET>
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+test_expect_success 'word diff with runs of whitespace' '
+
+ word_diff --color-words
+
+'
+
+test_done
--
1.6.1.315.g92577
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH v4 4/7] color-words: take an optional regular expression describing words
2009-01-17 16:29 ` [PATCH v4 3/7] color-words: change algorithm to allow for 0-character word boundaries Thomas Rast
@ 2009-01-17 16:29 ` Thomas Rast
2009-01-17 16:29 ` [PATCH v4 5/7] color-words: enable REG_NEWLINE to help user Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
From: Johannes Schindelin <johannes.schindelin@gmx.de>
In some applications, words are not delimited by white space. To
allow for that, you can specify a regular expression describing
what makes a word with
git diff --color-words='[A-Za-z0-9]+'
Note that words cannot contain newline characters.
As suggested by Thomas Rast, the words are the exact matches of the
regular expression.
Note that a regular expression beginning with a '^' will match only
a word at the beginning of the hunk, not a word at the beginning of
a line, and is probably not what you want.
This commit contains a quoting fix by Thomas Rast.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Documentation/diff-options.txt | 6 +++-
diff.c | 64 ++++++++++++++++++++++++++++++++++-----
diff.h | 1 +
t/t4034-diff-words.sh | 57 +++++++++++++++++++++++++++++++++++
4 files changed, 118 insertions(+), 10 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 43793d7..2c1fa4b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -91,8 +91,12 @@ endif::git-format-patch[]
Turn off colored diff, even when the configuration file
gives the default to color output.
---color-words::
+--color-words[=regex]::
Show colored word diff, i.e. color words which have changed.
++
+Optionally, you can pass a regular expression that tells Git what the
+words are that you are looking for; The default is to interpret any
+stretch of non-whitespace as a word.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index 37c886a..9fb3d0d 100644
--- a/diff.c
+++ b/diff.c
@@ -333,12 +333,14 @@ static void diff_words_append(char *line, unsigned long len,
len--;
memcpy(buffer->text.ptr + buffer->text.size, line, len);
buffer->text.size += len;
+ buffer->text.ptr[buffer->text.size] = '\0';
}
struct diff_words_data {
struct diff_words_buffer minus, plus;
const char *current_plus;
FILE *file;
+ regex_t *word_regex;
};
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
@@ -382,17 +384,49 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
diff_words->current_plus = plus_end;
}
+/* This function starts looking at *begin, and returns 0 iff a word was found. */
+static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
+ int *begin, int *end)
+{
+ if (word_regex && *begin < buffer->size) {
+ regmatch_t match[1];
+ if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+ char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
+ '\n', match[0].rm_eo - match[0].rm_so);
+ *end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
+ *begin += match[0].rm_so;
+ return *begin >= *end;
+ }
+ return -1;
+ }
+
+ /* find the next word */
+ while (*begin < buffer->size && isspace(buffer->ptr[*begin]))
+ (*begin)++;
+ if (*begin >= buffer->size)
+ return -1;
+
+ /* find the end of the word */
+ *end = *begin + 1;
+ while (*end < buffer->size && !isspace(buffer->ptr[*end]))
+ (*end)++;
+
+ return 0;
+}
+
/*
* This function splits the words in buffer->text, stores the list with
* newline separator into out, and saves the offsets of the original words
* in buffer->orig.
*/
-static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
+static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
+ regex_t *word_regex)
{
int i, j;
+ long alloc = 0;
out->size = 0;
- out->ptr = xmalloc(buffer->text.size);
+ out->ptr = NULL;
/* fake an empty "0th" word */
ALLOC_GROW(buffer->orig, 1, buffer->orig_alloc);
@@ -400,11 +434,8 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr = 1;
for (i = 0; i < buffer->text.size; i++) {
- if (isspace(buffer->text.ptr[i]))
- continue;
- for (j = i + 1; j < buffer->text.size &&
- !isspace(buffer->text.ptr[j]); j++)
- ; /* find the end of the word */
+ if (find_word_boundaries(&buffer->text, word_regex, &i, &j))
+ return;
/* store original boundaries */
ALLOC_GROW(buffer->orig, buffer->orig_nr + 1,
@@ -414,6 +445,7 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out)
buffer->orig_nr++;
/* store one word */
+ ALLOC_GROW(out->ptr, out->size + j - i + 1, alloc);
memcpy(out->ptr + out->size, buffer->text.ptr + i, j - i);
out->ptr[out->size + j - i] = '\n';
out->size += j - i + 1;
@@ -443,9 +475,10 @@ static void diff_words_show(struct diff_words_data *diff_words)
memset(&xpp, 0, sizeof(xpp));
memset(&xecfg, 0, sizeof(xecfg));
- diff_words_fill(&diff_words->minus, &minus);
- diff_words_fill(&diff_words->plus, &plus);
+ diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
+ diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
xpp.flags = XDF_NEED_MINIMAL;
+ /* as only the hunk header will be parsed, we need a 0-context */
xecfg.ctxlen = 0;
xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
&xpp, &xecfg, &ecb);
@@ -484,6 +517,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
free (ecbdata->diff_words->minus.orig);
free (ecbdata->diff_words->plus.text.ptr);
free (ecbdata->diff_words->plus.orig);
+ free(ecbdata->diff_words->word_regex);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
@@ -1506,6 +1540,14 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
ecbdata.diff_words->file = o->file;
+ if (o->word_regex) {
+ ecbdata.diff_words->word_regex = (regex_t *)
+ xmalloc(sizeof(regex_t));
+ if (regcomp(ecbdata.diff_words->word_regex,
+ o->word_regex, REG_EXTENDED))
+ die ("Invalid regular expression: %s",
+ o->word_regex);
+ }
}
xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
&xpp, &xecfg, &ecb);
@@ -2517,6 +2559,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
DIFF_OPT_CLR(options, COLOR_DIFF);
else if (!strcmp(arg, "--color-words"))
options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ else if (!prefixcmp(arg, "--color-words=")) {
+ options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ options->word_regex = arg + 14;
+ }
else if (!strcmp(arg, "--exit-code"))
DIFF_OPT_SET(options, EXIT_WITH_STATUS);
else if (!strcmp(arg, "--quiet"))
diff --git a/diff.h b/diff.h
index 4d5a327..23cd90c 100644
--- a/diff.h
+++ b/diff.h
@@ -98,6 +98,7 @@ struct diff_options {
int stat_width;
int stat_name_width;
+ const char *word_regex;
/* this is set by diffcore for DIFF_FORMAT_PATCH */
int found_changes;
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index b22195f..4873486 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -63,4 +63,61 @@ test_expect_success 'word diff with runs of whitespace' '
'
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+h(4),<GREEN>hh<RESET>[44]
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa<RESET> )
+EOF
+
+test_expect_success 'word diff with a regular expression' '
+
+ word_diff --color-words="[a-z]+"
+
+'
+
+echo 'aaa (aaa)' > pre
+echo 'aaa (aaa) aaa' > post
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index c29453b..be22f37 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1 +1 @@<RESET>
+aaa (aaa) <GREEN>aaa<RESET>
+EOF
+
+test_expect_success 'test parsing words for newline' '
+
+ word_diff --color-words="a+"
+
+'
+
+echo '(:' > pre
+echo '(' > post
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 289cb9d..2d06f37 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1 +1 @@<RESET>
+(<RED>:<RESET>
+EOF
+
+test_expect_success 'test when words are only removed at the end' '
+
+ word_diff --color-words=.
+
+'
+
test_done
--
1.6.1.315.g92577
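The word-splitting rule described in the commit message above (the words are the exact matches of the regular expression) can be sketched outside of Git with plain POSIX regex calls. This is only an illustration, not the code from the patch: `list_words` is a hypothetical helper, it works on a NUL-terminated string rather than an mmfile_t, and it skips empty matches one byte at a time as an assumption of this sketch.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): print every non-empty
 * match of `pattern` in `text`, one per line. Returns the number of
 * words found, or -1 if the pattern does not compile.
 */
static int list_words(const char *text, const char *pattern)
{
	regex_t re;
	regmatch_t m[1];
	size_t pos = 0, len = strlen(text);
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	/* restart the search right after the previous match */
	while (pos < len && !regexec(&re, text + pos, 1, m, 0)) {
		if (m[0].rm_eo == m[0].rm_so) {
			/* an empty match would never advance; skip one byte */
			pos += m[0].rm_so + 1;
			continue;
		}
		printf("%.*s\n", (int)(m[0].rm_eo - m[0].rm_so),
		       text + pos + m[0].rm_so);
		pos += m[0].rm_eo;
		count++;
	}
	regfree(&re);
	return count;
}
```

Called as list_words("h(4),hh[44]", "[A-Za-z0-9]+"), this finds the four words h, 4, hh and 44; the punctuation between them belongs to no word.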
* [PATCH v4 5/7] color-words: enable REG_NEWLINE to help user
2009-01-17 16:29 ` [PATCH v4 4/7] color-words: take an optional regular expression describing words Thomas Rast
@ 2009-01-17 16:29 ` Thomas Rast
2009-01-17 16:29 ` [PATCH v4 6/7] color-words: expand docs with precise semantics Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
We silently truncate a match at the newline, which may lead to
unexpected behaviour, e.g., when matching "<[^>]*>" against
<foo
bar>
since then "<foo" becomes a word (and "bar>" doesn't!) even though the
regex said only angle-bracket-delimited things can be words.
To alleviate the problem slightly, use REG_NEWLINE so that negated
classes can't match a newline. Of course newlines can still be
matched explicitly.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
diff.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/diff.c b/diff.c
index 9fb3d0d..00c661f 100644
--- a/diff.c
+++ b/diff.c
@@ -1544,7 +1544,8 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words->word_regex = (regex_t *)
xmalloc(sizeof(regex_t));
if (regcomp(ecbdata.diff_words->word_regex,
- o->word_regex, REG_EXTENDED))
+ o->word_regex,
+ REG_EXTENDED | REG_NEWLINE))
die ("Invalid regular expression: %s",
o->word_regex);
}
--
1.6.1.315.g92577
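The effect of REG_NEWLINE on the "<[^>]*>" example from the commit message can be checked with a small standalone program. This is a sketch using only POSIX regcomp()/regexec(); `first_match_len` is a hypothetical helper introduced for the illustration and does not appear in diff.c.

```c
#include <regex.h>

/*
 * Hypothetical helper: length of the first match of `pattern` in `text`
 * when compiled with REG_EXTENDED plus `extra_cflags`. Returns -1 if
 * there is no match or the pattern is invalid.
 */
static int first_match_len(const char *text, const char *pattern,
			   int extra_cflags)
{
	regex_t re;
	regmatch_t m[1];
	int len = -1;

	if (regcomp(&re, pattern, REG_EXTENDED | extra_cflags))
		return -1;
	if (!regexec(&re, text, 1, m, 0))
		len = (int)(m[0].rm_eo - m[0].rm_so);
	regfree(&re);
	return len;
}
```

Without REG_NEWLINE, the negated class happily crosses the line break, so the match spans both lines and would then be truncated at the newline. With REG_NEWLINE, "[^>]" cannot match the newline, so "<foo" followed by a line break no longer starts a match at all.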
* [PATCH v4 6/7] color-words: expand docs with precise semantics
2009-01-17 16:29 ` [PATCH v4 5/7] color-words: enable REG_NEWLINE to help user Thomas Rast
@ 2009-01-17 16:29 ` Thomas Rast
2009-01-17 16:29 ` [PATCH v4 7/7] color-words: make regex configurable via attributes Thomas Rast
0 siblings, 1 reply; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
Documentation/diff-options.txt | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 2c1fa4b..8689a92 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -91,12 +91,17 @@ endif::git-format-patch[]
Turn off colored diff, even when the configuration file
gives the default to color output.
---color-words[=regex]::
- Show colored word diff, i.e. color words which have changed.
+--color-words[=<regex>]::
+ Show colored word diff, i.e., color words which have changed.
+ By default, words are separated by whitespace.
+
-Optionally, you can pass a regular expression that tells Git what the
-words are that you are looking for; The default is to interpret any
-stretch of non-whitespace as a word.
+When a <regex> is specified, every non-overlapping match of the
+<regex> is considered a word. Anything between these matches is
+considered whitespace and ignored(!) for the purposes of finding
+differences. You may want to append `|[^[:space:]]` to your regular
+expression to make sure that it matches all non-whitespace characters.
+A match that contains a newline is silently truncated(!) at the
+newline.
--no-renames::
Turn off rename detection, even when the configuration
--
1.6.1.315.g92577
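The documentation's advice to append `|[^[:space:]]` can be made concrete by counting how many non-whitespace bytes a given regex leaves outside of every match, since those bytes are what the word diff would ignore. The following is a rough sketch under that reading of the docs; `count_ignored` is a hypothetical helper, not something Git provides.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: count the non-whitespace bytes of `text` that are
 * not covered by any match of `pattern`. Returns -1 if the pattern does
 * not compile.
 */
static int count_ignored(const char *text, const char *pattern)
{
	regex_t re;
	regmatch_t m[1];
	size_t pos = 0, len = strlen(text);
	int ignored = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE))
		return -1;

	while (pos < len) {
		size_t gap_end, next, i;

		if (regexec(&re, text + pos, 1, m, 0)) {
			/* no further match: the rest of the text is a gap */
			gap_end = next = len;
		} else if (m[0].rm_eo == m[0].rm_so) {
			/* empty match: treat one more byte as gap to advance */
			gap_end = next = pos + m[0].rm_so + 1;
			if (next > len)
				gap_end = next = len;
		} else {
			gap_end = pos + m[0].rm_so;
			next = pos + m[0].rm_eo;
		}
		/* bytes in [pos, gap_end) lie between two matches */
		for (i = pos; i < gap_end; i++)
			if (!isspace((unsigned char)text[i]))
				ignored++;
		pos = next;
	}
	regfree(&re);
	return ignored;
}
```

For the test line "h(4),hh[44]", the regex "[a-z]+" leaves eight bytes of punctuation and digits uncovered, while "[a-z]+|[^[:space:]]" leaves none.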
* [PATCH v4 7/7] color-words: make regex configurable via attributes
2009-01-17 16:29 ` [PATCH v4 6/7] color-words: expand docs with precise semantics Thomas Rast
@ 2009-01-17 16:29 ` Thomas Rast
0 siblings, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-17 16:29 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Boyd Stephen Smith Jr., Teemu Likonen
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute. The user can then set the
driver on a file in .gitattributes. If a regex is given on the
command line, it overrides the driver's setting.
We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
Documentation/diff-options.txt | 4 ++
Documentation/gitattributes.txt | 21 ++++++++++
diff.c | 10 +++++
t/t4034-diff-words.sh | 36 ++++++++++++++++++
userdiff.c | 78 +++++++++++++++++++++++++++++++-------
userdiff.h | 1 +
6 files changed, 135 insertions(+), 15 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 8689a92..1edb82e 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -102,6 +102,10 @@ differences. You may want to append `|[^[:space:]]` to your regular
expression to make sure that it matches all non-whitespace characters.
A match that contains a newline is silently truncated(!) at the
newline.
++
+The regex can also be set via a diff driver, see
+linkgit:gitattributes[1]; giving it explicitly overrides any diff
+driver setting.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8af22ec..ba3ba12 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -317,6 +317,8 @@ patterns are available:
- `bibtex` suitable for files with BibTeX coded references.
+- `cpp` suitable for source code in the C and C++ languages.
+
- `html` suitable for HTML/XHTML documents.
- `java` suitable for source code in the Java language.
@@ -334,6 +336,25 @@ patterns are available:
- `tex` suitable for source code for LaTeX documents.
+Customizing word diff
+^^^^^^^^^^^^^^^^^^^^^
+
+You can customize the rules that `git diff --color-words` uses to
+split words in a line, by specifying an appropriate regular expression
+in the "diff.*.wordregex" configuration variable. For example, in TeX
+a backslash followed by a sequence of letters forms a command, but
+several such commands can be run together without intervening
+whitespace. To separate them, use a regular expression such as
+
+------------------------
+[diff "tex"]
+ wordregex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
+------------------------
+
+A built-in pattern is provided for all languages listed in the
+previous section.
+
+
Performing text diffs of binary files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/diff.c b/diff.c
index 00c661f..9fcde96 100644
--- a/diff.c
+++ b/diff.c
@@ -1380,6 +1380,12 @@ int diff_filespec_is_binary(struct diff_filespec *one)
return one->driver->funcname.pattern ? &one->driver->funcname : NULL;
}
+static const char *userdiff_word_regex(struct diff_filespec *one)
+{
+ diff_filespec_load_driver(one);
+ return one->driver->word_regex;
+}
+
void diff_set_mnemonic_prefix(struct diff_options *options, const char *a, const char *b)
{
if (!options->a_prefix)
@@ -1540,6 +1546,10 @@ static void builtin_diff(const char *name_a,
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
ecbdata.diff_words->file = o->file;
+ if (!o->word_regex)
+ o->word_regex = userdiff_word_regex(one);
+ if (!o->word_regex)
+ o->word_regex = userdiff_word_regex(two);
if (o->word_regex) {
ecbdata.diff_words->word_regex = (regex_t *)
xmalloc(sizeof(regex_t));
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 4873486..744221b 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -84,6 +84,41 @@ test_expect_success 'word diff with a regular expression' '
'
+test_expect_success 'set a diff driver' '
+ git config diff.testdriver.wordregex "[^[:space:]]" &&
+ cat <<EOF > .gitattributes
+pre diff=testdriver
+post diff=testdriver
+EOF
+'
+
+test_expect_success 'option overrides default' '
+
+ word_diff --color-words="[a-z]+"
+
+'
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+h(4)<GREEN>,hh[44]<RESET>
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa )<RESET>
+EOF
+
+test_expect_success 'use default supplied by driver' '
+
+ word_diff --color-words
+
+'
+
echo 'aaa (aaa)' > pre
echo 'aaa (aaa) aaa' > post
@@ -100,6 +135,7 @@ test_expect_success 'test parsing words for newline' '
word_diff --color-words="a+"
+
'
echo '(:' > pre
diff --git a/userdiff.c b/userdiff.c
index 3681062..2b55509 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -6,14 +6,20 @@
static int ndrivers;
static int drivers_alloc;
-#define FUNCNAME(name, pattern) \
- { name, NULL, -1, { pattern, REG_EXTENDED } }
+#define PATTERNS(name, pattern, wordregex) \
+ { name, NULL, -1, { pattern, REG_EXTENDED }, wordregex }
static struct userdiff_driver builtin_drivers[] = {
-FUNCNAME("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$"),
-FUNCNAME("java",
+PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
+ "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("java",
"!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
- "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$"),
-FUNCNAME("objc",
+ "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$",
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]="
+ "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("objc",
/* Negate C statements that can look like functions */
"!^[ \t]*(do|for|if|else|return|switch|while)\n"
/* Objective-C methods */
@@ -21,20 +27,60 @@
/* C functions */
"^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$\n"
/* Objective-C class/protocol definitions */
- "^(@(implementation|interface|protocol)[ \t].*)$"),
-FUNCNAME("pascal",
+ "^(@(implementation|interface|protocol)[ \t].*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("pascal",
"^((procedure|function|constructor|destructor|interface|"
"implementation|initialization|finalization)[ \t]*.*)$"
"\n"
- "^(.*=[ \t]*(class|record).*)$"),
-FUNCNAME("php", "^[\t ]*((function|class).*)"),
-FUNCNAME("python", "^[ \t]*((class|def)[ \t].*)$"),
-FUNCNAME("ruby", "^[ \t]*((class|module|def)[ \t].*)$"),
-FUNCNAME("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$"),
-FUNCNAME("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"),
+ "^(.*=[ \t]*(class|record).*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
+ "|<>|<=|>=|:=|\\.\\."
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("php", "^[\t ]*((function|class).*)",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
+ "|[-+*/<>%&^|=!.]=|--|\\+\\+|<<=?|>>=?|===|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"
+ "|[^[:space:]]|[\x80-\xff]+"),
+ /* -- */
+PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
+ /* -- */
+ "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
+ "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"
+ "|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
+ "[={}\"]|[^={}\" \t]+"),
+PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
+ "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+|[^[:space:]]"),
+PATTERNS("cpp",
+ /* Jump targets or access declarations */
+ "!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
+ /* C/++ functions/methods at top level */
+ "^([A-Za-z_][A-Za-z_0-9]*([ \t]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
+ /* compound type at top level */
+ "^((struct|class|enum)[^;]*)$",
+ /* -- */
+ "[a-zA-Z_][a-zA-Z0-9_]*"
+ "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
+ "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"
+ "|[^[:space:]]|[\x80-\xff]+"),
{ "default", NULL, -1, { NULL, 0 } },
};
-#undef FUNCNAME
+#undef PATTERNS
static struct userdiff_driver driver_true = {
"diff=true",
@@ -134,6 +180,8 @@ int userdiff_config(const char *k, const char *v)
return parse_string(&drv->external, k, v);
if ((drv = parse_driver(k, v, "textconv")))
return parse_string(&drv->textconv, k, v);
+ if ((drv = parse_driver(k, v, "wordregex")))
+ return parse_string(&drv->word_regex, k, v);
return 0;
}
diff --git a/userdiff.h b/userdiff.h
index ba29457..c315159 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -11,6 +11,7 @@ struct userdiff_driver {
const char *external;
int binary;
struct userdiff_funcname funcname;
+ const char *word_regex;
const char *textconv;
};
--
1.6.1.315.g92577
* Re: [PATCH v4 0/7] customizable --color-words
2009-01-17 16:29 ` [PATCH v4 0/7] customizable --color-words Thomas Rast
2009-01-17 16:29 ` [PATCH v4 1/7] Add color_fwrite_lines(), a function coloring each line individually Thomas Rast
@ 2009-01-18 15:05 ` Santi Béjar
2009-01-18 15:29 ` Santi Béjar
2009-01-19 22:47 ` Santi Béjar
2 siblings, 1 reply; 109+ messages in thread
From: Santi Béjar @ 2009-01-18 15:05 UTC (permalink / raw)
To: Thomas Rast
Cc: git, Junio C Hamano, Johannes Schindelin, Boyd Stephen Smith Jr.,
Teemu Likonen
2009/1/17 Thomas Rast <trast@student.ethz.ch>:
> Johannes Schindelin wrote:
>> Thomas, could you pick up the patches from my 'my-next' branch and
>> maintain an "official" topic branch?
>
> I cherry-picked the three commits you had there, and rebuilt on top.
> I pushed them to
>
> git://repo.or.cz/git/trast.git tr/word-diff-p2
>
> again (js/word-diff-p1 again points directly at your half).
I've tested tr/word-diff-p2 and I have not found any issues. I've even
tested that nothing changed from the traditional word diff to:
git log -p --color-words="[^[:space:]]+"
for the whole git history.
In the end, I found that the general regex that works best for me is:
"[[:alpha:]]+|[[:digit:]]+|[^[:alnum:][:space:]]"
and that is what I tested.
Santi
* Re: [PATCH v4 0/7] customizable --color-words
2009-01-18 15:05 ` [PATCH v4 0/7] customizable --color-words Santi Béjar
@ 2009-01-18 15:29 ` Santi Béjar
0 siblings, 0 replies; 109+ messages in thread
From: Santi Béjar @ 2009-01-18 15:29 UTC (permalink / raw)
To: Thomas Rast
Cc: git, Junio C Hamano, Johannes Schindelin, Boyd Stephen Smith Jr.,
Teemu Likonen
2009/1/18 Santi Béjar <santi@agolina.net>:
> 2009/1/17 Thomas Rast <trast@student.ethz.ch>:
>> Johannes Schindelin wrote:
>>> Thomas, could you pick up the patches from my 'my-next' branch and
>>> maintain an "official" topic branch?
>>
>> I cherry-picked the three commits you had there, and rebuilt on top.
>> I pushed them to
>>
>> git://repo.or.cz/git/trast.git tr/word-diff-p2
>>
>> again (js/word-diff-p1 again points directly at your half).
>
> I've tested tr/word-diff-p2 and I have not found any issues. I've even
> tested that nothing changed from the traditional word diff to:
>
> git log -p --color-words="[^[:space:]]+"
>
> for the whole git history.
>
What I tested is that the new code produces the same result for these
two commands:
git log -p --color-words="[^[:space:]]+"
git log -p --color-words
The old code produced color codes before and after each word, while
the new code emits them only at the beginning and the end of each
colored run. So the old and new outputs cannot be identical, only
equivalent.
Santi
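One coarse way to compare two such outputs is to strip the color escape sequences and compare what remains. The sketch below assumes the only escapes present are SGR sequences of the form ESC [ digits-and-semicolons m; `strip_sgr` is a hypothetical helper, and the check verifies only that the text agrees, not that the same words end up colored.

```c
/*
 * Hypothetical helper: copy `in` to `out`, dropping ANSI SGR sequences
 * (ESC '[' followed by digits and ';' and a final 'm'). `out` must have
 * room for strlen(in) + 1 bytes.
 */
static void strip_sgr(const char *in, char *out)
{
	while (*in) {
		if (in[0] == '\033' && in[1] == '[') {
			const char *p = in + 2;

			while (*p == ';' || (*p >= '0' && *p <= '9'))
				p++;
			if (*p == 'm') {
				/* complete color sequence: skip it */
				in = p + 1;
				continue;
			}
		}
		*out++ = *in++;
	}
	*out = '\0';
}
```

With this, a run colored word by word and the same run colored once as a whole both reduce to the same plain text.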
* Re: [PATCH v4 0/7] customizable --color-words
2009-01-17 16:29 ` [PATCH v4 0/7] customizable --color-words Thomas Rast
2009-01-17 16:29 ` [PATCH v4 1/7] Add color_fwrite_lines(), a function coloring each line individually Thomas Rast
2009-01-18 15:05 ` [PATCH v4 0/7] customizable --color-words Santi Béjar
@ 2009-01-19 22:47 ` Santi Béjar
2009-01-19 23:35 ` Johannes Schindelin
2 siblings, 1 reply; 109+ messages in thread
From: Santi Béjar @ 2009-01-19 22:47 UTC (permalink / raw)
To: Thomas Rast
Cc: git, Junio C Hamano, Johannes Schindelin, Boyd Stephen Smith Jr.,
Teemu Likonen
2009/1/17 Thomas Rast <trast@student.ethz.ch>:
> Johannes Schindelin (4):
> Add color_fwrite_lines(), a function coloring each line individually
> color-words: refactor word splitting and use ALLOC_GROW()
> color-words: change algorithm to allow for 0-character word
> boundaries
> color-words: take an optional regular expression describing words
>
> Thomas Rast (3):
> color-words: enable REG_NEWLINE to help user
> color-words: expand docs with precise semantics
> color-words: make regex configurable via attributes
>
Also, having a config (diff.color-words?) to set the default regexp
would be great. Thanks.
Santi
* Re: [PATCH v4 0/7] customizable --color-words
2009-01-19 22:47 ` Santi Béjar
@ 2009-01-19 23:35 ` Johannes Schindelin
2009-01-20 2:17 ` [PATCH] Add tests for diff.color-words configuration option Boyd Stephen Smith Jr.
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-19 23:35 UTC (permalink / raw)
To: Santi Béjar
Cc: Thomas Rast, git, Junio C Hamano, Boyd Stephen Smith Jr.,
Teemu Likonen
Hi,
On Mon, 19 Jan 2009, Santi Béjar wrote:
> 2009/1/17 Thomas Rast <trast@student.ethz.ch>:
> > Johannes Schindelin (4):
> > Add color_fwrite_lines(), a function coloring each line individually
> > color-words: refactor word splitting and use ALLOC_GROW()
> > color-words: change algorithm to allow for 0-character word
> > boundaries
> > color-words: take an optional regular expression describing words
> >
> > Thomas Rast (3):
> > color-words: enable REG_NEWLINE to help user
> > color-words: expand docs with precise semantics
> > color-words: make regex configurable via attributes
> >
>
> Also, having a config (diff.color-words?) to set the default regexp
> would be great. Thanks.
From "git log --author=Santi --stat" it seems that you are quite capable
of providing that patch.
A few pointers:
- Add a global variable to diff.c, maybe "char *diff_word_regex".
(Maybe it should be static instead, as it will be used in diff.c only.)
- Add code to set it in diff.c, function git_diff_ui_config().
- In diff.c, where "--color-words" is handled (without "="), add
if (diff_word_regex)
options->word_regex = diff_word_regex;
- Add a test to t4034 that tests that the config sets a default, and that
the command line can override it.
- Send to this list :-)
Ciao,
Dscho
* [PATCH] Add tests for diff.color-words configuration option.
2009-01-19 23:35 ` Johannes Schindelin
@ 2009-01-20 2:17 ` Boyd Stephen Smith Jr.
2009-01-20 3:45 ` [PATCH] diff: Support diff.color-words config option Boyd Stephen Smith Jr.
2009-01-20 9:58 ` [PATCH] Add tests for diff.color-words configuration option Johannes Schindelin
0 siblings, 2 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-20 2:17 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net>
---
On Monday 19 January 2009, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote about 'Re: [PATCH v4
0/7] customizable --color-words':
>On Mon, 19 Jan 2009, Santi Béjar wrote:
>> Also, having a config (diff.color-words?) to set the default regexp
>> would be great. Thanks.
>
>From "git log --author=Santi --stat" it seems that you are quite capable
>of providing that patch.
>
>A few pointers:
>
>- Add a test to t4034 that tests that the config sets a default, and that
> the command line can override it.
Here are a couple of tests to get someone started; they add one "known
breakage" to the results of the test suite. This is to be applied on top of
the existing patches.
Yes, I also think I'll work on the actual implementation, but I'd be glad
to have someone beat me to it. I'm not sure why the diff is crazy long.
t/t4034-diff-words.sh | 50 +++++++++++++++++++++++++++++++++++-------------
1 files changed, 36 insertions(+), 14 deletions(-)
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 744221b..6ebce9d 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -63,7 +63,7 @@ test_expect_success 'word diff with runs of whitespace' '
'
-cat > expect <<\EOF
+cat > expect.letter-runs-are-words <<\EOF
<WHITE>diff --git a/pre b/post<RESET>
<WHITE>index 330b04f..5ed8eff 100644<RESET>
<WHITE>--- a/pre<RESET>
@@ -77,6 +77,7 @@ a = b + c<RESET>
<GREEN>aeff = aeff * ( aaa<RESET> )
EOF
+cp expect.letter-runs-are-words expect
test_expect_success 'word diff with a regular expression' '
@@ -84,21 +85,11 @@ test_expect_success 'word diff with a regular expression' '
'
-test_expect_success 'set a diff driver' '
- git config diff.testdriver.wordregex "[^[:space:]]" &&
- cat <<EOF > .gitattributes
-pre diff=testdriver
-post diff=testdriver
-EOF
-'
-
-test_expect_success 'option overrides default' '
-
- word_diff --color-words="[a-z]+"
-
+test_expect_success 'add configuration for default regex' '
+ git config diff.color-words "[^[:space:]]"
'
-cat > expect <<\EOF
+cat > expect.non-whitespace-is-word <<\EOF
<WHITE>diff --git a/pre b/post<RESET>
<WHITE>index 330b04f..5ed8eff 100644<RESET>
<WHITE>--- a/pre<RESET>
@@ -112,6 +103,37 @@ a = b + c<RESET>
<GREEN>aeff = aeff * ( aaa )<RESET>
EOF
+cp expect.non-whitespace-is-word expect
+
+test_expect_failure 'use default supplied by config' '
+
+ word_diff --color-words
+
+'
+
+cp expect.letter-runs-are-words expect
+
+test_expect_success 'option overrides config-default' '
+
+ word_diff --color-words="[a-z]+"
+
+'
+
+test_expect_success 'set a diff driver' '
+ git config diff.testdriver.wordregex "[^[:space:]]" &&
+ cat <<EOF > .gitattributes
+pre diff=testdriver
+post diff=testdriver
+EOF
+'
+
+test_expect_success 'option overrides default' '
+
+ word_diff --color-words="[a-z]+"
+
+'
+
+cp expect.non-whitespace-is-word expect
test_expect_success 'use default supplied by driver' '
--
1.5.6.5
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
* [PATCH] diff: Support diff.color-words config option
2009-01-20 2:17 ` [PATCH] Add tests for diff.color-words configuration option Boyd Stephen Smith Jr.
@ 2009-01-20 3:45 ` Boyd Stephen Smith Jr.
2009-01-20 6:59 ` Junio C Hamano
` (2 more replies)
2009-01-20 9:58 ` [PATCH] Add tests for diff.color-words configuration option Johannes Schindelin
1 sibling, 3 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-20 3:45 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
When diff is invoked with --color-words (w/o =regex), use the regular
expression the user has configured as diff.color-words.
diff drivers configured via attributes take precedence over the
diff.color-words setting. If the user wants to change them, they have
their own configuration variables.
Signed-off-by: Boyd Stephen Smith Jr <bss@iguanasuicide.net>
---
On Monday 19 January 2009, "Boyd Stephen Smith Jr." <bss@iguanasuicide.net> wrote about '[PATCH] Add
tests for diff.color-words configuration option.':
>Yes, I also think I'll work on the actual implementation, but I'd be glad
>to have someone beat me to it. I'm not sure why the diff is crazy long.
Here's a patch that makes the added test case succeed, but I think it and
the tests themselves should probably be reworked. Hopefully, this doesn't
show up in quoted-printable format (damn you kmail).
While it might be a corner-case, we probably need a test of some sort for
when a user/system has a global diff.color-words configuration but wants
to have a single repository (or single run of 'git diff') use the default
algorithm. I.e. run as if no regex had been set.
diff.c | 5 +++++
t/t4034-diff-words.sh | 2 +-
2 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/diff.c b/diff.c
index 9fcde96..c53e1d1 100644
--- a/diff.c
+++ b/diff.c
@@ -23,6 +23,7 @@ static int diff_detect_rename_default;
static int diff_rename_limit_default = 200;
static int diff_suppress_blank_empty;
int diff_use_color_default = -1;
+static const char *diff_color_words_cfg = NULL;
static const char *external_diff_cmd_cfg;
int diff_auto_refresh_index = 1;
static int diff_mnemonic_prefix;
@@ -92,6 +93,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
}
if (!strcmp(var, "diff.external"))
return git_config_string(&external_diff_cmd_cfg, var, value);
+ if (!strcmp(var, "diff.color-words"))
+ return git_config_string(&diff_color_words_cfg, var, value);
return git_diff_basic_config(var, value, cb);
}
@@ -1550,6 +1553,8 @@ static void builtin_diff(const char *name_a,
o->word_regex = userdiff_word_regex(one);
if (!o->word_regex)
o->word_regex = userdiff_word_regex(two);
+ if (!o->word_regex)
+ o->word_regex = diff_color_words_cfg;
if (o->word_regex) {
ecbdata.diff_words->word_regex = (regex_t *)
xmalloc(sizeof(regex_t));
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 6ebce9d..a207d9e 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -105,7 +105,7 @@ a = b + c<RESET>
EOF
cp expect.non-whitespace-is-word expect
-test_expect_failure 'use default supplied by config' '
+test_expect_success 'use default supplied by config' '
word_diff --color-words
--
1.5.6.5
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 3:45 ` [PATCH] diff: Support diff.color-words config option Boyd Stephen Smith Jr.
@ 2009-01-20 6:59 ` Junio C Hamano
2009-01-20 17:42 ` Markus Heidelberg
2009-01-20 10:02 ` Johannes Schindelin
2009-01-20 14:38 ` [PATCH] diff: Support diff.color-words " Jakub Narebski
2 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2009-01-20 6:59 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.
Cc: Johannes Schindelin, Santi Béjar, Thomas Rast, git,
Teemu Likonen
"Boyd Stephen Smith Jr." <bss@iguanasuicide.net> writes:
> When diff is invoked with --color-words (w/o =regex), use the regular
> expression the user has configured as diff.color-words.
>
> diff drivers configured via attributes take precedence over the
> diff.color-words setting. If the user wants to change them, they have
> their own configuration variables.
This needs an entry in Documentation/config.txt
None of the existing configuration variables defined use hyphens in
multi-word variable names.
Other than that, I think this is a welcome addition to the suite.
Thanks.
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] Add tests for diff.color-words configuration option.
2009-01-20 2:17 ` [PATCH] Add tests for diff.color-words configuration option Boyd Stephen Smith Jr.
2009-01-20 3:45 ` [PATCH] diff: Support diff.color-words config option Boyd Stephen Smith Jr.
@ 2009-01-20 9:58 ` Johannes Schindelin
2009-01-20 16:34 ` Boyd Stephen Smith Jr.
1 sibling, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-20 9:58 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
Hi,
On Mon, 19 Jan 2009, Boyd Stephen Smith Jr. wrote:
> I'm not sure why the diff is crazy long.
Because you changed things that need no changing, such as "cat > expect"
-> "cat > expect.blabla", and because you inserted your test instead of
adding it at the end.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 3:45 ` [PATCH] diff: Support diff.color-words config option Boyd Stephen Smith Jr.
2009-01-20 6:59 ` Junio C Hamano
@ 2009-01-20 10:02 ` Johannes Schindelin
2009-01-20 16:52 ` Boyd Stephen Smith Jr.
` (2 more replies)
2009-01-20 14:38 ` [PATCH] diff: Support diff.color-words " Jakub Narebski
2 siblings, 3 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-20 10:02 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
Hi,
On Mon, 19 Jan 2009, Boyd Stephen Smith Jr. wrote:
> diff --git a/diff.c b/diff.c
> index 9fcde96..c53e1d1 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -23,6 +23,7 @@ static int diff_detect_rename_default;
> static int diff_rename_limit_default = 200;
> static int diff_suppress_blank_empty;
> int diff_use_color_default = -1;
> +static const char *diff_color_words_cfg = NULL;
> static const char *external_diff_cmd_cfg;
Guess why external_diff_cmd_cfg is not set to NULL? All variables
defined outside a function are set to all-zero anyway.
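For reference, a minimal sketch of that rule, using made-up names rather than the actual diff.c variables: C gives objects with static storage duration a zero value when they have no explicit initializer, so a file-scope pointer starts out as a null pointer either way.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up names for illustration; not the actual diff.c variables. */
static const char *implicit_cfg;         /* no initializer: starts out NULL */
static const char *explicit_cfg = NULL;  /* same value, spelled out */
static int implicit_limit;               /* starts out 0 */

/* Aborts if any of the file-scope variables did not start out zeroed. */
void check_static_defaults(void)
{
	assert(implicit_cfg == NULL);
	assert(explicit_cfg == NULL);
	assert(implicit_limit == 0);
}

/* Contrast: an automatic variable has no such guarantee and must be set. */
int count_set(void)
{
	int n = 0;	/* required here; a bare "int n;" would be indeterminate */

	if (implicit_cfg)
		n++;
	if (explicit_cfg)
		n++;
	return n;
}
```

The guarantee only covers static storage duration; locals such as n above still need their initializer.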
> @@ -92,6 +93,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
> }
> if (!strcmp(var, "diff.external"))
> return git_config_string(&external_diff_cmd_cfg, var, value);
> + if (!strcmp(var, "diff.color-words"))
I'd call it diff.wordregex, because that's what it is.
> @@ -1550,6 +1553,8 @@ static void builtin_diff(const char *name_a,
> o->word_regex = userdiff_word_regex(one);
> if (!o->word_regex)
> o->word_regex = userdiff_word_regex(two);
> + if (!o->word_regex)
> + o->word_regex = diff_color_words_cfg;
IMHO this is the wrong order. config should not override attributes,
which are by definition more specific.
> diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
> index 6ebce9d..a207d9e 100755
> --- a/t/t4034-diff-words.sh
> +++ b/t/t4034-diff-words.sh
> @@ -105,7 +105,7 @@ a = b + c<RESET>
> EOF
> cp expect.non-whitespace-is-word expect
>
> -test_expect_failure 'use default supplied by config' '
> +test_expect_success 'use default supplied by config' '
Let's squash the two, okay?
Thanks,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 3:45 ` [PATCH] diff: Support diff.color-words config option Boyd Stephen Smith Jr.
2009-01-20 6:59 ` Junio C Hamano
2009-01-20 10:02 ` Johannes Schindelin
@ 2009-01-20 14:38 ` Jakub Narebski
2 siblings, 0 replies; 109+ messages in thread
From: Jakub Narebski @ 2009-01-20 14:38 UTC (permalink / raw)
To: git
Boyd Stephen Smith Jr. wrote:
> When diff is invoked with --color-words (w/o =regex), use the regular
> expression the user has configured as diff.color-words.
>
> diff drivers configured via attributes take precedence over the
> diff.color-words setting. If the user wants to change them, they have
> their own configuration variables.
Just a nit: all other configuration variables use camelCase or runwords;
this would be first configuration variable with '-' as words separator,
I think.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] Add tests for diff.color-words configuration option.
2009-01-20 9:58 ` [PATCH] Add tests for diff.color-words configuration option Johannes Schindelin
@ 2009-01-20 16:34 ` Boyd Stephen Smith Jr.
2009-01-20 16:54 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-20 16:34 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
On Tuesday 2009 January 20 03:58:37 Johannes Schindelin wrote:
>On Mon, 19 Jan 2009, Boyd Stephen Smith Jr. wrote:
>> I'm not sure why the diff is crazy long.
>
>Because you changed things that need no changing, such as "cat > expect"
>-> "cat > expect.blabla",
I suppose I could have gotten away with doing this differently, but I did need
to save off some of those results to different files because I wanted to
reuse the results.
>and because you inserted your test instead of
>adding it at the end.
I put the tests in that order explicitly to test that .gitattributes overrides
the configuration option.
I'm going to be reworking both patches anyway, so I should be able to
rearrange things less, in this file.
Thanks for the feedback.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 10:02 ` Johannes Schindelin
@ 2009-01-20 16:52 ` Boyd Stephen Smith Jr.
2009-01-20 17:14 ` Johannes Schindelin
2009-01-20 17:09 ` Junio C Hamano
2009-01-21 3:46 ` [PATCH] color-words: " Boyd Stephen Smith Jr.
2 siblings, 1 reply; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-20 16:52 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
On Tuesday 2009 January 20 04:02:00 you wrote:
>On Mon, 19 Jan 2009, Boyd Stephen Smith Jr. wrote:
>> diff --git a/diff.c b/diff.c
>> index 9fcde96..c53e1d1 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -23,6 +23,7 @@ static int diff_detect_rename_default;
>> static int diff_rename_limit_default = 200;
>> static int diff_suppress_blank_empty;
>> int diff_use_color_default = -1;
>> +static const char *diff_color_words_cfg = NULL;
>> static const char *external_diff_cmd_cfg;
>
>Guess why external_diff_cmd_cfg is not set to NULL? All variables
>defined outside a function are set to all-zero anyway.
I suppose I just initialize variables by reflex, having been bitten with too
many sometimes-crashes due to variables that were usually-zero. Assuming C
does guarantee that it is zeroed, I'll drop the " = NULL" line noise in the
next version.
>> @@ -92,6 +93,8 @@ int git_diff_ui_config(const char *var, const char
>> *value, void *cb) }
>> if (!strcmp(var, "diff.external"))
>> return git_config_string(&external_diff_cmd_cfg, var, value);
>> + if (!strcmp(var, "diff.color-words"))
>
>I'd call it diff.wordregex, because that's what it is.
I don't like runtogetherwords because they are hard to read for me; I tend to
choose the wrong word breaks if it is ambiguous. There are other
configuration values that use camelCaseWords so I will convert over to using
that.
I thought "word regex" made more sense, but I wanted to match the command-line
option. Will change.
>> @@ -1550,6 +1553,8 @@ static void builtin_diff(const char *name_a,
>> o->word_regex = userdiff_word_regex(one);
>> if (!o->word_regex)
>> o->word_regex = userdiff_word_regex(two);
>> + if (!o->word_regex)
>> + o->word_regex = diff_color_words_cfg;
>
>IMHO this is the wrong order. config should not override attributes,
>which are by definition more specific.
You are up too late Dscho. This ordering makes the config not override
attributes. If one of the files has a diff driver, o->word_regex will be set
to it (and become non-NULL). That will prevent execution of the body of the
added "if (!o->word_regex)" -- preventing the configuration option from being
used.
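To spell the ordering out, here is a minimal sketch. pick_word_regex() and its parameters are hypothetical; they only mirror the if-chain in the quoted hunk, with NULL meaning "not set".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of diff.c: pick the word regex from the
 * most specific source that provides one. NULL means "not set".
 */
const char *pick_word_regex(const char *from_cmdline,
			    const char *from_driver_one,
			    const char *from_driver_two,
			    const char *from_config)
{
	const char *regex = from_cmdline;

	if (!regex)
		regex = from_driver_one;
	if (!regex)
		regex = from_driver_two;
	if (!regex)
		regex = from_config;	/* only reached if nothing above was set */
	return regex;
}

/* Usage: each assertion shows which source wins. */
void demo_precedence(void)
{
	/* an explicit --color-words=<regex> beats everything */
	assert(!strcmp(pick_word_regex("[a-z]+", "[^[:space:]]", NULL, "[0-9]+"),
		       "[a-z]+"));
	/* a diff driver on either side beats the config */
	assert(!strcmp(pick_word_regex(NULL, NULL, "[^[:space:]]", "[0-9]+"),
		       "[^[:space:]]"));
	/* the config value is only the last resort */
	assert(!strcmp(pick_word_regex(NULL, NULL, NULL, "[0-9]+"), "[0-9]+"));
	/* nothing set anywhere: no regex at all */
	assert(pick_word_regex(NULL, NULL, NULL, NULL) == NULL);
}
```

Because every step is guarded by the same "still unset" check, appending the config lookup at the end makes it a fallback rather than an override.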
>> diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
>> index 6ebce9d..a207d9e 100755
>> --- a/t/t4034-diff-words.sh
>> +++ b/t/t4034-diff-words.sh
>> @@ -105,7 +105,7 @@ a = b + c<RESET>
>> EOF
>> cp expect.non-whitespace-is-word expect
>>
>> -test_expect_failure 'use default supplied by config' '
>> +test_expect_success 'use default supplied by config' '
>
>Let's squash the two, okay?
Will do. I expected the code changes to be larger than the test, and when I
finished it was completely the other way. My next patch will be all-in-one.
Thanks for your feedback.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] Add tests for diff.color-words configuration option.
2009-01-20 16:34 ` Boyd Stephen Smith Jr.
@ 2009-01-20 16:54 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-20 16:54 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.; +Cc: git
Hi,
On Tue, 20 Jan 2009, Boyd Stephen Smith Jr. wrote:
> On Tuesday 2009 January 20 03:58:37 Johannes Schindelin wrote:
> >On Mon, 19 Jan 2009, Boyd Stephen Smith Jr. wrote:
> >> I'm not sure why the diff is crazy long.
> >
> >Because you changed things that need no changing, such as "cat > expect"
> >-> "cat > expect.blabla",
>
> I suppose I could have gotten away with doing this differently, but I
> did need to save off some of those results to different files because I
> wanted to reuse the results.
Why didn't you do that, then?
cp expect expect.for-later-use
> >and because you inserted your test instead of adding it at the end.
>
> I put the tests in that order explicitly to test that .gitattributes
> overrides the configuration option.
Why not just remove the .gitattributes for your second test?
It would be much clearer that you did not modify any existing tests, then.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 10:02 ` Johannes Schindelin
2009-01-20 16:52 ` Boyd Stephen Smith Jr.
@ 2009-01-20 17:09 ` Junio C Hamano
2009-01-20 17:28 ` Johannes Schindelin
2009-01-21 3:46 ` [PATCH] color-words: " Boyd Stephen Smith Jr.
2 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2009-01-20 17:09 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Boyd Stephen Smith Jr., Santi Béjar, Thomas Rast, git,
Teemu Likonen
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> @@ -92,6 +93,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>> }
>> if (!strcmp(var, "diff.external"))
>> return git_config_string(&external_diff_cmd_cfg, var, value);
>> + if (!strcmp(var, "diff.color-words"))
>
> I'd call it diff.wordregex, because that's what it is.
If we want to add a new word-oriented option to diff that is not about
coloring the word differences, is it safe and sane to reuse the same
definition? That is, "git diff --color-words" would be affected when
diff.wordregex is set to some value, as would any new word-oriented
operation we add, and the single configured regex would be used as
the default definition of what a word looks like.
I think it makes sense; I do not think of a case offhand where you would
want to define what a word is for the purpose of coloring diffs in one
way, and would want to use a different definition for another
word-oriented operation.
>> @@ -1550,6 +1553,8 @@ static void builtin_diff(const char *name_a,
>> o->word_regex = userdiff_word_regex(one);
>> if (!o->word_regex)
>> o->word_regex = userdiff_word_regex(two);
>> + if (!o->word_regex)
>> + o->word_regex = diff_color_words_cfg;
>
> IMHO this is the wrong order. config should not override attributes,
> which are by definition more specific.
Isn't it merely giving a fallback value when attributes does not give one?
By the way, wouldn't it make sense to optimize the precontext of that hunk
by doing _something_ like:
if (!o->word_regex && strcmp(one->path, two->path))
o->word_regex = userdiff_word_regex(two);
"Something like" comes from special cases like /dev/null for new/deleted
files, etc.
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 16:52 ` Boyd Stephen Smith Jr.
@ 2009-01-20 17:14 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-20 17:14 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.; +Cc: git
Hi,
On Tue, 20 Jan 2009, Boyd Stephen Smith Jr. wrote:
> You are up too late Dscho.
You, sir, are absolutely correct.
> >Let's squash the two, okay?
>
> Will do. I expected the code changes to be larger than the test, and
> when I finished it was completely the other way. My next patch will be
> all-in-one.
FWIW I think it is the correct thing to start with the test script, so
that you get a better idea what to look out for.
And for patches of which I don't know if they are still necessary, I like
to "git checkout <name>^ && make -j50 && git checkout <name> && (cd t &&
sh <test>)".
But for submission, I think it makes sense to squash them, except if you
submit a bug report with a test script to show the validity of the report
first, and only later decide that you want to fix it yourself.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 17:09 ` Junio C Hamano
@ 2009-01-20 17:28 ` Johannes Schindelin
2009-01-20 20:27 ` Junio C Hamano
0 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-20 17:28 UTC (permalink / raw)
To: Junio C Hamano
Cc: Boyd Stephen Smith Jr., Santi Béjar, Thomas Rast, git,
Teemu Likonen
Hi,
On Tue, 20 Jan 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> @@ -92,6 +93,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
> >> }
> >> if (!strcmp(var, "diff.external"))
> >> return git_config_string(&external_diff_cmd_cfg, var, value);
> >> + if (!strcmp(var, "diff.color-words"))
> >
> > I'd call it diff.wordregex, because that's what it is.
>
> If we want to add a new word-oriented option to diff that is not about
> coloring the word differences, is it safe and sane to reuse the same
> definition? That is, "git diff --color-words" would be affected when
> diff.wordregex is set to some value, as would any new word-oriented
> operation we add, and the single configured regex would be used as
> the default definition of what a word looks like.
>
> I think it makes sense; I do not think of a case offhand where you would
> want to define what a word is for the purpose of coloring diffs in one
> way, and would want to use a different definition for another
> word-oriented operation.
Why not cross that bridge when we're there? Should we ever feel the need
for different word regexes, we would just introduce color.wordregex.
> >> @@ -1550,6 +1553,8 @@ static void builtin_diff(const char *name_a,
> >> o->word_regex = userdiff_word_regex(one);
> >> if (!o->word_regex)
> >> o->word_regex = userdiff_word_regex(two);
> >> + if (!o->word_regex)
> >> + o->word_regex = diff_color_words_cfg;
> >
> > IMHO this is the wrong order. config should not override attributes,
> > which are by definition more specific.
>
> Isn't it merely giving a fallback value when attributes does not give one?
Yep. Boyd (or Stephen, as he wants to be called, making it hard to guess
from his email address, but that's all part of the fun, isn't it?) already
realized that I was up too late and got the order wrong myself.
> By the way, wouldn't it make sense to optimize the precontext of that
> hunk by doing _something_ like:
>
> if (!o->word_regex && strcmp(one->path, two->path))
> o->word_regex = userdiff_word_regex(two);
>
> "Something like" comes from special cases like /dev/null for new/deleted
> files, etc.
You mean to avoid the cost of initializing the regex in case one and the
same file is diffed against itself? But that would be better handled
before calling builtin_diff(), don't you think?
I do not know off-hand if diffcore_std() handles that already, so that the
diff_flush() ... builtin_diff() cascade is not even called.
But you raise a valid concern: the regular expression is initialized every
time we look at a file. We probably should have a member
word_regex_compiled in diff_options, then, and only initialize it the
first time.
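Something along these lines, perhaps. This is only a minimal sketch using POSIX regcomp(); struct word_opts, its field names, and the compile_count instrumentation are assumptions for illustration, not the actual diff_options layout.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical subset of an options struct; all field names are assumptions. */
struct word_opts {
	const char *word_regex;		/* pattern source, NULL if unset */
	regex_t word_regex_compiled;	/* valid only while compiled != 0 */
	int compiled;
	int compile_count;		/* demo instrumentation only */
};

/* Return the compiled regex, compiling on first use only; NULL if unset or invalid. */
regex_t *get_word_regex(struct word_opts *o)
{
	if (!o->word_regex)
		return NULL;
	if (!o->compiled) {
		if (regcomp(&o->word_regex_compiled, o->word_regex,
			    REG_EXTENDED | REG_NEWLINE))
			return NULL;
		o->compiled = 1;
		o->compile_count++;
	}
	return &o->word_regex_compiled;
}

/* Free the cached regex once the whole diff run is finished. */
void release_word_regex(struct word_opts *o)
{
	if (o->compiled) {
		regfree(&o->word_regex_compiled);
		o->compiled = 0;
	}
}

/* Usage: three "files" are processed, but the pattern is compiled once. */
int demo_compile_once(void)
{
	struct word_opts o = { "[^[:space:]]+" };
	int i, count;

	for (i = 0; i < 3; i++)
		assert(get_word_regex(&o) != NULL);
	count = o.compile_count;
	release_word_regex(&o);
	return count;
}
```

The cost of this approach is that the cache has to be released somewhere after the last file, which the per-file version does not need to worry about.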
Ciao,
Dscho "who does not have the time to work on Git right now"
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 6:59 ` Junio C Hamano
@ 2009-01-20 17:42 ` Markus Heidelberg
2009-01-20 17:58 ` Boyd Stephen Smith Jr.
2009-01-20 21:08 ` Johannes Schindelin
0 siblings, 2 replies; 109+ messages in thread
From: Markus Heidelberg @ 2009-01-20 17:42 UTC (permalink / raw)
To: Junio C Hamano
Cc: Boyd Stephen Smith Jr., Johannes Schindelin, Santi Béjar,
Thomas Rast, git, Teemu Likonen
Junio C Hamano, 20.01.2009:
> "Boyd Stephen Smith Jr." <bss@iguanasuicide.net> writes:
>
> > When diff is invoked with --color-words (w/o =regex), use the regular
> > expression the user has configured as diff.color-words.
> >
> > diff drivers configured via attributes take precedence over the
> > diff.color-words setting. If the user wants to change them, they have
> > their own configuration variables.
>
> This needs an entry in Documentation/config.txt
>
> None of the existing configuration variables defined use hyphens in
> multi-word variable names.
Except for diff.suppress-blank-empty
Should it be converted, or is it intentionally named to mirror GNU diff's option?
Markus
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 17:42 ` Markus Heidelberg
@ 2009-01-20 17:58 ` Boyd Stephen Smith Jr.
2009-01-20 21:08 ` Johannes Schindelin
1 sibling, 0 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-20 17:58 UTC (permalink / raw)
To: markus.heidelberg
Cc: Junio C Hamano, Johannes Schindelin, Santi Béjar,
Thomas Rast, git, Teemu Likonen
On Tuesday 2009 January 20 11:42:23 Markus Heidelberg wrote:
>Junio C Hamano, 20.01.2009:
>> "Boyd Stephen Smith Jr." <bss@iguanasuicide.net> writes:
>> > When diff is invoked with --color-words (w/o =regex), use the regular
>> > expression the user has configured as diff.color-words.
>> >
>> > diff drivers configured via attributes take precedence over the
>> > diff.color-words setting. If the user wants to change them, they have
>> > their own configuration variables.
>>
>> This needs an entry in Documentation/config.txt
>>
>> None of the existing configuration variables defined use hyphens in
>> multi-word variable names.
>
>Except for diff.suppress-blank-empty
>Should it be converted, or is it intentionally named to mirror GNU diff's option?
I think best would be to have a project policy, use that for the wordRegex
option and other options moving forward, then fix the others at some point in
the future (1.7?) while having some period of time where both old and "per
policy" names work. But, then I'm a big fan of standardization.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 17:28 ` Johannes Schindelin
@ 2009-01-20 20:27 ` Junio C Hamano
2009-01-20 21:02 ` Johannes Schindelin
0 siblings, 1 reply; 109+ messages in thread
From: Junio C Hamano @ 2009-01-20 20:27 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Boyd Stephen Smith Jr., Santi Béjar, Thomas Rast, git,
Teemu Likonen
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> By the way, wouldn't it make sense to optimize the precontext of that
>> hunk by doing _something_ like:
>>
>> if (!o->word_regex && strcmp(one->path, two->path))
>> o->word_regex = userdiff_word_regex(two);
>>
>> "Something like" comes from special cases like /dev/null for new/deleted
>> files, etc.
>
> You mean to avoid the cost of initializing the regex in case one and the
> same file is diffed against itself?
No.
What I meant is much simpler than that.
If one and two are the same filename, and earlier gitattributes lookup for
the path already failed to produce any when you checked one, isn't it very
likely that the gitattributes lookup for two would fail the same way to
produce any result?
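A rough sketch of that idea follows. It assumes a hypothetical lookup_word_regex() in place of the real attribute lookup, instrumented with a counter purely for the demo, and it ignores the /dev/null special cases for new and deleted files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Demo instrumentation: how many attribute lookups were performed. */
static int lookups;

/*
 * Hypothetical stand-in for the attribute lookup: pretend that only
 * "*.tex" paths have a diff driver that supplies a word regex.
 */
static const char *lookup_word_regex(const char *path)
{
	size_t len = strlen(path);

	lookups++;
	if (len > 4 && !strcmp(path + len - 4, ".tex"))
		return "[a-zA-Z]+";
	return NULL;
}

/* Consult the second path only if it differs from the first. */
const char *word_regex_for_pair(const char *path_one, const char *path_two)
{
	const char *regex = lookup_word_regex(path_one);

	/* Same path, same attributes: a second lookup would fail the same way. */
	if (!regex && strcmp(path_one, path_two))
		regex = lookup_word_regex(path_two);
	return regex;
}

int lookup_count(void)
{
	return lookups;
}

/* Usage: an unchanged path costs one lookup, a rename costs two. */
void demo_lookup_skip(void)
{
	lookups = 0;
	assert(word_regex_for_pair("README", "README") == NULL);
	assert(lookups == 1);	/* second lookup skipped */

	lookups = 0;
	assert(word_regex_for_pair("notes.txt", "notes.tex") != NULL);
	assert(lookups == 2);	/* renamed file: both sides consulted */
}
```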
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 20:27 ` Junio C Hamano
@ 2009-01-20 21:02 ` Johannes Schindelin
0 siblings, 0 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-20 21:02 UTC (permalink / raw)
To: Junio C Hamano
Cc: Boyd Stephen Smith Jr., Santi Béjar, Thomas Rast, git,
Teemu Likonen
Hi,
On Tue, 20 Jan 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> By the way, wouldn't it make sense to optimize the precontext of that
> >> hunk by doing _something_ like:
> >>
> >> if (!o->word_regex && strcmp(one->path, two->path))
> >> o->word_regex = userdiff_word_regex(two);
> >>
> >> "Something like" comes from special cases like /dev/null for new/deleted
> >> files, etc.
> >
> > You mean to avoid the cost of initializing the regex in case one and the
> > same file is diffed against itself?
>
> No.
>
> What I meant is much simpler than that.
>
> If one and two are the same filename, and earlier gitattributes lookup for
> the path already failed to produce any when you checked one, isn't it very
> likely that the gitattributes lookup for two would fail the same way to
> produce any result?
Oh, I see!
Thanks,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 17:42 ` Markus Heidelberg
2009-01-20 17:58 ` Boyd Stephen Smith Jr.
@ 2009-01-20 21:08 ` Johannes Schindelin
2009-01-21 10:27 ` Junio C Hamano
2009-01-21 19:37 ` Markus Heidelberg
1 sibling, 2 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-20 21:08 UTC (permalink / raw)
To: Markus Heidelberg
Cc: Junio C Hamano, Boyd Stephen Smith Jr., Santi Béjar,
Thomas Rast, git, Teemu Likonen
Hi,
On Tue, 20 Jan 2009, Markus Heidelberg wrote:
> Junio C Hamano, 20.01.2009:
> > "Boyd Stephen Smith Jr." <bss@iguanasuicide.net> writes:
> >
> > > When diff is invoked with --color-words (w/o =regex), use the regular
> > > expression the user has configured as diff.color-words.
> > >
> > > diff drivers configured via attributes take precedence over the
> > > diff.color-words setting. If the user wants to change them, they have
> > > their own configuration variables.
> >
> > This needs an entry in Documentation/config.txt
> >
> > None of the existing configuration variables defined use hyphens in
> > multi-word variable names.
>
> Except for diff.suppress-blank-empty
> Should it be converted, or is it intentionally named to mirror GNU diff's option?
Grumble. It's in v1.6.1-rc1~348, so we cannot just go ahead and fix it.
My preference would be to convert it _except_ that the old name should
still work. But it should not be advertized.
Ciao,
Dscho "who loves consistency, and knows new users appreciate it, too"
-- snipsnap --
[PATCH] Rename diff.suppress-blank-empty to diff.suppressBlankEmpty
All the other config variables use CamelCase. This config variable should
not be an exception.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Documentation/config.txt | 2 +-
diff.c | 4 +++-
t/t4029-diff-trailing-space.sh | 8 ++++----
3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index c92e7e6..4f0a0b1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -652,7 +652,7 @@ diff.renames::
will enable basic rename detection. If set to "copies" or
"copy", it will detect copies, as well.
-diff.suppress-blank-empty::
+diff.suppressBlankEmpty::
A boolean to inhibit the standard behavior of printing a space
before each empty output line. Defaults to false.
diff --git a/diff.c b/diff.c
index c6a992d..0100b59 100644
--- a/diff.c
+++ b/diff.c
@@ -118,7 +118,9 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
}
/* like GNU diff's --suppress-blank-empty option */
- if (!strcmp(var, "diff.suppress-blank-empty")) {
+ if (!strcmp(var, "diff.suppressblankempty") ||
+ /* for backwards compatibility */
+ !strcmp(var, "diff.suppress-blank-empty")) {
diff_suppress_blank_empty = git_config_bool(var, value);
return 0;
}
diff --git a/t/t4029-diff-trailing-space.sh b/t/t4029-diff-trailing-space.sh
index 4ca65e0..9ddbbcd 100755
--- a/t/t4029-diff-trailing-space.sh
+++ b/t/t4029-diff-trailing-space.sh
@@ -2,7 +2,7 @@
#
# Copyright (c) Jim Meyering
#
-test_description='diff honors config option, diff.suppress-blank-empty'
+test_description='diff honors config option, diff.suppressBlankEmpty'
. ./test-lib.sh
@@ -24,14 +24,14 @@ test_expect_success \
git add f &&
git commit -q -m. f &&
printf "\ny\n" > f &&
- git config --bool diff.suppress-blank-empty true &&
+ git config --bool diff.suppressBlankEmpty true &&
git diff f > actual &&
test_cmp exp actual &&
perl -i.bak -p -e "s/^\$/ /" exp &&
- git config --bool diff.suppress-blank-empty false &&
+ git config --bool diff.suppressBlankEmpty false &&
git diff f > actual &&
test_cmp exp actual &&
- git config --bool --unset diff.suppress-blank-empty &&
+ git config --bool --unset diff.suppressBlankEmpty &&
git diff f > actual &&
test_cmp exp actual
'
--
1.6.1.439.g22f77c
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH] color-words: Support diff.color-words config option
2009-01-20 10:02 ` Johannes Schindelin
2009-01-20 16:52 ` Boyd Stephen Smith Jr.
2009-01-20 17:09 ` Junio C Hamano
@ 2009-01-21 3:46 ` Boyd Stephen Smith Jr.
2009-01-21 4:59 ` [PATCH] Change the spelling of "wordregex" Boyd Stephen Smith Jr.
` (2 more replies)
2 siblings, 3 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-21 3:46 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
When diff is invoked with --color-words (w/o =regex), use the regular
expression the user has configured as diff.wordregex.
diff drivers configured via attributes take precedence over the
diff.wordregex setting. If the user wants to change them, they have
their own configuration variables.
Signed-off-by: Boyd Stephen Smith Jr <bss@iguanasuicide.net>
---
This version is squashed into one patch and includes documentation and
rewritten tests. It was generated against js/diff-color-words~2,
80c49c3d (color-words: make regex configurable via attributes), replacing
my previous 2 patches. It uses "diff.wordregex" for reasons mentioned by
Dscho and because that was already what the diff drivers were using.
I'm not entirely satisfied with it. There should probably be some way
to force the default behavior (which is a bit faster) even if a global
config or diff driver exists. Also, I think camelCase is better than
runtogether so I'd prefer to change "wordregex" -> "wordRegex" across
the entire patch set.
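For anyone unsure what the regex actually controls, here is a rough sketch of the "matches are words, everything else is skipped" rule described in the config.txt hunk. It assumes plain POSIX regexec(); count_words() is a hypothetical helper and not how diff.c is structured, and it leaves out details such as truncating matches at newlines.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Count the "words" in text: every non-empty match of the extended regex
 * is one word, and whatever lies between matches is skipped.
 * Returns -1 if the pattern does not compile.
 */
int count_words(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (*text && !regexec(&re, text, 1, &m, 0)) {
		if (m.rm_eo == m.rm_so) {
			/* empty match: step over one character, or stop at the end */
			if (!text[m.rm_so])
				break;
			text += m.rm_so + 1;
			continue;
		}
		count++;
		text += m.rm_eo;
	}
	regfree(&re);
	return count;
}

/* Usage: the two patterns from the tests split "a = b + c" differently. */
void demo_count_words(void)
{
	/* every non-space character is its own word: a, =, b, +, c */
	assert(count_words("[^[:space:]]", "a = b + c") == 5);
	/* only runs of lowercase letters are words: a, b, c */
	assert(count_words("[a-z]+", "a = b + c") == 3);
}
```

With "[a-z]+" the operators fall between matches, so under the documented rule they are treated like whitespace.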
Documentation/config.txt | 6 +++++
Documentation/diff-options.txt | 7 +++--
diff.c | 5 ++++
t/t4034-diff-words.sh | 45 ++++++++++++++++++++++++++++++++++++++-
4 files changed, 58 insertions(+), 5 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 7408bb2..0ca983a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -639,6 +639,12 @@ diff.suppress-blank-empty::
A boolean to inhibit the standard behavior of printing a space
before each empty output line. Defaults to false.
+diff.wordregex::
+ A POSIX Extended Regular Expression used to determine what is a "word"
+ when performing word-by-word difference calculations. Character
+ sequences that match the regular expression are "words", all other
+ characters are *ignorable* whitespace.
+
fetch.unpackLimit::
If the number of objects fetched over the git native
transfer is below this
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 1edb82e..164e2c5 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -103,9 +103,10 @@ expression to make sure that it matches all non-whitespace characters.
A match that contains a newline is silently truncated(!) at the
newline.
+
-The regex can also be set via a diff driver, see
-linkgit:gitattributes[1]; giving it explicitly overrides any diff
-driver setting.
+The regex can also be set via a diff driver or configuration option, see
+linkgit:gitattributes[1] or linkgit:git-config[1]. Giving it explicitly
+overrides any diff driver or configuration setting. Diff drivers
+override configuration settings.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index 9fcde96..ed8b83c 100644
--- a/diff.c
+++ b/diff.c
@@ -23,6 +23,7 @@ static int diff_detect_rename_default;
static int diff_rename_limit_default = 200;
static int diff_suppress_blank_empty;
int diff_use_color_default = -1;
+static const char *diff_word_regex_cfg;
static const char *external_diff_cmd_cfg;
int diff_auto_refresh_index = 1;
static int diff_mnemonic_prefix;
@@ -92,6 +93,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
}
if (!strcmp(var, "diff.external"))
return git_config_string(&external_diff_cmd_cfg, var, value);
+ if (!strcmp(var, "diff.wordregex"))
+ return git_config_string(&diff_word_regex_cfg, var, value);
return git_diff_basic_config(var, value, cb);
}
@@ -1550,6 +1553,8 @@ static void builtin_diff(const char *name_a,
o->word_regex = userdiff_word_regex(one);
if (!o->word_regex)
o->word_regex = userdiff_word_regex(two);
+ if (!o->word_regex)
+ o->word_regex = diff_word_regex_cfg;
if (o->word_regex) {
ecbdata.diff_words->word_regex = (regex_t *)
xmalloc(sizeof(regex_t));
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 744221b..6bcc153 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -77,6 +77,7 @@ a = b + c<RESET>
<GREEN>aeff = aeff * ( aaa<RESET> )
EOF
+cp expect expect.letter-runs-are-words
test_expect_success 'word diff with a regular expression' '
@@ -92,7 +93,7 @@ post diff=testdriver
EOF
'
-test_expect_success 'option overrides default' '
+test_expect_success 'option overrides .gitattributes' '
word_diff --color-words="[a-z]+"
@@ -112,13 +113,53 @@ a = b + c<RESET>
<GREEN>aeff = aeff * ( aaa )<RESET>
EOF
+cp expect expect.non-whitespace-is-word
-test_expect_success 'use default supplied by driver' '
+test_expect_success 'use regex supplied by driver' '
word_diff --color-words
'
+test_expect_success 'set diff.wordregex option' '
+ git config diff.wordregex "[[:alnum:]]+"
+'
+
+cp expect.letter-runs-are-words expect
+
+test_expect_success 'command-line overrides config' '
+ word_diff --color-words="[a-z]+"
+'
+
+cp expect.non-whitespace-is-word expect
+
+test_expect_success '.gitattributes override config' '
+ word_diff --color-words
+'
+
+test_expect_success 'remove diff driver regex' '
+ git config --unset diff.testdriver.wordregex
+'
+
+cat > expect <<\EOF
+<WHITE>diff --git a/pre b/post<RESET>
+<WHITE>index 330b04f..5ed8eff 100644<RESET>
+<WHITE>--- a/pre<RESET>
+<WHITE>+++ b/post<RESET>
+<BROWN>@@ -1,3 +1,7 @@<RESET>
+h(4),<GREEN>hh[44<RESET>]
+<RESET>
+a = b + c<RESET>
+
+<GREEN>aa = a<RESET>
+
+<GREEN>aeff = aeff * ( aaa<RESET> )
+EOF
+
+test_expect_success 'use configured regex' '
+ word_diff --color-words
+'
+
echo 'aaa (aaa)' > pre
echo 'aaa (aaa) aaa' > post
--
1.5.6.5
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply related [flat|nested] 109+ messages in thread
* [PATCH] Change the spelling of "wordregex".
2009-01-21 3:46 ` [PATCH] color-words: " Boyd Stephen Smith Jr.
@ 2009-01-21 4:59 ` Boyd Stephen Smith Jr.
2009-01-21 8:26 ` Johannes Schindelin
2009-01-21 8:25 ` [PATCH] color-words: Support diff.color-words config option Johannes Schindelin
2009-01-21 10:27 ` [PATCH] color-words: Support diff.wordregex " Junio C Hamano
2 siblings, 1 reply; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-21 4:59 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
Use "wordRegex" for configuration variable names. Use "word_regex" for C
language tokens.
Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net>
---
On Tuesday 20 January 2009, "Boyd Stephen Smith Jr." <bss@iguanasuicide.net> wrote about '[PATCH]
color-words: Support diff.color-words config option':
>I'm not entirely satisfied with it. [...] I think camelCase is better than
>runtogether so I'd prefer to change "wordregex" -> "wordRegex" across
>the entire patch set.
Here's a patch that does something like that, that can be squashed into the
previous patch.
Documentation/config.txt | 2 +-
Documentation/gitattributes.txt | 4 ++--
t/t4034-diff-words.sh | 8 ++++----
userdiff.c | 4 ++--
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0ca983a..332213e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -639,7 +639,7 @@ diff.suppress-blank-empty::
A boolean to inhibit the standard behavior of printing a space
before each empty output line. Defaults to false.
-diff.wordregex::
+diff.wordRegex::
A POSIX Extended Regular Expression used to determine what is a "word"
when performing word-by-word difference calculations. Character
sequences that match the regular expression are "words", all other
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index ba3ba12..227934f 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -341,14 +341,14 @@ Customizing word diff
You can customize the rules that `git diff --color-words` uses to
split words in a line, by specifying an appropriate regular expression
-in the "diff.*.wordregex" configuration variable. For example, in TeX
+in the "diff.*.wordRegex" configuration variable. For example, in TeX
a backslash followed by a sequence of letters forms a command, but
several such commands can be run together without intervening
whitespace. To separate them, use a regular expression such as
------------------------
[diff "tex"]
- wordregex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
+ wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
------------------------
A built-in pattern is provided for all languages listed in the
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index 6bcc153..4508eff 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -86,7 +86,7 @@ test_expect_success 'word diff with a regular expression' '
'
test_expect_success 'set a diff driver' '
- git config diff.testdriver.wordregex "[^[:space:]]" &&
+ git config diff.testdriver.wordRegex "[^[:space:]]" &&
cat <<EOF > .gitattributes
pre diff=testdriver
post diff=testdriver
@@ -121,8 +121,8 @@ test_expect_success 'use regex supplied by driver' '
'
-test_expect_success 'set diff.wordregex option' '
- git config diff.wordregex "[[:alnum:]]+"
+test_expect_success 'set diff.wordRegex option' '
+ git config diff.wordRegex "[[:alnum:]]+"
'
cp expect.letter-runs-are-words expect
@@ -138,7 +138,7 @@ test_expect_success '.gitattributes override config' '
'
test_expect_success 'remove diff driver regex' '
- git config --unset diff.testdriver.wordregex
+ git config --unset diff.testdriver.wordRegex
'
cat > expect <<\EOF
diff --git a/userdiff.c b/userdiff.c
index 2b55509..d556da9 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -6,8 +6,8 @@ static struct userdiff_driver *drivers;
static int ndrivers;
static int drivers_alloc;
-#define PATTERNS(name, pattern, wordregex) \
- { name, NULL, -1, { pattern, REG_EXTENDED }, wordregex }
+#define PATTERNS(name, pattern, word_regex) \
+ { name, NULL, -1, { pattern, REG_EXTENDED }, word_regex }
static struct userdiff_driver builtin_drivers[] = {
PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
"[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"),
--
1.5.6.5
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: [PATCH] color-words: Support diff.color-words config option
2009-01-21 3:46 ` [PATCH] color-words: " Boyd Stephen Smith Jr.
2009-01-21 4:59 ` [PATCH] Change the spelling of "wordregex" Boyd Stephen Smith Jr.
@ 2009-01-21 8:25 ` Johannes Schindelin
2009-01-21 16:09 ` Boyd Stephen Smith Jr.
2009-01-21 10:27 ` [PATCH] color-words: Support diff.wordregex " Junio C Hamano
2 siblings, 1 reply; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-21 8:25 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
Hi,
On Tue, 20 Jan 2009, Boyd Stephen Smith Jr. wrote:
> It uses "diff.wordregex" for reasons mentioned by Dscho and because that
> was already what the diff drivers were using.
To be fair, Jakub and Junio mentioned it, too.
> I'm not entirely satisfied with it. There should probably be some way
> to force the default behavior (which is a bit faster) even if a global
> config or diff driver exists. Also, I think camelCase is better than
> runtogether so I'd prefer to change "wordregex" -> "wordRegex" across
> the entire patch set.
Well, the thing is, it _should_ be "wordRegex", _except_ in the strcmp()
because the config helpers get a downcased key.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] Change the spelling of "wordregex".
2009-01-21 4:59 ` [PATCH] Change the spelling of "wordregex" Boyd Stephen Smith Jr.
@ 2009-01-21 8:26 ` Johannes Schindelin
2009-01-21 9:22 ` Thomas Rast
2009-01-21 15:33 ` Boyd Stephen Smith Jr.
0 siblings, 2 replies; 109+ messages in thread
From: Johannes Schindelin @ 2009-01-21 8:26 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
Hi,
On Tue, 20 Jan 2009, Boyd Stephen Smith Jr. wrote:
> diff --git a/userdiff.c b/userdiff.c
> index 2b55509..d556da9 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -6,8 +6,8 @@ static struct userdiff_driver *drivers;
> static int ndrivers;
> static int drivers_alloc;
>
> -#define PATTERNS(name, pattern, wordregex) \
> - { name, NULL, -1, { pattern, REG_EXTENDED }, wordregex }
> +#define PATTERNS(name, pattern, word_regex) \
> + { name, NULL, -1, { pattern, REG_EXTENDED }, word_regex }
> static struct userdiff_driver builtin_drivers[] = {
> PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
> "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"),
In general, it is an awesomely good idea to imitate code that is already
there. That literally guarantees consistency (which is Good, as you
know).
And Thomas just imitated "xfuncname", which just so happens to be without
an "_".
Ciao,
Dscho
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] Change the spelling of "wordregex".
2009-01-21 8:26 ` Johannes Schindelin
@ 2009-01-21 9:22 ` Thomas Rast
2009-01-21 15:33 ` Boyd Stephen Smith Jr.
1 sibling, 0 replies; 109+ messages in thread
From: Thomas Rast @ 2009-01-21 9:22 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Boyd Stephen Smith Jr., Santi Béjar, git, Junio C Hamano,
Teemu Likonen
Johannes Schindelin wrote:
> And Thomas just imitated "xfuncname", which just so happens to be without
> an "_".
Then again I ignored the 'x' for "extended regex", so it's not
entirely consistent.
[Mostly because I think the user expects a "<something>" whenever
there's an "x<something>", and "funcname" is actually deprecated/not
documented any more, so introducing a basic-regex version seemed
silly.]
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 21:08 ` Johannes Schindelin
@ 2009-01-21 10:27 ` Junio C Hamano
2009-01-21 19:37 ` Markus Heidelberg
1 sibling, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2009-01-21 10:27 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Markus Heidelberg, Boyd Stephen Smith Jr., Santi Béjar,
Thomas Rast, git, Teemu Likonen
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Subject: Rename diff.suppress-blank-empty to diff.suppressBlankEmpty
>
> All the other config variables use CamelCase. This config variable should
> not be an exception.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
Thanks.
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] color-words: Support diff.wordregex config option
2009-01-21 3:46 ` [PATCH] color-words: " Boyd Stephen Smith Jr.
2009-01-21 4:59 ` [PATCH] Change the spelling of "wordregex" Boyd Stephen Smith Jr.
2009-01-21 8:25 ` [PATCH] color-words: Support diff.color-words config option Johannes Schindelin
@ 2009-01-21 10:27 ` Junio C Hamano
2 siblings, 0 replies; 109+ messages in thread
From: Junio C Hamano @ 2009-01-21 10:27 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.
Cc: Johannes Schindelin, Santi Béjar, Thomas Rast, git,
Teemu Likonen
"Boyd Stephen Smith Jr." <bss@iguanasuicide.net> writes:
> When diff is invoked with --color-words (w/o =regex), use the regular
> expression the user has configured as diff.wordregex.
>
> diff drivers configured via attributes take precedence over the
> diff.wordregex setting. If the user wants to change them, they have
> their own configuration variables.
>
> Signed-off-by: Boyd Stephen Smith Jr <bss@iguanasuicide.net>
> ---
> This version is squashed into one patch and includes documentation and
> rewritten tests. It was generated against js/diff-color-words~2,
> 80c49c3d (color-words: make regex configurable via attributes), replacing
> my previous 2 patches. It uses "diff.wordregex" for reasons mentioned by
> Dscho and because that was already what the diff drivers were using.
Nicely done and very well described. I fixed the Subject: line, though ;-)
Thanks.
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] Change the spelling of "wordregex".
2009-01-21 8:26 ` Johannes Schindelin
2009-01-21 9:22 ` Thomas Rast
@ 2009-01-21 15:33 ` Boyd Stephen Smith Jr.
1 sibling, 0 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-21 15:33 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
On Wednesday 21 January 2009, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote about 'Re: [PATCH] Change the spelling
of "wordregex".':
>On Tue, 20 Jan 2009, Boyd Stephen Smith Jr. wrote:
>> diff --git a/userdiff.c b/userdiff.c
>> index 2b55509..d556da9 100644
>> --- a/userdiff.c
>> +++ b/userdiff.c
>> @@ -6,8 +6,8 @@ static struct userdiff_driver *drivers;
>> static int ndrivers;
>> static int drivers_alloc;
>>
>> -#define PATTERNS(name, pattern, wordregex) \
>> - { name, NULL, -1, { pattern, REG_EXTENDED }, wordregex }
>> +#define PATTERNS(name, pattern, word_regex) \
>> + { name, NULL, -1, { pattern, REG_EXTENDED }, word_regex }
>> static struct userdiff_driver builtin_drivers[] = {
>> PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
>> "[^<>= \t]+|[^[:space:]]|[\x80-\xff]+"),
>
>In general, it is an awesomely good idea to imitate code that is already
>there. That literally guarantees consistency (which is Good, as you
>know).
Agreed that consistency is good. However, using "wordregex" isn't
consistent. The rest of the time it is used as an identifier in the code,
it's spelled "word_regex" or "word_regexp", even before my patch.
(Declarations in: userdiff.h, builtin-grep.c, 3x diff.c, and grep.h)
In particular, the macro is used to initialize "struct userdiff_driver"s
and the relevant member of that struct uses "word_regex" before my patch.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] color-words: Support diff.color-words config option
2009-01-21 8:25 ` [PATCH] color-words: Support diff.color-words config option Johannes Schindelin
@ 2009-01-21 16:09 ` Boyd Stephen Smith Jr.
0 siblings, 0 replies; 109+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-21 16:09 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Santi Béjar, Thomas Rast, git, Junio C Hamano, Teemu Likonen
On Wednesday 21 January 2009, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote about 'Re: [PATCH] color-words: Support
diff.color-words config option':
>On Tue, 20 Jan 2009, Boyd Stephen Smith Jr. wrote:
>> I'm not entirely satisfied with it. There should probably be some way
>> to force the default behavior (which is a bit faster) even if a global
>> config or diff driver exists. Also, I think camelCase is better than
>> runtogether so I'd prefer to change "wordregex" -> "wordRegex" across
>> the entire patch set.
>
>Well, the thing is, it _should_ be "wordRegex", _except_ in the strcmp()
>because the config helpers get a downcased key.
It would have been nice to know that last night. I spent far longer than I
should have on the "wordregex" -> "wordRegex" patch.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] diff: Support diff.color-words config option
2009-01-20 21:08 ` Johannes Schindelin
2009-01-21 10:27 ` Junio C Hamano
@ 2009-01-21 19:37 ` Markus Heidelberg
1 sibling, 0 replies; 109+ messages in thread
From: Markus Heidelberg @ 2009-01-21 19:37 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Junio C Hamano, Boyd Stephen Smith Jr., Santi Béjar,
Thomas Rast, git, Teemu Likonen
Johannes Schindelin, 20.01.2009:
> Hi,
>
> On Tue, 20 Jan 2009, Markus Heidelberg wrote:
>
> > Junio C Hamano, 20.01.2009:
> > > None of the existing configuration variables defined use hyphens in
> > > multi-word variable names.
> >
> > Except for diff.suppress-blank-empty
> > Should it be converted or is it intentional, to reflect GNU diff's option?
>
> Grumble. It's in v1.6.1-rc1~348, so we cannot just go ahead and fix it.
Did I say change it without keeping backward compatibility?
> Ciao,
> Dscho "who loves consistency, and knows new users appreciate it, too"
Me, too.
Markus
^ permalink raw reply [flat|nested] 109+ messages in thread