* [PATCH] Make words boundary for --color-words configurable
@ 2008-05-02 3:39 Ping Yin
2008-05-02 3:54 ` Junio C Hamano
2008-05-02 7:45 ` [PATCH] Make words boundary for --color-words configurable Johannes Schindelin
0 siblings, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-02 3:39 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Previously --color-words only allow spaces as words boundary. However,
just space is not enough. For example, when i rename a function from
foo to bar, following example doesn't show as expected when using
--color-words.
------------------
- if (foo(arg))
+ if (bar(arg))
------------------
It shows as "if <r>(foo(arg))</r><g>(foo(arg))</g>". Actually, it's the
best to show as "if (<r>foo</r><g>bar</g>(arg))". Here "r" and "g"
represent "red" and "green" separately.
This patch introduces a configuration diff.wordsboundary to make
--color-words treat both spaces and characters in diff.wordsboundary as
boundary characters.
If we set diff.wordsboundary to "()", the example above will show as
"if (<r>foo(</r><g>bar(</g>arg))". It's much better, athough not the best,
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 14 ++++++++++++--
1 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/diff.c b/diff.c
index 3632b55..2c37bb6 100644
--- a/diff.c
+++ b/diff.c
@@ -23,6 +23,7 @@ static int diff_rename_limit_default = 100;
int diff_use_color_default = -1;
static const char *external_diff_cmd_cfg;
int diff_auto_refresh_index = 1;
+static const char *diff_words_boundary = "";
static char diff_colors[][COLOR_MAXLEN] = {
"\033[m", /* reset */
@@ -159,6 +160,10 @@ int git_diff_ui_config(const char *var, const char *value)
external_diff_cmd_cfg = xstrdup(value);
return 0;
}
+ if (!strcmp(var, "diff.wordsboundary")) {
+ diff_words_boundary = value ? xstrdup(value) : "";
+ return 0;
+ }
if (!prefixcmp(var, "diff.")) {
const char *ep = strrchr(var, '.');
@@ -434,6 +439,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+static int iswordsboundary(char c)
+{
+ return isspace(c) || !!strchr(diff_words_boundary, c);
+}
+
/* this executes the word diff on the accumulated buffers */
static void diff_words_show(struct diff_words_data *diff_words)
{
@@ -448,7 +458,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
minus.ptr = xmalloc(minus.size);
memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
+ if (iswordsboundary(minus.ptr[i]))
minus.ptr[i] = '\n';
diff_words->minus.current = 0;
@@ -456,7 +466,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
plus.ptr = xmalloc(plus.size);
memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
+ if (iswordsboundary(plus.ptr[i]))
plus.ptr[i] = '\n';
diff_words->plus.current = 0;
--
1.5.5.1.116.ge4b9c.dirty
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 3:39 [PATCH] Make words boundary for --color-words configurable Ping Yin
@ 2008-05-02 3:54 ` Junio C Hamano
2008-05-02 4:28 ` Ping Yin
2008-05-02 7:45 ` [PATCH] Make words boundary for --color-words configurable Johannes Schindelin
1 sibling, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2008-05-02 3:54 UTC (permalink / raw)
To: Ping Yin; +Cc: git
Ping Yin <pkufranky@gmail.com> writes:
> Previously --color-words only allow spaces as words boundary. However,
> just space is not enough. For example, when i rename a function from
> foo to bar, following example doesn't show as expected when using
> --color-words.
>
> ------------------
> - if (foo(arg))
> + if (bar(arg))
> ------------------
>
> It shows as "if <r>(foo(arg))</r><g>(foo(arg))</g>". Actually, it's the
> best to show as "if (<r>foo</r><g>bar</g>(arg))". Here "r" and "g"
> represent "red" and "green" separately.
>
> This patch introduces a configuration diff.wordsboundary to make
> --color-words treat both spaces and characters in diff.wordsboundary as
> boundary characters.
Just an idle thought.
I suspect a more natural definition of word boundary is between a run of
word characters and a run of non-word characters. IOW, instead of saying
" " and "(these other)" characters are boundary, you would say
if (foo(arg))
between f and space, open paren and f, second o and open paren, that open
paren and a, ... are boundaries.
If you go that route, you would make the definition of "what is the set of
word characters" configurable.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 3:54 ` Junio C Hamano
@ 2008-05-02 4:28 ` Ping Yin
2008-05-02 13:59 ` [PATCH] Make boundary characters " Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-02 4:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Fri, May 2, 2008 at 11:54 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ping Yin <pkufranky@gmail.com> writes:
>
> I suspect a more natural definition of word boundary is between a run of
> word characters and a run of non-word characters. IOW, instead of saying
> " " and "(these other)" characters are boundary, you would say
>
> if (foo(arg))
>
> between f and space, open paren and f, second o and open paren, that open
> paren and a, ... are boundaries.
>
> If you go that route, you would make the definition of "what is the set of
> word characters" configurable.
I prefer nonwordcharacters to wordcharacters. With
diff.wordcharacters, to maintain the same behaviour as before, i have
to define a long list of characters. While with
diff.nonwordcharacters, i need only define it to "".
So how about just renaming wordsboundary to nonwordcharacters?
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 3:39 [PATCH] Make words boundary for --color-words configurable Ping Yin
2008-05-02 3:54 ` Junio C Hamano
@ 2008-05-02 7:45 ` Johannes Schindelin
2008-05-02 8:14 ` Teemu Likonen
` (2 more replies)
1 sibling, 3 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-02 7:45 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Fri, 2 May 2008, Ping Yin wrote:
> Previously --color-words only allow spaces as words boundary. However,
> just space is not enough. For example, when i rename a function from foo
> to bar, following example doesn't show as expected when using
> --color-words.
Thanks for starting this.
However, as Junio pointed out, it is easier to specify word-characters,
rather than non-word characters (think TAB), and...
> +static int iswordsboundary(char c)
> +{
> + return isspace(c) || !!strchr(diff_words_boundary, c);
> +}
this will be called quite some times. So it would make more sense to have
an "unsigned char word_characters[256]" and set those entries to 1 that
are to be interpreted as word characters.
This would allow you also to interpret "0-9A-Za-z" correctly.
Oh, and maybe having "::default" and "::alnum" suffixes interpreted, so
that I can say "_::alnum" to have C identifiers interpreted as words?
Thanks,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 7:45 ` [PATCH] Make words boundary for --color-words configurable Johannes Schindelin
@ 2008-05-02 8:14 ` Teemu Likonen
2008-05-02 9:23 ` Ping Yin
2008-05-02 10:01 ` Teemu Likonen
2008-05-02 9:28 ` Ping Yin
2008-05-03 0:18 ` Jakub Narebski
2 siblings, 2 replies; 92+ messages in thread
From: Teemu Likonen @ 2008-05-02 8:14 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git, gitster
Johannes Schindelin wrote (2008-05-02 08:45 +0100):
> On Fri, 2 May 2008, Ping Yin wrote:
>
> > Previously --color-words only allow spaces as words boundary.
> > However, just space is not enough. For example, when i rename
> > a function from foo to bar, following example doesn't show as
> > expected when using --color-words.
>
> Thanks for starting this.
>
> However, as Junio pointed out, it is easier to specify
> word-characters, rather than non-word characters (think TAB), and...
Just a quick note from someone who is not so much a programmer but who
uses Git to track text/LaTex/etc. files with human languages: Please
don't make this kind of things too Ascii-specific and too much
byte-is-interpreted-as-character type thing.
In general, my opinion is that with international text it's better to
define word boundary characters than trying to maintain a _huge_ list of
characters used within words in different human languages.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 8:14 ` Teemu Likonen
@ 2008-05-02 9:23 ` Ping Yin
2008-05-02 10:01 ` Teemu Likonen
1 sibling, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-02 9:23 UTC (permalink / raw)
To: Teemu Likonen; +Cc: Johannes Schindelin, git, gitster
On Fri, May 2, 2008 at 4:14 PM, Teemu Likonen <tlikonen@iki.fi> wrote:
> Johannes Schindelin wrote (2008-05-02 08:45 +0100):
>
> In general, my opinion is that with international text it's better to
> define word boundary characters than trying to maintain a _huge_ list of
> characters used within words in different human languages.
>
Agreed, there is no easy way to designate such a huge list of
non-ascii characters. Even if we figure out one way, the user may
still be scared by such a huge list or the character class syntax.
I think doing this complex both the implementation and the representation.
Instead, if using non word characters, the user only needs specify a
small list of characters (ascii or wide character). instead of the
nearly whole set.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 7:45 ` [PATCH] Make words boundary for --color-words configurable Johannes Schindelin
2008-05-02 8:14 ` Teemu Likonen
@ 2008-05-02 9:28 ` Ping Yin
2008-05-03 0:18 ` Jakub Narebski
2 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-02 9:28 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On Fri, May 2, 2008 at 3:45 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> However, as Junio pointed out, it is easier to specify word-characters,
> rather than non-word characters (think TAB), and...
>
We don't have to designate TAB. i think space characters should always
be word boundary.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 8:14 ` Teemu Likonen
2008-05-02 9:23 ` Ping Yin
@ 2008-05-02 10:01 ` Teemu Likonen
1 sibling, 0 replies; 92+ messages in thread
From: Teemu Likonen @ 2008-05-02 10:01 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git, gitster
Teemu Likonen wrote (2008-05-02 11:14 +0300):
> > On Fri, 2 May 2008, Ping Yin wrote:
> >
> > > Previously --color-words only allow spaces as words boundary.
> > > However, just space is not enough. For example, when i rename
> > > a function from foo to bar, following example doesn't show as
> > > expected when using --color-words.
> Just a quick note from someone who is not so much a programmer but who
> uses Git to track text/LaTex/etc. files with human languages:
And let me add a sidenote that, at least to me, --color-words is the
tool for doing diffs of human language text files. I think it's a lot
more useful than normal diff.
^ permalink raw reply [flat|nested] 92+ messages in thread
* [PATCH] Make boundary characters for --color-words configurable
2008-05-02 4:28 ` Ping Yin
@ 2008-05-02 13:59 ` Ping Yin
2008-05-02 14:26 ` Ping Yin
` (2 more replies)
0 siblings, 3 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-02 13:59 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Previously --color-words only allow spaces as boundary characters.
However, just space is not enough. For example, when i rename a function
from foo to bar, following example doesn't show as expected when using
--color-words.
------------------
- if (foo(arg))
+ if (bar(arg))
------------------
It shows as "if <r>(foo(arg))</r><g>(foo(arg))</g>". Actually, it's the
best to show as "if (<r>foo</r><g>bar</g>(arg))". Here "r" and "g"
represent "red" and "green" separately.
This patch introduces a configuration diff.nonwordchars to make
--color-words treat both spaces and characters in diff.nonwordchars as
boundary characters.
If we set diff.nonwordchars to "()", the example above will show as
"if (<r>foo(</r><g>bar(</g>arg))". It's much better, athough not the best,
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
>> In general, my opinion is that with international text it's better to
>> define word boundary characters than trying to maintain a _huge_ list of
>> characters used within words in different human languages.
>>
>
> Agreed, there is no easy way to designate such a huge list of
> non-ascii characters. Even if we figure out one way, the user may
> still be scared by such a huge list or the character class syntax.
>
> I think doing this complex both the implementation and the representation.
>
> Instead, if using non word characters, the user only needs specify a
> small list of characters (ascii or wide character). instead of the
> nearly whole set.
Two changes:
* Rename diff.wordsboundary to diff.nonwordchars
* Add documentation
Documentation/config.txt | 4 ++++
diff.c | 14 ++++++++++++--
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 824e416..eb05592 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -546,6 +546,10 @@ diff.renames::
will enable basic rename detection. If set to "copies" or
"copy", it will detect copies, as well.
+diff.nonwordchars::
+ Specify additional boundary characters other than spaces for
+ --color-words.
+
fetch.unpackLimit::
If the number of objects fetched over the git native
transfer is below this
diff --git a/diff.c b/diff.c
index 3632b55..e48466d 100644
--- a/diff.c
+++ b/diff.c
@@ -23,6 +23,7 @@ static int diff_rename_limit_default = 100;
int diff_use_color_default = -1;
static const char *external_diff_cmd_cfg;
int diff_auto_refresh_index = 1;
+static const char *diff_non_word_chars = "";
static char diff_colors[][COLOR_MAXLEN] = {
"\033[m", /* reset */
@@ -159,6 +160,10 @@ int git_diff_ui_config(const char *var, const char *value)
external_diff_cmd_cfg = xstrdup(value);
return 0;
}
+ if (!strcmp(var, "diff.nonwordchars")) {
+ diff_non_word_chars = value ? xstrdup(value) : "";
+ return 0;
+ }
if (!prefixcmp(var, "diff.")) {
const char *ep = strrchr(var, '.');
@@ -434,6 +439,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+static int is_non_word_char(char c)
+{
+ return isspace(c) || !!strchr(diff_non_word_chars, c);
+}
+
/* this executes the word diff on the accumulated buffers */
static void diff_words_show(struct diff_words_data *diff_words)
{
@@ -448,7 +458,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
minus.ptr = xmalloc(minus.size);
memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
+ if (is_non_word_char(minus.ptr[i]))
minus.ptr[i] = '\n';
diff_words->minus.current = 0;
@@ -456,7 +466,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
plus.ptr = xmalloc(plus.size);
memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
+ if (is_non_word_char(plus.ptr[i]))
plus.ptr[i] = '\n';
diff_words->plus.current = 0;
--
1.5.5.1.117.g73010
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH] Make boundary characters for --color-words configurable
2008-05-02 13:59 ` [PATCH] Make boundary characters " Ping Yin
@ 2008-05-02 14:26 ` Ping Yin
2008-05-02 14:27 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 0/5] " Ping Yin
2008-05-02 14:36 ` [PATCH] Make boundary characters for --color-words configurable Teemu Likonen
2008-05-03 14:03 ` [PATCH] --color-words: Make the word characters configurable Johannes Schindelin
2 siblings, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-02 14:26 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
On Fri, May 2, 2008 at 9:59 PM, Ping Yin <pkufranky@gmail.com> wrote:
> Previously --color-words only allow spaces as boundary characters.
> However, just space is not enough. For example, when i rename a function
> from foo to bar, following example doesn't show as expected when using
> --color-words.
>
> ------------------
> - if (foo(arg))
> + if (bar(arg))
> ------------------
>
> It shows as "if <r>(foo(arg))</r><g>(foo(arg))</g>". Actually, it's the
> best to show as "if (<r>foo</r><g>bar</g>(arg))". Here "r" and "g"
> represent "red" and "green" separately.
>
> This patch introduces a configuration diff.nonwordchars to make
> --color-words treat both spaces and characters in diff.nonwordchars as
> boundary characters.
>
> If we set diff.nonwordchars to "()", the example above will show as
> "if (<r>foo(</r><g>bar(</g>arg))". It's much better, athough not the best,
>
Oh, there are some problems, assuming "{}" are set as diff.nonwordchars
1. Trailing boundary character lost, for example
----------------------------
$ git diff-
- foo{
+ foo
$ git diff --color-words
foo
----------------------------
With --color-words, i can't know the trailing '{' is removed. This
problem exists even without my patch. In that case, only trainling
spaces are lost.
2. Trailing removed words shows at new line instead of the same line
----------------------------
$ git diff
- foo bar
+ foo
(note: no space after foo)
$ git diff --color-words
foo
<red>bar</red>
--------------------------------
bar should show in the same line with bar. This is not related to my patch.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make boundary characters for --color-words configurable
2008-05-02 14:26 ` Ping Yin
@ 2008-05-02 14:27 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 0/5] " Ping Yin
1 sibling, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-02 14:27 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
On Fri, May 2, 2008 at 10:26 PM, Ping Yin <pkufranky@gmail.com> wrote:
> bar should show in the same line with bar. This is not related to my patch.
s/with bar/with foo/
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make boundary characters for --color-words configurable
2008-05-02 13:59 ` [PATCH] Make boundary characters " Ping Yin
2008-05-02 14:26 ` Ping Yin
@ 2008-05-02 14:36 ` Teemu Likonen
2008-05-03 0:22 ` Ping Yin
2008-05-03 14:03 ` [PATCH] --color-words: Make the word characters configurable Johannes Schindelin
2 siblings, 1 reply; 92+ messages in thread
From: Teemu Likonen @ 2008-05-02 14:36 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Ping Yin wrote (2008-05-02 21:59 +0800):
> Previously --color-words only allow spaces as boundary characters.
> However, just space is not enough. For example, when i rename
> a function from foo to bar, following example doesn't show as expected
> when using --color-words.
> Two changes:
>
> * Rename diff.wordsboundary to diff.nonwordchars
> * Add documentation
I think config variables should be in alphabetical order in config.txt.
Hence your diff.nonwordchars is not in the right place. Other than that
this patch seems to work and is really useful to me. Thanks.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make words boundary for --color-words configurable
2008-05-02 7:45 ` [PATCH] Make words boundary for --color-words configurable Johannes Schindelin
2008-05-02 8:14 ` Teemu Likonen
2008-05-02 9:28 ` Ping Yin
@ 2008-05-03 0:18 ` Jakub Narebski
2 siblings, 0 replies; 92+ messages in thread
From: Jakub Narebski @ 2008-05-03 0:18 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git, gitster
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Oh, and maybe having "::default" and "::alnum" suffixes interpreted, so
> that I can say "_::alnum" to have C identifiers interpreted as words?
I'd rather use POSIX classes like "[:alnum:]" for that
--
Jakub Narebski
Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make boundary characters for --color-words configurable
2008-05-02 14:36 ` [PATCH] Make boundary characters for --color-words configurable Teemu Likonen
@ 2008-05-03 0:22 ` Ping Yin
2008-05-03 13:22 ` Dirk Süsserott
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-03 0:22 UTC (permalink / raw)
To: tlikonen; +Cc: gitster, git, Ping Yin
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
> I think config variables should be in alphabetical order in config.txt.
> Hence your diff.nonwordchars is not in the right place
THX, this is fixing patch
Documentation/config.txt | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index eb05592..812ec2c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -537,6 +537,10 @@ diff.external::
program only on a subset of your files, you might want to
use linkgit:gitattributes[5] instead.
+diff.nonwordchars::
+ Specify additional boundary characters other than spaces for
+ --color-words.
+
diff.renameLimit::
The number of files to consider when performing the copy/rename
detection; equivalent to the git diff option '-l'.
@@ -546,10 +550,6 @@ diff.renames::
will enable basic rename detection. If set to "copies" or
"copy", it will detect copies, as well.
-diff.nonwordchars::
- Specify additional boundary characters other than spaces for
- --color-words.
-
fetch.unpackLimit::
If the number of objects fetched over the git native
transfer is below this
--
1.5.5.1.117.g73010
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v2 0/5] Make boundary characters for --color-words configurable
2008-05-02 14:26 ` Ping Yin
2008-05-02 14:27 ` Ping Yin
@ 2008-05-03 11:57 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Ping Yin
2008-05-04 4:20 ` [PATCH v3 0/6] --color-words improvement Ping Yin
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 11:57 UTC (permalink / raw)
To: git; +Cc: gitster
Ping Yin (5):
diff.c: Remove code redundancy in diff_words_show
diff.c: Use show variable name in fn_out_diff_words_aux
diff.c: Fix --color-words showing trailing deleted words at another line
Make boundary characters for --color-words configurable
fn_out_diff_words_aux: Handle common diff line more carefully
Documentation/config.txt | 4 ++
Documentation/diff-options.txt | 1 +
diff.c | 83 +++++++++++++++++++++++++++-------------
3 files changed, 61 insertions(+), 27 deletions(-)
The first two patches are just code refactor
The 3rd patch fixes following problem 2
The 4th patch introduces diff.nonwordchars
The 5th patch fixes following problem 1
> Oh, there are some problems, assuming "{}" are set as diff.nonwordchars
>
> 1. Trailing boundary character lost, for example
> ----------------------------
> $ git diff-
> - foo{
> + foo
> $ git diff --color-words
> foo
> ----------------------------
> With --color-words, i can't know the trailing '{' is removed. This
> problem exists even without my patch. In that case, only trainling
> spaces are lost.
>
> 2. Trailing removed words shows at new line instead of the same line
> ----------------------------
> $ git diff
> - foo bar
> + foo
> (note: no space after foo)
> $ git diff --color-words
> foo
> <red>bar</red>
> --------------------------------
> bar should show in the same line with bar. This is not related to my patch.
^ permalink raw reply [flat|nested] 92+ messages in thread
* [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show
2008-05-03 11:57 ` [PATCH v2 0/5] " Ping Yin
@ 2008-05-03 11:57 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
2008-05-03 18:20 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Junio C Hamano
2008-05-04 4:20 ` [PATCH v3 0/6] --color-words improvement Ping Yin
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 11:57 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Introduce mmfile_copy_set_boundary to do the repeated work.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 26 +++++++++++++-------------
1 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/diff.c b/diff.c
index 3632b55..acef138 100644
--- a/diff.c
+++ b/diff.c
@@ -434,6 +434,17 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+static mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
+ int i;
+
+ dest->size = src->size;
+ dest->ptr = xmalloc(dest->size);
+ memcpy(dest->ptr, src->ptr, dest->size);
+ for (i = 0; i < dest->size; i++)
+ if (isspace(dest->ptr[i]))
+ dest->ptr[i] = '\n';
+}
+
/* this executes the word diff on the accumulated buffers */
static void diff_words_show(struct diff_words_data *diff_words)
{
@@ -444,20 +455,9 @@ static void diff_words_show(struct diff_words_data *diff_words)
int i;
memset(&xecfg, 0, sizeof(xecfg));
- minus.size = diff_words->minus.text.size;
- minus.ptr = xmalloc(minus.size);
- memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
- for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
- minus.ptr[i] = '\n';
+ mmfile_copy_set_boundary(&minus, &(diff_words->minus.text));
diff_words->minus.current = 0;
-
- plus.size = diff_words->plus.text.size;
- plus.ptr = xmalloc(plus.size);
- memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
- for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
- plus.ptr[i] = '\n';
+ mmfile_copy_set_boundary(&plus, &(diff_words->plus.text));
diff_words->plus.current = 0;
xpp.flags = XDF_NEED_MINIMAL;
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux
2008-05-03 11:57 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Ping Yin
@ 2008-05-03 11:57 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Ping Yin
` (2 more replies)
2008-05-03 18:20 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Junio C Hamano
1 sibling, 3 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 11:57 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 26 +++++++++++++++-----------
1 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/diff.c b/diff.c
index acef138..b5f7141 100644
--- a/diff.c
+++ b/diff.c
@@ -408,28 +408,32 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
- struct diff_words_data *diff_words = priv;
+ struct diff_words_data *diff_words;
+ struct diff_words_buffer *dm, *dp;
+ FILE *df;
- if (diff_words->minus.suppressed_newline) {
+ diff_words = priv;
+ dm = &(diff_words->minus);
+ dp = &(diff_words->plus);
+ df = diff_words->file;
+
+ if (dm->suppressed_newline) {
if (line[0] != '+')
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
+ putc('\n', df);
+ dm->suppressed_newline = 0;
}
len--;
switch (line[0]) {
case '-':
- print_word(diff_words->file,
- &diff_words->minus, len, DIFF_FILE_OLD, 1);
+ print_word(df, dm, len, DIFF_FILE_OLD, 1);
break;
case '+':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_FILE_NEW, 0);
+ print_word(df, dp, len, DIFF_FILE_NEW, 0);
break;
case ' ':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_PLAIN, 0);
- diff_words->minus.current += len;
+ print_word(df, dp, len, DIFF_PLAIN, 0);
+ dm->current += len;
break;
}
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line
2008-05-03 11:57 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
@ 2008-05-03 11:57 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Ping Yin
2008-05-03 18:01 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Junio C Hamano
2008-05-03 12:01 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
2008-05-03 17:47 ` Junio C Hamano
2 siblings, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 11:57 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
With --color-words, following example will show deleted word "bar" at
another line. (<r> represents red)
------------------
$ git diff
- foo bar
+ foo
$ git diff --color-words
foo
<r>bar</r>
------------------
This wrong behaviour is a bug in fn_out_diff_words_aux which always
outputs a newline after handling the diff line beginning with "+" and
ending with a newline.
Instead, we always supress the newline when using print_word, and in
fn_out_diff_words_aux, a newline is shown only in following cases:
- true minus.suppressed_newline followd by a line beginning with
'-', ' ' or '@' (i.e. not '+')
- true plus.suppressed_newline followd by a line beginning with
'+', ' ' or '@' (i.e. not '-')
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 19 +++++++++++++------
1 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/diff.c b/diff.c
index b5f7141..11316fe 100644
--- a/diff.c
+++ b/diff.c
@@ -409,6 +409,7 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words;
+ char cm;
struct diff_words_buffer *dm, *dp;
FILE *df;
@@ -417,10 +418,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
dp = &(diff_words->plus);
df = diff_words->file;
- if (dm->suppressed_newline) {
- if (line[0] != '+')
- putc('\n', df);
+ if ((dm->suppressed_newline && line[0] != '+') ||
+ (dp->suppressed_newline && line[0] != '-')) {
+ putc('\n', df);
dm->suppressed_newline = 0;
+ dp->suppressed_newline = 0;
}
len--;
@@ -429,11 +431,14 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
print_word(df, dm, len, DIFF_FILE_OLD, 1);
break;
case '+':
- print_word(df, dp, len, DIFF_FILE_NEW, 0);
+ print_word(df, dp, len, DIFF_FILE_NEW, 1);
break;
case ' ':
- print_word(df, dp, len, DIFF_PLAIN, 0);
+ cm = dm->text.ptr[dm->current + len - 1];
+ print_word(df, dp, len, DIFF_PLAIN, 1);
dm->current += len;
+ if (cm == '\n')
+ dm->suppressed_newline = 1;
break;
}
}
@@ -475,9 +480,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
free(plus.ptr);
diff_words->minus.text.size = diff_words->plus.text.size = 0;
- if (diff_words->minus.suppressed_newline) {
+ if (diff_words->minus.suppressed_newline ||
+ diff_words->plus.suppressed_newline) {
putc('\n', diff_words->file);
diff_words->minus.suppressed_newline = 0;
+ diff_words->plus.suppressed_newline = 0;
}
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-03 11:57 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Ping Yin
@ 2008-05-03 11:57 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 5/5] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
2008-05-03 18:18 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Junio C Hamano
2008-05-03 18:01 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Junio C Hamano
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 11:57 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Previously --color-words only allow spaces as boundary characters.
However, just space is not enough. For example, when i rename a function
from foo to bar, following example doesn't show as expected when using
--color-words.
------------------
- if (foo(arg))
+ if (bar(arg))
------------------
It shows as "if <r>(foo(arg))</r><g>(foo(arg))</g>". Actually, it's the
best to show as "if (<r>foo</r><g>bar</g>(arg))". Here "r" and "g"
represent "red" and "green" separately.
This patch introduces a configuration diff.nonwordchars to make
--color-words treat both spaces and characters in diff.nonwordchars as
boundary characters.
If we set diff.nonwordchars to "()", the example above will show as
"if (<r>foo(</r><g>bar(</g>arg))". It's much better, athough not the best,
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
Documentation/config.txt | 4 ++++
Documentation/diff-options.txt | 1 +
diff.c | 12 +++++++++++-
3 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 824e416..812ec2c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -537,6 +537,10 @@ diff.external::
program only on a subset of your files, you might want to
use linkgit:gitattributes[5] instead.
+diff.nonwordchars::
+ Specify additional boundary characters other than spaces for
+ --color-words.
+
diff.renameLimit::
The number of files to consider when performing the copy/rename
detection; equivalent to the git diff option '-l'.
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 13234fa..60dd5e6 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -95,6 +95,7 @@ endif::git-format-patch[]
--color-words::
Show colored word diff, i.e. color words which have changed.
+ The boundary characters can be configured with diff.nonwordchars.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index 11316fe..50d7fa7 100644
--- a/diff.c
+++ b/diff.c
@@ -23,6 +23,7 @@ static int diff_rename_limit_default = 100;
int diff_use_color_default = -1;
static const char *external_diff_cmd_cfg;
int diff_auto_refresh_index = 1;
+static const char *diff_non_word_chars = "";
static char diff_colors[][COLOR_MAXLEN] = {
"\033[m", /* reset */
@@ -159,6 +160,10 @@ int git_diff_ui_config(const char *var, const char *value)
external_diff_cmd_cfg = xstrdup(value);
return 0;
}
+ if (!strcmp(var, "diff.nonwordchars")) {
+ diff_non_word_chars = value ? xstrdup(value) : "";
+ return 0;
+ }
if (!prefixcmp(var, "diff.")) {
const char *ep = strrchr(var, '.');
@@ -443,6 +448,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+static int is_non_word_char(char c)
+{
+ return isspace(c) || !!strchr(diff_non_word_chars, c);
+}
+
static mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
int i;
@@ -450,7 +460,7 @@ static mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
dest->ptr = xmalloc(dest->size);
memcpy(dest->ptr, src->ptr, dest->size);
for (i = 0; i < dest->size; i++)
- if (isspace(dest->ptr[i]))
+ if (is_non_word_char(dest->ptr[i]))
dest->ptr[i] = '\n';
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v2 5/5] fn_out_diff_words_aux: Handle common diff line more carefully
2008-05-03 11:57 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Ping Yin
@ 2008-05-03 11:57 ` Ping Yin
2008-05-03 18:18 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Junio C Hamano
1 sibling, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 11:57 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Before feeding minus and plus lines into xdi_diff, we replace non word
characters with '\n'. So we need recover the replaced character (always
the last character) in the callback fn_out_diff_words_aux.
Therefore, a common diff line beginning with ' ' is not always a real
common line. And we should check the last characters of the common diff
line. If they are different, we should output the first len-1 characters
as the common part and then the last characters in minus and plus
separately.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 18 +++++++++++++-----
1 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/diff.c b/diff.c
index 50d7fa7..72fe804 100644
--- a/diff.c
+++ b/diff.c
@@ -414,7 +414,7 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words;
- char cm;
+ char cm, cp;
struct diff_words_buffer *dm, *dp;
FILE *df;
@@ -440,10 +440,18 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
break;
case ' ':
cm = dm->text.ptr[dm->current + len - 1];
- print_word(df, dp, len, DIFF_PLAIN, 1);
- dm->current += len;
- if (cm == '\n')
- dm->suppressed_newline = 1;
+ cp = dp->text.ptr[dp->current + len - 1];
+ if (cm == cp) {
+ print_word(df, dp, len, DIFF_PLAIN, 1);
+ dm->current += len;
+ }
+ else {
+ len--;
+ print_word(df, dp, len, DIFF_PLAIN, 1);
+ dm->current += len;
+ print_word(df, dm, 1, DIFF_FILE_OLD, 1);
+ print_word(df, dp, 1, DIFF_FILE_NEW, 1);
+ }
break;
}
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux
2008-05-03 11:57 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
2008-05-03 11:57 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Ping Yin
@ 2008-05-03 12:01 ` Ping Yin
2008-05-03 17:47 ` Junio C Hamano
2 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 12:01 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
On Sat, May 3, 2008 at 7:57 PM, Ping Yin <pkufranky@gmail.com> wrote:
> Signed-off-by: Ping Yin <pkufranky@gmail.com>
> ---
Sorry, the wrong title, s/show/short/
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make boundary characters for --color-words configurable
2008-05-03 0:22 ` Ping Yin
@ 2008-05-03 13:22 ` Dirk Süsserott
2008-05-03 13:57 ` Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Dirk Süsserott @ 2008-05-03 13:22 UTC (permalink / raw)
To: Ping Yin; +Cc: tlikonen, gitster, git
Hi Ping,
I highly appreciate your effort into "diff --color-words"
and hope it makes it into the the next release. I wanted
to change the behaviour as well, but when I saw that
git-diff is a builtin, I had to give up. I hoped it was
a perl script and could insert some "\b" regexes somewhere,
but I was wrong. I'm using Git for Windows, you know.
However, I'd like to ask you whether you've done any research
in how to use "--color-words" in gitk? gitk seems to color
the lines only by means of a '+' or '-' sign in the first
column. Hardcoded. I managed to add a checkbox to gitk that
adds the '--color-words' switch to git diff, but when checked
the output is just muddled up. All of those ^] characters
whithin the code. :-(
Dirk
Ping Yin schrieb:
> Signed-off-by: Ping Yin <pkufranky@gmail.com>
> ---
>
>> I think config variables should be in alphabetical order in config.txt.
>> Hence your diff.nonwordchars is not in the right place
>>
>
> THX, this is fixing patch
>
> Documentation/config.txt | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index eb05592..812ec2c 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -537,6 +537,10 @@ diff.external::
> program only on a subset of your files, you might want to
> use linkgit:gitattributes[5] instead.
>
> +diff.nonwordchars::
> + Specify additional boundary characters other than spaces for
> + --color-words.
> +
> diff.renameLimit::
> The number of files to consider when performing the copy/rename
> detection; equivalent to the git diff option '-l'.
> @@ -546,10 +550,6 @@ diff.renames::
> will enable basic rename detection. If set to "copies" or
> "copy", it will detect copies, as well.
>
> -diff.nonwordchars::
> - Specify additional boundary characters other than spaces for
> - --color-words.
> -
> fetch.unpackLimit::
> If the number of objects fetched over the git native
> transfer is below this
>
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] Make boundary characters for --color-words configurable
2008-05-03 13:22 ` Dirk Süsserott
@ 2008-05-03 13:57 ` Ping Yin
0 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-03 13:57 UTC (permalink / raw)
To: Dirk Süsserott; +Cc: tlikonen, gitster, git
On Sat, May 3, 2008 at 9:22 PM, Dirk Süsserott <newsletter@dirk.my1.cc> wrote:
> Hi Ping,
>
> I highly appreciate your effort into "diff --color-words"
> and hope it makes it into the the next release.
Glad to hear that.
>
> However, I'd like to ask you whether you've done any research
> in how to use "--color-words" in gitk?
oh, i havn' t even used gtk yet. I don't have the X environment
because i use windows and then use securecrt to ssh to remote server.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* [PATCH] --color-words: Make the word characters configurable
2008-05-02 13:59 ` [PATCH] Make boundary characters " Ping Yin
2008-05-02 14:26 ` Ping Yin
2008-05-02 14:36 ` [PATCH] Make boundary characters for --color-words configurable Teemu Likonen
@ 2008-05-03 14:03 ` Johannes Schindelin
2008-05-03 14:13 ` Ping Yin
` (2 more replies)
2 siblings, 3 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-03 14:03 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Now, you can specify which characters are to be interpreted as word
characters with "--color-words=A-Za-z", or by setting the config variable
diff.wordCharacters.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
I would have preferred an approach like this.
Documentation/config.txt | 6 ++++
Documentation/diff-options.txt | 8 ++++-
README | 2 +-
diff.c | 64 +++++++++++++++++++++++++++++++++++++++-
4 files changed, 77 insertions(+), 3 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 05bf2df..663d82b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -546,6 +546,12 @@ diff.renames::
will enable basic rename detection. If set to "copies" or
"copy", it will detect copies, as well.
+diff.wordcharacters::
+ This config setting overrides which characters are interpreted as
+ word characters by the --color-words option of linkgit:git-diff[1].
++
+The default is: all ASCII characters excluding NUL to SPACE.
+
fetch.unpackLimit::
If the number of objects fetched over the git native
transfer is below this
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 13234fa..88ea5d4 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -93,8 +93,14 @@ endif::git-format-patch[]
Turn off colored diff, even when the configuration file
gives the default to color output.
---color-words::
+--color-words[=<set>]::
Show colored word diff, i.e. color words which have changed.
++
+If a set of characters is specified, it is interpreted as the range of
+word characters. Example: "0-9A-Fa-f". As a convenience, "[:alnum:]"
+and "[:alpha:]" expand to alpha-numeric and alpha characters,
+respectively. This argument overrides the config setting
+'diff.wordCharacters'.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/README b/README
index 548142c..0e325e2 100644
--- a/README
+++ b/README
@@ -4,7 +4,7 @@
////////////////////////////////////////////////////////////////
-"git" can mean anything, depending on your mood.
+"git" cann mean anything, depending on your mood.
- random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
diff --git a/diff.c b/diff.c
index 3632b55..3e8719c 100644
--- a/diff.c
+++ b/diff.c
@@ -23,6 +23,20 @@ static int diff_rename_limit_default = 100;
int diff_use_color_default = -1;
static const char *external_diff_cmd_cfg;
int diff_auto_refresh_index = 1;
+static char word_character[256] = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+};
static char diff_colors[][COLOR_MAXLEN] = {
"\033[m", /* reset */
@@ -123,6 +137,44 @@ static int parse_funcname_pattern(const char *var, const char *ep, const char *v
return 0;
}
+static void set_word_character_range(char start, char end)
+{
+ int i;
+ for (i = (unsigned char)start; i <= (unsigned char)end; i++)
+ word_character[i] = 1;
+}
+
+static int set_word_characters(const char *set)
+{
+ int previous_character = -1;
+
+ memset(word_character, 0, sizeof(word_character));
+
+ /* parse values like "0-9[:alnum:]" */
+ for (; *set; set++)
+ if (!prefixcmp(set, "[:alpha:]")) {
+ set_word_character_range('A', 'Z');
+ set_word_character_range('a', 'z');
+ previous_character = -1;
+ set += 8;
+ } else if (!prefixcmp(set, "[:alnum:]")) {
+ set_word_character_range('A', 'Z');
+ set_word_character_range('a', 'z');
+ set_word_character_range('0', '9');
+ previous_character = -1;
+ set += 8;
+ } else if (*set == '-' && previous_character >= 0) {
+ set++;
+ set_word_character_range(previous_character, *set);
+ previous_character = -1;
+ } else {
+ word_character[(unsigned int)*set] = 1;
+ previous_character = *set;
+ }
+
+ return 0;
+}
+
/*
* These are to give UI layer defaults.
* The core-level commands such as git-diff-files should
@@ -179,6 +231,12 @@ int git_diff_basic_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "diff.wordcharacters")) {
+ if (!value)
+ return config_error_nonbool(var);
+ return set_word_characters(value);
+ }
+
if (!prefixcmp(var, "diff.")) {
const char *ep = strrchr(var, '.');
if (ep != var + 4) {
@@ -456,7 +514,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
plus.ptr = xmalloc(plus.size);
memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
+ if (!word_character[(unsigned char)plus.ptr[i]])
plus.ptr[i] = '\n';
diff_words->plus.current = 0;
@@ -2489,6 +2547,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
DIFF_OPT_CLR(options, COLOR_DIFF);
else if (!strcmp(arg, "--color-words"))
options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ else if (!prefixcmp(arg, "--color-words=")) {
+ options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+ set_word_characters(arg + 13);
+ }
else if (!strcmp(arg, "--exit-code"))
DIFF_OPT_SET(options, EXIT_WITH_STATUS);
else if (!strcmp(arg, "--quiet"))
--
1.5.5.1.266.g7cbb
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH] --color-words: Make the word characters configurable
2008-05-03 14:03 ` [PATCH] --color-words: Make the word characters configurable Johannes Schindelin
@ 2008-05-03 14:13 ` Ping Yin
2008-05-03 14:23 ` Johannes Schindelin
2008-05-03 14:43 ` Teemu Likonen
2008-05-03 17:43 ` Junio C Hamano
2 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-03 14:13 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On Sat, May 3, 2008 at 10:03 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Now, you can specify which characters are to be interpreted as word
> characters with "--color-words=A-Za-z", or by setting the config variable
> diff.wordCharacters.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>
> I would have preferred an approach like this.
>
> Documentation/config.txt | 6 ++++
> Documentation/diff-options.txt | 8 ++++-
> README | 2 +-
> diff.c | 64 +++++++++++++++++++++++++++++++++++++++-
> 4 files changed, 77 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 05bf2df..663d82b 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -546,6 +546,12 @@ diff.renames::
> will enable basic rename detection. If set to "copies" or
> "copy", it will detect copies, as well.
>
> +diff.wordcharacters::
> + This config setting overrides which characters are interpreted as
> + word characters by the --color-words option of linkgit:git-diff[1].
I think a-zA-Z0-9_ should always be word characters. We can't
override them, instead, we just extend them.
>
> -"git" can mean anything, depending on your mood.
> +"git" cann mean anything, depending on your mood.
Why replacing can as cann?
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] --color-words: Make the word characters configurable
2008-05-03 14:13 ` Ping Yin
@ 2008-05-03 14:23 ` Johannes Schindelin
0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-03 14:23 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Sat, 3 May 2008, Ping Yin wrote:
> On Sat, May 3, 2008 at 10:03 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Now, you can specify which characters are to be interpreted as word
> > characters with "--color-words=A-Za-z", or by setting the config variable
> > diff.wordCharacters.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >
> > I would have preferred an approach like this.
> >
> > Documentation/config.txt | 6 ++++
> > Documentation/diff-options.txt | 8 ++++-
> > README | 2 +-
> > diff.c | 64 +++++++++++++++++++++++++++++++++++++++-
> > 4 files changed, 77 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 05bf2df..663d82b 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -546,6 +546,12 @@ diff.renames::
> > will enable basic rename detection. If set to "copies" or
> > "copy", it will detect copies, as well.
> >
> > +diff.wordcharacters::
> > + This config setting overrides which characters are interpreted as
> > + word characters by the --color-words option of linkgit:git-diff[1].
>
> I think a-zA-Z0-9_ should always be word characters. We can't override
> them, instead, we just extend them.
No. That is exactly the artificial-limitation-by-design I do not want.
> > -"git" can mean anything, depending on your mood.
> > +"git" cann mean anything, depending on your mood.
>
> Why replacing can as cann?
Because I am stupid and committed my test case. Of course, this patch was
done under time pressure, because I have to leave for the day, like, right
now.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] --color-words: Make the word characters configurable
2008-05-03 14:03 ` [PATCH] --color-words: Make the word characters configurable Johannes Schindelin
2008-05-03 14:13 ` Ping Yin
@ 2008-05-03 14:43 ` Teemu Likonen
2008-05-04 9:18 ` Johannes Schindelin
2008-05-03 17:43 ` Junio C Hamano
2 siblings, 1 reply; 92+ messages in thread
From: Teemu Likonen @ 2008-05-03 14:43 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git, gitster
Johannes Schindelin wrote (2008-05-03 15:03 +0100):
> Now, you can specify which characters are to be interpreted as word
> characters with "--color-words=A-Za-z", or by setting the config
> variable diff.wordCharacters.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>
> I would have preferred an approach like this.
Unfortunately this does not work at all with other than Ascii
characters. It makes --color-words completely unusable for anything
other than Ascii text. Sorry.
Ping Yin's version has also the problem that UTF-8 multibyte characters
U+0080..U+10FFFF don't work in diff.nonwordchars. Fortunately the most
important word delimiters are in U+0000..U+007F (=Ascii) area so Ping's
version is perfectly usable with Unicode text. (Even the old
--color-words behaviour with only SPACE as non-word char was perfectly
usable with Unicode text.) I, too, would like to see Ping's patch series
merged in.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] --color-words: Make the word characters configurable
2008-05-03 14:03 ` [PATCH] --color-words: Make the word characters configurable Johannes Schindelin
2008-05-03 14:13 ` Ping Yin
2008-05-03 14:43 ` Teemu Likonen
@ 2008-05-03 17:43 ` Junio C Hamano
2008-05-04 9:25 ` Johannes Schindelin
2 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2008-05-03 17:43 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Now, you can specify which characters are to be interpreted as word
> characters with "--color-words=A-Za-z", or by setting the config variable
> diff.wordCharacters.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>
> I would have preferred an approach like this.
Hmmm...
> diff --git a/README b/README
> index 548142c..0e325e2 100644
> --- a/README
> +++ b/README
> @@ -4,7 +4,7 @@
>
> ////////////////////////////////////////////////////////////////
>
> -"git" can mean anything, depending on your mood.
> +"git" cann mean anything, depending on your mood.
Heh.
> @@ -456,7 +514,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
> plus.ptr = xmalloc(plus.size);
> memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
> for (i = 0; i < plus.size; i++)
> - if (isspace(plus.ptr[i]))
> + if (!word_character[(unsigned char)plus.ptr[i]])
> plus.ptr[i] = '\n';
> diff_words->plus.current = 0;
I do not think there is much difference between specifying the set of word
characters and the set of non-word characters, especially as long as your
definition of "character" is limited to 8-bit bytes. By enumerating word
characters, your patch is letting the user specify non word characters
that are remainder from the 256-element set. By the way, I think you
meant to do the same for the "minus" side a few lines above this hunk.
I commented on the patch from Ping earier about a quite different issue.
I was wondering if we can avoid losing the non-word character information.
The original code replaces any isspace byte with LF, but a whitespace is a
whitespace is a whitespace so there won't be much loss of information, but
making the above isspace() configurable means that now you are going to
drop non-space non-word characters from the output set.
Instead of dropping the original character and replacing it with LF,
I thought a more sensible approach would be to _insert_ a line break
between runs of word characters and non-word characters (while probably
dropping a LF in the original). That is, instead of what the current
implementation of the above loop does to "ab c d" (i.e. rewrite it to
"ab\n\nc\nd"), rewrite it to "ab\n \nc\n \nd". Which feels more consistent
with the way how \b should work.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux
2008-05-03 11:57 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
2008-05-03 11:57 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Ping Yin
2008-05-03 12:01 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
@ 2008-05-03 17:47 ` Junio C Hamano
2 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-03 17:47 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Cannot parse the summary line. Try again.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line
2008-05-03 11:57 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Ping Yin
2008-05-03 11:57 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Ping Yin
@ 2008-05-03 18:01 ` Junio C Hamano
1 sibling, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-03 18:01 UTC (permalink / raw)
To: Ping Yin; +Cc: git
Ping Yin <pkufranky@gmail.com> writes:
> With --color-words, following example will show deleted word "bar" at
> another line. (<r> represents red)
> ------------------
> $ git diff
> - foo bar
> + foo
> $ git diff --color-words
> foo
> <r>bar</r>
> ------------------
Minor style nit, but commit log is not asiidoc and the horizontal bars are
distracting. Just use a blank line either ends to separate the example
from the descriptive text, and indent the example by a few places.
> This wrong behaviour is a bug in fn_out_diff_words_aux which always
"This wrong behaviour is a bug" is a very roundabout way to say it. "This
is caused by a bug in ..."?
> Instead, we always supress the newline when using print_word, and in
> fn_out_diff_words_aux, a newline is shown only in following cases:
>
> - true minus.suppressed_newline followd by a line beginning with
> '-', ' ' or '@' (i.e. not '+')
> - true plus.suppressed_newline followd by a line beginning with
> '+', ' ' or '@' (i.e. not '-')
Hmm, that may describe _what_ the updated code _does_, but does not
explain why/how it is an improvement.
For this kind of change we really would need tests to illustrate various
inputs and desired output for them.
> diff --git a/diff.c b/diff.c
> index b5f7141..11316fe 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -409,6 +409,7 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
> static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
> {
> struct diff_words_data *diff_words;
> + char cm;
> struct diff_words_buffer *dm, *dp;
> FILE *df;
These nondescriptive two letter variable names make the logic very hard to
follow. What does cm represent in the new code? Let's see.... (spends a
few minutes to follow the code)... Ah, it is just a randomly named
temporary variable of "char" type and do not have long-term meaning of any
significance, and could have been named "c" or "ch" or whatever. Heck,
the code fooled me because it looked like it was named similarly to the dm
and dp nearby.
Not a demonstration of "cm" being named poorly, but this wasted few
minutes shows that dm and dp are named very poorly.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-03 11:57 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Ping Yin
2008-05-03 11:57 ` [PATCH v2 5/5] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
@ 2008-05-03 18:18 ` Junio C Hamano
2008-05-03 18:41 ` Teemu Likonen
2008-05-04 0:32 ` Ping Yin
1 sibling, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-03 18:18 UTC (permalink / raw)
To: Ping Yin; +Cc: git
Has this series been ever tested?
$ git diff one two
diff --git a/one b/two
index f9facd3..10c6195 100644
--- a/one
+++ b/two
@@ -1 +1 @@
-A quick(brown) fox
+A quick(yellow) fox
$ tail -n 2 .git/config
[diff]
nonwordchars = "()"
$ git diff --color-words one two
diff --git a/one b/two
index f9facd3..10c6195 100644
--- a/one
+++ b/two
@@ -1 +1 @@
A quick(<red>brown)</red><green>yellow)</green> fox
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show
2008-05-03 11:57 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Ping Yin
2008-05-03 11:57 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
@ 2008-05-03 18:20 ` Junio C Hamano
1 sibling, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-03 18:20 UTC (permalink / raw)
To: Ping Yin; +Cc: git
Ping Yin <pkufranky@gmail.com> writes:
> +static mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
cc1: warnings being treated as errors
diff.c:464: warning: return type defaults int
> + int i;
> +
> + dest->size = src->size;
> + dest->ptr = xmalloc(dest->size);
> + memcpy(dest->ptr, src->ptr, dest->size);
> + for (i = 0; i < dest->size; i++)
> + if (isspace(dest->ptr[i]))
> + dest->ptr[i] = '\n';
> +}
> +
> /* this executes the word diff on the accumulated buffers */
> static void diff_words_show(struct diff_words_data *diff_words)
> {
> @@ -444,20 +455,9 @@ static void diff_words_show(struct diff_words_data *diff_words)
> int i;
diff.c:482: warning: unused variable 'i'
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-03 18:18 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Junio C Hamano
@ 2008-05-03 18:41 ` Teemu Likonen
2008-05-04 0:32 ` Ping Yin
1 sibling, 0 replies; 92+ messages in thread
From: Teemu Likonen @ 2008-05-03 18:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ping Yin, git
Junio C Hamano wrote (2008-05-03 11:18 -0700):
> Has this series been ever tested?
>
> $ git diff one two
> diff --git a/one b/two
> index f9facd3..10c6195 100644
> --- a/one
> +++ b/two
> @@ -1 +1 @@
> -A quick(brown) fox
> +A quick(yellow) fox
>
> $ tail -n 2 .git/config
> [diff]
> nonwordchars = "()"
>
> $ git diff --color-words one two
> diff --git a/one b/two
> index f9facd3..10c6195 100644
> --- a/one
> +++ b/two
> @@ -1 +1 @@
> A quick(<red>brown)</red><green>yellow)</green> fox
I've been testing but not quite sure what to think about the above
output. Is this more natural and expected output:
A quick(<red>brown</red><green>yellow</green>) fox
i.e. no ()'s between the changed words? Although this
-A quick brown fox
+A quick yellow fox
has always became this:
A quick <red>brown</red> <green>yellow</green> fox
------^
(Notice space here)
So there is kind of "added space" but I guess technically it's actually
like this:
A quick <red>brown </red><green>yellow </green>fox
So I think space is consistent with parentheses in your example.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-03 18:18 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Junio C Hamano
2008-05-03 18:41 ` Teemu Likonen
@ 2008-05-04 0:32 ` Ping Yin
2008-05-04 9:44 ` Johannes Schindelin
1 sibling, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-04 0:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, May 4, 2008 at 2:18 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Has this series been ever tested?
>
> $ git diff one two
> diff --git a/one b/two
> index f9facd3..10c6195 100644
> --- a/one
> +++ b/two
> @@ -1 +1 @@
> -A quick(brown) fox
> +A quick(yellow) fox
>
> $ tail -n 2 .git/config
> [diff]
> nonwordchars = "()"
>
> $ git diff --color-words one two
> diff --git a/one b/two
> index f9facd3..10c6195 100644
> --- a/one
> +++ b/two
> @@ -1 +1 @@
> A quick(<red>brown)</red><green>yellow)</green> fox
Yeah, i tested it. It's a better improvement, although not the best.
As i said in the commit msg
If we set diff.nonwordchars to "()", the example above will show as
"if (<r>foo(</r><g>bar(</g>arg))". It's much better, athough not the best.
As you said in another thread, unless we insert LF between run of word
chars and run of nonword chars, the is no way to achieve the best
result.
I have try my best to achieve a better output in current
implementation (say, replace nonword chars with LF)
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* [PATCH v3 0/6] --color-words improvement
2008-05-03 11:57 ` [PATCH v2 0/5] " Ping Yin
2008-05-03 11:57 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Ping Yin
@ 2008-05-04 4:20 ` Ping Yin
2008-05-04 4:20 ` [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show Ping Yin
1 sibling, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-04 4:20 UTC (permalink / raw)
To: git; +Cc: gitster
Ping Yin (6):
diff.c: Remove code redundancy in diff_words_show
fn_out_diff_words_aux: Use short variable name
--color-words: Fix showing trailing deleted words at another line
--color-words: Make non-word characters configurable
fn_out_diff_words_aux: Handle common diff line more carefully
--color-words: Add test t4030
Related to last series
- add a test patch (the last one)
- refine commit message
- use more meaningful varaible name (dm -> dwb_minus etc.)
- refine some words (boundary -> non-word etc.)
- some compiling warning given by junio (add void, remove int i)
The diff between previous patch series and current series
except the last patch
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 60dd5e6..70acc14 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -95,7 +95,7 @@ endif::git-format-patch[]
--color-words::
Show colored word diff, i.e. color words which have changed.
- The boundary characters can be configured with diff.nonwordchars.
+ The non-word characters can be configured with diff.nonwordchars.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index 72fe804..08048e4 100644
--- a/diff.c
+++ b/diff.c
@@ -414,43 +414,43 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words;
- char cm, cp;
- struct diff_words_buffer *dm, *dp;
- FILE *df;
+ char lastchar_minus, lastchar_plus;
+ struct diff_words_buffer *dwb_minus, *dwb_plus;
+ FILE *outfile;
diff_words = priv;
- dm = &(diff_words->minus);
- dp = &(diff_words->plus);
- df = diff_words->file;
+ dwb_minus = &(diff_words->minus);
+ dwb_plus = &(diff_words->plus);
+ outfile = diff_words->file;
- if ((dm->suppressed_newline && line[0] != '+') ||
- (dp->suppressed_newline && line[0] != '-')) {
- putc('\n', df);
- dm->suppressed_newline = 0;
- dp->suppressed_newline = 0;
+ if ((dwb_minus->suppressed_newline && line[0] != '+') ||
+ (dwb_plus->suppressed_newline && line[0] != '-')) {
+ putc('\n', outfile);
+ dwb_minus->suppressed_newline = 0;
+ dwb_plus->suppressed_newline = 0;
}
len--;
switch (line[0]) {
case '-':
- print_word(df, dm, len, DIFF_FILE_OLD, 1);
+ print_word(outfile, dwb_minus, len, DIFF_FILE_OLD, 1);
break;
case '+':
- print_word(df, dp, len, DIFF_FILE_NEW, 1);
+ print_word(outfile, dwb_plus, len, DIFF_FILE_NEW, 1);
break;
case ' ':
- cm = dm->text.ptr[dm->current + len - 1];
- cp = dp->text.ptr[dp->current + len - 1];
- if (cm == cp) {
- print_word(df, dp, len, DIFF_PLAIN, 1);
- dm->current += len;
+ lastchar_minus = dwb_minus->text.ptr[dwb_minus->current + len - 1];
+ lastchar_plus = dwb_plus->text.ptr[dwb_plus->current + len - 1];
+ if (lastchar_minus == lastchar_plus) {
+ print_word(outfile, dwb_plus, len, DIFF_PLAIN, 1);
+ dwb_minus->current += len;
}
else {
len--;
- print_word(df, dp, len, DIFF_PLAIN, 1);
- dm->current += len;
- print_word(df, dm, 1, DIFF_FILE_OLD, 1);
- print_word(df, dp, 1, DIFF_FILE_NEW, 1);
+ print_word(outfile, dwb_plus, len, DIFF_PLAIN, 1);
+ dwb_minus->current += len;
+ print_word(outfile, dwb_minus, 1, DIFF_FILE_OLD, 1);
+ print_word(outfile, dwb_plus, 1, DIFF_FILE_NEW, 1);
}
break;
}
@@ -461,7 +461,7 @@ static int is_non_word_char(char c)
return isspace(c) || !!strchr(diff_non_word_chars, c);
}
-static mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
+static void mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
int i;
dest->size = src->size;
@@ -479,7 +479,6 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitconf_t xecfg;
xdemitcb_t ecb;
mmfile_t minus, plus;
- int i;
memset(&xecfg, 0, sizeof(xecfg));
mmfile_copy_set_boundary(&minus, &(diff_words->minus.text));
Documentation/config.txt | 4 ++
Documentation/diff-options.txt | 1 +
diff.c | 84 ++++++++++++++++++++++++++-------------
t/t4030-diff-color-words.sh | 42 ++++++++++++++++++++
t/t4030/expect1 | 1 +
t/t4030/expect10 | 1 +
t/t4030/expect2 | 1 +
t/t4030/expect3 | 1 +
t/t4030/expect4 | 1 +
t/t4030/expect5 | 1 +
t/t4030/expect6 | 1 +
t/t4030/expect7 | 1 +
t/t4030/expect8 | 1 +
t/t4030/expect9 | 1 +
t/t4030/gen-expect.sh | 35 ++++++++++++++++
15 files changed, 148 insertions(+), 28 deletions(-)
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show
2008-05-04 4:20 ` [PATCH v3 0/6] --color-words improvement Ping Yin
@ 2008-05-04 4:20 ` Ping Yin
2008-05-04 4:20 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Ping Yin
2008-05-04 9:46 ` [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show Johannes Schindelin
0 siblings, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 4:20 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Introduce mmfile_copy_set_boundary to do the repeated work.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 27 +++++++++++++--------------
1 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/diff.c b/diff.c
index 3632b55..6633c9c 100644
--- a/diff.c
+++ b/diff.c
@@ -434,6 +434,17 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+static void mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
+ int i;
+
+ dest->size = src->size;
+ dest->ptr = xmalloc(dest->size);
+ memcpy(dest->ptr, src->ptr, dest->size);
+ for (i = 0; i < dest->size; i++)
+ if (isspace(dest->ptr[i]))
+ dest->ptr[i] = '\n';
+}
+
/* this executes the word diff on the accumulated buffers */
static void diff_words_show(struct diff_words_data *diff_words)
{
@@ -441,23 +452,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitconf_t xecfg;
xdemitcb_t ecb;
mmfile_t minus, plus;
- int i;
memset(&xecfg, 0, sizeof(xecfg));
- minus.size = diff_words->minus.text.size;
- minus.ptr = xmalloc(minus.size);
- memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
- for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
- minus.ptr[i] = '\n';
+ mmfile_copy_set_boundary(&minus, &(diff_words->minus.text));
diff_words->minus.current = 0;
-
- plus.size = diff_words->plus.text.size;
- plus.ptr = xmalloc(plus.size);
- memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
- for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
- plus.ptr[i] = '\n';
+ mmfile_copy_set_boundary(&plus, &(diff_words->plus.text));
diff_words->plus.current = 0;
xpp.flags = XDF_NEED_MINIMAL;
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name
2008-05-04 4:20 ` [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show Ping Yin
@ 2008-05-04 4:20 ` Ping Yin
2008-05-04 4:20 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Ping Yin
2008-05-04 9:47 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Johannes Schindelin
2008-05-04 9:46 ` [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show Johannes Schindelin
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 4:20 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 26 +++++++++++++++-----------
1 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/diff.c b/diff.c
index 6633c9c..05f7c35 100644
--- a/diff.c
+++ b/diff.c
@@ -408,28 +408,32 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
- struct diff_words_data *diff_words = priv;
+ struct diff_words_data *diff_words;
+ struct diff_words_buffer *dwb_minus, *dwb_plus;
+ FILE *outfile;
- if (diff_words->minus.suppressed_newline) {
+ diff_words = priv;
+ dwb_minus = &(diff_words->minus);
+ dwb_plus = &(diff_words->plus);
+ outfile = diff_words->file;
+
+ if (dwp_minus->suppressed_newline) {
if (line[0] != '+')
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
+ putc('\n', outfile);
+ dwp_minus->suppressed_newline = 0;
}
len--;
switch (line[0]) {
case '-':
- print_word(diff_words->file,
- &diff_words->minus, len, DIFF_FILE_OLD, 1);
+ print_word(outfile, dwb_minus, len, DIFF_FILE_OLD, 1);
break;
case '+':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_FILE_NEW, 0);
+ print_word(outfile, dwb_plus, len, DIFF_FILE_NEW, 0);
break;
case ' ':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_PLAIN, 0);
- diff_words->minus.current += len;
+ print_word(outfile, dwb_plus, len, DIFF_PLAIN, 0);
+ dwb_minus->current += len;
break;
}
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line
2008-05-04 4:20 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Ping Yin
@ 2008-05-04 4:20 ` Ping Yin
2008-05-04 4:20 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Ping Yin
2008-05-04 9:52 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Johannes Schindelin
2008-05-04 9:47 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Johannes Schindelin
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 4:20 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
With --color-words, following example will show deleted word "bar" at
another line. (<r> represents red)
$ git diff
- foo bar
+ foo
$ git diff --color-words
foo
<r>bar</r>
This is caused by the unsymmetrical handling of LF in the plus and minus
buffer in fn_out_diff_words_aux.
Assume "trailing minus (or plus) word" represents the last word
(with real LF following it) in a line of the minus (or plus) buffer.
Following is original unsymmetrical handling rules where LF represents
a LF will be shown there.
- trailing minus word, [plus word, ...], LF, non plus word
- trailing plus word, LF, any word
The second rule causes any word following the trailing plus word will
be shown in a different line.
This patch tries to implement the symmetrical handling rules:
- trailing minus word, [plus word, ...], LF, non plus word
- trailing plus word, [minus word, ...], LF, non minus word
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 21 ++++++++++++++-------
1 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/diff.c b/diff.c
index 05f7c35..c7a0d77 100644
--- a/diff.c
+++ b/diff.c
@@ -409,6 +409,7 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words;
+ char lastchar_minus;
struct diff_words_buffer *dwb_minus, *dwb_plus;
FILE *outfile;
@@ -417,10 +418,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
dwb_plus = &(diff_words->plus);
outfile = diff_words->file;
- if (dwp_minus->suppressed_newline) {
- if (line[0] != '+')
- putc('\n', outfile);
- dwp_minus->suppressed_newline = 0;
+ if ((dwb_minus->suppressed_newline && line[0] != '+') ||
+ (dwb_plus->suppressed_newline && line[0] != '-')) {
+ putc('\n', outfile);
+ dwb_minus->suppressed_newline = 0;
+ dwb_plus->suppressed_newline = 0;
}
len--;
@@ -429,11 +431,14 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
print_word(outfile, dwb_minus, len, DIFF_FILE_OLD, 1);
break;
case '+':
- print_word(outfile, dwb_plus, len, DIFF_FILE_NEW, 0);
+ print_word(outfile, dwb_plus, len, DIFF_FILE_NEW, 1);
break;
case ' ':
- print_word(outfile, dwb_plus, len, DIFF_PLAIN, 0);
+ lastchar_minus = dwb_minus->text.ptr[dwb_minus->current + len - 1];
+ print_word(outfile, dwb_plus, len, DIFF_PLAIN, 1);
dwb_minus->current += len;
+ if (lastchar_minus == '\n')
+ dwb_minus->suppressed_newline = 1;
break;
}
}
@@ -474,9 +479,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
free(plus.ptr);
diff_words->minus.text.size = diff_words->plus.text.size = 0;
- if (diff_words->minus.suppressed_newline) {
+ if (diff_words->minus.suppressed_newline ||
+ diff_words->plus.suppressed_newline) {
putc('\n', diff_words->file);
diff_words->minus.suppressed_newline = 0;
+ diff_words->plus.suppressed_newline = 0;
}
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v3 4/6] --color-words: Make non-word characters configurable
2008-05-04 4:20 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Ping Yin
@ 2008-05-04 4:20 ` Ping Yin
2008-05-04 4:20 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
2008-05-04 6:45 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Junio C Hamano
2008-05-04 9:52 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Johannes Schindelin
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 4:20 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Previously --color-words only allow spaces as non-word characters.
However, just space is not enough. For example, when i rename a function
from foo to bar, following example doesn't show as expected when using
--color-words.
- if (foo(arg))
+ if (bar(arg))
Assuming "r" and "g" represent "red" and "green" separately. It shows as
if <r>(foo(arg))</r><g>(foo(arg))</g>
Actually, it's the best to show as
if (<r>foo</r><g>bar</g>(arg))
This patch introduces a configuration diff.nonwordchars to allow configurable
non-word characters (both spaces and chars in diff.nonwordchars)
for --color-words.
Now, with diff.nonwordchars set to "()", the example above will show as
if (<r>foo(</r><g>bar(</g>arg))
It's much better, athough not the best,
NOTE:
With current implementation (i.e. to replace non word characters with
LF before feeding into xdi_diff), we can't get the best output.
A more sensible implementation is to use 'insert' instead of 'replace'.
Say, to insert a line break between runs of word characters and non-word
characters or between non-word characters.
That is, "foo>=bar" will be rewritten as "foo\n>\n=\nbar" instead of
"foo\n\nbar".
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
Documentation/config.txt | 4 ++++
Documentation/diff-options.txt | 1 +
diff.c | 12 +++++++++++-
3 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 824e416..812ec2c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -537,6 +537,10 @@ diff.external::
program only on a subset of your files, you might want to
use linkgit:gitattributes[5] instead.
+diff.nonwordchars::
+ Specify additional boundary characters other than spaces for
+ --color-words.
+
diff.renameLimit::
The number of files to consider when performing the copy/rename
detection; equivalent to the git diff option '-l'.
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 13234fa..70acc14 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -95,6 +95,7 @@ endif::git-format-patch[]
--color-words::
Show colored word diff, i.e. color words which have changed.
+ The non-word characters can be configured with diff.nonwordchars.
--no-renames::
Turn off rename detection, even when the configuration
diff --git a/diff.c b/diff.c
index c7a0d77..eb7c086 100644
--- a/diff.c
+++ b/diff.c
@@ -23,6 +23,7 @@ static int diff_rename_limit_default = 100;
int diff_use_color_default = -1;
static const char *external_diff_cmd_cfg;
int diff_auto_refresh_index = 1;
+static const char *diff_non_word_chars = "";
static char diff_colors[][COLOR_MAXLEN] = {
"\033[m", /* reset */
@@ -159,6 +160,10 @@ int git_diff_ui_config(const char *var, const char *value)
external_diff_cmd_cfg = xstrdup(value);
return 0;
}
+ if (!strcmp(var, "diff.nonwordchars")) {
+ diff_non_word_chars = value ? xstrdup(value) : "";
+ return 0;
+ }
if (!prefixcmp(var, "diff.")) {
const char *ep = strrchr(var, '.');
@@ -443,6 +448,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
}
}
+static int is_non_word_char(char c)
+{
+ return isspace(c) || !!strchr(diff_non_word_chars, c);
+}
+
static void mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
int i;
@@ -450,7 +460,7 @@ static void mmfile_copy_set_boundary(mmfile_t *dest, mmfile_t *src) {
dest->ptr = xmalloc(dest->size);
memcpy(dest->ptr, src->ptr, dest->size);
for (i = 0; i < dest->size; i++)
- if (isspace(dest->ptr[i]))
+ if (is_non_word_char(dest->ptr[i]))
dest->ptr[i] = '\n';
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully
2008-05-04 4:20 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Ping Yin
@ 2008-05-04 4:20 ` Ping Yin
2008-05-04 4:20 ` [PATCH v3 6/6] --color-words: Add test t4030 Ping Yin
2008-05-04 9:54 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Johannes Schindelin
2008-05-04 6:45 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Junio C Hamano
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 4:20 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Before feeding minus and plus lines into xdi_diff, we replace non word
characters with '\n'. So we need recover the replaced character (always
the last character) in the callback fn_out_diff_words_aux.
Therefore, a common diff line beginning with ' ' is not always a real
common line. And we should check the last characters of the common diff
line. If they are different, we should output the first len-1 characters
as the common part and then the last characters in minus and plus
separately.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
diff.c | 18 +++++++++++++-----
1 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/diff.c b/diff.c
index eb7c086..08048e4 100644
--- a/diff.c
+++ b/diff.c
@@ -414,7 +414,7 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words;
- char lastchar_minus;
+ char lastchar_minus, lastchar_plus;
struct diff_words_buffer *dwb_minus, *dwb_plus;
FILE *outfile;
@@ -440,10 +440,18 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
break;
case ' ':
lastchar_minus = dwb_minus->text.ptr[dwb_minus->current + len - 1];
- print_word(outfile, dwb_plus, len, DIFF_PLAIN, 1);
- dwb_minus->current += len;
- if (lastchar_minus == '\n')
- dwb_minus->suppressed_newline = 1;
+ lastchar_plus = dwb_plus->text.ptr[dwb_plus->current + len - 1];
+ if (lastchar_minus == lastchar_plus) {
+ print_word(outfile, dwb_plus, len, DIFF_PLAIN, 1);
+ dwb_minus->current += len;
+ }
+ else {
+ len--;
+ print_word(outfile, dwb_plus, len, DIFF_PLAIN, 1);
+ dwb_minus->current += len;
+ print_word(outfile, dwb_minus, 1, DIFF_FILE_OLD, 1);
+ print_word(outfile, dwb_plus, 1, DIFF_FILE_NEW, 1);
+ }
break;
}
}
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* [PATCH v3 6/6] --color-words: Add test t4030
2008-05-04 4:20 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
@ 2008-05-04 4:20 ` Ping Yin
2008-05-04 9:54 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Johannes Schindelin
1 sibling, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 4:20 UTC (permalink / raw)
To: git; +Cc: gitster, Ping Yin
Also add a script gen-expect.sh to generate the expected
diff output.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
---
t/t4030-diff-color-words.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
t/t4030/expect1 | 1 +
t/t4030/expect10 | 1 +
t/t4030/expect2 | 1 +
t/t4030/expect3 | 1 +
t/t4030/expect4 | 1 +
t/t4030/expect5 | 1 +
t/t4030/expect6 | 1 +
t/t4030/expect7 | 1 +
t/t4030/expect8 | 1 +
t/t4030/expect9 | 1 +
t/t4030/gen-expect.sh | 35 +++++++++++++++++++++++++++++++++++
12 files changed, 87 insertions(+), 0 deletions(-)
create mode 100755 t/t4030-diff-color-words.sh
create mode 100644 t/t4030/expect1
create mode 100644 t/t4030/expect10
create mode 100644 t/t4030/expect2
create mode 100644 t/t4030/expect3
create mode 100644 t/t4030/expect4
create mode 100644 t/t4030/expect5
create mode 100644 t/t4030/expect6
create mode 100644 t/t4030/expect7
create mode 100644 t/t4030/expect8
create mode 100644 t/t4030/expect9
create mode 100644 t/t4030/gen-expect.sh
diff --git a/t/t4030-diff-color-words.sh b/t/t4030-diff-color-words.sh
new file mode 100755
index 0000000..e1c8e8e
--- /dev/null
+++ b/t/t4030-diff-color-words.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='diff --color-words'
+
+. ./test-lib.sh
+. ../diff-lib.sh
+
+dotest() {
+ test_expect_success "$1" "
+ echo '$1' >t &&
+ git diff --color-words | tail -1 >actual &&
+ cat actual &&
+ test_cmp ../t4030/$2 actual
+"
+}
+
+test_expect_success 'setup for foo bar(_baz' '
+ git config diff.nonwordchars "_()" &&
+ echo "foo bar(_baz" > t &&
+ git add t &&
+ git commit -m "add t"
+'
+
+dotest "foo bar(_" expect1
+dotest "foo bar(" expect2
+dotest "foo bar" expect3
+dotest "foo (_baz" expect4
+dotest "foo _baz" expect5
+dotest "foo baz" expect6
+dotest "bar(_baz" expect7
+
+test_expect_success 'setup for foo bar(_' '
+ echo "foo bar(_" > t &&
+ git add t &&
+ git commit -m "add t"
+'
+
+dotest "foo bar(" expect8
+dotest "foo bar" expect9
+dotest "foo bar_baz" expect10
+
+test_done
diff --git a/t/t4030/expect1 b/t/t4030/expect1
new file mode 100644
index 0000000..fb75467
--- /dev/null
+++ b/t/t4030/expect1
@@ -0,0 +1 @@
+^[[mfoo ^[[mbar(^[[m_^[[m^[[31mbaz^[[m^[[32m^[[m
diff --git a/t/t4030/expect10 b/t/t4030/expect10
new file mode 100644
index 0000000..b36e5db
--- /dev/null
+++ b/t/t4030/expect10
@@ -0,0 +1 @@
+^[[mfoo ^[[mbar^[[m^[[31m(^[[m^[[32m_^[[m^[[31m_^[[m^[[31m^[[m^[[32mbaz^[[m
diff --git a/t/t4030/expect2 b/t/t4030/expect2
new file mode 100644
index 0000000..6a9ca69
--- /dev/null
+++ b/t/t4030/expect2
@@ -0,0 +1 @@
+^[[mfoo ^[[mbar(^[[m^[[31m_^[[m^[[32m^[[m^[[31mbaz^[[m
diff --git a/t/t4030/expect3 b/t/t4030/expect3
new file mode 100644
index 0000000..d7d9885
--- /dev/null
+++ b/t/t4030/expect3
@@ -0,0 +1 @@
+^[[mfoo ^[[mbar^[[m^[[31m(^[[m^[[32m^[[m^[[31m_^[[m^[[31mbaz^[[m
diff --git a/t/t4030/expect4 b/t/t4030/expect4
new file mode 100644
index 0000000..449fd6d
--- /dev/null
+++ b/t/t4030/expect4
@@ -0,0 +1 @@
+^[[mfoo ^[[m^[[31mbar(^[[m^[[32m(^[[m_^[[mbaz^[[m
diff --git a/t/t4030/expect5 b/t/t4030/expect5
new file mode 100644
index 0000000..eb184f7
--- /dev/null
+++ b/t/t4030/expect5
@@ -0,0 +1 @@
+^[[mfoo ^[[m^[[31mbar(^[[m_^[[mbaz^[[m
diff --git a/t/t4030/expect6 b/t/t4030/expect6
new file mode 100644
index 0000000..58591ad
--- /dev/null
+++ b/t/t4030/expect6
@@ -0,0 +1 @@
+^[[mfoo ^[[m^[[31mbar(^[[m^[[31m_^[[mbaz^[[m
diff --git a/t/t4030/expect7 b/t/t4030/expect7
new file mode 100644
index 0000000..c68a1f1
--- /dev/null
+++ b/t/t4030/expect7
@@ -0,0 +1 @@
+^[[m^[[31mfoo ^[[mbar(^[[m_^[[mbaz^[[m
diff --git a/t/t4030/expect8 b/t/t4030/expect8
new file mode 100644
index 0000000..f48152f
--- /dev/null
+++ b/t/t4030/expect8
@@ -0,0 +1 @@
+^[[mfoo ^[[mbar(^[[m^[[31m_^[[m^[[32m^[[m^[[31m^[[m
diff --git a/t/t4030/expect9 b/t/t4030/expect9
new file mode 100644
index 0000000..4f42b6c
--- /dev/null
+++ b/t/t4030/expect9
@@ -0,0 +1 @@
+^[[mfoo ^[[mbar^[[m^[[31m(^[[m^[[32m^[[m^[[31m_^[[m^[[31m^[[m
diff --git a/t/t4030/gen-expect.sh b/t/t4030/gen-expect.sh
new file mode 100644
index 0000000..da07535
--- /dev/null
+++ b/t/t4030/gen-expect.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+git config diff.nonwordchars "_()"
+
+echo "foo bar(_baz" >&2 &&
+echo "foo bar(_baz" > t
+
+git add t &&
+git commit -m "add t"
+
+dotest() {
+ echo "$1" >&2 &&
+ echo "$1" >t &&
+ git diff --color-words |
+ tail -1 > $2
+}
+
+dotest "foo bar(_" expect1
+dotest "foo bar(" expect2
+dotest "foo bar" expect3
+dotest "foo (_baz" expect4
+dotest "foo _baz" expect5
+dotest "foo baz" expect6
+dotest "bar(_baz" expect7
+
+
+echo "foo bar(_" >&2 &&
+echo "foo bar(_" >t
+
+git add t &&
+git commit -m "add t"
+
+dotest "foo bar(" expect8
+dotest "foo bar" expect9
+dotest "foo bar_baz" expect10
--
1.5.5.1.121.g26b3
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v3 4/6] --color-words: Make non-word characters configurable
2008-05-04 4:20 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Ping Yin
2008-05-04 4:20 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
@ 2008-05-04 6:45 ` Junio C Hamano
2008-05-04 7:04 ` Ping Yin
1 sibling, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2008-05-04 6:45 UTC (permalink / raw)
To: Ping Yin; +Cc: git
Ping Yin <pkufranky@gmail.com> writes:
> A more sensible implementation is to use 'insert' instead of 'replace'.
> Say, to insert a line break between runs of word characters and non-word
> characters or between non-word characters.
>
> That is, "foo>=bar" will be rewritten as "foo\n>\n=\nbar" instead of
> "foo\n\nbar".
Hmmmm.
I am not sure if/why you would want a separator between '>' and '=' in
that example. Shouldn't that be "foo <sep> >= <sep> bar"?
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 4/6] --color-words: Make non-word characters configurable
2008-05-04 6:45 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Junio C Hamano
@ 2008-05-04 7:04 ` Ping Yin
0 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 7:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, May 4, 2008 at 2:45 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Ping Yin <pkufranky@gmail.com> writes:
>
> > A more sensible implementation is to use 'insert' instead of 'replace'.
> > Say, to insert a line break between runs of word characters and non-word
> > characters or between non-word characters.
> >
> > That is, "foo>=bar" will be rewritten as "foo\n>\n=\nbar" instead of
> > "foo\n\nbar".
>
> Hmmmm.
>
> I am not sure if/why you would want a separator between '>' and '=' in
> that example. Shouldn't that be "foo <sep> >= <sep> bar"?
That's intentional. If i change >= to >, i want it to be highlighted
as <r>=</r> instead of <r> >= </r> <g> > </g> (spaced added for
readability). So i tend to consider each non-word character as a
single word.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] --color-words: Make the word characters configurable
2008-05-03 14:43 ` Teemu Likonen
@ 2008-05-04 9:18 ` Johannes Schindelin
0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-04 9:18 UTC (permalink / raw)
To: Teemu Likonen; +Cc: Ping Yin, git, gitster
Hi,
On Sat, 3 May 2008, Teemu Likonen wrote:
> Johannes Schindelin wrote (2008-05-03 15:03 +0100):
>
> > Now, you can specify which characters are to be interpreted as word
> > characters with "--color-words=A-Za-z", or by setting the config
> > variable diff.wordCharacters.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >
> > I would have preferred an approach like this.
>
> Unfortunately this does not work at all with other than Ascii
> characters. It makes --color-words completely unusable for anything
> other than Ascii text. Sorry.
Sorry, but the original way was also only meant for ASCII. The fact that
isspace() happens to work with UTF-8 does _not_ mean that it was any more
useful with non-ASCII: think UTF-16.
So no, I do not buy into your ASCII argument at all.
> Ping Yin's version has also the problem that UTF-8 multibyte characters
> U+0080..U+10FFFF don't work in diff.nonwordchars. Fortunately the most
> important word delimiters are in U+0000..U+007F (=Ascii) area so Ping's
> version is perfectly usable with Unicode text.
>
> (Even the old --color-words behaviour with only SPACE as non-word char
> was perfectly usable with Unicode text.)
See above.
> I, too, would like to see Ping's patch series merged in.
I have no problems with the intention. But I have problems with the
design. It is no less ASCII-bound than what I proposed, it wants you to
specify what does _not_ make a word character (making every newbie, and
me, too, going "Huh?").
And I also commented on the artificial limitations by design: I think it
is a big mistake to limit the user's options when it would be easy not to,
just because the designer could not think of useful applications.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH] --color-words: Make the word characters configurable
2008-05-03 17:43 ` Junio C Hamano
@ 2008-05-04 9:25 ` Johannes Schindelin
0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-04 9:25 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ping Yin, git
Hi,
On Sat, 3 May 2008, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Now, you can specify which characters are to be interpreted as word
> > characters with "--color-words=A-Za-z", or by setting the config variable
> > diff.wordCharacters.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >
> > I would have preferred an approach like this.
>
> Hmmm...
Just to clarify: specifying word characters, and allowing sets (as
specifyable for tr(1)).
> > diff --git a/README b/README
> > index 548142c..0e325e2 100644
> > --- a/README
> > +++ b/README
> > @@ -4,7 +4,7 @@
> >
> > ////////////////////////////////////////////////////////////////
> >
> > -"git" can mean anything, depending on your mood.
> > +"git" cann mean anything, depending on your mood.
>
> Heh.
Yeah, I already said I am a moron. I can repeat it if it makes you
happier ;-)
> > @@ -456,7 +514,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
> > plus.ptr = xmalloc(plus.size);
> > memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
> > for (i = 0; i < plus.size; i++)
> > - if (isspace(plus.ptr[i]))
> > + if (!word_character[(unsigned char)plus.ptr[i]])
> > plus.ptr[i] = '\n';
> > diff_words->plus.current = 0;
>
> I do not think there is much difference between specifying the set of
> word characters and the set of non-word characters, especially as long
> as your definition of "character" is limited to 8-bit bytes. By
> enumerating word characters, your patch is letting the user specify non
> word characters that are remainder from the 256-element set. By the
> way, I think you meant to do the same for the "minus" side a few lines
> above this hunk.
I just imitated Ping's patch, but you're right, I forgot that.
> I commented on the patch from Ping earier about a quite different issue.
> I was wondering if we can avoid losing the non-word character
> information. The original code replaces any isspace byte with LF, but a
> whitespace is a whitespace is a whitespace so there won't be much loss
> of information, but making the above isspace() configurable means that
> now you are going to drop non-space non-word characters from the output
> set.
>
> Instead of dropping the original character and replacing it with LF, I
> thought a more sensible approach would be to _insert_ a line break
> between runs of word characters and non-word characters (while probably
> dropping a LF in the original). That is, instead of what the current
> implementation of the above loop does to "ab c d" (i.e. rewrite it to
> "ab\n\nc\nd"), rewrite it to "ab\n \nc\n \nd". Which feels more
> consistent with the way how \b should work.
The conversion to "\n" is done only because of limitations in libxdiff
(did I not just rant about artificial limitations in another mail?),
because it is married to the notion that LF ends a line.
Now, there are two options:
- try to reconstruct the original text from what libxdiff returns. This
is potentially memory-efficient, but tricky, and therefore easy to get
wrong.
- go with your approach. You will have to duplicate all the text, so this
is something quite heavy on memory consumption. But you have to do
something special for _real_ LFs so that they are not stripped away when
displaying the result.
I like your idea (I was trying to come up with something sensible for the
first option, but as I said, it is too tricky).
But the LF issue is a real one.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 0:32 ` Ping Yin
@ 2008-05-04 9:44 ` Johannes Schindelin
2008-05-04 16:35 ` Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-04 9:44 UTC (permalink / raw)
To: Ping Yin; +Cc: Junio C Hamano, git
Hi,
On Sun, 4 May 2008, Ping Yin wrote:
> If we set diff.nonwordchars to "()", the example above will show as "if
> (<r>foo(</r><g>bar(</g>arg))". It's much better, athough not the best.
Okay, let's use the power of Open Source, and come up with the best
solution.
The problem: given two chunks of text, where a word was changed, and a
non-word-character was moved to the next line. Example:
The quick,
brown fox
vs
The fast
, brown fox
IMHO the layout of the new version should be retained, i.e.
The /quick/fast/
, brown fox
should be shown.
If everybody is fine with that, I'll try to come up with an
implementation.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show
2008-05-04 4:20 ` [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show Ping Yin
2008-05-04 4:20 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Ping Yin
@ 2008-05-04 9:46 ` Johannes Schindelin
1 sibling, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-04 9:46 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Sun, 4 May 2008, Ping Yin wrote:
> Introduce mmfile_copy_set_boundary to do the repeated work.
It the name supposed to be descriptive? Because it sure is not for me.
Reading the patch, I know what it should do, but the commit message could
be a big blank space then, to save me time.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name
2008-05-04 4:20 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Ping Yin
2008-05-04 4:20 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Ping Yin
@ 2008-05-04 9:47 ` Johannes Schindelin
2008-05-04 16:39 ` Ping Yin
1 sibling, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-04 9:47 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
the use of this patch evades me. Care to defend why it is necessary?
Preferably in the commit message?
Thanks,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line
2008-05-04 4:20 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Ping Yin
2008-05-04 4:20 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Ping Yin
@ 2008-05-04 9:52 ` Johannes Schindelin
2008-05-04 16:48 ` Ping Yin
1 sibling, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-04 9:52 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Sun, 4 May 2008, Ping Yin wrote:
> With --color-words, following example will show deleted word "bar" at
> another line.
"will", or "used to"?
> This is caused by the unsymmetrical handling of LF in the plus and minus
> buffer in fn_out_diff_words_aux.
Is it not rather caused by the need to replace non-word-characters with
LF?
> Following is original unsymmetrical handling rules where LF represents
> a LF will be shown there.
I cannot parse this sentence.
> The second rule causes any word following the trailing plus word will
> be shown in a different line.
I cannot parse this sentence.
> @@ -417,10 +418,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
> dwb_plus = &(diff_words->plus);
> outfile = diff_words->file;
>
> - if (dwp_minus->suppressed_newline) {
> - if (line[0] != '+')
> - putc('\n', outfile);
> - dwp_minus->suppressed_newline = 0;
> + if ((dwb_minus->suppressed_newline && line[0] != '+') ||
If the previous version had dwp_minus, and the new version has dwb_minus,
I wonder if both compile and pass the test suite.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully
2008-05-04 4:20 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
2008-05-04 4:20 ` [PATCH v3 6/6] --color-words: Add test t4030 Ping Yin
@ 2008-05-04 9:54 ` Johannes Schindelin
2008-05-04 16:53 ` Ping Yin
1 sibling, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-04 9:54 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Sun, 4 May 2008, Ping Yin wrote:
> Before feeding minus and plus lines into xdi_diff, we replace non word
> characters with '\n'. So we need recover the replaced character (always
> the last character) in the callback fn_out_diff_words_aux.
>
> Therefore, a common diff line beginning with ' ' is not always a real
> common line.
Umm, why?
> And we should check the last characters of the common diff line. If they
> are different, we should output the first len-1 characters as the common
> part and then the last characters in minus and plus separately.
Umm, why?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 9:44 ` Johannes Schindelin
@ 2008-05-04 16:35 ` Ping Yin
2008-05-04 20:16 ` Junio C Hamano
2008-05-05 11:51 ` Johannes Schindelin
0 siblings, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-04 16:35 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git
On Sun, May 4, 2008 at 5:44 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> The problem: given two chunks of text, where a word was changed, and a
> non-word-character was moved to the next line. Example:
>
> The quick,
> brown fox
>
> vs
>
> The fast
> , brown fox
>
> IMHO the layout of the new version should be retained, i.e.
>
> The /quick/fast/
> , brown fox
>
> should be shown.
Why not
The <r>quick</r><g>fast</g><r>,</r>
<g>,</g>brown fox
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name
2008-05-04 9:47 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Johannes Schindelin
@ 2008-05-04 16:39 ` Ping Yin
2008-05-05 12:05 ` Johannes Schindelin
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-04 16:39 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On Sun, May 4, 2008 at 5:47 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> the use of this patch evades me. Care to defend why it is necessary?
> Preferably in the commit message?
>
In fn_out_diff_words_aux, we use diff_words->plus in so many places.
So just use shorter name to save typing and avoid typo.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line
2008-05-04 9:52 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Johannes Schindelin
@ 2008-05-04 16:48 ` Ping Yin
2008-05-05 12:10 ` Johannes Schindelin
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-04 16:48 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On Sun, May 4, 2008 at 5:52 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
>
> On Sun, 4 May 2008, Ping Yin wrote:
>
> > With --color-words, following example will show deleted word "bar" at
> > another line.
>
> "will", or "used to"?
should be 'used to'
> > This is caused by the unsymmetrical handling of LF in the plus and minus
> > buffer in fn_out_diff_words_aux.
>
> Is it not rather caused by the need to replace non-word-characters with
> LF?
No, i think this has nothing to do with the replacing
non-word-characters with LF.
>
> > Following is original unsymmetrical handling rules where LF represents
> > a LF will be shown there.
>
> I cannot parse this sentence.
Following is the original unsymmetrical handling rules
>
> > The second rule causes any word following the trailing plus word will
> > be shown in a different line.
>
> I cannot parse this sentence.
>
>
The second rule causes any word following the trailing plus word to show
in a different line with the trailing plus word.
>
> > @@ -417,10 +418,11 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
> > dwb_plus = &(diff_words->plus);
> > outfile = diff_words->file;
> >
> > - if (dwp_minus->suppressed_newline) {
> > - if (line[0] != '+')
> > - putc('\n', outfile);
> > - dwp_minus->suppressed_newline = 0;
> > + if ((dwb_minus->suppressed_newline && line[0] != '+') ||
>
> If the previous version had dwp_minus, and the new version has dwb_minus,
> I wonder if both compile and pass the test suite.
>
Oh, it's a typo in the former patch.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully
2008-05-04 9:54 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Johannes Schindelin
@ 2008-05-04 16:53 ` Ping Yin
2008-05-05 12:11 ` Johannes Schindelin
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-04 16:53 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On Sun, May 4, 2008 at 5:54 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
>
> On Sun, 4 May 2008, Ping Yin wrote:
>
> > Before feeding minus and plus lines into xdi_diff, we replace non word
> > characters with '\n'. So we need recover the replaced character (always
> > the last character) in the callback fn_out_diff_words_aux.
> >
> > Therefore, a common diff line beginning with ' ' is not always a real
> > common line.
>
> Umm, why?
Because we need recover the replaced character.
Say, for a common diff line " foo", after restoring the replaced
character, the corresponding line in minus and plus may be different.
For example, "foo(" and "foo)".
> > And we should check the last characters of the common diff line. If they
> > are different, we should output the first len-1 characters as the common
> > part and then the last characters in minus and plus separately.
>
> Umm, why?
Explained.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 16:35 ` Ping Yin
@ 2008-05-04 20:16 ` Junio C Hamano
2008-05-04 20:47 ` Jakub Narebski
2008-05-05 1:40 ` Ping Yin
2008-05-05 11:51 ` Johannes Schindelin
1 sibling, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-04 20:16 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git
Let's step back a bit and try to clarify the problem with a bit of
illustration.
The motivation behind "word diff" is because line oriented diff is
sometimes unwieldy.
-Hello world.
+Hi, world.
A naïve strategy to solve this would be to convert the input into one
character a line while changing the representation of characters into
their codepoints, take the diff between them, and synthesize the result
back, like this:
preimage postimage char-diff
48 H 48 H 48 H
65 e -65 e
6c l -6c l
6c l -6c l
6f o -6f o
69 i +69 i
2c , +2c ,
20 ' ' 20 ' ' 20 ' '
77 w 77 w 77 w
6f o 6f o 6f o
72 r 72 r 72 r
6c l 6c l 6c l
64 d 64 d 64 d
2e . 2e . 2e .
0a '\n' 0a '\n' 0a '\n'
That would produce "H/ello/i,/ world.\n" which is very suboptimal for
human consumption because it chomps a word "Hello" and "Hi" in the middle.
We instead can do this word by word (note that I am doing this as a
thought experiment, to illustrate what the problem is and what should
conceptually happen, not suggesting this particular implementation):
preimage postimage word-diff
48656c6c6f -48656c6c6f Hello
4869 +4869 Hi
2c +2c ,
20 20 20 ' '
776f726c64 776f726c64 776f726c64 world
2e 2e 2e .
0a 0a 0a '\n'
Which would give you "/Hello/Hi,/ world.\n".
Another my favorite example:
-if (i > 1)
+while (i >= 0)
preimage postimage word-diff
6966 -6966 if
7768696c65 +7768696c65 while
20 20 20 ' '
28 28 28 (
69 69 69 i
20 20 20 ' '
3e -3e >
3e3d +3e3d >=
20 20 20 ' '
31 -31 1
30 +30 0
29 29 29 )
which should yield "/if/while/ (i />/>=/ /1/0/)".
So the overall algorithm I think should be is:
- make the input into stream of tokens, where a token is either a run of
word characters only, non-word punct characters only, or whitespaces
only;
- compute the diff over the stream of tokens;
- emit common tokens in white, deleted in red and added in green.
Notice that you do not have to special case LF in any way if you go this
route.
You could do this with only two classes, and use a different tokenization
rule: a token is either a run of word characters only, or each byte of non
word character becomes individual token. This however would yield a
suboptimal result:
-if (i > 1)
+while (i >= 0)
preimage postimage word-diff
6966 -6966 if
7768696c65 +7768696c65 while
20 20 20 ' '
28 28 28 (
69 69 69 i
20 20 20 ' '
3e 3e 3e >
3d +3d =
20 20 20 ' '
31 -31 1
30 +30 0
29 29 29 )
This would give "/if/while/ (i >//=/ /1/0/)". A logical unit ">=" is
chomped into two tokens, which is suboptimal for the same reason why the
output "H/ello/i,/" from the original char-diff based one was suboptimal.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 20:16 ` Junio C Hamano
@ 2008-05-04 20:47 ` Jakub Narebski
2008-05-04 21:27 ` Teemu Likonen
2008-05-05 12:14 ` Johannes Schindelin
2008-05-05 1:40 ` Ping Yin
1 sibling, 2 replies; 92+ messages in thread
From: Jakub Narebski @ 2008-05-04 20:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, Ping Yin, git
Junio C Hamano <junio@pobox.com> writes:
> Let's step back a bit and try to clarify the problem with a bit of
> illustration.
>
> The motivation behind "word diff" is because line oriented diff is
> sometimes unwieldy.
>
> -Hello world.
> +Hi, world.
[...]
> We instead can do this word by word (note that I am doing this as a
> thought experiment, to illustrate what the problem is and what should
> conceptually happen, not suggesting this particular implementation):
>
> preimage postimage word-diff
> 48656c6c6f -48656c6c6f Hello
> 4869 +4869 Hi
> 2c +2c ,
> 20 20 20 ' '
> 776f726c64 776f726c64 776f726c64 world
> 2e 2e 2e .
> 0a 0a 0a '\n'
>
> Which would give you "/Hello/Hi,/ world.\n".
Would it be possible instead of in-line word diff, use word coloring
to enhance traditional diff format? Something like
-/Hello/ world.
+/Hi,/ world.
(We could use bold, or reverse for marking changed fragment, or use
color only for changed fragment).
IMHO current output is nice, unless you have long lines and not very
wide screen...
--
Jakub Narebski
Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 20:47 ` Jakub Narebski
@ 2008-05-04 21:27 ` Teemu Likonen
2008-05-05 12:14 ` Johannes Schindelin
1 sibling, 0 replies; 92+ messages in thread
From: Teemu Likonen @ 2008-05-04 21:27 UTC (permalink / raw)
To: Jakub Narebski; +Cc: Junio C Hamano, Johannes Schindelin, Ping Yin, git
Jakub Narebski wrote (2008-05-04 13:47 -0700):
> Would it be possible instead of in-line word diff, use word coloring
> to enhance traditional diff format? Something like
>
> -/Hello/ world.
> +/Hi,/ world.
>
> (We could use bold, or reverse for marking changed fragment, or use
> color only for changed fragment).
That would be helpful too, no doubt. I'm advocating the word diff
because it's extremely useful with human languages. Lines don't have
(usually) any special meaning there so in this context words are the
most useful units.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 20:16 ` Junio C Hamano
2008-05-04 20:47 ` Jakub Narebski
@ 2008-05-05 1:40 ` Ping Yin
2008-05-05 5:00 ` Junio C Hamano
1 sibling, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-05 1:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, git
On 5/5/08, Junio C Hamano <junio@pobox.com> wrote:
>
> So the overall algorithm I think should be is:
>
> - make the input into stream of tokens, where a token is either a run of
> word characters only, non-word punct characters only, or whitespaces
> only;
>
> - compute the diff over the stream of tokens;
>
> - emit common tokens in white, deleted in red and added in green.
>
> Notice that you do not have to special case LF in any way if you go this
> route.
>
> You could do this with only two classes, and use a different tokenization
> rule: a token is either a run of word characters only, or each byte of non
> word character becomes individual token. This however would yield a
> suboptimal result:
>
> -if (i > 1)
> +while (i >= 0)
>
> preimage postimage word-diff
> 6966 -6966 if
> 7768696c65 +7768696c65 while
> 20 20 20 ' '
> 28 28 28 (
> 69 69 69 i
> 20 20 20 ' '
> 3e 3e 3e >
> 3d +3d =
> 20 20 20 ' '
> 31 -31 1
> 30 +30 0
> 29 29 29 )
>
> This would give "/if/while/ (i >//=/ /1/0/)". A logical unit ">=" is
> chomped into two tokens, which is suboptimal for the same reason why the
> output "H/ello/i,/" from the original char-diff based one was suboptimal.
>
For this example,both "/if/while/ (i />/>=/ /1/0/)" and "/if/while/
(i >//=/ /1/0/)" are fine to me. However, the run of non-word
characters shouldn't always be considered as a single token.
For example
- **************
+ ************
If just a '+' is removed, surely "************/*//" is better.
And when designing, i think it's better to take multi-byte characters
into account. For multi-byte characters (especially CJK), every
character should be considered as a token. if we consider either a run
of word characters or a run of non-word characters as a single token,
there is no way to specify every character as a token.
So from this viewpoint, is it better to use single-token character or
something else instead of non-word character?
Another consideration: Space information is also important for me when
using --color-words. However, i can't distinguish between the removed
spaces and added spaces in current implementaion. So how about use
red/green background color for removed/added spaces?
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-05 1:40 ` Ping Yin
@ 2008-05-05 5:00 ` Junio C Hamano
2008-05-05 12:10 ` Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2008-05-05 5:00 UTC (permalink / raw)
To: Ping Yin; +Cc: Johannes Schindelin, git
"Ping Yin" <pkufranky@gmail.com> writes:
> For this example,both "/if/while/ (i />/>=/ /1/0/)" and "/if/while/
> (i >//=/ /1/0/)" are fine to me.
For the particular example, both are Ok, but for this other example:
-if (i > 1...
+if ((i > 1...
it probably is better to treat each non-word character as a separate
token, that is, it would be easier to read if we said "( stayed intact,
and another ( was added", instead of saying "( is changed to ((".
So "a run of punct chars" rule only sometimes produces better output but
otherwise worse output, and to make it produce better output consistently,
we would need to know the syntax of the target language for tokenization,
i.e. ">=" and ">" are comparison operators, while "(" is a token and "(("
is better split into two open-paren tokens.
So as a very longer term subproject, we may want to teach the mechanism
language specific tokenization rules, just like we can specify the hunk
header pattern via gitattributes(5) to the diff output layer.
Of course, I do not expect you to do that during this round --- and if we
choose to keep the rule simple, I think it is probably better to use
one-char-one-token rule for now.
> And when designing, i think it's better to take multi-byte characters
> into account. For multi-byte characters (especially CJK), every
> character should be considered as a token.
If we take an idealistic view for the longer term, we should be tokenizing
even CJK sensibly, but unlike Occidental scripts, we cannot even use
inter-word spacing for tokenizing hint, so unless we are willing to learn
morphological analysis (which we are not for now), the best we can do is
to use one-char-one-token rule.
Side Note. For Japanese we could cheat and often do a slightly
better job than simple one-char-one-token without having full
morphological analysis by splicing between Kanji and Kana
boundaries, but I'd prefer not to go there and keep the rules we
would use to the minimum.
I should stress that I said "character" in the above "punct" and "CJK"
discussions, not "byte".
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 16:35 ` Ping Yin
2008-05-04 20:16 ` Junio C Hamano
@ 2008-05-05 11:51 ` Johannes Schindelin
2008-05-05 12:02 ` Ping Yin
1 sibling, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-05 11:51 UTC (permalink / raw)
To: Ping Yin; +Cc: Junio C Hamano, git
Hi,
On Mon, 5 May 2008, Ping Yin wrote:
> On Sun, May 4, 2008 at 5:44 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > Hi,
> >
> > The problem: given two chunks of text, where a word was changed, and a
> > non-word-character was moved to the next line. Example:
> >
> > The quick,
> > brown fox
> >
> > vs
> >
> > The fast
> > , brown fox
> >
> > IMHO the layout of the new version should be retained, i.e.
> >
> > The /quick/fast/
> > , brown fox
> >
> > should be shown.
>
> Why not
>
> The <r>quick</r><g>fast</g><r>,</r>
> <g>,</g>brown fox
I might well be a complete idiot, but your <r></r><something> example is
way harder for me to read than my example.
And of course your example would still be wrong: the "quick" and the comma
would not be separated at all.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-05 11:51 ` Johannes Schindelin
@ 2008-05-05 12:02 ` Ping Yin
0 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-05 12:02 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git
On Mon, May 5, 2008 at 7:51 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> > >
> > > The quick,
> > > brown fox
> > >
> > > vs
> > >
> > > The fast
> > > , brown fox
> > >
> > > IMHO the layout of the new version should be retained, i.e.
> > >
> > > The /quick/fast/
> > > , brown fox
> > >
> > > should be shown.
> >
> > Why not
> >
> > The <r>quick</r><g>fast</g><r>,</r>
> > <g>,</g>brown fox
>
> I might well be a complete idiot, but your <r></r><something> example is
> way harder for me to read than my example.
>
> And of course your example would still be wrong: the "quick" and the comma
> would not be separated at all.
So i am an idiot too -:). Should be
The /quick,/fast/
//, /brown fox
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name
2008-05-04 16:39 ` Ping Yin
@ 2008-05-05 12:05 ` Johannes Schindelin
0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-05 12:05 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Mon, 5 May 2008, Ping Yin wrote:
> On Sun, May 4, 2008 at 5:47 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > Hi,
> >
> > the use of this patch evades me. Care to defend why it is necessary?
> > Preferably in the commit message?
> >
>
> In fn_out_diff_words_aux, we use diff_words->plus in so many places.
> So just use shorter name to save typing and avoid typo.
Me, on the other hand, I find "diff_words->plus" so much better to read.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-05 5:00 ` Junio C Hamano
@ 2008-05-05 12:10 ` Ping Yin
2008-05-06 0:40 ` Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-05 12:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, git
On Mon, May 5, 2008 at 1:00 PM, Junio C Hamano <gitster@pobox.com> wrote:
> "Ping Yin" <pkufranky@gmail.com> writes:
>
> > For this example,both "/if/while/ (i />/>=/ /1/0/)" and "/if/while/
> > (i >//=/ /1/0/)" are fine to me.
>
> For the particular example, both are Ok, but for this other example:
>
> -if (i > 1...
> +if ((i > 1...
>
> it probably is better to treat each non-word character as a separate
> token, that is, it would be easier to read if we said "( stayed intact,
> and another ( was added", instead of saying "( is changed to ((".
>
> So "a run of punct chars" rule only sometimes produces better output but
> otherwise worse output, and to make it produce better output consistently,
> we would need to know the syntax of the target language for tokenization,
> i.e. ">=" and ">" are comparison operators, while "(" is a token and "(("
> is better split into two open-paren tokens.
>
> So as a very longer term subproject, we may want to teach the mechanism
> language specific tokenization rules, just like we can specify the hunk
> header pattern via gitattributes(5) to the diff output layer.
>
> Of course, I do not expect you to do that during this round --- and if we
> choose to keep the rule simple, I think it is probably better to use
> one-char-one-token rule for now.
>
>
> > And when designing, i think it's better to take multi-byte characters
> > into account. For multi-byte characters (especially CJK), every
> > character should be considered as a token.
>
> If we take an idealistic view for the longer term, we should be tokenizing
> even CJK sensibly, but unlike Occidental scripts, we cannot even use
> inter-word spacing for tokenizing hint, so unless we are willing to learn
> morphological analysis (which we are not for now), the best we can do is
> to use one-char-one-token rule.
>
> Side Note. For Japanese we could cheat and often do a slightly
> better job than simple one-char-one-token without having full
> morphological analysis by splicing between Kanji and Kana
> boundaries, but I'd prefer not to go there and keep the rules we
> would use to the minimum.
>
> I should stress that I said "character" in the above "punct" and "CJK"
> discussions, not "byte".
>
The one-char-one-token and multi-char-one-token rules may have
different implementation issues. I think multi-char-one-token rule may
be more representative. So for the current time, i prefer considering
both run of word characters and single non-word character as a token.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line
2008-05-04 16:48 ` Ping Yin
@ 2008-05-05 12:10 ` Johannes Schindelin
0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-05 12:10 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Mon, 5 May 2008, Ping Yin wrote:
> On Sun, May 4, 2008 at 5:52 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Sun, 4 May 2008, Ping Yin wrote:
> >
> > > This is caused by the unsymmetrical handling of LF in the plus and
> > > minus buffer in fn_out_diff_words_aux.
> >
> > Is it not rather caused by the need to replace non-word-characters
> > with LF?
>
> No, i think this has nothing to do with the replacing
> non-word-characters with LF.
I know that you _think_ that. But you do a bad job convincing me (by
avoiding an explanation).
> > > Following is original unsymmetrical handling rules where LF
> > > represents a LF will be shown there.
> >
> > I cannot parse this sentence.
>
> Following is the original unsymmetrical handling rules
I am sorry, but this non-native English speaker still has trouble
recognizing the meaning in this.
> > > The second rule causes any word following the trailing plus word
> > > will be shown in a different line.
> >
> > I cannot parse this sentence.
>
> The second rule causes any word following the trailing plus word to show
> in a different line with the trailing plus word.
Again, I can only guess that you mean something like this:
2nd rule: any word after an added word will be shown after a line
break.
Again, I am not a native speaker, so IMO it is very important to be clear
in crucial points such as this one.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully
2008-05-04 16:53 ` Ping Yin
@ 2008-05-05 12:11 ` Johannes Schindelin
2008-05-05 14:18 ` Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-05 12:11 UTC (permalink / raw)
To: Ping Yin; +Cc: git, gitster
Hi,
On Mon, 5 May 2008, Ping Yin wrote:
> On Sun, May 4, 2008 at 5:54 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > On Sun, 4 May 2008, Ping Yin wrote:
> >
> > > Before feeding minus and plus lines into xdi_diff, we replace non
> > > word characters with '\n'. So we need recover the replaced
> > > character (always the last character) in the callback
> > > fn_out_diff_words_aux.
> > >
> > > Therefore, a common diff line beginning with ' ' is not always a
> > > real common line.
> >
> > Umm, why?
>
> Because we need recover the replaced character.
>
> Say, for a common diff line " foo", after restoring the replaced
> character, the corresponding line in minus and plus may be different.
> For example, "foo(" and "foo)".
Why do I have to spend time trying to figure out what you meant, write an
email, and get the explanation only in a response (i.e. not the commit
message, where it belongs)?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-04 20:47 ` Jakub Narebski
2008-05-04 21:27 ` Teemu Likonen
@ 2008-05-05 12:14 ` Johannes Schindelin
1 sibling, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-05 12:14 UTC (permalink / raw)
To: Jakub Narebski; +Cc: Junio C Hamano, Ping Yin, git
Hi,
On Sun, 4 May 2008, Jakub Narebski wrote:
> Junio C Hamano <junio@pobox.com> writes:
>
> > Let's step back a bit and try to clarify the problem with a bit of
> > illustration.
> >
> > The motivation behind "word diff" is because line oriented diff is
> > sometimes unwieldy.
> >
> > -Hello world.
> > +Hi, world.
> [...]
> > We instead can do this word by word (note that I am doing this as a
> > thought experiment, to illustrate what the problem is and what should
> > conceptually happen, not suggesting this particular implementation):
> >
> > preimage postimage word-diff
> > 48656c6c6f -48656c6c6f Hello
> > 4869 +4869 Hi
> > 2c +2c ,
> > 20 20 20 ' '
> > 776f726c64 776f726c64 776f726c64 world
> > 2e 2e 2e .
> > 0a 0a 0a '\n'
> >
> > Which would give you "/Hello/Hi,/ world.\n".
>
> Would it be possible instead of in-line word diff, use word coloring
> to enhance traditional diff format? Something like
>
> -/Hello/ world.
> +/Hi,/ world.
>
> (We could use bold, or reverse for marking changed fragment, or use
> color only for changed fragment).
>
> IMHO current output is nice, unless you have long lines and not very
> wide screen...
-S ;-)
IIRC the code to display was not too complicated for the current mode, so
it should be relatively simple for the mode you desire.
But first let's agree on the semantics of the "tokens", as Junio calls
them, okay?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully
2008-05-05 12:11 ` Johannes Schindelin
@ 2008-05-05 14:18 ` Ping Yin
0 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-05 14:18 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
On Mon, May 5, 2008 at 8:11 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> >
> > Because we need recover the replaced character.
> >
> > Say, for a common diff line " foo", after restoring the replaced
> > character, the corresponding line in minus and plus may be different.
> > For example, "foo(" and "foo)".
>
> Why do I have to spend time trying to figure out what you meant, write an
> email, and get the explanation only in a response (i.e. not the commit
> message, where it belongs)?
>
Since the new solution from junio has risen up, i think this patch
doesn't deserve us spending more time.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-05 12:10 ` Ping Yin
@ 2008-05-06 0:40 ` Ping Yin
2008-05-06 8:55 ` Johannes Schindelin
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-06 0:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, git
On Mon, May 5, 2008 at 8:10 PM, Ping Yin <pkufranky@gmail.com> wrote:
>
> The one-char-one-token and multi-char-one-token rules may have
> different implementation issues. I think multi-char-one-token rule may
> be more representative. So for the current time, i prefer considering
> both run of word characters and single non-word character as a token.
>
If we agree on this. I will come up with an implementation still using
diff.nonwordchars few days later.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-06 0:40 ` Ping Yin
@ 2008-05-06 8:55 ` Johannes Schindelin
2008-05-07 1:15 ` Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-06 8:55 UTC (permalink / raw)
To: Ping Yin; +Cc: Junio C Hamano, git
Hi,
On Tue, 6 May 2008, Ping Yin wrote:
> On Mon, May 5, 2008 at 8:10 PM, Ping Yin <pkufranky@gmail.com> wrote:
>
> I will come up with an implementation still using diff.nonwordchars few
> days later.
If I did not like the unnecessary negative approach "nonwordchars" (as
opposed to "wordchars"), it seems even less appropriate now, when you
actually want to discern between "spaceCharacters",
"punctuationCharacters" and "wordCharacters".
Hth,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-06 8:55 ` Johannes Schindelin
@ 2008-05-07 1:15 ` Ping Yin
2008-05-07 11:24 ` Johannes Schindelin
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-07 1:15 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git
On Tue, May 6, 2008 at 4:55 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> > I will come up with an implementation still using diff.nonwordchars few
> > days later.
>
> If I did not like the unnecessary negative approach "nonwordchars" (as
> opposed to "wordchars"), it seems even less appropriate now, when you
> actually want to discern between "spaceCharacters",
> "punctuationCharacters" and "wordCharacters".
>
Hmm, punctchars should be a better word than nonwordchars.
So how about this
--color-words={char,punct,word}
- char: one char one token
- punct/word: a token can be either a run of word characters or a
single punct character. diff.punctchars is used for punct, and
diff.wordchars is used for word.
We leave the choice to the user.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 1:15 ` Ping Yin
@ 2008-05-07 11:24 ` Johannes Schindelin
2008-05-07 12:19 ` Ping Yin
2008-05-07 19:13 ` Junio C Hamano
0 siblings, 2 replies; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-07 11:24 UTC (permalink / raw)
To: Ping Yin; +Cc: Junio C Hamano, git
Hi,
On Wed, 7 May 2008, Ping Yin wrote:
> On Tue, May 6, 2008 at 4:55 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > > I will come up with an implementation still using diff.nonwordchars few
> > > days later.
> >
> > If I did not like the unnecessary negative approach "nonwordchars" (as
> > opposed to "wordchars"), it seems even less appropriate now, when you
> > actually want to discern between "spaceCharacters",
> > "punctuationCharacters" and "wordCharacters".
> >
>
> Hmm, punctchars should be a better word than nonwordchars.
>
> So how about this
>
> --color-words={char,punct,word}
>
> - char: one char one token
> - punct/word: a token can be either a run of word characters or a
> single punct character. diff.punctchars is used for punct, and
> diff.wordchars is used for word.
I am rather interested in the semantics, i.e. if you can punch holes into
this 3-class approach.
Bikeshedding comes later ;-)
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 11:24 ` Johannes Schindelin
@ 2008-05-07 12:19 ` Ping Yin
2008-05-07 13:10 ` Johannes Schindelin
2008-05-07 19:13 ` Junio C Hamano
1 sibling, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-07 12:19 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git
On Wed, May 7, 2008 at 7:24 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> > So how about this
> >
> > --color-words={char,punct,word}
> >
> > - char: one char one token
> > - punct/word: a token can be either a run of word characters or a
> > single punct character. diff.punctchars is used for punct, and
> > diff.wordchars is used for word.
>
> I am rather interested in the semantics, i.e. if you can punch holes into
> this 3-class approach.
>
> Bikeshedding comes later ;-)
Sorry, but i can't parse both sentences, especially Bikeshedding and
"punch holes into".
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 12:19 ` Ping Yin
@ 2008-05-07 13:10 ` Johannes Schindelin
2008-05-07 14:11 ` Ping Yin
0 siblings, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2008-05-07 13:10 UTC (permalink / raw)
To: Ping Yin; +Cc: Junio C Hamano, git
Hi,
On Wed, 7 May 2008, Ping Yin wrote:
> On Wed, May 7, 2008 at 7:24 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > > So how about this
> > >
> > > --color-words={char,punct,word}
> > >
> > > - char: one char one token
> > > - punct/word: a token can be either a run of word characters or a
> > > single punct character. diff.punctchars is used for punct, and
> > > diff.wordchars is used for word.
> >
> > I am rather interested in the semantics, i.e. if you can punch holes into
> > this 3-class approach.
> >
> > Bikeshedding comes later ;-)
>
> Sorry, but i can't parse both sentences, especially Bikeshedding and
> "punch holes into".
"punch holes into": find cases where Junio's proposed algorithm breaks
down.
"bikeshedding": discussing minor implementation details that are not
really interesting at this stage. See also
http://en.wikipedia.org/wiki/Bikeshedding.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 13:10 ` Johannes Schindelin
@ 2008-05-07 14:11 ` Ping Yin
0 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-07 14:11 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, git
On Wed, May 7, 2008 at 9:10 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> > Sorry, but i can't parse both sentences, especially Bikeshedding and
> > "punch holes into".
>
> "punch holes into": find cases where Junio's proposed algorithm breaks
> down.
I think the --color-words={char,punct,word} covers most of the cases
that junio mentioned, except the case that a token can be a run of
word chars, a run of non-word chars or a run of whitespaces.
If anyone is interested in this case, he can extend --color-words with
a fourth option or tokenizer (i havn't yet found a good name for this
tokenizer).
Of course, if neccessary, one can implement other language-specific tokenizers.
However, i think the 3 tokenizers i mentioned is enough for most cases.
>
> "bikeshedding": discussing minor implementation details that are not
> really interesting at this stage. See also
> http://en.wikipedia.org/wiki/Bikeshedding.
>
See. THX.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 11:24 ` Johannes Schindelin
2008-05-07 12:19 ` Ping Yin
@ 2008-05-07 19:13 ` Junio C Hamano
2008-05-07 19:33 ` Junio C Hamano
` (3 more replies)
1 sibling, 4 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-07 19:13 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> I am rather interested in the semantics, i.e. if you can punch holes into
> this 3-class approach.
This is not the 3-class thing, but was done as a lunchtime hack. It
removes more lines than it adds, with real comments ;-).
* It removes the custom allocator used for minus/plus buffers and
replaces it with the bog-standard strbuf;
* The tokenization is done when diff_words_append() is called, i.e. when
we read the original "added or deleted _lines_";
* The tokenization function is separated out, and gets the emit_callback,
so anybody can enhance it with customization using gitattributes and
other heuristics. More importantly, it is not byte oriented and would
be easier to extend it to UTF-8 contents;
* It does not have to play "suppressed_newline" games anymore. A LF is
just a token.
I haven't tested this at all (this is a lunchtime hack) and have a mild
suspicion that it may have corner case miscounting (e.g. I blindly
subtracts 3 from len when dealing with a line that represents a single
token from the internal diff output --- do I always have 3 there even when
the original file ends with an incomplete line? I didn't check), but
other than that I think this is a lot easier to read and follow.
---
diff.c | 216 +++++++++++++++++++++++++++++++--------------------------------
1 files changed, 106 insertions(+), 110 deletions(-)
diff --git a/diff.c b/diff.c
index e35384b..344aaa6 100644
--- a/diff.c
+++ b/diff.c
@@ -351,87 +351,119 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
return 0;
}
-struct diff_words_buffer {
- mmfile_t text;
- long alloc;
- long current; /* output pointer */
- int suppressed_newline;
+typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
+
+struct emit_callback {
+ struct xdiff_emit_state xm;
+ int nparents, color_diff;
+ unsigned ws_rule;
+ sane_truncate_fn truncate;
+ const char **label_path;
+ struct diff_words_data *diff_words;
+ int *found_changesp;
+ FILE *file;
};
-static void diff_words_append(char *line, unsigned long len,
- struct diff_words_buffer *buffer)
+static size_t diff_words_tokenize(struct emit_callback *ecbdata,
+ char *line, unsigned long len)
{
- if (buffer->text.size + len > buffer->alloc) {
- buffer->alloc = (buffer->text.size + len) * 3 / 2;
- buffer->text.ptr = xrealloc(buffer->text.ptr, buffer->alloc);
+ /*
+ * This function currently is deliberately done very stupid,
+ * but passing ecbdata here means that you can potentially
+ * implement different tokenization rules depending on
+ * the content (e.g. "gitattributes(5)").
+ */
+ int is_space;
+ char *line0 = line;
+
+ if (!len)
+ return 0;
+
+ is_space = isspace(*line);
+ while (len && (isspace(*line) == is_space)) {
+ line++;
+ len--;
}
+ return line - line0;
+}
+
+static void diff_words_append(struct emit_callback *ecbdata,
+ char *line, unsigned long len,
+ struct strbuf *text)
+{
+ /* Skip leading +/- first. */
line++;
len--;
- memcpy(buffer->text.ptr + buffer->text.size, line, len);
- buffer->text.size += len;
+
+ /*
+ * Tokenize and stuff the words in.
+ */
+ while (len) {
+ size_t token_len = diff_words_tokenize(ecbdata, line, len);
+
+ if (line[0] != '\n') {
+ /*
+ * A nonempty token has ' ' stuffed in front,
+ * so that we can recover the original
+ * end-of-line easily. Stupid, but works.
+ */
+ strbuf_add(text, " ", 1);
+ strbuf_add(text, line, token_len);
+ strbuf_add(text, "\n", 1);
+ len -= token_len;
+ line += token_len;
+ } else {
+ /* A real LF */
+ strbuf_add(text, "\n", 1);
+ break;
+ }
+ }
}
struct diff_words_data {
struct xdiff_emit_state xm;
- struct diff_words_buffer minus, plus;
+ struct strbuf minus;
+ struct strbuf plus;
FILE *file;
};
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
- int suppress_newline)
+static void emit_line(FILE *file, const char *set, const char *reset, const char *line, int len)
{
- const char *ptr;
- int eol = 0;
-
- if (len == 0)
- return;
-
- ptr = buffer->text.ptr + buffer->current;
- buffer->current += len;
-
- if (ptr[len - 1] == '\n') {
- eol = 1;
- len--;
- }
-
- fputs(diff_get_color(1, color), file);
- fwrite(ptr, len, 1, file);
- fputs(diff_get_color(1, DIFF_RESET), file);
-
- if (eol) {
- if (suppress_newline)
- buffer->suppressed_newline = 1;
- else
- putc('\n', file);
- }
+ fputs(set, file);
+ fwrite(line, len, 1, file);
+ fputs(reset, file);
}
static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
{
struct diff_words_data *diff_words = priv;
+ const char *set;
+ const char *reset = diff_colors[DIFF_RESET];
- if (diff_words->minus.suppressed_newline) {
- if (line[0] != '+')
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
+ switch (line[0]) {
+ case '-':
+ set = diff_colors[DIFF_FILE_OLD];
+ break;
+ case '+':
+ set = diff_colors[DIFF_FILE_NEW];
+ break;
+ case ' ':
+ set = diff_colors[DIFF_PLAIN];
+ break;
+ default:
+ return; /* omit @@ -j,k +l,m @@ header */
}
- len--;
- switch (line[0]) {
- case '-':
- print_word(diff_words->file,
- &diff_words->minus, len, DIFF_FILE_OLD, 1);
- break;
- case '+':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_FILE_NEW, 0);
- break;
- case ' ':
- print_word(diff_words->file,
- &diff_words->plus, len, DIFF_PLAIN, 0);
- diff_words->minus.current += len;
- break;
+ if (line[1] == ' ') {
+ /* A token */
+ line += 2;
+ len -= 3; /* drop the trailing LF */
+ } else {
+ /* A real LF */
+ line++;
+ len--;
}
+ emit_line(diff_words->file, set, reset, line, len);
}
/* this executes the word diff on the accumulated buffers */
@@ -441,27 +473,18 @@ static void diff_words_show(struct diff_words_data *diff_words)
xdemitconf_t xecfg;
xdemitcb_t ecb;
mmfile_t minus, plus;
- int i;
+ unsigned long sz;
memset(&xecfg, 0, sizeof(xecfg));
- minus.size = diff_words->minus.text.size;
- minus.ptr = xmalloc(minus.size);
- memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
- for (i = 0; i < minus.size; i++)
- if (isspace(minus.ptr[i]))
- minus.ptr[i] = '\n';
- diff_words->minus.current = 0;
-
- plus.size = diff_words->plus.text.size;
- plus.ptr = xmalloc(plus.size);
- memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
- for (i = 0; i < plus.size; i++)
- if (isspace(plus.ptr[i]))
- plus.ptr[i] = '\n';
- diff_words->plus.current = 0;
+
+ minus.ptr = strbuf_detach(&diff_words->minus, &sz);
+ minus.size = sz;
+ plus.ptr = strbuf_detach(&diff_words->plus, &sz);
+ plus.size = sz;
xpp.flags = XDF_NEED_MINIMAL;
- xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
+ /* hack to make it a single hunk to show all */
+ xecfg.ctxlen = minus.size + plus.size;
ecb.outf = xdiff_outf;
ecb.priv = diff_words;
diff_words->xm.consume = fn_out_diff_words_aux;
@@ -469,37 +492,15 @@ static void diff_words_show(struct diff_words_data *diff_words)
free(minus.ptr);
free(plus.ptr);
- diff_words->minus.text.size = diff_words->plus.text.size = 0;
-
- if (diff_words->minus.suppressed_newline) {
- putc('\n', diff_words->file);
- diff_words->minus.suppressed_newline = 0;
- }
}
-typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
-
-struct emit_callback {
- struct xdiff_emit_state xm;
- int nparents, color_diff;
- unsigned ws_rule;
- sane_truncate_fn truncate;
- const char **label_path;
- struct diff_words_data *diff_words;
- int *found_changesp;
- FILE *file;
-};
-
static void free_diff_words_data(struct emit_callback *ecbdata)
{
if (ecbdata->diff_words) {
/* flush buffers */
- if (ecbdata->diff_words->minus.text.size ||
- ecbdata->diff_words->plus.text.size)
+ if (ecbdata->diff_words->minus.len ||
+ ecbdata->diff_words->plus.len)
diff_words_show(ecbdata->diff_words);
-
- free (ecbdata->diff_words->minus.text.ptr);
- free (ecbdata->diff_words->plus.text.ptr);
free(ecbdata->diff_words);
ecbdata->diff_words = NULL;
}
@@ -512,13 +513,6 @@ const char *diff_get_color(int diff_use_color, enum color_diff ix)
return "";
}
-static void emit_line(FILE *file, const char *set, const char *reset, const char *line, int len)
-{
- fputs(set, file);
- fwrite(line, len, 1, file);
- fputs(reset, file);
-}
-
static void emit_add_line(const char *reset, struct emit_callback *ecbdata, const char *line, int len)
{
const char *ws = diff_get_color(ecbdata->color_diff, DIFF_WHITESPACE);
@@ -604,16 +598,16 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
free_diff_words_data(ecbdata);
if (ecbdata->diff_words) {
if (line[0] == '-') {
- diff_words_append(line, len,
+ diff_words_append(ecbdata, line, len,
&ecbdata->diff_words->minus);
return;
} else if (line[0] == '+') {
- diff_words_append(line, len,
+ diff_words_append(ecbdata, line, len,
&ecbdata->diff_words->plus);
return;
}
- if (ecbdata->diff_words->minus.text.size ||
- ecbdata->diff_words->plus.text.size)
+ if (ecbdata->diff_words->minus.len ||
+ ecbdata->diff_words->plus.len)
diff_words_show(ecbdata->diff_words);
line++;
len--;
@@ -1470,6 +1464,8 @@ static void builtin_diff(const char *name_a,
if (DIFF_OPT_TST(o, COLOR_DIFF_WORDS)) {
ecbdata.diff_words =
xcalloc(1, sizeof(struct diff_words_data));
+ strbuf_init(&ecbdata.diff_words->minus, 0);
+ strbuf_init(&ecbdata.diff_words->plus, 0);
ecbdata.diff_words->file = o->file;
}
xdi_diff(&mf1, &mf2, &xpp, &xecfg, &ecb);
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 19:13 ` Junio C Hamano
@ 2008-05-07 19:33 ` Junio C Hamano
2008-05-07 19:45 ` Jeff King
` (2 subsequent siblings)
3 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-07 19:33 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Ping Yin, git
Junio C Hamano <gitster@pobox.com> writes:
> I haven't tested this at all (this is a lunchtime hack) and have a mild
> suspicion that it may have corner case miscounting (e.g. I blindly
> subtracts 3 from len when dealing with a line that represents a single
> token from the internal diff output --- do I always have 3 there even when
> the original file ends with an incomplete line? I didn't check), but
> other than that I think this is a lot easier to read and follow.
And this adds "--color-words -b" support as an example.
The second hunk, however, is a bugfix to the previous one. The code wants
the LF at the end of the line always returned as a single token.
---
diff.c | 30 +++++++++++++++++++++++++++---
1 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/diff.c b/diff.c
index 344aaa6..bce0626 100644
--- a/diff.c
+++ b/diff.c
@@ -357,6 +357,7 @@ struct emit_callback {
struct xdiff_emit_state xm;
int nparents, color_diff;
unsigned ws_rule;
+ struct diff_options *diffopt;
sane_truncate_fn truncate;
const char **label_path;
struct diff_words_data *diff_words;
@@ -378,19 +379,36 @@ static size_t diff_words_tokenize(struct emit_callback *ecbdata,
if (!len)
return 0;
+ /*
+ * Always return LF at the end as a single separate token.
+ */
+ if ((len == 1) && *line == '\n')
+ return 1;
is_space = isspace(*line);
while (len && (isspace(*line) == is_space)) {
line++;
len--;
}
+ if (is_space && !len)
+ line--;
return line - line0;
}
+static int token_is_ws_only(char *line, size_t len)
+{
+ while (len--)
+ if (!isspace(*line))
+ return 0;
+ return 1;
+}
+
static void diff_words_append(struct emit_callback *ecbdata,
char *line, unsigned long len,
struct strbuf *text)
{
+ struct diff_options *diffopt = ecbdata->diffopt;
+
/* Skip leading +/- first. */
line++;
len--;
@@ -407,9 +425,14 @@ static void diff_words_append(struct emit_callback *ecbdata,
* so that we can recover the original
* end-of-line easily. Stupid, but works.
*/
- strbuf_add(text, " ", 1);
- strbuf_add(text, line, token_len);
- strbuf_add(text, "\n", 1);
+ if ((diffopt->xdl_opts & XDF_IGNORE_WHITESPACE) &&
+ token_is_ws_only(line, token_len)) {
+ strbuf_add(text, " \n", 3);
+ } else {
+ strbuf_add(text, " ", 1);
+ strbuf_add(text, line, token_len);
+ strbuf_add(text, "\n", 1);
+ }
len -= token_len;
line += token_len;
} else {
@@ -1447,6 +1470,7 @@ static void builtin_diff(const char *name_a,
ecbdata.found_changesp = &o->found_changes;
ecbdata.ws_rule = whitespace_rule(name_b ? name_b : name_a);
ecbdata.file = o->file;
+ ecbdata.diffopt = o;
xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
xecfg.ctxlen = o->context;
xecfg.flags = XDL_EMIT_FUNCNAMES;
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 19:13 ` Junio C Hamano
2008-05-07 19:33 ` Junio C Hamano
@ 2008-05-07 19:45 ` Jeff King
2008-05-07 20:02 ` Junio C Hamano
2008-05-08 10:34 ` Teemu Likonen
2008-05-10 8:20 ` Ping Yin
3 siblings, 1 reply; 92+ messages in thread
From: Jeff King @ 2008-05-07 19:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, Ping Yin, git
On Wed, May 07, 2008 at 12:13:39PM -0700, Junio C Hamano wrote:
> /* this executes the word diff on the accumulated buffers */
> @@ -441,27 +473,18 @@ static void diff_words_show(struct diff_words_data *diff_words)
> xdemitconf_t xecfg;
> xdemitcb_t ecb;
> mmfile_t minus, plus;
> - int i;
> + unsigned long sz;
strbuf uses size_t; since we pass sz in as a pointer to strbuf_detach,
there can be a pointer type mismatch.
But more big-picture, comparing the output of the old color words and
this implementation, there is one thing I don't like: the new one
doesn't bring together runs of additions and deletions, which can make
parsing text much easier. For example:
$ echo This is a complete sentence. >one
$ echo Here is some totally different text. >two
# with old implementation; /-.../ is red, /+.../ is green
$ git diff --color-words one two
...
/-This/ /+Here/ is /-a complete sentence./+some totally different text./
# with this patch
$ git diff --color-words one two
...
/-This/+Here/ is /-a/+some/ /-complete/+totally/ /-sentence./+different text./
-Peff
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 19:45 ` Jeff King
@ 2008-05-07 20:02 ` Junio C Hamano
2008-05-07 22:04 ` Jeff King
0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2008-05-07 20:02 UTC (permalink / raw)
To: Jeff King; +Cc: Johannes Schindelin, Ping Yin, git
Jeff King <peff@peff.net> writes:
> But more big-picture, comparing the output of the old color words and
> this implementation, there is one thing I don't like: the new one
> doesn't bring together runs of additions and deletions, which can make
> parsing text much easier. For example:
>
> $ echo This is a complete sentence. >one
> $ echo Here is some totally different text. >two
>
> # with old implementation; /-.../ is red, /+.../ is green
> $ git diff --color-words one two
> ...
> /-This/ /+Here/ is /-a complete sentence./+some totally different text./
>
> # with this patch
> $ git diff --color-words one two
> ...
> /-This/+Here/ is /-a/+some/ /-complete/+totally/ /-sentence./+different text./
I suspect that heavily depends on the input text. If you drop "different"
in the example, the output becomes:
{-This|+Here} is {-a|+some} {-complete|+totally} {-sentence.|+text.}
which is totally sensible.
You can get the output that is closer to the original by tweaking the
definition of what a token is. You can for example define a token as "0 or
more non whitespace characters followed by 1 or more whitespace characters"
and then the internal diff would become ($ to show the end of line):
-This $
+Here $
is $
-a $
-complete $
-sentence.$
+some $
+totally $
+different $
+text.$
which would yield on the output:
{-This |+Here }is {-a complete sentence.|+some totally different text.}
It's all in diff_words_tokenize(), which I kept deliberately stupid so
that people can tweak it to their liking.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 20:02 ` Junio C Hamano
@ 2008-05-07 22:04 ` Jeff King
0 siblings, 0 replies; 92+ messages in thread
From: Jeff King @ 2008-05-07 22:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, Ping Yin, git
On Wed, May 07, 2008 at 01:02:54PM -0700, Junio C Hamano wrote:
> I suspect that heavily depends on the input text. If you drop "different"
> in the example, the output becomes:
>
> {-This|+Here} is {-a|+some} {-complete|+totally} {-sentence.|+text.}
>
> which is totally sensible.
>
> [...]
>
> which would yield on the output:
>
> {-This |+Here }is {-a complete sentence.|+some totally different text.}
Sensible, perhaps, but I think the second one is much nicer for English
text (though the first is much nicer for code, I expect).
> It's all in diff_words_tokenize(), which I kept deliberately stupid so
> that people can tweak it to their liking.
OK; I haven't been following the thread too closely, and I wanted to
make sure this was a question of how the tokenizing works, and not a
fundamental problem with this approach. Thanks for the explanation.
-Peff
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 19:13 ` Junio C Hamano
2008-05-07 19:33 ` Junio C Hamano
2008-05-07 19:45 ` Jeff King
@ 2008-05-08 10:34 ` Teemu Likonen
2008-05-10 9:02 ` Ping Yin
2008-05-10 8:20 ` Ping Yin
3 siblings, 1 reply; 92+ messages in thread
From: Teemu Likonen @ 2008-05-08 10:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, Ping Yin, git
Junio C Hamano wrote (2008-05-07 12:13 -0700):
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I am rather interested in the semantics, i.e. if you can punch holes
> > into this 3-class approach.
>
> This is not the 3-class thing, but was done as a lunchtime hack. It
> removes more lines than it adds, with real comments ;-).
I tested your lunchtime hack from the "pu" branch. I'm perfectly happy
with the colored output itself but I noticed some different line feed
behaviour that you might want to know. Look at the example below. The
first is normal line diff. The second is the same text with the old
--color-words behaviour and the last is with the lunchtime hack version.
There are only three words added to the text; additions are written as
{+word} in the --color-words output.
Normal line diff
----------------
-OpenOffice.org has user setting for defining the minimum length for
+OpenOffice.org has a user setting for defining the minimum length for
words to be hyphenated. By default the word length is counted from the
-whole word - even for compound words. For example the word
-'elokuvalippu' is 12 characters long. The word will be hyphenated like
-'elo-ku-va-lip-pu' in all cases when the minimum word length is set to
-12 or less. If the minimum length is set to 13 or more the word is not
-hyphenated at all.
+whole word - even for compound words. For example the compound word
+'elokuvalippu' is considered 12 characters long. The word will be
+hyphenated like 'elo-ku-va-lip-pu' in all cases when the minimum word
+length is set to 12 or less. If the minimum length is set to 13 or more
+the word is not hyphenated at all.
With the old --color-words
--------------------------
OpenOffice.org has {+a }user setting for defining the minimum length for
words to be hyphenated. By default the word length is counted from the
whole word - even for compound words. For example the {+compound }word
'elokuvalippu' is {+considered }12 characters long. The word will be
hyphenated like 'elo-ku-va-lip-pu' in all cases when the minimum word
length is set to 12 or less. If the minimum length is set to 13 or more
the word is not hyphenated at all.
With the lunchtime hack --color-words
-------------------------------------
OpenOffice.org has {+a }user setting for defining the minimum length for
words to be hyphenated. By default the word length is counted from the
whole word - even for compound words. For example the {+compound }word
'elokuvalippu' is {+considered }12 characters long. The word will be
hyphenated like
'elo-ku-va-lip-pu' in all cases when the minimum word
length is set to
12 or less. If the minimum length is set to 13 or more
the word is not
hyphenated at all.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-07 19:13 ` Junio C Hamano
` (2 preceding siblings ...)
2008-05-08 10:34 ` Teemu Likonen
@ 2008-05-10 8:20 ` Ping Yin
3 siblings, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-10 8:20 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, git
On Thu, May 8, 2008 at 3:13 AM, Junio C Hamano <gitster@pobox.com> wrote:
> I haven't tested this at all (this is a lunchtime hack) and have a mild
> suspicion that it may have corner case miscounting (e.g. I blindly
> subtracts 3 from len when dealing with a line that represents a single
> token from the internal diff output --- do I always have 3 there even when
> the original file ends with an incomplete line? I didn't check), but
> other than that I think this is a lot easier to read and follow.
>
> ---
>
> diff.c | 216 +++++++++++++++++++++++++++++++--------------------------------
> 1 files changed, 106 insertions(+), 110 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index e35384b..344aaa6 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -351,87 +351,119 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
> return 0;
> }
>
> -struct diff_words_buffer {
> - mmfile_t text;
> - long alloc;
> - long current; /* output pointer */
> - int suppressed_newline;
> +typedef unsigned long (*sane_truncate_fn)(char *line, unsigned long len);
> +
> +struct emit_callback {
> + struct xdiff_emit_state xm;
> + int nparents, color_diff;
> + unsigned ws_rule;
> + sane_truncate_fn truncate;
> + const char **label_path;
> + struct diff_words_data *diff_words;
> + int *found_changesp;
> + FILE *file;
> };
>
> -static void diff_words_append(char *line, unsigned long len,
> - struct diff_words_buffer *buffer)
> +static size_t diff_words_tokenize(struct emit_callback *ecbdata,
> + char *line, unsigned long len)
> {
> - if (buffer->text.size + len > buffer->alloc) {
> - buffer->alloc = (buffer->text.size + len) * 3 / 2;
> - buffer->text.ptr = xrealloc(buffer->text.ptr, buffer->alloc);
> + /*
> + * This function currently is deliberately done very stupid,
> + * but passing ecbdata here means that you can potentially
> + * implement different tokenization rules depending on
> + * the content (e.g. "gitattributes(5)").
> + */
> + int is_space;
> + char *line0 = line;
> +
> + if (!len)
> + return 0;
> +
> + is_space = isspace(*line);
> + while (len && (isspace(*line) == is_space)) {
> + line++;
> + len--;
> }
> + return line - line0;
> +}
> +
> +static void diff_words_append(struct emit_callback *ecbdata,
> + char *line, unsigned long len,
> + struct strbuf *text)
> +{
> + /* Skip leading +/- first. */
> line++;
> len--;
> - memcpy(buffer->text.ptr + buffer->text.size, line, len);
> - buffer->text.size += len;
> +
> + /*
> + * Tokenize and stuff the words in.
> + */
> + while (len) {
> + size_t token_len = diff_words_tokenize(ecbdata, line, len);
> +
> + if (line[0] != '\n') {
> + /*
> + * A nonempty token has ' ' stuffed in front,
> + * so that we can recover the original
> + * end-of-line easily. Stupid, but works.
> + */
> + strbuf_add(text, " ", 1);
> + strbuf_add(text, line, token_len);
> + strbuf_add(text, "\n", 1);
> + len -= token_len;
> + line += token_len;
I still don't understand why a ' ' is prepended. See my comment for
the following part
> + if (line[1] == ' ') {
> + /* A token */
> + line += 2;
> + len -= 3; /* drop the trailing LF */
> + } else {
> + /* A real LF */
> + line++;
> + len--;
> }
I think we can recognize a real LF by that the diff line should be a
single '\n', i.e. line[1] == '\n'. So what's wrong by
s/line[1] == ' '/line[1] != '\n'/ ?
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-08 10:34 ` Teemu Likonen
@ 2008-05-10 9:02 ` Ping Yin
2008-05-10 9:14 ` Teemu Likonen
2008-05-11 13:16 ` Ping Yin
0 siblings, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-10 9:02 UTC (permalink / raw)
To: Teemu Likonen; +Cc: Junio C Hamano, Johannes Schindelin, git
On Thu, May 8, 2008 at 6:34 PM, Teemu Likonen <tlikonen@iki.fi> wrote:
> Junio C Hamano wrote (2008-05-07 12:13 -0700):
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > I am rather interested in the semantics, i.e. if you can punch holes
>> > into this 3-class approach.
>>
>> This is not the 3-class thing, but was done as a lunchtime hack. It
>> removes more lines than it adds, with real comments ;-).
>
> I tested your lunchtime hack from the "pu" branch. I'm perfectly happy
> with the colored output itself but I noticed some different line feed
> behaviour that you might want to know. Look at the example below. The
> first is normal line diff. The second is the same text with the old
> --color-words behaviour and the last is with the lunchtime hack version.
> There are only three words added to the text; additions are written as
> {+word} in the --color-words output.
You not only added the three words, but also wrap line at different position.
> Normal line diff
> ----------------
>
> -OpenOffice.org has user setting for defining the minimum length for
> +OpenOffice.org has a user setting for defining the minimum length for
> words to be hyphenated. By default the word length is counted from the
> -whole word - even for compound words. For example the word
> -'elokuvalippu' is 12 characters long. The word will be hyphenated like
> -'elo-ku-va-lip-pu' in all cases when the minimum word length is set to
> -12 or less. If the minimum length is set to 13 or more the word is not
> -hyphenated at all.
> +whole word - even for compound words. For example the compound word
> +'elokuvalippu' is considered 12 characters long. The word will be
> +hyphenated like 'elo-ku-va-lip-pu' in all cases when the minimum word
> +length is set to 12 or less. If the minimum length is set to 13 or more
> +the word is not hyphenated at all.
>
> With the old --color-words
> --------------------------
>
> OpenOffice.org has {+a }user setting for defining the minimum length for
> words to be hyphenated. By default the word length is counted from the
> whole word - even for compound words. For example the {+compound }word
> 'elokuvalippu' is {+considered }12 characters long. The word will be
> hyphenated like 'elo-ku-va-lip-pu' in all cases when the minimum word
> length is set to 12 or less. If the minimum length is set to 13 or more
> the word is not hyphenated at all.
>
> With the lunchtime hack --color-words
> -------------------------------------
>
> OpenOffice.org has {+a }user setting for defining the minimum length for
> words to be hyphenated. By default the word length is counted from the
> whole word - even for compound words. For example the {+compound }word
> 'elokuvalippu' is {+considered }12 characters long. The word will be
> hyphenated like
> 'elo-ku-va-lip-pu' in all cases when the minimum word
> length is set to
> 12 or less. If the minimum length is set to 13 or more
> the word is not
> hyphenated at all.
>
With junio's following code, the number of LF can be more than any of
the original two input. Because it will output a LF whenever we
encounter a real LF in the original input. So some LFs are outputed
as {-LF} or {+LF} . We can't differentiate them because we can't see
the colored output for added or removed LF. The same case is that we
can't differentiate added and removed spaces.
That's why i proposed colored background for added/removed space
characters in former reply of this thread.
+ if (line[1] == ' ') {
+ /* A token */
+ line += 2;
+ len -= 3; /* drop the trailing LF */
+ } else {
+ /* A real LF */
+ line++;
+ len--;
}
+ emit_line(diff_words->file, set, reset, line, len);
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-10 9:02 ` Ping Yin
@ 2008-05-10 9:14 ` Teemu Likonen
2008-05-11 13:16 ` Ping Yin
1 sibling, 0 replies; 92+ messages in thread
From: Teemu Likonen @ 2008-05-10 9:14 UTC (permalink / raw)
To: Ping Yin; +Cc: Junio C Hamano, Johannes Schindelin, git
Ping Yin wrote (2008-05-10 17:02 +0800):
> On Thu, May 8, 2008 at 6:34 PM, Teemu Likonen <tlikonen@iki.fi> wrote:
> > There are only three words added to the text; additions are written
> > as {+word} in the --color-words output.
>
> You not only added the three words, but also wrap line at different
> position.
Ah, you're right of course. I'm too focused on just words and language.
I don't mind these "added" LFs on Junio's version but I understand if
someone finds them surprising when color-word-diffing natural languages.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-10 9:02 ` Ping Yin
2008-05-10 9:14 ` Teemu Likonen
@ 2008-05-11 13:16 ` Ping Yin
2008-05-11 13:27 ` Ping Yin
2008-05-11 16:27 ` Junio C Hamano
1 sibling, 2 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-11 13:16 UTC (permalink / raw)
To: Teemu Likonen; +Cc: Junio C Hamano, Johannes Schindelin, git
On Sat, May 10, 2008 at 5:02 PM, Ping Yin <pkufranky@gmail.com> wrote:
> On Thu, May 8, 2008 at 6:34 PM, Teemu Likonen <tlikonen@iki.fi> wrote:
>> Junio C Hamano wrote (2008-05-07 12:13 -0700):
>>
>>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>>
>>> > I am rather interested in the semantics, i.e. if you can punch holes
>>> > into this 3-class approach.
>>>
>>> This is not the 3-class thing, but was done as a lunchtime hack. It
>>> removes more lines than it adds, with real comments ;-).
>>
>> I tested your lunchtime hack from the "pu" branch. I'm perfectly happy
>> with the colored output itself but I noticed some different line feed
>> behaviour that you might want to know. Look at the example below. The
>> first is normal line diff. The second is the same text with the old
>> --color-words behaviour and the last is with the lunchtime hack version.
>> There are only three words added to the text; additions are written as
>> {+word} in the --color-words output.
>
> You not only added the three words, but also wrap line at different position.
>
>> Normal line diff
>> ----------------
>>
>> -OpenOffice.org has user setting for defining the minimum length for
>> +OpenOffice.org has a user setting for defining the minimum length for
>> words to be hyphenated. By default the word length is counted from the
>> -whole word - even for compound words. For example the word
>> -'elokuvalippu' is 12 characters long. The word will be hyphenated like
>> -'elo-ku-va-lip-pu' in all cases when the minimum word length is set to
>> -12 or less. If the minimum length is set to 13 or more the word is not
>> -hyphenated at all.
>> +whole word - even for compound words. For example the compound word
>> +'elokuvalippu' is considered 12 characters long. The word will be
>> +hyphenated like 'elo-ku-va-lip-pu' in all cases when the minimum word
>> +length is set to 12 or less. If the minimum length is set to 13 or more
>> +the word is not hyphenated at all.
>>
>> With the old --color-words
>> --------------------------
>>
>> OpenOffice.org has {+a }user setting for defining the minimum length for
>> words to be hyphenated. By default the word length is counted from the
>> whole word - even for compound words. For example the {+compound }word
>> 'elokuvalippu' is {+considered }12 characters long. The word will be
>> hyphenated like 'elo-ku-va-lip-pu' in all cases when the minimum word
>> length is set to 12 or less. If the minimum length is set to 13 or more
>> the word is not hyphenated at all.
>>
>> With the lunchtime hack --color-words
>> -------------------------------------
>>
>> OpenOffice.org has {+a }user setting for defining the minimum length for
>> words to be hyphenated. By default the word length is counted from the
>> whole word - even for compound words. For example the {+compound }word
>> 'elokuvalippu' is {+considered }12 characters long. The word will be
>> hyphenated like
>> 'elo-ku-va-lip-pu' in all cases when the minimum word
>> length is set to
>> 12 or less. If the minimum length is set to 13 or more
>> the word is not
>> hyphenated at all.
>>
>
> With junio's following code, the number of LF can be more than any of
> the original two input. Because it will output a LF whenever we
> encounter a real LF in the original input. So some LFs are outputed
> as {-LF} or {+LF} . We can't differentiate them because we can't see
> the colored output for added or removed LF. The same case is that we
> can't differentiate added and removed spaces.
>
> That's why i proposed colored background for added/removed space
> characters in former reply of this thread.
>
> + if (line[1] == ' ') {
> + /* A token */
> + line += 2;
> + len -= 3; /* drop the trailing LF */
> + } else {
> + /* A real LF */
> + line++;
> + len--;
> }
> + emit_line(diff_words->file, set, reset, line, len);
>
With following patch, the diff output becomes (i don't know which one is better)
OpenOffice.org has {+a }user setting for defining the minimum length for
words to be hyphenated. By default the word length is counted from the
whole word - even for compound words. For example the {compound +}word
'elokuvalippu' is {+considered }12 characters long. The word will be
hyphenated like
'elo-ku-va-lip-pu' in all cases when the minimum word length is set to
12 or less. If the minimum length is set to 13 or more the word is not
hyphenated at all.
diff --git a/diff.c b/diff.c
index 51048c6..06fbace 100644
--- a/diff.c
+++ b/diff.c
@@ -446,6 +446,7 @@ struct diff_words_data {
struct xdiff_emit_state xm;
struct strbuf minus;
struct strbuf plus;
+ int suppressed_newline;
FILE *file;
};
@@ -480,12 +481,16 @@ static void fn_out_diff_words_aux(
/* A token */
line += 2;
len -= 3; /* drop the trailing LF */
+ emit_line(diff_words->file, set, reset, line, len);
} else {
/* A real LF */
- line++;
- len--;
+ if (diff_words->suppressed_newline || line[0] == ' ') {
+ diff_words->suppressed_newline = 0;
+ emit_line(diff_words->file, set, reset, "\n", 1);
+ }
+ else
+ diff_words->suppressed_newline = 1;
}
- emit_line(diff_words->file, set, reset, line, len);
}
/* this executes the word diff on the accumulated buffers */
@@ -510,8 +515,14 @@ static void diff_words_show(
ecb.outf = xdiff_outf;
ecb.priv = diff_words;
diff_words->xm.consume = fn_out_diff_words_aux;
+ diff_words->suppressed_newline = 0;
xdi_diff(&minus, &plus, &xpp, &xecfg, &ecb);
+ if (diff_words->suppressed_newline) {
+ putc('\n', diff_words->file);
+ diff_words->suppressed_newline = 0;
+ }
+
free(minus.ptr);
free(plus.ptr);
}
--
Ping Yin
^ permalink raw reply related [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-11 13:16 ` Ping Yin
@ 2008-05-11 13:27 ` Ping Yin
2008-05-11 16:27 ` Junio C Hamano
1 sibling, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-11 13:27 UTC (permalink / raw)
To: Ping Yin; +Cc: Teemu Likonen, Junio C Hamano, Johannes Schindelin, git
* Ping Yin <pkufranky@gmail.com> [2008-05-11 21:16:11 +0800]:
>
> With following patch, the diff output becomes (i don't know which one is better)
>
> OpenOffice.org has {+a }user setting for defining the minimum length for
> words to be hyphenated. By default the word length is counted from the
> whole word - even for compound words. For example the {compound +}word
> 'elokuvalippu' is {+considered }12 characters long. The word will be
> hyphenated like
The above two lines are auto wrapped by gmail client, should be
'elokuvalippu' is {+considered }12 characters long. The word will be hyphenated like
Sorry for that.
>
> --
> Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-11 13:16 ` Ping Yin
2008-05-11 13:27 ` Ping Yin
@ 2008-05-11 16:27 ` Junio C Hamano
2008-05-12 16:31 ` Ping Yin
1 sibling, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2008-05-11 16:27 UTC (permalink / raw)
To: Ping Yin; +Cc: Teemu Likonen, Junio C Hamano, Johannes Schindelin, git
"Ping Yin" <pkufranky@gmail.com> writes:
> With following patch, the diff output becomes (i don't know which one is better)
>
> OpenOffice.org has {+a }user setting for defining the minimum length for
> words to be hyphenated. By default the word length is counted from the
> whole word - even for compound words. For example the {compound +}word
> 'elokuvalippu' is {+considered }12 characters long. The word will be hyphenated like
> 'elo-ku-va-lip-pu' in all cases when the minimum word length is set to
> 12 or less. If the minimum length is set to 13 or more the word is not
> hyphenated at all.
Yeah, after playing with it a bit, I realize that my original stated goal
of not playing games with "newline suppression" goes very against what
color-words, which is a word oriented diff, tries to achieve. It appears
that it is necessary to reintroduce suppressed_newline.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-11 16:27 ` Junio C Hamano
@ 2008-05-12 16:31 ` Ping Yin
2008-05-12 18:57 ` Jakub Narebski
0 siblings, 1 reply; 92+ messages in thread
From: Ping Yin @ 2008-05-12 16:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Teemu Likonen, Johannes Schindelin, git
On Mon, May 12, 2008 at 12:27 AM, Junio C Hamano <gitster@pobox.com> wrote:
> "Ping Yin" <pkufranky@gmail.com> writes:
>
> > With following patch, the diff output becomes (i don't know which one is better)
> >
> > OpenOffice.org has {+a }user setting for defining the minimum length for
> > words to be hyphenated. By default the word length is counted from the
> > whole word - even for compound words. For example the {compound +}word
> > 'elokuvalippu' is {+considered }12 characters long. The word will be hyphenated like
> > 'elo-ku-va-lip-pu' in all cases when the minimum word length is set to
> > 12 or less. If the minimum length is set to 13 or more the word is not
> > hyphenated at all.
>
> Yeah, after playing with it a bit, I realize that my original stated goal
> of not playing games with "newline suppression" goes very against what
> color-words, which is a word oriented diff, tries to achieve. It appears
> that it is necessary to reintroduce suppressed_newline.
>
No matter how well we play with suppressed_newline, we still can't
achieve the best result by doing word diff between multiple minus
lines and multiple plus lines.
( i think the result of vimdiff can be considered as the best).
To achieve the best, we have to find the pairs of lines (one minus and
one plus for each pair) which most match each other, and then do the
word diff for each pair.
--
Ping Yin
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-12 16:31 ` Ping Yin
@ 2008-05-12 18:57 ` Jakub Narebski
2008-05-12 19:17 ` Junio C Hamano
2008-05-13 1:42 ` Ping Yin
0 siblings, 2 replies; 92+ messages in thread
From: Jakub Narebski @ 2008-05-12 18:57 UTC (permalink / raw)
To: Ping Yin; +Cc: Junio C Hamano, Teemu Likonen, Johannes Schindelin, git
"Ping Yin" <pkufranky@gmail.com> writes:
> On Mon, May 12, 2008 at 12:27 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> "Ping Yin" <pkufranky@gmail.com> writes:
>>
>>> With following patch, the diff output becomes (i don't know which
>>> one is better)
>>>
>>> OpenOffice.org has {+a }user setting for defining the minimum length for
>>> words to be hyphenated. By default the word length is counted from the
>>> whole word - even for compound words. For example the {compound +}word
>>> 'elokuvalippu' is {+considered }12 characters long. The word will be hyphenated like
>>> 'elo-ku-va-lip-pu' in all cases when the minimum word length is set to
>>> 12 or less. If the minimum length is set to 13 or more the word is not
>>> hyphenated at all.
>>
>> Yeah, after playing with it a bit, I realize that my original
>> stated goal of not playing games with "newline suppression" goes
>> very against what color-words, which is a word oriented diff,
>> tries to achieve. It appears that it is necessary to reintroduce
>> suppressed_newline.
>>
>
> No matter how well we play with suppressed_newline, we still can't
> achieve the best result by doing word diff between multiple minus
> lines and multiple plus lines.
>
> ( i think the result of vimdiff can be considered as the best).
Is the vimdiff algorithm described anywhere? What about wdiff output?
> To achieve the best, we have to find the pairs of lines (one minus and
> one plus for each pair) which most match each other, and then do the
> word diff for each pair.
Wouldn't be enough to treat run of plus/minus lines as a single block,
tokenize, do token-based (as opposed to line-based) diff, then show it
using linebreaks of the destination file (pluses line)?
--
Jakub Narebski
Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-12 18:57 ` Jakub Narebski
@ 2008-05-12 19:17 ` Junio C Hamano
2008-05-12 19:57 ` Jakub Narebski
2008-05-13 1:37 ` Ping Yin
2008-05-13 1:42 ` Ping Yin
1 sibling, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2008-05-12 19:17 UTC (permalink / raw)
To: Jakub Narebski; +Cc: Ping Yin, Teemu Likonen, Johannes Schindelin, git
Jakub Narebski <jnareb@gmail.com> writes:
>> To achieve the best, we have to find the pairs of lines (one minus and
>> one plus for each pair) which most match each other, and then do the
>> word diff for each pair.
>
> Wouldn't be enough to treat run of plus/minus lines as a single block,
> tokenize, do token-based (as opposed to line-based) diff, then show it
> using linebreaks of the destination file (pluses line)?
I tried the "using linebreaks" but I discarded it because I did not think
it would work. If we rewrite the last three lines above with this single
line:
> Wouldn't be enough to use magic?
and apply that algorithm between the two, then we would get a long single
line that has words painted in red, two lines worth, followed by green "to
use magic?" and finally an end-of-line.
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-12 19:17 ` Junio C Hamano
@ 2008-05-12 19:57 ` Jakub Narebski
2008-05-13 1:37 ` Ping Yin
1 sibling, 0 replies; 92+ messages in thread
From: Jakub Narebski @ 2008-05-12 19:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ping Yin, Teemu Likonen, Johannes Schindelin, git
Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>> "Ping Yin" <pkufranky@gmail.com> writes:
>>>
>>> To achieve the best, we have to find the pairs of lines (one minus and
>>> one plus for each pair) which most match each other, and then do the
>>> word diff for each pair.
>>
>> Wouldn't be enough to treat run of plus/minus lines as a single block,
>> tokenize, do token-based (as opposed to line-based) diff, then show it
>> using linebreaks of the destination file (pluses line)?
>
> I tried the "using linebreaks" but I discarded it because I did not think
> it would work. If we rewrite the last three lines above with this single
> line:
>
>> Wouldn't be enough to use magic?
>
> and apply that algorithm between the two, then we would get a long single
> line that has words painted in red, two lines worth, followed by green "to
> use magic?" and finally an end-of-line.
It looks then like inserting (retaining?) newlines in word/token based
--color-words output isn't simple. It would have to produce readable
output both for the case you stated/mentioned, and for the opposite case
(replacing single line by multiple lines). What's even more difficult,
it should produce clear output for a simple case of rewrapping output,
e.g. the following as replacement.
>> Wouldn't be enough to treat run of plus/minus lines as a single
>> block, tokenize, do token-based (as opposed to line-based) diff,
>> then show it using linebreaks of the destination file (pluses
>> line)?
Gaahh... I don't think this (word diff/token diff) is something computer
science has worked on?
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-12 19:17 ` Junio C Hamano
2008-05-12 19:57 ` Jakub Narebski
@ 2008-05-13 1:37 ` Ping Yin
1 sibling, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-13 1:37 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jakub Narebski, Ping Yin, Teemu Likonen, Johannes Schindelin, git
* Junio C Hamano <gitster@pobox.com> [2008-05-12 12:17:26 -0700]:
> Jakub Narebski <jnareb@gmail.com> writes:
>
> >> To achieve the best, we have to find the pairs of lines (one minus and
> >> one plus for each pair) which most match each other, and then do the
> >> word diff for each pair.
> >
> > Wouldn't be enough to treat run of plus/minus lines as a single block,
> > tokenize, do token-based (as opposed to line-based) diff, then show it
> > using linebreaks of the destination file (pluses line)?
>
> I tried the "using linebreaks" but I discarded it because I did not think
> it would work. If we rewrite the last three lines above with this single
> line:
>
> > Wouldn't be enough to use magic?
>
> and apply that algorithm between the two, then we would get a long single
> line that has words painted in red, two lines worth, followed by green "to
> use magic?" and finally an end-of-line.
That's why i said with current implementation we can't get the
best output which i think should be
Wouldn't be enough to {-treat run of plus/minus lines as a single block,}{+use magic?}
{-tokenize, do token-based (as opposed to line-based) diff, then show it}
{-using linebreaks of the destination file (pluses line)?}
^ permalink raw reply [flat|nested] 92+ messages in thread
* Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
2008-05-12 18:57 ` Jakub Narebski
2008-05-12 19:17 ` Junio C Hamano
@ 2008-05-13 1:42 ` Ping Yin
1 sibling, 0 replies; 92+ messages in thread
From: Ping Yin @ 2008-05-13 1:42 UTC (permalink / raw)
To: Jakub Narebski
Cc: Ping Yin, Junio C Hamano, Teemu Likonen, Johannes Schindelin, git
* Jakub Narebski <jnareb@gmail.com> [2008-05-12 11:57:48 -0700]:
> > ( i think the result of vimdiff can be considered as the best).
>
> Is the vimdiff algorithm described anywhere? What about wdiff output?
I don't know about that. source code? I just show an example of best
--color-words.
I havn't yet used wdiff.
^ permalink raw reply [flat|nested] 92+ messages in thread
end of thread, other threads:[~2008-05-13 1:43 UTC | newest]
Thread overview: 92+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-02 3:39 [PATCH] Make words boundary for --color-words configurable Ping Yin
2008-05-02 3:54 ` Junio C Hamano
2008-05-02 4:28 ` Ping Yin
2008-05-02 13:59 ` [PATCH] Make boundary characters " Ping Yin
2008-05-02 14:26 ` Ping Yin
2008-05-02 14:27 ` Ping Yin
2008-05-03 11:57 ` [PATCH v2 0/5] " Ping Yin
2008-05-03 11:57 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Ping Yin
2008-05-03 11:57 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
2008-05-03 11:57 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Ping Yin
2008-05-03 11:57 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Ping Yin
2008-05-03 11:57 ` [PATCH v2 5/5] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
2008-05-03 18:18 ` [PATCH v2 4/5] Make boundary characters for --color-words configurable Junio C Hamano
2008-05-03 18:41 ` Teemu Likonen
2008-05-04 0:32 ` Ping Yin
2008-05-04 9:44 ` Johannes Schindelin
2008-05-04 16:35 ` Ping Yin
2008-05-04 20:16 ` Junio C Hamano
2008-05-04 20:47 ` Jakub Narebski
2008-05-04 21:27 ` Teemu Likonen
2008-05-05 12:14 ` Johannes Schindelin
2008-05-05 1:40 ` Ping Yin
2008-05-05 5:00 ` Junio C Hamano
2008-05-05 12:10 ` Ping Yin
2008-05-06 0:40 ` Ping Yin
2008-05-06 8:55 ` Johannes Schindelin
2008-05-07 1:15 ` Ping Yin
2008-05-07 11:24 ` Johannes Schindelin
2008-05-07 12:19 ` Ping Yin
2008-05-07 13:10 ` Johannes Schindelin
2008-05-07 14:11 ` Ping Yin
2008-05-07 19:13 ` Junio C Hamano
2008-05-07 19:33 ` Junio C Hamano
2008-05-07 19:45 ` Jeff King
2008-05-07 20:02 ` Junio C Hamano
2008-05-07 22:04 ` Jeff King
2008-05-08 10:34 ` Teemu Likonen
2008-05-10 9:02 ` Ping Yin
2008-05-10 9:14 ` Teemu Likonen
2008-05-11 13:16 ` Ping Yin
2008-05-11 13:27 ` Ping Yin
2008-05-11 16:27 ` Junio C Hamano
2008-05-12 16:31 ` Ping Yin
2008-05-12 18:57 ` Jakub Narebski
2008-05-12 19:17 ` Junio C Hamano
2008-05-12 19:57 ` Jakub Narebski
2008-05-13 1:37 ` Ping Yin
2008-05-13 1:42 ` Ping Yin
2008-05-10 8:20 ` Ping Yin
2008-05-05 11:51 ` Johannes Schindelin
2008-05-05 12:02 ` Ping Yin
2008-05-03 18:01 ` [PATCH v2 3/5] diff.c: Fix --color-words showing trailing deleted words at another line Junio C Hamano
2008-05-03 12:01 ` [PATCH v2 2/5] diff.c: Use show variable name in fn_out_diff_words_aux Ping Yin
2008-05-03 17:47 ` Junio C Hamano
2008-05-03 18:20 ` [PATCH v2 1/5] diff.c: Remove code redundancy in diff_words_show Junio C Hamano
2008-05-04 4:20 ` [PATCH v3 0/6] --color-words improvement Ping Yin
2008-05-04 4:20 ` [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show Ping Yin
2008-05-04 4:20 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Ping Yin
2008-05-04 4:20 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Ping Yin
2008-05-04 4:20 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Ping Yin
2008-05-04 4:20 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Ping Yin
2008-05-04 4:20 ` [PATCH v3 6/6] --color-words: Add test t4030 Ping Yin
2008-05-04 9:54 ` [PATCH v3 5/6] fn_out_diff_words_aux: Handle common diff line more carefully Johannes Schindelin
2008-05-04 16:53 ` Ping Yin
2008-05-05 12:11 ` Johannes Schindelin
2008-05-05 14:18 ` Ping Yin
2008-05-04 6:45 ` [PATCH v3 4/6] --color-words: Make non-word characters configurable Junio C Hamano
2008-05-04 7:04 ` Ping Yin
2008-05-04 9:52 ` [PATCH v3 3/6] --color-words: Fix showing trailing deleted words at another line Johannes Schindelin
2008-05-04 16:48 ` Ping Yin
2008-05-05 12:10 ` Johannes Schindelin
2008-05-04 9:47 ` [PATCH v3 2/6] fn_out_diff_words_aux: Use short variable name Johannes Schindelin
2008-05-04 16:39 ` Ping Yin
2008-05-05 12:05 ` Johannes Schindelin
2008-05-04 9:46 ` [PATCH v3 1/6] diff.c: Remove code redundancy in diff_words_show Johannes Schindelin
2008-05-02 14:36 ` [PATCH] Make boundary characters for --color-words configurable Teemu Likonen
2008-05-03 0:22 ` Ping Yin
2008-05-03 13:22 ` Dirk Süsserott
2008-05-03 13:57 ` Ping Yin
2008-05-03 14:03 ` [PATCH] --color-words: Make the word characters configurable Johannes Schindelin
2008-05-03 14:13 ` Ping Yin
2008-05-03 14:23 ` Johannes Schindelin
2008-05-03 14:43 ` Teemu Likonen
2008-05-04 9:18 ` Johannes Schindelin
2008-05-03 17:43 ` Junio C Hamano
2008-05-04 9:25 ` Johannes Schindelin
2008-05-02 7:45 ` [PATCH] Make words boundary for --color-words configurable Johannes Schindelin
2008-05-02 8:14 ` Teemu Likonen
2008-05-02 9:23 ` Ping Yin
2008-05-02 10:01 ` Teemu Likonen
2008-05-02 9:28 ` Ping Yin
2008-05-03 0:18 ` Jakub Narebski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).