From: Junio C Hamano <junio@pobox.com>
To: "Johannes Schindelin" <Johannes.Schindelin@gmx.de>
Cc: "Ping Yin" <pkufranky@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable
Date: Sun, 04 May 2008 13:16:47 -0700
Message-ID: <7v63ttq0y8.fsf@gitster.siamese.dyndns.org>
In-Reply-To: Junio C. Hamano's message of "(unknown date)"
Let's step back a bit and try to clarify the problem with a bit of
illustration.
The motivation behind "word diff" is that a line-oriented diff is
sometimes unwieldy:
-Hello world.
+Hi, world.
A naïve strategy to solve this would be to convert the input into one
character per line, representing each character by its codepoint, take
the diff between the two, and synthesize the result back, like this:
    preimage    postimage    char-diff
    48 H        48 H         48 H
    65 e                    -65 e
    6c l                    -6c l
    6c l                    -6c l
    6f o                    -6f o
                69 i        +69 i
                2c ,        +2c ,
    20 ' '      20 ' '       20 ' '
    77 w        77 w         77 w
    6f o        6f o         6f o
    72 r        72 r         72 r
    6c l        6c l         6c l
    64 d        64 d         64 d
    2e .        2e .         2e .
    0a '\n'     0a '\n'      0a '\n'
That would produce "H/ello/i,/ world.\n" (where "/old/new/" marks deleted
text followed by added text), which is very suboptimal for human
consumption because it chops the words "Hello" and "Hi" in the middle.
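To make the thought experiment concrete, here is a rough Python sketch of
the naïve character-level approach. It is only an illustration under
stated assumptions: it diffs with a textbook longest-common-subsequence
table rather than git's own diff engine, and lcs_diff() and render() are
hypothetical helper names, not functions in git. render() prints changes
in the "/deleted/added/" notation used in this message.

```python
# Naive char-diff sketch: one token per character, plain LCS diff,
# output in the "/deleted/added/" notation. Not git's implementation.

def lcs_diff(a, b):
    """Diff two token lists; return (op, token) pairs, op in ' ', '-', '+'."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:                      # common token
            ops.append((' ', a[i]))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:  # skipping a[i] keeps the LCS
            ops.append(('-', a[i]))
            i += 1
        else:
            ops.append(('+', b[j]))
            j += 1
    ops += [('-', t) for t in a[i:]]          # leftover deletions
    ops += [('+', t) for t in b[j:]]          # leftover additions
    return ops

def render(ops):
    """Join tokens, wrapping each run of changes as /deleted/added/."""
    out, dels, adds = [], [], []

    def flush():
        if dels or adds:
            out.append('/' + ''.join(dels) + '/' + ''.join(adds) + '/')
            dels.clear()
            adds.clear()

    for op, tok in ops:
        if op == '-':
            dels.append(tok)
        elif op == '+':
            adds.append(tok)
        else:
            flush()
            out.append(tok)
    flush()
    return ''.join(out)

pre, post = "Hello world.\n", "Hi, world.\n"
print(repr(render(lcs_diff(list(pre), list(post)))))
# 'H/ello/i,/ world.\n'
```

Using one character per token is exactly what makes the result chop
"Hello" and "Hi" apart: the diff has no notion of where a word starts.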
We instead can do this word by word (note that I am doing this as a
thought experiment, to illustrate what the problem is and what should
conceptually happen, not suggesting this particular implementation):
    preimage    postimage    word-diff
    48656c6c6f              -48656c6c6f  Hello
                4869        +4869        Hi
                2c          +2c          ,
    20          20           20          ' '
    776f726c64  776f726c64   776f726c64  world
    2e          2e           2e          .
    0a          0a           0a          '\n'
Which would give you "/Hello/Hi,/ world.\n".
Another favorite example of mine:
-if (i > 1)
+while (i >= 0)
    preimage    postimage    word-diff
    6966                    -6966        if
                7768696c65  +7768696c65  while
    20          20           20          ' '
    28          28           28          (
    69          69           69          i
    20          20           20          ' '
    3e                      -3e          >
                3e3d        +3e3d        >=
    20          20           20          ' '
    31                      -31          1
                30          +30          0
    29          29           29          )
which should yield "/if/while/ (i />/>=/ /1/0/)".
So I think the overall algorithm should be:
- turn the input into a stream of tokens, where a token is a run of
  word characters only, a run of non-word punctuation characters only,
  or a run of whitespace only;
- compute the diff over the stream of tokens;
- emit common tokens in white, deleted in red and added in green.
Notice that you do not have to special-case LF in any way if you go this
route.
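The three-step algorithm above can be sketched in Python roughly as
follows. This is not the --color-words code in git's diff.c; it assumes
that "word characters" are letters, digits and '_', uses a textbook LCS
diff, and all function names (char_class(), tokenize(), lcs_diff(),
render(), colorize(), word_diff()) are made up for the illustration.
colorize() leaves common tokens in the terminal's default color instead
of forcing white, which is another assumption. Note that '\n' simply
falls into the whitespace class, so LF needs no special handling.

```python
# Word-diff sketch: tokenize into word / punct / whitespace runs,
# diff the token streams, then emit. Hypothetical names throughout.

def char_class(c):
    # Assumption: word characters are letters, digits and '_'.
    if c.isalnum() or c == '_':
        return 'word'
    if c.isspace():                # LF lands here; no special case needed
        return 'space'
    return 'punct'

def tokenize(s):
    """Split s into maximal runs of characters of the same class."""
    tokens = []
    for c in s:
        if tokens and char_class(tokens[-1][-1]) == char_class(c):
            tokens[-1] += c        # extend the current run
        else:
            tokens.append(c)       # class changed: start a new token
    return tokens

def lcs_diff(a, b):
    """Textbook LCS diff; returns (op, token) pairs, op in ' ', '-', '+'."""
    n, m = len(a), len(b)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((' ', a[i]))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(('-', a[i]))
            i += 1
        else:
            ops.append(('+', b[j]))
            j += 1
    ops += [('-', t) for t in a[i:]]
    ops += [('+', t) for t in b[j:]]
    return ops

def render(ops):
    """Plain-text output in the /deleted/added/ notation."""
    out, dels, adds = [], [], []
    for op, tok in ops + [(' ', '')]:      # sentinel flushes the last run
        if op == '-':
            dels.append(tok)
        elif op == '+':
            adds.append(tok)
        else:
            if dels or adds:
                out.append('/' + ''.join(dels) + '/' + ''.join(adds) + '/')
                dels, adds = [], []
            out.append(tok)
    return ''.join(out)

RED, GREEN, RESET = '\x1b[31m', '\x1b[32m', '\x1b[m'

def colorize(ops):
    """Common tokens uncolored, deleted tokens red, added tokens green."""
    color = {'-': RED, '+': GREEN}
    return ''.join(tok if op == ' ' else color[op] + tok + RESET
                   for op, tok in ops)

def word_diff(pre, post):
    return lcs_diff(tokenize(pre), tokenize(post))

print(repr(render(word_diff("Hello world.\n", "Hi, world.\n"))))
# '/Hello/Hi,/ world.\n'
print(render(word_diff("if (i > 1)", "while (i >= 0)")))
# /if/while/ (i />/>=/ /1/0/)
```

Because ">=" is a single punctuation-run token here, the diff replaces it
as a unit, which matches the result quoted for this example earlier.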
You could do this with only two classes by using a different tokenization
rule: a token is either a run of word characters only, or a single non-word
byte (each such byte becomes its own token). This, however, would yield a
suboptimal result:
-if (i > 1)
+while (i >= 0)
    preimage    postimage    word-diff
    6966                    -6966        if
                7768696c65  +7768696c65  while
    20          20           20          ' '
    28          28           28          (
    69          69           69          i
    20          20           20          ' '
    3e          3e           3e          >
                3d          +3d          =
    20          20           20          ' '
    31                      -31          1
                30          +30          0
    29          29           29          )
This would give "/if/while/ (i >//=/ /1/0/)". The logical unit ">=" is
chopped into two tokens, which is suboptimal for the same reason that the
output "H/ello/i,/" from the original char-diff based approach was suboptimal.
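For comparison, here is a similar Python sketch using the two-class rule:
runs of word characters, plus one token per non-word character. It works
on Python characters rather than bytes, assumes the same word-character
set (letters, digits, '_'), and its helpers (is_word(),
tokenize_two_class(), lcs_diff(), render()) are hypothetical names used
only for this illustration.

```python
# Two-class tokenization sketch: word runs, and every non-word
# character as its own token. Shows how ">=" gets split.

def is_word(c):
    # Assumption: word characters are letters, digits and '_'.
    return c.isalnum() or c == '_'

def tokenize_two_class(s):
    tokens = []
    for c in s:
        if is_word(c) and tokens and is_word(tokens[-1][-1]):
            tokens[-1] += c        # extend the current word run
        else:
            tokens.append(c)       # non-word char (or word start): new token
    return tokens

def lcs_diff(a, b):
    """Textbook LCS diff; returns (op, token) pairs, op in ' ', '-', '+'."""
    n, m = len(a), len(b)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((' ', a[i]))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(('-', a[i]))
            i += 1
        else:
            ops.append(('+', b[j]))
            j += 1
    ops += [('-', t) for t in a[i:]]
    ops += [('+', t) for t in b[j:]]
    return ops

def render(ops):
    """Plain-text output in the /deleted/added/ notation."""
    out, dels, adds = [], [], []
    for op, tok in ops + [(' ', '')]:      # sentinel flushes the last run
        if op == '-':
            dels.append(tok)
        elif op == '+':
            adds.append(tok)
        else:
            if dels or adds:
                out.append('/' + ''.join(dels) + '/' + ''.join(adds) + '/')
                dels, adds = [], []
            out.append(tok)
    return ''.join(out)

pre = tokenize_two_class("if (i > 1)")
post = tokenize_two_class("while (i >= 0)")
print(post)
# ['while', ' ', '(', 'i', ' ', '>', '=', ' ', '0', ')']
print(render(lcs_diff(pre, post)))
# /if/while/ (i >//=/ /1/0/)
```

Since ">" survives as a common token and "=" shows up as a lone addition,
the reader has to reassemble ">=" by eye, which is the problem described
above.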