* Understanding and improving --word-diff @ 2010-11-08 15:16 Matthijs Kooijman 2010-11-08 15:41 ` Matthieu Moy ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Matthijs Kooijman @ 2010-11-08 15:16 UTC (permalink / raw) To: git; +Cc: Thomas Rast, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 7104 bytes --] Hi folks, I recently discovered --word-diff (or rather, --color-words and found --word-diff when I started to hack on the git master version) and I had hoped it would make the unified diffs generated by git-diff more readable. More specifically, I had expected to get a normal unified diff, with colouring added to highlight the changes within the normal - and + lines (so you don't have to review the entire changed line to see that just a single word or character has changed). E.g., I would like to see: -a <r>b</r> c +a <g>x</g> c Unfortunately, all --word-diff types currently departs from line-based - and + lines and show the new version of the file with the changed words (both old and new versions) shown inline, marked with coloring or {- ... -} kind of syntax. E.g., with --word-diff=color, the above would look like: a <r>b</r><g>x</g> c Personally, I think that the first example above is easier to read than the second one (at least for diffs of code). I was planning to let this mail be accompanied with a patch, so I've started hacking on this feature already. However, halfway through some cleanups and a prototype implementation of the above (breaking some of the other --word-diff formats in the process), I found that the current generalization of the different styles as stored in diff_words_styles[] does not apply cleanly enough to my intended output format. While trying to extend this generalization to something that would fit, I found that I don't actually understand the rationale behind --word-diff and the formats well enough to find a proper implemenation (see the link at the bottom of this email for the unfinished code I hacked up until now). So, here's some observations and questions about how --word-diff works or should work. Comments are welcome, both in general terms as well as in terms of the word-diff implementation (I know my way around there by now). Intended use ------------ First of all, it seems that the main intended use of --word-diff is for LaTeX or HTML documents or similar, where blocks of running text might be hard-wrapped (and thus rewrapped after a small change). In these cases, a small change in wording could cause a lot of whitespace to shift, resulting in a big normal diff. The current word-diff implementation therefore simply does not show the whitespace (or rather, non-word) changes, since they're usually not relevant to LaTeX anyway. Is this indeed the main usecase, or are there others I'm missing? Inexact output -------------- Secondly, the --word-diff output currently never displays any changes to the non-word (whitespace) parts of a file. This makes sense for the LaTeX case, but sometimes you might want to get exact diff output instead. At first glance this seems possible by specifiying a word-regex of "." or something similar (i.e., make sure that the word regex matches everything). But this is problematic for newlines. The documentation states that stuff gets silently ignored if a newline ends up inside a word. For the --word-diff=color format, this is probably a fixed limitation of the otput format: you can't give a color to a newline (or a space, for that matter). Including a newline inside a {- ... -} block should not be a problem with the --word-diff=plain format, and something similar can be argued for the porcelain format. An alternative approach would be to add a --word-diff-exact flag, which would cause the whitespace between to matches of the word regex to be treated as a word as well and have it included in the generate word-diff. This still leaves an implementation problem: To generate the word-diff, the current code looks at one patch hunk at a time, collecting all the plus and minus lines. It then splits those lines into words and generates two new "files" containing one word per line. It then applies a diff to this new document to get the word-diff. When a word would contain a newline, this would effectively mean the word would be split into two words for the word-diff, which will probably screw up the output. An obvious solution would be to use some escape sequence (e.g. \n) for a newline, though that might get messy and inefficient. An alternative that seems feasable is to use the empty word (i.e., an empty line in the word-diff "files") to mean a newline. This would mean that every newline always breaks a word into two, regardless of what the word regex is set to (but I guess that makes sense anyway?). I also think this would allow complete diff output wrt whitespace and newlines, for output formats that support it: plain, (modified) porcelain and my proposed format. Porcelain format ---------------- Lastly, the "porcelain" word-diff format seems a bit weird to me. Is the format specified somewhere, or are there any programs that use it currently? I couldn't find any users inside the git.git tree itself? Looking at the format itself, it's a bit unclear to me what the ~ lines mean exactly. Commit 882749, which introduced the format says the mean "newlines in the input", but I'm not sure if this means the old file, new file or both. In fact, it seems that this uncertainty makes the porcelain-format ambiguous wrt newlines. For example, these two diff hunks: @@ -1,3 +1,2 @@ a -b c @@ -1,3 +1,3 @@ a -b + c both look the same in porcelain format, except for the hunk header. @@ -1,3 +1,2 @@ a ~ -b ~ c ~ @@ -1,3 +1,3 @@ a ~ -b ~ c ~ This is somewhat expected, of course, since the --word-diff formats are documented to show only changes to words, not to non-words/whitespace. So I guess it is expected that the output is ambigious wrt whitespace, but if so, what is the use of this porcelain format? Wouldn't it be make a lot more sense to make the format unambiguous and make it do word-based diff at the same time? I think this should be possible because of the explicit notation used for the newline. For example, Specifying the ~ lines to mean a newline in the old, new or both files depending on the previous +, - or space prefixed line is probably enough for this. By generating empty +, - or space prefixed lines when needed, every occurence of ~ could be disambiguated. For example, the above two diff hunks would then become the following. The only difference is the near-empty line (just a space prefix) after -b in the second hunk. @@ -1,3 +1,2 @@ a ~ -b ~ c ~ @@ -1,3 +1,3 @@ a ~ -b ~ c ~ So, these are some thoughts I've had while hacking on the code. As said, suggestions are welcome. I'd like my hacking to result in some useful patches, but right now I'm unsure what direction(s) I should be thinking/working in. In case you're interested in the hacking I've done so far, I've put it up here: http://git.stderr.nl/gitweb?p=matthijs/upstream/git.git;a=shortlog;h=refs/heads/word-diff Most of it is broken or not properly tested, but it gets an idea what kinds of cleanup I've been doing. Gr. Matthijs [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-08 15:16 Understanding and improving --word-diff Matthijs Kooijman @ 2010-11-08 15:41 ` Matthieu Moy 2010-11-08 15:49 ` Matthijs Kooijman 2010-11-08 16:33 ` Thomas Rast 2010-11-08 17:35 ` Wincent Colaiuta ` (2 subsequent siblings) 3 siblings, 2 replies; 10+ messages in thread From: Matthieu Moy @ 2010-11-08 15:41 UTC (permalink / raw) To: git; +Cc: Thomas Rast, Johannes Schindelin Matthijs Kooijman <matthijs@stdin.nl> writes: > Lastly, the "porcelain" word-diff format seems a bit weird to me. Is > the format specified somewhere, or are there any programs that use it > currently? I couldn't find any users inside the git.git tree itself? There's a pending patch for "gitk --color-words". Actually, the porcelain format was precisely implemented to allow this patch to exist IIRC. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-08 15:41 ` Matthieu Moy @ 2010-11-08 15:49 ` Matthijs Kooijman 2010-11-08 16:33 ` Thomas Rast 1 sibling, 0 replies; 10+ messages in thread From: Matthijs Kooijman @ 2010-11-08 15:49 UTC (permalink / raw) To: Matthieu Moy; +Cc: git, Thomas Rast, Johannes Schindelin [-- Attachment #1: Type: text/plain, Size: 428 bytes --] Hi Matthieu, > There's a pending patch for "gitk --color-words". Actually, the > porcelain format was precisely implemented to allow this patch to > exist IIRC. I think you mean this patch? [PATCH v4' 2/2] gitk: add the equivalent of diff --color-words https://kerneltrap.org/mailarchive/git/2010/4/19/28707 I had seen it around, but hadn't realized it wasn't in master yet. I'll have a closer look at it. Thanks. Matthijs [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-08 15:41 ` Matthieu Moy 2010-11-08 15:49 ` Matthijs Kooijman @ 2010-11-08 16:33 ` Thomas Rast 1 sibling, 0 replies; 10+ messages in thread From: Thomas Rast @ 2010-11-08 16:33 UTC (permalink / raw) To: Matthijs Kooijman; +Cc: Matthieu Moy, git, Johannes Schindelin Matthieu Moy wrote: > Matthijs Kooijman <matthijs@stdin.nl> writes: > > > Lastly, the "porcelain" word-diff format seems a bit weird to me. Is > > the format specified somewhere, or are there any programs that use it > > currently? I couldn't find any users inside the git.git tree itself? > > There's a pending patch for "gitk --color-words". Actually, the > porcelain format was precisely implemented to allow this patch to > exist IIRC. Also, my intent was to specify the format fully under its option description. It's rather simple, but if you think the explanation/definition is inadequate, please provide a patch. We'd really want the writers of consumers (GUIs and such) to be very clear about how it is formatted. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-08 15:16 Understanding and improving --word-diff Matthijs Kooijman 2010-11-08 15:41 ` Matthieu Moy @ 2010-11-08 17:35 ` Wincent Colaiuta 2010-11-08 19:22 ` Thomas Rast 2010-11-09 22:01 ` Jeff King 3 siblings, 0 replies; 10+ messages in thread From: Wincent Colaiuta @ 2010-11-08 17:35 UTC (permalink / raw) To: Matthijs Kooijman; +Cc: git, Thomas Rast, Johannes Schindelin On 08/11/2010, at 16:16, Matthijs Kooijman wrote: > Hi folks, > > I recently discovered --word-diff (or rather, --color-words and found > --word-diff when I started to hack on the git master version) and I had > hoped it would make the unified diffs generated by git-diff more > readable. [snip] > Inexact output > -------------- > Secondly, the --word-diff output currently never displays any changes to > the non-word (whitespace) parts of a file. This makes sense for the > LaTeX case, but sometimes you might want to get exact diff output > instead. At first glance this seems possible by specifiying a word-regex > of "." or something similar (i.e., make sure that the word regex matches > everything). But this is problematic for newlines. The documentation > states that stuff gets silently ignored if a newline ends up inside a > word. For the --word-diff=color format, this is probably a fixed > limitation of the otput format: you can't give a color to a newline (or > a space, for that matter). [snip] > Porcelain format > ---------------- > Lastly, the "porcelain" word-diff format seems a bit weird to me. Is > the format specified somewhere, or are there any programs that use it > currently? I couldn't find any users inside the git.git tree itself? > > Looking at the format itself, it's a bit unclear to me what the ~ lines > mean exactly. Commit 882749, which introduced the format says the mean > "newlines in the input", but I'm not sure if this means the old file, > new file or both. I posted along similar lines back in September: http://article.gmane.org/gmane.comp.version-control.git/156375 I found I couldn't really use the porcelain format due to information loss and ended up having to roll my own, parse the normal diff output and using that. Cheers, Wincent ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-08 15:16 Understanding and improving --word-diff Matthijs Kooijman 2010-11-08 15:41 ` Matthieu Moy 2010-11-08 17:35 ` Wincent Colaiuta @ 2010-11-08 19:22 ` Thomas Rast 2010-11-09 22:01 ` Jeff King 3 siblings, 0 replies; 10+ messages in thread From: Thomas Rast @ 2010-11-08 19:22 UTC (permalink / raw) To: Matthijs Kooijman Cc: git, Johannes Schindelin, Matthieu Moy, Wincent Colaiuta Matthijs Kooijman wrote: > This is somewhat expected, of course, since the --word-diff formats are > documented to show only changes to words, not to non-words/whitespace. > So I guess it is expected that the output is ambigious wrt whitespace, > but if so, what is the use of this porcelain format? Wouldn't it be make > a lot more sense to make the format unambiguous and make it do > word-based diff at the same time? I think this should be possible > because of the explicit notation used for the newline. It's not just the newlines. You will note that when diffing e.g. a x c vs. a b c then because the word-diff engine doesn't bother with the non-word parts, you will end up with (using ordinary --word-diff) a [-x-]{+b+} c or using the porcelain format $ git diff --no-index --word-diff=porcelain a b | cat -E diff --git 1/a 2/b$ index 4f47486..b040ae0 100644$ --- 1/a$ +++ 2/b$ @@ -1 +1 @@$ a $ -x$ +b$ c$ ~$ Note how the second space after 'x' disappeared. The case that Wincent explains in the email he linked (thanks; I didn't even see it the first time around) shows a more drastic example. So the logic that inserts the space again is too simplistic. What it does not is something like (haven't checked again) always insert the postimage space that went with the preceding word. What it *would* have to do is actually compute differences in the space, while at the same time ignoring them for the purposes of the LCS algorithm. If it weren't for the customizable word-regex, it might be enough to put the spaces in the words again but enable the internal equivalent of -b. > Looking at the format itself, it's a bit unclear to me what the ~ lines > mean exactly. Commit 882749, which introduced the format says the mean > "newlines in the input", but I'm not sure if this means the old file, > new file or both. As Matthieu already said, the history of --word-diff=porcelain is just: I needed a way to pipe word-diff output to gitk for coloring, without any messing around with the color strings. The ~ is the minimal required feature to at least let git communicate *some* linebreaks. You are probably right in that while the format leaves open the possibility of showing addition and removal of space (although the current engine does not really attempt to compute it), I did not think far enough to let it show addition and removal of whitespace. > For example, Specifying the ~ lines to mean a newline in the old, new or > both files depending on the previous +, - or space prefixed line is > probably enough for this. By generating empty +, - or space prefixed > lines when needed, every occurence of ~ could be disambiguated. AFAICS that breaks down if you couple a newline-change-aware consumer with a non-aware producer (i.e., a new frontend with an old git), since there is currently no guarantee that the line before a ~ is context. So you would have to introduce a new format (porcelain2?) anyway, and if you're already going that route, then for simplicity I'd rather have something like ~+ (or so) instead of requiring the parser to track state. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-08 15:16 Understanding and improving --word-diff Matthijs Kooijman ` (2 preceding siblings ...) 2010-11-08 19:22 ` Thomas Rast @ 2010-11-09 22:01 ` Jeff King 2010-11-10 0:05 ` Johannes Schindelin 2010-11-18 6:40 ` Jeff King 3 siblings, 2 replies; 10+ messages in thread From: Jeff King @ 2010-11-09 22:01 UTC (permalink / raw) To: git, Thomas Rast, Johannes Schindelin On Mon, Nov 08, 2010 at 04:16:01PM +0100, Matthijs Kooijman wrote: > E.g., I would like to see: > > -a <r>b</r> c > +a <g>x</g> c > > Unfortunately, all --word-diff types currently departs from line-based - > and + lines and show the new version of the file with the changed words > (both old and new versions) shown inline, marked with coloring or > {- ... -} kind of syntax. E.g., with --word-diff=color, the above would > look like: > > a <r>b</r><g>x</g> c > > Personally, I think that the first example above is easier to read than > the second one (at least for diffs of code). Yeah, as you figured out, word diff is really about formats that aren't line oriented. I have seen other systems do what you are asking for, but I'm not sure they are nearly as complex as a true word diff. In particular, I have noticed two simplifications: 1. Only highlight between two lines that are in part of a hunk by themselves. In other words, highlight on this: context -foo bar +foo baz context but not this: context -foo bar -another line +foo baz +another changed line context Even though you could perhaps match up pairs of lines (in this case, the first removed line obviously corresponds with the first added line, etc), there are a ton of cases where it would be uselessly distracting, as the lines simply aren't related to each other (of course, there are one-liner changes where that is the case, too, but they are hopefully less common, and since it is only a single pair of lines, there is no guessing about how they were matched up). 2. Don't do real LCS matching between the pairs. Just find the common prefix and suffix, and highlight everything in the middle. This would yield something like: -a <r>b c d</r> e +a <g>x c z</g> e when you could in theory produce: -a <r>b</r> c <r>d</r> e -a <g>x</g> c <g>z</g> e but I suspect in practice it covers many situations. For fun, I hacked up the perl script below which will highlight (using ANSI reverse-codes!) a diff using both of these simplifications. It looks for diff hunks on stdin, so you can do: git log | perl /path/to/script.pl I'm curious to see how the output looks on some real-world diffs. I admit to not having given this problem much thought, but instead just sat down and started hacking. :) The script tries to handle color in the input sanely, so you can see your highlighting in color. But there are two caveats: 1. You have to pass --color to git explicitly, since piping stdout turns off git's color autodetection. 2. There is a buglet in git's color emitting code. For added lines, it produces "<g>+</g><g>rest of line</g>" which is annoying to parse. It is fixed by the patch below: diff --git a/diff.c b/diff.c index b1b0d3d..0efcdb7 100644 --- a/diff.c +++ b/diff.c @@ -363,9 +363,9 @@ static void emit_add_line(const char *reset, emit_line_0(ecbdata->opt, ws, reset, '+', line, len); else { /* Emit just the prefix, then the rest. */ - emit_line_0(ecbdata->opt, set, reset, '+', "", 0); + emit_line_0(ecbdata->opt, set, "", '+', "", 0); ws_check_emit(line, len, ecbdata->ws_rule, - ecbdata->opt->file, set, reset, ws); + ecbdata->opt->file, "", reset, ws); } } This patch, btw, is much easier to read when piped through my script. :) Here's the script: -- >8 -- #!/usr/bin/perl use strict; my $COLOR = qr/\x1b\[[0-9;]*m/; my @window; while (<>) { chomp; s/^($COLOR)*//; my $prefix = $&; s/($COLOR)*$//; my $suffix = $&; push @window, [$prefix, $_, $suffix]; if ($window[0] && $window[0]->[1] =~ /^(\@| )/ && $window[1] && $window[1]->[1] =~ /^-/ && $window[2] && $window[2]->[1] =~ /^\+/ && $window[3] && $window[3]->[1] !~ /^\+/) { show_line(shift @window); show_pair(shift @window, shift @window); } if (@window >= 4) { show_line(shift @window); } } if (@window == 3 && $window[0] && $window[0]->[1] =~ /^(\@| )/ && $window[1] && $window[1]->[1] =~ /^-/ && $window[2] && $window[2]->[1] =~ /^\+/) { show_line(shift @window); show_pair(shift @window, shift @window); } while (@window) { show_line(shift @window); } exit 0; sub show_line { my $line = shift; print @$line, "\n"; } sub show_pair { my ($from, $to) = @_; my @from = split //, $from->[1]; my @to = split //, $to->[1]; my $prefix = 1; while ($from[$prefix] eq $to[$prefix]) { $prefix++; } my $suffix = 0; while ($from[$#from-$suffix] eq $to[$#to-$suffix]) { $suffix++; } print $from->[0], highlight($from->[1], $prefix, $suffix), $from->[2], "\n"; print $to->[0], highlight($to->[1], $prefix, $suffix), $to->[2], "\n"; } sub highlight { my ($line, $prefix, $suffix) = @_; my $end = length($line) - $suffix; return join('', substr($line, 0, $prefix), "\x1b[7m", substr($line, $prefix, $end - $prefix), "\x1b[27m", substr($line, $end) ); } ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-09 22:01 ` Jeff King @ 2010-11-10 0:05 ` Johannes Schindelin 2010-11-10 4:17 ` Jeff King 2010-11-18 6:40 ` Jeff King 1 sibling, 1 reply; 10+ messages in thread From: Johannes Schindelin @ 2010-11-10 0:05 UTC (permalink / raw) To: Jeff King; +Cc: git, Thomas Rast Hi, On Tue, 9 Nov 2010, Jeff King wrote: > On Mon, Nov 08, 2010 at 04:16:01PM +0100, Matthijs Kooijman wrote: > > > E.g., I would like to see: > > > > -a <r>b</r> c > > +a <g>x</g> c > > > > Unfortunately, all --word-diff types currently departs from line-based > > - and + lines and show the new version of the file with the changed > > words (both old and new versions) shown inline, marked with coloring > > or {- ... -} kind of syntax. E.g., with --word-diff=color, the above > > would look like: > > > > a <r>b</r><g>x</g> c > > > > Personally, I think that the first example above is easier to read > > than the second one (at least for diffs of code). > > Yeah, as you figured out, word diff is really about formats that aren't > line oriented. Not really correct. While the _current_ way to show word diffs is imitating GNU wdiff, the internal data structure does allow for more sophisticated output. Ciao, Dscho P.S.: Peff, I hope you're fine with me responding to the first sentence only. After all, you know that my attention span is 7.2 seconds. Some people taught me to put a smiley after such a statement, so: ;-) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-10 0:05 ` Johannes Schindelin @ 2010-11-10 4:17 ` Jeff King 0 siblings, 0 replies; 10+ messages in thread From: Jeff King @ 2010-11-10 4:17 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Thomas Rast On Wed, Nov 10, 2010 at 01:05:56AM +0100, Johannes Schindelin wrote: > > Yeah, as you figured out, word diff is really about formats that aren't > > line oriented. > > Not really correct. While the _current_ way to show word diffs is > imitating GNU wdiff, the internal data structure does allow for more > sophisticated output. Fair enough. I was basing my statement on the output. If somebody would like to demonstrate how the code can be used to get line-oriented diffs with highlighting, I would be very happy to eat my words. :) > P.S.: Peff, I hope you're fine with me responding to the first sentence > only. After all, you know that my attention span is 7.2 seconds. Heh. I am happy you responded at all. You are very quiet these days. :) -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Understanding and improving --word-diff 2010-11-09 22:01 ` Jeff King 2010-11-10 0:05 ` Johannes Schindelin @ 2010-11-18 6:40 ` Jeff King 1 sibling, 0 replies; 10+ messages in thread From: Jeff King @ 2010-11-18 6:40 UTC (permalink / raw) To: git; +Cc: Thomas Rast, Johannes Schindelin On Tue, Nov 09, 2010 at 05:01:36PM -0500, Jeff King wrote: > 2. There is a buglet in git's color emitting code. For added lines, it > produces "<g>+</g><g>rest of line</g>" which is annoying to parse. > It is fixed by the patch below: > > diff --git a/diff.c b/diff.c > index b1b0d3d..0efcdb7 100644 > --- a/diff.c > +++ b/diff.c > @@ -363,9 +363,9 @@ static void emit_add_line(const char *reset, > emit_line_0(ecbdata->opt, ws, reset, '+', line, len); > else { > /* Emit just the prefix, then the rest. */ > - emit_line_0(ecbdata->opt, set, reset, '+', "", 0); > + emit_line_0(ecbdata->opt, set, "", '+', "", 0); > ws_check_emit(line, len, ecbdata->ws_rule, > - ecbdata->opt->file, set, reset, ws); > + ecbdata->opt->file, "", reset, ws); > } > } FYI, I looked further at this patch, and it should not be used. There are cases where ws_check_emit will output a reset (e.g., because it has colored some whitespace), and needs to be able to set the proper color again. So while I think the intent of this patch is fine (to avoid duplicate colorizing in the output), the actual implementation would need to be much more complex. Instead, here's an updated version of my color highlighting script that handles this case. With the patch I posted earlier today: [PATCH 2/2] allow command-specific pagers in pager.<cmd> http://article.gmane.org/gmane.comp.version-control.git/161624 you can do: git config pager.log '/path/to/diff-highlight.pl | less' and get automagic coloring. Still, this is a bit of a hack, and there are some funny corner cases, so something that used the internal diff machinery would be much nicer. -Peff -- >8 -- #!/usr/bin/perl use strict; my $COLOR = qr/\x1b\[[0-9;]*m/; my $RESET = "\x1b[0m"; my $RED = "\x1b[31m"; my $GREEN = "\x1b[32m"; my $REVERSE = "\x1b[7m"; my $UNREVERSE = "\x1b[27m"; my @window; while (<>) { chomp; my $plain = $_; $plain =~ s/$COLOR//g; push @window, [$_, $plain]; if ($window[0] && $window[0]->[1] =~ /^(\@| )/ && $window[1] && $window[1]->[1] =~ /^-/ && $window[2] && $window[2]->[1] =~ /^\+/ && $window[3] && $window[3]->[1] !~ /^\+/) { show_line(shift @window); show_pair(shift @window, shift @window); } if (@window >= 4) { show_line(shift @window); } } if (@window == 3 && $window[0] && $window[0]->[1] =~ /^(\@| )/ && $window[1] && $window[1]->[1] =~ /^-/ && $window[2] && $window[2]->[1] =~ /^\+/) { show_line(shift @window); show_pair(shift @window, shift @window); } while (@window) { show_line(shift @window); } exit 0; sub show_line { my $line = shift; print $line->[0], "\n"; } sub show_pair { my ($from, $to) = @_; my @from = split //, $from->[1]; my @to = split //, $to->[1]; my $prefix = 1; while ($from[$prefix] eq $to[$prefix]) { $prefix++; } my $suffix = 0; while ($from[$#from-$suffix] eq $to[$#to-$suffix]) { $suffix++; } print $RED, highlight($from->[1], $prefix, $suffix), $RESET, "\n"; print $GREEN, highlight($to->[1], $prefix, $suffix), $RESET, "\n"; } sub highlight { my ($line, $prefix, $suffix) = @_; my $end = length($line) - $suffix; return $line unless $end > $prefix; return join('', substr($line, 0, $prefix), $REVERSE, substr($line, $prefix, $end - $prefix), $UNREVERSE, substr($line, $end) ); } ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2010-11-18 6:41 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-08 15:16 Understanding and improving --word-diff Matthijs Kooijman 2010-11-08 15:41 ` Matthieu Moy 2010-11-08 15:49 ` Matthijs Kooijman 2010-11-08 16:33 ` Thomas Rast 2010-11-08 17:35 ` Wincent Colaiuta 2010-11-08 19:22 ` Thomas Rast 2010-11-09 22:01 ` Jeff King 2010-11-10 0:05 ` Johannes Schindelin 2010-11-10 4:17 ` Jeff King 2010-11-18 6:40 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).