git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Thomas Rast <trast@student.ethz.ch>
Cc: git@vger.kernel.org,
	Johannes Schindelin <johannes.schindelin@gmx.de>,
	Teemu Likonen <tlikonen@iki.fi>
Subject: Re: [PATCH v3 2/4] word diff: customizable word splits
Date: Sun, 11 Jan 2009 14:20:07 -0800	[thread overview]
Message-ID: <7vfxjppjs8.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: 529cd830908f018f796dbc46d3b055c1f8ba9c1b.1231669012.git.trast@student.ethz.ch

Thomas Rast <trast@student.ethz.ch> writes:

> Allows for user-configurable word splits when using --color-words.
> This can make the diff more readable if the regex is configured
> according to the language of the file.
>
> Each non-overlapping match of the regex is a word; everything in
> between is whitespace.

What happens if the input "language" does not have any inter-word spacing
but its words can still be expressed by regexp patterns?

ImagineALanguageThatAllowsYouToWriteSomethingLikeThis.  Does the mechanism
help users who want to do word-diff files written in such a language by
outputting:

	ImagineALanguage<red>That</red><green>Which</green>AllowsYou...

when '[A-Z][a-z]*' is given by the word pattern?

> We disallow matching the empty string (because
> it results in an endless loop) or a newline (breaks color escapes and
> interacts badly with the input coming from the usual line diff).  To
> help the user, we set REG_NEWLINE so that [^...] and . do not match
> newlines.

AndImagineALanguageWhoseWordStruc
tureDoesNotCareAboutLineBreak

Can you help users with such payload?

	Side note.  Yes, I am coming from Japanese background.

        Side note 2.  No, I am not saying your code must support both of
        the above to be acceptable.  I am just gauging the design
        assumptions and limitations.

> Insertion of spaces is somewhat subtle.  We echo a "context" space
> twice (once on each side of the diff) if it follows directly after a
> word.  While this loses a tiny bit of accuracy, it runs together long
> sequences of changed word into one removed and one added block, making
> the diff much more readable.

I guess this part can be later enhanced to be more precise, so that it
keeps the original context space more faithfully (i.e. does not lose two
consecutive spaces in the original occidental script, and does not insert
any extra space to the oriental script), if we were to support the second
example I gave above in the future as a follow-up patch.

> +--color-words[=<regex>]::
> +	Show colored word diff, i.e., color words which have changed.
> +	By default, a new word only starts at whitespace, so that a
> +	'word' is defined as a maximal sequence of non-whitespace
> +	characters.  The optional argument <regex> can be used to
> +	configure this.
> ++
> +The <regex> must be an (extended) regular expression.  When set, every
> +non-overlapping match of the <regex> is considered a word.  (Regular
> +expression semantics ensure that quantifiers grab a maximal sequence
> +of characters.)  Anything between these matches is considered
> +whitespace and ignored for the purposes of finding differences.  You
> +may want to append `|\S` to your regular expression to make sure that
> +it matches all non-whitespace characters.

Whose regexp library do we assume here?  Traditionally we limited
ourselves to POSIX BRE, and I do not think anybody minds using POSIX ERE
here, but we need to be clear.  In either case \S is a pcre outside
POSIX.

The rest I only skimmed but did not spot anything glaringly wrong; thanks.

  reply	other threads:[~2009-01-11 22:21 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-09  0:05 [RFC PATCH] make diff --color-words customizable Thomas Rast
2009-01-09  0:25 ` Johannes Schindelin
2009-01-09  0:50   ` Thomas Rast
2009-01-09 11:15     ` Johannes Schindelin
2009-01-09 11:59       ` [ILLUSTRATION PATCH] color-words: take an optional regular expression describing words Johannes Schindelin
2009-01-09 12:24         ` Thomas Rast
2009-01-09 13:05           ` Teemu Likonen
2009-01-10  0:57             ` [PATCH v2] make diff --color-words customizable Thomas Rast
2009-01-10  1:50               ` Jakub Narebski
2009-01-10 11:37                 ` Johannes Schindelin
2009-01-10 13:36                   ` Jakub Narebski
2009-01-10 14:08                     ` Johannes Schindelin
2009-01-12 23:59                       ` Jakub Narebski
2009-01-13  0:40                         ` Johannes Schindelin
2009-01-10 17:53                     ` Davide Libenzi
2009-01-13  0:52                       ` Jakub Narebski
2009-01-13 18:50                         ` Davide Libenzi
2009-01-10 10:49               ` Johannes Schindelin
2009-01-10 11:25                 ` Thomas Rast
2009-01-10 11:45                   ` Johannes Schindelin
2009-01-11  1:34                     ` Junio C Hamano
2009-01-11 10:27                       ` [PATCH v3 0/4] customizable --color-words Thomas Rast
2009-01-11 10:27                       ` [PATCH v3 1/4] word diff: comments, preparations for regex customization Thomas Rast
2009-01-11 13:41                         ` Johannes Schindelin
2009-01-11 19:49                         ` Johannes Schindelin
2009-01-11 22:19                         ` Junio C Hamano
2009-01-11 10:27                       ` [PATCH v3 2/4] word diff: customizable word splits Thomas Rast
2009-01-11 22:20                         ` Junio C Hamano [this message]
2009-01-11 10:27                       ` [PATCH v3 3/4] word diff: make regex configurable via attributes Thomas Rast
2009-01-11 23:20                         ` Junio C Hamano
2009-01-11 10:27                       ` [PATCH v3 4/4] word diff: test customizable word splits Thomas Rast
2009-01-09  9:53 ` [RFC PATCH] make diff --color-words customizable Jeff King
2009-01-09 11:18   ` Johannes Schindelin
2009-01-09 11:22     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vfxjppjs8.fsf@gitster.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=tlikonen@iki.fi \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).