From: Michael J Gruber <git@drmicha.warpmail.net>
To: Scott Johnson <scottj75074@yahoo.com>
Cc: git@vger.kernel.org, trast@student.ethz.ch
Subject: Re: html userdiff is not showing all my changes
Date: Wed, 15 Dec 2010 10:06:21 +0100
Message-ID: <4D08850D.3010402@drmicha.warpmail.net>
In-Reply-To: <561247.22837.qm@web110707.mail.gq1.yahoo.com>
Scott Johnson venit, vidit, dixit 15.12.2010 04:47:
> I am attempting to do a word diff of an html source file. Part of the removed
> html is disappearing from the diff when I enable the fancy html word diff.
>
> Here's the output from basic `git diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
> <ul>
> <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
> <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
> - <li class="yws-maps"><em></em><a href="#">yws-maps</a></li>
> - <li class="ydn-delicious"><em></em><a href="#">ydn-delicious</a></li>
> + <li><em></em><a href="#">yws-maps</a></li>
> + <li><em></em><a href="#">ydn-delicious</a></li>
> <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
> <li class="yws-events"><em></em><a href="#">yws-events</a></li>
> </ul>
>
>
> Here's the default `git diff --word-diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
> <ul>
> <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
> <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
> [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+}
> href="#">yws-maps</a></li>
> [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+}
> href="#">ydn-delicious</a></li>
> <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
> <li class="yws-events"><em></em><a href="#">yws-events</a></li>
> </ul>
>
> Which is correct, but less than ideal because it highlights much more than the
> actual changes.
>
> So I create a .gitattributes file with one line:
> *.html diff=html
>
> And rerun `git diff --word-diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
> <ul>
> <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
> <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
> <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
> <li><em></em><a href="#">ydn-delicious</a></li>
> <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
> <li class="yws-events"><em></em><a href="#">yws-events</a></li>
> </ul>
>
> Yikes! What happened to the second line of changes? The removed code is not
> displayed at all.
>
> This is running git 1.7.3.3.
>
> I suspect the problem is in the html patterns in userdiff.c, but I don't
> understand the word-diff-regex well enough to fix it.
The wordRegex should really only control what comprises a word, i.e. the
granularity of --word-diff. (Where do we insert additional line-breaks
before running ordinary diff?)
If a wordRegex can make parts of a diff disappear, then there is a problem
deeper in the diff machinery. Can you trim this down to a minimal example?
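
To make the question concrete, here is a toy model in Python of how a word
diff could work: collect every match of the word regex as a "word", then run
an ordinary sequence diff over the two word lists. This is only a sketch
under my own assumptions, not the code in diff.c; the function names
split_words and word_diff and both example regexes are made up for
illustration. In this model, any character that no match covers never
reaches the diff, so a change confined to such characters is not shown.

```python
import difflib
import re


def split_words(text, word_regex):
    """Return every non-overlapping match of word_regex, in order."""
    return [m.group(0) for m in re.finditer(word_regex, text)]


def word_diff(old, new, word_regex):
    """Diff two strings word by word (toy model, not git's algorithm)."""
    old_words = split_words(old, word_regex)
    new_words = split_words(new, word_regex)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if i1 < i2:  # words present only in the old text
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words present only in the new text
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


# Whitespace-delimited words: coarse, but every non-space character is covered.
print(word_diff('<li class="x"><em>', '<li><em>', r"\S+"))
# [-<li class="x"><em>-] {+<li><em>+}

# A regex that skips "=", "1" and "2": the only change is never compared.
print(word_diff("a=1", "a=2", r"[a-z]+"))
# a
```

If the html pattern behaves like the second regex for some input, that would
explain a missing change without any bug deeper down, so a minimal example
should tell us which case this is.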
Michael