From: Michael J Gruber <git@drmicha.warpmail.net>
To: Scott Johnson <scottj75074@yahoo.com>
Cc: git@vger.kernel.org, trast@student.ethz.ch
Subject: Re: html userdiff is not showing all my changes
Date: Wed, 15 Dec 2010 10:06:21 +0100
Message-ID: <4D08850D.3010402@drmicha.warpmail.net>
In-Reply-To: <561247.22837.qm@web110707.mail.gq1.yahoo.com>
Scott Johnson came, saw, and said on 15.12.2010 04:47:
> I am attempting to do a word diff of an html source file. Part of the removed
> html is disappearing from the diff when I enable the fancy html word diff.
>
> Here's the output from basic `git diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
> <ul>
> <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
> <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
> - <li class="yws-maps"><em></em><a href="#">yws-maps</a></li>
> - <li class="ydn-delicious"><em></em><a href="#">ydn-delicious</a></li>
> + <li><em></em><a href="#">yws-maps</a></li>
> + <li><em></em><a href="#">ydn-delicious</a></li>
> <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
> <li class="yws-events"><em></em><a href="#">yws-events</a></li>
> </ul>
>
>
> Here's the default `git diff --word-diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
> <ul>
> <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
> <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
> [-<li class="yws-maps"><em></em><a-]{+<li><em></em><a+} href="#">yws-maps</a></li>
> [-<li class="ydn-delicious"><em></em><a-]{+<li><em></em><a+} href="#">ydn-delicious</a></li>
> <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
> <li class="yws-events"><em></em><a href="#">yws-events</a></li>
> </ul>
>
> Which is correct, but less than ideal because it highlights much more than the
> actual changes.
>
> So I create a .gitattributes file with one line:
> *.html diff=html
>
> And rerun `git diff --word-diff`:
> diff --git a/adv_layout_source.html b/adv_layout_source.html
> index 18a81dd..c4ed609 100644
> --- a/adv_layout_source.html
> +++ b/adv_layout_source.html
> @@ -42,8 +42,8 @@
> <ul>
> <li class="ydn-patterns"><em></em><a href="#">ydn-patterns</a></li>
> <li class="ydn-mail"><em></em><a href="#">ydn-mail</a></li>
> <li[-class="yws-maps"-]><em></em><a href="#">yws-maps</a></li>
> <li><em></em><a href="#">ydn-delicious</a></li>
> <li class="yws-flickr"><em></em><a href="#">yws-flickr</a></li>
> <li class="yws-events"><em></em><a href="#">yws-events</a></li>
> </ul>
>
> Yikes! What happened to the second line of changes? The removed code is not
> displayed at all.
>
> This is running git 1.7.3.3.
>
> I suspect the problem is in the html patterns in userdiff.c, but I don't
> understand the word-diff-regex well enough to fix it.
The wordRegex should really only control what constitutes a word, i.e. the
granularity of --word-diff (where we insert additional line breaks before
running the ordinary diff).
If a wordRegex can make parts of the diff disappear, then there is a problem
deeper in the diff machinery. Can you trim this down to a minimal example?
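To make the "granularity" point concrete, here is a minimal C sketch, using POSIX <regex.h>, of the tokenizing step: split the text into words with a word regex, drop whatever the regex does not match, and emit one word per line so that an ordinary line diff can compare them. This is not git's diff.c; the function name split_words and the sample regexes are hypothetical. It also illustrates one way a word regex could confuse such a scheme: without REG_NEWLINE, a negated bracket expression like [^<>= \t] also matches '\n', so a single "word" can cover more than one input line.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch only, not git's diff.c: split BUF into "words"
 * using the POSIX extended regex WORD_RE. Text the regex does not match
 * is treated as a separator and dropped. If OUT is non-NULL, each word
 * is written on its own line, the form an ordinary line diff would
 * then compare.
 *
 * Returns the number of words, or -1 if WORD_RE does not compile.
 * If SPANS_NEWLINE is non-NULL, it is set to 1 when any single word
 * contains a '\n', i.e. one "word" covers more than one input line.
 */
int split_words(const char *buf, const char *word_re, FILE *out,
		int *spans_newline)
{
	regex_t re;
	regmatch_t m;
	const char *p = buf;
	int count = 0;

	assert(buf != NULL && word_re != NULL);
	if (regcomp(&re, word_re, REG_EXTENDED) != 0)
		return -1;
	if (spans_newline)
		*spans_newline = 0;

	/* REG_NOTBOL: once p has advanced, '^' must not match at p. */
	while (*p && regexec(&re, p, 1, &m, p == buf ? 0 : REG_NOTBOL) == 0) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		if (len == 0) {
			/* Empty match: skip one character to guarantee progress. */
			if (p[m.rm_so] == '\0')
				break;
			p += m.rm_so + 1;
			continue;
		}
		if (spans_newline && memchr(p + m.rm_so, '\n', len))
			*spans_newline = 1;
		if (out)
			fprintf(out, "%.*s\n", (int)len, p + m.rm_so);
		count++;
		p += m.rm_eo;
	}
	regfree(&re);
	return count;
}
```

For example, splitting "foo\nbar" with "[^<>= \t]+" yields a single word that spans both lines, while "[^[:space:]<>=]+" yields the two words "foo" and "bar". Excluding all whitespace from the word characters keeps every word on one line.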
Michael