git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH, WAS: "weird diff output?" v2 0/2] implement empty line diff chunk heuristic
@ 2016-04-15 21:56 Jacob Keller
  2016-04-15 21:56 ` [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function Jacob Keller
  2016-04-15 21:56 ` [RFC PATCH, WAS: "weird diff output?" v2 2/2] xdiff: implement empty line chunk heuristic Jacob Keller
  0 siblings, 2 replies; 5+ messages in thread
From: Jacob Keller @ 2016-04-15 21:56 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

From: Jacob Keller <jacob.keller@gmail.com>

Second version of my series with a few more minor fixups. I left the
diff command line and configuration option alone for now, suspecting
that long term we either (a) remove it or (b) use a gitattribute, so
there is no reason to bikeshed the name or its contents right now.

TODO:
* add some tests
* think about whether we need a git attribute or not (I did some
  thinking, and if we do need to configure this at all, this is where I
  would put it)
* figure out how to make is_emptyline CRLF aware

Changes since my v1:
* rename xdl_hash_and_recmatch to recs_match
* remove counting empty lines in the first section of the looping

Changes since Stefan's v1:
* Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
* Fixed a segfault in Stefan's patch
* Added XDL flag to configure the behavior
* Used an int and counted empty lines via += instead of |=
* Renamed starts_with_emptyline to is_emptyline
* Added diff command line and config options

For reviewer convenience, the interdiff between v1 and v2:

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index cebf82702d2a..9265d60d9571 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -173,8 +173,8 @@ include::mergetools-diff.txt[]
 diff.emptyLineHeuristic::
 	Set this option to true to enable the empty line chunk heuristic when
 	producing diff output. This heuristic will attempt to shift hunks such
-	that a common empty line occurs below the hunk with the rest of the
-	context above it.
+	that the last common empty line occurs below the hunk with the rest of
+	the context above it.
 
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 83984b90f82f..9436ad735243 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -405,7 +405,7 @@ static int is_emptyline(const char *recs)
 	return recs[0] == '\n'; /* CRLF not covered here */
 }
 
-static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags)
+static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
 		xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
@@ -457,9 +457,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the last line of the current change group, shift backward
 			 * the group.
 			 */
-			while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
-				emptylines += is_emptyline(recs[ix - 1]->ptr);
-
+			while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 
@@ -486,7 +484,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the line next of the current change group, shift forward
 			 * the group.
 			 */
-			while (ix < nrec && xdl_hash_and_recmatch(recs, ixs, ix, flags)) {
+			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
 				emptylines += is_emptyline(recs[ix]->ptr);
 
 				rchg[ixs++] = 0;
@@ -527,7 +525,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 		if ((flags & XDF_EMPTY_LINE_HEURISTIC) && emptylines) {
 			while (ixs > 0 &&
 			       !is_emptyline(recs[ix - 1]->ptr) &&
-			       xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
+			       recs_match(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 			}

Jacob Keller (1):
  xdiff: add recs_match helper function

Stefan Beller (1):
  xdiff: implement empty line chunk heuristic

 Documentation/diff-config.txt  |  6 ++++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 40 ++++++++++++++++++++++++++++++++++++----
 5 files changed, 61 insertions(+), 4 deletions(-)

-- 
2.8.1.369.geae769a

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function
  2016-04-15 21:56 [RFC PATCH, WAS: "weird diff output?" v2 0/2] implement empty line diff chunk heuristic Jacob Keller
@ 2016-04-15 21:56 ` Jacob Keller
  2016-04-15 22:46   ` Jeff King
  2016-04-15 21:56 ` [RFC PATCH, WAS: "weird diff output?" v2 2/2] xdiff: implement empty line chunk heuristic Jacob Keller
  1 sibling, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2016-04-15 21:56 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller, Jacob Keller

From: Jacob Keller <jacob.keller@gmail.com>

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 xdiff/xdiffi.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 2358a2d6326e..d14c47de5e36 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
+{
+	return (recs[ixs]->ha == recs[ix]->ha &&
+		xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
+			     recs[ix]->ptr, recs[ix]->size,
+			     flags));
+}
+
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
@@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the last line of the current change group, shift backward
 			 * the group.
 			 */
-			while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
-			       xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
+			while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 
@@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the line next of the current change group, shift forward
 			 * the group.
 			 */
-			while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-			       xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
+			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+				emptylines += is_emptyline(recs[ix]->ptr);
+
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
-- 
2.8.1.369.geae769a

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH, WAS: "weird diff output?" v2 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 21:56 [RFC PATCH, WAS: "weird diff output?" v2 0/2] implement empty line diff chunk heuristic Jacob Keller
  2016-04-15 21:56 ` [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function Jacob Keller
@ 2016-04-15 21:56 ` Jacob Keller
  1 sibling, 0 replies; 5+ messages in thread
From: Jacob Keller @ 2016-04-15 21:56 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

From: Stefan Beller <sbeller@google.com>

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, leave the heuristic disabled by default.

Add an XDIFF flag to enable this heuristic only conditionally. Add
a diff command line option and diff configuration option to allow users
to enable this option when desired.

TODO:
* Add tests
* Add better/more documentation explaining the heuristic, possibly with
  examples(?)
* better name(?)

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 Documentation/diff-config.txt  |  6 ++++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 24 ++++++++++++++++++++++++
 5 files changed, 49 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba56522bce..9265d60d9571 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.emptyLineHeuristic::
+	Set this option to true to enable the empty line chunk heuristic when
+	producing diff output. This heuristic will attempt to shift hunks such
+	that the last common empty line occurs below the hunk with the rest of
+	the context above it.
+
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e2ac15..6c1cd8b35584 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--empty-line-heuristic::
+--no-empty-line-heuristic::
+	When possible, shift common empty lines in diff hunks below the hunk
+	such that the last common empty line for each hunk is below, with the
+	rest of the context above the hunk.
+
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe6609d059..8ab9a492928d 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_empty_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_detect_rename_default = git_config_rename(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "diff.emptylineheuristic")) {
+		diff_empty_line_heuristic = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
 		diff_auto_refresh_index = git_config_bool(var, value);
 		return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
+	if (diff_empty_line_heuristic)
+		DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--empty-line-heuristic"))
+		DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
+	else if (!strcmp(arg, "--no-empty-line-heuristic"))
+		DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79410c2..9195a5c0e958 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
+#define XDF_EMPTY_LINE_HEURISTIC (1 << 8)
+
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index d14c47de5e36..9436ad735243 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,11 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int is_emptyline(const char *recs)
+{
+	return recs[0] == '\n'; /* CRLF not covered here */
+}
+
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
@@ -411,6 +416,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
+	unsigned int emptylines;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -444,6 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		do {
 			grpsiz = ix - ixs;
+			emptylines = 0;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -506,6 +513,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			rchg[--ix] = 0;
 			while (rchgo[--ixo]);
 		}
+
+		/*
+		 * If a group can be moved back and forth, see if there is an
+		 * empty line in the moving space. If there is an empty line,
+		 * make sure the last empty line is the end of the group.
+		 *
+		 * As we shifted the group forward as far as possible, we only
+		 * need to shift it back if at all.
+		 */
+		if ((flags & XDF_EMPTY_LINE_HEURISTIC) && emptylines) {
+			while (ixs > 0 &&
+			       !is_emptyline(recs[ix - 1]->ptr) &&
+			       recs_match(recs, ixs - 1, ix - 1, flags)) {
+				rchg[--ixs] = 1;
+				rchg[--ix] = 0;
+			}
+		}
 	}
 
 	return 0;
-- 
2.8.1.369.geae769a

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function
  2016-04-15 21:56 ` [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function Jacob Keller
@ 2016-04-15 22:46   ` Jeff King
  2016-04-15 22:48     ` Jacob Keller
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2016-04-15 22:46 UTC (permalink / raw)
  To: Jacob Keller
  Cc: git, Stefan Beller, Junio C Hamano, Jens Lehmann, Davide Libenzi,
	Jacob Keller

On Fri, Apr 15, 2016 at 02:56:21PM -0700, Jacob Keller wrote:

> @@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>  			 * the line next of the current change group, shift forward
>  			 * the group.
>  			 */
> -			while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
> -			       xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
> +			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
> +				emptylines += is_emptyline(recs[ix]->ptr);
> +

I have not looked closely at your patches yet, but is this hunk right?
The is_emptyline stuff doesn't come in until patch 2.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function
  2016-04-15 22:46   ` Jeff King
@ 2016-04-15 22:48     ` Jacob Keller
  0 siblings, 0 replies; 5+ messages in thread
From: Jacob Keller @ 2016-04-15 22:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Jacob Keller, Git mailing list, Stefan Beller, Junio C Hamano,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 3:46 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 15, 2016 at 02:56:21PM -0700, Jacob Keller wrote:
>
>> @@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>                        * the line next of the current change group, shift forward
>>                        * the group.
>>                        */
>> -                     while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
>> -                            xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
>> +                     while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
>> +                             emptylines += is_emptyline(recs[ix]->ptr);
>> +
>
> I have not looked closely at your patches yet, but is this hunk right?
> The is_emptyline stuff doesn't come in until patch 2.
>
> -Peff

Oops, I suspect this is a rebase mistake, will fix it.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-04-15 22:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-15 21:56 [RFC PATCH, WAS: "weird diff output?" v2 0/2] implement empty line diff chunk heuristic Jacob Keller
2016-04-15 21:56 ` [RFC PATCH, WAS: "weird diff output?" v2 1/2] xdiff: add recs_match helper function Jacob Keller
2016-04-15 22:46   ` Jeff King
2016-04-15 22:48     ` Jacob Keller
2016-04-15 21:56 ` [RFC PATCH, WAS: "weird diff output?" v2 2/2] xdiff: implement empty line chunk heuristic Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).