git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git diff --exit-code does not honour textconv setting
@ 2016-03-20 12:43 Georg Pichler
  2016-04-05 14:22 ` Michael J Gruber
  0 siblings, 1 reply; 4+ messages in thread
From: Georg Pichler @ 2016-03-20 12:43 UTC (permalink / raw)
  To: git


[-- Attachment #1.1: Type: text/plain, Size: 1552 bytes --]

Hi,

I realized that "git diff --exit-code" does not honour textconv settings.
Maybe this behaviour is desired. It can be partially circumvented by using the "-b" flag if one does not care about whitespace changes.
To reproduce this, create an empty repository and run the following commands:

(I was using git version 2.7.3)

$ git config --add diff.void.textconv test
$ echo "foo diff=void" >.gitattributes
$ echo foo >foo
$ git add . && git commit -m "Init"
[master (root-commit) 70c39d9] Init
2 files changed, 2 insertions(+)
create mode 100644 .gitattributes
create mode 100644 foo
$ echo bar >foo
$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified: foo

no changes added to commit (use "git add" and/or "git commit -a")
$ git diff
$ git diff --exit-code
[exits with 1, no output]
$ git diff --exit-code -b
[exits with 0, no output]

The "test" command is used as it does not generate any output on stdout.

I would expect "git diff --exit-code" to return with exit code 0. If this is not desired, it should be clearly stated in the man page,
that "--exit-code" does not honour the textconv setting, except if "-b" is given. Currently this is not clear:

       --exit-code
           Make the program exit with codes similar to diff(1). That is, it exits
           with 1 if there were differences and 0 means no differences.

Best,
Georg Pichler


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: git diff --exit-code does not honour textconv setting
  2016-03-20 12:43 git diff --exit-code does not honour textconv setting Georg Pichler
@ 2016-04-05 14:22 ` Michael J Gruber
  2016-04-05 23:16   ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Michael J Gruber @ 2016-04-05 14:22 UTC (permalink / raw)
  To: Georg Pichler, git

Georg Pichler venit, vidit, dixit 20.03.2016 13:43:
> Hi,
> 
> I realized that "git diff --exit-code" does not honour textconv settings.
> Maybe this behaviour is desired. It can be partially circumvented by using the "-b" flag if one does not care about whitespace changes.
> To reproduce this, create an empty repository and run the following commands:
> 
> (I was using git version 2.7.3)
> 
> $ git config --add diff.void.textconv test
> $ echo "foo diff=void" >.gitattributes
> $ echo foo >foo
> $ git add . && git commit -m "Init"
> [master (root-commit) 70c39d9] Init
> 2 files changed, 2 insertions(+)
> create mode 100644 .gitattributes
> create mode 100644 foo
> $ echo bar >foo
> $ git status
> On branch master
> Changes not staged for commit:
> (use "git add <file>..." to update what will be committed)
> (use "git checkout -- <file>..." to discard changes in working directory)
> 
> modified: foo
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> $ git diff
> $ git diff --exit-code
> [exits with 1, no output]
> $ git diff --exit-code -b
> [exits with 0, no output]
> 
> The "test" command is used as it does not generate any output on stdout.

"test" is a bit of a red herring here since it will receive commands.
But your example works even with "true" which ignores its commands and
produces no output.

> I would expect "git diff --exit-code" to return with exit code 0. If this is not desired, it should be clearly stated in the man page,
> that "--exit-code" does not honour the textconv setting, except if "-b" is given. Currently this is not clear:
> 
>        --exit-code
>            Make the program exit with codes similar to diff(1). That is, it exits
>            with 1 if there were differences and 0 means no differences.
> 
> Best,
> Georg Pichler

The description doesn't make it clear whether exit-code refers to the
actual diff (foo vs. bar) or to the diff after textconv (empty vs.
empty). In any case, "-b" should not make a difference for your example.


diff_flush() in diff.c has this piece of code:

        /*
         * Report the content-level differences with HAS_CHANGES;
         * diff_addremove/diff_change does not set the bit when
         * DIFF_FROM_CONTENTS is in effect (e.g. with -w).
         */
        if (DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
                if (options->found_changes)
                        DIFF_OPT_SET(options, HAS_CHANGES);
                else
                        DIFF_OPT_CLR(options, HAS_CHANGES);
        }

So it's clear that depending on "-b" (or "-w") or not, it's taking
shortcuts or looking at the textconved diff but I'm not sure where's the
proper place to fix that.

Michael

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: git diff --exit-code does not honour textconv setting
  2016-04-05 14:22 ` Michael J Gruber
@ 2016-04-05 23:16   ` Junio C Hamano
  2016-04-06  6:44     ` Michael J Gruber
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2016-04-05 23:16 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Georg Pichler, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

>> The "test" command is used as it does not generate any output on stdout.
>
> "test" is a bit of a red herring here since it will receive commands.
> But your example works even with "true" which ignores its commands and
> produces no output.

Either sounds strange, as they exit without reading their input at
all, so the other side of the pipe may get an write error it has to
handle ;-)

> The description doesn't make it clear whether exit-code refers to
> the actual diff (foo vs. bar) or to the diff after textconv (empty
> vs. empty). In any case, "-b" should not make a difference for
> your example.
>
> diff_flush() in diff.c has this piece of code:
>

It is understandable that "-b" makes difference.  The actual
short-cut happens a bit before the code you quoted.

	if (output_format & DIFF_FORMAT_NO_OUTPUT &&
	    DIFF_OPT_TST(options, EXIT_WITH_STATUS) &&
	    DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
		/*
		 * run diff_flush_patch for the exit status. setting
		 * options->file to /dev/null should be safe, because we
		 * aren't supposed to produce any output anyway.
		 */
		if (options->close_file)
			fclose(options->file);
		options->file = fopen("/dev/null", "w");
		if (!options->file)
			die_errno("Could not open /dev/null");
		options->close_file = 1;
		for (i = 0; i < q->nr; i++) {
			struct diff_filepair *p = q->queue[i];
			if (check_pair_status(p))
				diff_flush_patch(p, options);
			if (options->found_changes)
				break;
		}
	}

When running without producing output but we need the exit status,
unless DIFF_FROM_CONTENTS is set (e.g. "-b" in use), we do not run
the code that would inspect the blob contents that would have
produced the actual patch.  When DIFF_FROM_CONTENTS is set, we need
to dig into the blob contents and the body of the above if() gets
run only to trigger o->found_changes (e.g. in builtin_diff() that
sets ecbdata.found_changesp to point at o->found_changes before
calling into the xdiff machinery), but the output is discarded to
/dev/null.

The textconv filter is an unfortunate beast, in that while some
paths use one while others don't, so it won't be as simple as "-b"
that flips DIFF_FROM_CONTENTS to say "We need to inspect blobs for
ALL FILEPAIRS to see if there is any difference" if we want to keep
the optimization as much as possible.

Without DIFF_FROM_CONTENTS set, any filepair that point at two
different blobs that might end up to be the same after applying
textconv will flip the o->found_changes bit on, and with
EXIT_WITH_STATUS bit on, we have another optimization to skip
checking the remainder of the filepairs.  So if you want to support
textconv with QUICK, you would also need to disable that
optimization by teaching diff_change() to refrain from setting
HAS_CHANGES bit, i.e.

    diff --git a/diff.h b/diff.h
    index 125447b..f6c0c07 100644
    --- a/diff.h
    +++ b/diff.h
    @@ -69,7 +69,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
     #define DIFF_OPT_FIND_COPIES_HARDER  (1 <<  6)
     #define DIFF_OPT_FOLLOW_RENAMES      (1 <<  7)
     #define DIFF_OPT_RENAME_EMPTY        (1 <<  8)
    -/* (1 <<  9) unused */
    +#define DIFF_OPT_WITH_TEXTCONV       (1 <<  9)
     #define DIFF_OPT_HAS_CHANGES         (1 << 10)
     #define DIFF_OPT_QUICK               (1 << 11)
     #define DIFF_OPT_NO_INDEX            (1 << 12)

    diff --git a/diff.c b/diff.c
    index 4dfe660..0016ad2 100644
    --- a/diff.c
    +++ b/diff.c
    @@ -5018,6 +5018,11 @@ void diff_change(struct diff_options *options,
            two->dirty_submodule = new_dirty_submodule;
            p = diff_queue(&diff_queued_diff, one, two);

    +       if (either path one or path two has textconv defined) {
    +               DIFF_OPT_SET(options, DIFF_WITH_TEXTCONV);
    +               return;
    +       }
    +
            if (DIFF_OPT_TST(options, DIFF_FROM_CONTENTS))
                    return;

And then in order to keep the optimization, add this to the above
codepath:

    diff --git a/diff.c b/diff.c
    index 4dfe660..7b318cc 100644
    --- a/diff.c
    +++ b/diff.c
    @@ -4598,6 +4598,7 @@ void diff_flush(struct diff_options *options)
            int i, output_format = options->output_format;
            int separator = 0;
            int dirstat_by_line = 0;
    +       int textconv_hack;

            /*
             * Order: raw, stat, summary, patch
    @@ -4652,9 +4653,17 @@ void diff_flush(struct diff_options *options)
                    separator++;
            }

    +       /*
    +        * If there is any path that needs textconv and we haven't
    +        * found any change on paths that don't, we need to pass
    +        * them through textconv and see the textual difference.
    +        */
    +       textconv_hack = (DIFF_OPT_TST(options, DIFF_WITH_TEXTCONV) &&
    +                        !DIFF_OPT_TST(options, HAS_CHANGES));
    +
            if (output_format & DIFF_FORMAT_NO_OUTPUT &&
                DIFF_OPT_TST(options, EXIT_WITH_STATUS) &&
    -           DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
    +           (textconv_hack || DIFF_OPT_TST(options, DIFF_FROM_CONTENTS))) {
                    /*
                     * run diff_flush_patch for the exit status. setting
                     * options->file to /dev/null should be safe, because we

so that if we find some other change (e.g. diff_addremove() noticed
a path added or removed, or diff_change() noticed a changed blob
that does not have textconv defined), we do not have to inspect all
the paths to see if there is any change at all by going into the
body of this if() block.

And then finally, clean things up in a way similar to how
DIFF_FROM_CONTENTS codepath handles the HAS_CHANGES bit.

    diff --git a/diff.c b/diff.c
    index 4dfe660..7b318cc 100644
    --- a/diff.c
    +++ b/diff.c
    @@ -4709,7 +4718,7 @@ free_queue:
             * diff_addremove/diff_change does not set the bit when
             * DIFF_FROM_CONTENTS is in effect (e.g. with -w).
             */
    -       if (DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
    +       if (textconv_hack || DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
                    if (options->found_changes)
                            DIFF_OPT_SET(options, HAS_CHANGES);
                    else

By the way, I have a suspicion that the existing code immediately
after the context of this hunk is wrong to clear HAS_CHANGES bit
when options->found_changes is clear.  HAS_CHANGES bit should be
clear upon entry to this function, when DIFF_FROM_CONTENTS is in
use.

I also suspect that addremove() that refrains from setting
HAS_CHANGES when DIFF_FROM_CONTENTS is in effect is wrong.  No
matter what combination of -w or -b is used, an addition of a new
file, or a removal of an existing file, is a change.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: git diff --exit-code does not honour textconv setting
  2016-04-05 23:16   ` Junio C Hamano
@ 2016-04-06  6:44     ` Michael J Gruber
  0 siblings, 0 replies; 4+ messages in thread
From: Michael J Gruber @ 2016-04-06  6:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Georg Pichler, git

Junio C Hamano venit, vidit, dixit 06.04.2016 01:16:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
> 
>>> The "test" command is used as it does not generate any output on stdout.
>>
>> "test" is a bit of a red herring here since it will receive commands.
>> But your example works even with "true" which ignores its commands and
>> produces no output.
> 
> Either sounds strange, as they exit without reading their input at
> all, so the other side of the pipe may get an write error it has to
> handle ;-)
> 
>> The description doesn't make it clear whether exit-code refers to
>> the actual diff (foo vs. bar) or to the diff after textconv (empty
>> vs. empty). In any case, "-b" should not make a difference for
>> your example.
>>
>> diff_flush() in diff.c has this piece of code:
>>
> 
> It is understandable that "-b" makes difference.  The actual
> short-cut happens a bit before the code you quoted.
> 
> 	if (output_format & DIFF_FORMAT_NO_OUTPUT &&
> 	    DIFF_OPT_TST(options, EXIT_WITH_STATUS) &&
> 	    DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
> 		/*
> 		 * run diff_flush_patch for the exit status. setting
> 		 * options->file to /dev/null should be safe, because we
> 		 * aren't supposed to produce any output anyway.
> 		 */
> 		if (options->close_file)
> 			fclose(options->file);
> 		options->file = fopen("/dev/null", "w");
> 		if (!options->file)
> 			die_errno("Could not open /dev/null");
> 		options->close_file = 1;
> 		for (i = 0; i < q->nr; i++) {
> 			struct diff_filepair *p = q->queue[i];
> 			if (check_pair_status(p))
> 				diff_flush_patch(p, options);
> 			if (options->found_changes)
> 				break;
> 		}
> 	}
> 
> When running without producing output but we need the exit status,
> unless DIFF_FROM_CONTENTS is set (e.g. "-b" in use), we do not run
> the code that would inspect the blob contents that would have
> produced the actual patch.  When DIFF_FROM_CONTENTS is set, we need
> to dig into the blob contents and the body of the above if() gets
> run only to trigger o->found_changes (e.g. in builtin_diff() that
> sets ecbdata.found_changesp to point at o->found_changes before
> calling into the xdiff machinery), but the output is discarded to
> /dev/null.
> 
> The textconv filter is an unfortunate beast, in that while some
> paths use one while others don't, so it won't be as simple as "-b"
> that flips DIFF_FROM_CONTENTS to say "We need to inspect blobs for
> ALL FILEPAIRS to see if there is any difference" if we want to keep
> the optimization as much as possible.
> 
> Without DIFF_FROM_CONTENTS set, any filepair that point at two
> different blobs that might end up to be the same after applying
> textconv will flip the o->found_changes bit on, and with
> EXIT_WITH_STATUS bit on, we have another optimization to skip
> checking the remainder of the filepairs.  So if you want to support
> textconv with QUICK, you would also need to disable that
> optimization by teaching diff_change() to refrain from setting
> HAS_CHANGES bit, i.e.
> 
>     diff --git a/diff.h b/diff.h
>     index 125447b..f6c0c07 100644
>     --- a/diff.h
>     +++ b/diff.h
>     @@ -69,7 +69,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
>      #define DIFF_OPT_FIND_COPIES_HARDER  (1 <<  6)
>      #define DIFF_OPT_FOLLOW_RENAMES      (1 <<  7)
>      #define DIFF_OPT_RENAME_EMPTY        (1 <<  8)
>     -/* (1 <<  9) unused */
>     +#define DIFF_OPT_WITH_TEXTCONV       (1 <<  9)
>      #define DIFF_OPT_HAS_CHANGES         (1 << 10)
>      #define DIFF_OPT_QUICK               (1 << 11)
>      #define DIFF_OPT_NO_INDEX            (1 << 12)
> 
>     diff --git a/diff.c b/diff.c
>     index 4dfe660..0016ad2 100644
>     --- a/diff.c
>     +++ b/diff.c
>     @@ -5018,6 +5018,11 @@ void diff_change(struct diff_options *options,
>             two->dirty_submodule = new_dirty_submodule;
>             p = diff_queue(&diff_queued_diff, one, two);
> 
>     +       if (either path one or path two has textconv defined) {
>     +               DIFF_OPT_SET(options, DIFF_WITH_TEXTCONV);
>     +               return;
>     +       }
>     +
>             if (DIFF_OPT_TST(options, DIFF_FROM_CONTENTS))
>                     return;
> 
> And then in order to keep the optimization, add this to the above
> codepath:
> 
>     diff --git a/diff.c b/diff.c
>     index 4dfe660..7b318cc 100644
>     --- a/diff.c
>     +++ b/diff.c
>     @@ -4598,6 +4598,7 @@ void diff_flush(struct diff_options *options)
>             int i, output_format = options->output_format;
>             int separator = 0;
>             int dirstat_by_line = 0;
>     +       int textconv_hack;
> 
>             /*
>              * Order: raw, stat, summary, patch
>     @@ -4652,9 +4653,17 @@ void diff_flush(struct diff_options *options)
>                     separator++;
>             }
> 
>     +       /*
>     +        * If there is any path that needs textconv and we haven't
>     +        * found any change on paths that don't, we need to pass
>     +        * them through textconv and see the textual difference.
>     +        */
>     +       textconv_hack = (DIFF_OPT_TST(options, DIFF_WITH_TEXTCONV) &&
>     +                        !DIFF_OPT_TST(options, HAS_CHANGES));
>     +
>             if (output_format & DIFF_FORMAT_NO_OUTPUT &&
>                 DIFF_OPT_TST(options, EXIT_WITH_STATUS) &&
>     -           DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
>     +           (textconv_hack || DIFF_OPT_TST(options, DIFF_FROM_CONTENTS))) {
>                     /*
>                      * run diff_flush_patch for the exit status. setting
>                      * options->file to /dev/null should be safe, because we
> 
> so that if we find some other change (e.g. diff_addremove() noticed
> a path added or removed, or diff_change() noticed a changed blob
> that does not have textconv defined), we do not have to inspect all
> the paths to see if there is any change at all by going into the
> body of this if() block.
> 
> And then finally, clean things up in a way similar to how
> DIFF_FROM_CONTENTS codepath handles the HAS_CHANGES bit.
> 
>     diff --git a/diff.c b/diff.c
>     index 4dfe660..7b318cc 100644
>     --- a/diff.c
>     +++ b/diff.c
>     @@ -4709,7 +4718,7 @@ free_queue:
>              * diff_addremove/diff_change does not set the bit when
>              * DIFF_FROM_CONTENTS is in effect (e.g. with -w).
>              */
>     -       if (DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
>     +       if (textconv_hack || DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) {
>                     if (options->found_changes)
>                             DIFF_OPT_SET(options, HAS_CHANGES);
>                     else
> 
> By the way, I have a suspicion that the existing code immediately
> after the context of this hunk is wrong to clear HAS_CHANGES bit
> when options->found_changes is clear.  HAS_CHANGES bit should be
> clear upon entry to this function, when DIFF_FROM_CONTENTS is in
> use.
> 
> I also suspect that addremove() that refrains from setting
> HAS_CHANGES when DIFF_FROM_CONTENTS is in effect is wrong.  No
> matter what combination of -w or -b is used, an addition of a new
> file, or a removal of an existing file, is a change.
> 

We certainly share that suspicion.

Even with your clear explanations, I still can't wrap my brain
completely around that flow of different cases. But in any case:

--textconv is about presenting blobs (and, as a consequence diffs) to
the user.

--name-status and the like are about differences between the actual
objects (irrespective of any possibly coinciding textual representation)

Shouldn't "--exit-code" rather be in the "name-status ballpark" than in
the "textconv ballpark", i.e. signal that there are changes irrespective
of representation?

On the other hand, if "--exit-code" should be in the "textconv ballbark"
then, e.g., it should return 0 resp. 1 on a file with mere white space
changes when run with resp. without "-w", which means we can never use
optimizations with --exit-code, or more correctly: optimization (quick)
can identify "equal" (equal blobs produce equal textual representations)
but not "unequal". textconv filters are functions, but not necessarily
injective. So we could keep "quick" in the presumably more common case
of equal blobs.

Michael

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-04-06  6:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-20 12:43 git diff --exit-code does not honour textconv setting Georg Pichler
2016-04-05 14:22 ` Michael J Gruber
2016-04-05 23:16   ` Junio C Hamano
2016-04-06  6:44     ` Michael J Gruber

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).