From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: John Cai via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, John Cai <johncai86@gmail.com>
Subject: Re: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm
Date: Mon, 06 Feb 2023 17:39:30 +0100 [thread overview]
Message-ID: <230206.865yce7n1w.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <8e73793b0db3e84366a9c6441cc0fdc04f9614a5.1675568781.git.gitgitgadget@gmail.com>
On Sun, Feb 05 2023, John Cai via GitGitGadget wrote:
> From: John Cai <johncai86@gmail.com>
> [...]
> +
> + if (!o->xdl_opts_command_line) {
> + static struct attr_check *check;
> + const char *one_diff_algo;
> + const char *two_diff_algo;
> +
> + check = attr_check_alloc();
> + attr_check_append(check, git_attr("diff-algorithm"));
> +
> + git_check_attr(the_repository->index, NULL, one->path, check);
> + one_diff_algo = check->items[0].value;
> + git_check_attr(the_repository->index, NULL, two->path, check);
> + two_diff_algo = check->items[0].value;
> +
> + if (!ATTR_UNSET(one_diff_algo) && !ATTR_UNSET(two_diff_algo) &&
> + !strcmp(one_diff_algo, two_diff_algo))
> + set_diff_algorithm(o, one_diff_algo);
> +
> + attr_check_free(check);
This is a bit nitpicky, but I for one would find this much easier to
read with some shorter variables, here just with "a" rather than
"one_diff_algo", "b" instead of "two_diff_algo", and splitting
"the_repository->index" into "istate" (untested):
+ if (!o->xdl_opts_command_line) {
+ static struct attr_check *check;
+ const char *a;
+ const char *b;
+ struct index_state *istate = the_repository->index;
+
+ check = attr_check_alloc();
+ attr_check_append(check, git_attr("diff-algorithm"));
+
+ git_check_attr(istate, NULL, one->path, check);
+ a = check->items[0].value;
+ git_check_attr(istate, NULL, two->path, check);
+ b = check->items[0].value;
+
+ if (!ATTR_UNSET(a) && !ATTR_UNSET(b) && !strcmp(a, b))
+ set_diff_algorithm(o, a);
+
+ attr_check_free(check);
+ }
That also nicely keeps the line length shorter.
> @@ -333,6 +333,8 @@ struct diff_options {
> int prefix_length;
> const char *stat_sep;
> int xdl_opts;
> + /* If xdl_opts has been set via the command line. */
> + int xdl_opts_command_line;
>
> /* see Documentation/diff-options.txt */
> char **anchors;
> diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh
> index 8d1e408bb58..630c98ea65a 100644
> --- a/t/lib-diff-alternative.sh
> +++ b/t/lib-diff-alternative.sh
> @@ -107,8 +107,27 @@ EOF
>
> STRATEGY=$1
>
> + test_expect_success "$STRATEGY diff from attributes" '
> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes &&
> + test_must_fail git diff --no-index file1 file2 > output &&
> + test_cmp expect output
> + '
> +
> test_expect_success "$STRATEGY diff" '
> - test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output &&
> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
Nit: The usual style is ">output", not "> output".
> + test_cmp expect output
> + '
> +
> + test_expect_success "$STRATEGY diff command line precedence before attributes" '
> + echo "file* diff-algorithm=meyers" >.gitattributes &&
> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
> + test_cmp expect output
> + '
> +
> + test_expect_success "$STRATEGY diff attributes precedence before config" '
> + git config diff.algorithm default &&
> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes &&
> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output &&
> test_cmp expect output
> '
>
> @@ -166,5 +185,11 @@ EOF
> test_must_fail git diff --no-index "--$STRATEGY" uniq1 uniq2 > output &&
> test_cmp expect output
> '
> +
> + test_expect_success "$STRATEGY diff from attributes" '
> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes &&
> + test_must_fail git diff --no-index uniq1 uniq2 > output &&
> + test_cmp expect output
> + '
> }
For some non-nitpicking, I do worry about exposing this as a DoS vector,
e.g. here's a diff between two distant points in git.git with the
various algorithms:
$ hyperfine -r 1 -L a patience,minimal,histogram,myers 'git diff --diff-algorithm={a} v2.0.0 v2.28.0'
Benchmark 1: git diff --diff-algorithm=patience v2.0.0 v2.28.0
Time (abs ≡): 42.121 s [User: 41.879 s, System: 0.144 s]
Benchmark 2: git diff --diff-algorithm=minimal v2.0.0 v2.28.0
Time (abs ≡): 35.634 s [User: 35.473 s, System: 0.160 s]
Benchmark 3: git diff --diff-algorithm=histogram v2.0.0 v2.28.0
Time (abs ≡): 46.912 s [User: 46.657 s, System: 0.228 s]
Benchmark 4: git diff --diff-algorithm=myers v2.0.0 v2.28.0
Time (abs ≡): 33.233 s [User: 33.072 s, System: 0.160 s]
Summary
'git diff --diff-algorithm=myers v2.0.0 v2.28.0' ran
1.07 times faster than 'git diff --diff-algorithm=minimal v2.0.0 v2.28.0'
1.27 times faster than 'git diff --diff-algorithm=patience v2.0.0 v2.28.0'
1.41 times faster than 'git diff --diff-algorithm=histogram v2.0.0 v2.28.0'
Now, all of those are very slow overall, but some much more than
others. I seem to recall that the non-default ones also had some
pathological cases.
Another thing to think about is that we've so far considered the diff
algorithm to be purely about presentation, with some notable exceptions
such as "patch-id".
I've advocated for us getting to the point of having an in-repo
.gitconfig or .gitattributes before with a whitelist of settings like
diff.context for certain paths, or a diff.orderFile.
But those seem easy to promise future behavior for, v.s. an entire diff
algorithm (which we of course had before, but now we'd have it in
repository data).
Maybe that's not a distinction worth worrying about, just putting that
out there.
I think if others are concerned about the above something that would
neatly side-step those is to have it opt-in via the .git/config somehow,
similar to e.g. how you can commit *.gpg content, put this in
.gitattributes:
*.gpg diff=gpg
But not have it do anything until this is in the repo's .git/config (or
similar):
[diff "gpg"]
textconv = gpg --no-tty --decrypt
For that you could still keep the exact .gitattributes format you have
here, i.e.:
file* diff-algorithm=$STRATEGY
But we to pick it up we'd need either:
[diff-algorithm]
histogram = myers
Or:
[diff-algorithm "histogram"]
allow = true
The former form being one that would allow you to map the .gitattributes
of the repo (but maybe that would be redundant to
.git/info/attributes)...
next prev parent reply other threads:[~2023-02-06 16:53 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-05 3:46 [PATCH 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-05 3:46 ` [PATCH 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-06 16:20 ` Phillip Wood
2023-02-05 3:46 ` [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai via GitGitGadget
2023-02-05 17:50 ` Eric Sunshine
2023-02-06 13:10 ` John Cai
2023-02-06 16:27 ` Phillip Wood
2023-02-06 18:14 ` Eric Sunshine
2023-02-06 19:50 ` John Cai
2023-02-09 8:26 ` Elijah Newren
2023-02-09 10:31 ` "bad" diffs (was: [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm) Ævar Arnfjörð Bjarmason
2023-02-09 16:37 ` [PATCH 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai
2023-02-06 16:39 ` Ævar Arnfjörð Bjarmason [this message]
2023-02-06 20:37 ` John Cai
2023-02-07 14:55 ` Phillip Wood
2023-02-07 17:00 ` John Cai
2023-02-09 9:09 ` Elijah Newren
2023-02-09 14:44 ` Phillip Wood
2023-02-10 9:57 ` Elijah Newren
2023-02-11 17:39 ` Phillip Wood
2023-02-11 1:59 ` Jeff King
2023-02-15 2:35 ` Elijah Newren
2023-02-15 4:21 ` Jeff King
2023-02-15 5:20 ` Junio C Hamano
2023-02-15 14:44 ` Phillip Wood
2023-02-15 15:00 ` Jeff King
2023-02-07 17:27 ` Ævar Arnfjörð Bjarmason
2023-02-15 14:47 ` Phillip Wood
2023-02-09 8:44 ` Elijah Newren
2023-02-14 21:16 ` John Cai
2023-02-15 3:41 ` Elijah Newren
2023-02-09 7:50 ` Elijah Newren
2023-02-09 9:41 ` Ævar Arnfjörð Bjarmason
2023-02-11 2:04 ` Jeff King
2023-02-07 17:56 ` Jeff King
2023-02-07 20:18 ` Ævar Arnfjörð Bjarmason
2023-02-07 20:47 ` Junio C Hamano
2023-02-07 21:05 ` Ævar Arnfjörð Bjarmason
2023-02-07 21:28 ` Junio C Hamano
2023-02-07 21:44 ` Ævar Arnfjörð Bjarmason
2023-02-09 16:34 ` John Cai
2023-02-11 1:39 ` Jeff King
2023-02-14 21:40 ` [PATCH v2 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-14 21:40 ` [PATCH v2 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-15 2:38 ` Junio C Hamano
2023-02-15 23:34 ` John Cai
2023-02-15 23:42 ` Junio C Hamano
2023-02-16 2:14 ` Jeff King
2023-02-16 2:57 ` Junio C Hamano
2023-02-16 20:34 ` John Cai
2023-02-14 21:40 ` [PATCH v2 2/2] diff: teach diff to read gitattribute diff-algorithm John Cai via GitGitGadget
2023-02-15 2:56 ` Junio C Hamano
2023-02-15 3:20 ` Junio C Hamano
2023-02-16 20:37 ` John Cai
2023-02-17 20:21 ` [PATCH v3 0/2] Teach diff to honor diff algorithms set through git attributes John Cai via GitGitGadget
2023-02-17 20:21 ` [PATCH v3 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-17 21:27 ` Junio C Hamano
2023-02-18 1:36 ` Elijah Newren
2023-02-17 20:21 ` [PATCH v3 2/2] diff: teach diff to read algorithm from diff driver John Cai via GitGitGadget
2023-02-17 21:50 ` Junio C Hamano
2023-02-18 2:56 ` Elijah Newren
2023-02-20 15:32 ` John Cai
2023-02-20 16:21 ` Elijah Newren
2023-02-20 16:49 ` John Cai
2023-02-20 17:32 ` Elijah Newren
2023-02-20 20:53 ` John Cai
2023-02-22 19:47 ` Jeff King
2023-02-24 17:44 ` John Cai
2023-02-18 1:16 ` [PATCH v3 0/2] Teach diff to honor diff algorithms set through git attributes Elijah Newren
2023-02-20 13:37 ` John Cai
2023-02-20 21:04 ` [PATCH v4 " John Cai via GitGitGadget
2023-02-20 21:04 ` [PATCH v4 1/2] diff: consolidate diff algorithm option parsing John Cai via GitGitGadget
2023-02-20 21:04 ` [PATCH v4 2/2] diff: teach diff to read algorithm from diff driver John Cai via GitGitGadget
2023-02-21 17:34 ` [PATCH v4 0/2] Teach diff to honor diff algorithms set through git attributes Junio C Hamano
2023-02-21 18:05 ` Elijah Newren
2023-02-21 18:51 ` Junio C Hamano
2023-02-21 19:36 ` John Cai
2023-02-21 20:16 ` Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=230206.865yce7n1w.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=johncai86@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).