* [PATCH] diff: fix up SHA-1 abbreviations outside of repository
@ 2016-12-04 19:47 Jack Bates
2016-12-05 6:55 ` Jeff King
2016-12-05 7:19 ` Junio C Hamano
0 siblings, 2 replies; 6+ messages in thread
From: Jack Bates @ 2016-12-04 19:47 UTC (permalink / raw)
To: git; +Cc: Jeff King, Jack Bates
The three cases where "git diff" operates outside of a repository are 1)
when we run it outside of a repository, 2) when one of the files we're
comparing is outside of the repository we're in, and 3) the --no-index
option. Commit 4f03666 ("diff: handle sha1 abbreviations outside of
repository", 2016-10-20) only worked in the first case.
---
builtin/diff.c | 4 +++-
t/t4063-diff-no-index-abbrev.sh | 50 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 53 insertions(+), 1 deletion(-)
create mode 100755 t/t4063-diff-no-index-abbrev.sh
diff --git a/builtin/diff.c b/builtin/diff.c
index 7f91f6d..ec7c432 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -342,9 +342,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
"--no-index" : "[--no-index]");
}
- if (no_index)
+ if (no_index) {
/* If this is a no-index diff, just run it and exit there. */
+ startup_info->have_repository = 0;
diff_no_index(&rev, argc, argv);
+ }
/* Otherwise, we are doing the usual "git" diff */
rev.diffopt.skip_stat_unmatch = !!diff_auto_refresh_index;
diff --git a/t/t4063-diff-no-index-abbrev.sh b/t/t4063-diff-no-index-abbrev.sh
new file mode 100755
index 0000000..d1d6302
--- /dev/null
+++ b/t/t4063-diff-no-index-abbrev.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+
+test_description='don'\'' peek into .git when operating outside of a repository
+
+When abbreviating SHA-1s, if another object in the repository has the
+same abbreviation, we normally lengthen the abbreviation until it'\''s
+unique. Commit 4f03666 ("diff: handle sha1 abbreviations outside of
+repository", 2016-10-20) addressed the case of abbreviating SHA-1s
+outside the context of a repository. In that case we shouldn'\''t peek
+into a .git directory to make an abbreviation unique.
+
+To check that we don'\''t, create an blob with a SHA-1 that starts with
+0000. (Outside of a repository, SHA-1s are all zeros.) Then make an
+abbreviation and check that Git doesn'\''t lengthen it.
+
+The three cases where "git diff" operates outside of a repository are
+1) when we run it outside of a repository, 2) when one of the files
+we'\''re comparing is outside of the repository we'\''re in,
+and 3) the --no-index option.
+'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+ echo 1 >a &&
+ echo 2 >b &&
+ git init repo &&
+ (
+ cd repo &&
+
+ # Create a blob
+ # 00002e907f44c3881822c473d8842405cfd96362
+ echo 119132 >collision &&
+ git add collision
+ )
+'
+
+cat >expect <<EOF
+:100644 100644 0000... 0000... M ../a
+EOF
+
+test_expect_success 'don'\''t peek into .git when operating outside of a repository' '
+ (
+ cd repo &&
+ test_must_fail git diff --raw --abbrev=4 ../a ../b >actual &&
+ test_cmp ../expect actual
+ )
+'
+
+test_done
--
2.10.2
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] diff: fix up SHA-1 abbreviations outside of repository 2016-12-04 19:47 [PATCH] diff: fix up SHA-1 abbreviations outside of repository Jack Bates @ 2016-12-05 6:55 ` Jeff King 2016-12-05 7:19 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Jeff King @ 2016-12-05 6:55 UTC (permalink / raw) To: Jack Bates; +Cc: git, Jack Bates On Sun, Dec 04, 2016 at 12:47:47PM -0700, Jack Bates wrote: > The three cases where "git diff" operates outside of a repository are 1) > when we run it outside of a repository, 2) when one of the files we're > comparing is outside of the repository we're in, and 3) the --no-index > option. Commit 4f03666 ("diff: handle sha1 abbreviations outside of > repository", 2016-10-20) only worked in the first case. You didn't define "worked" here. From looking at your patch, it looks like we look look in the object database for abbreviations in the --no-index case, but you think we shouldn't. I'm not sure I agree. The "--no-index" option asks git not to treat the arguments as pathspecs, but instead as two filesystem files to diff. But should it ignore the repository entirely? One use case is to just ask for the diff of two files: git diff --no-index foo bar which may or may not be tracked in the repository. For abbreviations, I doubt that people would care much, but see below. > diff --git a/builtin/diff.c b/builtin/diff.c > index 7f91f6d..ec7c432 100644 > --- a/builtin/diff.c > +++ b/builtin/diff.c > @@ -342,9 +342,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix) > "--no-index" : "[--no-index]"); > > } > - if (no_index) > + if (no_index) { > /* If this is a no-index diff, just run it and exit there. */ > + startup_info->have_repository = 0; > diff_no_index(&rev, argc, argv); > + } This reset of have_repository would affect more than just abbreviations. We may also look at the repository for attributes, config, etc. For instance, right now: echo "*.pdf diff=pdf" >.git/info/attributes git config diff.pdf.textconv pdftotext git diff --no-index --textconv foo.pdf bar.pdf will show a text diff of the two files, but wouldn't after your patch. (I actually think even needing to say --textconv is actually a bug; normally the diff porcelain defaults to --textconv, but that setup is skipped in the no-index case). > +To check that we don'\''t, create an blob with a SHA-1 that starts with > +0000. (Outside of a repository, SHA-1s are all zeros.) Then make an > +abbreviation and check that Git doesn'\''t lengthen it. That's not always true. If we actually diff the file contents, the sha1s are correct (which you can see in the default --patch output). It's only in the --raw case that the sha1 is all zeroes. I'm not 100% sure that isn't a bug. In a normal git diff, we can show the sha1s without actually looking at the file content, because we get them from either the containing tree or the index. In a --no-index diff, we create the diff_filespec structs without a valid sha1. But we can't get away from reading the files eventually. The magic happens in diffcore_skip_stat_unmatch(), which actually does a series of checks, culminating in a size-check and then a comparison of the contents. I suppose it _is_ faster than computing the actual sha1, because we can sometimes show "modified" by only looking at the size, or the first few bytes. But any time two files are identical, we pay the full cost. So if you're comparing two hierarchies, say, like: git diff --raw one/ two/ it's probably not much more expensive to compare with the full sha1s, because we're already reading the entire contents of every file that's the same. So I dunno. It kind of surprised me that anybody was using "--no-index --raw" in the first place, so maybe there is some use case I'm missing. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] diff: fix up SHA-1 abbreviations outside of repository 2016-12-04 19:47 [PATCH] diff: fix up SHA-1 abbreviations outside of repository Jack Bates 2016-12-05 6:55 ` Jeff King @ 2016-12-05 7:19 ` Junio C Hamano 2016-12-05 7:26 ` Jeff King 1 sibling, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2016-12-05 7:19 UTC (permalink / raw) To: Jack Bates; +Cc: git, Jeff King, Jack Bates Jack Bates <bk874k@nottheoilrig.com> writes: > The three cases where "git diff" operates outside of a repository are 1) > when we run it outside of a repository, 2) when one of the files we're > comparing is outside of the repository we're in, and 3) the --no-index > option. Commit 4f03666 ("diff: handle sha1 abbreviations outside of > repository", 2016-10-20) only worked in the first case. > --- > builtin/diff.c | 4 +++- > t/t4063-diff-no-index-abbrev.sh | 50 +++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 53 insertions(+), 1 deletion(-) > create mode 100755 t/t4063-diff-no-index-abbrev.sh > > diff --git a/builtin/diff.c b/builtin/diff.c > index 7f91f6d..ec7c432 100644 > --- a/builtin/diff.c > +++ b/builtin/diff.c > @@ -342,9 +342,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix) > "--no-index" : "[--no-index]"); > > } > - if (no_index) > + if (no_index) { > /* If this is a no-index diff, just run it and exit there. */ > + startup_info->have_repository = 0; > diff_no_index(&rev, argc, argv); > + } This kind of change makes me nervous (partly because I am not seeing the whole code but only this part of the patch). Some code may react to "have_repository" being zero and do the right thing (which I think is what you are using from your previous "we did one of the three cases" change here), but the codepath that led to "have_repository" being set to non-zero previously must have done a lot more than just flipping that field to non-zero, and setting zero to this field alone would not "undo" what it did. I wonder if we're better off if we made sure that diff_no_index() works the same way regardless of the value of "have_repository" field? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] diff: fix up SHA-1 abbreviations outside of repository 2016-12-05 7:19 ` Junio C Hamano @ 2016-12-05 7:26 ` Jeff King 2016-12-05 17:45 ` Jack Bates 2016-12-05 22:43 ` Junio C Hamano 0 siblings, 2 replies; 6+ messages in thread From: Jeff King @ 2016-12-05 7:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jack Bates, git, Jack Bates On Sun, Dec 04, 2016 at 11:19:46PM -0800, Junio C Hamano wrote: > > - if (no_index) > > + if (no_index) { > > /* If this is a no-index diff, just run it and exit there. */ > > + startup_info->have_repository = 0; > > diff_no_index(&rev, argc, argv); > > + } > > This kind of change makes me nervous (partly because I am not seeing > the whole code but only this part of the patch). > > Some code may react to "have_repository" being zero and do the right > thing (which I think is what you are using from your previous "we > did one of the three cases" change here), but the codepath that led > to "have_repository" being set to non-zero previously must have done > a lot more than just flipping that field to non-zero, and setting > zero to this field alone would not "undo" what it did. I _think_ it's OK because the only substantive change would be the chdir() to the top of the working tree. But that information is carried through by revs->prefix, which we act on regardless of the value of startup_info->have_repository when we call prefix_filename(). I agree that it may be an accident waiting to happen, though, as soon as some buried sub-function needs to care about the distinction. > I wonder if we're better off if we made sure that diff_no_index() > works the same way regardless of the value of "have_repository" > field? If you mean adding a diffopt flag like "just abbreviate everything to FALLBACK_DEFAULT_ABBREV even if we're in a repository", and then setting that in diff_no_index(), I agree that is a lot cleaner. I'm still not 100% convinced that it's actually the correct behavior, but at least doing a more contained version wouldn't take away other functionality like reading config. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] diff: fix up SHA-1 abbreviations outside of repository 2016-12-05 7:26 ` Jeff King @ 2016-12-05 17:45 ` Jack Bates 2016-12-05 22:43 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Jack Bates @ 2016-12-05 17:45 UTC (permalink / raw) To: Jeff King, Junio C Hamano; +Cc: git On 05/12/16 12:26 AM, Jeff King wrote: > On Sun, Dec 04, 2016 at 11:19:46PM -0800, Junio C Hamano wrote: >>> - if (no_index) >>> + if (no_index) { >>> /* If this is a no-index diff, just run it and exit there. */ >>> + startup_info->have_repository = 0; >>> diff_no_index(&rev, argc, argv); >>> + } >> >> This kind of change makes me nervous (partly because I am not seeing >> the whole code but only this part of the patch). >> >> Some code may react to "have_repository" being zero and do the right >> thing (which I think is what you are using from your previous "we >> did one of the three cases" change here), but the codepath that led >> to "have_repository" being set to non-zero previously must have done >> a lot more than just flipping that field to non-zero, and setting >> zero to this field alone would not "undo" what it did. > > I _think_ it's OK because the only substantive change would be the > chdir() to the top of the working tree. But that information is carried > through by revs->prefix, which we act on regardless of the value of > startup_info->have_repository when we call prefix_filename(). > > I agree that it may be an accident waiting to happen, though, as soon as > some buried sub-function needs to care about the distinction. > >> I wonder if we're better off if we made sure that diff_no_index() >> works the same way regardless of the value of "have_repository" >> field? > > If you mean adding a diffopt flag like "just abbreviate everything to > FALLBACK_DEFAULT_ABBREV even if we're in a repository", and then setting > that in diff_no_index(), I agree that is a lot cleaner. > > I'm still not 100% convinced that it's actually the correct behavior, > but at least doing a more contained version wouldn't take away other > functionality like reading config. I don't have a strong reason for wanting these three cases to behave identically, I was merely surprised that they don't. I think you expected them to behave the same as well? I'll withdraw this patch. Conceptually I do think of "git diff" as having two separate modes, "in repository" and "out of repository", with the --no-index option forcing the "out of repository" mode. But maybe there are good reasons why this isn't accurate, or maybe it doesn't matter that it's not 100% accurate. In summary, currently all of the three cases are "no index" but only the first case doesn't "have repository". Thank you for your thoughtful feedback! ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] diff: fix up SHA-1 abbreviations outside of repository 2016-12-05 7:26 ` Jeff King 2016-12-05 17:45 ` Jack Bates @ 2016-12-05 22:43 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2016-12-05 22:43 UTC (permalink / raw) To: Jeff King; +Cc: Jack Bates, git, Jack Bates Jeff King <peff@peff.net> writes: > I agree that it may be an accident waiting to happen, though, as soon as > some buried sub-function needs to care about the distinction. Yes to that. >> I wonder if we're better off if we made sure that diff_no_index() >> works the same way regardless of the value of "have_repository" >> field? > > If you mean adding a diffopt flag like "just abbreviate everything to > FALLBACK_DEFAULT_ABBREV even if we're in a repository", and then setting > that in diff_no_index(), I agree that is a lot cleaner. I am not sure if that is what I meant (I no longer sure what exactly I meant to say there TBH), but this is probably not limited to the default abbrev length aka core.abbrev configuration. Don't we have other configuration settings we may read from $HOME/.gitconfig (and possibly per-repository .git/config, if we did discovery but were explicitly given "--no-index") that want to affect the behaviour of the command? I guess what I wanted, with "the same way", to see happen was that "have_repository" should be only controling how and from what files the configuration is read, and the behaviour of the command should be controlled by the values read from the configuration after that. Specifically, even if we were running with "--no-index", if we know we have access to the current repository discovered by setting it up gently, I do not think it is bad to ask find_unique_abbrev() to come up with an appropriate abbreviation. So the fact that patch in question has to flip the have_repository bit off, if it is done in order to affect what diff_abbrev_oid() does, smells quite fishy from that point of view. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-12-05 22:43 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-12-04 19:47 [PATCH] diff: fix up SHA-1 abbreviations outside of repository Jack Bates 2016-12-05 6:55 ` Jeff King 2016-12-05 7:19 ` Junio C Hamano 2016-12-05 7:26 ` Jeff King 2016-12-05 17:45 ` Jack Bates 2016-12-05 22:43 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox