* Bug: Git sees branch as valid commit ref and works; should fail @ 2024-08-05 15:07 Matt Thompson 2024-08-05 16:35 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Matt Thompson @ 2024-08-05 15:07 UTC (permalink / raw) To: git Thank you for filling out a Git bug report! Please answer the following questions to help us understand your issue. What did you do before the bug happened? (Steps to reproduce your issue) $ git clone https://github.com/GEOS-ESM/GFDL_atmos_cubed_sphere.git fvdycore $ cd fvdycore $ git checkout bugfix/mathomp4/trivial-ci-commit-gcc14 What did you expect to happen? (Expected behavior) I expected a failure as there is no branch named 'bugfix/mathomp4/trivial-ci-commit-gcc14' in the repository. What happened instead? (Actual behavior) $ git checkout bugfix/mathomp4/trivial-ci-commit-gcc14 Note: switching to 'bugfix/mathomp4/trivial-ci-commit-gcc14'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch. If you want to create a new branch to retain commits you create, you may do so (now or later) by using -c with the switch command. Example: git switch -c <new-branch-name> Or undo this operation with: git switch - Turn off this advice by setting config variable advice.detachedHead to false HEAD is now at cc14d30 removed an unused module use in fv_sg What's different between what you expected and what actually happened? Well, git somehow saw 'bugfix/mathomp4/trivial-ci-commit-gcc14' as a valid commit reference and checked it out. I was expecting it to...not. Anything else you want to add: You can see similar behavior with git-switch, though slightly different: $ git switch bugfix/mathomp4/trivial-ci-commit-gcc14 fatal: a branch is expected, got commit 'bugfix/mathomp4/trivial-ci-commit-gcc14' hint: If you want to detach HEAD at the commit, try again with the --detach option. $ git switch --detach bugfix/mathomp4/trivial-ci-commit-gcc14 HEAD is now at cc14d30 removed an unused module use in fv_sg So even git-switch sees 'bugfix/mathomp4/trivial-ci-commit-gcc14' as a commit reference. Also, I have duplicated this behavior on a SLES 15 Linux machine with git verison 2.45.0 and on another RHEL8 box with 2.45.2 Please review the rest of the bug report below. You can delete any lines you don't wish to share. [System Info] git version: git version 2.46.0 cpu: arm64 no commit associated with this build sizeof-long: 8 sizeof-size_t: 8 shell-path: /bin/sh feature: fsmonitor--daemon libcurl: 8.4.0 zlib: 1.2.12 uname: Darwin 23.6.0 Darwin Kernel Version 23.6.0: Fri Jul 5 17:53:24 PDT 2024; root:xnu-10063.141.1~2/RELEASE_ARM64_T6020 arm64 compiler info: clang: 15.0.0 (clang-1500.3.9.4) libc info: no libc information available $SHELL (typically, interactive shell): /bin/zsh [Enabled Hooks] not run from a git repository - no hooks to show -- Matt Thompson “The fact is, this is about us identifying what we do best and finding more ways of doing less of it better” -- Director of Better Anna Rampton ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug: Git sees branch as valid commit ref and works; should fail 2024-08-05 15:07 Bug: Git sees branch as valid commit ref and works; should fail Matt Thompson @ 2024-08-05 16:35 ` Junio C Hamano 2024-08-05 16:58 ` Matt Thompson 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2024-08-05 16:35 UTC (permalink / raw) To: Matt Thompson; +Cc: git Matt Thompson <fortran@gmail.com> writes: > Thank you for filling out a Git bug report! > Please answer the following questions to help us understand your issue. > > What did you do before the bug happened? (Steps to reproduce your issue) > > $ git clone https://github.com/GEOS-ESM/GFDL_atmos_cubed_sphere.git fvdycore > $ cd fvdycore > $ git checkout bugfix/mathomp4/trivial-ci-commit-gcc14 > > What did you expect to happen? (Expected behavior) > > I expected a failure as there is no branch named > 'bugfix/mathomp4/trivial-ci-commit-gcc14' in the repository. If you run $ git for-each-ref | grep /bugfix/mathomp4/trivial-ci-commit-gcc14 I suspect you have a remote-tracking branch that matches the pattern. In such a case, "bugfix/mathomp4/trivial-ci-commit-gcc14" is a very valid way to refer to a commit to any Git command. It is handy to say "git show bugfix/mathomp4/trivial-ci-commit-gcc14", for example. And when "git checkout" is given a commit that is not a local branch, what it did below (by the way, thanks for giving a very accurate "Actual behavior" description in your report) is very much expected behaviour, including the part that it gave a message to advise how to work on a detached HEAD. > What happened instead? (Actual behavior) > > $ git checkout bugfix/mathomp4/trivial-ci-commit-gcc14 > Note: switching to 'bugfix/mathomp4/trivial-ci-commit-gcc14'. > ... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug: Git sees branch as valid commit ref and works; should fail 2024-08-05 16:35 ` Junio C Hamano @ 2024-08-05 16:58 ` Matt Thompson 2024-08-13 11:53 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Matt Thompson @ 2024-08-05 16:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Well, I tried the "for-each-ref" command and: $ git for-each-ref | grep /bugfix/mathomp4/trivial-ci-commit-gcc14 $ echo $? 1 Indeed, if I try a few other greps, there are no "trivial" or "gcc14": $ git for-each-ref | grep -i gcc14 $ echo $? 1 $ git for-each-ref | grep -i trivial $ echo $? 1 But, I did some experimenting and I found that "ci-gcc14" does trigger...something: $ git for-each-ref | grep -i ci-gcc14 mathomp4@gslwl2023080107 ~/GitBugReport/fvdycore geos/main $ echo $? 0 But I'm confused as to how the grep returned a status of 0 but didn't print anything? So it said "Yes I see this string" but it actually didn't? And it can switch to it: $ git switch ci-gcc14 fatal: a branch is expected, got commit 'ci-gcc14' hint: If you want to detach HEAD at the commit, try again with the --detach option. mathomp4@gslwl2023080107 ~/GitBugReport/fvdycore geos/main $ git switch --detach ci-gcc14 HEAD is now at cc14d30 removed an unused module use in fv_sg On Mon, Aug 5, 2024 at 12:35 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matt Thompson <fortran@gmail.com> writes: > > > Thank you for filling out a Git bug report! > > Please answer the following questions to help us understand your issue. > > > > What did you do before the bug happened? (Steps to reproduce your issue) > > > > $ git clone https://github.com/GEOS-ESM/GFDL_atmos_cubed_sphere.git fvdycore > > $ cd fvdycore > > $ git checkout bugfix/mathomp4/trivial-ci-commit-gcc14 > > > > What did you expect to happen? (Expected behavior) > > > > I expected a failure as there is no branch named > > 'bugfix/mathomp4/trivial-ci-commit-gcc14' in the repository. > > If you run > > $ git for-each-ref | grep /bugfix/mathomp4/trivial-ci-commit-gcc14 > > I suspect you have a remote-tracking branch that matches the > pattern. In such a case, "bugfix/mathomp4/trivial-ci-commit-gcc14" > is a very valid way to refer to a commit to any Git command. It is > handy to say "git show bugfix/mathomp4/trivial-ci-commit-gcc14", for > example. > > And when "git checkout" is given a commit that is not a local > branch, what it did below (by the way, thanks for giving a very > accurate "Actual behavior" description in your report) is very much > expected behaviour, including the part that it gave a message to > advise how to work on a detached HEAD. > > > What happened instead? (Actual behavior) > > > > $ git checkout bugfix/mathomp4/trivial-ci-commit-gcc14 > > Note: switching to 'bugfix/mathomp4/trivial-ci-commit-gcc14'. > > ... > -- Matt Thompson “The fact is, this is about us identifying what we do best and finding more ways of doing less of it better” -- Director of Better Anna Rampton ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Bug: Git sees branch as valid commit ref and works; should fail 2024-08-05 16:58 ` Matt Thompson @ 2024-08-13 11:53 ` Jeff King 2024-08-13 12:45 ` [PATCH] get_oid(): enforce minimum length for "-g<hex>" names Jeff King 2024-08-13 15:41 ` Bug: Git sees branch as valid commit ref and works; should fail Junio C Hamano 0 siblings, 2 replies; 7+ messages in thread From: Jeff King @ 2024-08-13 11:53 UTC (permalink / raw) To: Matt Thompson; +Cc: Junio C Hamano, git On Mon, Aug 05, 2024 at 12:58:57PM -0400, Matt Thompson wrote: > $ git for-each-ref | grep /bugfix/mathomp4/trivial-ci-commit-gcc14 > $ echo $? > 1 Hmm, this is quite an interesting case. Let's simplify the test case by just resolving the name instead of using checkout: $ git clone https://github.com/GEOS-ESM/GFDL_atmos_cubed_sphere.git fvdycore [...] $ cd fvdycore $ git rev-parse bugfix/mathomp4/trivial-ci-commit-gcc14 cc14d30e332cd06327fe5a81ed26c24140882f42 That narrows it down to the name resolution code. If I step through it in a debugger, the culprit seems to be get_describe_name(). I think it's taking the "-gcc14" on the end as a git-describe. E.g., in a clean repo: $ git init $ git commit --allow-empty -m foo $ git tag -m mytag mytag $ git commit --allow-empty -m bar $ git describe mytag-1-g1a4fb75 So git describe will append "-g<hex_hash>", and we likewise accept that style during name resolution. But in this case, "cc14" happens to be a valid hex (and you'll note that it matches the start of the one we found!). In other words, it's a false positive in the name resolver looking for "describe" names. We'd prefer a real ref of that full name, I think, but since there isn't one, we prefer the describe resolution rather than treating it as a path. I can think of a few ways to make this better: - we ignore everything before the "-g<hex>" part entirely. Generally this should be the name of a tag or at least some ref, so we could perhaps verify that. But part of the point of sticking the hash in the name is that you might have gotten the name from another source, and your local one might not have the same tag. So that might be a bad direction. - the hash is abbreviated in the usual way, making it as short as possible while remaining unambiguous. But unless the user goes out of their way to set core.abbrev to something smaller, the minimum is always 7. So perhaps get_describe_name() should be a bit more picky about about that? That doesn't fix the problem, but it makes it a lot less likely to trigger in the real world. And anybody who really does somehow end up with a describe name with 4 characters can always pick the hash out of the string themselves (or just set core.abbrev in their local repo to be more permissive). I think the second one is something like this: diff --git a/object-name.c b/object-name.c index 527b853ac4..a90338aa62 100644 --- a/object-name.c +++ b/object-name.c @@ -1276,6 +1276,10 @@ static int get_describe_name(struct repository *r, if (ch == 'g' && cp[-1] == '-') { cp++; len -= cp - name; + if (len < (default_abbrev < 0 ? + FALLBACK_DEFAULT_ABBREV : + default_abbrev)) + return -1; return get_short_oid(r, cp, len, oid, flags); } -Peff ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH] get_oid(): enforce minimum length for "-g<hex>" names 2024-08-13 11:53 ` Jeff King @ 2024-08-13 12:45 ` Jeff King 2024-08-13 13:02 ` Jeff King 2024-08-13 15:41 ` Bug: Git sees branch as valid commit ref and works; should fail Junio C Hamano 1 sibling, 1 reply; 7+ messages in thread From: Jeff King @ 2024-08-13 12:45 UTC (permalink / raw) To: Matt Thompson; +Cc: Junio C Hamano, git On Tue, Aug 13, 2024 at 07:53:58AM -0400, Jeff King wrote: > - the hash is abbreviated in the usual way, making it as short as > possible while remaining unambiguous. But unless the user goes out > of their way to set core.abbrev to something smaller, the minimum is > always 7. So perhaps get_describe_name() should be a bit more picky > about about that? > > That doesn't fix the problem, but it makes it a lot less likely to > trigger in the real world. And anybody who really does somehow end > up with a describe name with 4 characters can always pick the hash > out of the string themselves (or just set core.abbrev in their local > repo to be more permissive). > > I think the second one is something like this: > > diff --git a/object-name.c b/object-name.c > index 527b853ac4..a90338aa62 100644 > --- a/object-name.c > +++ b/object-name.c > @@ -1276,6 +1276,10 @@ static int get_describe_name(struct repository *r, > if (ch == 'g' && cp[-1] == '-') { > cp++; > len -= cp - name; > + if (len < (default_abbrev < 0 ? > + FALLBACK_DEFAULT_ABBREV : > + default_abbrev)) > + return -1; > return get_short_oid(r, > cp, len, oid, flags); > } After thinking on this for a bit, it seems like the correct direction. So here it is written in a slightly more readable way, and with a commit message and tests. -- >8 -- Subject: [PATCH] get_oid(): enforce minimum length for "-g<hex>" names Since 7dd45e15c2 (sha1_name.c: understand "describe" output as a valid object name, 2006-09-20), we'll resolve a name like "foo-g1234abc" into the object name "1234abc". However, this has a small chance of creating a surprising ambiguity. For example, in a real world case we saw a name like "foo-gcc14" where "gcc14" refers to the compiler, but which caused us to look for an object with prefix "cc14". And if the repo is large enough to have such an object, and small enough that there is only one such commit (since we feed the disambiguation lookup code with the "commit" hint), then we'll return that object. Note that we would still resolve "foo-gcc14" as a tag name in preference to the describe name. But in this case it did not exist, and resolving anything was a surprise. We can't solve the ambiguity completely, but we can reduce the chances of it happening significantly by enforcing a minimum length we'll accept for the hex component. Since the name may have been generated by another repository, we can't know for sure what minimum they would have used, but a good guess is the value of core.abbrev (or if it's set to "auto", which is the default these days, the hard-coded minimum of "7"). There are five new tests here: 1. We check that describe names can be resolved at all. As far as I can tell we had no existing test that covered this. It passes before and after this patch. 2. Another option for solving this would be to insist that "foo" in "foo-gcc14" matches an existing ref (which would have been the source of the description). I don't think this is a good idea, though, as part of the point of having the "-g<hex>" suffix is that you might not have the same tags as whoever generated it. So this test codifies the existing behavior that we do not care about the parts before the "-g" at all. 3. Looking at the loop in get_describe_name(), we read from the back end and check for "-g" when we see a non-hex digit. But if it's not "-g", we keep looking! So for a name like "foo-g1234abc-bar", we'll still pass "1234abc-bar" to get_short_oid()! This is OK in practice, since it will barf when seeing the non-hex digits. But let's confirm that it does so. This is particularly important with our length checks, since "foo-gcc14-bar" would yield a length of 8, which is plausibly long (so we are likewise depending on get_short_oid() to reject it). 4. We check that names shorter than core.abbrev are rejected (i.e., the fix in this patch). 5. Likewise, when core.abbrev is "auto", we enforce the 7-character minimum. Reported-by: Matt Thompson <fortran@gmail.com> Signed-off-by: Jeff King <peff@peff.net> --- object-name.c | 12 ++++++++++++ t/t6120-describe.sh | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+) diff --git a/object-name.c b/object-name.c index 527b853ac4..6507a30ace 100644 --- a/object-name.c +++ b/object-name.c @@ -1274,8 +1274,20 @@ static int get_describe_name(struct repository *r, * for it to be describe output. */ if (ch == 'g' && cp[-1] == '-') { + /* + * To reduce the chance of false positives, + * assume that any "-g<hex>" must have some + * minimum number of <hex> that matches what + * we'd produce when abbreviating. + */ + int min_len = default_abbrev; + if (min_len < 0) + min_len = FALLBACK_DEFAULT_ABBREV; + cp++; len -= cp - name; + if (len < min_len) + return -1; return get_short_oid(r, cp, len, oid, flags); } diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh index 79e0f19deb..790afe40ac 100755 --- a/t/t6120-describe.sh +++ b/t/t6120-describe.sh @@ -707,4 +707,37 @@ test_expect_success 'describe --broken --dirty with a file with changed stat' ' ) ' +test_expect_success 'long describe name can be resolved' ' + name=$(git describe --long A) && + git rev-parse "A^{commit}" >expect && + git rev-parse "$name" >actual && + test_cmp expect actual +' + +test_expect_success 'resolving describe name does not depend on tag' ' + hash=$(git rev-parse A^{commit}) && + abbrev=$(echo $hash | cut -c1-30) && + echo "$hash" >expect && + git rev-parse "does-not-exist-g$abbrev" >actual && + test_cmp expect actual +' + +test_expect_success 'resolving describe name only valid at end' ' + hash=$(git rev-parse A^{commit}) && + abbrev=$(echo $hash | cut -c1-30) && + test_must_fail git rev-parse "foo-g$abbrev-bar" +' + +test_expect_success 'resolving describe name requires minimum abbrev (auto)' ' + hash=$(git rev-parse A^{commit}) && + abbrev=$(echo $hash | cut -c1-6) && + test_must_fail git -c core.abbrev=auto rev-parse "foo-g$abbrev" +' + +test_expect_success 'resolving describe name requires minimum abbrev (config)' ' + hash=$(git rev-parse A^{commit}) && + abbrev=$(echo $hash | cut -c1-20) && + test_must_fail git -c core.abbrev=25 rev-parse "foo-g$abbrev" +' + test_done -- 2.46.0.452.ga6607598b6 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] get_oid(): enforce minimum length for "-g<hex>" names 2024-08-13 12:45 ` [PATCH] get_oid(): enforce minimum length for "-g<hex>" names Jeff King @ 2024-08-13 13:02 ` Jeff King 0 siblings, 0 replies; 7+ messages in thread From: Jeff King @ 2024-08-13 13:02 UTC (permalink / raw) To: Matt Thompson; +Cc: Junio C Hamano, git On Tue, Aug 13, 2024 at 08:45:50AM -0400, Jeff King wrote: > 3. Looking at the loop in get_describe_name(), we read from the back > end and check for "-g" when we see a non-hex digit. But if it's not > "-g", we keep looking! So for a name like "foo-g1234abc-bar", we'll > still pass "1234abc-bar" to get_short_oid()! This is OK in > practice, since it will barf when seeing the non-hex digits. But > let's confirm that it does so. This is particularly important > with our length checks, since "foo-gcc14-bar" would yield a length > of 8, which is plausibly long (so we are likewise depending on > get_short_oid() to reject it). So I think the current code is working as we'd want, and there's no correctness issue. I do think breaking out of the loop early would be more clear (and would provide a tiny speedup). Like: diff --git a/object-name.c b/object-name.c index 6507a30ace..89de9db8e9 100644 --- a/object-name.c +++ b/object-name.c @@ -1263,35 +1263,36 @@ static int peel_onion(struct repository *r, const char *name, int len, static int get_describe_name(struct repository *r, const char *name, int len, struct object_id *oid) { const char *cp; unsigned flags = GET_OID_QUIETLY | GET_OID_COMMIT; for (cp = name + len - 1; name + 2 <= cp; cp--) { char ch = *cp; if (!isxdigit(ch)) { /* We must be looking at g in "SOMETHING-g" * for it to be describe output. */ if (ch == 'g' && cp[-1] == '-') { /* * To reduce the chance of false positives, * assume that any "-g<hex>" must have some * minimum number of <hex> that matches what * we'd produce when abbreviating. */ int min_len = default_abbrev; if (min_len < 0) min_len = FALLBACK_DEFAULT_ABBREV; cp++; len -= cp - name; if (len < min_len) return -1; return get_short_oid(r, cp, len, oid, flags); } + break; } } return -1; } But I don't know that it matters much in practice. -Peff ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Bug: Git sees branch as valid commit ref and works; should fail 2024-08-13 11:53 ` Jeff King 2024-08-13 12:45 ` [PATCH] get_oid(): enforce minimum length for "-g<hex>" names Jeff King @ 2024-08-13 15:41 ` Junio C Hamano 1 sibling, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2024-08-13 15:41 UTC (permalink / raw) To: Jeff King; +Cc: Matt Thompson, git Jeff King <peff@peff.net> writes: > $ git rev-parse bugfix/mathomp4/trivial-ci-commit-gcc14 > cc14d30e332cd06327fe5a81ed26c24140882f42 > > That narrows it down to the name resolution code. If I step through it > in a debugger, the culprit seems to be get_describe_name(). ROTFL. Couldn't resist laughing with wonder, as I totally missed the fact that -gcc14 exactly looks like a valid describe suffix. > In other words, it's a false positive in the name resolver looking for > "describe" names. We'd prefer a real ref of that full name, I think, but > since there isn't one, we prefer the describe resolution rather than > treating it as a path. > > I can think of a few ways to make this better: > > - we ignore everything before the "-g<hex>" part entirely. Generally > this should be the name of a tag or at least some ref, so we could > perhaps verify that. But part of the point of sticking the hash in > the name is that you might have gotten the name from another source, > and your local one might not have the same tag. So that might be a > bad direction. Sad but you are absolutely right. > - the hash is abbreviated in the usual way, making it as short as > possible while remaining unambiguous. But unless the user goes out > of their way to set core.abbrev to something smaller, the minimum is > always 7. So perhaps get_describe_name() should be a bit more picky > about about that? > > That doesn't fix the problem, but it makes it a lot less likely to > trigger in the real world. And anybody who really does somehow end > up with a describe name with 4 characters can always pick the hash > out of the string themselves (or just set core.abbrev in their local > repo to be more permissive). That sounds like not-so-bad a direction to go. But notice that autogenerated preformatted documentation repositories record the corresponding source material using the "describe" name that is instructed to use absolutely minimum 4 chars when possible. I would imagine that it is not only me who does that, assuming that those who care enough about the correspondence between the commits in artifact repository and the commits in source repository would have tags used by the "describe" from the source repository. > I think the second one is something like this: > > diff --git a/object-name.c b/object-name.c > index 527b853ac4..a90338aa62 100644 > --- a/object-name.c > +++ b/object-name.c > @@ -1276,6 +1276,10 @@ static int get_describe_name(struct repository *r, > if (ch == 'g' && cp[-1] == '-') { > cp++; > len -= cp - name; > + if (len < (default_abbrev < 0 ? > + FALLBACK_DEFAULT_ABBREV : > + default_abbrev)) > + return -1; > return get_short_oid(r, > cp, len, oid, flags); > } So, perhaps we probably would want to allow shorter than the fallback default hexadecimal after "-g" _iff_ the leading "tag" part actually names an existing tag, or something like that. Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-08-13 15:41 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-05 15:07 Bug: Git sees branch as valid commit ref and works; should fail Matt Thompson 2024-08-05 16:35 ` Junio C Hamano 2024-08-05 16:58 ` Matt Thompson 2024-08-13 11:53 ` Jeff King 2024-08-13 12:45 ` [PATCH] get_oid(): enforce minimum length for "-g<hex>" names Jeff King 2024-08-13 13:02 ` Jeff King 2024-08-13 15:41 ` Bug: Git sees branch as valid commit ref and works; should fail Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).