* [PATCH v2 1/2] object-name: fix resolution of object names containing curly braces
2025-01-04 0:17 ` [PATCH v2 0/2] " Elijah Newren via GitGitGadget
@ 2025-01-04 0:17 ` Elijah Newren via GitGitGadget
2025-01-04 17:26 ` Junio C Hamano
2025-01-05 16:14 ` Junio C Hamano
2025-01-04 0:17 ` [PATCH v2 2/2] object-name: be more strict in parsing describe-like output Elijah Newren via GitGitGadget
` (3 subsequent siblings)
4 siblings, 2 replies; 25+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-01-04 0:17 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Elijah Newren, Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
Given a branch name of 'foo{bar', commands like
git cat-file -p foo{bar:README.md
should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:
fatal: Not a valid object name foo{bar:README.md
Change the parsing to only make the assumption of paired curly braces
immediately after either a '@' or '^' character appears.
Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
* 'foo@@{...}'
* 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
object-name.c | 8 +++++---
t/t1006-cat-file.sh | 31 ++++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/object-name.c b/object-name.c
index a563635a8cb..e54ef1f621e 100644
--- a/object-name.c
+++ b/object-name.c
@@ -2051,12 +2051,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
return -1;
}
for (cp = name, bracket_depth = 0; *cp; cp++) {
- if (*cp == '{')
+ if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
+ cp++;
bracket_depth++;
- else if (bracket_depth && *cp == '}')
+ } else if (bracket_depth && *cp == '}') {
bracket_depth--;
- else if (!bracket_depth && *cp == ':')
+ } else if (!bracket_depth && *cp == ':') {
break;
+ }
}
if (*cp == ':') {
struct object_id tree_oid;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ff9bf213aa2..398865d6ebe 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -240,7 +240,8 @@ test_expect_success "setup" '
git config extensions.objectformat $test_hash_algo &&
git config extensions.compatobjectformat $test_compat_hash_algo &&
echo_without_newline "$hello_content" > hello &&
- git update-index --add hello
+ git update-index --add hello &&
+ git commit -m "add hello file"
'
run_blob_tests () {
@@ -602,6 +603,34 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
test_cmp expect actual
'
+test_expect_success 'setup with curly braches in input' '
+ git branch "foo{bar" HEAD &&
+ git branch "foo@" HEAD
+'
+
+test_expect_success 'object reference with curly brace' '
+ git cat-file -p "foo{bar:hello" >actual &&
+ git cat-file -p HEAD:hello >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'object reference with at-sign' '
+ git cat-file -p "foo@@{0}:hello" >actual &&
+ git cat-file -p HEAD:hello >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'setup with commit with colon' '
+ git commit-tree -m "testing: just a bunch of junk" HEAD^{tree} >out &&
+ git branch other $(cat out)
+'
+
+test_expect_success 'object reference via commit text search' '
+ git cat-file -p "other^{/testing:}:hello" >actual &&
+ git cat-file -p HEAD:hello >expect &&
+ test_cmp expect actual
+'
+
test_expect_success 'setup blobs which are likely to delta' '
test-tool genrandom foo 10240 >foo &&
{ cat foo && echo plus; } >foo-plus &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [PATCH v2 1/2] object-name: fix resolution of object names containing curly braces
2025-01-04 0:17 ` [PATCH v2 1/2] " Elijah Newren via GitGitGadget
@ 2025-01-04 17:26 ` Junio C Hamano
2025-01-04 18:54 ` Elijah Newren
2025-01-05 16:14 ` Junio C Hamano
1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2025-01-04 17:26 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Patrick Steinhardt, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> for (cp = name, bracket_depth = 0; *cp; cp++) {
> - if (*cp == '{')
> + if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
> + cp++;
> bracket_depth++;
Checking cp[1] before even knowing if cp[0] is the end of the string
(hence cp[1] is an out of bounds access) smells fishy. If it were
something like ...
if (cp[0] && strchr("@^", cp[0]) && cp[1] == '{')
... it may be a bit more palatable, perhaps? At least writing it
this way we can easily scale when we find the third character we
need to special case, hopefully, but again, I do prefer if we can
find a solution that does not have such an intimate knowledge about
"@^", which I just failed to do here X-<.
Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH v2 1/2] object-name: fix resolution of object names containing curly braces
2025-01-04 17:26 ` Junio C Hamano
@ 2025-01-04 18:54 ` Elijah Newren
0 siblings, 0 replies; 25+ messages in thread
From: Elijah Newren @ 2025-01-04 18:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, Patrick Steinhardt
On Sat, Jan 4, 2025 at 9:26 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > for (cp = name, bracket_depth = 0; *cp; cp++) {
> > - if (*cp == '{')
> > + if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
> > + cp++;
> > bracket_depth++;
>
> Checking cp[1] before even knowing if cp[0] is the end of the string
> (hence cp[1] is an out of bounds access) smells fishy.
We checked *cp in the loop already, so we know cp[0] != '\0'.
Combined with the fact that this is a NUL-terminated string, we
therefore also know that cp[1] is not an out-of-bounds access.
> If it were
> something like ...
>
> if (cp[0] && strchr("@^", cp[0]) && cp[1] == '{')
Since we know cp[0] != '\0' already, couldn't this be simplified to
if (strchr("@^", *cp) && cp[1] == '{')
?
I do like this form better though, yes.
> ... it may be a bit more palatable, perhaps? At least writing it
> this way we can easily scale when we find the third character we
> need to special case, hopefully, but again, I do prefer if we can
> find a solution that does not have such an intimate knowledge about
> "@^", which I just failed to do here X-<.
Yeah, I have failed to come up with an alternative as well. If I and
others can't come up with something better in a few days, I'll
resubmit with the above change and a comment in the commit message
that we'd prefer something better but were unable to come up with
anything.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/2] object-name: fix resolution of object names containing curly braces
2025-01-04 0:17 ` [PATCH v2 1/2] " Elijah Newren via GitGitGadget
2025-01-04 17:26 ` Junio C Hamano
@ 2025-01-05 16:14 ` Junio C Hamano
1 sibling, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2025-01-05 16:14 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Patrick Steinhardt, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Given a branch name of 'foo{bar', commands like
>
> git cat-file -p foo{bar:README.md
>
> should succeed (assuming that branch had a README.md file, of course).
> However, the change in cce91a2caef9 (Change 'master@noon' syntax to
> 'master@{noon}'., 2006-05-19) presumed that curly braces would always
> come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
> to entirely miss the ':' and assume there's no object being referenced.
> In short, git would report:
>
> fatal: Not a valid object name foo{bar:README.md
Naïvely, it seems that a solution is to parse from left to right,
i.e., (1) notice there is a colon, (2) see if everything before that
colon resolves to a treeish, and (3) see if everything after it is a
path that appears in the treeish.
- When we are given foo@{some:thing}, if we did that, we realize
that "foo@{some" is not a valid tree-ish object name (since "@{"
cannot appear in a refname) and then can backtrack by realizing
"foo" is a ref, and @{...} could be a reflog reference (most
likely a way to spell some sort of timestamp), and try that.
- Similarly, for foo:path-gaffed, we would notice "foo" is a valid
tree-ish object name, and if path-gaffed is a path in it, we'd be
happy. Or foo may not be a tree-ish, or path-gaffed may not
exist in that tree-ish. In which case, we can backtrack and see
foo:path-g is an allowed prefix in a desribe name.
Now in the above description, I have assumed that an alternative
interpretation kicks in only as a fallback when we backtrack, but
we could make sure we try all possibilities and notice ambiguity if
we wanted to.
In any case, such an updated structure of the parsing code paths
(whether alternative interpretations are treated as fallbacks or
equally plausible candidates subject to disambiguation) would be
a vast departure from what we currently have, so a targeted "fix"
like these two patches attempt would be more appropriate as an
initial approach, I think.
Thanks, will queue, but probably we'd look at in any seriousness
after the 2.48 final gets tagged.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 2/2] object-name: be more strict in parsing describe-like output
2025-01-04 0:17 ` [PATCH v2 0/2] " Elijah Newren via GitGitGadget
2025-01-04 0:17 ` [PATCH v2 1/2] " Elijah Newren via GitGitGadget
@ 2025-01-04 0:17 ` Elijah Newren via GitGitGadget
2025-01-04 14:35 ` [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces Junio C Hamano
` (2 subsequent siblings)
4 siblings, 0 replies; 25+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-01-04 0:17 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Elijah Newren, Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
From Documentation/revisions.txt:
'<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
Output from `git describe`; i.e. a closest tag, optionally
followed by a dash and a number of commits, followed by a dash, a
'g', and an abbreviated object name.
which means that output of the format
${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expand ${HASH}. This is fine. However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid. That is problematic, since it
breaks things like
git cat-file -p branchname:path/to/file/named/i-gaffed
which, when commit affed exists, will not return us information about a
file we are looking for but will instead tell us about commit affed.
Similarly, we should probably not treat
refs/tags/invalid/./../...../// ~^:/?*\\&[}/busted.lock-g049e0ef6
as a request for commit 050e0ef6 either.
Tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present and valid.
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
object-name.c | 55 ++++++++++++++++++++++++++++++++++++++++++++-
t/t6120-describe.sh | 22 ++++++++++++++++++
2 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/object-name.c b/object-name.c
index e54ef1f621e..71207729f6f 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1271,6 +1271,58 @@ static int peel_onion(struct repository *r, const char *name, int len,
return 0;
}
+/*
+ * Documentation/revisions.txt says:
+ * '<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
+ * Output from `git describe`; i.e. a closest tag, optionally
+ * followed by a dash and a number of commits, followed by a dash, a
+ * 'g', and an abbreviated object name.
+ *
+ * which means that the stuff before '-g${HASH}' needs to be a valid
+ * refname, a dash, and a non-negative integer. This function verifies
+ * that.
+ *
+ * In particular, we do not want to treat
+ * branchname:path/to/file/named/i-gaffed
+ * as a request for commit affed.
+ *
+ * More generally, we should probably not treat
+ * 'refs/heads/./../.../ ~^:/?*[////\\\&}/busted.lock-g050e0ef6ead'
+ * as a request for object 050e0ef6ead either.
+ *
+ * We are called with name[len] == '-' and name[len+1] == 'g', i.e.
+ * we are verifying ${REFNAME}-{INTEGER} part of the name.
+ */
+static int ref_and_count_parts_valid(const char *name, int len)
+{
+ struct strbuf sb;
+ const char *cp;
+ int flags = REFNAME_ALLOW_ONELEVEL;
+ int ret = 1;
+
+ /* Ensure we have at least one digit */
+ if (!isxdigit(name[len-1]))
+ return 0;
+
+ /* Skip over digits backwards until we get to the dash */
+ for (cp = name + len - 2; name < cp; cp--) {
+ if (*cp == '-')
+ break;
+ if (!isxdigit(*cp))
+ return 0;
+ }
+ /* Ensure we found the leading dash */
+ if (*cp != '-')
+ return 0;
+
+ len = cp - name;
+ strbuf_init(&sb, len);
+ strbuf_add(&sb, name, len);
+ ret = !check_refname_format(name, flags);
+ strbuf_release(&sb);
+ return ret;
+}
+
static int get_describe_name(struct repository *r,
const char *name, int len,
struct object_id *oid)
@@ -1284,7 +1336,8 @@ static int get_describe_name(struct repository *r,
/* We must be looking at g in "SOMETHING-g"
* for it to be describe output.
*/
- if (ch == 'g' && cp[-1] == '-') {
+ if (ch == 'g' && cp[-1] == '-' &&
+ ref_and_count_parts_valid(name, cp - 1 - name)) {
cp++;
len -= cp - name;
return get_short_oid(r,
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 3f6160d702b..9217bd0fa89 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -725,4 +725,26 @@ test_expect_success '--exact-match does not show --always fallback' '
test_must_fail git describe --exact-match --always
'
+test_expect_success 'avoid being fooled by describe-like filename' '
+ test_when_finished rm out &&
+
+ git rev-parse --short HEAD >out &&
+ FILENAME=filename-g$(cat out) &&
+ touch $FILENAME &&
+ git add $FILENAME &&
+ git commit -m "Add $FILENAME" &&
+
+ git cat-file -t HEAD:$FILENAME >actual &&
+
+ echo blob >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'do not be fooled by invalid describe format ' '
+ test_when_finished rm out &&
+
+ git rev-parse --short HEAD >out &&
+ test_must_fail git cat-file -t "refs/tags/super-invalid/./../...../ ~^:/?*[////\\\\\\&}/busted.lock-42-g"$(cat out)
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces
2025-01-04 0:17 ` [PATCH v2 0/2] " Elijah Newren via GitGitGadget
2025-01-04 0:17 ` [PATCH v2 1/2] " Elijah Newren via GitGitGadget
2025-01-04 0:17 ` [PATCH v2 2/2] object-name: be more strict in parsing describe-like output Elijah Newren via GitGitGadget
@ 2025-01-04 14:35 ` Junio C Hamano
2025-01-04 15:55 ` Elijah Newren
2025-01-06 17:29 ` Junio C Hamano
2025-01-13 17:13 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Elijah Newren via GitGitGadget
4 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2025-01-04 14:35 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Patrick Steinhardt, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> * Added a second patch for another bug discovered by the same reporter,
> where branch:path/to/file/named/major-gaffed is interpreted as a request
> for a commit (namely affed) rather than a blob. (At least, assuming
> commit affed exists)
>
> The second patch has some backward compatibility concerns. People used to be
> able to do e.g. git show ${garbage}-g${hash}. I tightened it to
> ${valid_refname}-${number}-g${hash}, but do we want to allow e.g.
> ${valid_refname}-g${hash} (allowing the count to be omitted) or maybe even
> allow a subset of invalid refnames?
My take on it is that it is an absolute no-no if we require that
${valid_refname} exists locally, and it is still iffy if we checked
${valid_refname} with check_format() (because the definition of
validity can change over time, and we would not know the rules that
were valid back when the reference to the commit was written).
Otherwise a tightened rule would make "${garbage}-g${hash}" less
useful to copy-and-paste from a text file to command line.
In general what would we do if a string can be interpreted in
multiple ways in _different_ parts of the object-name codepaths. We
all know that "affed" would trigger the "ambiguous object name"
error if there are more than one object whose object name begins
with "affed", but if "${garbage}-gaffed" can be interpreted as the
name of an object whose object name begins with "affed" and also can
be interpreted as the name of another object that sits at a path
that ends with "-gaffed" in some tree object, regardless of how the
leading part "${garbage}" looks like, it would be desirable if we
declared such a string as "ambiguous" the same way.
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces
2025-01-04 14:35 ` [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces Junio C Hamano
@ 2025-01-04 15:55 ` Elijah Newren
2025-01-04 17:51 ` Junio C Hamano
0 siblings, 1 reply; 25+ messages in thread
From: Elijah Newren @ 2025-01-04 15:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, Patrick Steinhardt
On Sat, Jan 4, 2025 at 6:35 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > * Added a second patch for another bug discovered by the same reporter,
> > where branch:path/to/file/named/major-gaffed is interpreted as a request
> > for a commit (namely affed) rather than a blob. (At least, assuming
> > commit affed exists)
> >
> > The second patch has some backward compatibility concerns. People used to be
> > able to do e.g. git show ${garbage}-g${hash}. I tightened it to
> > ${valid_refname}-${number}-g${hash}, but do we want to allow e.g.
> > ${valid_refname}-g${hash} (allowing the count to be omitted) or maybe even
> > allow a subset of invalid refnames?
>
> My take on it is that it is an absolute no-no if we require that
> ${valid_refname} exists locally, and it is still iffy if we checked
> ${valid_refname} with check_format() (because the definition of
> validity can change over time, and we would not know the rules that
> were valid back when the reference to the commit was written).
Fair enough. However...
> Otherwise a tightened rule would make "${garbage}-g${hash}" less
> useful to copy-and-paste from a text file to command line.
>
> In general what would we do if a string can be interpreted in
> multiple ways in _different_ parts of the object-name codepaths. We
> all know that "affed" would trigger the "ambiguous object name"
> error if there are more than one object whose object name begins
> with "affed", but if "${garbage}-gaffed" can be interpreted as the
> name of an object whose object name begins with "affed" and also can
> be interpreted as the name of another object that sits at a path
> that ends with "-gaffed" in some tree object, regardless of how the
> leading part "${garbage}" looks like, it would be desirable if we
> declared such a string as "ambiguous" the same way.
How would that be desirable? There's no possible way to disambiguate.
While abbreviated revisions can just be modified to be less
abbreviated, paths cannot be spelled any other way. How would you
spell
master:path/to/who-gabbed
in a "less ambiguous" way to differentiate it from commit abbed? As
far as I can tell, this proposal just leaves the user stuck with an
error with no way to get the path they want.
If you don't like check_format() being called on the leading part of
the string, can we at least enforce that there is no ':', so that we
can successfully request explicit paths of given revisions and know
that we'll get them? (That'd disallow e.g. next^{/doc:}-12-gabbed,
but that clearly was never a valid describe output anyway.)
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces
2025-01-04 15:55 ` Elijah Newren
@ 2025-01-04 17:51 ` Junio C Hamano
2025-01-04 18:55 ` Elijah Newren
0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2025-01-04 17:51 UTC (permalink / raw)
To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, git, Patrick Steinhardt
Elijah Newren <newren@gmail.com> writes:
>> In general what would we do if a string can be interpreted in
>> multiple ways in _different_ parts of the object-name codepaths. We
>> all know that "affed" would trigger the "ambiguous object name"
>> error if there are more than one object whose object name begins
>> with "affed", but if "${garbage}-gaffed" can be interpreted as the
>> name of an object whose object name begins with "affed" and also can
>> be interpreted as the name of another object that sits at a path
>> that ends with "-gaffed" in some tree object, regardless of how the
>> leading part "${garbage}" looks like, it would be desirable if we
>> declared such a string as "ambiguous" the same way.
>
> How would that be desirable?
In "a:b/c-0-gabcde", *if* "a:b/c-0" *were* a valid way to spell a
valid refname, then the whole thing is an ambiguous object name,
i.e. it could be "something reachable from object 'a:b/c' whose
object name begins with abcde", or it could be "object at the path
b/c-0-gabcde in a tree-ish a", and in such a case our code should be
set up to allow us to give a "that's ambiguous" error, instead of
yielding the first possible interpretation (i.e. if we happen to
have checked the describe name first and "$garbage-0-gabcde", we
yield "abcde" before even checking if $garbage part gives a possible
leading part of a tree-ish; but if a future refactoring of the code
flips the order of checking, we may end up yielding 'an object at a
path, which ends with -0-gabcde, sitting in a tree-ish', without
checking if that could be a valid describe name).
Of course we should make sure that the syntax cannot be ambiguous
when we introduce a new syntax to represent a new feature ;-)
Now, I think ":" has always been a byte that is invalid as a part of
any refname, so "${garbage}-gabcde" with a colon in ${garbage}
cannot be a describe name. So in the above about "a:b/c-0" is an
impossible example, but I was wondering more about the general
principle we should follow.
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces
2025-01-04 17:51 ` Junio C Hamano
@ 2025-01-04 18:55 ` Elijah Newren
0 siblings, 0 replies; 25+ messages in thread
From: Elijah Newren @ 2025-01-04 18:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, Patrick Steinhardt
On Sat, Jan 4, 2025 at 9:51 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> >> In general what would we do if a string can be interpreted in
> >> multiple ways in _different_ parts of the object-name codepaths. We
> >> all know that "affed" would trigger the "ambiguous object name"
> >> error if there are more than one object whose object name begins
> >> with "affed", but if "${garbage}-gaffed" can be interpreted as the
> >> name of an object whose object name begins with "affed" and also can
> >> be interpreted as the name of another object that sits at a path
> >> that ends with "-gaffed" in some tree object, regardless of how the
> >> leading part "${garbage}" looks like, it would be desirable if we
> >> declared such a string as "ambiguous" the same way.
> >
> > How would that be desirable?
>
> In "a:b/c-0-gabcde", *if* "a:b/c-0" *were* a valid way to spell a
> valid refname, then the whole thing is an ambiguous object name,
> i.e. it could be "something reachable from object 'a:b/c' whose
> object name begins with abcde", or it could be "object at the path
> b/c-0-gabcde in a tree-ish a", and in such a case our code should be
> set up to allow us to give a "that's ambiguous" error, instead of
> yielding the first possible interpretation (i.e. if we happen to
> have checked the describe name first and "$garbage-0-gabcde", we
> yield "abcde" before even checking if $garbage part gives a possible
> leading part of a tree-ish; but if a future refactoring of the code
> flips the order of checking, we may end up yielding 'an object at a
> path, which ends with -0-gabcde, sitting in a tree-ish', without
> checking if that could be a valid describe name).
>
> Of course we should make sure that the syntax cannot be ambiguous
> when we introduce a new syntax to represent a new feature ;-)
>
> Now, I think ":" has always been a byte that is invalid as a part of
> any refname, so "${garbage}-gabcde" with a colon in ${garbage}
> cannot be a describe name. So in the above about "a:b/c-0" is an
> impossible example, but I was wondering more about the general
> principle we should follow.
Are you only interested in the general principle for the "possible
examples"? What about the general principle for the "impossible
examples"? Things like "master:path/to/who-gabbed" are unambiguously
a reference to a path within a revision that cannot be spelled any
alternate way, but the code currently gives the user a commit instead.
What's the right way to fix these "impossible examples"? I've given
three proposals and implemented the first of them:
- ${POSSIBLY_VALID_REFNAME}-${INTEGER}-g${HASH}
- ${POSSIBLY_VALID_REFNAME}-g${HASH}
- ${ANYTHING_WITHOUT_A_COLON}-g${HASH}
You said you don't like the first two because check_refname() rules
might change, and not commented on the third.
Also, as far as I can tell, the set of "possible examples" you are
focusing on is currently the empty set. A change of syntax might in
the future expand that to a non-empty-set, and then bring us backward
compatibility headaches because we have been allowing
"${garbage}-g${hash}" to mean a reference to ${hash} and we'd then
have to deal with it becoming ambiguous (and potentially also having
no way to disambiguate those cases, similar to how if colon is allowed
in garbage then we have no way to disambiguate paths). If we want to
allow future object naming extensions, it seems like we should lock
down and rule out as many existing forms of known ${garbage} as we
can, but that'd push us towards the
${POSSIBLY_VALID_REFNAME}-${INTEGER}-g${HASH} solution I implemented
that you don't seem to like. Is there a middle ground that you do
like?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces
2025-01-04 0:17 ` [PATCH v2 0/2] " Elijah Newren via GitGitGadget
` (2 preceding siblings ...)
2025-01-04 14:35 ` [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces Junio C Hamano
@ 2025-01-06 17:29 ` Junio C Hamano
2025-01-06 19:26 ` Elijah Newren
2025-01-13 17:13 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Elijah Newren via GitGitGadget
4 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2025-01-06 17:29 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Patrick Steinhardt, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Maintainer note: these bugs both date back to 2006; neither is a regression
> in this cycle.
While I was preparing today's -rc2 release, I noticed that this
change broke some of my release scripts with
$ git rev-parse --verify v2.48.0-rc2-161-g6c2274cdbc^0
fatal: Needed a single revision
which is the construct that has been there almost forever. Its
expected output is
$ git rev-parse --verify v2.48.0-rc2-161-g6c2274cdbc^0
6c2274cdbca14b7eb70fb182ffac80bf6950e137
The series seems to need a bit more work.
Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces
2025-01-06 17:29 ` Junio C Hamano
@ 2025-01-06 19:26 ` Elijah Newren
2025-01-06 20:38 ` Junio C Hamano
0 siblings, 1 reply; 25+ messages in thread
From: Elijah Newren @ 2025-01-06 19:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, Patrick Steinhardt
On Mon, Jan 6, 2025 at 9:29 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Maintainer note: these bugs both date back to 2006; neither is a regression
> > in this cycle.
>
> While I was preparing today's -rc2 release, I noticed that this
> change broke some of my release scripts with
>
> $ git rev-parse --verify v2.48.0-rc2-161-g6c2274cdbc^0
> fatal: Needed a single revision
>
> which is the construct that has been there almost forever. Its
> expected output is
>
> $ git rev-parse --verify v2.48.0-rc2-161-g6c2274cdbc^0
> 6c2274cdbca14b7eb70fb182ffac80bf6950e137
>
> The series seems to need a bit more work.
Gah, I made sure to copy the object name into a string buf, so I could
operate on just the relevant part, but then ignored that and operated
on the full thing.
This fixes it:
diff --git a/object-name.c b/object-name.c
index 614520954c7..cb96a0e6161 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1318,7 +1318,7 @@ static int ref_and_count_parts_valid(const char
*name, int len)
len = cp - name;
strbuf_init(&sb, len);
strbuf_add(&sb, name, len);
- ret = !check_refname_format(name, flags);
+ ret = !check_refname_format(sb.buf, flags);
strbuf_release(&sb);
return ret;
}
I'll include it with all my other fixes in a reroll, which I'll
probably send out after 2.48 to avoid distracting from the release.
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [PATCH v2 0/2] object-name: fix resolution of object names containing curly braces
2025-01-06 19:26 ` Elijah Newren
@ 2025-01-06 20:38 ` Junio C Hamano
0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2025-01-06 20:38 UTC (permalink / raw)
To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, git, Patrick Steinhardt
Elijah Newren <newren@gmail.com> writes:
> On Mon, Jan 6, 2025 at 9:29 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>> > Maintainer note: these bugs both date back to 2006; neither is a regression
>> > in this cycle.
>>
>> While I was preparing today's -rc2 release, I noticed that this
>> change broke some of my release scripts with
>>
>> $ git rev-parse --verify v2.48.0-rc2-161-g6c2274cdbc^0
>> fatal: Needed a single revision
>>
>> which is the construct that has been there almost forever. Its
>> expected output is
>>
>> $ git rev-parse --verify v2.48.0-rc2-161-g6c2274cdbc^0
>> 6c2274cdbca14b7eb70fb182ffac80bf6950e137
>>
>> The series seems to need a bit more work.
>
> Gah, I made sure to copy the object name into a string buf, so I could
> operate on just the relevant part, but then ignored that and operated
> on the full thing.
>
> This fixes it:
>
> diff --git a/object-name.c b/object-name.c
> index 614520954c7..cb96a0e6161 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -1318,7 +1318,7 @@ static int ref_and_count_parts_valid(const char
> *name, int len)
> len = cp - name;
> strbuf_init(&sb, len);
> strbuf_add(&sb, name, len);
> - ret = !check_refname_format(name, flags);
> + ret = !check_refname_format(sb.buf, flags);
> strbuf_release(&sb);
> return ret;
> }
>
> I'll include it with all my other fixes in a reroll, which I'll
> probably send out after 2.48 to avoid distracting from the release.
In existing tests, we seem to be lacking coverage to notice this
breakage, so let's make sure we add one or two.
Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v3 0/2] object-name: fix a pair of object name resolution issues
2025-01-04 0:17 ` [PATCH v2 0/2] " Elijah Newren via GitGitGadget
` (3 preceding siblings ...)
2025-01-06 17:29 ` Junio C Hamano
@ 2025-01-13 17:13 ` Elijah Newren via GitGitGadget
2025-01-13 17:13 ` [PATCH v3 1/2] object-name: fix resolution of object names containing curly braces Elijah Newren via GitGitGadget
` (2 more replies)
4 siblings, 3 replies; 25+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-01-13 17:13 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Elijah Newren, Elijah Newren
Changes since v2:
* Readability improvement to the first patch, which fixes object name
resolution with refs containing a curly brace
* Fixed the second patch for cases like OBJ-COUNT-gHASH~13 and added a
couple test cases for that. Also, extended the commit message a bit to
discuss the cases brought up on the list.
For the second patch, if folks want some open source examples where it could
be triggered, I found two examples:
* lore.git: git cat-file -t master:random/path/major-gaffed
* git.git: git cat-file -t super-invalid~///\\.....@.lock-gfd0bba94e
Elijah Newren (2):
object-name: fix resolution of object names containing curly braces
object-name: be more strict in parsing describe-like output
object-name.c | 63 ++++++++++++++++++++++++++++++++++++++++++---
t/t1006-cat-file.sh | 31 +++++++++++++++++++++-
t/t6120-describe.sh | 24 +++++++++++++++++
3 files changed, 113 insertions(+), 5 deletions(-)
base-commit: 063bcebf0c917140ca0e705cbe0fdea127e90086
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1844%2Fnewren%2Fobject-name-fix-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1844/newren/object-name-fix-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1844
Range-diff vs v2:
1: 13f68bebe90 ! 1: b21b8fc5554 object-name: fix resolution of object names containing curly braces
@@ Commit message
* 'foo@@{...}'
* 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
+ Note that we'd prefer duplicating the special logic for "@^" characters
+ here, because if get_oid_basic() or interpret_nth_prior_checkout() or
+ get_oid_basic() or similar gain extra methods of using curly braces,
+ then the logic in get_oid_with_context_1() would need to be updated as
+ well. But it's not clear how to refactor all of these to have a simple
+ common callpoint with the specialized logic.
+
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
@@ object-name.c: static enum get_oid_result get_oid_with_context_1(struct reposito
}
for (cp = name, bracket_depth = 0; *cp; cp++) {
- if (*cp == '{')
-+ if (*(cp+1) == '{' && (*cp == '@' || *cp == '^')) {
++ if (strchr("@^", *cp) && cp[1] == '{') {
+ cp++;
bracket_depth++;
- else if (bracket_depth && *cp == '}')
2: 31f1c37b31a ! 2: 19f84dfc9cc object-name: be more strict in parsing describe-like output
@@ Commit message
'g', and an abbreviated object name.
which means that output of the format
${REFNAME}-${INTEGER}-g${HASH}
- should parse to fully expand ${HASH}. This is fine. However, we
+ should parse to fully expanded ${HASH}. This is fine. However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid. That is problematic, since it
breaks things like
git cat-file -p branchname:path/to/file/named/i-gaffed
- which, when commit affed exists, will not return us information about a
- file we are looking for but will instead tell us about commit affed.
+ which, when commit (or tree or blob) affed exists, will not return us
+ information about the file we are looking for but will instead
+ erroneously tell us about object affed.
- Similarly, we should probably not treat
- refs/tags/invalid/./../...../// ~^:/?*\\&[}/busted.lock-g049e0ef6
- as a request for commit 050e0ef6 either.
+ A few additional notes:
+ - This is a slight backward incompatibility break, because we used
+ to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}. However,
+ a backward incompatible break is necessary, because there is no
+ other way for someone to be more specific and disambiguate that they
+ want the blob master:path/to/who-gabbed instead of the object abbed.
+ - There is a possibility that check_refname_format() rules change in
+ the future. However, we can only realistically loosen the rules
+ for what that function accepts rather than tighten. If we were to
+ tighten the rules, some real world repositories may already have
+ refnames that suddenly become unacceptable and we break those
+ repositories. As such, any describe-like syntax of the form
+ ${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
+ changes in this commit will remain valid in the future.
+ - The fact that check_refname_format() rules could loosen in the
+ future is probably also an important reason to make this change. If
+ the rules loosen, there might be additional cases within
+ ${GARBAGE}-g${HASH} that become ambiguous in the future. While
+ abbreviated hashes can be disambiguated by abbreviating less, it may
+ well be that these alternative object names have no way of being
+ disambiguated (much like pathnames cannot be). Accepting all random
+ ${GARBAGE} thus makes it difficult for us to allow future
+ extensions to object naming.
- Tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
- present and valid.
+ So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
+ present in the string, and would be considered a valid ref and
+ non-negative integer.
+
+ Also, add a few tests for git describe using object names of the form
+ ${REVISION_NAME}${MODIFIERS}
+ since an early version of this patch failed on constructs like
+ git describe v2.48.0-rc2-161-g6c2274cdbc^0
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
@@ object-name.c: static int peel_onion(struct repository *r, const char *name, int
+ len = cp - name;
+ strbuf_init(&sb, len);
+ strbuf_add(&sb, name, len);
-+ ret = !check_refname_format(name, flags);
++ ret = !check_refname_format(sb.buf, flags);
+ strbuf_release(&sb);
+ return ret;
+}
@@ object-name.c: static int get_describe_name(struct repository *r,
return get_short_oid(r,
## t/t6120-describe.sh ##
+@@ t/t6120-describe.sh: check_describe R-2-gHASH HEAD^^
+ check_describe A-3-gHASH HEAD^^2
+ check_describe B HEAD^^2^
+ check_describe R-1-gHASH HEAD^^^
++check_describe R-1-gHASH R-1-g$(git rev-parse --short HEAD^^)~1
+
+ check_describe c-7-gHASH --tags HEAD
+ check_describe c-6-gHASH --tags HEAD^
+ check_describe e-1-gHASH --tags HEAD^^
+ check_describe c-2-gHASH --tags HEAD^^2
++check_describe c-2-gHASH --tags c-2-g$(git rev-parse --short HEAD^^2)^0
+ check_describe B --tags HEAD^^2^
+ check_describe e --tags HEAD^^^
+ check_describe e --tags --exact-match HEAD^^^
@@ t/t6120-describe.sh: test_expect_success '--exact-match does not show --always fallback' '
test_must_fail git describe --exact-match --always
'
--
gitgitgadget
^ permalink raw reply [flat|nested] 25+ messages in thread* [PATCH v3 1/2] object-name: fix resolution of object names containing curly braces
2025-01-13 17:13 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Elijah Newren via GitGitGadget
@ 2025-01-13 17:13 ` Elijah Newren via GitGitGadget
2025-01-13 17:13 ` [PATCH v3 2/2] object-name: be more strict in parsing describe-like output Elijah Newren via GitGitGadget
2025-01-13 18:15 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Junio C Hamano
2 siblings, 0 replies; 25+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-01-13 17:13 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Elijah Newren, Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
Given a branch name of 'foo{bar', commands like
git cat-file -p foo{bar:README.md
should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:
fatal: Not a valid object name foo{bar:README.md
Change the parsing to only make the assumption of paired curly braces
immediately after either a '@' or '^' character appears.
Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
* 'foo@@{...}'
* 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
Note that we'd prefer duplicating the special logic for "@^" characters
here, because if get_oid_basic() or interpret_nth_prior_checkout() or
get_oid_basic() or similar gain extra methods of using curly braces,
then the logic in get_oid_with_context_1() would need to be updated as
well. But it's not clear how to refactor all of these to have a simple
common callpoint with the specialized logic.
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
object-name.c | 8 +++++---
t/t1006-cat-file.sh | 31 ++++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/object-name.c b/object-name.c
index a563635a8cb..8e80841acd3 100644
--- a/object-name.c
+++ b/object-name.c
@@ -2051,12 +2051,14 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
return -1;
}
for (cp = name, bracket_depth = 0; *cp; cp++) {
- if (*cp == '{')
+ if (strchr("@^", *cp) && cp[1] == '{') {
+ cp++;
bracket_depth++;
- else if (bracket_depth && *cp == '}')
+ } else if (bracket_depth && *cp == '}') {
bracket_depth--;
- else if (!bracket_depth && *cp == ':')
+ } else if (!bracket_depth && *cp == ':') {
break;
+ }
}
if (*cp == ':') {
struct object_id tree_oid;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ff9bf213aa2..398865d6ebe 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -240,7 +240,8 @@ test_expect_success "setup" '
git config extensions.objectformat $test_hash_algo &&
git config extensions.compatobjectformat $test_compat_hash_algo &&
echo_without_newline "$hello_content" > hello &&
- git update-index --add hello
+ git update-index --add hello &&
+ git commit -m "add hello file"
'
run_blob_tests () {
@@ -602,6 +603,34 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
test_cmp expect actual
'
+test_expect_success 'setup with curly braches in input' '
+ git branch "foo{bar" HEAD &&
+ git branch "foo@" HEAD
+'
+
+test_expect_success 'object reference with curly brace' '
+ git cat-file -p "foo{bar:hello" >actual &&
+ git cat-file -p HEAD:hello >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'object reference with at-sign' '
+ git cat-file -p "foo@@{0}:hello" >actual &&
+ git cat-file -p HEAD:hello >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'setup with commit with colon' '
+ git commit-tree -m "testing: just a bunch of junk" HEAD^{tree} >out &&
+ git branch other $(cat out)
+'
+
+test_expect_success 'object reference via commit text search' '
+ git cat-file -p "other^{/testing:}:hello" >actual &&
+ git cat-file -p HEAD:hello >expect &&
+ test_cmp expect actual
+'
+
test_expect_success 'setup blobs which are likely to delta' '
test-tool genrandom foo 10240 >foo &&
{ cat foo && echo plus; } >foo-plus &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH v3 2/2] object-name: be more strict in parsing describe-like output
2025-01-13 17:13 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Elijah Newren via GitGitGadget
2025-01-13 17:13 ` [PATCH v3 1/2] object-name: fix resolution of object names containing curly braces Elijah Newren via GitGitGadget
@ 2025-01-13 17:13 ` Elijah Newren via GitGitGadget
2025-01-13 18:15 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Junio C Hamano
2 siblings, 0 replies; 25+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-01-13 17:13 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Elijah Newren, Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
From Documentation/revisions.txt:
'<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
Output from `git describe`; i.e. a closest tag, optionally
followed by a dash and a number of commits, followed by a dash, a
'g', and an abbreviated object name.
which means that output of the format
${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expanded ${HASH}. This is fine. However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid. That is problematic, since it
breaks things like
git cat-file -p branchname:path/to/file/named/i-gaffed
which, when commit (or tree or blob) affed exists, will not return us
information about the file we are looking for but will instead
erroneously tell us about object affed.
A few additional notes:
- This is a slight backward incompatibility break, because we used
to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}. However,
a backward incompatible break is necessary, because there is no
other way for someone to be more specific and disambiguate that they
want the blob master:path/to/who-gabbed instead of the object abbed.
- There is a possibility that check_refname_format() rules change in
the future. However, we can only realistically loosen the rules
for what that function accepts rather than tighten. If we were to
tighten the rules, some real world repositories may already have
refnames that suddenly become unacceptable and we break those
repositories. As such, any describe-like syntax of the form
${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
changes in this commit will remain valid in the future.
- The fact that check_refname_format() rules could loosen in the
future is probably also an important reason to make this change. If
the rules loosen, there might be additional cases within
${GARBAGE}-g${HASH} that become ambiguous in the future. While
abbreviated hashes can be disambiguated by abbreviating less, it may
well be that these alternative object names have no way of being
disambiguated (much like pathnames cannot be). Accepting all random
${GARBAGE} thus makes it difficult for us to allow future
extensions to object naming.
So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present in the string, and would be considered a valid ref and
non-negative integer.
Also, add a few tests for git describe using object names of the form
${REVISION_NAME}${MODIFIERS}
since an early version of this patch failed on constructs like
git describe v2.48.0-rc2-161-g6c2274cdbc^0
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
object-name.c | 55 ++++++++++++++++++++++++++++++++++++++++++++-
t/t6120-describe.sh | 24 ++++++++++++++++++++
2 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/object-name.c b/object-name.c
index 8e80841acd3..cb96a0e6161 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1271,6 +1271,58 @@ static int peel_onion(struct repository *r, const char *name, int len,
return 0;
}
+/*
+ * Documentation/revisions.txt says:
+ * '<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
+ * Output from `git describe`; i.e. a closest tag, optionally
+ * followed by a dash and a number of commits, followed by a dash, a
+ * 'g', and an abbreviated object name.
+ *
+ * which means that the stuff before '-g${HASH}' needs to be a valid
+ * refname, a dash, and a non-negative integer. This function verifies
+ * that.
+ *
+ * In particular, we do not want to treat
+ * branchname:path/to/file/named/i-gaffed
+ * as a request for commit affed.
+ *
+ * More generally, we should probably not treat
+ * 'refs/heads/./../.../ ~^:/?*[////\\\&}/busted.lock-g050e0ef6ead'
+ * as a request for object 050e0ef6ead either.
+ *
+ * We are called with name[len] == '-' and name[len+1] == 'g', i.e.
+ * we are verifying ${REFNAME}-{INTEGER} part of the name.
+ */
+static int ref_and_count_parts_valid(const char *name, int len)
+{
+ struct strbuf sb;
+ const char *cp;
+ int flags = REFNAME_ALLOW_ONELEVEL;
+ int ret = 1;
+
+ /* Ensure we have at least one digit */
+ if (!isxdigit(name[len-1]))
+ return 0;
+
+ /* Skip over digits backwards until we get to the dash */
+ for (cp = name + len - 2; name < cp; cp--) {
+ if (*cp == '-')
+ break;
+ if (!isxdigit(*cp))
+ return 0;
+ }
+ /* Ensure we found the leading dash */
+ if (*cp != '-')
+ return 0;
+
+ len = cp - name;
+ strbuf_init(&sb, len);
+ strbuf_add(&sb, name, len);
+ ret = !check_refname_format(sb.buf, flags);
+ strbuf_release(&sb);
+ return ret;
+}
+
static int get_describe_name(struct repository *r,
const char *name, int len,
struct object_id *oid)
@@ -1284,7 +1336,8 @@ static int get_describe_name(struct repository *r,
/* We must be looking at g in "SOMETHING-g"
* for it to be describe output.
*/
- if (ch == 'g' && cp[-1] == '-') {
+ if (ch == 'g' && cp[-1] == '-' &&
+ ref_and_count_parts_valid(name, cp - 1 - name)) {
cp++;
len -= cp - name;
return get_short_oid(r,
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 3f6160d702b..76843a61691 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -82,11 +82,13 @@ check_describe R-2-gHASH HEAD^^
check_describe A-3-gHASH HEAD^^2
check_describe B HEAD^^2^
check_describe R-1-gHASH HEAD^^^
+check_describe R-1-gHASH R-1-g$(git rev-parse --short HEAD^^)~1
check_describe c-7-gHASH --tags HEAD
check_describe c-6-gHASH --tags HEAD^
check_describe e-1-gHASH --tags HEAD^^
check_describe c-2-gHASH --tags HEAD^^2
+check_describe c-2-gHASH --tags c-2-g$(git rev-parse --short HEAD^^2)^0
check_describe B --tags HEAD^^2^
check_describe e --tags HEAD^^^
check_describe e --tags --exact-match HEAD^^^
@@ -725,4 +727,26 @@ test_expect_success '--exact-match does not show --always fallback' '
test_must_fail git describe --exact-match --always
'
+test_expect_success 'avoid being fooled by describe-like filename' '
+ test_when_finished rm out &&
+
+ git rev-parse --short HEAD >out &&
+ FILENAME=filename-g$(cat out) &&
+ touch $FILENAME &&
+ git add $FILENAME &&
+ git commit -m "Add $FILENAME" &&
+
+ git cat-file -t HEAD:$FILENAME >actual &&
+
+ echo blob >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'do not be fooled by invalid describe format ' '
+ test_when_finished rm out &&
+
+ git rev-parse --short HEAD >out &&
+ test_must_fail git cat-file -t "refs/tags/super-invalid/./../...../ ~^:/?*[////\\\\\\&}/busted.lock-42-g"$(cat out)
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [PATCH v3 0/2] object-name: fix a pair of object name resolution issues
2025-01-13 17:13 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Elijah Newren via GitGitGadget
2025-01-13 17:13 ` [PATCH v3 1/2] object-name: fix resolution of object names containing curly braces Elijah Newren via GitGitGadget
2025-01-13 17:13 ` [PATCH v3 2/2] object-name: be more strict in parsing describe-like output Elijah Newren via GitGitGadget
@ 2025-01-13 18:15 ` Junio C Hamano
2025-01-13 19:26 ` Elijah Newren
2 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2025-01-13 18:15 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Patrick Steinhardt, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Changes since v2:
>
> * Readability improvement to the first patch, which fixes object name
> resolution with refs containing a curly brace
> * Fixed the second patch for cases like OBJ-COUNT-gHASH~13 and added a
> couple test cases for that. Also, extended the commit message a bit to
> discuss the cases brought up on the list.
>
> For the second patch, if folks want some open source examples where it could
> be triggered, I found two examples:
>
> * lore.git: git cat-file -t master:random/path/major-gaffed
> * git.git: git cat-file -t super-invalid~///\\.....@.lock-gfd0bba94e
>
> Elijah Newren (2):
> object-name: fix resolution of object names containing curly braces
> object-name: be more strict in parsing describe-like output
>
> object-name.c | 63 ++++++++++++++++++++++++++++++++++++++++++---
> t/t1006-cat-file.sh | 31 +++++++++++++++++++++-
> t/t6120-describe.sh | 24 +++++++++++++++++
> 3 files changed, 113 insertions(+), 5 deletions(-)
>
Although ...
> + Note that we'd prefer duplicating the special logic for "@^" characters
> + here, because if get_oid_basic() or interpret_nth_prior_checkout() or
... I suspect that you meant "we'd prefer not duplicating" here,
both patches look very good to me.
Thanks, will replace.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] object-name: fix a pair of object name resolution issues
2025-01-13 18:15 ` [PATCH v3 0/2] object-name: fix a pair of object name resolution issues Junio C Hamano
@ 2025-01-13 19:26 ` Elijah Newren
0 siblings, 0 replies; 25+ messages in thread
From: Elijah Newren @ 2025-01-13 19:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, Patrick Steinhardt
On Mon, Jan 13, 2025 at 10:15 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Changes since v2:
> >
> > * Readability improvement to the first patch, which fixes object name
> > resolution with refs containing a curly brace
> > * Fixed the second patch for cases like OBJ-COUNT-gHASH~13 and added a
> > couple test cases for that. Also, extended the commit message a bit to
> > discuss the cases brought up on the list.
> >
> > For the second patch, if folks want some open source examples where it could
> > be triggered, I found two examples:
> >
> > * lore.git: git cat-file -t master:random/path/major-gaffed
> > * git.git: git cat-file -t super-invalid~///\\.....@.lock-gfd0bba94e
> >
> > Elijah Newren (2):
> > object-name: fix resolution of object names containing curly braces
> > object-name: be more strict in parsing describe-like output
> >
> > object-name.c | 63 ++++++++++++++++++++++++++++++++++++++++++---
> > t/t1006-cat-file.sh | 31 +++++++++++++++++++++-
> > t/t6120-describe.sh | 24 +++++++++++++++++
> > 3 files changed, 113 insertions(+), 5 deletions(-)
> >
>
> Although ...
>
> > + Note that we'd prefer duplicating the special logic for "@^" characters
> > + here, because if get_oid_basic() or interpret_nth_prior_checkout() or
>
> ... I suspect that you meant "we'd prefer not duplicating" here,
> both patches look very good to me.
Oops, indeed.
> Thanks, will replace.
Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread