* [PATCH 1/5] describe: pass oid struct by const pointer
2025-08-18 20:58 ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
@ 2025-08-18 20:59 ` Jeff King
2025-08-18 21:05 ` Junio C Hamano
2025-08-18 21:01 ` [PATCH 2/5] describe: error if blob not found Jeff King
` (3 subsequent siblings)
4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 20:59 UTC (permalink / raw)
To: René Scharfe; +Cc: phillip.wood, Cheng, git
We pass a "struct object_id" to describe_blob() by value. This isn't
wrong, as an oid is composed only of copy-able values. But it's unusual;
typically we pass structs by const pointer, including object_ids. Let's
do so.
It similarly makes sense for us to hold that pointer in the callback
data (rather than yet another copy of the oid).
Signed-off-by: Jeff King <peff@peff.net>
---
Not strictly related, but I noticed while in the area and remembered a
recent discussion in this direction.
builtin/describe.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index d7dd8139de..383d3e6b9a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -490,7 +490,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
struct process_commit_data {
struct object_id current_commit;
- struct object_id looking_for;
+ const struct object_id *looking_for;
struct strbuf *dst;
struct rev_info *revs;
};
@@ -505,7 +505,7 @@ static void process_object(struct object *obj, const char *path, void *data)
{
struct process_commit_data *pcd = data;
- if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
+ if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
reset_revision_walk();
describe_commit(&pcd->current_commit, pcd->dst);
strbuf_addf(pcd->dst, ":%s", path);
@@ -514,7 +514,7 @@ static void process_object(struct object *obj, const char *path, void *data)
}
}
-static void describe_blob(struct object_id oid, struct strbuf *dst)
+static void describe_blob(const struct object_id *oid, struct strbuf *dst)
{
struct rev_info revs;
struct strvec args = STRVEC_INIT;
@@ -554,7 +554,7 @@ static void describe(const char *arg, int last_one)
describe_commit(&oid, &sb);
else if (odb_read_object_info(the_repository->objects,
&oid, NULL) == OBJ_BLOB)
- describe_blob(oid, &sb);
+ describe_blob(&oid, &sb);
else
die(_("%s is neither a commit nor blob"), arg);
--
2.51.0.326.gecbb38d78e
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [PATCH 1/5] describe: pass oid struct by const pointer
2025-08-18 20:59 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
@ 2025-08-18 21:05 ` Junio C Hamano
0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2025-08-18 21:05 UTC (permalink / raw)
To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git
Jeff King <peff@peff.net> writes:
> We pass a "struct object_id" to describe_blob() by value. This isn't
> wrong, as an oid is composed only of copy-able values. But it's unusual;
> typically we pass structs by const pointer, including object_ids. Let's
> do so.
>
> It similarly makes sense for us to hold that pointer in the callback
> data (rather than yet another copy of the oid).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Not strictly related, but I noticed while in the area and remembered a
> recent discussion in this direction.
Thanks for making this part of the code less odd.
> builtin/describe.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index d7dd8139de..383d3e6b9a 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -490,7 +490,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
>
> struct process_commit_data {
> struct object_id current_commit;
> - struct object_id looking_for;
> + const struct object_id *looking_for;
> struct strbuf *dst;
> struct rev_info *revs;
> };
> @@ -505,7 +505,7 @@ static void process_object(struct object *obj, const char *path, void *data)
> {
> struct process_commit_data *pcd = data;
>
> - if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
> + if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
> reset_revision_walk();
> describe_commit(&pcd->current_commit, pcd->dst);
> strbuf_addf(pcd->dst, ":%s", path);
> @@ -514,7 +514,7 @@ static void process_object(struct object *obj, const char *path, void *data)
> }
> }
>
> -static void describe_blob(struct object_id oid, struct strbuf *dst)
> +static void describe_blob(const struct object_id *oid, struct strbuf *dst)
> {
> struct rev_info revs;
> struct strvec args = STRVEC_INIT;
> @@ -554,7 +554,7 @@ static void describe(const char *arg, int last_one)
> describe_commit(&oid, &sb);
> else if (odb_read_object_info(the_repository->objects,
> &oid, NULL) == OBJ_BLOB)
> - describe_blob(oid, &sb);
> + describe_blob(&oid, &sb);
> else
> die(_("%s is neither a commit nor blob"), arg);
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 2/5] describe: error if blob not found
2025-08-18 20:58 ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
2025-08-18 20:59 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
@ 2025-08-18 21:01 ` Jeff King
2025-08-18 21:12 ` Junio C Hamano
2025-08-19 18:32 ` René Scharfe
2025-08-18 21:01 ` [PATCH 3/5] describe: catch unborn branch in describe_blob() Jeff King
` (2 subsequent siblings)
4 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:01 UTC (permalink / raw)
To: René Scharfe; +Cc: phillip.wood, Cheng, git
If describe_blob() does not find the blob in question, it returns an
empty strbuf, and we print an empty line. This differs from
describe_commit(), which always either returns an answer or calls die()
itself. As the blob function was bolted onto the command afterwards, I
think its behavior is not intentional, and it is just a bug that it does
not report an error.
Signed-off-by: Jeff King <peff@peff.net>
---
This one is perhaps the most controversial, as it is a change in
behavior. But the current behavior just really seems like a bug to me.
Unlike what René posted earlier, I didn't record the dst strbuf's
original size and compare against that. This is a static function with
only one caller that passes in an empty strbuf, so being overly
defensive didn't seem worth it (arguably these functions should just
return an allocated buffer anyway).
builtin/describe.c | 3 +++
t/t6120-describe.sh | 6 ++++++
2 files changed, 9 insertions(+)
diff --git a/builtin/describe.c b/builtin/describe.c
index 383d3e6b9a..06e413d937 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -535,6 +535,9 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
reset_revision_walk();
release_revisions(&revs);
strvec_clear(&args);
+
+ if (!dst->len)
+ die(_("blob '%s' not reachable from HEAD"), oid_to_hex(oid));
}
static void describe(const char *arg, int last_one)
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 256ccaefb7..470631d17d 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -409,6 +409,12 @@ test_expect_success 'describe tag object' '
test_grep "fatal: test-blob-1 is neither a commit nor blob" actual
'
+test_expect_success 'describe an unreachable blob' '
+ blob=$(echo not-found-anywhere | git hash-object -w --stdin) &&
+ test_must_fail git describe $blob 2>actual &&
+ test_grep "blob .$blob. not reachable from HEAD" actual
+'
+
test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
i=1 &&
while test $i -lt 8000
--
2.51.0.326.gecbb38d78e
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [PATCH 2/5] describe: error if blob not found
2025-08-18 21:01 ` [PATCH 2/5] describe: error if blob not found Jeff King
@ 2025-08-18 21:12 ` Junio C Hamano
2025-08-19 8:05 ` Patrick Steinhardt
2025-08-19 18:32 ` René Scharfe
1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2025-08-18 21:12 UTC (permalink / raw)
To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git
Jeff King <peff@peff.net> writes:
> If describe_blob() does not find the blob in question, it returns an
> empty strbuf, and we print an empty line. This differs from
> describe_commit(), which always either returns an answer or calls die()
> itself. As the blob function was bolted onto the command afterwards, I
> think its behavior is not intentional, and it is just a bug that it does
> not report an error.
Yes, let's do so. Silently succeeding without returning anything
useful is not what we usually do in this system.
> This one is perhaps the most controversial, as it is a change in
> behavior. But the current behavior just really seems like a bug to me.
>
> Unlike what René posted earlier, I didn't record the dst strbuf's
> original size and compare against that. This is a static function with
> only one caller that passes in an empty strbuf, so being overly
> defensive didn't seem worth it (arguably these functions should just
> return an allocated buffer anyway).
Sounds sensible.
Thanks.
> diff --git a/builtin/describe.c b/builtin/describe.c
> index 383d3e6b9a..06e413d937 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -535,6 +535,9 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
> reset_revision_walk();
> release_revisions(&revs);
> strvec_clear(&args);
> +
> + if (!dst->len)
> + die(_("blob '%s' not reachable from HEAD"), oid_to_hex(oid));
> }
>
> static void describe(const char *arg, int last_one)
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 256ccaefb7..470631d17d 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -409,6 +409,12 @@ test_expect_success 'describe tag object' '
> test_grep "fatal: test-blob-1 is neither a commit nor blob" actual
> '
>
> +test_expect_success 'describe an unreachable blob' '
> + blob=$(echo not-found-anywhere | git hash-object -w --stdin) &&
> + test_must_fail git describe $blob 2>actual &&
> + test_grep "blob .$blob. not reachable from HEAD" actual
> +'
> +
> test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
> i=1 &&
> while test $i -lt 8000
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [PATCH 2/5] describe: error if blob not found
2025-08-18 21:12 ` Junio C Hamano
@ 2025-08-19 8:05 ` Patrick Steinhardt
0 siblings, 0 replies; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-19 8:05 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, René Scharfe, phillip.wood, Cheng, git
On Mon, Aug 18, 2025 at 02:12:15PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > If describe_blob() does not find the blob in question, it returns an
> > empty strbuf, and we print an empty line. This differs from
> > describe_commit(), which always either returns an answer or calls die()
> > itself. As the blob function was bolted onto the command afterwards, I
> > think its behavior is not intentional, and it is just a bug that it does
> > not report an error.
>
> Yes, let's do so. Silently succeeding without returning anything
> useful is not what we usually do in this system.
Seconded. It might be breaking backwards compatibility, but it really
sounds like just another bug in our codebase that ought to be fixed.
Patrick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/5] describe: error if blob not found
2025-08-18 21:01 ` [PATCH 2/5] describe: error if blob not found Jeff King
2025-08-18 21:12 ` Junio C Hamano
@ 2025-08-19 18:32 ` René Scharfe
1 sibling, 0 replies; 28+ messages in thread
From: René Scharfe @ 2025-08-19 18:32 UTC (permalink / raw)
To: Jeff King; +Cc: phillip.wood, Cheng, git
On 8/18/25 11:01 PM, Jeff King wrote:
> If describe_blob() does not find the blob in question, it returns an
> empty strbuf, and we print an empty line. This differs from
> describe_commit(), which always either returns an answer or calls die()
> itself. As the blob function was bolted onto the command afterwards, I
> think its behavior is not intentional, and it is just a bug that it does
> not report an error.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This one is perhaps the most controversial, as it is a change in
> behavior. But the current behavior just really seems like a bug to me.
>
> Unlike what René posted earlier, I didn't record the dst strbuf's
> original size and compare against that. This is a static function with
> only one caller that passes in an empty strbuf, so being overly
> defensive didn't seem worth it
Makes sense.
> (arguably these functions should just
> return an allocated buffer anyway).
Or even print results directly.
> builtin/describe.c | 3 +++
> t/t6120-describe.sh | 6 ++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index 383d3e6b9a..06e413d937 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -535,6 +535,9 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
> reset_revision_walk();
> release_revisions(&revs);
> strvec_clear(&args);
> +
> + if (!dst->len)
> + die(_("blob '%s' not reachable from HEAD"), oid_to_hex(oid));
I like the clarity and precision of this message.
The rest of the patches look good to me as well. The first is a nice
little bonus.
René
> }
>
> static void describe(const char *arg, int last_one)
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 256ccaefb7..470631d17d 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -409,6 +409,12 @@ test_expect_success 'describe tag object' '
> test_grep "fatal: test-blob-1 is neither a commit nor blob" actual
> '
>
> +test_expect_success 'describe an unreachable blob' '
> + blob=$(echo not-found-anywhere | git hash-object -w --stdin) &&
> + test_must_fail git describe $blob 2>actual &&
> + test_grep "blob .$blob. not reachable from HEAD" actual
> +'
> +
> test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
> i=1 &&
> while test $i -lt 8000
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 3/5] describe: catch unborn branch in describe_blob()
2025-08-18 20:58 ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
2025-08-18 20:59 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
2025-08-18 21:01 ` [PATCH 2/5] describe: error if blob not found Jeff King
@ 2025-08-18 21:01 ` Jeff King
2025-08-18 21:19 ` Junio C Hamano
2025-08-18 21:03 ` [PATCH 4/5] describe: handle blob traversal with no commits Jeff King
2025-08-18 21:04 ` [PATCH 5/5] describe: pass commit to describe_commit() Jeff King
4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:01 UTC (permalink / raw)
To: René Scharfe; +Cc: phillip.wood, Cheng, git
When describing a blob, we search for it by traversing from HEAD. We do
this by feeding the name HEAD to setup_revisions(). But if we are on an
unborn branch, this will fail with a confusing message:
$ git describe $blob
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
It is OK for this to be an error (we cannot find $blob in an empty
traversal, so we'd eventually complain about that). But the error
message could be more helpful.
Let's resolve HEAD ourselves and pass the resolved object id to
setup_revisions(). If resolving fails, then we can print a more useful
message.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/describe.c | 8 +++++++-
t/t6120-describe.sh | 8 ++++++++
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index 06e413d937..f7bea3c8c5 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -518,10 +518,16 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
{
struct rev_info revs;
struct strvec args = STRVEC_INIT;
+ struct object_id head_oid;
struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
+ if (repo_get_oid(the_repository, "HEAD", &head_oid))
+ die(_("cannot search for blob '%s' on an unborn branch"),
+ oid_to_hex(oid));
+
strvec_pushl(&args, "internal: The first arg is not parsed",
- "--objects", "--in-commit-order", "--reverse", "HEAD",
+ "--objects", "--in-commit-order", "--reverse",
+ oid_to_hex(&head_oid),
NULL);
repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 470631d17d..feec57bcbc 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -415,6 +415,14 @@ test_expect_success 'describe an unreachable blob' '
test_grep "blob .$blob. not reachable from HEAD" actual
'
+test_expect_success 'describe blob on an unborn branch' '
+ oldbranch=$(git symbolic-ref HEAD) &&
+ test_when_finished "git symbolic-ref HEAD $oldbranch" &&
+ git symbolic-ref HEAD refs/heads/does-not-exist &&
+ test_must_fail git describe test-blob 2>actual &&
+ test_grep "cannot search .* on an unborn branch" actual
+'
+
test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
i=1 &&
while test $i -lt 8000
--
2.51.0.326.gecbb38d78e
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [PATCH 3/5] describe: catch unborn branch in describe_blob()
2025-08-18 21:01 ` [PATCH 3/5] describe: catch unborn branch in describe_blob() Jeff King
@ 2025-08-18 21:19 ` Junio C Hamano
2025-08-18 23:07 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2025-08-18 21:19 UTC (permalink / raw)
To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git
Jeff King <peff@peff.net> writes:
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -518,10 +518,16 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
> {
> struct rev_info revs;
> struct strvec args = STRVEC_INIT;
> + struct object_id head_oid;
> struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
An unrelated tangent, but it seems that we are copying the object
name for the first member of this struct, even though 1/5 changed
the second one.
> + if (repo_get_oid(the_repository, "HEAD", &head_oid))
> + die(_("cannot search for blob '%s' on an unborn branch"),
> + oid_to_hex(oid));
Makes sense. I briefly wondered if there is really a point in doing
the traversal only from HEAD, but this topic is not about enhancing
and making the "describe <blob>" more useful, but it is a strict
improvement. When you first mentioned "resolve HEAD ourselves", I
somehow expected you to ask the ref subsystem to resolve HEAD, but
this should do fine, thanks to ref_rev_parse_rules[].
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 470631d17d..feec57bcbc 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -415,6 +415,14 @@ test_expect_success 'describe an unreachable blob' '
> test_grep "blob .$blob. not reachable from HEAD" actual
> '
>
> +test_expect_success 'describe blob on an unborn branch' '
> + oldbranch=$(git symbolic-ref HEAD) &&
> + test_when_finished "git symbolic-ref HEAD $oldbranch" &&
> + git symbolic-ref HEAD refs/heads/does-not-exist &&
> + test_must_fail git describe test-blob 2>actual &&
> + test_grep "cannot search .* on an unborn branch" actual
> +'
> +
> test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
> i=1 &&
> while test $i -lt 8000
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [PATCH 3/5] describe: catch unborn branch in describe_blob()
2025-08-18 21:19 ` Junio C Hamano
@ 2025-08-18 23:07 ` Jeff King
0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2025-08-18 23:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: René Scharfe, phillip.wood, Cheng, git
On Mon, Aug 18, 2025 at 02:19:42PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -518,10 +518,16 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
> > {
> > struct rev_info revs;
> > struct strvec args = STRVEC_INIT;
> > + struct object_id head_oid;
> > struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
>
> An unrelated tangent, but it seems that we are copying the object
> name for the first member of this struct, even though 1/5 changed
> the second one.
Yep, agreed. It goes away in the final patch, though.
> > + if (repo_get_oid(the_repository, "HEAD", &head_oid))
> > + die(_("cannot search for blob '%s' on an unborn branch"),
> > + oid_to_hex(oid));
>
> Makes sense. I briefly wondered if there is really a point in doing
> the traversal only from HEAD, but this topic is not about enhancing
> and making the "describe <blob>" more useful, but it is a strict
> improvement.
Yeah. We do document the traversal from HEAD, so I wondered if anybody
would be upset at being more inclusive. So it might need a new option.
Somebody who is more interested in the option is welcome to pick up that
topic. :)
> When you first mentioned "resolve HEAD ourselves", I somehow expected
> you to ask the ref subsystem to resolve HEAD, but this should do fine,
> thanks to ref_rev_parse_rules[].
I hadn't really thought about the distinction. If you grep for '"HEAD"',
looks like we have a mix of both types. I don't think it should matter
much in practice.
Some even call lookup_commit_reference_by_name(). So we could perhaps do
that, in which case the segfault fix in patch 4 just happens naturally,
because we know we are always traversing from at least one commit. I
don't see any difference between the approaches compelling enough to
switch, though.
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 4/5] describe: handle blob traversal with no commits
2025-08-18 20:58 ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
` (2 preceding siblings ...)
2025-08-18 21:01 ` [PATCH 3/5] describe: catch unborn branch in describe_blob() Jeff King
@ 2025-08-18 21:03 ` Jeff King
2025-08-19 8:05 ` Patrick Steinhardt
2025-08-18 21:04 ` [PATCH 5/5] describe: pass commit to describe_commit() Jeff King
4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:03 UTC (permalink / raw)
To: René Scharfe; +Cc: phillip.wood, Cheng, git
When describing a blob, we traverse from HEAD, remembering each commit
we saw, and then checking each blob to report the containing commit.
But if we haven't seen any commits at all, we'll segfault (we store the
"current" commit as an oid initialized to the null oid, causing
lookup_commit_reference() to return NULL).
This shouldn't be able to happen normally. We always start our traversal
at HEAD, which must be a commit (a property which is enforced by the
refs code). But you can trigger the segfault like this:
blob=$(echo foo | git hash-object -w --stdin)
echo $blob >.git/HEAD
git describe $blob
We can instead catch this case and return an empty result, which hits
the usual "we didn't find $blob while traversing HEAD" error.
This is a minor lie in that we did "find" the blob. And this even hints
at a bigger problem in this code: what if the traversal pointed to the
blob as _not_ part of a commit at all, but we had previously filled in
the recorded "current commit"? One could imagine this happening due to a
tag pointing directly to the blob in question.
But that can't happen, because we only traverse from HEAD, never from
any other refs. And the intent of the blob-describing code is to find
blobs within commits.
So I think this matches the original intent as closely as we can (and
again, this segfault cannot be triggered without corrupting your
repository!).
I didn't include a test here because it requires corrupting the
repository in a way that is only easy to do using the files ref backend.
It doesn't seem worth carrying a REFFILES test just for this oddity.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/describe.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index f7bea3c8c5..72b2e1162c 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
reset_revision_walk();
- describe_commit(&pcd->current_commit, pcd->dst);
- strbuf_addf(pcd->dst, ":%s", path);
+ if (!is_null_oid(&pcd->current_commit)) {
+ describe_commit(&pcd->current_commit, pcd->dst);
+ strbuf_addf(pcd->dst, ":%s", path);
+ }
free_commit_list(pcd->revs->commits);
pcd->revs->commits = NULL;
}
--
2.51.0.326.gecbb38d78e
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [PATCH 4/5] describe: handle blob traversal with no commits
2025-08-18 21:03 ` [PATCH 4/5] describe: handle blob traversal with no commits Jeff King
@ 2025-08-19 8:05 ` Patrick Steinhardt
2025-08-19 16:59 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-19 8:05 UTC (permalink / raw)
To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git
On Mon, Aug 18, 2025 at 05:03:12PM -0400, Jeff King wrote:
> When describing a blob, we traverse from HEAD, remembering each commit
> we saw, and then checking each blob to report the containing commit.
> But if we haven't seen any commits at all, we'll segfault (we store the
> "current" commit as an oid initialized to the null oid, causing
> lookup_commit_reference() to return NULL).
>
> This shouldn't be able to happen normally. We always start our traversal
> at HEAD, which must be a commit (a property which is enforced by the
> refs code). But you can trigger the segfault like this:
>
> blob=$(echo foo | git hash-object -w --stdin)
> echo $blob >.git/HEAD
> git describe $blob
I bet that the enforcement is only of partial nature, and that there are
ways to do the above e.g. via git-update-ref(1) or by playing around
with symrefs.
[snip]
> I didn't include a test here because it requires corrupting the
> repository in a way that is only easy to do using the files ref backend.
> It doesn't seem worth carrying a REFFILES test just for this oddity.
True:
$ git update-ref HEAD HEAD^{tree}
fatal: update_ref failed for ref 'HEAD': trying to write non-commit object 4b825dc642cb6eb9a060e54bf8d69288fbee4904 to branch 'HEAD'
But:
$ git update-ref refs/some/tree HEAD^{tree}
$ git symbolic-ref HEAD refs/some/tree
$ git show
tree HEAD
So that should allow you to write a test, right?
Patrick
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [PATCH 4/5] describe: handle blob traversal with no commits
2025-08-19 8:05 ` Patrick Steinhardt
@ 2025-08-19 16:59 ` Jeff King
2025-08-20 4:34 ` Patrick Steinhardt
0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-19 16:59 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: René Scharfe, phillip.wood, Cheng, git
On Tue, Aug 19, 2025 at 10:05:19AM +0200, Patrick Steinhardt wrote:
> > I didn't include a test here because it requires corrupting the
> > repository in a way that is only easy to do using the files ref backend.
> > It doesn't seem worth carrying a REFFILES test just for this oddity.
>
> True:
>
> $ git update-ref HEAD HEAD^{tree}
> fatal: update_ref failed for ref 'HEAD': trying to write non-commit object 4b825dc642cb6eb9a060e54bf8d69288fbee4904 to branch 'HEAD'
>
> But:
>
> $ git update-ref refs/some/tree HEAD^{tree}
> $ git symbolic-ref HEAD refs/some/tree
> $ git show
> tree HEAD
>
> So that should allow you to write a test, right?
Hrm, that seems like a bug. I thought we insisted that HEAD point at
refs/heads.
Ah, no. We did that in b229d18a80 (validate_headref: tighten
ref-matching to just branches, 2009-01-29), but had to revert it in
e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD, 2009-02-13) to
keep compatibility for topgit. :(
Still, I'm not sure it's something I'd want to base a test on. Maybe if
there is a big comment that says "It is OK to invalidate and remove this
test if we ever tighten symbolic-ref" it would be OK?
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [PATCH 4/5] describe: handle blob traversal with no commits
2025-08-19 16:59 ` Jeff King
@ 2025-08-20 4:34 ` Patrick Steinhardt
2025-08-20 6:30 ` [replacement PATCH " Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-20 4:34 UTC (permalink / raw)
To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git
On Tue, Aug 19, 2025 at 12:59:47PM -0400, Jeff King wrote:
> On Tue, Aug 19, 2025 at 10:05:19AM +0200, Patrick Steinhardt wrote:
>
> > > I didn't include a test here because it requires corrupting the
> > > repository in a way that is only easy to do using the files ref backend.
> > > It doesn't seem worth carrying a REFFILES test just for this oddity.
> >
> > True:
> >
> > $ git update-ref HEAD HEAD^{tree}
> > fatal: update_ref failed for ref 'HEAD': trying to write non-commit object 4b825dc642cb6eb9a060e54bf8d69288fbee4904 to branch 'HEAD'
> >
> > But:
> >
> > $ git update-ref refs/some/tree HEAD^{tree}
> > $ git symbolic-ref HEAD refs/some/tree
> > $ git show
> > tree HEAD
> >
> > So that should allow you to write a test, right?
>
> Hrm, that seems like a bug. I thought we insisted that HEAD point at
> refs/heads.
>
> Ah, no. We did that in b229d18a80 (validate_headref: tighten
> ref-matching to just branches, 2009-01-29), but had to revert it in
> e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD, 2009-02-13) to
> keep compatibility for topgit. :(
Well, that's certainly from before my time in the Git project :) I guess
changing semantics now would be quite risky. Reintroducing this change
feels out of the picture, but an alternative one could think about is to
validate that HEAD always points to a commit(-ish?).
But ultimately I'm not sure it's even worth it. If people really want to
shoot themselves into the foot they'll find a way to do so.
> Still, I'm not sure it's something I'd want to base a test on. Maybe if
> there is a big comment that says "It is OK to invalidate and remove this
> test if we ever tighten symbolic-ref" it would be OK?
That sounds reasonable to me, yeah.
Patrick
^ permalink raw reply [flat|nested] 28+ messages in thread* [replacement PATCH 4/5] describe: handle blob traversal with no commits
2025-08-20 4:34 ` Patrick Steinhardt
@ 2025-08-20 6:30 ` Jeff King
0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2025-08-20 6:30 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: René Scharfe, phillip.wood, Cheng, git, Junio C Hamano
On Wed, Aug 20, 2025 at 06:34:16AM +0200, Patrick Steinhardt wrote:
> > Ah, no. We did that in b229d18a80 (validate_headref: tighten
> > ref-matching to just branches, 2009-01-29), but had to revert it in
> > e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD, 2009-02-13) to
> > keep compatibility for topgit. :(
>
> Well, that's certainly from before my time in the Git project :) I guess
> changing semantics now would be quite risky. Reintroducing this change
> feels out of the picture, but an alternative one could think about is to
> validate that HEAD always points to a commit(-ish?).
Yeah, that's _probably_ OK. I don't remember how topgit works at all,
but I think its custom refs do at least point to commits.
> But ultimately I'm not sure it's even worth it. If people really want to
> shoot themselves into the foot they'll find a way to do so.
Yeah, agreed that it's probably not urgent.
> > Still, I'm not sure it's something I'd want to base a test on. Maybe if
> > there is a big comment that says "It is OK to invalidate and remove this
> > test if we ever tighten symbolic-ref" it would be OK?
>
> That sounds reasonable to me, yeah.
OK. Here's a replacement for patch 4, then, with a test. Nothing else in
the series should need to be touched.
-- >8 --
Subject: [PATCH] describe: handle blob traversal with no commits
When describing a blob, we traverse from HEAD, remembering each commit
we saw, and then checking each blob to report the containing commit.
But if we haven't seen any commits at all, we'll segfault (we store the
"current" commit as an oid initialized to the null oid, causing
lookup_commit_reference() to return NULL).
This shouldn't be able to happen normally. We always start our traversal
at HEAD, which must be a commit (a property which is enforced by the
refs code). But you can trigger the segfault like this:
blob=$(echo foo | git hash-object -w --stdin)
echo $blob >.git/HEAD
git describe $blob
We can instead catch this case and return an empty result, which hits
the usual "we didn't find $blob while traversing HEAD" error.
This is a minor lie in that we did "find" the blob. And this even hints
at a bigger problem in this code: what if the traversal pointed to the
blob as _not_ part of a commit at all, but we had previously filled in
the recorded "current commit"? One could imagine this happening due to a
tag pointing directly to the blob in question.
But that can't happen, because we only traverse from HEAD, never from
any other refs. And the intent of the blob-describing code is to find
blobs within commits.
So I think this matches the original intent as closely as we can (and
again, this segfault cannot be triggered without corrupting your
repository!).
The test here does not use the formula above, which works only for the
files backend (and not reftables). Instead we use another loophole to
create the bogus state using only Git commands. See the comment in the
test for details.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/describe.c | 6 ++++--
t/t6120-describe.sh | 16 ++++++++++++++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index f7bea3c8c5..72b2e1162c 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
reset_revision_walk();
- describe_commit(&pcd->current_commit, pcd->dst);
- strbuf_addf(pcd->dst, ":%s", path);
+ if (!is_null_oid(&pcd->current_commit)) {
+ describe_commit(&pcd->current_commit, pcd->dst);
+ strbuf_addf(pcd->dst, ":%s", path);
+ }
free_commit_list(pcd->revs->commits);
pcd->revs->commits = NULL;
}
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index feec57bcbc..2c70cc561a 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -423,6 +423,22 @@ test_expect_success 'describe blob on an unborn branch' '
test_grep "cannot search .* on an unborn branch" actual
'
+# This test creates a repository state that we generally try to disallow: HEAD
+# is pointing to an object that is not a commit. The ref update code forbids
+# non-commit writes directly to HEAD or to any branch in refs/heads/. But we
+# can use the loophole of pointing HEAD to another non-branch ref (something we
+# should forbid, but don't for historical reasons).
+#
+# Do not take this test as an endorsement of the loophole! If we ever tighten
+# it, it is reasonable to just drop this test entirely.
+test_expect_success 'describe blob on a non-commit HEAD' '
+ oldbranch=$(git symbolic-ref HEAD) &&
+ test_when_finished "git symbolic-ref HEAD $oldbranch" &&
+ git symbolic-ref HEAD refs/tags/test-blob &&
+ test_must_fail git describe test-blob 2>actual &&
+ test_grep "blob .* not reachable from HEAD" actual
+'
+
test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
i=1 &&
while test $i -lt 8000
--
2.51.0.326.gecbb38d78e
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 5/5] describe: pass commit to describe_commit()
2025-08-18 20:58 ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
` (3 preceding siblings ...)
2025-08-18 21:03 ` [PATCH 4/5] describe: handle blob traversal with no commits Jeff King
@ 2025-08-18 21:04 ` Jeff King
2025-08-19 8:05 ` Patrick Steinhardt
4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:04 UTC (permalink / raw)
To: René Scharfe; +Cc: phillip.wood, Cheng, git
There's a call in describe_commit() to lookup_commit_reference(), but we
don't check the return value. If it returns NULL, we'll segfault as we
immediately dereference the result.
In practice this can never happen, since all callers pass an oid which
came from a "struct commit" already. So we can make this more obvious
by just taking that commit struct in the first place.
Reported-by: Cheng <prophecheng@stu.pku.edu.cn>
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/describe.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/builtin/describe.c b/builtin/describe.c
index 72b2e1162c..04df89d56b 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -313,26 +313,24 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
repo_find_unique_abbrev(the_repository, oid, abbrev));
}
-static void describe_commit(struct object_id *oid, struct strbuf *dst)
+static void describe_commit(struct commit *cmit, struct strbuf *dst)
{
- struct commit *cmit, *gave_up_on = NULL;
+ struct commit *gave_up_on = NULL;
struct commit_list *list;
struct commit_name *n;
struct possible_tag all_matches[MAX_TAGS];
unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
unsigned long seen_commits = 0;
unsigned int unannotated_cnt = 0;
- cmit = lookup_commit_reference(the_repository, oid);
-
n = find_commit_name(&cmit->object.oid);
if (n && (tags || all || n->prio == 2)) {
/*
* Exact match to an existing ref.
*/
append_name(n, dst);
if (n->misnamed || longformat)
- append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
+ append_suffix(0, n->tag ? get_tagged_oid(n->tag) : &cmit->object.oid, dst);
if (suffix)
strbuf_addstr(dst, suffix);
return;
@@ -489,7 +487,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
}
struct process_commit_data {
- struct object_id current_commit;
+ struct commit *current_commit;
const struct object_id *looking_for;
struct strbuf *dst;
struct rev_info *revs;
@@ -498,7 +496,7 @@ struct process_commit_data {
static void process_commit(struct commit *commit, void *data)
{
struct process_commit_data *pcd = data;
- pcd->current_commit = commit->object.oid;
+ pcd->current_commit = commit;
}
static void process_object(struct object *obj, const char *path, void *data)
@@ -507,8 +505,8 @@ static void process_object(struct object *obj, const char *path, void *data)
if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
reset_revision_walk();
- if (!is_null_oid(&pcd->current_commit)) {
- describe_commit(&pcd->current_commit, pcd->dst);
+ if (pcd->current_commit) {
+ describe_commit(pcd->current_commit, pcd->dst);
strbuf_addf(pcd->dst, ":%s", path);
}
free_commit_list(pcd->revs->commits);
@@ -521,7 +519,7 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
struct rev_info revs;
struct strvec args = STRVEC_INIT;
struct object_id head_oid;
- struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
+ struct process_commit_data pcd = { NULL, oid, dst, &revs};
if (repo_get_oid(the_repository, "HEAD", &head_oid))
die(_("cannot search for blob '%s' on an unborn branch"),
@@ -562,7 +560,7 @@ static void describe(const char *arg, int last_one)
cmit = lookup_commit_reference_gently(the_repository, &oid, 1);
if (cmit)
- describe_commit(&oid, &sb);
+ describe_commit(cmit, &sb);
else if (odb_read_object_info(the_repository->objects,
&oid, NULL) == OBJ_BLOB)
describe_blob(&oid, &sb);
--
2.51.0.326.gecbb38d78e
^ permalink raw reply related [flat|nested] 28+ messages in thread* Re: [PATCH 5/5] describe: pass commit to describe_commit()
2025-08-18 21:04 ` [PATCH 5/5] describe: pass commit to describe_commit() Jeff King
@ 2025-08-19 8:05 ` Patrick Steinhardt
2025-08-19 17:02 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-19 8:05 UTC (permalink / raw)
To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git
On Mon, Aug 18, 2025 at 05:04:17PM -0400, Jeff King wrote:
> There's a call in describe_commit() to lookup_commit_reference(), but we
> don't check the return value. If it returns NULL, we'll segfault as we
> immediately dereference the result.
>
> In practice this can never happen, since all callers pass an oid which
> came from a "struct commit" already. So we can make this more obvious
> by just taking that commit struct in the first place.
I was wondering a bit about commit-graphs. We had the case in the past
where it was possible to look up commits via the graph even though they
don't exist in the ODB. So we might actually end up with a missing
object if `GIT_COMMIT_GRAPH_PARANOIA=false`, which is the default value.
But that might be fine? No idea without digging further.
In any case, the refactoring makes sense regardless from my point of
view.
Patrick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/5] describe: pass commit to describe_commit()
2025-08-19 8:05 ` Patrick Steinhardt
@ 2025-08-19 17:02 ` Jeff King
0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2025-08-19 17:02 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: René Scharfe, phillip.wood, Cheng, git
On Tue, Aug 19, 2025 at 10:05:25AM +0200, Patrick Steinhardt wrote:
> On Mon, Aug 18, 2025 at 05:04:17PM -0400, Jeff King wrote:
> > There's a call in describe_commit() to lookup_commit_reference(), but we
> > don't check the return value. If it returns NULL, we'll segfault as we
> > immediately dereference the result.
> >
> > In practice this can never happen, since all callers pass an oid which
> > came from a "struct commit" already. So we can make this more obvious
> > by just taking that commit struct in the first place.
>
> I was wondering a bit about commit-graphs. We had the case in the past
> where it was possible to look up commits via the graph even though they
> don't exist in the ODB. So we might actually end up with a missing
> object if `GIT_COMMIT_GRAPH_PARANOIA=false`, which is the default value.
> But that might be fine? No idea without digging further.
I don't think existence matters here. Any call to parse_object() will
call lookup_object() and find the same commit struct we already had, and
then exit immediately because its "parsed" flag is set. So we'd never
get a NULL return from lookup_commit_reference().
> In any case, the refactoring makes sense regardless from my point of
> view.
But yeah. Even if I am wrong above, this would fix it. ;)
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread