git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Potential Null Pointer Dereference detected by static analysis tool
@ 2025-08-13  0:23 Cheng
  2025-08-13 13:19 ` Phillip Wood
  0 siblings, 1 reply; 28+ messages in thread
From: Cheng @ 2025-08-13  0:23 UTC (permalink / raw)
  To: git



line 326 in builtin/describe.cdescribe.c, which is located in the function describe_commit. In the following code, cmit could be NULL passed to the call, which then causes a NULL dereference. Seems should be replaced lookup_commit_reference with lookup_commit_or_die.


```cpp
cmit = lookup_commit_reference(the_repository, oid);
n = find_commit_name(&cmit->object.oid);
```
    


The NULL value seems to come from function lookup_commit_reference_gently where:

- 1. call to deref_tag may return NULL.

- 2. call to object_as_type may return NULL.


In this repository,  other calls  lookup_commit_reference are followed by a null check. So this seems to lead to NULL dereference. Can I confirm with you whether this is a true positive bug report?









^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-13  0:23 Potential Null Pointer Dereference detected by static analysis tool Cheng
@ 2025-08-13 13:19 ` Phillip Wood
  2025-08-14 23:26   ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Phillip Wood @ 2025-08-13 13:19 UTC (permalink / raw)
  To: Cheng, git

Hi Cheng

On 13/08/2025 01:23, Cheng wrote:
> 
> 
> line 326 in builtin/describe.cdescribe.c, which is located in the function describe_commit. In the following code, cmit could be NULL passed to the call, which then causes a NULL dereference. Seems should be replaced lookup_commit_reference with lookup_commit_or_die.
> 
> 
> ```cpp
> cmit = lookup_commit_reference(the_repository, oid);
> n = find_commit_name(&cmit->object.oid);
> ```
>      
> 
> 
> The NULL value seems to come from function lookup_commit_reference_gently where:
> 
> - 1. call to deref_tag may return NULL.
> 
> - 2. call to object_as_type may return NULL.
> 
> 
> In this repository,  other calls  lookup_commit_reference are followed by a null check. So this seems to lead to NULL dereference. Can I confirm with you whether this is a true positive bug report?

I had a quick look at the callers of describe_commit() and they all seem 
to use an oid that they get from looking up a commit so I'm not sure 
under what circumstances this call to lookup_commit_reference() can fail.

Thanks

Phillip


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-13 13:19 ` Phillip Wood
@ 2025-08-14 23:26   ` Jeff King
  2025-08-15 15:49     ` Phillip Wood
  2025-08-17  9:27     ` René Scharfe
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2025-08-14 23:26 UTC (permalink / raw)
  To: phillip.wood; +Cc: Cheng, git

On Wed, Aug 13, 2025 at 02:19:14PM +0100, Phillip Wood wrote:

> I had a quick look at the callers of describe_commit() and they all seem to
> use an oid that they get from looking up a commit so I'm not sure under what
> circumstances this call to lookup_commit_reference() can fail.

I wonder if it would make sense for describe_commit() to just take a
"struct commit" pointer. Then it could skip the call to turn the oid
into a commit entirely, and the compiler would make sure we always have
a commit. :)

Something like this (totally untested, and not something I'm planning to
follow up on, but maybe inspirational):

diff --git a/builtin/describe.c b/builtin/describe.c
index 32f5bf513f..3e8691a4c4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -352,26 +352,24 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
 		    repo_find_unique_abbrev(the_repository, oid, abbrev));
 }
 
-static void describe_commit(struct object_id *oid, struct strbuf *dst)
+static void describe_commit(struct commit *cmit, struct strbuf *dst)
 {
-	struct commit *cmit, *gave_up_on = NULL;
+	struct commit *gave_up_on = NULL;
 	struct lazy_queue queue = LAZY_QUEUE_INIT;
 	struct commit_name *n;
 	struct possible_tag all_matches[MAX_TAGS];
 	unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
-	cmit = lookup_commit_reference(the_repository, oid);
-
 	n = find_commit_name(&cmit->object.oid);
 	if (n && (tags || all || n->prio == 2)) {
 		/*
 		 * Exact match to an existing ref.
 		 */
 		append_name(n, dst);
 		if (n->misnamed || longformat)
-			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
+			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : &cmit->object.oid, dst);
 		if (suffix)
 			strbuf_addstr(dst, suffix);
 		return;
@@ -528,7 +526,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 }
 
 struct process_commit_data {
-	struct object_id current_commit;
+	struct commit *current_commit;
 	struct object_id looking_for;
 	struct strbuf *dst;
 	struct rev_info *revs;
@@ -537,7 +535,7 @@ struct process_commit_data {
 static void process_commit(struct commit *commit, void *data)
 {
 	struct process_commit_data *pcd = data;
-	pcd->current_commit = commit->object.oid;
+	pcd->current_commit = commit;
 }
 
 static void process_object(struct object *obj, const char *path, void *data)
@@ -546,7 +544,7 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		describe_commit(&pcd->current_commit, pcd->dst);
+		describe_commit(pcd->current_commit, pcd->dst);
 		strbuf_addf(pcd->dst, ":%s", path);
 		clear_prio_queue(&pcd->revs->commits);
 	}
@@ -556,7 +554,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 {
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
-	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
+	struct process_commit_data pcd = { NULL, oid, dst, &revs};
 
 	strvec_pushl(&args, "internal: The first arg is not parsed",
 		     "--objects", "--in-commit-order", "--reverse", "HEAD",
@@ -589,7 +587,7 @@ static void describe(const char *arg, int last_one)
 	cmit = lookup_commit_reference_gently(the_repository, &oid, 1);
 
 	if (cmit)
-		describe_commit(&oid, &sb);
+		describe_commit(cmit, &sb);
 	else if (odb_read_object_info(the_repository->objects,
 				      &oid, NULL) == OBJ_BLOB)
 		describe_blob(oid, &sb);

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-14 23:26   ` Jeff King
@ 2025-08-15 15:49     ` Phillip Wood
  2025-08-17  9:27     ` René Scharfe
  1 sibling, 0 replies; 28+ messages in thread
From: Phillip Wood @ 2025-08-15 15:49 UTC (permalink / raw)
  To: Jeff King, phillip.wood; +Cc: Cheng, git

Hi Peff

On 15/08/2025 00:26, Jeff King wrote:
> On Wed, Aug 13, 2025 at 02:19:14PM +0100, Phillip Wood wrote:
> 
>> I had a quick look at the callers of describe_commit() and they all seem to
>> use an oid that they get from looking up a commit so I'm not sure under what
>> circumstances this call to lookup_commit_reference() can fail.
> 
> I wonder if it would make sense for describe_commit() to just take a
> "struct commit" pointer. Then it could skip the call to turn the oid
> into a commit entirely, and the compiler would make sure we always have
> a commit. :)

I think that's a good idea, it would be clearer to the reader that we've 
already looked up the commit before calling describe_commit() as well.

Thanks

Phillip

> Something like this (totally untested, and not something I'm planning to
> follow up on, but maybe inspirational):
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index 32f5bf513f..3e8691a4c4 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -352,26 +352,24 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
>   		    repo_find_unique_abbrev(the_repository, oid, abbrev));
>   }
>   
> -static void describe_commit(struct object_id *oid, struct strbuf *dst)
> +static void describe_commit(struct commit *cmit, struct strbuf *dst)
>   {
> -	struct commit *cmit, *gave_up_on = NULL;
> +	struct commit *gave_up_on = NULL;
>   	struct lazy_queue queue = LAZY_QUEUE_INIT;
>   	struct commit_name *n;
>   	struct possible_tag all_matches[MAX_TAGS];
>   	unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
>   	unsigned long seen_commits = 0;
>   	unsigned int unannotated_cnt = 0;
>   
> -	cmit = lookup_commit_reference(the_repository, oid);
> -
>   	n = find_commit_name(&cmit->object.oid);
>   	if (n && (tags || all || n->prio == 2)) {
>   		/*
>   		 * Exact match to an existing ref.
>   		 */
>   		append_name(n, dst);
>   		if (n->misnamed || longformat)
> -			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
> +			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : &cmit->object.oid, dst);
>   		if (suffix)
>   			strbuf_addstr(dst, suffix);
>   		return;
> @@ -528,7 +526,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
>   }
>   
>   struct process_commit_data {
> -	struct object_id current_commit;
> +	struct commit *current_commit;
>   	struct object_id looking_for;
>   	struct strbuf *dst;
>   	struct rev_info *revs;
> @@ -537,7 +535,7 @@ struct process_commit_data {
>   static void process_commit(struct commit *commit, void *data)
>   {
>   	struct process_commit_data *pcd = data;
> -	pcd->current_commit = commit->object.oid;
> +	pcd->current_commit = commit;
>   }
>   
>   static void process_object(struct object *obj, const char *path, void *data)
> @@ -546,7 +544,7 @@ static void process_object(struct object *obj, const char *path, void *data)
>   
>   	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
>   		reset_revision_walk();
> -		describe_commit(&pcd->current_commit, pcd->dst);
> +		describe_commit(pcd->current_commit, pcd->dst);
>   		strbuf_addf(pcd->dst, ":%s", path);
>   		clear_prio_queue(&pcd->revs->commits);
>   	}
> @@ -556,7 +554,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
>   {
>   	struct rev_info revs;
>   	struct strvec args = STRVEC_INIT;
> -	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
> +	struct process_commit_data pcd = { NULL, oid, dst, &revs};
>   
>   	strvec_pushl(&args, "internal: The first arg is not parsed",
>   		     "--objects", "--in-commit-order", "--reverse", "HEAD",
> @@ -589,7 +587,7 @@ static void describe(const char *arg, int last_one)
>   	cmit = lookup_commit_reference_gently(the_repository, &oid, 1);
>   
>   	if (cmit)
> -		describe_commit(&oid, &sb);
> +		describe_commit(cmit, &sb);
>   	else if (odb_read_object_info(the_repository->objects,
>   				      &oid, NULL) == OBJ_BLOB)
>   		describe_blob(oid, &sb);


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-14 23:26   ` Jeff King
  2025-08-15 15:49     ` Phillip Wood
@ 2025-08-17  9:27     ` René Scharfe
  2025-08-18  4:48       ` Jeff King
  1 sibling, 1 reply; 28+ messages in thread
From: René Scharfe @ 2025-08-17  9:27 UTC (permalink / raw)
  To: Jeff King, phillip.wood; +Cc: Cheng, git

On 8/15/25 1:26 AM, Jeff King wrote:
> On Wed, Aug 13, 2025 at 02:19:14PM +0100, Phillip Wood wrote:
> 
>> I had a quick look at the callers of describe_commit() and they all seem to
>> use an oid that they get from looking up a commit so I'm not sure under what
>> circumstances this call to lookup_commit_reference() can fail.
> 
> I wonder if it would make sense for describe_commit() to just take a
> "struct commit" pointer.

Yes, a lot.

> Then it could skip the call to turn the oid
> into a commit entirely, and the compiler would make sure we always have
> a commit. :)
> 
> Something like this (totally untested, and not something I'm planning to
> follow up on, but maybe inspirational):
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index 32f5bf513f..3e8691a4c4 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -352,26 +352,24 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
>  		    repo_find_unique_abbrev(the_repository, oid, abbrev));
>  }
>  
> -static void describe_commit(struct object_id *oid, struct strbuf *dst)
> +static void describe_commit(struct commit *cmit, struct strbuf *dst)
>  {
> -	struct commit *cmit, *gave_up_on = NULL;
> +	struct commit *gave_up_on = NULL;
>  	struct lazy_queue queue = LAZY_QUEUE_INIT;
>  	struct commit_name *n;
>  	struct possible_tag all_matches[MAX_TAGS];
>  	unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
>  	unsigned long seen_commits = 0;
>  	unsigned int unannotated_cnt = 0;
>  
> -	cmit = lookup_commit_reference(the_repository, oid);
> -
>  	n = find_commit_name(&cmit->object.oid);
>  	if (n && (tags || all || n->prio == 2)) {
>  		/*
>  		 * Exact match to an existing ref.
>  		 */
>  		append_name(n, dst);
>  		if (n->misnamed || longformat)
> -			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
> +			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : &cmit->object.oid, dst);
>  		if (suffix)
>  			strbuf_addstr(dst, suffix);
>  		return;
> @@ -528,7 +526,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
>  }
>  
>  struct process_commit_data {
> -	struct object_id current_commit;
> +	struct commit *current_commit;
>  	struct object_id looking_for;
>  	struct strbuf *dst;
>  	struct rev_info *revs;
> @@ -537,7 +535,7 @@ struct process_commit_data {
>  static void process_commit(struct commit *commit, void *data)
>  {
>  	struct process_commit_data *pcd = data;
> -	pcd->current_commit = commit->object.oid;
> +	pcd->current_commit = commit;
>  }
>  
>  static void process_object(struct object *obj, const char *path, void *data)
> @@ -546,7 +544,7 @@ static void process_object(struct object *obj, const char *path, void *data)
>  
>  	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
>  		reset_revision_walk();
> -		describe_commit(&pcd->current_commit, pcd->dst);
> +		describe_commit(pcd->current_commit, pcd->dst);

pcd->current_commit is initialized to NULL below, but
traverse_commit_list() without a filter must have set it via our
process_commit() callback before we get to the describe_commit() call.

Or are there weird repositories (e.g., just a blob, just a tag) that can
cause traverse_commit_list() to call its show_object() callback without
ever calling its show_commit() callback?  I don't see how, but may be
missing some way.

>  		strbuf_addf(pcd->dst, ":%s", path);
>  		clear_prio_queue(&pcd->revs->commits);
>  	}
> @@ -556,7 +554,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
>  {
>  	struct rev_info revs;
>  	struct strvec args = STRVEC_INIT;
> -	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
> +	struct process_commit_data pcd = { NULL, oid, dst, &revs};
>  
>  	strvec_pushl(&args, "internal: The first arg is not parsed",
>  		     "--objects", "--in-commit-order", "--reverse", "HEAD",
> @@ -589,7 +587,7 @@ static void describe(const char *arg, int last_one)
>  	cmit = lookup_commit_reference_gently(the_repository, &oid, 1);
>  
>  	if (cmit)
> -		describe_commit(&oid, &sb);
> +		describe_commit(cmit, &sb);
>  	else if (odb_read_object_info(the_repository->objects,
>  				      &oid, NULL) == OBJ_BLOB)
>  		describe_blob(oid, &sb);


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-17  9:27     ` René Scharfe
@ 2025-08-18  4:48       ` Jeff King
  2025-08-18  5:05         ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18  4:48 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

On Sun, Aug 17, 2025 at 11:27:12AM +0200, René Scharfe wrote:

> > @@ -546,7 +544,7 @@ static void process_object(struct object *obj, const char *path, void *data)
> >  
> >  	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
> >  		reset_revision_walk();
> > -		describe_commit(&pcd->current_commit, pcd->dst);
> > +		describe_commit(pcd->current_commit, pcd->dst);
> 
> pcd->current_commit is initialized to NULL below, but
> traverse_commit_list() without a filter must have set it via our
> process_commit() callback before we get to the describe_commit() call.
> 
> Or are there weird repositories (e.g., just a blob, just a tag) that can
> cause traverse_commit_list() to call its show_object() callback without
> ever calling its show_commit() callback?  I don't see how, but may be
> missing some way.

If there are, then I think the current code would segfault, too. It
initializes &pcd->current_commit to the null oid, and then
describe_commit() resolves that via lookup_commit_reference(). That
would return NULL, and the next line dereferencing the result would
segfault.

And it would be a counter-example to the claim that the call to
lookup_commit_reference() in describe_commit() never fails. ;)

I think your intuition is right that we could get the traversal code to
call show_object() without show_commit() in general. E.g. just:

  git init
  git tag foo $(echo bar | git hash-object -w --stdin)
  git rev-list --all --objects

would do so. But in this code our traversal always starts from HEAD,
which must be a commit (and this is enforced by the refs code). So you'd
have to corrupt your repository like:

  # do this in two steps since redirecting to .git/HEAD breaks the
  # repository for a moment!
  tag=$(git rev-parse foo)
  echo $tag >.git/HEAD

And indeed, the current code does then segfault on "git describe foo" at
the spot I mentioned. Even though the repository state here is
unexpected and corrupt, I do think we should probably be more defensive
and avoid the segfault.

I also find it a bit funny that describing a blob only walks from HEAD
(and not say, all refs or ones matching --match/--exclude, etc).  It is
documented that way, though. Actually, I find the whole feature a bit
pointless now that we have the diff "--find-object" option. Reading
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04), I think the git-describe behavior is mostly considered a
mistake (which we have to keep around for historical reasons). I guess
another possible candidate for removing in Git 3.0. :)

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-18  4:48       ` Jeff King
@ 2025-08-18  5:05         ` Jeff King
  2025-08-18 19:56           ` René Scharfe
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18  5:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 12:48:07AM -0400, Jeff King wrote:

> And indeed, the current code does then segfault on "git describe foo" at
> the spot I mentioned. Even though the repository state here is
> unexpected and corrupt, I do think we should probably be more defensive
> and avoid the segfault.

So you almost nerd-sniped me into making a series. But the more I dug
into the rabbit-hole, the more I turned away in disgust. :)

The set of problems I found are:

  1. What should happen when traversing from HEAD does not find the blob
     in question? Right now we print a blank line, which is...weird.
     Probably we should either print nothing, or return an error. If we
     return an error, should we respect --always? Are we stuck with the
     current dumb behavior because it's a plumbing command?

  2. When we are on an unborn branch, we print a confusing message:

       $ git init
       $ git commit --allow-empty -m foo
       $ git tag foo
       $ git symbolic-ref HEAD refs/heads/unborn
       $ git describe $(echo blob | git hash-object -w --stdin)
       fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
       Use '--' to separate paths from revisions, like this:
       'git <command> [<revision>...] -- [<file>...]'

     We should probably resolve HEAD ourselves and either bail with an
     empty output or an error (depending on what we do for (1) above).

  3. When we do traverse, if process_object() sees that we didn't find a
     commit, we should detect that and either return an empty result or
     an error (again, depending on the behavior of (2) above). This is
     done by checking is_null_oid(&pcd->current_commit) there.

  4. Then we can teach describe_commit() to take a commit rather than an
     oid (and the is_null_oid() check becomes a NULL check).

So it all depends on what to do with (1), and for a feature that IMHO
should not even exist in the first place, I had trouble summoning the
will-power to make this 4-patch series.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-18  5:05         ` Jeff King
@ 2025-08-18 19:56           ` René Scharfe
  2025-08-18 20:21             ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: René Scharfe @ 2025-08-18 19:56 UTC (permalink / raw)
  To: Jeff King; +Cc: phillip.wood, Cheng, git

On 8/18/25 7:05 AM, Jeff King wrote:
> On Mon, Aug 18, 2025 at 12:48:07AM -0400, Jeff King wrote:
> 
>> And indeed, the current code does then segfault on "git describe foo" at
>> the spot I mentioned. Even though the repository state here is
>> unexpected and corrupt, I do think we should probably be more defensive
>> and avoid the segfault.
> 
> So you almost nerd-sniped me into making a series. But the more I dug
> into the rabbit-hole, the more I turned away in disgust. :)
> 
> The set of problems I found are:
> 
>   1. What should happen when traversing from HEAD does not find the blob
>      in question? Right now we print a blank line, which is...weird.

Weird indeed.

>      Probably we should either print nothing, or return an error.

The latter, consistent with "git describe <commit-ish>".

> If we return an error, should we respect --always?

The documentation says "git describe <blob>" takes no options.  It could
learn some, of course.  But does it have to?  Perhaps better keep that
thing contained.

> Are we stuck with the
>      current dumb behavior because it's a plumbing command?

git describe is a porcelain command.

The documentation doesn't say what happens when the blob is not found
in HEAD's ancestry.

I can't imagine why someone would use "git describe <blob>" to check
whether a particular blob is linked to, but it _is_ slightly faster than
"git rev-list --objects --no-object-names HEAD | grep <blob>" for me,
and of course easier to type.  "git log --find-object <blob>" is slower
than either.

>   2. When we are on an unborn branch, we print a confusing message:
> 
>        $ git init
>        $ git commit --allow-empty -m foo
>        $ git tag foo
>        $ git symbolic-ref HEAD refs/heads/unborn
>        $ git describe $(echo blob | git hash-object -w --stdin)
>        fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
>        Use '--' to separate paths from revisions, like this:
>        'git <command> [<revision>...] -- [<file>...]'
> 
>      We should probably resolve HEAD ourselves and either bail with an
>      empty output or an error (depending on what we do for (1) above).

It already is an error, just needs a better message.  It should still
report an error even if we were to stick with showing blank lines for
unrelated blobs.

>   3. When we do traverse, if process_object() sees that we didn't find a
>      commit, we should detect that and either return an empty result or
>      an error (again, depending on the behavior of (2) above). This is
>      done by checking is_null_oid(&pcd->current_commit) there.

OK, ending the search right there might be the best option.  Traversing
deeper into the forest that we then know to be cursed would be the
unappealing alternative.

>   4. Then we can teach describe_commit() to take a commit rather than an
>      oid (and the is_null_oid() check becomes a NULL check).

  5. When process_object() has a commit, but it is indescribable, it
     shows an error:

     $ git describe 5afbe6da1d6ab0b8939343166636b828e561bf35
     fatal: No tags can describe '3b681e255cd8577a77983958ef7f566b05806cd0'.
     Try --always, or create some tags.

     It's not immediately clear that the reported hash belongs to the
     found commit.  And that suggestion to try --always is misleading,
     as "git describe <blob>" takes no options according to the
     documentation.  I'm not sure I like it in general -- can't tell
     if the command is being snarky with me.

> So it all depends on what to do with (1), and for a feature that IMHO
> should not even exist in the first place, I had trouble summoning the
> will-power to make this 4-patch series.

644eb60bd0 (builtin/describe.c: describe a blob, 2017-11-15) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04) confuse me; the latter's commit message sounds like the
former wasn't (supposed to be?) merged.

I think the issues you listed are independent, though.  Or what's wrong
with this demo that addresses point 3 in process_object() and 1 in
describe_blob().  If we want a blank line for 1 then we apply only
the first hunk.  Or am I missing something?

René


diff --git a/builtin/describe.c b/builtin/describe.c
index d7dd8139de..9e485240aa 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		describe_commit(&pcd->current_commit, pcd->dst);
-		strbuf_addf(pcd->dst, ":%s", path);
+		if (!is_null_oid(&pcd->current_commit)) {
+			describe_commit(&pcd->current_commit, pcd->dst);
+			strbuf_addf(pcd->dst, ":%s", path);
+		}
 		free_commit_list(pcd->revs->commits);
 		pcd->revs->commits = NULL;
 	}
@@ -519,6 +521,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
 	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
+	size_t orig_len = dst->len;
 
 	strvec_pushl(&args, "internal: The first arg is not parsed",
 		     "--objects", "--in-commit-order", "--reverse", "HEAD",
@@ -532,6 +535,8 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 		die("revision walk setup failed");
 
 	traverse_commit_list(&revs, process_commit, process_object, &pcd);
+	if (dst->len == orig_len)
+		die(_("unable to describe blob '%s'"), oid_to_hex(&oid));
 	reset_revision_walk();
 	release_revisions(&revs);
 	strvec_clear(&args);


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-18 19:56           ` René Scharfe
@ 2025-08-18 20:21             ` Jeff King
  2025-08-18 20:56               ` Jeff King
  2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2025-08-18 20:21 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 09:56:35PM +0200, René Scharfe wrote:

> >   1. What should happen when traversing from HEAD does not find the blob
> >      in question? Right now we print a blank line, which is...weird.
> 
> Weird indeed.
> 
> >      Probably we should either print nothing, or return an error.
> 
> The latter, consistent with "git describe <commit-ish>".

Certainly if we were starting from scratch, that's my thought. I just
wondered if we were stuck with the behavior for historical reasons.

> > If we return an error, should we respect --always?
> 
> The documentation says "git describe <blob>" takes no options.  It could
> learn some, of course.  But does it have to?  Perhaps better keep that
> thing contained.

OK. It is easy to just treat --always like we do in the commit case, but
I'm not sure it's actually useful. Just bailing early to say the option
is incompatible is reasonable.

> >   2. When we are on an unborn branch, we print a confusing message:
> > 
> >        $ git init
> >        $ git commit --allow-empty -m foo
> >        $ git tag foo
> >        $ git symbolic-ref HEAD refs/heads/unborn
> >        $ git describe $(echo blob | git hash-object -w --stdin)
> >        fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
> >        Use '--' to separate paths from revisions, like this:
> >        'git <command> [<revision>...] -- [<file>...]'
> > 
> >      We should probably resolve HEAD ourselves and either bail with an
> >      empty output or an error (depending on what we do for (1) above).
> 
> It already is an error, just needs a better message.  It should still
> report an error even if we were to stick with showing blank lines for
> unrelated blobs.

I think an error here is OK. But if we quietly return no output for (1),
then I think this should do the same. If we return an error for (1),
then yeah, it should remain one here and just get a better message.

> >   3. When we do traverse, if process_object() sees that we didn't find a
> >      commit, we should detect that and either return an empty result or
> >      an error (again, depending on the behavior of (2) above). This is
> >      done by checking is_null_oid(&pcd->current_commit) there.
> 
> OK, ending the search right there might be the best option.  Traversing
> deeper into the forest that we then know to be cursed would be the
> unappealing alternative.

Right. There is no real "deeper" because we know we will never find a
commit.

>   5. When process_object() has a commit, but it is indescribable, it
>      shows an error:
> 
>      $ git describe 5afbe6da1d6ab0b8939343166636b828e561bf35
>      fatal: No tags can describe '3b681e255cd8577a77983958ef7f566b05806cd0'.
>      Try --always, or create some tags.
> 
>      It's not immediately clear that the reported hash belongs to the
>      found commit.  And that suggestion to try --always is misleading,
>      as "git describe <blob>" takes no options according to the
>      documentation.  I'm not sure I like it in general -- can't tell
>      if the command is being snarky with me.

Yeah, describe_commit()'s messages are not really set up to handle
describing an arbitrary commit that the user did not specify.

> > So it all depends on what to do with (1), and for a feature that IMHO
> > should not even exist in the first place, I had trouble summoning the
> > will-power to make this 4-patch series.
> 
> 644eb60bd0 (builtin/describe.c: describe a blob, 2017-11-15) and
> 15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
> 2018-01-04) confuse me; the latter's commit message sounds like the
> former wasn't (supposed to be?) merged.
> 
> I think the issues you listed are independent, though.  Or what's wrong
> with this demo that addresses point 3 in process_object() and 1 in
> describe_blob().  If we want a blank line for 1 then we apply only
> the first hunk.  Or am I missing something?

Yeah, I came up with a similar patch. The question was more of what the
behavior _should_ be, and whether we wanted to align that with an unborn
HEAD. And of course tests for all of these cases.

I'll send out a few patches in a moment (I guess I summoned some
willpower in the interim).

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Potential Null Pointer Dereference detected by static analysis tool
  2025-08-18 20:21             ` Jeff King
@ 2025-08-18 20:56               ` Jeff King
  2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
  1 sibling, 0 replies; 28+ messages in thread
From: Jeff King @ 2025-08-18 20:56 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 04:21:40PM -0400, Jeff King wrote:

> >   5. When process_object() has a commit, but it is indescribable, it
> >      shows an error:
> > 
> >      $ git describe 5afbe6da1d6ab0b8939343166636b828e561bf35
> >      fatal: No tags can describe '3b681e255cd8577a77983958ef7f566b05806cd0'.
> >      Try --always, or create some tags.
> > 
> >      It's not immediately clear that the reported hash belongs to the
> >      found commit.  And that suggestion to try --always is misleading,
> >      as "git describe <blob>" takes no options according to the
> >      documentation.  I'm not sure I like it in general -- can't tell
> >      if the command is being snarky with me.
> 
> Yeah, describe_commit()'s messages are not really set up to handle
> describing an arbitrary commit that the user did not specify.

Oh, there's one other related case I _thought_ was bug, but maybe isn't.
If you don't have any tags at all, then we'll bail immediately:

  $ git init
  $ echo foo >file && git add file && git commit -m foo
  $ git describe $(git rev-parse HEAD:file)
  fatal: No names found, cannot describe anything.

But we're traversing from HEAD, not the tags, so my initial thought was
that this should work. But of course it doesn't because we try to
describe the containing commit, which _does_ require tags.

And that leads back to: "well, you could pass --always". Which disagrees
with the documentation, but I think does actually do something useful
here.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/5] fix segfault and other oddities describing blobs
  2025-08-18 20:21             ` Jeff King
  2025-08-18 20:56               ` Jeff King
@ 2025-08-18 20:58               ` Jeff King
  2025-08-18 20:59                 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
                                   ` (4 more replies)
  1 sibling, 5 replies; 28+ messages in thread
From: Jeff King @ 2025-08-18 20:58 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 04:21:40PM -0400, Jeff King wrote:

> I'll send out a few patches in a moment (I guess I summoned some
> willpower in the interim).

Here's what I came up with.

  [1/5]: describe: pass oid struct by const pointer
  [2/5]: describe: error if blob not found
  [3/5]: describe: catch unborn branch in describe_blob()
  [4/5]: describe: handle blob traversal with no commits
  [5/5]: describe: pass commit to describe_commit()

 builtin/describe.c  | 41 +++++++++++++++++++++++++----------------
 t/t6120-describe.sh | 14 ++++++++++++++
 2 files changed, 39 insertions(+), 16 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/5] describe: pass oid struct by const pointer
  2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
@ 2025-08-18 20:59                 ` Jeff King
  2025-08-18 21:05                   ` Junio C Hamano
  2025-08-18 21:01                 ` [PATCH 2/5] describe: error if blob not found Jeff King
                                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 20:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

We pass a "struct object_id" to describe_blob() by value. This isn't
wrong, as an oid is composed only of copy-able values. But it's unusual;
typically we pass structs by const pointer, including object_ids. Let's
do so.

It similarly makes sense for us to hold that pointer in the callback
data (rather than yet another copy of the oid).

Signed-off-by: Jeff King <peff@peff.net>
---
Not strictly related, but I noticed while in the area and remembered a
recent discussion in this direction.

 builtin/describe.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index d7dd8139de..383d3e6b9a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -490,7 +490,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 
 struct process_commit_data {
 	struct object_id current_commit;
-	struct object_id looking_for;
+	const struct object_id *looking_for;
 	struct strbuf *dst;
 	struct rev_info *revs;
 };
@@ -505,7 +505,7 @@ static void process_object(struct object *obj, const char *path, void *data)
 {
 	struct process_commit_data *pcd = data;
 
-	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
+	if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
 		describe_commit(&pcd->current_commit, pcd->dst);
 		strbuf_addf(pcd->dst, ":%s", path);
@@ -514,7 +514,7 @@ static void process_object(struct object *obj, const char *path, void *data)
 	}
 }
 
-static void describe_blob(struct object_id oid, struct strbuf *dst)
+static void describe_blob(const struct object_id *oid, struct strbuf *dst)
 {
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
@@ -554,7 +554,7 @@ static void describe(const char *arg, int last_one)
 		describe_commit(&oid, &sb);
 	else if (odb_read_object_info(the_repository->objects,
 				      &oid, NULL) == OBJ_BLOB)
-		describe_blob(oid, &sb);
+		describe_blob(&oid, &sb);
 	else
 		die(_("%s is neither a commit nor blob"), arg);
 
-- 
2.51.0.326.gecbb38d78e


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/5] describe: error if blob not found
  2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
  2025-08-18 20:59                 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
@ 2025-08-18 21:01                 ` Jeff King
  2025-08-18 21:12                   ` Junio C Hamano
  2025-08-19 18:32                   ` René Scharfe
  2025-08-18 21:01                 ` [PATCH 3/5] describe: catch unborn branch in describe_blob() Jeff King
                                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:01 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

If describe_blob() does not find the blob in question, it returns an
empty strbuf, and we print an empty line. This differs from
describe_commit(), which always either returns an answer or calls die()
itself. As the blob function was bolted onto the command afterwards, I
think its behavior is not intentional, and it is just a bug that it does
not report an error.

Signed-off-by: Jeff King <peff@peff.net>
---
This one is perhaps the most controversial, as it is a change in
behavior. But the current behavior just really seems like a bug to me.

Unlike what René posted earlier, I didn't record the dst strbuf's
original size and compare against that. This is a static function with
only one caller that passes in an empty strbuf, so being overly
defensive didn't seem worth it (arguably these functions should just
return an allocated buffer anyway).

 builtin/describe.c  | 3 +++
 t/t6120-describe.sh | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/builtin/describe.c b/builtin/describe.c
index 383d3e6b9a..06e413d937 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -535,6 +535,9 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
 	reset_revision_walk();
 	release_revisions(&revs);
 	strvec_clear(&args);
+
+	if (!dst->len)
+		die(_("blob '%s' not reachable from HEAD"), oid_to_hex(oid));
 }
 
 static void describe(const char *arg, int last_one)
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 256ccaefb7..470631d17d 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -409,6 +409,12 @@ test_expect_success 'describe tag object' '
 	test_grep "fatal: test-blob-1 is neither a commit nor blob" actual
 '
 
+test_expect_success 'describe an unreachable blob' '
+	blob=$(echo not-found-anywhere | git hash-object -w --stdin) &&
+	test_must_fail git describe $blob 2>actual &&
+	test_grep "blob .$blob. not reachable from HEAD" actual
+'
+
 test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.51.0.326.gecbb38d78e


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/5] describe: catch unborn branch in describe_blob()
  2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
  2025-08-18 20:59                 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
  2025-08-18 21:01                 ` [PATCH 2/5] describe: error if blob not found Jeff King
@ 2025-08-18 21:01                 ` Jeff King
  2025-08-18 21:19                   ` Junio C Hamano
  2025-08-18 21:03                 ` [PATCH 4/5] describe: handle blob traversal with no commits Jeff King
  2025-08-18 21:04                 ` [PATCH 5/5] describe: pass commit to describe_commit() Jeff King
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:01 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

When describing a blob, we search for it by traversing from HEAD. We do
this by feeding the name HEAD to setup_revisions(). But if we are on an
unborn branch, this will fail with a confusing message:

  $ git describe $blob
  fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

It is OK for this to be an error (we cannot find $blob in an empty
traversal, so we'd eventually complain about that). But the error
message could be more helpful.

Let's resolve HEAD ourselves and pass the resolved object id to
setup_revisions(). If resolving fails, then we can print a more useful
message.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/describe.c  | 8 +++++++-
 t/t6120-describe.sh | 8 ++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 06e413d937..f7bea3c8c5 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -518,10 +518,16 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
 {
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
+	struct object_id head_oid;
 	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
 
+	if (repo_get_oid(the_repository, "HEAD", &head_oid))
+		die(_("cannot search for blob '%s' on an unborn branch"),
+		    oid_to_hex(oid));
+
 	strvec_pushl(&args, "internal: The first arg is not parsed",
-		     "--objects", "--in-commit-order", "--reverse", "HEAD",
+		     "--objects", "--in-commit-order", "--reverse",
+		     oid_to_hex(&head_oid),
 		     NULL);
 
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 470631d17d..feec57bcbc 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -415,6 +415,14 @@ test_expect_success 'describe an unreachable blob' '
 	test_grep "blob .$blob. not reachable from HEAD" actual
 '
 
+test_expect_success 'describe blob on an unborn branch' '
+	oldbranch=$(git symbolic-ref HEAD) &&
+	test_when_finished "git symbolic-ref HEAD $oldbranch" &&
+	git symbolic-ref HEAD refs/heads/does-not-exist &&
+	test_must_fail git describe test-blob 2>actual &&
+	test_grep "cannot search .* on an unborn branch" actual
+'
+
 test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.51.0.326.gecbb38d78e


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/5] describe: handle blob traversal with no commits
  2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
                                   ` (2 preceding siblings ...)
  2025-08-18 21:01                 ` [PATCH 3/5] describe: catch unborn branch in describe_blob() Jeff King
@ 2025-08-18 21:03                 ` Jeff King
  2025-08-19  8:05                   ` Patrick Steinhardt
  2025-08-18 21:04                 ` [PATCH 5/5] describe: pass commit to describe_commit() Jeff King
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:03 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

When describing a blob, we traverse from HEAD, remembering each commit
we saw, and then checking each blob to report the containing commit.
But if we haven't seen any commits at all, we'll segfault (we store the
"current" commit as an oid initialized to the null oid, causing
lookup_commit_reference() to return NULL).

This shouldn't be able to happen normally. We always start our traversal
at HEAD, which must be a commit (a property which is enforced by the
refs code). But you can trigger the segfault like this:

  blob=$(echo foo | git hash-object -w --stdin)
  echo $blob >.git/HEAD
  git describe $blob

We can instead catch this case and return an empty result, which hits
the usual "we didn't find $blob while traversing HEAD" error.

This is a minor lie in that we did "find" the blob. And this even hints
at a bigger problem in this code: what if the traversal pointed to the
blob as _not_ part of a commit at all, but we had previously filled in
the recorded "current commit"? One could imagine this happening due to a
tag pointing directly to the blob in question.

But that can't happen, because we only traverse from HEAD, never from
any other refs. And the intent of the blob-describing code is to find
blobs within commits.

So I think this matches the original intent as closely as we can (and
again, this segfault cannot be triggered without corrupting your
repository!).

I didn't include a test here because it requires corrupting the
repository in a way that is only easy to do using the files ref backend.
It doesn't seem worth carrying a REFFILES test just for this oddity.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/describe.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index f7bea3c8c5..72b2e1162c 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		describe_commit(&pcd->current_commit, pcd->dst);
-		strbuf_addf(pcd->dst, ":%s", path);
+		if (!is_null_oid(&pcd->current_commit)) {
+			describe_commit(&pcd->current_commit, pcd->dst);
+			strbuf_addf(pcd->dst, ":%s", path);
+		}
 		free_commit_list(pcd->revs->commits);
 		pcd->revs->commits = NULL;
 	}
-- 
2.51.0.326.gecbb38d78e


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/5] describe: pass commit to describe_commit()
  2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
                                   ` (3 preceding siblings ...)
  2025-08-18 21:03                 ` [PATCH 4/5] describe: handle blob traversal with no commits Jeff King
@ 2025-08-18 21:04                 ` Jeff King
  2025-08-19  8:05                   ` Patrick Steinhardt
  4 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-18 21:04 UTC (permalink / raw)
  To: René Scharfe; +Cc: phillip.wood, Cheng, git

There's a call in describe_commit() to lookup_commit_reference(), but we
don't check the return value. If it returns NULL, we'll segfault as we
immediately dereference the result.

In practice this can never happen, since all callers pass an oid which
came from a "struct commit" already. So we can make this more obvious
by just taking that commit struct in the first place.

Reported-by: Cheng <prophecheng@stu.pku.edu.cn>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/describe.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 72b2e1162c..04df89d56b 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -313,26 +313,24 @@ static void append_suffix(int depth, const struct object_id *oid, struct strbuf
 		    repo_find_unique_abbrev(the_repository, oid, abbrev));
 }
 
-static void describe_commit(struct object_id *oid, struct strbuf *dst)
+static void describe_commit(struct commit *cmit, struct strbuf *dst)
 {
-	struct commit *cmit, *gave_up_on = NULL;
+	struct commit *gave_up_on = NULL;
 	struct commit_list *list;
 	struct commit_name *n;
 	struct possible_tag all_matches[MAX_TAGS];
 	unsigned int match_cnt = 0, annotated_cnt = 0, cur_match;
 	unsigned long seen_commits = 0;
 	unsigned int unannotated_cnt = 0;
 
-	cmit = lookup_commit_reference(the_repository, oid);
-
 	n = find_commit_name(&cmit->object.oid);
 	if (n && (tags || all || n->prio == 2)) {
 		/*
 		 * Exact match to an existing ref.
 		 */
 		append_name(n, dst);
 		if (n->misnamed || longformat)
-			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
+			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : &cmit->object.oid, dst);
 		if (suffix)
 			strbuf_addstr(dst, suffix);
 		return;
@@ -489,7 +487,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 }
 
 struct process_commit_data {
-	struct object_id current_commit;
+	struct commit *current_commit;
 	const struct object_id *looking_for;
 	struct strbuf *dst;
 	struct rev_info *revs;
@@ -498,7 +496,7 @@ struct process_commit_data {
 static void process_commit(struct commit *commit, void *data)
 {
 	struct process_commit_data *pcd = data;
-	pcd->current_commit = commit->object.oid;
+	pcd->current_commit = commit;
 }
 
 static void process_object(struct object *obj, const char *path, void *data)
@@ -507,8 +505,8 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		if (!is_null_oid(&pcd->current_commit)) {
-			describe_commit(&pcd->current_commit, pcd->dst);
+		if (pcd->current_commit) {
+			describe_commit(pcd->current_commit, pcd->dst);
 			strbuf_addf(pcd->dst, ":%s", path);
 		}
 		free_commit_list(pcd->revs->commits);
@@ -521,7 +519,7 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
 	struct object_id head_oid;
-	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
+	struct process_commit_data pcd = { NULL, oid, dst, &revs};
 
 	if (repo_get_oid(the_repository, "HEAD", &head_oid))
 		die(_("cannot search for blob '%s' on an unborn branch"),
@@ -562,7 +560,7 @@ static void describe(const char *arg, int last_one)
 	cmit = lookup_commit_reference_gently(the_repository, &oid, 1);
 
 	if (cmit)
-		describe_commit(&oid, &sb);
+		describe_commit(cmit, &sb);
 	else if (odb_read_object_info(the_repository->objects,
 				      &oid, NULL) == OBJ_BLOB)
 		describe_blob(&oid, &sb);
-- 
2.51.0.326.gecbb38d78e

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] describe: pass oid struct by const pointer
  2025-08-18 20:59                 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
@ 2025-08-18 21:05                   ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2025-08-18 21:05 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git

Jeff King <peff@peff.net> writes:

> We pass a "struct object_id" to describe_blob() by value. This isn't
> wrong, as an oid is composed only of copy-able values. But it's unusual;
> typically we pass structs by const pointer, including object_ids. Let's
> do so.
>
> It similarly makes sense for us to hold that pointer in the callback
> data (rather than yet another copy of the oid).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Not strictly related, but I noticed while in the area and remembered a
> recent discussion in this direction.

Thanks for making this part of the code less odd.

>  builtin/describe.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/describe.c b/builtin/describe.c
> index d7dd8139de..383d3e6b9a 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -490,7 +490,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
>  
>  struct process_commit_data {
>  	struct object_id current_commit;
> -	struct object_id looking_for;
> +	const struct object_id *looking_for;
>  	struct strbuf *dst;
>  	struct rev_info *revs;
>  };
> @@ -505,7 +505,7 @@ static void process_object(struct object *obj, const char *path, void *data)
>  {
>  	struct process_commit_data *pcd = data;
>  
> -	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
> +	if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
>  		reset_revision_walk();
>  		describe_commit(&pcd->current_commit, pcd->dst);
>  		strbuf_addf(pcd->dst, ":%s", path);
> @@ -514,7 +514,7 @@ static void process_object(struct object *obj, const char *path, void *data)
>  	}
>  }
>  
> -static void describe_blob(struct object_id oid, struct strbuf *dst)
> +static void describe_blob(const struct object_id *oid, struct strbuf *dst)
>  {
>  	struct rev_info revs;
>  	struct strvec args = STRVEC_INIT;
> @@ -554,7 +554,7 @@ static void describe(const char *arg, int last_one)
>  		describe_commit(&oid, &sb);
>  	else if (odb_read_object_info(the_repository->objects,
>  				      &oid, NULL) == OBJ_BLOB)
> -		describe_blob(oid, &sb);
> +		describe_blob(&oid, &sb);
>  	else
>  		die(_("%s is neither a commit nor blob"), arg);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] describe: error if blob not found
  2025-08-18 21:01                 ` [PATCH 2/5] describe: error if blob not found Jeff King
@ 2025-08-18 21:12                   ` Junio C Hamano
  2025-08-19  8:05                     ` Patrick Steinhardt
  2025-08-19 18:32                   ` René Scharfe
  1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2025-08-18 21:12 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git

Jeff King <peff@peff.net> writes:

> If describe_blob() does not find the blob in question, it returns an
> empty strbuf, and we print an empty line. This differs from
> describe_commit(), which always either returns an answer or calls die()
> itself. As the blob function was bolted onto the command afterwards, I
> think its behavior is not intentional, and it is just a bug that it does
> not report an error.

Yes, let's do so.  Silently succeeding without returning anything
useful is not what we usually do in this system.

> This one is perhaps the most controversial, as it is a change in
> behavior. But the current behavior just really seems like a bug to me.
>
> Unlike what René posted earlier, I didn't record the dst strbuf's
> original size and compare against that. This is a static function with
> only one caller that passes in an empty strbuf, so being overly
> defensive didn't seem worth it (arguably these functions should just
> return an allocated buffer anyway).

Sounds sensible.

Thanks.

> diff --git a/builtin/describe.c b/builtin/describe.c
> index 383d3e6b9a..06e413d937 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -535,6 +535,9 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
>  	reset_revision_walk();
>  	release_revisions(&revs);
>  	strvec_clear(&args);
> +
> +	if (!dst->len)
> +		die(_("blob '%s' not reachable from HEAD"), oid_to_hex(oid));
>  }
>  
>  static void describe(const char *arg, int last_one)
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 256ccaefb7..470631d17d 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -409,6 +409,12 @@ test_expect_success 'describe tag object' '
>  	test_grep "fatal: test-blob-1 is neither a commit nor blob" actual
>  '
>  
> +test_expect_success 'describe an unreachable blob' '
> +	blob=$(echo not-found-anywhere | git hash-object -w --stdin) &&
> +	test_must_fail git describe $blob 2>actual &&
> +	test_grep "blob .$blob. not reachable from HEAD" actual
> +'
> +
>  test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
>  	i=1 &&
>  	while test $i -lt 8000

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] describe: catch unborn branch in describe_blob()
  2025-08-18 21:01                 ` [PATCH 3/5] describe: catch unborn branch in describe_blob() Jeff King
@ 2025-08-18 21:19                   ` Junio C Hamano
  2025-08-18 23:07                     ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2025-08-18 21:19 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git

Jeff King <peff@peff.net> writes:

> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -518,10 +518,16 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
>  {
>  	struct rev_info revs;
>  	struct strvec args = STRVEC_INIT;
> +	struct object_id head_oid;
>  	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};

An unrelated tangent, but it seems that we are copying the object
name for the first member of this struct, even though 1/5 changed
the second one.

> +	if (repo_get_oid(the_repository, "HEAD", &head_oid))
> +		die(_("cannot search for blob '%s' on an unborn branch"),
> +		    oid_to_hex(oid));

Makes sense.  I briefly wondered if there is really a point in doing
the traversal only from HEAD, but this topic is not about enhancing
and making the "describe <blob>" more useful, but it is a strict
improvement.  When you first mentioned "resolve HEAD ourselves", I
somehow expected you to ask the ref subsystem to resolve HEAD, but
this should do fine, thanks to ref_rev_parse_rules[].

> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 470631d17d..feec57bcbc 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -415,6 +415,14 @@ test_expect_success 'describe an unreachable blob' '
>  	test_grep "blob .$blob. not reachable from HEAD" actual
>  '
>  
> +test_expect_success 'describe blob on an unborn branch' '
> +	oldbranch=$(git symbolic-ref HEAD) &&
> +	test_when_finished "git symbolic-ref HEAD $oldbranch" &&
> +	git symbolic-ref HEAD refs/heads/does-not-exist &&
> +	test_must_fail git describe test-blob 2>actual &&
> +	test_grep "cannot search .* on an unborn branch" actual
> +'
> +
>  test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
>  	i=1 &&
>  	while test $i -lt 8000

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] describe: catch unborn branch in describe_blob()
  2025-08-18 21:19                   ` Junio C Hamano
@ 2025-08-18 23:07                     ` Jeff King
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2025-08-18 23:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 02:19:42PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -518,10 +518,16 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
> >  {
> >  	struct rev_info revs;
> >  	struct strvec args = STRVEC_INIT;
> > +	struct object_id head_oid;
> >  	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
> 
> An unrelated tangent, but it seems that we are copying the object
> name for the first member of this struct, even though 1/5 changed
> the second one.

Yep, agreed. It goes away in the final patch, though.

> > +	if (repo_get_oid(the_repository, "HEAD", &head_oid))
> > +		die(_("cannot search for blob '%s' on an unborn branch"),
> > +		    oid_to_hex(oid));
> 
> Makes sense.  I briefly wondered if there is really a point in doing
> the traversal only from HEAD, but this topic is not about enhancing
> and making the "describe <blob>" more useful, but it is a strict
> improvement.

Yeah. We do document the traversal from HEAD, so I wondered if anybody
would be upset at being more inclusive. So it might need a new option.
Somebody who is more interested in the option is welcome to pick up that
topic. :)

> When you first mentioned "resolve HEAD ourselves", I somehow expected
> you to ask the ref subsystem to resolve HEAD, but this should do fine,
> thanks to ref_rev_parse_rules[].

I hadn't really thought about the distinction. If you grep for '"HEAD"',
looks like we have a mix of both types. I don't think it should matter
much in practice.

Some even call lookup_commit_reference_by_name(). So we could perhaps do
that, in which case the segfault fix in patch 4 just happens naturally,
because we know we are always traversing from at least one commit. I
don't see any difference between the approaches compelling enough to
switch, though.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] describe: error if blob not found
  2025-08-18 21:12                   ` Junio C Hamano
@ 2025-08-19  8:05                     ` Patrick Steinhardt
  0 siblings, 0 replies; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-19  8:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, René Scharfe, phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 02:12:15PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > If describe_blob() does not find the blob in question, it returns an
> > empty strbuf, and we print an empty line. This differs from
> > describe_commit(), which always either returns an answer or calls die()
> > itself. As the blob function was bolted onto the command afterwards, I
> > think its behavior is not intentional, and it is just a bug that it does
> > not report an error.
> 
> Yes, let's do so.  Silently succeeding without returning anything
> useful is not what we usually do in this system.

Seconded. It might be breaking backwards compatibility, but it really
sounds like just another bug in our codebase that ought to be fixed.

Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] describe: handle blob traversal with no commits
  2025-08-18 21:03                 ` [PATCH 4/5] describe: handle blob traversal with no commits Jeff King
@ 2025-08-19  8:05                   ` Patrick Steinhardt
  2025-08-19 16:59                     ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-19  8:05 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 05:03:12PM -0400, Jeff King wrote:
> When describing a blob, we traverse from HEAD, remembering each commit
> we saw, and then checking each blob to report the containing commit.
> But if we haven't seen any commits at all, we'll segfault (we store the
> "current" commit as an oid initialized to the null oid, causing
> lookup_commit_reference() to return NULL).
> 
> This shouldn't be able to happen normally. We always start our traversal
> at HEAD, which must be a commit (a property which is enforced by the
> refs code). But you can trigger the segfault like this:
> 
>   blob=$(echo foo | git hash-object -w --stdin)
>   echo $blob >.git/HEAD
>   git describe $blob

I bet that the enforcement is only of partial nature, and that there are
ways to do the above e.g. via git-update-ref(1) or by playing around
with symrefs.

[snip]
> I didn't include a test here because it requires corrupting the
> repository in a way that is only easy to do using the files ref backend.
> It doesn't seem worth carrying a REFFILES test just for this oddity.

True:

    $ git update-ref HEAD HEAD^{tree}
    fatal: update_ref failed for ref 'HEAD': trying to write non-commit object 4b825dc642cb6eb9a060e54bf8d69288fbee4904 to branch 'HEAD'

But:

    $ git update-ref refs/some/tree HEAD^{tree}
    $ git symbolic-ref HEAD refs/some/tree
    $ git show
    tree HEAD

So that should allow you to write a test, right?

Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] describe: pass commit to describe_commit()
  2025-08-18 21:04                 ` [PATCH 5/5] describe: pass commit to describe_commit() Jeff King
@ 2025-08-19  8:05                   ` Patrick Steinhardt
  2025-08-19 17:02                     ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-19  8:05 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git

On Mon, Aug 18, 2025 at 05:04:17PM -0400, Jeff King wrote:
> There's a call in describe_commit() to lookup_commit_reference(), but we
> don't check the return value. If it returns NULL, we'll segfault as we
> immediately dereference the result.
> 
> In practice this can never happen, since all callers pass an oid which
> came from a "struct commit" already. So we can make this more obvious
> by just taking that commit struct in the first place.

I was wondering a bit about commit-graphs. We had the case in the past
where it was possible to look up commits via the graph even though they
don't exist in the ODB. So we might actually end up with a missing
object if `GIT_COMMIT_GRAPH_PARANOIA=false`, which is the default value.
But that might be fine? No idea without digging further.

In any case, the refactoring makes sense regardless from my point of
view.

Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] describe: handle blob traversal with no commits
  2025-08-19  8:05                   ` Patrick Steinhardt
@ 2025-08-19 16:59                     ` Jeff King
  2025-08-20  4:34                       ` Patrick Steinhardt
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2025-08-19 16:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: René Scharfe, phillip.wood, Cheng, git

On Tue, Aug 19, 2025 at 10:05:19AM +0200, Patrick Steinhardt wrote:

> > I didn't include a test here because it requires corrupting the
> > repository in a way that is only easy to do using the files ref backend.
> > It doesn't seem worth carrying a REFFILES test just for this oddity.
> 
> True:
> 
>     $ git update-ref HEAD HEAD^{tree}
>     fatal: update_ref failed for ref 'HEAD': trying to write non-commit object 4b825dc642cb6eb9a060e54bf8d69288fbee4904 to branch 'HEAD'
> 
> But:
> 
>     $ git update-ref refs/some/tree HEAD^{tree}
>     $ git symbolic-ref HEAD refs/some/tree
>     $ git show
>     tree HEAD
> 
> So that should allow you to write a test, right?

Hrm, that seems like a bug. I thought we insisted that HEAD point at
refs/heads.

Ah, no. We did that in b229d18a80 (validate_headref: tighten
ref-matching to just branches, 2009-01-29), but had to revert it in
e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD, 2009-02-13) to
keep compatibility for topgit. :(

Still, I'm not sure it's something I'd want to base a test on. Maybe if
there is a big comment that says "It is OK to invalidate and remove this
test if we ever tighten symbolic-ref" it would be OK?

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] describe: pass commit to describe_commit()
  2025-08-19  8:05                   ` Patrick Steinhardt
@ 2025-08-19 17:02                     ` Jeff King
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2025-08-19 17:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: René Scharfe, phillip.wood, Cheng, git

On Tue, Aug 19, 2025 at 10:05:25AM +0200, Patrick Steinhardt wrote:

> On Mon, Aug 18, 2025 at 05:04:17PM -0400, Jeff King wrote:
> > There's a call in describe_commit() to lookup_commit_reference(), but we
> > don't check the return value. If it returns NULL, we'll segfault as we
> > immediately dereference the result.
> > 
> > In practice this can never happen, since all callers pass an oid which
> > came from a "struct commit" already. So we can make this more obvious
> > by just taking that commit struct in the first place.
> 
> I was wondering a bit about commit-graphs. We had the case in the past
> where it was possible to look up commits via the graph even though they
> don't exist in the ODB. So we might actually end up with a missing
> object if `GIT_COMMIT_GRAPH_PARANOIA=false`, which is the default value.
> But that might be fine? No idea without digging further.

I don't think existence matters here. Any call to parse_object() will
call lookup_object() and find the same commit struct we already had, and
then exit immediately because its "parsed" flag is set. So we'd never
get a NULL return from lookup_commit_reference().

> In any case, the refactoring makes sense regardless from my point of
> view.

But yeah. Even if I am wrong above, this would fix it. ;)

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] describe: error if blob not found
  2025-08-18 21:01                 ` [PATCH 2/5] describe: error if blob not found Jeff King
  2025-08-18 21:12                   ` Junio C Hamano
@ 2025-08-19 18:32                   ` René Scharfe
  1 sibling, 0 replies; 28+ messages in thread
From: René Scharfe @ 2025-08-19 18:32 UTC (permalink / raw)
  To: Jeff King; +Cc: phillip.wood, Cheng, git

On 8/18/25 11:01 PM, Jeff King wrote:
> If describe_blob() does not find the blob in question, it returns an
> empty strbuf, and we print an empty line. This differs from
> describe_commit(), which always either returns an answer or calls die()
> itself. As the blob function was bolted onto the command afterwards, I
> think its behavior is not intentional, and it is just a bug that it does
> not report an error.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This one is perhaps the most controversial, as it is a change in
> behavior. But the current behavior just really seems like a bug to me.
> 
> Unlike what René posted earlier, I didn't record the dst strbuf's
> original size and compare against that. This is a static function with
> only one caller that passes in an empty strbuf, so being overly
> defensive didn't seem worth it

Makes sense.

> (arguably these functions should just
> return an allocated buffer anyway).

Or even print results directly.

>  builtin/describe.c  | 3 +++
>  t/t6120-describe.sh | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/builtin/describe.c b/builtin/describe.c
> index 383d3e6b9a..06e413d937 100644
> --- a/builtin/describe.c
> +++ b/builtin/describe.c
> @@ -535,6 +535,9 @@ static void describe_blob(const struct object_id *oid, struct strbuf *dst)
>  	reset_revision_walk();
>  	release_revisions(&revs);
>  	strvec_clear(&args);
> +
> +	if (!dst->len)
> +		die(_("blob '%s' not reachable from HEAD"), oid_to_hex(oid));

I like the clarity and precision of this message. 

The rest of the patches look good to me as well.  The first is a nice
little bonus.

René

>  }
>  
>  static void describe(const char *arg, int last_one)
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 256ccaefb7..470631d17d 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -409,6 +409,12 @@ test_expect_success 'describe tag object' '
>  	test_grep "fatal: test-blob-1 is neither a commit nor blob" actual
>  '
>  
> +test_expect_success 'describe an unreachable blob' '
> +	blob=$(echo not-found-anywhere | git hash-object -w --stdin) &&
> +	test_must_fail git describe $blob 2>actual &&
> +	test_grep "blob .$blob. not reachable from HEAD" actual
> +'
> +
>  test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
>  	i=1 &&
>  	while test $i -lt 8000


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] describe: handle blob traversal with no commits
  2025-08-19 16:59                     ` Jeff King
@ 2025-08-20  4:34                       ` Patrick Steinhardt
  2025-08-20  6:30                         ` [replacement PATCH " Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick Steinhardt @ 2025-08-20  4:34 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, phillip.wood, Cheng, git

On Tue, Aug 19, 2025 at 12:59:47PM -0400, Jeff King wrote:
> On Tue, Aug 19, 2025 at 10:05:19AM +0200, Patrick Steinhardt wrote:
> 
> > > I didn't include a test here because it requires corrupting the
> > > repository in a way that is only easy to do using the files ref backend.
> > > It doesn't seem worth carrying a REFFILES test just for this oddity.
> > 
> > True:
> > 
> >     $ git update-ref HEAD HEAD^{tree}
> >     fatal: update_ref failed for ref 'HEAD': trying to write non-commit object 4b825dc642cb6eb9a060e54bf8d69288fbee4904 to branch 'HEAD'
> > 
> > But:
> > 
> >     $ git update-ref refs/some/tree HEAD^{tree}
> >     $ git symbolic-ref HEAD refs/some/tree
> >     $ git show
> >     tree HEAD
> > 
> > So that should allow you to write a test, right?
> 
> Hrm, that seems like a bug. I thought we insisted that HEAD point at
> refs/heads.
> 
> Ah, no. We did that in b229d18a80 (validate_headref: tighten
> ref-matching to just branches, 2009-01-29), but had to revert it in
> e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD, 2009-02-13) to
> keep compatibility for topgit. :(

Well, that's certainly from before my time in the Git project :) I guess
changing semantics now would be quite risky. Reintroducing this change
feels out of the picture, but an alternative one could think about is to
validate that HEAD always points to a commit(-ish?).

But ultimately I'm not sure it's even worth it. If people really want to
shoot themselves into the foot they'll find a way to do so.

> Still, I'm not sure it's something I'd want to base a test on. Maybe if
> there is a big comment that says "It is OK to invalidate and remove this
> test if we ever tighten symbolic-ref" it would be OK?

That sounds reasonable to me, yeah.

Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [replacement PATCH 4/5] describe: handle blob traversal with no commits
  2025-08-20  4:34                       ` Patrick Steinhardt
@ 2025-08-20  6:30                         ` Jeff King
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2025-08-20  6:30 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: René Scharfe, phillip.wood, Cheng, git, Junio C Hamano

On Wed, Aug 20, 2025 at 06:34:16AM +0200, Patrick Steinhardt wrote:

> > Ah, no. We did that in b229d18a80 (validate_headref: tighten
> > ref-matching to just branches, 2009-01-29), but had to revert it in
> > e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD, 2009-02-13) to
> > keep compatibility for topgit. :(
> 
> Well, that's certainly from before my time in the Git project :) I guess
> changing semantics now would be quite risky. Reintroducing this change
> feels out of the picture, but an alternative one could think about is to
> validate that HEAD always points to a commit(-ish?).

Yeah, that's _probably_ OK. I don't remember how topgit works at all,
but I think its custom refs do at least point to commits.

> But ultimately I'm not sure it's even worth it. If people really want to
> shoot themselves into the foot they'll find a way to do so.

Yeah, agreed that it's probably not urgent.

> > Still, I'm not sure it's something I'd want to base a test on. Maybe if
> > there is a big comment that says "It is OK to invalidate and remove this
> > test if we ever tighten symbolic-ref" it would be OK?
> 
> That sounds reasonable to me, yeah.

OK. Here's a replacement for patch 4, then, with a test. Nothing else in
the series should need to be touched.

-- >8 --
Subject: [PATCH] describe: handle blob traversal with no commits

When describing a blob, we traverse from HEAD, remembering each commit
we saw, and then checking each blob to report the containing commit.
But if we haven't seen any commits at all, we'll segfault (we store the
"current" commit as an oid initialized to the null oid, causing
lookup_commit_reference() to return NULL).

This shouldn't be able to happen normally. We always start our traversal
at HEAD, which must be a commit (a property which is enforced by the
refs code). But you can trigger the segfault like this:

  blob=$(echo foo | git hash-object -w --stdin)
  echo $blob >.git/HEAD
  git describe $blob

We can instead catch this case and return an empty result, which hits
the usual "we didn't find $blob while traversing HEAD" error.

This is a minor lie in that we did "find" the blob. And this even hints
at a bigger problem in this code: what if the traversal pointed to the
blob as _not_ part of a commit at all, but we had previously filled in
the recorded "current commit"? One could imagine this happening due to a
tag pointing directly to the blob in question.

But that can't happen, because we only traverse from HEAD, never from
any other refs. And the intent of the blob-describing code is to find
blobs within commits.

So I think this matches the original intent as closely as we can (and
again, this segfault cannot be triggered without corrupting your
repository!).

The test here does not use the formula above, which works only for the
files backend (and not reftables). Instead we use another loophole to
create the bogus state using only Git commands. See the comment in the
test for details.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/describe.c  |  6 ++++--
 t/t6120-describe.sh | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index f7bea3c8c5..72b2e1162c 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		describe_commit(&pcd->current_commit, pcd->dst);
-		strbuf_addf(pcd->dst, ":%s", path);
+		if (!is_null_oid(&pcd->current_commit)) {
+			describe_commit(&pcd->current_commit, pcd->dst);
+			strbuf_addf(pcd->dst, ":%s", path);
+		}
 		free_commit_list(pcd->revs->commits);
 		pcd->revs->commits = NULL;
 	}
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index feec57bcbc..2c70cc561a 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -423,6 +423,22 @@ test_expect_success 'describe blob on an unborn branch' '
 	test_grep "cannot search .* on an unborn branch" actual
 '
 
+# This test creates a repository state that we generally try to disallow: HEAD
+# is pointing to an object that is not a commit. The ref update code forbids
+# non-commit writes directly to HEAD or to any branch in refs/heads/.  But we
+# can use the loophole of pointing HEAD to another non-branch ref (something we
+# should forbid, but don't for historical reasons).
+#
+# Do not take this test as an endorsement of the loophole! If we ever tighten
+# it, it is reasonable to just drop this test entirely.
+test_expect_success 'describe blob on a non-commit HEAD' '
+	oldbranch=$(git symbolic-ref HEAD) &&
+	test_when_finished "git symbolic-ref HEAD $oldbranch" &&
+	git symbolic-ref HEAD refs/tags/test-blob &&
+	test_must_fail git describe test-blob 2>actual &&
+	test_grep "blob .* not reachable from HEAD" actual
+'
+
 test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.51.0.326.gecbb38d78e


^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-08-20  6:30 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13  0:23 Potential Null Pointer Dereference detected by static analysis tool Cheng
2025-08-13 13:19 ` Phillip Wood
2025-08-14 23:26   ` Jeff King
2025-08-15 15:49     ` Phillip Wood
2025-08-17  9:27     ` René Scharfe
2025-08-18  4:48       ` Jeff King
2025-08-18  5:05         ` Jeff King
2025-08-18 19:56           ` René Scharfe
2025-08-18 20:21             ` Jeff King
2025-08-18 20:56               ` Jeff King
2025-08-18 20:58               ` [PATCH 0/5] fix segfault and other oddities describing blobs Jeff King
2025-08-18 20:59                 ` [PATCH 1/5] describe: pass oid struct by const pointer Jeff King
2025-08-18 21:05                   ` Junio C Hamano
2025-08-18 21:01                 ` [PATCH 2/5] describe: error if blob not found Jeff King
2025-08-18 21:12                   ` Junio C Hamano
2025-08-19  8:05                     ` Patrick Steinhardt
2025-08-19 18:32                   ` René Scharfe
2025-08-18 21:01                 ` [PATCH 3/5] describe: catch unborn branch in describe_blob() Jeff King
2025-08-18 21:19                   ` Junio C Hamano
2025-08-18 23:07                     ` Jeff King
2025-08-18 21:03                 ` [PATCH 4/5] describe: handle blob traversal with no commits Jeff King
2025-08-19  8:05                   ` Patrick Steinhardt
2025-08-19 16:59                     ` Jeff King
2025-08-20  4:34                       ` Patrick Steinhardt
2025-08-20  6:30                         ` [replacement PATCH " Jeff King
2025-08-18 21:04                 ` [PATCH 5/5] describe: pass commit to describe_commit() Jeff King
2025-08-19  8:05                   ` Patrick Steinhardt
2025-08-19 17:02                     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).