git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: 'Ben Boeckel' <ben.boeckel@kitware.com>
To: rsbecker@nexbridge.com
Cc: 'Junio C Hamano' <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [BUG] `git describe` doesn't traverse the graph in topological order
Date: Wed, 19 Nov 2025 21:48:52 -0500	[thread overview]
Message-ID: <aR6BlHflRVLN8_XO@rotor> (raw)
In-Reply-To: <033c01d9ed8a$c6916f30$53b44d90$@nexbridge.com>

[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]

On Fri, Sep 22, 2023 at 15:27:01 -0400, rsbecker@nexbridge.com wrote:
> I actually thought it worked that way. This may end up in a bigger
> change than fixing the issue because --first-parent does not appear to
> be sufficient to resolve the correct tag from your graph. My thought
> on using multiple commitish values to do that may help, but
> implementing that could lead to an O(n*m) scan (n=max commit tree
> width, m=depth to tag), plus a commitish hash lookup.

So I finally found some time to go back to this. The actual fix is
actually rather easy (patch attached). However, as guessed at previously
in the thread, the performance is in the tank without an up-to-date
commit graph ("instant" with it versus "minutes" without). On the other
hand, it is *accurate*. It does fix one expect-fail test case already in
the test suite (also included in the patch).

We could go one of two (or more! feel free to offer alternatives) ways:

- swallow the pill and accept the performance for accurate results
  (e.g., warn if there is not a recent `commit-graph`)
- add an `--accurate` flag to optionally use it with the caveat that
  reported descriptions may *change* under the flag (e.g., with the
  reproducer script, a "working" description is `tag-release-7-g<hash>`,
  but with the graph, it is the correct `tag-release-5-g<hash>`.

Also note the the reproducer provided was "fixed" in 7379046221
(describe: stop digging for max_candidates+1, 2024-11-06) because it
stopped searching because there were no more tags in the history.
Tagging the root commit preserves the reproducer state. I've attached an
updated reproducer script as well.

Thoughts on a plan forward?

--Ben

[-- Attachment #2: 0001-describe-traverse-commits-by-ancestry-instead-of-com.patch --]
[-- Type: text/plain, Size: 4153 bytes --]

From ca4df5b9c9542315f77c166d47d5c63a2ebdafd1 Mon Sep 17 00:00:00 2001
From: Ben Boeckel <mathstuf@gmail.com>
Date: Wed, 12 Nov 2025 23:53:20 -0500
Subject: [PATCH 1/1] describe: traverse commits by ancestry instead of commit
 date

An ancestor commit should never be traversed before its descendents.
This could happen if a series of commits are made in rapid succession
and they all share a commit date (to the 1-second resolution supported
in the metadata).

This was discovered in VTK's history where a `git describe` would return
the previous release's tag name rather than the one just made. The
problematic topology looks like:

    H ---- M1 -- M2 -- M3 -- M4 - ROOT
    |       \     \     \    \    /|
    |        \     \     |    \  / |
    |         \    R2 ---|---- R1 |
    |          \   /     |    /  /
     \          \ /       \  /  /
      P1 - P2 - P3 ------- P4 --

Where P1 and P3 are tagged commits. If all commits share a commit date,
`git describe` traverses in the following order:

  - H
  - M1
  - P1 (tagged)
  - M2
  - P3 (tagged)
  - P2
  - M3
  - R2
  - P4
  - M4
  - ROOT
  - R1

Although P1 is traversed before P3, P1's depth is incremented due to the
`flag_within` check despite the ancestry actually being the other way
around. When all is said and done, the description is reported as
`P3-7-g<hash>` despite the P1 tagged commit having it as an ancestor. If
P1 is restricted using `describe --match`, it is reported as
`P1-10-g<hash>` due to the traversal order issue.

Using topology sorting on the commit queue, the description is
accurately reported as `P1-5-g<hash>` instead and the traversal order
is:

Instead of commit date heuristics, use ancestry as the sort constraint.
This also fixes one expect-failure test case as well. However, the
performance depends on having a `git commit-graph` available.

Reported-in: <ZNffWAgldUZdpQcr@farprobe>
---
 builtin/describe.c  | 20 +++++++++++++++++++-
 t/t6120-describe.sh |  2 +-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index ffaf8d9f0a..789586e5a5 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 
 #include "builtin.h"
 #include "config.h"
+#include "commit-reach.h"
 #include "environment.h"
 #include "gettext.h"
 #include "hex.h"
@@ -256,7 +257,23 @@ struct lazy_queue {
 	bool get_pending;
 };
 
-#define LAZY_QUEUE_INIT { { compare_commits_by_commit_date }, false }
+/*
+ * Topological comparison: always return parents before children.
+ * This is reverse topological order: children before parents.
+ */
+static int compare_commits_topo(const void *a_, const void *b_, void *_unused_ UNUSED)
+{
+	struct commit *a = (struct commit *)a_;
+	struct commit *b = (struct commit *)b_;
+	if (repo_is_descendant_of(the_repository, a, &(struct commit_list){ b, NULL }))
+		return -1; // a is descendant, so comes before b
+	if (repo_is_descendant_of(the_repository, b, &(struct commit_list){ a, NULL }))
+		return 1; // b is descendant, so comes before a
+	// fallback: order by hash for determinism
+	return oidcmp(&a->object.oid, &b->object.oid);
+}
+
+#define LAZY_QUEUE_INIT { { compare_commits_topo }, false }
 
 static void *lazy_queue_get(struct lazy_queue *queue)
 {
@@ -413,6 +430,7 @@ static void describe_commit(struct commit *cmit, struct strbuf *dst)
 		struct commit_list *parents = c->parents;
 		struct commit_name **slot;
 
+		fprintf(stderr, "\n\nlooking at commit %s\n", oid_to_hex(&c->object.oid));
 		seen_commits++;
 
 		if (match_cnt == max_candidates ||
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 2c70cc561a..36e1b9d848 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -711,7 +711,7 @@ test_expect_success 'setup: describe commits with disjoint bases 2' '
 '
 
 check_describe -C disjoint2 "B-3-gHASH" HEAD
-check_describe -C disjoint2 --expect-failure "B-3-gHASH" --candidates=2 HEAD
+check_describe -C disjoint2 "B-3-gHASH" --candidates=2 HEAD
 
 test_expect_success 'setup misleading taggerdates' '
 	GIT_COMMITTER_DATE="2006-12-12 12:31" git tag -a -m "another tag" newer-tag-older-commit unique-file~1
-- 
2.51.1


[-- Attachment #3: 2-repro-git-describe.sh --]
[-- Type: application/x-sh, Size: 2133 bytes --]

  reply	other threads:[~2025-11-20  2:48 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-12 19:36 [BUG] `git describe` doesn't traverse the graph in topological order Ben Boeckel
2023-09-22 15:39 ` Ben Boeckel
2023-09-22 16:13   ` rsbecker
2023-09-22 16:51     ` 'Ben Boeckel'
2023-09-22 17:14       ` rsbecker
2023-09-22 17:38         ` 'Ben Boeckel'
2023-09-22 17:51         ` Junio C Hamano
2023-09-22 18:12           ` rsbecker
2023-09-22 18:44             ` 'Ben Boeckel'
2023-09-22 18:49               ` rsbecker
2023-09-22 19:05                 ` 'Ben Boeckel'
2023-09-22 19:27                   ` rsbecker
2025-11-20  2:48                     ` 'Ben Boeckel' [this message]
2025-11-20  8:05                       ` Jeff King
2023-09-22 18:41           ` 'Ben Boeckel'
2023-09-23 12:32         ` 'Ben Boeckel'
2023-09-22 17:11     ` Kristoffer Haugsbakk
2023-09-22 17:35   ` Kristoffer Haugsbakk
2023-09-22 17:43     ` 'Ben Boeckel'

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aR6BlHflRVLN8_XO@rotor \
    --to=ben.boeckel@kitware.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=rsbecker@nexbridge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).