* [PATCH] last-modified: implement faster algorithm
@ 2025-10-16 8:39 Toon Claes
2025-10-16 18:51 ` Justin Tobler
` (4 more replies)
0 siblings, 5 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-16 8:39 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, Justin Tobler, Taylor Blau, Toon Claes
The current implementation of git-last-modified(1) works by doing a
revision walk, and inspecting the diff at each level of that walk to
annotate entries remaining in the hashmap of paths. In other words, if
the diff at some level touches a path which has not yet been associated
with a commit, then that commit becomes associated with the path.
While a perfectly reasonable implementation, it can perform poorly in
either of two scenarios:
1. There are many entries of interest, in which case there is simply
a lot of work to do.
2. Or, there are (even a few) entries which have not been updated in a
long time, and so we must walk through a lot of history in order to
find a commit that touches that path.
This patch rewrites the last-modified implementation to address the
second point. The idea behind the algorithm is to propagate a set of
'active' paths (a path is 'active' if it has not yet been associated
with a commit) up to parents while doing a truncated revision walk.
The walk is truncated because it does not produce a revision for every
change in the original pathspec, but rather only for active paths.
More specifically, consider a priority queue of commits sorted by
generation number. First, enqueue the set of boundary commits with all
paths in the original spec marked as interesting.
Then, while the queue is not empty, do the following:
1. Pop an element, say, 'c', off of the queue, making sure that 'c'
isn't reachable by anything in the '--not' set.
2. For each parent 'p' (with index 'parent_i') of 'c', do the
following:
a. Compute the diff between 'c' and 'p'.
b. Pass any active paths that are TREESAME from 'c' to 'p'.
c. If 'p' has any active paths, push it onto the queue.
3. Any path that remains active on 'c' is associated to that commit.
This ends up being equivalent to doing something like 'git log -1 --
$path' for each path simultaneously. But it allows us to go much faster
than the original implementation by limiting the number of diffs we
compute: we can skip parts of history that the original revision walk
would have considered, but that are known to be uninteresting because
all paths in that area have already been marked inactive.
To avoid computing many first-parent diffs, add another trick on top of
this: check whether all paths active in 'c' are DEFINITELY NOT in c's
Bloom filter. Since the commit-graph only stores Bloom filters for
first-parent diffs, this trick only applies to the diff against the
first parent.
Comparing the performance of this new algorithm shows about a 2.6x
improvement on git.git:
Benchmark 1: master
Time (mean ± σ): 3.077 s ± 0.055 s [User: 3.017 s, System: 0.051 s]
Range (min … max): 2.947 s … 3.127 s 10 runs
Benchmark 2: HEAD
Time (mean ± σ): 1.181 s ± 0.010 s [User: 1.139 s, System: 0.038 s]
Range (min … max): 1.169 s … 1.194 s 10 runs
Summary
HEAD ran
2.60 ± 0.05 times faster than master
But when comparing a more extreme example of
`git last-modified -- COPYING t`, the difference is a lot bigger:
Benchmark 1: master
Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
Range (min … max): 4.308 s … 4.509 s 10 runs
Benchmark 2: HEAD
Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
Range (min … max): 810.6 ms … 881.2 ms 10 runs
Summary
HEAD ran
5.29 ± 0.16 times faster than master
As an added benefit, this implementation gives more correct results.
For example, the implementation in 'master' gives:
$ git log --max-count=1 --format=%H -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de
$ git last-modified -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
$ git last-modified | grep pkt-line.h
5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
With the changes in this patch the results of git-last-modified(1)
always match those of `git log --max-count=1`.
One thing to note though: the results might be output in a different
order than before. This is not considered an issue because the order is
not documented to be guaranteed anywhere.
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
The subcommand git-last-modified(1) was based on the patches shared by
Taylor and the folks at GitHub[1]. That version used an alternative
implementation to make it "go faster". When I was working on upstreaming
those patches, I dropped the patches[2] for this implementation, because
I didn't see significant improvements.
This series revives those changes. I did a more thorough deep dive into
the code and the algorithm and got the code working a lot faster. The
benchmark results can be found in the commit message.
Some changes compared to GitHub's version include:
* Use of `struct bitmap` from "ewah/ewok.h", instead of self-defined
`struct commit_active_paths`.
* Removed shortcut code that handled the case where commit and parent
are fully treesame, and instead always check 'active_c' to decide
whether the next parent is worth looking at.
* Modified comments and commit message to make the algorithm more
clear (at least to me).
* Mentioned the use of PARENT1 and PARENT2 in object.h.
* Removed the use of any global variables.
* Fewer conditions are checked in mark_path() because the hashmap of
'paths' is considered the single source of truth.
* pass_to_parent() no longer passes a path on to the parent when the
path isn't in the 'paths' hashmap.
[1]: https://lore.kernel.org/git/Z+XJ+1L3PnC9Dyba@nand.local/
[2]: https://lore.kernel.org/git/20250630-toon-new-blame-tree-v3-0-3516025dc3bc@iotcl.com/
---
builtin/last-modified.c | 252 ++++++++++++++++++++++++++++++++++++++++++++---
object.h | 1 +
t/t8020-last-modified.sh | 2 +-
3 files changed, 240 insertions(+), 15 deletions(-)
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index ae8b36a2c3..40e520ba18 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -2,26 +2,32 @@
#include "bloom.h"
#include "builtin.h"
#include "commit-graph.h"
+#include "commit-slab.h"
#include "commit.h"
#include "config.h"
-#include "environment.h"
#include "diff.h"
#include "diffcore.h"
#include "environment.h"
+#include "ewah/ewok.h"
#include "hashmap.h"
#include "hex.h"
-#include "log-tree.h"
#include "object-name.h"
#include "object.h"
#include "parse-options.h"
+#include "prio-queue.h"
#include "quote.h"
#include "repository.h"
#include "revision.h"
+/* Remember to update object flag allocation in object.h */
+#define PARENT1 (1u<<16) /* used instead of SEEN */
+#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
+
struct last_modified_entry {
struct hashmap_entry hashent;
struct object_id oid;
struct bloom_key key;
+ size_t diff_idx;
const char path[FLEX_ARRAY];
};
@@ -37,13 +43,35 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
return strcmp(ent1->path, path ? path : ent2->path);
}
+/*
+ * Hold a bitmap for each commit we're working with. Each bit represents a path
+ * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
+ */
+define_commit_slab(commit_bitmaps, struct bitmap *);
+
struct last_modified {
struct hashmap paths;
struct rev_info rev;
bool recursive;
bool show_trees;
+
+ const char **all_paths;
+ size_t all_paths_nr;
+ struct commit_bitmaps commit_bitmaps;
+
+ /* 'scratch' bitmap to avoid allocating every process_parent() */
+ struct bitmap *scratch;
};
+static struct bitmap *get_bitmap(struct last_modified *lm, struct commit *c)
+{
+ struct bitmap **bitmap = commit_bitmaps_at(&lm->commit_bitmaps, c);
+ if (!*bitmap)
+ *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD);
+
+ return *bitmap;
+}
+
static void last_modified_release(struct last_modified *lm)
{
struct hashmap_iter iter;
@@ -54,6 +82,8 @@ static void last_modified_release(struct last_modified *lm)
hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
release_revisions(&lm->rev);
+
+ free(lm->all_paths);
}
struct last_modified_callback_data {
@@ -196,7 +226,36 @@ static void last_modified_diff(struct diff_queue_struct *q,
}
}
-static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
+static size_t path_idx(struct last_modified *lm, char *path)
+{
+ struct last_modified_entry *ent;
+ ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
+ struct last_modified_entry, hashent);
+
+ return ent ? ent->diff_idx : -1;
+}
+
+static void pass_to_parent(struct last_modified *lm,
+ struct bitmap *c,
+ struct bitmap *p,
+ size_t pos)
+{
+ struct last_modified_entry *ent;
+ struct hashmap_iter iter;
+
+ bitmap_unset(c, pos);
+
+ hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ if (ent->diff_idx == pos) {
+ bitmap_set(p, pos);
+ break;
+ }
+ }
+}
+
+static bool maybe_changed_path(struct last_modified *lm,
+ struct commit *origin,
+ struct bitmap *active)
{
struct bloom_filter *filter;
struct last_modified_entry *ent;
@@ -213,6 +272,9 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return true;
hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ if (active && !bitmap_get(active, ent->diff_idx))
+ continue;
+
if (bloom_filter_contains(filter, &ent->key,
lm->rev.bloom_filter_settings))
return true;
@@ -220,42 +282,197 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return false;
}
+static void process_parent(struct last_modified *lm,
+ struct prio_queue *queue,
+ struct commit *c, struct bitmap *active_c,
+ struct commit *parent, int parent_i)
+{
+ size_t i;
+ struct bitmap *active_p;
+
+ repo_parse_commit(lm->rev.repo, parent);
+ active_p = get_bitmap(lm, parent);
+
+ /*
+ * The first time entering this function for this commit (i.e. first parent)
+ * see if Bloom filters will tell us it's worth to do the diff.
+ */
+ if (parent_i || maybe_changed_path(lm, c, active_c)) {
+ diff_tree_oid(&parent->object.oid,
+ &c->object.oid, "", &lm->rev.diffopt);
+ diffcore_std(&lm->rev.diffopt);
+ }
+
+ /*
+ * Otherwise, test each path for TREESAME-ness against the parent. If
+ * a path is TREESAME, pass it on to this parent.
+ *
+ * First, collect all paths that are *not* TREESAME in 'scratch'.
+ * Then, pass paths that *are* TREESAME and active to the parent.
+ */
+ for (i = 0; i < diff_queued_diff.nr; i++) {
+ struct diff_filepair *fp = diff_queued_diff.queue[i];
+ size_t k = path_idx(lm, fp->two->path);
+ if (0 <= k && bitmap_get(active_c, k))
+ bitmap_set(lm->scratch, k);
+ diff_free_filepair(fp);
+ }
+ for (i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
+ pass_to_parent(lm, active_c, active_p, i);
+ }
+
+ /*
+ * If parent has any active paths, put it on the queue (if not already).
+ */
+ if (!bitmap_is_empty(active_p) && !(parent->object.flags & PARENT1)) {
+ parent->object.flags |= PARENT1;
+ prio_queue_put(queue, parent);
+ }
+
+ memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
+ diff_queued_diff.nr = 0;
+ diff_queue_clear(&diff_queued_diff);
+}
+
static int last_modified_run(struct last_modified *lm)
{
+ int max_count, queue_popped = 0;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
+ struct commit_list *list;
struct last_modified_callback_data data = { .lm = lm };
lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
lm->rev.diffopt.format_callback = last_modified_diff;
lm->rev.diffopt.format_callback_data = &data;
+ lm->rev.no_walk = 1;
prepare_revision_walk(&lm->rev);
- while (hashmap_get_size(&lm->paths)) {
- data.commit = get_revision(&lm->rev);
- if (!data.commit)
- BUG("paths remaining beyond boundary in last-modified");
+ max_count = lm->rev.max_count;
+
+ init_commit_bitmaps(&lm->commit_bitmaps);
+ lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
+
+ /*
+ * lm->rev.commits holds the set of boundary commits for our walk.
+ *
+ * Loop through each such commit, and place it in the appropriate queue.
+ */
+ for (list = lm->rev.commits; list; list = list->next) {
+ struct commit *c = list->item;
+
+ if (c->object.flags & BOTTOM) {
+ prio_queue_put(¬_queue, c);
+ c->object.flags |= PARENT2;
+ } else if (!(c->object.flags & PARENT1)) {
+ /*
+ * If the commit is a starting point (and hasn't been
+ * seen yet), then initialize the set of interesting
+ * paths, too.
+ */
+ struct bitmap *active;
+
+ prio_queue_put(&queue, c);
+ c->object.flags |= PARENT1;
- if (data.commit->object.flags & BOUNDARY) {
+ active = get_bitmap(lm, c);
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ bitmap_set(active, i);
+ }
+ }
+
+ while (queue.nr) {
+ int parent_i;
+ struct commit_list *p;
+ struct commit *c = prio_queue_get(&queue);
+ struct bitmap *active_c = get_bitmap(lm, c);
+
+ if ((0 <= max_count && max_count < ++queue_popped) ||
+ (c->object.flags & PARENT2)) {
+ /*
+ * Either a boundary commit, or we have already seen too
+ * many others. Either way, stop here.
+ */
+ c->object.flags |= PARENT2 | BOUNDARY;
+ data.commit = c;
diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
- &data.commit->object.oid, "",
- &lm->rev.diffopt);
+ &c->object.oid,
+ "", &lm->rev.diffopt);
diff_flush(&lm->rev.diffopt);
+ goto cleanup;
+ }
- break;
+ /*
+ * Otherwise, make sure that 'c' isn't reachable from anything
+ * in the '--not' queue.
+ */
+ repo_parse_commit(lm->rev.repo, c);
+
+ while (not_queue.nr) {
+ struct commit_list *np;
+ struct commit *n = prio_queue_get(¬_queue);
+
+ repo_parse_commit(lm->rev.repo, n);
+
+ for (np = n->parents; np; np = np->next) {
+ if (!(np->item->object.flags & PARENT2)) {
+ prio_queue_put(¬_queue, np->item);
+ np->item->object.flags |= PARENT2;
+ }
+ }
+
+ if (commit_graph_generation(n) < commit_graph_generation(c))
+ break;
}
- if (!maybe_changed_path(lm, data.commit))
- continue;
+ /*
+ * Look at each parent and pass on each path that's TREESAME
+ * with that parent. Stop early when no active paths remain.
+ */
+ for (p = c->parents, parent_i = 0; p; p = p->next, parent_i++) {
+ process_parent(lm, &queue,
+ c, active_c,
+ p->item, parent_i);
+
+ if (bitmap_is_empty(active_c))
+ break;
+ }
- log_tree_commit(&lm->rev, data.commit);
+ /*
+ * Paths that remain active, or not TREESAME with any parent,
+ * were changed by 'c'.
+ */
+ if (!bitmap_is_empty(active_c)) {
+ data.commit = c;
+ for (size_t i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i))
+ mark_path(lm->all_paths[i], NULL, &data);
+ }
+ }
+
+cleanup:
+ bitmap_free(active_c);
}
+ if (hashmap_get_size(&lm->paths))
+ BUG("paths remaining beyond boundary in last-modified");
+
+ clear_prio_queue(¬_queue);
+ clear_prio_queue(&queue);
+ clear_commit_bitmaps(&lm->commit_bitmaps);
+ bitmap_free(lm->scratch);
+
return 0;
}
static int last_modified_init(struct last_modified *lm, struct repository *r,
const char *prefix, int argc, const char **argv)
{
+ struct hashmap_iter iter;
+ struct last_modified_entry *ent;
+
hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
repo_init_revisions(r, &lm->rev, prefix);
@@ -280,6 +497,13 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
if (populate_paths_from_revs(lm) < 0)
return error(_("unable to setup last-modified"));
+ lm->all_paths = xcalloc(hashmap_get_size(&lm->paths), sizeof(const char *));
+ lm->all_paths_nr = 0;
+ hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ ent->diff_idx = lm->all_paths_nr++;
+ lm->all_paths[ent->diff_idx] = ent->path;
+ }
+
return 0;
}
diff --git a/object.h b/object.h
index 8c3c1c46e1..fa504a09c0 100644
--- a/object.h
+++ b/object.h
@@ -75,6 +75,7 @@ void object_array_init(struct object_array *array);
* http-push.c: 11-----14
* commit-graph.c: 15
* commit-reach.c: 16-----19
+ * builtin/last-modified.c: 1617
* sha1-name.c: 20
* list-objects-filter.c: 21
* bloom.c: 2122
diff --git a/t/t8020-last-modified.sh b/t/t8020-last-modified.sh
index 61f00bc15c..a4c1114ee2 100755
--- a/t/t8020-last-modified.sh
+++ b/t/t8020-last-modified.sh
@@ -57,9 +57,9 @@ test_expect_success 'last-modified recursive' '
test_expect_success 'last-modified recursive with show-trees' '
check_last_modified -r -t <<-\EOF
- 3 a
3 a/b
3 a/b/file
+ 3 a
2 a/file
1 file
EOF
---
base-commit: 143f58ef7535f8f8a80d810768a18bdf3807de26
change-id: 20251009-b4-toon-last-modified-faster-4c8956a95261
Best regards,
--
Toon Claes <toon@iotcl.com>
* Re: [PATCH] last-modified: implement faster algorithm
From: Justin Tobler @ 2025-10-16 18:51 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Karthik Nayak, Taylor Blau
On 25/10/16 10:39AM, Toon Claes wrote:
> The current implementation of git-last-modified(1) works by doing a
> revision walk, and inspecting the diff at each level of that walk to
> annotate entries remaining in the hashmap of paths. In other words, if
> the diff at some level touches a path which has not yet been associated
> with a commit, then that commit becomes associated with the path.
>
> While a perfectly reasonable implementation, it can perform poorly in
> either one of two scenarios:
>
> 1. There are many entries of interest, in which case there is simply
> a lot of work to do.
>
> 2. Or, there are (even a few) entries which have not been updated in a
> long time, and so we must walk through a lot of history in order to
> find a commit that touches that path.
>
> This patch rewrites the last-modified implementation that addresses the
> second point. The idea behind the algorithm is to propagate a set of
> 'active' paths (a path is 'active' if it does not yet belong to a
> commit) up to parents and do a truncated revision walk.
>
> The walk is truncated because it does not produce a revision for every
> change in the original pathspec, but rather only for active paths.
Ok so if I understand correctly, the optimization here is that as we
perform the revision walk, the set of paths we look for at each commit
monotonically decreases as changed paths are identified. Prior to this,
we were always checking all paths for each commit even though a path may
have already found the commit that last modified it.
> More specifically, consider a priority queue of commits sorted by
> generation number. First, enqueue the set of boundary commits with all
> paths in the original spec marked as interesting.
>
> Then, while the queue is not empty, do the following:
>
> 1. Pop an element, say, 'c', off of the queue, making sure that 'c'
> isn't reachable by anything in the '--not' set.
>
> 2. For each parent 'p' (with index 'parent_i') of 'c', do the
> following:
>
> a. Compute the diff between 'c' and 'p'.
> b. Pass any active paths that are TREESAME from 'c' to 'p'.
> c. If 'p' has any active paths, push it onto the queue.
>
> 3. Any path that remains active on 'c' is associated to that commit.
>
> This ends up being equivalent to doing something like 'git log -1 --
> $path' for each path simultaneously. But, it allows us to go much faster
> than the original implementation by limiting the number of diffs we
> compute, since we can avoid parts of history that would have been
> considered by the revision walk in the original implementation, but are
> known to be uninteresting to us because we have already marked all paths
> in that area to be inactive.
>
> To avoid computing many first-parent diffs, add another trick on top of
> this and check if all paths active in 'c' are DEFINITELY NOT in c's
> Bloom filter. Since the commit-graph only stores first-parent diffs in
> the Bloom filters, we can only apply this trick to first-parent diffs.
>
[snip]
> As an added benefit, this implementation gives more correct results. For
> example implementation in 'master' gives:
s/implementation/the implementation/
> $ git log --max-count=1 --format=%H -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de
>
> $ git last-modified -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
>
> $ git last-modified | grep pkt-line.h
> 5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
>
> With the changes in this patch the results of git-last-modified(1)
> always match those of `git log --max-count=1`.
>
> One thing to note though, the results might be outputted in a different
> order than before. This is not considerd to be an issue because nowhere
> is documented the order is guaranteed.
>
> Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
[snip]
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index ae8b36a2c3..40e520ba18 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -2,26 +2,32 @@
> #include "bloom.h"
> #include "builtin.h"
> #include "commit-graph.h"
> +#include "commit-slab.h"
> #include "commit.h"
> #include "config.h"
> -#include "environment.h"
> #include "diff.h"
> #include "diffcore.h"
> #include "environment.h"
> +#include "ewah/ewok.h"
> #include "hashmap.h"
> #include "hex.h"
> -#include "log-tree.h"
> #include "object-name.h"
> #include "object.h"
> #include "parse-options.h"
> +#include "prio-queue.h"
> #include "quote.h"
> #include "repository.h"
> #include "revision.h"
>
> +/* Remember to update object flag allocation in object.h */
At first I was wondering if this is a leftover note, but it looks like
it is just a reminder in case the allocations change here.
> +#define PARENT1 (1u<<16) /* used instead of SEEN */
> +#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
Naive question: why do we use these object flags instead of the ones
mentioned?
> +
> struct last_modified_entry {
> struct hashmap_entry hashent;
> struct object_id oid;
> struct bloom_key key;
> + size_t diff_idx;
> const char path[FLEX_ARRAY];
> };
>
> @@ -37,13 +43,35 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
> return strcmp(ent1->path, path ? path : ent2->path);
> }
>
> +/*
> + * Hold a bitmap for each commit we're working with. Each bit represents a path
> + * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
> + */
> +define_commit_slab(commit_bitmaps, struct bitmap *);
Why do we need a path bitmap for each commit? My understanding is that
we check commits in a certain order as dictated by the priority queue.
As soon as the commit that last-modified a path has been identified,
wouldn't we always want the remaining commits processed to only check
the outstanding paths?
> struct last_modified {
> struct hashmap paths;
> struct rev_info rev;
> bool recursive;
> bool show_trees;
> +
> + const char **all_paths;
> + size_t all_paths_nr;
It is not immediately obvious to me why we have both `paths` and
`all_paths`. From my understanding, `all_paths` is defining the path
order for the bitmap. If this is the case, maybe it would be worth
explaining in a comment?
> + struct commit_bitmaps commit_bitmaps;
> +
> + /* 'scratch' bitmap to avoid allocating every proccess_parent() */
> + struct bitmap *scratch;
> };
>
> +static struct bitmap *get_bitmap(struct last_modified *lm, struct commit *c)
> +{
> + struct bitmap **bitmap = commit_bitmaps_at(&lm->commit_bitmaps, c);
> + if (!*bitmap)
> + *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD);
> +
> + return *bitmap;
> +}
> +
> static void last_modified_release(struct last_modified *lm)
> {
> struct hashmap_iter iter;
[snip]
> static int last_modified_run(struct last_modified *lm)
> {
> + int max_count, queue_popped = 0;
> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
> + struct commit_list *list;
> struct last_modified_callback_data data = { .lm = lm };
>
> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> lm->rev.diffopt.format_callback = last_modified_diff;
> lm->rev.diffopt.format_callback_data = &data;
> + lm->rev.no_walk = 1;
>
> prepare_revision_walk(&lm->rev);
>
> - while (hashmap_get_size(&lm->paths)) {
> - data.commit = get_revision(&lm->rev);
> - if (!data.commit)
> - BUG("paths remaining beyond boundary in last-modified");
> + max_count = lm->rev.max_count;
> +
> + init_commit_bitmaps(&lm->commit_bitmaps);
> + lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
It looks like we initialize and release both `commit_bitmaps` and
`scratch` here in `last_modified_run()`. Any reason we wouldn't want to
move this to `last_modified_{init,release}()`?
> +
> + /*
> + * lm->rev.commits holds the set of boundary commits for our walk.
Naive question: would it be more correct to say that `rev.commits` is
the list of starting commits? Boundary commits sounds like commits on
the boundary of what we consider interesting/uninteresting which, from
my understanding, is not the case here.
> + *
> + * Loop through each such commit, and place it in the appropriate queue.
> + */
> + for (list = lm->rev.commits; list; list = list->next) {
> + struct commit *c = list->item;
> +
> + if (c->object.flags & BOTTOM) {
> + prio_queue_put(¬_queue, c);
Ok so commits with the BOTTOM flag are at the boundary of the
"interesting" commit graph. Thus they are not included in the search and
added to the "not_queue".
> + c->object.flags |= PARENT2;
What is the meaning behind the name PARENT2 in this context? From my
understanding we are using this flag to denote a commit we are not
interested in.
> + } else if (!(c->object.flags & PARENT1)) {
Same question about PARENT1. It seems to be used to just denote commits
that we have already encountered. The names confuse me a bit though.
> + /*
> + * If the commit is a starting point (and hasn't been
> + * seen yet), then initialize the set of interesting
> + * paths, too.
> + */
> + struct bitmap *active;
> +
> + prio_queue_put(&queue, c);
> + c->object.flags |= PARENT1;
We queue the commit and mark it as seen. Makes sense.
> - if (data.commit->object.flags & BOUNDARY) {
> + active = get_bitmap(lm, c);
> + for (size_t i = 0; i < lm->all_paths_nr; i++)
> + bitmap_set(active, i);
Here we set up the path bitmap for the commit. At this point, all paths
are still "active" and thus set accordingly. I'm still not entirely sure
though if we really need a path bitmap per commit.
> + }
> + }
> +
> + while (queue.nr) {
> + int parent_i;
> + struct commit_list *p;
> + struct commit *c = prio_queue_get(&queue);
> + struct bitmap *active_c = get_bitmap(lm, c);
> +
> + if ((0 <= max_count && max_count < ++queue_popped) ||
> + (c->object.flags & PARENT2)) {
> + /*
> + * Either a boundary commit, or we have already seen too
> + * many others. Either way, stop here.
> + */
> + c->object.flags |= PARENT2 | BOUNDARY;
> + data.commit = c;
> diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> - &data.commit->object.oid, "",
> - &lm->rev.diffopt);
> + &c->object.oid,
> + "", &lm->rev.diffopt);
> diff_flush(&lm->rev.diffopt);
> + goto cleanup;
> + }
>
> - break;
> + /*
> + * Otherwise, make sure that 'c' isn't reachable from anything
> + * in the '--not' queue.
> + */
> + repo_parse_commit(lm->rev.repo, c);
> +
> + while (not_queue.nr) {
> + struct commit_list *np;
> + struct commit *n = prio_queue_get(¬_queue);
> +
> + repo_parse_commit(lm->rev.repo, n);
> +
> + for (np = n->parents; np; np = np->next) {
> + if (!(np->item->object.flags & PARENT2)) {
> + prio_queue_put(¬_queue, np->item);
> + np->item->object.flags |= PARENT2;
> + }
> + }
> +
> + if (commit_graph_generation(n) < commit_graph_generation(c))
> + break;
If the generation number of 'c' is higher than 'n' we know 'c' cannot be
an ancestor of 'n' and thus we continue on. Makes sense.
> }
>
> - if (!maybe_changed_path(lm, data.commit))
> - continue;
> + /*
> + * Look at each parent and pass on each path that's TREESAME
> + * with that parent. Stop early when no active paths remain.
> + */
> + for (p = c->parents, parent_i = 0; p; p = p->next, parent_i++) {
> + process_parent(lm, &queue,
> + c, active_c,
> + p->item, parent_i);
> +
> + if (bitmap_is_empty(active_c))
> + break;
> + }
>
> - log_tree_commit(&lm->rev, data.commit);
> + /*
> + * Paths that remain active, or not TREESAME with any parent,
> + * were changed by 'c'.
> + */
> + if (!bitmap_is_empty(active_c)) {
> + data.commit = c;
> + for (size_t i = 0; i < lm->all_paths_nr; i++) {
> + if (bitmap_get(active_c, i))
> + mark_path(lm->all_paths[i], NULL, &data);
> + }
> + }
> +
> +cleanup:
> + bitmap_free(active_c);
> }
>
> + if (hashmap_get_size(&lm->paths))
> + BUG("paths remaining beyond boundary in last-modified");
> +
> + clear_prio_queue(¬_queue);
> + clear_prio_queue(&queue);
> + clear_commit_bitmaps(&lm->commit_bitmaps);
> + bitmap_free(lm->scratch);
> +
> return 0;
> }
[snip]
I've taken an initial look at this patch and have mostly questions so
far. Just FYI, after applying this patch locally, the last-modified
tests seem to be failing. The command seems to be segfaulting, but I
haven't looked into it further.
-Justin
* Re: [PATCH] last-modified: implement faster algorithm
From: D. Ben Knoble @ 2025-10-16 20:48 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau
On Thu, Oct 16, 2025 at 4:39 AM Toon Claes <toon@iotcl.com> wrote:
> As an added benefit, this implementation gives more correct results. For
> example implementation in 'master' gives:
"More correct" is a bit of an oxymoron, no? It's either correct or it's not :)
>
> $ git log --max-count=1 --format=%H -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de
>
> $ git last-modified -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
>
> $ git last-modified | grep pkt-line.h
> 5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
It seems this commit is the merge to a maintenance branch, which was
authored and committed after the mainline merge but topologically we'd
probably consider it "earlier," at least starting from master? Anyway,
I'm not clear why this result was produced.
Thanks!
--
D. Ben Knoble
* Re: [PATCH] last-modified: implement faster algorithm
From: Taylor Blau @ 2025-10-16 23:38 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Karthik Nayak, Justin Tobler, Derrick Stolee
On Thu, Oct 16, 2025 at 10:39:25AM +0200, Toon Claes wrote:
> [...]
>
> To avoid computing many first-parent diffs, add another trick on top of
> this and check if all paths active in 'c' are DEFINITELY NOT in c's
> Bloom filter. Since the commit-graph only stores first-parent diffs in
> the Bloom filters, we can only apply this trick to first-parent diffs.
OK, up to this point this is the same as the commit message that I wrote
with Stolee back in 2020.
> Comparing the performance of this new algorithm shows about a 2.6x
> improvement on git.git:
>
> Benchmark 1: master
> Time (mean ± σ): 3.077 s ± 0.055 s [User: 3.017 s, System: 0.051 s]
> Range (min … max): 2.947 s … 3.127 s 10 runs
>
> Benchmark 2: HEAD
> Time (mean ± σ): 1.181 s ± 0.010 s [User: 1.139 s, System: 0.038 s]
> Range (min … max): 1.169 s … 1.194 s 10 runs
>
> Summary
> HEAD ran
> 2.60 ± 0.05 times faster than master
>
> But when comparing a more extreme example of
> `git last-modified -- COPYING t`, the difference is a lot bigger:
>
> Benchmark 1: master
> Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
> Range (min … max): 4.308 s … 4.509 s 10 runs
>
> Benchmark 2: HEAD
> Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
> Range (min … max): 810.6 ms … 881.2 ms 10 runs
>
> Summary
> HEAD ran
> 5.29 ± 0.16 times faster than master
These benchmarks are different from the ones that I provided, which is
good, since we should be measuring modern Git, not dragging forward
stale benchmarks ;-).
I imagine that you are just doing a straight last-modified run here in
both instances. In the original patch, I timed this both with and
without changed-path Bloom filters, which helped illustrate their impact
on the changes here.
I'd suggest including those benchmarks as well, and potentially running
them on linux.git, or another comparably large open-source repository.
git.git is large enough to show some interesting behavior, but I always
have found it useful to compare the results against a larger repository
as well.
> As an added benefit, this implementation gives more correct results. For
> example implementation in 'master' gives:
>
> $ git log --max-count=1 --format=%H -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de
>
> $ git last-modified -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
>
> $ git last-modified | grep pkt-line.h
> 5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
>
> With the changes in this patch the results of git-last-modified(1)
> always match those of `git log --max-count=1`.
>
> One thing to note though: the results might be output in a different
> order than before. This is not considered to be an issue because it is
> nowhere documented that the order is guaranteed.
>
> Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Stolee and I wrote these patches together many years ago, so he should
be credited here as well. Since this patch appears to be substantially
based on the original work, I think it is appropriate to include my
S-o-b once the patch is ready.
> ---
> This series revives those changes. I did more thorough deep dive through
> the code and the algorithm and got the code working a lot faster. The
> benchmark results can be found in the commit message.
>
> Some changes compared to GitHub's version include:
>
> * Use of `struct bitmap` from "ewah/ewok.h", instead of self-defined
> `struct commit_active_paths`.
>
> * Removed shortcut code that handled the case when commit and parent
> are fully treesame, and instead always check 'active_c' to see whether
> the next parent is worth looking at.
>
> * Modified comments and commit message to make the algorithm more
> clear (at least to me).
>
> * Mentioned the use of PARENT1 and PARENT2 in object.h.
>
> * Removed the use of any global variables.
>
> * Fewer conditions are checked in mark_path() because the hashmap of
> 'paths' is considered the single source of truth.
>
> * pass_to_parent() no longer passes on a path that isn't in the
> 'paths' hashmap.
Thanks for clearly showing what the changes on top are. When I applied
this locally and ran it, it pretty quickly segfaulted for me:
expecting success of 8020.3 'last-modified non-recursive':
check_last_modified <<-\EOF
3 a
1 file
EOF
+ check_last_modified
+ local indir=
+ test 0 != 0
+ cat
+ git last-modified
Segmentation fault
error: last command exited with $?=139
not ok 3 - last-modified non-recursive
#
# check_last_modified <<-\EOF
# 3 a
# 1 file
# EOF
#
1..3
Looking through the backtrace, it looks like someone is calling
mark_path() with a NULL oid, like so:
(gdb) bt
#0 __memcmp_evex_movbe ()
at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:132
#1 0x00005555555f2c32 in oideq (oid1=0x0, oid2=0x555555a5eeb0)
at ./hash.h:408
#2 0x00005555555f3523 in mark_path (path=0x555555a5eee8 "a", oid=0x0,
data=0x7fffffffd650) at builtin/last-modified.c:179
, which makes sense, since at the end of the main loop we call
mark_path() on all remaining active paths to indicate that they were
modified by whatever commit we just popped off the queue.
Something like this on top (which matches the original patch that I sent
from GitHub's fork) fixes the tests:
--- 8< ---
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index 40e520ba18..c8f66633a7 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -176,7 +176,7 @@ static void mark_path(const char *path, const struct object_id *oid,
* Is it arriving at a version of interest, or is it from a side branch
* which did not contribute to the final state?
*/
- if (!oideq(oid, &ent->oid))
+ if (oid && !oideq(oid, &ent->oid))
return;
last_modified_emit(data->lm, path, data->commit);
--- >8 ---
> +/* Remember to update object flag allocation in object.h */
> +#define PARENT1 (1u<<16) /* used instead of SEEN */
> +#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
> +
> struct last_modified_entry {
> struct hashmap_entry hashent;
> struct object_id oid;
> struct bloom_key key;
> + size_t diff_idx;
> const char path[FLEX_ARRAY];
> };
>
> @@ -37,13 +43,35 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
> return strcmp(ent1->path, path ? path : ent2->path);
> }
>
> +/*
> + * Hold a bitmap for each commit we're working with. Each bit represents a path
> + * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
> + */
> +define_commit_slab(commit_bitmaps, struct bitmap *);
> +
Nice, I am glad to see that we are using a bitmap here rather than the
hacky 'char *' that we had originally written. I seem to remember that
there was a tiny slow-down when using bitmaps, but can't find the
discussion anymore. (It wasn't in the internal PR that I originally
opened, and I can no longer read messages that far back in history.)
It might be worth benchmarking here to see if using a 'char *' is
faster. Of course, that's 8x worse in terms of memory usage, but not a
huge deal given both the magnitude and typical number of directory
elements (you'd need 1024^2 entries in a single tree to occupy even a
single MiB of heap).
Regardless of how you handle the above, I think that the commit slab
name here is a little generic. I guess it's OK since this is only
visible within this compilation unit, but perhaps something like
"active_paths_bitmap" would be more descriptive.
Likewise, I wonder if we should have elemtype here be just 'struct
bitmap'. Unfortunately I don't think the EWAH code has a function like:
void bitmap_init(struct bitmap *);
and only has ones that allocate for us. So we may consider adding one,
or creating a dummy bitmap and copying its contents, or otherwise.
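Something like this, perhaps (a standalone sketch; the field names are an
assumption based on ewah/ewok.h, not copied from the real header):

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for git's bitmap; fields guessed from ewah/ewok.h. */
struct bitmap {
	unsigned long *words;
	size_t word_alloc;
};

/*
 * Hypothetical in-place initializer, so a commit slab could hold the
 * struct itself rather than a pointer to a heap allocation.
 */
void bitmap_init(struct bitmap *self, size_t words)
{
	self->word_alloc = words;
	self->words = calloc(words, sizeof(*self->words));
}
```

(Error handling on allocation failure omitted; git would use xcalloc().)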
> struct last_modified {
> struct hashmap paths;
> struct rev_info rev;
> bool recursive;
> bool show_trees;
> +
> + const char **all_paths;
> + size_t all_paths_nr;
I wonder if all_paths should be a strvec here? I think that this code
was all written when the type was called argv_array (hilariously, that
change took place towards the end of July, 2020, and the --go-faster
code where this patch came from was written just a couple of weeks
earlier.)
> @@ -196,7 +226,36 @@ static void last_modified_diff(struct diff_queue_struct *q,
> }
> }
>
> -static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
> +static size_t path_idx(struct last_modified *lm, char *path)
> +{
> + struct last_modified_entry *ent;
> + ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
> + struct last_modified_entry, hashent);
> +
> + return ent ? ent->diff_idx : -1;
> +}
> +
> +static void pass_to_parent(struct last_modified *lm,
> + struct bitmap *c,
> + struct bitmap *p,
> + size_t pos)
> +{
> + struct last_modified_entry *ent;
> + struct hashmap_iter iter;
> +
> + bitmap_unset(c, pos);
> +
> + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
> + if (ent->diff_idx == pos) {
> + bitmap_set(p, pos);
> + break;
> + }
> + }
> +}
This one I'm not quite following. The original implementation does
something like:
c->active[i] = 0;
c->nr--;
p->active[i] = 1;
p->nr++;
, where 'i' is an index into the all_paths array. It looks like you are
effectively doing the first part of that with the bitmap_unset() call,
but I'm confused why you're iterating over paths here.
The caller in process_parents() is iterating over all entries that are
treesame to the parent, bit by bit. So I think you can just bitmap_set()
in the parent directly here, but let me know if I am missing something.
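In other words, something like this (with stand-in bitmap helpers for
illustration, not the ewah API itself):

```c
#include <stddef.h>

/* Minimal stand-ins for git's ewah bitmap API (assumed shapes). */
struct bitmap {
	unsigned long *words;
	size_t word_alloc;
};

#define BITS_PER_WORD (8 * sizeof(unsigned long))

void bitmap_set(struct bitmap *b, size_t pos)
{
	b->words[pos / BITS_PER_WORD] |= 1UL << (pos % BITS_PER_WORD);
}

void bitmap_unset(struct bitmap *b, size_t pos)
{
	b->words[pos / BITS_PER_WORD] &= ~(1UL << (pos % BITS_PER_WORD));
}

/*
 * The caller already knows 'pos' indexes a path that is TREESAME to
 * the parent, so no hashmap scan is needed: clear the bit in the
 * child, set it in the parent.
 */
void pass_to_parent(struct bitmap *c, struct bitmap *p, size_t pos)
{
	bitmap_unset(c, pos);
	bitmap_set(p, pos);
}
```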
> @@ -220,42 +282,197 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
> return false;
> }
>
> +static void process_parent(struct last_modified *lm,
> + struct prio_queue *queue,
> + struct commit *c, struct bitmap *active_c,
> + struct commit *parent, int parent_i)
> +{
> + size_t i;
> + struct bitmap *active_p;
> +
> + repo_parse_commit(lm->rev.repo, parent);
> + active_p = get_bitmap(lm, parent);
> +
> + /*
> + * The first time entering this function for this commit (i.e. first parent)
> + * see if Bloom filters will tell us it's worth to do the diff.
> + */
> + if (parent_i || maybe_changed_path(lm, c, active_c)) {
> + diff_tree_oid(&parent->object.oid,
> + &c->object.oid, "", &lm->rev.diffopt);
> + diffcore_std(&lm->rev.diffopt);
> + }
> +
> + /*
> + * Otherwise, test each path for TREESAME-ness against the parent. If
This "otherwise" is referencing a piece of the patch that doesn't appear
to be here directly, which is how we handle the special case of having
nothing in the diff queue, meaning we are treesame at the root.
In the GitHub version of this patch, we pass all active paths to the
parent, assign the PARENT1 flag if it doesn't already have it, and put
it in the queue as well.
In your version, we'd skip past the next for-loop, and do the same
pass-to-parent dance below, along with inserting the parent into the
prio queue.
So I think that this is all functionally equivalent, but I had to work
through a little bit of the details here, mostly since I haven't looked
at or thought about this code in many years ;-).
> static int last_modified_run(struct last_modified *lm)
> {
> + int max_count, queue_popped = 0;
> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
> + struct commit_list *list;
> struct last_modified_callback_data data = { .lm = lm };
>
> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> lm->rev.diffopt.format_callback = last_modified_diff;
> lm->rev.diffopt.format_callback_data = &data;
> + lm->rev.no_walk = 1;
This one is new relative to the original patch. Why set no_walk here?
>
> prepare_revision_walk(&lm->rev);
>
> - while (hashmap_get_size(&lm->paths)) {
> - data.commit = get_revision(&lm->rev);
> - if (!data.commit)
> - BUG("paths remaining beyond boundary in last-modified");
> + max_count = lm->rev.max_count;
> +
> + init_commit_bitmaps(&lm->commit_bitmaps);
> + lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
> +
> + /*
> + * lm->rev.commits holds the set of boundary commits for our walk.
> + *
> + * Loop through each such commit, and place it in the appropriate queue.
> + */
> + for (list = lm->rev.commits; list; list = list->next) {
Hmm. In the original patch, we look at rev.pending, not rev.commits. The
rest of the patch looks good to me and looks like a faithful
representation of the original patch from GitHub's fork. Thanks for
working on this and making the new last-modified builtin faster ;-).
Thanks,
Taylor
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-16 23:38 ` Taylor Blau
@ 2025-10-17 6:30 ` Jeff King
2025-10-17 14:54 ` Taylor Blau
2025-10-17 12:07 ` Toon Claes
1 sibling, 1 reply; 39+ messages in thread
From: Jeff King @ 2025-10-17 6:30 UTC (permalink / raw)
To: Taylor Blau; +Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, Derrick Stolee
On Thu, Oct 16, 2025 at 07:38:36PM -0400, Taylor Blau wrote:
> Looking through the backtrace, it looks like someone is calling
> mark_path() with a NULL oid, like so:
>
> (gdb) bt
> #0 __memcmp_evex_movbe ()
> at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:132
> #1 0x00005555555f2c32 in oideq (oid1=0x0, oid2=0x555555a5eeb0)
> at ./hash.h:408
> #2 0x00005555555f3523 in mark_path (path=0x555555a5eee8 "a", oid=0x0,
> data=0x7fffffffd650) at builtin/last-modified.c:179
>
> , which makes sense, since at the end of the main loop we call
> mark_path() on all remaining active paths to indicate that they were
> modified by whatever commit we just popped off the queue.
Hmm, sounds like the mark_path() discussion from:
https://lore.kernel.org/git/aHmPHcNQYlhGo8JB@nand.local/
coming home to roost. I'm sure you already knew that, but there's maybe
an interesting process observation here: in pulling a battle-tested
implementation apart into patches to be applied in chunks, we ended up
missing a critical part of that original implementation and getting a
bug.
It's not like we didn't know that was a risk, of course, and the payoff
was getting a fresh look at the patches (to improve them and maybe even
fix latent bugs). So it's probably something to just live with. But I
wonder if/how we could mitigate that risk. When I reorganize patches in
a tricky way locally, I often eyeball the diff of the end states
(whatever mess I had originally, versus the result of the "clean"
version), and that might have shown the omission here.
I'm not sure if that would have helped here or not. The "end state" of
the battle-tested version is really GitHub's internal fork. But maybe
your original patches extracted from that (tb/blame-tree in your fork, I
think) applied on top of the same base point (e.g., the current tip of
master) might be an interesting comparison? Or maybe not. The earlier
rounds may have had other adjustments which introduce a bunch of
noise.
Anyway, that is mostly philosophical rambling. I did have one concrete
thing to say below.
> > +/*
> > + * Hold a bitmap for each commit we're working with. Each bit represents a path
> > + * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
> > + */
> > +define_commit_slab(commit_bitmaps, struct bitmap *);
> > +
>
> Nice, I am glad to see that we are using a bitmap here rather than the
> hacky 'char *' that we had originally written. I seem to remember that
> there was a tiny slow-down when using bitmaps, but can't find the
> discussion anymore. (It wasn't in the internal PR that I originally
> opened, and I no longer can read messages that far back in history.)
>
> It might be worth benchmarking here to see if using a 'char *' is
> faster. Of course, that's 8x worse in terms of memory usage, but not a
> huge deal given both the magnitude and typical number of directory
> elements (you'd need 1024^2 entries in a single tree to occupy even a
> single MiB of heap).
I doubt the memory usage matters too much. We throw away each bitmap
after we finish processing its associated commit, so our max memory is
really the size of the bitmap/char array times the size of the queue (so
effectively the width of the history graph). So yeah, I too would be
curious if the performance is actually better with chars.
I also wonder how often we pass an unchanged bitmap to our parents
(e.g., for the common case that a commit has a single parent, and does
not touch any of the active paths, the active set will be the same for
both). There's probably an easy-ish optimization to avoid allocating a
new bitmap, and to just transfer ownership via pointer.
You can even get fancier and always just speculatively pass your
bitmap to _all_ parents, keeping a refcount, and then copy-on-write
when you need to clear a bit. We do something similar in
delta-island's set_island_marks(). But the complexity may not be worth
it, as I'd guess that the single-parent, nothing-cleared case would
dominate.
That's definitely something that could come on top of this, though (or
never).
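For the curious, the copy-on-write dance might look roughly like this
(completely made-up names and a one-word bitmap; just a sketch of the
refcount idea, not anything in git):

```c
#include <stdlib.h>

/*
 * A bitmap shared speculatively with every parent, cloned only when
 * someone actually needs to clear a bit.
 */
struct shared_bitmap {
	unsigned long bits; /* one word is enough for a sketch */
	int refcount;
};

struct shared_bitmap *share(struct shared_bitmap *b)
{
	b->refcount++;
	return b;
}

/* Copy-on-write: clone only if another holder still references 'b'. */
struct shared_bitmap *cow_unset(struct shared_bitmap *b, int pos)
{
	if (b->refcount > 1) {
		struct shared_bitmap *copy = malloc(sizeof(*copy));
		copy->bits = b->bits;
		copy->refcount = 1;
		b->refcount--;
		b = copy;
	}
	b->bits &= ~(1UL << pos);
	return b;
}
```

In the common single-parent, nothing-cleared case the refcount stays at
one and no copy is ever made.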
> Likewise, I wonder if we should have elemtype here be just 'struct
> bitmap'. Unfortunately I don't think the EWAH code has a function like:
>
> void bitmap_init(struct bitmap *);
>
> and only has ones that allocate for us. So we may consider adding one,
> or creating a dummy bitmap and copying its contents, or otherwise.
I thought that, too, though it does change the max memory use a bit.
Right now we are storing one pointer per commit (the "struct bitmap *")
and that is true whether we have processed the commit or not (it is
populated while the commit is in the queue, and then NULL after). If we
stored the struct directly, that's twice as many bytes (the eword_t
pointer, plus a size_t), and it's per commit.
So for the 1.3M commits in linux.git, for example, that's 10MB used
during the whole program. We save a few bytes by not having the extra
pointers, but only for items that are in the queue (which are relatively
small).
Probably doesn't matter much, as 10MB isn't that much (and we are surely
storing much more for the commit structs themselves). I'd be curious if
avoiding the extra malloc/free and pointer chasing shows a run-time
improvement, though.
Ironically, we do not really need the bitmap's size field at all! All of
these bitmaps are going to start as "all_paths_nr" worth of 1's and get
whittled down from there. So they could all just be the same size
allocation and skip their "nr" field entirely. Of course we couldn't use
the bitmap struct then.
I notice that the original implementation does keep a "nr" field
per-commit, but it's not the size of the allocation. It's the number of
bits set. It feels like we shouldn't need to keep that forever. It's
mostly used to see if the parent got anything passed to it, but we can
throw it away after that. So you could be saving 8 bytes per commit
there (and I don't see that information kept at all in Toon's version).
Anyway, these are all probably micro-optimizations that don't matter
that much. I am a little curious if chars outperform bitmaps, though.
-Peff
PS I tried building tb/blame-tree from your repo because I was poking at
how some of it worked (having forgotten everything I ever knew about
it by this point). It does work, but needs this:
diff --git a/blame-tree.c b/blame-tree.c
index 6addac7b0b..2448f2caf4 100644
--- a/blame-tree.c
+++ b/blame-tree.c
@@ -800,7 +800,6 @@ static int process_parent(struct blame_tree *bt,
int k = diff2idx(bt, fp->two->path);
if (0 <= k && active_c->active[k])
scratch[k] = 1;
- diff_free_filepair(fp);
}
for (i = 0; i < bt->all_paths_nr; i++) {
if (active_c->active[i] && !scratch[i])
on top, since otherwise we try to double-free the filepairs. I'd guess
it is a victim of rebasing across a5aecb2cdc (diff: improve lifecycle
management of diff queues, 2024-09-30), which swapped out
DIFF_QUEUE_CLEAR(), which left freeing as the responsibility of the
caller, for diff_queue_clear() which handles that itself.
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-16 8:39 [PATCH] last-modified: implement faster algorithm Toon Claes
` (2 preceding siblings ...)
2025-10-16 23:38 ` Taylor Blau
@ 2025-10-17 6:37 ` Jeff King
2025-10-17 10:47 ` Toon Claes
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
4 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2025-10-17 6:37 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau
On Thu, Oct 16, 2025 at 10:39:25AM +0200, Toon Claes wrote:
> + for (i = 0; i < diff_queued_diff.nr; i++) {
> + struct diff_filepair *fp = diff_queued_diff.queue[i];
> + size_t k = path_idx(lm, fp->two->path);
> + if (0 <= k && bitmap_get(active_c, k))
> + bitmap_set(lm->scratch, k);
> + diff_free_filepair(fp);
> + }
Just one little oddity while looking at this versus the old patches from
Taylor. Here you call diff_free_filepair(). But later...
> + diff_queued_diff.nr = 0;
> + diff_queue_clear(&diff_queued_diff);
...you call diff_queue_clear(), which frees the filepairs itself. It
does the right thing, because you truncate the queue explicitly. But
would it be simpler to just leave them in place and let the _clear()
function clean up? I.e., this:
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index 40e520ba18..47f2b0ed44 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -315,7 +315,6 @@ static void process_parent(struct last_modified *lm,
size_t k = path_idx(lm, fp->two->path);
if (0 <= k && bitmap_get(active_c, k))
bitmap_set(lm->scratch, k);
- diff_free_filepair(fp);
}
for (i = 0; i < lm->all_paths_nr; i++) {
if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
@@ -331,7 +330,6 @@ static void process_parent(struct last_modified *lm,
}
memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
- diff_queued_diff.nr = 0;
diff_queue_clear(&diff_queued_diff);
}
which feels a lot more idiomatic to me.
-Peff
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-16 18:51 ` Justin Tobler
@ 2025-10-17 10:38 ` Toon Claes
0 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-17 10:38 UTC (permalink / raw)
To: Justin Tobler; +Cc: git, Karthik Nayak, Taylor Blau
Justin Tobler <jltobler@gmail.com> writes:
> Ok so if I understand correctly, the optimization here is that as we
> perform the revision walk, the set of paths we look for at each commit
> monotonically decreases as changed paths are identified. Prior to this,
> we were always checking all paths for each commit even though a path may
> have already found the commit that last modified it.
Not only that, we also pass each path on to at most one parent. Either a
path is treesame to a parent, which means it was not changed in the
current commit, and then we pass that path on to that parent only. If
the commit had multiple parents, the merge ignored the versions from
the other parents.
If the path isn't treesame to any of the parents, we know the current
commit modified the path, and we associate that commit with that path.
> [snip]
>> As an added benefit, this implementation gives more correct results. For
>> example implementation in 'master' gives:
>
> s/implementation/the implementation/
Fair enough, but I also need to rephrase the "more correct" as pointed
out by Ben[1].
[1]: https://lore.kernel.org/git/CALnO6CBwuAdBFjESZSYZkChNdU9R17OXDc+CY=Z96QoACPgrpQ@mail.gmail.com/
>> +/* Remember to update object flag allocation in object.h */
>
> At first I was wondering if this is a leftover note, but it looks like
> it is just a reminder if the allocations change here.
Yeah, it seems to be the common pattern to do it like this. Didn't
question it further.
>> +#define PARENT1 (1u<<16) /* used instead of SEEN */
>> +#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
>
> Naive question: why do we use these object flags instead of the ones
> mentioned?
We set these bits in `struct object::flags`. Because other systems might
be using other bits for their internal workings, we want to avoid
colliding bit ranges. That's why users define the bits they use in
object.h: to have a general overview of who is using what and which
systems might get in trouble with each other.
Bits 16 and 17 seem to be safe to use. We could have named these defines
SEEN and BOUNDARY, but in revision.h those have different bit positions,
so we name them PARENT1 and PARENT2, similar to how these bits are
named in other files.
>> +
>> struct last_modified_entry {
>> struct hashmap_entry hashent;
>> struct object_id oid;
>> struct bloom_key key;
>> + size_t diff_idx;
>> const char path[FLEX_ARRAY];
>> };
>>
>> @@ -37,13 +43,35 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
>> return strcmp(ent1->path, path ? path : ent2->path);
>> }
>>
>> +/*
>> + * Hold a bitmap for each commit we're working with. Each bit represents a path
>> + * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
>> + */
>> +define_commit_slab(commit_bitmaps, struct bitmap *);
>
> Why do we need a path bitmap for each commit? My understanding is that
> we check commits in a certain order as dictated by the priority queue.
> As soon as the commit that last-modified a path has been identified,
> wouldn't we always want the remaining commits processed to only check
> the outstanding paths?
Because we want to pass paths to parents that are treesame. Imagine a
tree with:
bar-a.txt
bar-b.txt
baz-a.txt
foo-a.txt
foo-b.txt
Say we're looking at a commit that has the 'active_c' bitmap
`11111`. This commit has two parents, and we see bar-a.txt and bar-b.txt
are treesame on parent-0, we pass bitmap `11000` to that parent. If
foo-a.txt and foo-b.txt are treesame on the parent-1, we pass bitmap
`00011` to that parent. This leaves bitmap `00100` behind on the current
commit, and we associate baz-a.txt with that commit.
So a bit for a path would only be passed on to a single parent (or
none).
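If it helps, here's the same example as a toy C function (bit i stands
for all_paths[i]; mapping the leftmost path to the lowest bit is an
arbitrary choice for this sketch):

```c
#include <stdint.h>

/*
 * Split a commit's active-path bitmap between two parents: bits
 * TREESAME to parent-0 go there first, then bits TREESAME to parent-1,
 * so no bit ever reaches more than one parent. Whatever remains was
 * modified by this commit.
 */
void partition_active(uint32_t active_c,
		      uint32_t treesame_p0, uint32_t treesame_p1,
		      uint32_t *pass_p0, uint32_t *pass_p1,
		      uint32_t *modified_here)
{
	*pass_p0 = active_c & treesame_p0;
	active_c &= ~*pass_p0;
	*pass_p1 = active_c & treesame_p1;
	active_c &= ~*pass_p1;
	*modified_here = active_c; /* not TREESAME to any parent */
}
```

With bits 0..4 being bar-a, bar-b, baz-a, foo-a, foo-b, passing in
active 0x1F with bar-* treesame to parent-0 and foo-* treesame to
parent-1 leaves only the baz-a bit behind on the commit.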
>> struct last_modified {
>> struct hashmap paths;
>> struct rev_info rev;
>> bool recursive;
>> bool show_trees;
>> +
>> + const char **all_paths;
>> + size_t all_paths_nr;
>
> It is not immediately obvious to me why we have both `paths` and
> `all_paths`. From my understanding, `all_paths` is defining the path
> order for the bitmap. If this is the case, maybe it would be worth
> explaining in a comment?
Yeah, I'm not extremely happy about storing the information twice, but
I kept it for two reasons:
* The hashmap is filled by populate_paths_from_revs() and it grows with
every call of add_path_from_diff(). Afterwards we malloc() the
all_paths buffer to the correct size. Re-allocing all_paths on every
call of add_path_from_diff() would be clunky.
* Reverse lookups: When we have a path, we can lookup the index in
all_paths using path_idx(), which uses the hashmap to locate the
hashmap entry (which has the `diff_idx`).
> [snip]
>> static int last_modified_run(struct last_modified *lm)
>> {
>> + int max_count, queue_popped = 0;
>> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
>> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
>> + struct commit_list *list;
>> struct last_modified_callback_data data = { .lm = lm };
>>
>> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
>> lm->rev.diffopt.format_callback = last_modified_diff;
>> lm->rev.diffopt.format_callback_data = &data;
>> + lm->rev.no_walk = 1;
>>
>> prepare_revision_walk(&lm->rev);
>>
>> - while (hashmap_get_size(&lm->paths)) {
>> - data.commit = get_revision(&lm->rev);
>> - if (!data.commit)
>> - BUG("paths remaining beyond boundary in last-modified");
>> + max_count = lm->rev.max_count;
>> +
>> + init_commit_bitmaps(&lm->commit_bitmaps);
>> + lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
>
> It looks like we initialize and release both `commit_bitmaps` and
> `scratch` here in `last_modified_run()`. Any reason we wouldn't want to
> move this to `last_modified_{init,release}()`?
For lm->commit_bitmaps, at the `cleanup:` label we call bitmap_free() on
`active_c`, but `lm->commit_bitmaps` still has a pointer to those
bitmaps. It feels a bit cleaner to me to not exit this function with
anything in `lm` pointing to invalid data.
(Also note: we don't call deep_clear_commit_bitmaps(), because the
bitmaps themselves are already freed; only the commit_slab of bitmap
pointers still needs clearing.)
`lm->scratch` we could move to `last_modified_{init,release}()` but I
decided to keep the bitmaps together.
>> +
>> + /*
>> + * lm->rev.commits holds the set of boundary commits for our walk.
>
> Naive question: would it be more correct ...
Don't use "more correct", I've been told :-P
> ... to say that `rev.commits` is
> the list of starting commits? Boundary commits sounds like commits on
> the boundary of what we consider interesting/uninteresting which, from
> my understanding, is not the case here.
Why aren't these boundaries? Users can pass uninteresting commits to
this command with `--not aaaaaa` or `aaaaaa..bbbbbb`. So in this context
boundaries are used on both ends, and `queue` and `not_queue` are used
to store those.
>> + *
>> + * Loop through each such commit, and place it in the appropriate queue.
>> + */
>> + for (list = lm->rev.commits; list; list = list->next) {
>> + struct commit *c = list->item;
>> +
>> + if (c->object.flags & BOTTOM) {
>> + prio_queue_put(¬_queue, c);
>
> Ok so commits with the BOTTOM flag are at the boundary of the
> "interesting" commit graph. Thus they are not included in the search and
> added to the "not_queue".
>
>> + c->object.flags |= PARENT2;
>
> What is the meaning behind the name PARENT2 in this context? From my
> understanding we are using this flag to denote a commit we are not
> interested in.
Yes, that's correct. It's used to avoid us adding it to the not_queue
more than once.
>
>> + } else if (!(c->object.flags & PARENT1)) {
>
> Same question about PARENT1. It seems to be used to just denote commits
> that we have already encountered. The names confuse me a bit though.
Naming isn't great, but see my earlier comment.
>> + /*
>> + * If the commit is a starting point (and hasn't been
>> + * seen yet), then initialize the set of interesting
>> + * paths, too.
>> + */
>> + struct bitmap *active;
>> +
>> + prio_queue_put(&queue, c);
>> + c->object.flags |= PARENT1;
>
> We queue the commit and mark it as seen. Makes sense.
>
>> - if (data.commit->object.flags & BOUNDARY) {
>> + active = get_bitmap(lm, c);
>> + for (size_t i = 0; i < lm->all_paths_nr; i++)
>> + bitmap_set(active, i);
>
> Here we set up the path bitmap for the commit. At this point, all paths
> are still "active" and thus set accordingly. I'm still not entirely sure
> though if we really need a path bitmap per commit.
After my explanation above, let me know if you still have questions. I
must admit it took me a while to have it click in my brain too.
>> + }
>> + }
>> +
>> + while (queue.nr) {
>> + int parent_i;
>> + struct commit_list *p;
>> + struct commit *c = prio_queue_get(&queue);
>> + struct bitmap *active_c = get_bitmap(lm, c);
>> +
>> + if ((0 <= max_count && max_count < ++queue_popped) ||
>> + (c->object.flags & PARENT2)) {
>> + /*
>> + * Either a boundary commit, or we have already seen too
>> + * many others. Either way, stop here.
>> + */
>> + c->object.flags |= PARENT2 | BOUNDARY;
>> + data.commit = c;
>> diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
>> - &data.commit->object.oid, "",
>> - &lm->rev.diffopt);
>> + &c->object.oid,
>> + "", &lm->rev.diffopt);
>> diff_flush(&lm->rev.diffopt);
>> + goto cleanup;
>> + }
>>
>> - break;
>> + /*
>> + * Otherwise, make sure that 'c' isn't reachable from anything
>> + * in the '--not' queue.
>> + */
>> + repo_parse_commit(lm->rev.repo, c);
>> +
>> + while (not_queue.nr) {
>> + struct commit_list *np;
>> + struct commit *n = prio_queue_get(¬_queue);
>> +
>> + repo_parse_commit(lm->rev.repo, n);
>> +
>> + for (np = n->parents; np; np = np->next) {
>> + if (!(np->item->object.flags & PARENT2)) {
>> + prio_queue_put(¬_queue, np->item);
>> + np->item->object.flags |= PARENT2;
>> + }
>> + }
>> +
>> + if (commit_graph_generation(n) < commit_graph_generation(c))
>> + break;
>
> If the generation number of 'c' is higher than 'n' we know 'c' cannot be
> an ancestor of 'n' and thus we continue on. Makes sense.
<3
>> }
>>
>> - if (!maybe_changed_path(lm, data.commit))
>> - continue;
>> + /*
>> + * Look at each parent and pass on each path that's TREESAME
>> + * with that parent. Stop early when no active paths remain.
>> + */
>> + for (p = c->parents, parent_i = 0; p; p = p->next, parent_i++) {
>> + process_parent(lm, &queue,
>> + c, active_c,
>> + p->item, parent_i);
>> +
>> + if (bitmap_is_empty(active_c))
>> + break;
>> + }
>>
>> - log_tree_commit(&lm->rev, data.commit);
>> + /*
>> + * Paths that remain active, or not TREESAME with any parent,
>> + * were changed by 'c'.
>> + */
>> + if (!bitmap_is_empty(active_c)) {
>> + data.commit = c;
>> + for (size_t i = 0; i < lm->all_paths_nr; i++) {
>> + if (bitmap_get(active_c, i))
>> + mark_path(lm->all_paths[i], NULL, &data);
>> + }
>> + }
>> +
>> +cleanup:
>> + bitmap_free(active_c);
>> }
>>
>> + if (hashmap_get_size(&lm->paths))
>> + BUG("paths remaining beyond boundary in last-modified");
>> +
>> + clear_prio_queue(¬_queue);
>> + clear_prio_queue(&queue);
>> + clear_commit_bitmaps(&lm->commit_bitmaps);
>> + bitmap_free(lm->scratch);
>> +
>> return 0;
>> }
> [snip]
>
> I've taken an initial look a this patch and have mostly questions so
> far. Just FYI, after applying this patch locally, the last-modified
> tests seem to be failing. The command seems to be segfaulting, but I
> haven't looked into it further.
Yeah, sorry about that. Taylor also noticed. In my final review I
re-added some code I had removed at some point, but didn't run the tests
afterward. He provided a patch to fix it[2] (which I also had in my code
at some point, but reverted).
[2]: https://lore.kernel.org/git/aPGB%2FFJtjDmyNLvG@nand.local/
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-16 20:48 ` D. Ben Knoble
@ 2025-10-17 10:45 ` Toon Claes
0 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-17 10:45 UTC (permalink / raw)
To: D. Ben Knoble; +Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau
"D. Ben Knoble" <ben.knoble@gmail.com> writes:
> On Thu, Oct 16, 2025 at 4:39 AM Toon Claes <toon@iotcl.com> wrote:
>> As an added benefit, this implementation gives more correct results. For
>> example implementation in 'master' gives:
>
> "More correct" is a bit of an oxymoron, no? It's either correct or
> it's not :)
I can rephrase to "more accurate" or something that's better suited.
>> $ git log --max-count=1 --format=%H -- pkt-line.h
>> 15df15fe07ef66b51302bb77e393f3c5502629de
>>
>> $ git last-modified -- pkt-line.h
>> 15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
>>
>> $ git last-modified | grep pkt-line.h
>> 5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
>
> It seems this commit is the merge to a maintenance branch, which was
> authored and committed after the mainline merge but topologically we'd
> probably consider it "earlier," at least starting from master? Anyway,
> I'm not clear why this result was produced.
Yeah, me neither. But it's a nice side effect that this behavior goes
away with this patch.
--
Cheers,
Toon
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-17 6:37 ` Jeff King
@ 2025-10-17 10:47 ` Toon Claes
0 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-17 10:47 UTC (permalink / raw)
To: Jeff King; +Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau
Jeff King <peff@peff.net> writes:
> On Thu, Oct 16, 2025 at 10:39:25AM +0200, Toon Claes wrote:
>
>> + for (i = 0; i < diff_queued_diff.nr; i++) {
>> + struct diff_filepair *fp = diff_queued_diff.queue[i];
>> + size_t k = path_idx(lm, fp->two->path);
>> + if (0 <= k && bitmap_get(active_c, k))
>> + bitmap_set(lm->scratch, k);
>> + diff_free_filepair(fp);
>> + }
>
> Just one little oddity while looking at this versus the old patches from
> Taylor. Here you call diff_free_filepair(). But later...
>
>> + diff_queued_diff.nr = 0;
>> + diff_queue_clear(&diff_queued_diff);
>
> ...you call diff_queue_clear(), which frees the filepairs itself. It
> does the right thing, because you truncate the queue explicitly. But
> would it be simpler to just leave them in place and let the _clear()
> function clean up? I.e., this:
>
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index 40e520ba18..47f2b0ed44 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -315,7 +315,6 @@ static void process_parent(struct last_modified *lm,
> size_t k = path_idx(lm, fp->two->path);
> if (0 <= k && bitmap_get(active_c, k))
> bitmap_set(lm->scratch, k);
> - diff_free_filepair(fp);
> }
> for (i = 0; i < lm->all_paths_nr; i++) {
> if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
> @@ -331,7 +330,6 @@ static void process_parent(struct last_modified *lm,
> }
>
> memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
> - diff_queued_diff.nr = 0;
> diff_queue_clear(&diff_queued_diff);
> }
>
> which feels a lot more idiomatic to me.
Hah, yes. I think I ended up in this situation because initially I was
only trying to fix memory leaks. Thanks, I will include these changes.
--
Cheers,
Toon
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-16 23:38 ` Taylor Blau
2025-10-17 6:30 ` Jeff King
@ 2025-10-17 12:07 ` Toon Claes
2025-10-21 9:04 ` Toon Claes
` (2 more replies)
1 sibling, 3 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-17 12:07 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Karthik Nayak, Justin Tobler, Derrick Stolee
Taylor Blau <me@ttaylorr.com> writes:
> On Thu, Oct 16, 2025 at 10:39:25AM +0200, Toon Claes wrote:
>> [...]
>>
>> To avoid computing many first-parent diffs, add another trick on top of
>> this and check if all paths active in 'c' are DEFINITELY NOT in c's
>> Bloom filter. Since the commit-graph only stores first-parent diffs in
>> the Bloom filters, we can only apply this trick to first-parent diffs.
>
> OK, up to this point this is the same as the commit message that I wrote
> with Stolee back in 2020.
>
>> Comparing the performance of this new algorithm shows about a 2.6x
>> improvement on git.git:
>>
>> Benchmark 1: master
>> Time (mean ± σ): 3.077 s ± 0.055 s [User: 3.017 s, System: 0.051 s]
>> Range (min … max): 2.947 s … 3.127 s 10 runs
>>
>> Benchmark 2: HEAD
>> Time (mean ± σ): 1.181 s ± 0.010 s [User: 1.139 s, System: 0.038 s]
>> Range (min … max): 1.169 s … 1.194 s 10 runs
>>
>> Summary
>> HEAD ran
>> 2.60 ± 0.05 times faster than master
>>
>> But when comparing a more extreme example of
>> `git last-modified -- COPYING t`, the difference is a lot bigger:
>>
>> Benchmark 1: master
>> Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
>> Range (min … max): 4.308 s … 4.509 s 10 runs
>>
>> Benchmark 2: HEAD
>> Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
>> Range (min … max): 810.6 ms … 881.2 ms 10 runs
>>
>> Summary
>> HEAD ran
>> 5.29 ± 0.16 times faster than master
>
> These benchmarks are different than the ones that I provided, which is
> good, since we should be measuring modern Git, not dragging forward
> stale benchmarks ;-).
>
> I imagine that you are just doing a straight last-modified run here in
> both instances. In the original patch, I timed this both with and
> without changed-path Bloom filters, which helped illustrate their impact
> on the changes here.
>
> I'd suggest including those benchmarks as well, and potentially running
> them on linux.git, or another comparably larger open-source repository.
> git.git is large enough to show some interesting behavior, but I always
> have found it useful to compare the results against a larger repository
> as well.
Sure.
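Concretely, something like the following could cover both configurations
(the repository path and the two binary names are placeholders, not
taken from this thread):

```shell
# Hypothetical benchmark recipe; 'git-master' and 'git-patched' stand
# for builds of the two versions under test.
cd /path/to/linux.git

# With changed-path Bloom filters in the commit-graph:
git commit-graph write --reachable --changed-paths
hyperfine --warmup 3 'git-master last-modified' 'git-patched last-modified'

# Without them, rewrite the graph omitting the filters:
git commit-graph write --reachable --no-changed-paths
hyperfine --warmup 3 'git-master last-modified' 'git-patched last-modified'
```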
>> As an added benefit, this implementation gives more correct results. For
>> example implementation in 'master' gives:
>>
>> $ git log --max-count=1 --format=%H -- pkt-line.h
>> 15df15fe07ef66b51302bb77e393f3c5502629de
>>
>> $ git last-modified -- pkt-line.h
>> 15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
>>
>> $ git last-modified | grep pkt-line.h
>> 5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
>>
>> With the changes in this patch the results of git-last-modified(1)
>> always match those of `git log --max-count=1`.
>>
>> One thing to note though: the results might be output in a different
>> order than before. This is not considered an issue, because the order
>> is nowhere documented as guaranteed.
>>
>> Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
>
> Stolee and I wrote these patches together many years ago, so he should
> be credited here as well. Since this patch appears to be substantially
> based on the original work, I think it is appropriate to include my
> S-o-b once the patch is ready.
I'm very grateful you (GitHub) have written and shared these patches, so
I'm happy to give proper attribution.
>> ---
>> This series revives those changes. I did more thorough deep dive through
>> the code and the algorithm and got the code working a lot faster. The
>> benchmark results can be found in the commit message.
>>
>> Some changes compared to GitHub's version include:
>>
>> * Use of `struct bitmap` from "ewah/ewok.h", instead of self-defined
>> `struct commit_active_paths`.
>>
>> * Removed shortcut code that handled the case when commit and parent
>> are fully treesame, and instead always checked 'active_c' whether the
>> next parent is worth looking at.
>>
>> * Modified comments and commit message to make the algorithm more
>> clear (at least to me).
>>
>> * Mentioned the use of PARENT1 and PARENT2 in object.h.
>>
>> * Removed the use of any global variables.
>>
>> * Less conditions are checked in mark_path() because the hashmap of
>> 'paths' is considered the single-source of truth.
>>
>> * pass_to_parent() no longer passes a path on when it isn't in the
>> 'paths' hashmap.
>
> Thanks for clearly showing what the changes on top are. When I applied
> this locally and ran it, it pretty quickly segfaulted for me:
>
> expecting success of 8020.3 'last-modified non-recursive':
> check_last_modified <<-\EOF
> 3 a
> 1 file
> EOF
>
> + check_last_modified
> + local indir=
> + test 0 != 0
> + cat
> + git last-modified
> Segmentation fault
> error: last command exited with $?=139
> not ok 3 - last-modified non-recursive
> #
> # check_last_modified <<-\EOF
> # 3 a
> # 1 file
> # EOF
> #
> 1..3
>
> Looking through the backtrace, it looks like someone is calling
> mark_path() with a NULL oid, like so:
>
> (gdb) bt
> #0 __memcmp_evex_movbe ()
> at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:132
> #1 0x00005555555f2c32 in oideq (oid1=0x0, oid2=0x555555a5eeb0)
> at ./hash.h:408
> #2 0x00005555555f3523 in mark_path (path=0x555555a5eee8 "a", oid=0x0,
> data=0x7fffffffd650) at builtin/last-modified.c:179
>
> , which makes sense, since at the end of the main loop we call
> mark_path() on all remaining active paths to indicate that they were
> modified by whatever commit we just popped off the queue.
>
> Something like this on top (which matches the original patch that I sent
> from GitHub's fork) fixes the tests:
>
> --- 8< ---
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index 40e520ba18..c8f66633a7 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -176,7 +176,7 @@ static void mark_path(const char *path, const struct object_id *oid,
> * Is it arriving at a version of interest, or is it from a side branch
> * which did not contribute to the final state?
> */
> - if (!oideq(oid, &ent->oid))
> + if (oid && !oideq(oid, &ent->oid))
> return;
>
> last_modified_emit(data->lm, path, data->commit);
> --- >8 ---
Sorry for this oopsie. Right before I sent this patch, I had a version
that removed the oideq() condition. When going over the final diff I
noticed that and decided to undo it, but didn't run the tests again.
Thanks for the patch, because I know I had that check at one point as
well.
But this makes me wonder, is there even any value in keeping `oid` on
`struct last_modified_entry`? I'll have a deeper look into this.
>> +/* Remember to update object flag allocation in object.h */
>> +#define PARENT1 (1u<<16) /* used instead of SEEN */
>> +#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
>> +
>> struct last_modified_entry {
>> struct hashmap_entry hashent;
>> struct object_id oid;
>> struct bloom_key key;
>> + size_t diff_idx;
>> const char path[FLEX_ARRAY];
>> };
>>
>> @@ -37,13 +43,35 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
>> return strcmp(ent1->path, path ? path : ent2->path);
>> }
>>
>> +/*
>> + * Hold a bitmap for each commit we're working with. Each bit represents a path
>> + * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
>> + */
>> +define_commit_slab(commit_bitmaps, struct bitmap *);
>> +
>
> Nice, I am glad to see that we are using a bitmap here rather than the
> hacky 'char *' that we had originally written. I seem to remember that
> there was a tiny slow-down when using bitmaps, but can't find the
> discussion anymore. (It wasn't in the internal PR that I originally
> opened, and I no longer can read messages that far back in history.)
>
> It might be worth benchmarking here to see if using a 'char *' is
> faster. Of course, that's 8x worse in terms of memory usage, but not a
> huge deal given both the magnitude and typical number of directory
> elements (you'd need 1024^2 entries in a single tree to occupy even a
> single MiB of heap).
Okay, I can give it a try.
> Regardless of how you handle the above, I think that the commit slab
> name here is a little generic. I guess it's OK since this is only
> visible within this compilation unit, but perhaps something like
> "active_paths_bitmap" would be more descriptive.
I struggled a lot naming this thing, so I'm open to suggestions.
> Likewise, I wonder if we should have elemtype here be just 'struct
> bitmap'. Unfortunately I don't think the EWAH code has a function like:
>
> void bitmap_init(struct bitmap *);
>
> and only has ones that allocate for us. So we may consider adding one,
> or creating a dummy bitmap and copying its contents, or otherwise.
>
>> struct last_modified {
>> struct hashmap paths;
>> struct rev_info rev;
>> bool recursive;
>> bool show_trees;
>> +
>> + const char **all_paths;
>> + size_t all_paths_nr;
>
> I wonder if all_paths should be a strvec here? I think that this code
> was all written when the type was called argv_array (hilariously, that
> change took place towards the end of July, 2020, and the --go-faster
> code where this patch came from was written just a couple of weeks
> earlier.)
Ahha, that might be a good idea. It might even allow us to get rid of
the hashmap, so we would stop storing the paths twice. I'm not sure what
the impact on performance would be, though, because the hashmap is
currently valuable for path_idx() lookups.
>> @@ -196,7 +226,36 @@ static void last_modified_diff(struct diff_queue_struct *q,
>> }
>> }
>>
>> -static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
>> +static size_t path_idx(struct last_modified *lm, char *path)
>> +{
>> + struct last_modified_entry *ent;
>> + ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
>> + struct last_modified_entry, hashent);
>> +
>> + return ent ? ent->diff_idx : -1;
>> +}
>> +
>> +static void pass_to_parent(struct last_modified *lm,
>> + struct bitmap *c,
>> + struct bitmap *p,
>> + size_t pos)
>> +{
>> + struct last_modified_entry *ent;
>> + struct hashmap_iter iter;
>> +
>> + bitmap_unset(c, pos);
>> +
>> + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
>> + if (ent->diff_idx == pos) {
>> + bitmap_set(p, pos);
>> + break;
>> + }
>> + }
>> +}
>
> This one I'm not quite following. The original implementation does
> something like:
>
> c->active[i] = 0;
> c->nr--;
> p->active[i] = 1;
> p->nr++;
>
> , where 'i' is an index into the all_paths array. It looks like you are
> effectively doing the first part of that with the bitmap_unset() call,
> but I'm confused why you're iterating over paths here.
>
> The caller in process_parents() is iterating over all entries that are
> treesame to the parent, bit by bit. So I think you can just bitmap_set()
> in the parent directly here, but let me know if I am missing something.
It was an attempt on my side to optimize things further. I was thinking
a path could already have been associated with another commit. But now
that you've brought this up, I no longer think that is possible.
A bit can only be set in two places:
* Initially when a commit is added from lm->rev.commits. We can only
have one commit in there, because we count `num_interesting` in
populate_paths_from_revs() and abort if it's more than 1.
* Here in pass_to_parent(). A commit only passes a bit to one parent.
So I'll remove the extra guard in the next version.
>> @@ -220,42 +282,197 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
>> return false;
>> }
>>
>> +static void process_parent(struct last_modified *lm,
>> + struct prio_queue *queue,
>> + struct commit *c, struct bitmap *active_c,
>> + struct commit *parent, int parent_i)
>> +{
>> + size_t i;
>> + struct bitmap *active_p;
>> +
>> + repo_parse_commit(lm->rev.repo, parent);
>> + active_p = get_bitmap(lm, parent);
>> +
>> + /*
>> + * The first time entering this function for this commit (i.e. first parent)
>> + * see if Bloom filters will tell us it's worth to do the diff.
>> + */
>> + if (parent_i || maybe_changed_path(lm, c, active_c)) {
>> + diff_tree_oid(&parent->object.oid,
>> + &c->object.oid, "", &lm->rev.diffopt);
>> + diffcore_std(&lm->rev.diffopt);
>> + }
>> +
>> + /*
>> + * Otherwise, test each path for TREESAME-ness against the parent. If
>
> This "otherwise" is referencing a piece of the patch that doesn't appear
> to be here directly, which is how we handle the special case of having
> nothing in the diff queue, meaning we are treesame at the root.
True, I shall drop the "otherwise" and rephrase further if needed.
> In the GitHub version of this patch, we pass all active paths to the
> parent, assign the PARENT1 flag if it doesn't already have it, and put
> it in the queue as well.
>
> In your version, we'd skip past the next for-loop, and do the same
> pass-to-parent dance below, along with inserting the parent into the
> prio queue.
This is the "shortcut" I'm mentioning in my cover letter. In my testing
it seemed it didn't provide any performance gains to keep it. I consider
less code better code, so I left it out.
> So I think that this is all functionally equivalent, but I had to work
> through a little bit of the details here, mostly since I haven't looked
> at or thought about this code in many years ;-).
>
>> static int last_modified_run(struct last_modified *lm)
>> {
>> + int max_count, queue_popped = 0;
>> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
>> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
>> + struct commit_list *list;
>> struct last_modified_callback_data data = { .lm = lm };
>>
>> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
>> lm->rev.diffopt.format_callback = last_modified_diff;
>> lm->rev.diffopt.format_callback_data = &data;
>> + lm->rev.no_walk = 1;
>
> This one is new relative to the original patch. Why set no_walk here?
This comes from
https://github.com/ttaylorr/git/commit/e8ea49705873d28f64b815bd00d14bdf6d48ca4d
Well, it basically squashes various commits together. There are various
commits doing different things here. I don't think it's valuable for
anyone to see the full history of the iterations at GitHub, that's why I
squashed it in.
Would you consider it better to not set `no_walk`?
>> prepare_revision_walk(&lm->rev);
>>
>> - while (hashmap_get_size(&lm->paths)) {
>> - data.commit = get_revision(&lm->rev);
>> - if (!data.commit)
>> - BUG("paths remaining beyond boundary in last-modified");
>> + max_count = lm->rev.max_count;
>> +
>> + init_commit_bitmaps(&lm->commit_bitmaps);
>> + lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
>> +
>> + /*
>> + * lm->rev.commits holds the set of boundary commits for our walk.
>> + *
>> + * Loop through each such commit, and place it in the appropriate queue.
>> + */
>> + for (list = lm->rev.commits; list; list = list->next) {
>
> Hmm. In the original patch, we look at rev.pending, not rev.commits. The
> rest of the patch looks good to me and looks like a faithful
> representation of the original patch from GitHub's fork. Thanks for
> working on this and making the new last-modified builtin faster ;-).
Euh, interesting. I'll look into it.
Thanks for the support!
--
Cheers,
Toon
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-17 6:30 ` Jeff King
@ 2025-10-17 14:54 ` Taylor Blau
2025-10-21 8:20 ` Jeff King
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-10-17 14:54 UTC (permalink / raw)
To: Jeff King; +Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, Derrick Stolee
On Fri, Oct 17, 2025 at 02:30:39AM -0400, Jeff King wrote:
> On Thu, Oct 16, 2025 at 07:38:36PM -0400, Taylor Blau wrote:
>
> > Looking through the backtrace, it looks like someone is calling
> > mark_path() with a NULL oid, like so:
> >
> > (gdb) bt
> > #0 __memcmp_evex_movbe ()
> > at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:132
> > #1 0x00005555555f2c32 in oideq (oid1=0x0, oid2=0x555555a5eeb0)
> > at ./hash.h:408
> > #2 0x00005555555f3523 in mark_path (path=0x555555a5eee8 "a", oid=0x0,
> > data=0x7fffffffd650) at builtin/last-modified.c:179
> >
> > , which makes sense, since at the end of the main loop we call
> > mark_path() on all remaining active paths to indicate that they were
> > modified by whatever commit we just popped off the queue.
>
> Hmm, sounds like the mark_path() discussion from:
>
> https://lore.kernel.org/git/aHmPHcNQYlhGo8JB@nand.local/
>
> coming home to roost. I'm sure you already knew that, but there's maybe
> an interesting process observation here: in pulling a battle-tested
> implementation apart into patches to be applied in chunks, we ended up
> missing a critical part of that original implementation and getting a
> bug.
Hmm. Is that what happened in this case, though?
In GitHub's version of this code, mark_path() didn't have the NULL-ness
check on 'oid' until we added the "--go-faster" mode, which is where
this patch is derived from. Looking at the original changes from
GitHub's side:
--- 8< ---
diff --git a/blame-tree.c b/blame-tree.c
--- a/blame-tree.c
+++ b/blame-tree.c
@@ -119,28 +142,38 @@
static void mark_path(const char *path, const struct object_id *oid,
struct blame_tree_callback_data *data)
{
struct blame_tree_entry *ent;
+ struct commit_active_paths *active;
/* Is it even a path that we are interested in? */
ent = hashmap_get_entry_from_hash(data->paths, strhash(path), path,
struct blame_tree_entry, hashent);
if (!ent)
return;
/* Have we already blamed a commit? */
if (ent->commit)
return;
+
+ /* Are we inactive on the current commit? */
+ if (data->go_faster) {
+ active = active_paths_at(&active_paths, data->commit);
+ if (active && active->active &&
+ !active->active[ent->diff_idx])
+ return;
+ }
+
/*
* Is it arriving at a version of interest, or is it from a side branch
* which did not contribute to the final state?
*/
- if (oidcmp(oid, &ent->oid))
+ if (oid && oidcmp(oid, &ent->oid))
return;
ent->commit = data->commit;
data->num_interesting--;
if (data->callback)
data->callback(path, data->commit, data->callback_data);
hashmap_remove(data->paths, &ent->hashent, path);
}
--- >8 ---
, where the above was generated with:
$ git log -1 --oneline 0603f6d9c3c040c914c1412fab972252c4a765c4 \
-L:mark_path:blame-tree.c
(in this case, 0603f6d9c3 is the hash of the commit that originally
introduced these changes on the GitHub side).
So I don't think that it's the case that we somehow missed this portion
of the changes when pulling the series apart, but rather that the check
was added later on, and not correctly pulled into the version that was
submitted here.
I was wondering if perhaps I had made an error when pulling these
patches out of GitHub's fork, but even in my b0ae8b3cc0 (blame-tree:
introduce '--go-faster' mode, 2025-03-27) from my fork, you can see the
same diff in mark_path() as above.
> It's not like we didn't know that was a risk, of course, and the payoff
> was getting a fresh look at the patches (to improve them and maybe even
> fix latent bugs). So it's probably something to just live with. But I
> wonder if/how we could mitigate that risk. When I reorganize patches in
> a tricky way locally, I often eyeball the diff of the end states
> (whatever mess I had originally, versus the result of the "clean"
> version), and that might have shown the omission here.
>
> I'm not sure if that would have helped here or not. The "end state" of
> the battle-tested version is really GitHub's internal fork. But maybe
> your original patches extracted from that (tb/blame-tree in your fork, I
> think) applied on top of the same base point (e.g., the current tip of
> master) might be an interesting comparison? Or maybe not. The earlier
> rounds have may have had other adjustments which introduce a bunch of
> noise.
I share your feeling here in general, but I think in this particular
case the patches were pulled out correctly (at least with respect to the
changes here in mark_path()), and that check was simply dropped or not
properly carried over when the patch we're discussing here was written.
> > Nice, I am glad to see that we are using a bitmap here rather than the
> > hacky 'char *' that we had originally written. I seem to remember that
> > there was a tiny slow-down when using bitmaps, but can't find the
> > discussion anymore. (It wasn't in the internal PR that I originally
> > opened, and I no longer can read messages that far back in history.)
> >
> > It might be worth benchmarking here to see if using a 'char *' is
> > faster. Of course, that's 8x worse in terms of memory usage, but not a
> > huge deal given both the magnitude and typical number of directory
> > elements (you'd need 1024^2 entries in a single tree to occupy even a
> > single MiB of heap).
>
> I doubt the memory usage matters too much. We throw away each bitmap
> after we finish processing its associated commit, so our max memory is
> really the size of the bitmap/char array times the size of the queue (so
> effectively the width of the history graph). So yeah, I too would be
> curious if the performance is actually better with chars.
>
> I also wonder how often we pass an unchanged bitmap to our parents
> (e.g., for the common case that a commit has a single parent, and does
> not touch any of the active paths, the active set will be the same for
> both). There's probably an easy-ish optimization to avoid allocating a
> new bitmap, and to just transfer ownership via pointer.
Funny enough, while we don't have this optimization in the original
version of this code, we did handle being TREESAME at the root tree as a
special case in the original blame-tree.c code. Toon dropped that change
here which I commented on earlier, but that would be a good opportunity
to optimize this case.
I don't think we ever bothered to measure how often we were able to just
pass all active path(s) up to the parent, probably because the original
code didn't actually use the optimization you're talking about here, and
instead did:
if (!diff_queued_diff.nr) {
for (i = 0; i < bt->all_paths_nr; i++) {
if (active_c->active[i])
pass_to_parent(active_c, active_p, i);
}
if (!(parent->object.flags & PARENT1)) {
parent->object.flags |= PARENT1;
prio_queue_put(queue, parent);
ret = 1;
goto cleanup;
}
}
, so that case was just handled specially, but not optimized. But I
don't know that you can just pass the bitmap up directly, since the
parent may already have some bits set if we reached it along some
different path.
I thought that we had to AND NOT out the bits in lm->scratch here, but
those are only set for non-TREESAME paths, so lm->scratch is going to be
all zeros in that case.
I think you could reasonably do something like the following on top of
Toon's patch, though:
--- 8< ---
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index 40e520ba18..1a9ab3b2b0 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -303,6 +303,18 @@ static void process_parent(struct last_modified *lm,
diffcore_std(&lm->rev.diffopt);
}
+ if (!diff_queued_diff.nr) {
+ bitmap_or(active_p, active_c);
+ for (i = 0; i < active_c->word_alloc; i++)
+ active_c->words[i] = 0;
+
+ if (!(parent->object.flags & PARENT1)) {
+ parent->object.flags |= PARENT1;
+ prio_queue_put(queue, parent);
+ }
+ goto cleanup;
+ }
+
/*
* Otherwise, test each path for TREESAME-ness against the parent. If
* a path is TREESAME, pass it on to this parent.
@@ -330,6 +342,7 @@ static void process_parent(struct last_modified *lm,
prio_queue_put(queue, parent);
}
+cleanup:
memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
diff_queued_diff.nr = 0;
diff_queue_clear(&diff_queued_diff);
--- >8 ---
> > Likewise, I wonder if we should have elemtype here be just 'struct
> > bitmap'. Unfortunately I don't think the EWAH code has a function like:
> >
> > void bitmap_init(struct bitmap *);
> >
> > and only has ones that allocate for us. So we may consider adding one,
> > or creating a dummy bitmap and copying its contents, or otherwise.
>
> I thought that, too, though it does change the max memory use a bit.
> Right now we are storing one pointer per commit (the "struct bitmap *")
> and that is true whether we have processed the commit or not (it is
> populated while the commit is in the queue, and then NULL after). If we
> stored the struct directly, that's twice as many bytes (the eword_t
> pointer, plus a size_t), and it's per commit.
Mmm, good point. I wrote this thinking that the commit_slab was going to
end up a little gross under this patch, with the slab itself having type
'struct bitmap ***', but I agree with everything you wrote here.
> PS I tried building tb/blame-tree from your repo because I was poking at
> how some of it worked (having forgotten everything I ever knew about
> it by this point). It does work, but needs this:
>
> diff --git a/blame-tree.c b/blame-tree.c
> index 6addac7b0b..2448f2caf4 100644
> --- a/blame-tree.c
> +++ b/blame-tree.c
> @@ -800,7 +800,6 @@ static int process_parent(struct blame_tree *bt,
> int k = diff2idx(bt, fp->two->path);
> if (0 <= k && active_c->active[k])
> scratch[k] = 1;
> - diff_free_filepair(fp);
> }
> for (i = 0; i < bt->all_paths_nr; i++) {
> if (active_c->active[i] && !scratch[i])
>
> on top, since otherwise we try to double-free the filepairs. I'd guess
> it is a victim of rebasing across a5aecb2cdc (diff: improve lifecycle
> management of diff queues, 2024-09-30), which swapped out
> DIFF_QUEUE_CLEAR(), which left freeing the responsibility of the
> caller, for diff_queue_clear() which handles that itself.
Ah, good catch. When I pulled those patches out a while ago, I think I
wrote something like, "this should more or less work, but doesn't, and
I'll leave it as an exercise to the reader to figure out why ;-)."
Thanks,
Taylor
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-17 14:54 ` Taylor Blau
@ 2025-10-21 8:20 ` Jeff King
0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2025-10-21 8:20 UTC (permalink / raw)
To: Taylor Blau; +Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, Derrick Stolee
On Fri, Oct 17, 2025 at 10:54:53AM -0400, Taylor Blau wrote:
> > Hmm, sounds like the mark_path() discussion from:
> >
> > https://lore.kernel.org/git/aHmPHcNQYlhGo8JB@nand.local/
> >
> > coming home to roost. I'm sure you already knew that, but there's maybe
> > an interesting process observation here: in pulling a battle-tested
> > implementation apart into patches to be applied in chunks, we ended up
> > missing a critical part of that original implementation and getting a
> > bug.
>
> Hmm. Is that what happened in this case, though?
> [...]
> I was wondering if perhaps I had made an error when pulling these
> patches out of GitHub's fork, but even in my b0ae8b3cc0 (blame-tree:
> introduce '--go-faster' mode, 2025-03-27) from my fork, you can see the
> same diff in mark_path() as above.
Yeah, I think the patches in your fork are correct, and it got lost in
Toon's rewrite. It is probably naive to think we could diff the endpoint
(your fork vs Toon's patches) to find such changes, though. There have
been too many other cleanups and changes as it was upstreamed. So you
can ignore most of my other email as philosophical musing.
> [...some more clever optimizations...]
All of that looked plausibly correct to me. ;) I'll leave it to Toon to
experiment with it for correctness and performance improvements.
-Peff
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-17 12:07 ` Toon Claes
@ 2025-10-21 9:04 ` Toon Claes
2025-10-23 23:59 ` Taylor Blau
2025-10-21 13:00 ` Toon Claes
2025-10-23 23:56 ` Taylor Blau
2 siblings, 1 reply; 39+ messages in thread
From: Toon Claes @ 2025-10-21 9:04 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Karthik Nayak, Justin Tobler, Derrick Stolee, Jeff King
> Taylor Blau <me@ttaylorr.com> writes:
>> Nice, I am glad to see that we are using a bitmap here rather than the
>> hacky 'char *' that we had originally written. I seem to remember that
>> there was a tiny slow-down when using bitmaps, but can't find the
>> discussion anymore. (It wasn't in the internal PR that I originally
>> opened, and I no longer can read messages that far back in history.)
>>
>> It might be worth benchmarking here to see if using a 'char *' is
>> faster. Of course, that's 8x worse in terms of memory usage, but not a
>> huge deal given both the magnitude and typical number of directory
>> elements (you'd need 1024^2 entries in a single tree to occupy even a
>> single MiB of heap).
Using ewah bitmaps is slightly faster, although the difference is almost
negligible.
Benchmark 1: bitmap-ewah
Time (mean ± σ): 793.1 ms ± 6.2 ms [User: 755.1 ms, System: 35.2 ms]
Range (min … max): 784.7 ms … 804.8 ms 10 runs
Benchmark 2: bitmap-chars
Time (mean ± σ): 808.9 ms ± 11.2 ms [User: 770.8 ms, System: 35.4 ms]
Range (min … max): 800.2 ms … 830.5 ms 10 runs
Summary
bitmap-ewah ran
1.02 ± 0.02 times faster than bitmap-chars
And since ewah bitmaps are also more memory efficient, it makes more
sense to keep using them.
>> Likewise, I wonder if we should have elemtype here be just 'struct
>> bitmap'. Unfortunately I don't think the EWAH code has a function like:
>>
>> void bitmap_init(struct bitmap *);
>>
>> and only has ones that allocate for us. So we may consider adding one,
>> or creating a dummy bitmap and copying its contents, or otherwise.
I've done some testing, and to do so I've made bitmap_grow() public.
Benchmark 1: bitmap-as-pointers
Time (mean ± σ): 783.7 ms ± 8.9 ms [User: 744.1 ms, System: 37.5 ms]
Range (min … max): 774.4 ms … 803.4 ms 10 runs
Benchmark 2: bitmap-as-values
Time (mean ± σ): 856.7 ms ± 10.5 ms [User: 816.0 ms, System: 38.1 ms]
Range (min … max): 845.7 ms … 872.5 ms 10 runs
Summary
bitmap-as-pointers ran
1.09 ± 0.02 times faster than bitmap-as-values
It seems using ewah bitmaps as pointers is faster than using bitmaps as
values. I must admit I'm surprised as well, but in case you want to
double check, here's the patch:
------------------------ >8 ------------------------
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index c1316e1019..f607c47506 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -47,7 +47,7 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
* Hold a bitmap for each commit we're working with. Each bit represents a path
* in `lm->all_paths`. Active bit means the path still needs to be dealt with.
*/
-define_commit_slab(commit_bitmaps, struct bitmap *);
+define_commit_slab(commit_bitmaps, struct bitmap);
struct last_modified {
struct hashmap paths;
@@ -65,11 +65,12 @@ struct last_modified {
static struct bitmap *get_bitmap(struct last_modified *lm, struct commit *c)
{
- struct bitmap **bitmap = commit_bitmaps_at(&lm->commit_bitmaps, c);
- if (!*bitmap)
- *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD);
+ struct bitmap *bm = commit_bitmaps_at(&lm->commit_bitmaps, c);
+ if (!bm->word_alloc) {
+ bitmap_grow(bm, lm->all_paths_nr);
+ }
- return *bitmap;
+ return bm;
}
static void last_modified_release(struct last_modified *lm)
@@ -442,7 +443,8 @@ static int last_modified_run(struct last_modified *lm)
}
cleanup:
- bitmap_free(active_c);
+ free(active_c->words);
+ active_c->word_alloc = 0;
}
if (hashmap_get_size(&lm->paths))
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 55928dada8..2500e3a0d7 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -42,7 +42,7 @@ struct bitmap *bitmap_dup(const struct bitmap *src)
return dst;
}
-static void bitmap_grow(struct bitmap *self, size_t word_alloc)
+void bitmap_grow(struct bitmap *self, size_t word_alloc)
{
size_t old_size = self->word_alloc;
ALLOC_GROW(self->words, word_alloc, self->word_alloc);
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c29d354236..3316807572 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -188,6 +188,7 @@ struct bitmap *bitmap_word_alloc(size_t word_alloc);
struct bitmap *bitmap_dup(const struct bitmap *src);
void bitmap_set(struct bitmap *self, size_t pos);
void bitmap_unset(struct bitmap *self, size_t pos);
+void bitmap_grow(struct bitmap *self, size_t word_alloc);
int bitmap_get(struct bitmap *self, size_t pos);
void bitmap_free(struct bitmap *self);
int bitmap_equals(struct bitmap *self, struct bitmap *other);
--
Cheers,
Toon
* [PATCH v2] last-modified: implement faster algorithm
2025-10-16 8:39 [PATCH] last-modified: implement faster algorithm Toon Claes
` (3 preceding siblings ...)
2025-10-17 6:37 ` Jeff King
@ 2025-10-21 12:56 ` Toon Claes
2025-10-21 17:52 ` Junio C Hamano
` (2 more replies)
4 siblings, 3 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-21 12:56 UTC (permalink / raw)
To: git
Cc: Karthik Nayak, Justin Tobler, Taylor Blau, D. Ben Knoble,
Derrick Stolee, Toon Claes
The current implementation of git-last-modified(1) works by doing a
revision walk, and inspecting the diff at each level of that walk to
annotate entries remaining in the hashmap of paths. In other words, if
the diff at some level touches a path which has not yet been associated
with a commit, then that commit becomes associated with the path.
While a perfectly reasonable implementation, it can perform poorly in
either one of two scenarios:
1. There are many entries of interest, in which case there is simply
a lot of work to do.
2. Or, there are (even a few) entries which have not been updated in a
long time, and so we must walk through a lot of history in order to
find a commit that touches that path.
This patch rewrites the last-modified implementation to address the
second point. The idea behind the algorithm is to propagate a set of
'active' paths (a path is 'active' if it does not yet belong to a
commit) up to parents and do a truncated revision walk.
The walk is truncated because it does not produce a revision for every
change in the original pathspec, but rather only for active paths.
More specifically, consider a priority queue of commits sorted by
generation number. First, enqueue the set of boundary commits with all
paths in the original spec marked as interesting.
Then, while the queue is not empty, do the following:
1. Pop an element, say, 'c', off of the queue, making sure that 'c'
isn't reachable by anything in the '--not' set.
2. For each parent 'p' (with index 'parent_i') of 'c', do the
following:
a. Compute the diff between 'c' and 'p'.
b. Pass any active paths that are TREESAME from 'c' to 'p'.
c. If 'p' has any active paths, push it onto the queue.
3. Any path that remains active on 'c' is associated to that commit.
This ends up being equivalent to doing something like 'git log -1 --
$path' for each path simultaneously. But, it allows us to go much faster
than the original implementation by limiting the number of diffs we
compute, since we can avoid parts of history that would have been
considered by the revision walk in the original implementation, but are
known to be uninteresting to us because we have already marked all paths
in that area to be inactive.
To avoid computing many first-parent diffs, add another trick on top of
this and check if all paths active in 'c' are DEFINITELY NOT in c's
Bloom filter. Since the commit-graph only stores first-parent diffs in
the Bloom filters, we can only apply this trick to first-parent diffs.
Comparing the performance of this new algorithm shows about a 2.5x
improvement on git.git:
Benchmark 1: master no bloom
Time (mean ± σ): 2.868 s ± 0.023 s [User: 2.811 s, System: 0.051 s]
Range (min … max): 2.847 s … 2.926 s 10 runs
Benchmark 2: master with bloom
Time (mean ± σ): 949.9 ms ± 15.2 ms [User: 907.6 ms, System: 39.5 ms]
Range (min … max): 933.3 ms … 971.2 ms 10 runs
Benchmark 3: HEAD no bloom
Time (mean ± σ): 782.0 ms ± 6.3 ms [User: 740.7 ms, System: 39.2 ms]
Range (min … max): 776.4 ms … 798.2 ms 10 runs
Benchmark 4: HEAD with bloom
Time (mean ± σ): 307.1 ms ± 1.7 ms [User: 276.4 ms, System: 29.9 ms]
Range (min … max): 303.7 ms … 309.5 ms 10 runs
Summary
HEAD with bloom ran
2.55 ± 0.02 times faster than HEAD no bloom
3.09 ± 0.05 times faster than master with bloom
9.34 ± 0.09 times faster than master no bloom
In short, the existing implementation is comparably fast *with* Bloom
filters as the new implementation is *without* Bloom filters. So, most
repositories should get a dramatic speed-up by just deploying this (even
without computing Bloom filters), and all repositories should get faster
still when computing Bloom filters.
When comparing a more extreme example, `git last-modified -- COPYING t`,
the speed-up grows to more than 5x:
Benchmark 1: master
Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
Range (min … max): 4.308 s … 4.509 s 10 runs
Benchmark 2: HEAD
Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
Range (min … max): 810.6 ms … 881.2 ms 10 runs
Summary
HEAD ran
5.29 ± 0.16 times faster than master
As an added benefit, results are more consistent now. For example, the
implementation in 'master' gives:
$ git log --max-count=1 --format=%H -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de
$ git last-modified -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
$ git last-modified | grep pkt-line.h
5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
With the changes in this patch the results of git-last-modified(1)
always match those of `git log --max-count=1`.
One thing to note, though: the results might be output in a different
order than before. This is not considered an issue because the order is
nowhere documented to be guaranteed.
Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
The subcommand git-last-modified(1) was based on the patches shared by
Taylor and the folks at GitHub[1]. That version used an alternative
implementation to make it "go faster". When I was working on upstreaming
those patches, I dropped the patches[2] for this implementation, because
I didn't see significant improvements.
This series revives those changes. I did a more thorough deep dive into
the code and the algorithm and got the code working a lot faster. The
benchmark results can be found in the commit message.
Some changes compared to GitHub's version include:
* Use of `struct bitmap` from "ewah/ewok.h", instead of self-defined
`struct commit_active_paths`.
* Removed shortcut code that handled the case when commit and parent
are fully treesame; instead, 'active_c' is always checked to decide
whether the next parent is worth looking at.
* Modified comments and commit message to make the algorithm more
clear (at least to me).
* Mentioned the use of PARENT1 and PARENT2 in object.h.
* Removed the use of any global variables.
* Fewer conditions are checked in mark_path() because the hashmap of
'paths' is considered the single source of truth.
* pass_to_parent() no longer passes on a path that isn't in the 'paths'
hashmap.
[1]: https://lore.kernel.org/git/Z+XJ+1L3PnC9Dyba@nand.local/
[2]: https://lore.kernel.org/git/20250630-toon-new-blame-tree-v3-0-3516025dc3bc@iotcl.com/
---
Changes in v2:
- Add benchmark results comparing repositories with and without Bloom
filters in commit message.
- Add Stolee and Taylor in the commit message trailers.
- Fix segfault by checking if 'oid' is set in mark_path().
- Remove hashmap lookup in pass_to_parent().
- Remove manually calling diff_free_filepair().
- Rename commit slab bitmap to "active_paths_bitmap".
- Link to v1: https://lore.kernel.org/r/20251016-b4-toon-last-modified-faster-v1-1-85dca8a29e5c@iotcl.com
Range-diff versus v1
1: f7d169e846 ! 1: e2aa8e1fa0 last-modified: implement faster algorithm
@@ Commit message
Bloom filter. Since the commit-graph only stores first-parent diffs in
the Bloom filters, we can only apply this trick to first-parent diffs.
- Comparing the performance of this new algorithm shows about a 2.6x
+ Comparing the performance of this new algorithm shows about a 2.5x
improvement on git.git:
- Benchmark 1: master
- Time (mean ± σ): 3.077 s ± 0.055 s [User: 3.017 s, System: 0.051 s]
- Range (min … max): 2.947 s … 3.127 s 10 runs
+ Benchmark 1: master no bloom
+ Time (mean ± σ): 2.868 s ± 0.023 s [User: 2.811 s, System: 0.051 s]
+ Range (min … max): 2.847 s … 2.926 s 10 runs
- Benchmark 2: HEAD
- Time (mean ± σ): 1.181 s ± 0.010 s [User: 1.139 s, System: 0.038 s]
- Range (min … max): 1.169 s … 1.194 s 10 runs
+ Benchmark 2: master with bloom
+ Time (mean ± σ): 949.9 ms ± 15.2 ms [User: 907.6 ms, System: 39.5 ms]
+ Range (min … max): 933.3 ms … 971.2 ms 10 runs
+
+ Benchmark 3: HEAD no bloom
+ Time (mean ± σ): 782.0 ms ± 6.3 ms [User: 740.7 ms, System: 39.2 ms]
+ Range (min … max): 776.4 ms … 798.2 ms 10 runs
+
+ Benchmark 4: HEAD with bloom
+ Time (mean ± σ): 307.1 ms ± 1.7 ms [User: 276.4 ms, System: 29.9 ms]
+ Range (min … max): 303.7 ms … 309.5 ms 10 runs
Summary
- HEAD ran
- 2.60 ± 0.05 times faster than master
+ HEAD with bloom ran
+ 2.55 ± 0.02 times faster than HEAD no bloom
+ 3.09 ± 0.05 times faster than master with bloom
+ 9.34 ± 0.09 times faster than master no bloom
- But when comparing a more extreme example of
- `git last-modified -- COPYING t`, the difference is a lot bigger:
+ In short, the existing implementation is comparably fast *with* Bloom
+ filters as the new implementation is *without* Bloom filters. So, most
+ repositories should get a dramatic speed-up by just deploying this (even
+ without computing Bloom filters), and all repositories should get faster
+ still when computing Bloom filters.
+
+ When comparing a more extreme example of
+ `git last-modified -- COPYING t`, the difference is even 5 times better:
Benchmark 1: master
Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
@@ Commit message
HEAD ran
5.29 ± 0.16 times faster than master
- As an added benefit, this implementation gives more correct results. For
- example implementation in 'master' gives:
+ As an added benefit, results are more consistent now. For example
+ implementation in 'master' gives:
$ git log --max-count=1 --format=%H -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de
@@ Commit message
order than before. This is not considerd to be an issue because nowhere
is documented the order is guaranteed.
+ Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
+ Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
## builtin/last-modified.c ##
@@ builtin/last-modified.c: static int last_modified_entry_hashcmp(const void *unus
+
+ const char **all_paths;
+ size_t all_paths_nr;
-+ struct commit_bitmaps commit_bitmaps;
++ struct commit_bitmaps active_paths_bitmap;
+
+ /* 'scratch' bitmap to avoid allocating every process_parent() */
+ struct bitmap *scratch;
@@ builtin/last-modified.c: static int last_modified_entry_hashcmp(const void *unus
+static struct bitmap *get_bitmap(struct last_modified *lm, struct commit *c)
+{
-+ struct bitmap **bitmap = commit_bitmaps_at(&lm->commit_bitmaps, c);
++ struct bitmap **bitmap = commit_bitmaps_at(&lm->active_paths_bitmap, c);
+ if (!*bitmap)
+ *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD);
+
@@ builtin/last-modified.c: static void last_modified_release(struct last_modified
}
struct last_modified_callback_data {
+@@ builtin/last-modified.c: static void mark_path(const char *path, const struct object_id *oid,
+ * Is it arriving at a version of interest, or is it from a side branch
+ * which did not contribute to the final state?
+ */
+- if (!oideq(oid, &ent->oid))
++ if (oid && !oideq(oid, &ent->oid))
+ return;
+
+ last_modified_emit(data->lm, path, data->commit);
@@ builtin/last-modified.c: static void last_modified_diff(struct diff_queue_struct *q,
}
}
@@ builtin/last-modified.c: static void last_modified_diff(struct diff_queue_struct
+ struct bitmap *p,
+ size_t pos)
+{
-+ struct last_modified_entry *ent;
-+ struct hashmap_iter iter;
-+
+ bitmap_unset(c, pos);
-+
-+ hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
-+ if (ent->diff_idx == pos) {
-+ bitmap_set(p, pos);
-+ break;
-+ }
-+ }
++ bitmap_set(p, pos);
+}
+
+static bool maybe_changed_path(struct last_modified *lm,
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ }
+
+ /*
-+ * Otherwise, test each path for TREESAME-ness against the parent. If
-+ * a path is TREESAME, pass it on to this parent.
++ * Test each path for TREESAME-ness against the parent. If a path is
++ * TREESAME, pass it on to this parent.
+ *
+ * First, collect all paths that are *not* TREESAME in 'scratch'.
+ * Then, pass paths that *are* TREESAME and active to the parent.
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ size_t k = path_idx(lm, fp->two->path);
+ if (0 <= k && bitmap_get(active_c, k))
+ bitmap_set(lm->scratch, k);
-+ diff_free_filepair(fp);
+ }
+ for (i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ }
+
+ memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
-+ diff_queued_diff.nr = 0;
+ diff_queue_clear(&diff_queued_diff);
+}
+
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
- BUG("paths remaining beyond boundary in last-modified");
+ max_count = lm->rev.max_count;
+
-+ init_commit_bitmaps(&lm->commit_bitmaps);
++ init_commit_bitmaps(&lm->active_paths_bitmap);
+ lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
+
+ /*
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+
+ prio_queue_put(&queue, c);
+ c->object.flags |= PARENT1;
-
-- if (data.commit->object.flags & BOUNDARY) {
++
+ active = get_bitmap(lm, c);
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ bitmap_set(active, i);
+ }
+ }
-+
+
+- if (data.commit->object.flags & BOUNDARY) {
+ while (queue.nr) {
+ int parent_i;
+ struct commit_list *p;
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ if (bitmap_is_empty(active_c))
+ break;
+ }
-
-- log_tree_commit(&lm->rev, data.commit);
++
+ /*
+ * Paths that remain active, or not TREESAME with any parent,
+ * were changed by 'c'.
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ mark_path(lm->all_paths[i], NULL, &data);
+ }
+ }
-+
+
+- log_tree_commit(&lm->rev, data.commit);
+cleanup:
+ bitmap_free(active_c);
}
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+
+ clear_prio_queue(¬_queue);
+ clear_prio_queue(&queue);
-+ clear_commit_bitmaps(&lm->commit_bitmaps);
++ clear_commit_bitmaps(&lm->active_paths_bitmap);
+ bitmap_free(lm->scratch);
+
return 0;
---
builtin/last-modified.c | 243 ++++++++++++++++++++++++++++++++++++++++++++---
object.h | 1 +
t/t8020-last-modified.sh | 2 +-
3 files changed, 230 insertions(+), 16 deletions(-)
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index ae8b36a2c3..e9050485a9 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -2,26 +2,32 @@
#include "bloom.h"
#include "builtin.h"
#include "commit-graph.h"
+#include "commit-slab.h"
#include "commit.h"
#include "config.h"
-#include "environment.h"
#include "diff.h"
#include "diffcore.h"
#include "environment.h"
+#include "ewah/ewok.h"
#include "hashmap.h"
#include "hex.h"
-#include "log-tree.h"
#include "object-name.h"
#include "object.h"
#include "parse-options.h"
+#include "prio-queue.h"
#include "quote.h"
#include "repository.h"
#include "revision.h"
+/* Remember to update object flag allocation in object.h */
+#define PARENT1 (1u<<16) /* used instead of SEEN */
+#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
+
struct last_modified_entry {
struct hashmap_entry hashent;
struct object_id oid;
struct bloom_key key;
+ size_t diff_idx;
const char path[FLEX_ARRAY];
};
@@ -37,13 +43,35 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
return strcmp(ent1->path, path ? path : ent2->path);
}
+/*
+ * Hold a bitmap for each commit we're working with. Each bit represents a path
+ * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
+ */
+define_commit_slab(commit_bitmaps, struct bitmap *);
+
struct last_modified {
struct hashmap paths;
struct rev_info rev;
bool recursive;
bool show_trees;
+
+ const char **all_paths;
+ size_t all_paths_nr;
+ struct commit_bitmaps active_paths_bitmap;
+
+ /* 'scratch' bitmap to avoid allocating every process_parent() */
+ struct bitmap *scratch;
};
+static struct bitmap *get_bitmap(struct last_modified *lm, struct commit *c)
+{
+ struct bitmap **bitmap = commit_bitmaps_at(&lm->active_paths_bitmap, c);
+ if (!*bitmap)
+ *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD);
+
+ return *bitmap;
+}
+
static void last_modified_release(struct last_modified *lm)
{
struct hashmap_iter iter;
@@ -54,6 +82,8 @@ static void last_modified_release(struct last_modified *lm)
hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
release_revisions(&lm->rev);
+
+ free(lm->all_paths);
}
struct last_modified_callback_data {
@@ -146,7 +176,7 @@ static void mark_path(const char *path, const struct object_id *oid,
* Is it arriving at a version of interest, or is it from a side branch
* which did not contribute to the final state?
*/
- if (!oideq(oid, &ent->oid))
+ if (oid && !oideq(oid, &ent->oid))
return;
last_modified_emit(data->lm, path, data->commit);
@@ -196,7 +226,27 @@ static void last_modified_diff(struct diff_queue_struct *q,
}
}
-static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
+static size_t path_idx(struct last_modified *lm, char *path)
+{
+ struct last_modified_entry *ent;
+ ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
+ struct last_modified_entry, hashent);
+
+ return ent ? ent->diff_idx : -1;
+}
+
+static void pass_to_parent(struct last_modified *lm,
+ struct bitmap *c,
+ struct bitmap *p,
+ size_t pos)
+{
+ bitmap_unset(c, pos);
+ bitmap_set(p, pos);
+}
+
+static bool maybe_changed_path(struct last_modified *lm,
+ struct commit *origin,
+ struct bitmap *active)
{
struct bloom_filter *filter;
struct last_modified_entry *ent;
@@ -213,6 +263,9 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return true;
hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ if (active && !bitmap_get(active, ent->diff_idx))
+ continue;
+
if (bloom_filter_contains(filter, &ent->key,
lm->rev.bloom_filter_settings))
return true;
@@ -220,42 +273,195 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return false;
}
+static void process_parent(struct last_modified *lm,
+ struct prio_queue *queue,
+ struct commit *c, struct bitmap *active_c,
+ struct commit *parent, int parent_i)
+{
+ size_t i;
+ struct bitmap *active_p;
+
+ repo_parse_commit(lm->rev.repo, parent);
+ active_p = get_bitmap(lm, parent);
+
+ /*
+ * The first time entering this function for this commit (i.e. first parent)
+ * see if Bloom filters will tell us it's worth to do the diff.
+ */
+ if (parent_i || maybe_changed_path(lm, c, active_c)) {
+ diff_tree_oid(&parent->object.oid,
+ &c->object.oid, "", &lm->rev.diffopt);
+ diffcore_std(&lm->rev.diffopt);
+ }
+
+ /*
+ * Test each path for TREESAME-ness against the parent. If a path is
+ * TREESAME, pass it on to this parent.
+ *
+ * First, collect all paths that are *not* TREESAME in 'scratch'.
+ * Then, pass paths that *are* TREESAME and active to the parent.
+ */
+ for (i = 0; i < diff_queued_diff.nr; i++) {
+ struct diff_filepair *fp = diff_queued_diff.queue[i];
+ size_t k = path_idx(lm, fp->two->path);
+ if (0 <= k && bitmap_get(active_c, k))
+ bitmap_set(lm->scratch, k);
+ }
+ for (i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
+ pass_to_parent(lm, active_c, active_p, i);
+ }
+
+ /*
+ * If parent has any active paths, put it on the queue (if not already).
+ */
+ if (!bitmap_is_empty(active_p) && !(parent->object.flags & PARENT1)) {
+ parent->object.flags |= PARENT1;
+ prio_queue_put(queue, parent);
+ }
+
+ memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
+ diff_queue_clear(&diff_queued_diff);
+}
+
static int last_modified_run(struct last_modified *lm)
{
+ int max_count, queue_popped = 0;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
+ struct commit_list *list;
struct last_modified_callback_data data = { .lm = lm };
lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
lm->rev.diffopt.format_callback = last_modified_diff;
lm->rev.diffopt.format_callback_data = &data;
+ lm->rev.no_walk = 1;
prepare_revision_walk(&lm->rev);
- while (hashmap_get_size(&lm->paths)) {
- data.commit = get_revision(&lm->rev);
- if (!data.commit)
- BUG("paths remaining beyond boundary in last-modified");
+ max_count = lm->rev.max_count;
+
+ init_commit_bitmaps(&lm->active_paths_bitmap);
+ lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
+
+ /*
+ * lm->rev.commits holds the set of boundary commits for our walk.
+ *
+ * Loop through each such commit, and place it in the appropriate queue.
+ */
+ for (list = lm->rev.commits; list; list = list->next) {
+ struct commit *c = list->item;
+
+ if (c->object.flags & BOTTOM) {
+ prio_queue_put(¬_queue, c);
+ c->object.flags |= PARENT2;
+ } else if (!(c->object.flags & PARENT1)) {
+ /*
+ * If the commit is a starting point (and hasn't been
+ * seen yet), then initialize the set of interesting
+ * paths, too.
+ */
+ struct bitmap *active;
+
+ prio_queue_put(&queue, c);
+ c->object.flags |= PARENT1;
+
+ active = get_bitmap(lm, c);
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ bitmap_set(active, i);
+ }
+ }
- if (data.commit->object.flags & BOUNDARY) {
+ while (queue.nr) {
+ int parent_i;
+ struct commit_list *p;
+ struct commit *c = prio_queue_get(&queue);
+ struct bitmap *active_c = get_bitmap(lm, c);
+
+ if ((0 <= max_count && max_count < ++queue_popped) ||
+ (c->object.flags & PARENT2)) {
+ /*
+ * Either a boundary commit, or we have already seen too
+ * many others. Either way, stop here.
+ */
+ c->object.flags |= PARENT2 | BOUNDARY;
+ data.commit = c;
diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
- &data.commit->object.oid, "",
- &lm->rev.diffopt);
+ &c->object.oid,
+ "", &lm->rev.diffopt);
diff_flush(&lm->rev.diffopt);
+ goto cleanup;
+ }
- break;
+ /*
+ * Otherwise, make sure that 'c' isn't reachable from anything
+ * in the '--not' queue.
+ */
+ repo_parse_commit(lm->rev.repo, c);
+
+ while (not_queue.nr) {
+ struct commit_list *np;
+ struct commit *n = prio_queue_get(¬_queue);
+
+ repo_parse_commit(lm->rev.repo, n);
+
+ for (np = n->parents; np; np = np->next) {
+ if (!(np->item->object.flags & PARENT2)) {
+ prio_queue_put(¬_queue, np->item);
+ np->item->object.flags |= PARENT2;
+ }
+ }
+
+ if (commit_graph_generation(n) < commit_graph_generation(c))
+ break;
}
- if (!maybe_changed_path(lm, data.commit))
- continue;
+ /*
+ * Look at each parent and pass on each path that's TREESAME
+ * with that parent. Stop early when no active paths remain.
+ */
+ for (p = c->parents, parent_i = 0; p; p = p->next, parent_i++) {
+ process_parent(lm, &queue,
+ c, active_c,
+ p->item, parent_i);
+
+ if (bitmap_is_empty(active_c))
+ break;
+ }
+
+ /*
+ * Paths that remain active, or not TREESAME with any parent,
+ * were changed by 'c'.
+ */
+ if (!bitmap_is_empty(active_c)) {
+ data.commit = c;
+ for (size_t i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i))
+ mark_path(lm->all_paths[i], NULL, &data);
+ }
+ }
- log_tree_commit(&lm->rev, data.commit);
+cleanup:
+ bitmap_free(active_c);
}
+ if (hashmap_get_size(&lm->paths))
+ BUG("paths remaining beyond boundary in last-modified");
+
+ clear_prio_queue(¬_queue);
+ clear_prio_queue(&queue);
+ clear_commit_bitmaps(&lm->active_paths_bitmap);
+ bitmap_free(lm->scratch);
+
return 0;
}
static int last_modified_init(struct last_modified *lm, struct repository *r,
const char *prefix, int argc, const char **argv)
{
+ struct hashmap_iter iter;
+ struct last_modified_entry *ent;
+
hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
repo_init_revisions(r, &lm->rev, prefix);
@@ -280,6 +486,13 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
if (populate_paths_from_revs(lm) < 0)
return error(_("unable to setup last-modified"));
+ lm->all_paths = xcalloc(hashmap_get_size(&lm->paths), sizeof(const char *));
+ lm->all_paths_nr = 0;
+ hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ ent->diff_idx = lm->all_paths_nr++;
+ lm->all_paths[ent->diff_idx] = ent->path;
+ }
+
return 0;
}
diff --git a/object.h b/object.h
index 8c3c1c46e1..fa504a09c0 100644
--- a/object.h
+++ b/object.h
@@ -75,6 +75,7 @@ void object_array_init(struct object_array *array);
* http-push.c: 11-----14
* commit-graph.c: 15
* commit-reach.c: 16-----19
+ * builtin/last-modified.c: 1617
* sha1-name.c: 20
* list-objects-filter.c: 21
* bloom.c: 2122
diff --git a/t/t8020-last-modified.sh b/t/t8020-last-modified.sh
index 61f00bc15c..a4c1114ee2 100755
--- a/t/t8020-last-modified.sh
+++ b/t/t8020-last-modified.sh
@@ -57,9 +57,9 @@ test_expect_success 'last-modified recursive' '
test_expect_success 'last-modified recursive with show-trees' '
check_last_modified -r -t <<-\EOF
- 3 a
3 a/b
3 a/b/file
+ 3 a
2 a/file
1 file
EOF
---
base-commit: 133d151831d32bdcc02422599a3f26cef44f929b
change-id: 20251009-b4-toon-last-modified-faster-4c8956a95261
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-17 12:07 ` Toon Claes
2025-10-21 9:04 ` Toon Claes
@ 2025-10-21 13:00 ` Toon Claes
2025-10-23 23:56 ` Taylor Blau
2 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-21 13:00 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Karthik Nayak, Justin Tobler, Derrick Stolee
> Taylor Blau <me@ttaylorr.com> writes:
>> I'd suggest including those benchmarks as well, and potentially running
>> them on linux.git, or another comparably larger open-source repository.
>> git.git is large enough to show some interesting behavior, but I always
>> have found it useful to compare the results against a larger repository
>> as well.
I've submitted v2 of my patch. In that version I included the benchmark
for a repo without and with Bloom filters. But I didn't include
benchmarks on linux.git. So for the record I'm sharing it here:
Benchmark 1: master no bloom
Time (mean ± σ): 12.290 s ± 0.105 s [User: 11.893 s, System: 0.370 s]
Range (min … max): 12.089 s … 12.461 s 10 runs
Benchmark 2: master with bloom
Time (mean ± σ): 1.536 s ± 0.072 s [User: 1.418 s, System: 0.114 s]
Range (min … max): 1.500 s … 1.740 s 10 runs
Benchmark 3: HEAD no bloom
Time (mean ± σ): 868.7 ms ± 3.0 ms [User: 810.7 ms, System: 56.1 ms]
Range (min … max): 863.1 ms … 873.5 ms 10 runs
Benchmark 4: HEAD with bloom
Time (mean ± σ): 110.0 ms ± 1.8 ms [User: 89.3 ms, System: 20.3 ms]
Range (min … max): 106.6 ms … 114.2 ms 27 runs
Summary
HEAD with bloom ran
7.90 ± 0.13 times faster than HEAD no bloom
13.96 ± 0.69 times faster than master with bloom
111.73 ± 2.02 times faster than master no bloom
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
@ 2025-10-21 17:52 ` Junio C Hamano
2025-10-22 0:26 ` Taylor Blau
2025-10-23 8:01 ` Toon Claes
2025-10-23 7:50 ` [PATCH v3] " Toon Claes
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
2 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2025-10-21 17:52 UTC (permalink / raw)
To: Toon Claes
Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau, D. Ben Knoble,
Derrick Stolee
Toon Claes <toon@iotcl.com> writes:
> +static size_t path_idx(struct last_modified *lm, char *path)
> +{
> + struct last_modified_entry *ent;
> + ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
> + struct last_modified_entry, hashent);
> +
> + return ent ? ent->diff_idx : -1;
> +}
size_t is unsigned and cannot return -1 sanely, unless the caller
knows that ((size_t)-1) signals an error. The compiler warns, and
we compile with -Werror, so we end up getting
builtin/last-modified.c: In function 'path_idx':
builtin/last-modified.c:235:38: error: operand of '?:' changes signedness from 'int' to 'size_t' {aka 'long unsigned int'} due to unsignedness of other operand [-Werror=sign-compare]
235 | return ent ? ent->diff_idx : -1;
| ^~
> +static void pass_to_parent(struct last_modified *lm,
> + struct bitmap *c,
> + struct bitmap *p,
> + size_t pos)
> +{
> + bitmap_unset(c, pos);
> + bitmap_set(p, pos);
> +}
Mark lm as UNUSED, or we'd get
builtin/last-modified.c: In function 'pass_to_parent':
builtin/last-modified.c:238:50: error: unused parameter 'lm' [-Werror=unused-parameter]
238 | static void pass_to_parent(struct last_modified *lm,
| ~~~~~~~~~~~~~~~~~~~~~~^~
> @@ -220,42 +273,195 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
> return false;
> }
>
> +static void process_parent(struct last_modified *lm,
> + struct prio_queue *queue,
> + struct commit *c, struct bitmap *active_c,
> + struct commit *parent, int parent_i)
> +{
> + size_t i;
> +...
> + for (i = 0; i < diff_queued_diff.nr; i++) {
Our -Wsign-compare forces the compiler to give a stupid warning
here, that i is unsigned and diff_queued_diff.nr is signed.
builtin/last-modified.c: In function 'process_parent':
builtin/last-modified.c:304:23: error: comparison of integer expressions of different signedness: 'size_t' {aka 'long unsigned int'} and 'int' [-Werror=sign-compare]
304 | for (i = 0; i < diff_queued_diff.nr; i++) {
| ^
Yes, it is comparing unsigned with signed but so what?
As people may now know, my preference is to wean ourselves off of
this dogmatic trust in -Wsign-compare but those who disagree and want
to remove #define DISABLE_SIGN_COMPARE_WARNINGS should help our poor
compiler here by telling it that this comparison is perfectly fine.
We know diff_queued_diff.nr is an int, and i is size_t, but we also
know diff_queued_diff.nr won't be negative (or we have much bigger
problems) and cannot be larger than what size_t can represent.
> + struct diff_filepair *fp = diff_queued_diff.queue[i];
> + size_t k = path_idx(lm, fp->two->path);
> + if (0 <= k && bitmap_get(active_c, k))
> + bitmap_set(lm->scratch, k);
> + }
Earlier path_idx() wanted to signal an error by returning negative,
but the type is size_t that is unsigned so it cannot do so. We
instead get
builtin/last-modified.c:307:23: error: comparison of unsigned expression in '>= 0' is always true [-Werror=type-limits]
307 | if (0 <= k && bitmap_get(active_c, k))
| ^~
and in this case we actually deserve it (in the sense that this is
not the fault of dogmatic trust in -Wsign-compare; this is caused by
using size_t to count things).
And the solution for this is *not* "size_t" -> "ssize_t", because
ssize_t is not "store half the range of size_t with negative values
reserved for something else like errors". Its width can be much
narrower (this came up in a separate thread very recently [*]).
Instead we'd need something ugly like
if (k != (size_t)-1 && bitmap_get(active_c, k))
A quick band-aid patch to make it compile is attached at the end,
but it does not try to address the root causes, which are abuse of
size_t as count_t and religious use of "-Wsign-compare" [*].
[Reference]
* https://lore.kernel.org/git/9eafee4d-ea94-4382-ada0-58000d229d2e@gmail.com/
* https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/
builtin/last-modified.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git c/builtin/last-modified.c w/builtin/last-modified.c
index e9050485a9..6135bcc584 100644
--- c/builtin/last-modified.c
+++ w/builtin/last-modified.c
@@ -232,10 +232,10 @@ static size_t path_idx(struct last_modified *lm, char *path)
ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
struct last_modified_entry, hashent);
- return ent ? ent->diff_idx : -1;
+ return ent ? ent->diff_idx : (size_t)-1;
}
-static void pass_to_parent(struct last_modified *lm,
+static void pass_to_parent(struct last_modified *lm UNUSED,
struct bitmap *c,
struct bitmap *p,
size_t pos)
@@ -278,7 +278,6 @@ static void process_parent(struct last_modified *lm,
struct commit *c, struct bitmap *active_c,
struct commit *parent, int parent_i)
{
- size_t i;
struct bitmap *active_p;
repo_parse_commit(lm->rev.repo, parent);
@@ -301,13 +300,13 @@ static void process_parent(struct last_modified *lm,
* First, collect all paths that are *not* TREESAME in 'scratch'.
* Then, pass paths that *are* TREESAME and active to the parent.
*/
- for (i = 0; i < diff_queued_diff.nr; i++) {
+ for (int i = 0; i < diff_queued_diff.nr; i++) {
struct diff_filepair *fp = diff_queued_diff.queue[i];
size_t k = path_idx(lm, fp->two->path);
- if (0 <= k && bitmap_get(active_c, k))
+ if (k != (size_t)-1 && bitmap_get(active_c, k))
bitmap_set(lm->scratch, k);
}
- for (i = 0; i < lm->all_paths_nr; i++) {
+ for (size_t i = 0; i < lm->all_paths_nr; i++) {
if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
pass_to_parent(lm, active_c, active_p, i);
}
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-21 17:52 ` Junio C Hamano
@ 2025-10-22 0:26 ` Taylor Blau
2025-10-22 0:28 ` Taylor Blau
2025-10-22 3:48 ` Junio C Hamano
2025-10-23 8:01 ` Toon Claes
1 sibling, 2 replies; 39+ messages in thread
From: Taylor Blau @ 2025-10-22 0:26 UTC (permalink / raw)
To: Junio C Hamano
Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, D. Ben Knoble,
Derrick Stolee
On Tue, Oct 21, 2025 at 10:52:11AM -0700, Junio C Hamano wrote:
> > + struct diff_filepair *fp = diff_queued_diff.queue[i];
> > + size_t k = path_idx(lm, fp->two->path);
> > + if (0 <= k && bitmap_get(active_c, k))
> > + bitmap_set(lm->scratch, k);
> > + }
>
> Earlier path_idx() wanted to signal an error by returning negative,
> but the type is size_t that is unsigned so it cannot do so. We
> instead get
>
> builtin/last-modified.c:307:23: error: comparison of unsigned expression in '>= 0' is always true [-Werror=type-limits]
> 307 | if (0 <= k && bitmap_get(active_c, k))
> | ^~
>
> and in this case we actually deserve it (in the sense that this is
> not the fault of dogmatic trust in -Wsign-compare; this is caused by
> using size_t to count things).
>
> And the solution for this is *not* "size_t" -> "ssize_t", because
> ssize_t is not "store half the range of size_t with negative values
> reserved for something else like errors". Its width can be much
> narrower (this came up in a separate thread very recently [*]).
> Instead we'd need something ugly like
>
> if (k != (size_t)-1 && bitmap_get(active_c, k))
>
> A quick band-aid patch to make it compile is attached at the end,
> but it does not try to address the root causes, which are abuse of
> size_t as count_t and religious use of "-Wsign-compare" [*].
>
>
> [Reference]
>
> * https://lore.kernel.org/git/9eafee4d-ea94-4382-ada0-58000d229d2e@gmail.com/
> * https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/
Yeah, this is a true positive. I was curious if GitHub's version of the
code also returned "-1" from a function whose return type is unsigned,
and in fact our version of this function (called diff2idx()) returns an
'int'.
Practically speaking that's probably OK, since we are unlikely to have
so many active paths anyway (or if we did, we'd likely have other
problems to deal with ;-)), but it is gross nonetheless.
I wonder if we should inline all of this into its own function and not
expose the bitmap index for a given path at all? Perhaps something like
the following (on top of Junio's other suggestions to get this compiling
under DEVELOPER=1):
--- 8< ---
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index e9050485a9..ce3ae4fb3d 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -226,13 +226,16 @@ static void last_modified_diff(struct diff_queue_struct *q,
}
}
-static size_t path_idx(struct last_modified *lm, char *path)
+static void last_modified_mark_non_treesame(struct last_modified *lm,
+ struct bitmap *active_c,
+ char *path)
{
struct last_modified_entry *ent;
ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
struct last_modified_entry, hashent);
- return ent ? ent->diff_idx : -1;
+ if (ent && bitmap_get(active_c, ent->diff_idx))
+ bitmap_set(lm->scratch, ent->diff_idx);
}
static void pass_to_parent(struct last_modified *lm,
@@ -303,9 +306,7 @@ static void process_parent(struct last_modified *lm,
*/
for (i = 0; i < diff_queued_diff.nr; i++) {
struct diff_filepair *fp = diff_queued_diff.queue[i];
- size_t k = path_idx(lm, fp->two->path);
- if (0 <= k && bitmap_get(active_c, k))
- bitmap_set(lm->scratch, k);
+ last_modified_mark_non_treesame(lm, active_c, fp->two->path);
}
for (i = 0; i < lm->all_paths_nr; i++) {
if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
--- >8 ---
Thanks,
Taylor
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-22 0:26 ` Taylor Blau
@ 2025-10-22 0:28 ` Taylor Blau
2025-10-22 3:48 ` Junio C Hamano
1 sibling, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-10-22 0:28 UTC (permalink / raw)
To: Junio C Hamano
Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, D. Ben Knoble,
Derrick Stolee
On Tue, Oct 21, 2025 at 08:26:42PM -0400, Taylor Blau wrote:
> -static size_t path_idx(struct last_modified *lm, char *path)
> +static void last_modified_mark_non_treesame(struct last_modified *lm,
> + struct bitmap *active_c,
> + char *path)
Err... this should be const, but otherwise I stand by what I wrote ;-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-22 0:26 ` Taylor Blau
2025-10-22 0:28 ` Taylor Blau
@ 2025-10-22 3:48 ` Junio C Hamano
2025-10-24 0:01 ` Taylor Blau
1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2025-10-22 3:48 UTC (permalink / raw)
To: Taylor Blau
Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, D. Ben Knoble,
Derrick Stolee
Taylor Blau <me@ttaylorr.com> writes:
> On Tue, Oct 21, 2025 at 10:52:11AM -0700, Junio C Hamano wrote:
>> > + struct diff_filepair *fp = diff_queued_diff.queue[i];
>> > + size_t k = path_idx(lm, fp->two->path);
>> > + if (0 <= k && bitmap_get(active_c, k))
>> > + bitmap_set(lm->scratch, k);
>> > + }
>>
>> Earlier path_idx() wanted to signal an error by returning negative,
>> but the type is size_t that is unsigned so it cannot do so. We
>> instead get
>>
>> builtin/last-modified.c:307:23: error: comparison of unsigned expression in '>= 0' is always true [-Werror=type-limits]
>> 307 | if (0 <= k && bitmap_get(active_c, k))
>> | ^~
>> ...
> Yeah, this is a true positive. I was curious if GitHub's version of the
> code also returned "-1" from a function whose return type is unsigned,
> and in fact our version of this function (called diff2idx()) returns an
> 'int'.
Yes, I think the tool is doing the right thing here, unlike "hey you
are comparing int with size_t" we saw earlier, and is giving us a
useful diagnosis.
> Practically speaking that's probably OK, since we are unlikely to have
> so many active paths anyway (or if we did, we'd likely have other
> problems to deal with ;-)), but it is gross nonetheless.
The case path_idx() returns -1 is an error case, not "there are too
many paths we are following" case. I do not see what relevance the
number of active paths has here.
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v3] last-modified: implement faster algorithm
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
2025-10-21 17:52 ` Junio C Hamano
@ 2025-10-23 7:50 ` Toon Claes
2025-10-24 0:03 ` Taylor Blau
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
2 siblings, 1 reply; 39+ messages in thread
From: Toon Claes @ 2025-10-23 7:50 UTC (permalink / raw)
To: git
Cc: Karthik Nayak, Justin Tobler, Taylor Blau, D. Ben Knoble,
Derrick Stolee, Toon Claes
The current implementation of git-last-modified(1) works by doing a
revision walk, and inspecting the diff at each level of that walk to
annotate entries remaining in the hashmap of paths. In other words, if
the diff at some level touches a path which has not yet been associated
with a commit, then that commit becomes associated with the path.
While a perfectly reasonable implementation, it can perform poorly in
either one of two scenarios:
1. There are many entries of interest, in which case there is simply
a lot of work to do.
2. Or, there are (even a few) entries which have not been updated in a
long time, and so we must walk through a lot of history in order to
find a commit that touches that path.
This patch rewrites the last-modified implementation to address the
second point. The idea behind the algorithm is to propagate a set of
'active' paths (a path is 'active' if it does not yet belong to a
commit) up to parents and do a truncated revision walk.
The walk is truncated because it does not produce a revision for every
change in the original pathspec, but rather only for active paths.
More specifically, consider a priority queue of commits sorted by
generation number. First, enqueue the set of boundary commits with all
paths in the original spec marked as interesting.
Then, while the queue is not empty, do the following:
1. Pop an element, say, 'c', off of the queue, making sure that 'c'
isn't reachable by anything in the '--not' set.
2. For each parent 'p' (with index 'parent_i') of 'c', do the
following:
a. Compute the diff between 'c' and 'p'.
b. Pass any active paths that are TREESAME from 'c' to 'p'.
c. If 'p' has any active paths, push it onto the queue.
3. Any path that remains active on 'c' is associated to that commit.
This ends up being equivalent to doing something like 'git log -1 --
$path' for each path simultaneously. But, it allows us to go much faster
than the original implementation by limiting the number of diffs we
compute, since we can avoid parts of history that would have been
considered by the revision walk in the original implementation, but are
known to be uninteresting to us because we have already marked all paths
in that area as inactive.
To avoid computing many first-parent diffs, add another trick on top of
this: skip the diff entirely when all paths active in 'c' are DEFINITELY
NOT in c's Bloom filter. Since the commit-graph only stores first-parent
diffs in
the Bloom filters, we can only apply this trick to first-parent diffs.
Comparing the performance of this new algorithm shows about a 2.5x
improvement on git.git:
Benchmark 1: master no bloom
Time (mean ± σ): 2.868 s ± 0.023 s [User: 2.811 s, System: 0.051 s]
Range (min … max): 2.847 s … 2.926 s 10 runs
Benchmark 2: master with bloom
Time (mean ± σ): 949.9 ms ± 15.2 ms [User: 907.6 ms, System: 39.5 ms]
Range (min … max): 933.3 ms … 971.2 ms 10 runs
Benchmark 3: HEAD no bloom
Time (mean ± σ): 782.0 ms ± 6.3 ms [User: 740.7 ms, System: 39.2 ms]
Range (min … max): 776.4 ms … 798.2 ms 10 runs
Benchmark 4: HEAD with bloom
Time (mean ± σ): 307.1 ms ± 1.7 ms [User: 276.4 ms, System: 29.9 ms]
Range (min … max): 303.7 ms … 309.5 ms 10 runs
Summary
HEAD with bloom ran
2.55 ± 0.02 times faster than HEAD no bloom
3.09 ± 0.05 times faster than master with bloom
9.34 ± 0.09 times faster than master no bloom
In short, the existing implementation is comparably fast *with* Bloom
filters as the new implementation is *without* Bloom filters. So, most
repositories should get a dramatic speed-up by just deploying this (even
without computing Bloom filters), and all repositories should get faster
still when computing Bloom filters.
When comparing a more extreme example, `git last-modified -- COPYING t`,
the difference is even bigger, more than 5x:
Benchmark 1: master
Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
Range (min … max): 4.308 s … 4.509 s 10 runs
Benchmark 2: HEAD
Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
Range (min … max): 810.6 ms … 881.2 ms 10 runs
Summary
HEAD ran
5.29 ± 0.16 times faster than master
As an added benefit, results are more consistent now. For example
the implementation in 'master' gives:
$ git log --max-count=1 --format=%H -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de
$ git last-modified -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
$ git last-modified | grep pkt-line.h
5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
With the changes in this patch the results of git-last-modified(1)
always match those of `git log --max-count=1`.
One thing to note though: the results might be output in a different
order than before. This is not considered an issue, because the order
is nowhere documented as guaranteed.
Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
The subcommand git-last-modified(1) was based on the patches shared by
Taylor and the folks at GitHub[1]. That version used an alternative
implementation to make it "go faster". When I was working on upstreaming
those patches, I dropped the patches[2] for this implementation, because
I didn't see significant improvements.
This series revives those changes. I did a more thorough deep dive through
the code and the algorithm and got the code working a lot faster. The
benchmark results can be found in the commit message.
Some changes compared to GitHub's version include:
* Use of `struct bitmap` from "ewah/ewok.h", instead of self-defined
`struct commit_active_paths`.
* Removed shortcut code that handled the case when commit and parent
are fully treesame, and instead always checked 'active_c' to decide whether the
next parent is worth looking at.
* Modified comments and commit message to make the algorithm more
clear (at least to me).
* Mentioned the use of PARENT1 and PARENT2 in object.h.
* Removed the use of any global variables.
* Fewer conditions are checked in mark_path() because the hashmap of
'paths' is considered the single source of truth.
* pass_to_parent() no longer passes a path on to the parent when that
path isn't in the 'paths' hashmap.
[1]: https://lore.kernel.org/git/Z+XJ+1L3PnC9Dyba@nand.local/
[2]: https://lore.kernel.org/git/20250630-toon-new-blame-tree-v3-0-3516025dc3bc@iotcl.com/
---
Changes in v3:
- Make code cleanly compile with -Wsign-compare
- Remove path_idx() and instead inline the code in the loop. This fixes
the sign comparison issue and the function was only used in one place
anyway.
- Make all the memory leaks go away.
- Small tweaks in naming and code comments.
- Link to v2: https://lore.kernel.org/r/20251021-b4-toon-last-modified-faster-v2-1-f6dcbc26fc5c@iotcl.com
Changes in v2:
- Add benchmark results comparing repositories with and without Bloom
filters in commit message.
- Add Stolee and Taylor in the commit message trailers.
- Fix segfault by checking if 'oid' is set in mark_path().
- Remove hashmap lookup in pass_to_parent().
- Remove manually calling diff_free_filepair().
- Rename commit slab bitmap to "active_paths_bitmap".
- Link to v1: https://lore.kernel.org/r/20251016-b4-toon-last-modified-faster-v1-1-85dca8a29e5c@iotcl.com
Range-diff against v2:
1: e131d0df85 ! 1: 8b1a5cc055 last-modified: implement faster algorithm
@@ builtin/last-modified.c: static int last_modified_entry_hashcmp(const void *unus
}
+/*
-+ * Hold a bitmap for each commit we're working with. Each bit represents a path
-+ * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
++ * Hold a bitmap for each commit we're working with. In the bitmap, each bit
++ * represents a path in `lm->all_paths`. An active bit indicates the path still
++ * needs to be associated to a commit.
+ */
-+define_commit_slab(commit_bitmaps, struct bitmap *);
++define_commit_slab(active_paths_for_commit, struct bitmap *);
+
struct last_modified {
struct hashmap paths;
@@ builtin/last-modified.c: static int last_modified_entry_hashcmp(const void *unus
+
+ const char **all_paths;
+ size_t all_paths_nr;
-+ struct commit_bitmaps active_paths_bitmap;
++ struct active_paths_for_commit active_paths;
+
-+ /* 'scratch' bitmap to avoid allocating every proccess_parent() */
++ /* 'scratch' to avoid allocating a bitmap every process_parent() */
+ struct bitmap *scratch;
};
-+static struct bitmap *get_bitmap(struct last_modified *lm, struct commit *c)
++static struct bitmap *active_paths_for(struct last_modified *lm, struct commit *c)
+{
-+ struct bitmap **bitmap = commit_bitmaps_at(&lm->active_paths_bitmap, c);
++ struct bitmap **bitmap = active_paths_for_commit_at(&lm->active_paths, c);
+ if (!*bitmap)
-+ *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD);
++ *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD + 1);
+
+ return *bitmap;
+}
++
++static void active_paths_free(struct last_modified *lm, struct commit *c)
++{
++ struct bitmap **bitmap = active_paths_for_commit_at(&lm->active_paths, c);
++ if (*bitmap) {
++ bitmap_free(*bitmap);
++ *bitmap = NULL;
++ }
++}
+
static void last_modified_release(struct last_modified *lm)
{
@@ builtin/last-modified.c: static void last_modified_diff(struct diff_queue_struct
}
-static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
-+static size_t path_idx(struct last_modified *lm, char *path)
-+{
-+ struct last_modified_entry *ent;
-+ ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
-+ struct last_modified_entry, hashent);
-+
-+ return ent ? ent->diff_idx : -1;
-+}
-+
-+static void pass_to_parent(struct last_modified *lm,
-+ struct bitmap *c,
++static void pass_to_parent(struct bitmap *c,
+ struct bitmap *p,
+ size_t pos)
+{
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ struct commit *c, struct bitmap *active_c,
+ struct commit *parent, int parent_i)
+{
-+ size_t i;
+ struct bitmap *active_p;
+
+ repo_parse_commit(lm->rev.repo, parent);
-+ active_p = get_bitmap(lm, parent);
++ active_p = active_paths_for(lm, parent);
+
+ /*
+ * The first time entering this function for this commit (i.e. first parent)
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ * First, collect all paths that are *not* TREESAME in 'scratch'.
+ * Then, pass paths that *are* TREESAME and active to the parent.
+ */
-+ for (i = 0; i < diff_queued_diff.nr; i++) {
++ for (int i = 0; i < diff_queued_diff.nr; i++) {
+ struct diff_filepair *fp = diff_queued_diff.queue[i];
-+ size_t k = path_idx(lm, fp->two->path);
-+ if (0 <= k && bitmap_get(active_c, k))
-+ bitmap_set(lm->scratch, k);
++ const char *path = fp->two->path;
++ struct last_modified_entry *ent =
++ hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
++ struct last_modified_entry, hashent);
++ if (ent) {
++ size_t k = ent->diff_idx;
++ if (bitmap_get(active_c, k))
++ bitmap_set(lm->scratch, k);
++ }
+ }
-+ for (i = 0; i < lm->all_paths_nr; i++) {
++ for (size_t i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
-+ pass_to_parent(lm, active_c, active_p, i);
++ pass_to_parent(active_c, active_p, i);
+ }
+
+ /*
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ parent->object.flags |= PARENT1;
+ prio_queue_put(queue, parent);
+ }
++ if (!(parent->object.flags & PARENT1))
++ active_paths_free(lm, parent);
+
+ memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
+ diff_queue_clear(&diff_queued_diff);
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
- BUG("paths remaining beyond boundary in last-modified");
+ max_count = lm->rev.max_count;
+
-+ init_commit_bitmaps(&lm->active_paths_bitmap);
++ init_active_paths_for_commit(&lm->active_paths);
+ lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
+
+ /*
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ prio_queue_put(&queue, c);
+ c->object.flags |= PARENT1;
+
-+ active = get_bitmap(lm, c);
++ active = active_paths_for(lm, c);
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ bitmap_set(active, i);
+ }
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ int parent_i;
+ struct commit_list *p;
+ struct commit *c = prio_queue_get(&queue);
-+ struct bitmap *active_c = get_bitmap(lm, c);
++ struct bitmap *active_c = active_paths_for(lm, c);
+
+ if ((0 <= max_count && max_count < ++queue_popped) ||
+ (c->object.flags & PARENT2)) {
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
- log_tree_commit(&lm->rev, data.commit);
+cleanup:
-+ bitmap_free(active_c);
++ active_paths_free(lm, c);
}
+ if (hashmap_get_size(&lm->paths))
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+
+ clear_prio_queue(¬_queue);
+ clear_prio_queue(&queue);
-+ clear_commit_bitmaps(&lm->active_paths_bitmap);
++ clear_active_paths_for_commit(&lm->active_paths);
+ bitmap_free(lm->scratch);
+
return 0;
---
builtin/last-modified.c | 250 ++++++++++++++++++++++++++++++++++++++++++++---
object.h | 1 +
t/t8020-last-modified.sh | 2 +-
3 files changed, 237 insertions(+), 16 deletions(-)
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index ae8b36a2c3..c271d9585b 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -2,26 +2,32 @@
#include "bloom.h"
#include "builtin.h"
#include "commit-graph.h"
+#include "commit-slab.h"
#include "commit.h"
#include "config.h"
-#include "environment.h"
#include "diff.h"
#include "diffcore.h"
#include "environment.h"
+#include "ewah/ewok.h"
#include "hashmap.h"
#include "hex.h"
-#include "log-tree.h"
#include "object-name.h"
#include "object.h"
#include "parse-options.h"
+#include "prio-queue.h"
#include "quote.h"
#include "repository.h"
#include "revision.h"
+/* Remember to update object flag allocation in object.h */
+#define PARENT1 (1u<<16) /* used instead of SEEN */
+#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
+
struct last_modified_entry {
struct hashmap_entry hashent;
struct object_id oid;
struct bloom_key key;
+ size_t diff_idx;
const char path[FLEX_ARRAY];
};
@@ -37,13 +43,45 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
return strcmp(ent1->path, path ? path : ent2->path);
}
+/*
+ * Hold a bitmap for each commit we're working with. In the bitmap, each bit
+ * represents a path in `lm->all_paths`. An active bit indicates the path still
+ * needs to be associated to a commit.
+ */
+define_commit_slab(active_paths_for_commit, struct bitmap *);
+
struct last_modified {
struct hashmap paths;
struct rev_info rev;
bool recursive;
bool show_trees;
+
+ const char **all_paths;
+ size_t all_paths_nr;
+ struct active_paths_for_commit active_paths;
+
+ /* 'scratch' to avoid allocating a bitmap every process_parent() */
+ struct bitmap *scratch;
};
+static struct bitmap *active_paths_for(struct last_modified *lm, struct commit *c)
+{
+ struct bitmap **bitmap = active_paths_for_commit_at(&lm->active_paths, c);
+ if (!*bitmap)
+ *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD + 1);
+
+ return *bitmap;
+}
+
+static void active_paths_free(struct last_modified *lm, struct commit *c)
+{
+ struct bitmap **bitmap = active_paths_for_commit_at(&lm->active_paths, c);
+ if (*bitmap) {
+ bitmap_free(*bitmap);
+ *bitmap = NULL;
+ }
+}
+
static void last_modified_release(struct last_modified *lm)
{
struct hashmap_iter iter;
@@ -54,6 +92,8 @@ static void last_modified_release(struct last_modified *lm)
hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
release_revisions(&lm->rev);
+
+ free(lm->all_paths);
}
struct last_modified_callback_data {
@@ -146,7 +186,7 @@ static void mark_path(const char *path, const struct object_id *oid,
* Is it arriving at a version of interest, or is it from a side branch
* which did not contribute to the final state?
*/
- if (!oideq(oid, &ent->oid))
+ if (oid && !oideq(oid, &ent->oid))
return;
last_modified_emit(data->lm, path, data->commit);
@@ -196,7 +236,17 @@ static void last_modified_diff(struct diff_queue_struct *q,
}
}
-static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
+static void pass_to_parent(struct bitmap *c,
+ struct bitmap *p,
+ size_t pos)
+{
+ bitmap_unset(c, pos);
+ bitmap_set(p, pos);
+}
+
+static bool maybe_changed_path(struct last_modified *lm,
+ struct commit *origin,
+ struct bitmap *active)
{
struct bloom_filter *filter;
struct last_modified_entry *ent;
@@ -213,6 +263,9 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return true;
hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ if (active && !bitmap_get(active, ent->diff_idx))
+ continue;
+
if (bloom_filter_contains(filter, &ent->key,
lm->rev.bloom_filter_settings))
return true;
@@ -220,42 +273,202 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return false;
}
+static void process_parent(struct last_modified *lm,
+ struct prio_queue *queue,
+ struct commit *c, struct bitmap *active_c,
+ struct commit *parent, int parent_i)
+{
+ struct bitmap *active_p;
+
+ repo_parse_commit(lm->rev.repo, parent);
+ active_p = active_paths_for(lm, parent);
+
+ /*
+ * The first time entering this function for this commit (i.e. first parent)
+ * see if Bloom filters will tell us it's worth to do the diff.
+ */
+ if (parent_i || maybe_changed_path(lm, c, active_c)) {
+ diff_tree_oid(&parent->object.oid,
+ &c->object.oid, "", &lm->rev.diffopt);
+ diffcore_std(&lm->rev.diffopt);
+ }
+
+ /*
+ * Test each path for TREESAME-ness against the parent. If a path is
+ * TREESAME, pass it on to this parent.
+ *
+ * First, collect all paths that are *not* TREESAME in 'scratch'.
+ * Then, pass paths that *are* TREESAME and active to the parent.
+ */
+ for (int i = 0; i < diff_queued_diff.nr; i++) {
+ struct diff_filepair *fp = diff_queued_diff.queue[i];
+ const char *path = fp->two->path;
+ struct last_modified_entry *ent =
+ hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
+ struct last_modified_entry, hashent);
+ if (ent) {
+ size_t k = ent->diff_idx;
+ if (bitmap_get(active_c, k))
+ bitmap_set(lm->scratch, k);
+ }
+ }
+ for (size_t i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
+ pass_to_parent(active_c, active_p, i);
+ }
+
+ /*
+ * If parent has any active paths, put it on the queue (if not already).
+ */
+ if (!bitmap_is_empty(active_p) && !(parent->object.flags & PARENT1)) {
+ parent->object.flags |= PARENT1;
+ prio_queue_put(queue, parent);
+ }
+ if (!(parent->object.flags & PARENT1))
+ active_paths_free(lm, parent);
+
+ memset(lm->scratch->words, 0x0, lm->scratch->word_alloc * sizeof(eword_t));
+ diff_queue_clear(&diff_queued_diff);
+}
+
static int last_modified_run(struct last_modified *lm)
{
+ int max_count, queue_popped = 0;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
+ struct commit_list *list;
struct last_modified_callback_data data = { .lm = lm };
lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
lm->rev.diffopt.format_callback = last_modified_diff;
lm->rev.diffopt.format_callback_data = &data;
+ lm->rev.no_walk = 1;
prepare_revision_walk(&lm->rev);
- while (hashmap_get_size(&lm->paths)) {
- data.commit = get_revision(&lm->rev);
- if (!data.commit)
- BUG("paths remaining beyond boundary in last-modified");
+ max_count = lm->rev.max_count;
+
+ init_active_paths_for_commit(&lm->active_paths);
+ lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
+
+ /*
+ * lm->rev.commits holds the set of boundary commits for our walk.
+ *
+ * Loop through each such commit, and place it in the appropriate queue.
+ */
+ for (list = lm->rev.commits; list; list = list->next) {
+ struct commit *c = list->item;
+
+ if (c->object.flags & BOTTOM) {
+ prio_queue_put(¬_queue, c);
+ c->object.flags |= PARENT2;
+ } else if (!(c->object.flags & PARENT1)) {
+ /*
+ * If the commit is a starting point (and hasn't been
+ * seen yet), then initialize the set of interesting
+ * paths, too.
+ */
+ struct bitmap *active;
+
+ prio_queue_put(&queue, c);
+ c->object.flags |= PARENT1;
+
+ active = active_paths_for(lm, c);
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ bitmap_set(active, i);
+ }
+ }
- if (data.commit->object.flags & BOUNDARY) {
+ while (queue.nr) {
+ int parent_i;
+ struct commit_list *p;
+ struct commit *c = prio_queue_get(&queue);
+ struct bitmap *active_c = active_paths_for(lm, c);
+
+ if ((0 <= max_count && max_count < ++queue_popped) ||
+ (c->object.flags & PARENT2)) {
+ /*
+ * Either a boundary commit, or we have already seen too
+ * many others. Either way, stop here.
+ */
+ c->object.flags |= PARENT2 | BOUNDARY;
+ data.commit = c;
diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
- &data.commit->object.oid, "",
- &lm->rev.diffopt);
+ &c->object.oid,
+ "", &lm->rev.diffopt);
diff_flush(&lm->rev.diffopt);
+ goto cleanup;
+ }
- break;
+ /*
+ * Otherwise, make sure that 'c' isn't reachable from anything
+ * in the '--not' queue.
+ */
+ repo_parse_commit(lm->rev.repo, c);
+
+ while (not_queue.nr) {
+ struct commit_list *np;
+ struct commit *n = prio_queue_get(¬_queue);
+
+ repo_parse_commit(lm->rev.repo, n);
+
+ for (np = n->parents; np; np = np->next) {
+ if (!(np->item->object.flags & PARENT2)) {
+ prio_queue_put(¬_queue, np->item);
+ np->item->object.flags |= PARENT2;
+ }
+ }
+
+ if (commit_graph_generation(n) < commit_graph_generation(c))
+ break;
}
- if (!maybe_changed_path(lm, data.commit))
- continue;
+ /*
+ * Look at each parent and pass on each path that's TREESAME
+ * with that parent. Stop early when no active paths remain.
+ */
+ for (p = c->parents, parent_i = 0; p; p = p->next, parent_i++) {
+ process_parent(lm, &queue,
+ c, active_c,
+ p->item, parent_i);
+
+ if (bitmap_is_empty(active_c))
+ break;
+ }
+
+ /*
+ * Paths that remain active, or not TREESAME with any parent,
+ * were changed by 'c'.
+ */
+ if (!bitmap_is_empty(active_c)) {
+ data.commit = c;
+ for (size_t i = 0; i < lm->all_paths_nr; i++) {
+ if (bitmap_get(active_c, i))
+ mark_path(lm->all_paths[i], NULL, &data);
+ }
+ }
- log_tree_commit(&lm->rev, data.commit);
+cleanup:
+ active_paths_free(lm, c);
}
+ if (hashmap_get_size(&lm->paths))
+ BUG("paths remaining beyond boundary in last-modified");
+
+ clear_prio_queue(¬_queue);
+ clear_prio_queue(&queue);
+ clear_active_paths_for_commit(&lm->active_paths);
+ bitmap_free(lm->scratch);
+
return 0;
}
static int last_modified_init(struct last_modified *lm, struct repository *r,
const char *prefix, int argc, const char **argv)
{
+ struct hashmap_iter iter;
+ struct last_modified_entry *ent;
+
hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
repo_init_revisions(r, &lm->rev, prefix);
@@ -280,6 +493,13 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
if (populate_paths_from_revs(lm) < 0)
return error(_("unable to setup last-modified"));
+ lm->all_paths = xcalloc(hashmap_get_size(&lm->paths), sizeof(const char *));
+ lm->all_paths_nr = 0;
+ hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ ent->diff_idx = lm->all_paths_nr++;
+ lm->all_paths[ent->diff_idx] = ent->path;
+ }
+
return 0;
}
diff --git a/object.h b/object.h
index 8c3c1c46e1..fa504a09c0 100644
--- a/object.h
+++ b/object.h
@@ -75,6 +75,7 @@ void object_array_init(struct object_array *array);
* http-push.c: 11-----14
* commit-graph.c: 15
* commit-reach.c: 16-----19
+ * builtin/last-modified.c: 1617
* sha1-name.c: 20
* list-objects-filter.c: 21
* bloom.c: 2122
diff --git a/t/t8020-last-modified.sh b/t/t8020-last-modified.sh
index 61f00bc15c..a4c1114ee2 100755
--- a/t/t8020-last-modified.sh
+++ b/t/t8020-last-modified.sh
@@ -57,9 +57,9 @@ test_expect_success 'last-modified recursive' '
test_expect_success 'last-modified recursive with show-trees' '
check_last_modified -r -t <<-\EOF
- 3 a
3 a/b
3 a/b/file
+ 3 a
2 a/file
1 file
EOF
---
base-commit: 133d151831d32bdcc02422599a3f26cef44f929b
change-id: 20251009-b4-toon-last-modified-faster-4c8956a95261
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-21 17:52 ` Junio C Hamano
2025-10-22 0:26 ` Taylor Blau
@ 2025-10-23 8:01 ` Toon Claes
1 sibling, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-23 8:01 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau, D. Ben Knoble,
Derrick Stolee
Junio C Hamano <gitster@pobox.com> writes:
> Toon Claes <toon@iotcl.com> writes:
>
>> +static size_t path_idx(struct last_modified *lm, char *path)
>> +{
>> + struct last_modified_entry *ent;
>> + ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
>> + struct last_modified_entry, hashent);
>> +
>> + return ent ? ent->diff_idx : -1;
>> +}
>
> size_t is unsigned and cannot return -1 sanely, unless the caller
> knows that ((size_t)-1) signals an error. The compiler warns, and
> we compile with -Werror, so we end up getting
>
> builtin/last-modified.c: In function 'path_idx':
> builtin/last-modified.c:235:38: error: operand of '?:' changes signedness from 'int' to 'size_t' {aka 'long unsigned int'} due to unsignedness of other operand [-Werror=sign-compare]
> 235 | return ent ? ent->diff_idx : -1;
> | ^~
Whoops, I didn't realize I wasn't compiling with the proper DEVELOPER
settings. I recently set up a new computer and didn't bring over all
configuration correctly.
I just sent out v3 and in that version I decided to solve this issue
differently: inline the code from path_idx() in the only place it was used.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-17 12:07 ` Toon Claes
2025-10-21 9:04 ` Toon Claes
2025-10-21 13:00 ` Toon Claes
@ 2025-10-23 23:56 ` Taylor Blau
2025-10-27 15:48 ` Toon Claes
2 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-10-23 23:56 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Karthik Nayak, Justin Tobler, Derrick Stolee
On Fri, Oct 17, 2025 at 02:07:18PM +0200, Toon Claes wrote:
> > Regardless of how you handle the above, I think that the commit slab
> > name here is a little generic. I guess it's OK since this is only
> > visible within this compilation unit, but perhaps something like
> > "active_paths_bitmap" would be more descriptive.
>
> I struggled a lot naming this thing, so I'm open to suggestions.
I think calling it "active_paths_bitmap" or similar conveys that this is
somehow specific to "active paths", and since the slab is static within
the last-modified builtin, I think that's fine. It's a little word-y, so
if you have better ideas with fewer characters, I'm open to just about
anything.
> > Likewise, I wonder if we should have elemtype here be just 'struct
> > bitmap'. Unfortunately I don't think the EWAH code has a function like:
> >
> > void bitmap_init(struct bitmap *);
> >
> > and only has ones that allocate for us. So we may consider adding one,
> > or creating a dummy bitmap and copying its contents, or otherwise.
> >
> >> struct last_modified {
> >> struct hashmap paths;
> >> struct rev_info rev;
> >> bool recursive;
> >> bool show_trees;
> >> +
> >> + const char **all_paths;
> >> + size_t all_paths_nr;
> >
> > I wonder if all_paths should be a strvec here? I think that this code
> > was all written when the type was called argv_array (hilariously, that
> > change took place towards the end of July, 2020, and the --go-faster
> > code where this patch came from was written just a couple of weeks
> > earlier.)
>
> Ahha, that might be a good idea. This might allow us to get rid of the
> hashmap, which stops us from storing the paths twice. Not sure what the
> impact on the performance would be, because the hashmap now is valuable
> for path_idx() lookups.
Looking at this a little further, I don't think I consider this worth
doing. Using a strvec here is awkward since last_modified_init() really
wants to assign lm->all_paths based on each entry's diff_idx, which is
not what strvec.h is designed for.
You *could* use a string_list, and shove a pointer to the
last_modified_entry struct in the ->util field, but that is also
inefficient since we remove paths from our hashmap as we mark them, and
repeating that in a string_list would be wasteful.
> > In the GitHub version of this patch, we pass all active paths to the
> > parent, assign the PARENT1 flag if it doesn't already have it, and put
> > it in the queue as well.
> >
> > In your version, we'd skip past the next for-loop, and do the same
> > pass-to-parent dance below, along with inserting the parent into the
> > prio queue.
>
> This is the "shortcut" I'm mentioning in my cover letter. In my testing
> it seemed it didn't provide any performance gains to keep it. I consider
> less code better code, so I left it out.
Fair enough. I don't think that less code here results in a lack of
clarity, so this seems alright to me. But in general I am not sure that
I always agree that less code is better ;-).
> > So I think that this is all functionally equivalent, but I had to work
> > through a little bit of the details here, mostly since I haven't looked
> > at or thought about this code in many years ;-).
> >
> >> static int last_modified_run(struct last_modified *lm)
> >> {
> >> + int max_count, queue_popped = 0;
> >> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> >> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
> >> + struct commit_list *list;
> >> struct last_modified_callback_data data = { .lm = lm };
> >>
> >> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> >> lm->rev.diffopt.format_callback = last_modified_diff;
> >> lm->rev.diffopt.format_callback_data = &data;
> >> + lm->rev.no_walk = 1;
> >
> > This one is new relative to the original patch. Why set no_walk here?
>
> This comes from
> https://github.com/ttaylorr/git/commit/e8ea49705873d28f64b815bd00d14bdf6d48ca4d
>
> Well, it basically squashes various commits together. There are various
> commits doing different things here. I don't think it's valuable for
> anyone to see the full history of the iterations at GitHub, that's why I
> squashed it in.
>
> Would you consider it better to not set `no_walk`?
Hah, I forgot that we added this one later on. Adding it here in this
patch makes sense to me.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-21 9:04 ` Toon Claes
@ 2025-10-23 23:59 ` Taylor Blau
0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2025-10-23 23:59 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Karthik Nayak, Justin Tobler, Derrick Stolee, Jeff King
On Tue, Oct 21, 2025 at 11:04:05AM +0200, Toon Claes wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
>
> >> Nice, I am glad to see that we are using a bitmap here rather than the
> >> hacky 'char *' that we had originally written. I seem to remember that
> >> there was a tiny slow-down when using bitmaps, but can't find the
> >> discussion anymore. (It wasn't in the internal PR that I originally
> >> opened, and I no longer can read messages that far back in history.)
> >>
> >> It might be worth benchmarking here to see if using a 'char *' is
> >> faster. Of course, that's 8x worse in terms of memory usage, but not a
> >> huge deal given both the magnitude and typical number of directory
> >> elements (you'd need 1024^2 entries in a single tree to occupy even a
> >> single MiB of heap).
>
> Using ewah bitmaps is slightly faster, although the difference is almost
> negligible.
>
> Benchmark 1: bitmap-ewah
> Time (mean ± σ): 793.1 ms ± 6.2 ms [User: 755.1 ms, System: 35.2 ms]
> Range (min … max): 784.7 ms … 804.8 ms 10 runs
>
> Benchmark 2: bitmap-chars
> Time (mean ± σ): 808.9 ms ± 11.2 ms [User: 770.8 ms, System: 35.4 ms]
> Range (min … max): 800.2 ms … 830.5 ms 10 runs
>
> Summary
> bitmap-ewah ran
> 1.02 ± 0.02 times faster than bitmap-chars
OK, makes sense, though just to clarify, "bitmap-ewah" is just a
bog-standard "struct bitmap", right? That happens to come from the EWAH
implementation, but the bitmap itself is not being EWAH compressed,
right?
> And ewah bitmap being more memory efficient, it makes more sense to keep
> using those.
>
> >> Likewise, I wonder if we should have elemtype here be just 'struct
> >> bitmap'. Unfortunately I don't think the EWAH code has a function like:
> >>
> >> void bitmap_init(struct bitmap *);
> >>
> >> and only has ones that allocate for us. So we may consider adding one,
> >> or creating a dummy bitmap and copying its contents, or otherwise.
>
> I've done some testing, and to do so I've made bitmap_grow() public.
>
> Benchmark 1: bitmap-as-pointers
> Time (mean ± σ): 783.7 ms ± 8.9 ms [User: 744.1 ms, System: 37.5 ms]
> Range (min … max): 774.4 ms … 803.4 ms 10 runs
>
> Benchmark 2: bitmap-as-values
> Time (mean ± σ): 856.7 ms ± 10.5 ms [User: 816.0 ms, System: 38.1 ms]
> Range (min … max): 845.7 ms … 872.5 ms 10 runs
>
> Summary
> bitmap-as-pointers ran
> 1.09 ± 0.02 times faster than bitmap-as-values
>
> It seems using ewah bitmaps as pointers is faster than using bitmaps as
> values. I must admit I'm surprised as well, but in case you want to
> double check, here's the patch:
I think this makes sense; the pointers are half as wide as a struct
bitmap. Even though we're going through another layer of indirection, I
think that the smaller slab footprint results in better cache locality,
and ultimately faster code. Thanks for testing it out.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-22 3:48 ` Junio C Hamano
@ 2025-10-24 0:01 ` Taylor Blau
2025-10-24 0:37 ` Junio C Hamano
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-10-24 0:01 UTC (permalink / raw)
To: Junio C Hamano
Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, D. Ben Knoble,
Derrick Stolee
On Tue, Oct 21, 2025 at 08:48:31PM -0700, Junio C Hamano wrote:
> > Practically speaking that's probably OK, since we are unlikely to have
> > so many active paths anyway (or if we did, we'd likely have other
> > problems to deal with ;-)), but it is gross nonetheless.
>
> The case path_idx() returns -1 is an error case, not "there are too
> many paths we are following" case. I do not see what relevance the
> number of active paths has here.
I just meant that we are unlikely to ever have so many active paths at
once that (size_t)-1 would actually have a valid entry, or IOW that
active_paths_nr is smaller than 2^32-1.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v3] last-modified: implement faster algorithm
2025-10-23 7:50 ` [PATCH v3] " Toon Claes
@ 2025-10-24 0:03 ` Taylor Blau
2025-10-27 7:03 ` Toon Claes
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-10-24 0:03 UTC (permalink / raw)
To: Toon Claes
Cc: git, Karthik Nayak, Justin Tobler, D. Ben Knoble, Derrick Stolee
On Thu, Oct 23, 2025 at 09:50:14AM +0200, Toon Claes wrote:
> ---
> builtin/last-modified.c | 250 ++++++++++++++++++++++++++++++++++++++++++++---
> object.h | 1 +
> t/t8020-last-modified.sh | 2 +-
> 3 files changed, 237 insertions(+), 16 deletions(-)
This version looks good to me, thanks for porting it forward and
cleaning it up, so it has my
Acked-by: Taylor Blau <me@ttaylorr.com>
As an aside, do you plan on upstreaming the blame-tree cache, (which I
imagine would get renamed to last-modified cache)? Just curious.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-24 0:01 ` Taylor Blau
@ 2025-10-24 0:37 ` Junio C Hamano
2025-10-27 19:22 ` Taylor Blau
0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2025-10-24 0:37 UTC (permalink / raw)
To: Taylor Blau
Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, D. Ben Knoble,
Derrick Stolee
Taylor Blau <me@ttaylorr.com> writes:
> On Tue, Oct 21, 2025 at 08:48:31PM -0700, Junio C Hamano wrote:
>> > Practically speaking that's probably OK, since we are unlikely to have
>> > so many active paths anyway (or if we did, we'd likely have other
>> > problems to deal with ;-)), but it is gross nonetheless.
>>
>> The case path_idx() returns -1 is an error case, not "there are too
>> many paths we are following" case. I do not see what relevance the
>> number of active paths has here.
>
> I just meant that we are unlikely to ever have so many active paths at
> once that (size_t)-1 would actually have a valid entry, or IOW that
> active_paths_nr is smaller than 2^32-1.
So? If path_idx() needed to signal an error, it would return (size_t)-1,
but as the compiler correctly caught, the code as written, i.e.
k = path_idx(...);
if (0 <= k) {
/* did not error so we can safely use k */
...
}
is outright buggy. I do not see why it is "practically speaking
that's probably OK". It certainly does not matter if the number we
will receive in 'k' in the success case is expected to be
small---the problem is only for an error case.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v3] last-modified: implement faster algorithm
2025-10-24 0:03 ` Taylor Blau
@ 2025-10-27 7:03 ` Toon Claes
0 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-27 7:03 UTC (permalink / raw)
To: Taylor Blau
Cc: git, Karthik Nayak, Justin Tobler, D. Ben Knoble, Derrick Stolee
Taylor Blau <me@ttaylorr.com> writes:
> This version looks good to me, thanks for porting it forward and
> cleaning it up, so it has my
>
> Acked-by: Taylor Blau <me@ttaylorr.com>
Thanks for confirming!
> As an aside, do you plan on upstreaming the blame-tree cache, (which I
> imagine would get renamed to last-modified cache)? Just curious.
Currently that's not planned.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH] last-modified: implement faster algorithm
2025-10-23 23:56 ` Taylor Blau
@ 2025-10-27 15:48 ` Toon Claes
0 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-27 15:48 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Karthik Nayak, Justin Tobler, Derrick Stolee
Taylor Blau <me@ttaylorr.com> writes:
>> Ahha, that might be a good idea. This might allow us to get rid of the
>> hashmap, which stops us from storing the paths twice. Not sure what the
>> impact on the performance would be, because the hashmap now is valuable
>> for path_idx() lookups.
>
> Looking at this a little further, I don't think I consider this worth
> doing. Using a strvec here is awkward since last_modified_init() really
> wants to assign lm->all_paths based on each entry's diff_idx, which is
> not what strvec.h is designed for.
Yeah, I decided to leave it as is for now.
> You *could* use a string_list, and shove a pointer to the
> last_modified_entry struct in the ->util field, but that is also
> inefficient since we remove paths from our hashmap as we mark them, and
> repeating that in a string_list would be wasteful.
There was another idea I had. Instead of giving each commit a bitmap,
give each commit a hashmap, and pass the hashmap entries across commits.
This allows us to store paths only in once place and removes the need to
allocate a fixed array `all_paths` again.
But in my measurements, this was about 30% slower than the solution I've
submitted in v3. And it also complicates the memory management. So I've
abandoned this idea.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-24 0:37 ` Junio C Hamano
@ 2025-10-27 19:22 ` Taylor Blau
2025-10-29 13:01 ` Toon Claes
0 siblings, 1 reply; 39+ messages in thread
From: Taylor Blau @ 2025-10-27 19:22 UTC (permalink / raw)
To: Junio C Hamano
Cc: Toon Claes, git, Karthik Nayak, Justin Tobler, D. Ben Knoble,
Derrick Stolee
On Thu, Oct 23, 2025 at 05:37:25PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Tue, Oct 21, 2025 at 08:48:31PM -0700, Junio C Hamano wrote:
> >> > Practically speaking that's probably OK, since we are unlikely to have
> >> > so many active paths anyway (or if we did, we'd likely have other
> >> > problems to deal with ;-)), but it is gross nonetheless.
> >>
> >> The case path_idx() returns -1 is an error case, not "there are too
> >> many paths we are following" case. I do not see what relevance the
> >> number of active paths has here.
> >
> > I just meant that we are unlikely to ever have so many active paths at
> > once that (size_t)-1 would actually have a valid entry, or IOW that
> > active_paths_nr is smaller than 2^32-1.
>
> So? If path_idx() needed to signal an error, it will return (size_t)-1,
> but as the compiler correctly caught, the code as written, i.e.
>
> k = path_idx(...);
> if (0 <= k) {
> /* did not error so we can safely use k */
> ...
> }
>
> is outright buggy. I do not see why it is "practically speaking
> that's probably OK". It certainly does not matter if the number we
> will receive in 'k' in the success case is expected to be
> small---the problem is only for an error case.
I am not saying that we should continue to write "if (0 <= k)", since we
will clearly never take the else branch. I am trying to say that
path_idx() *could* return (size_t)-1, and callers would be able to write
"if (k == (size_t)-1)" to check for that error condition.
My observation was that there are unlikely to be so many active paths at
any one time such that we'd ever want to return (size_t)-1 as a valid
index, and could always use it as an error sentinel.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] last-modified: implement faster algorithm
2025-10-27 19:22 ` Taylor Blau
@ 2025-10-29 13:01 ` Toon Claes
0 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-10-29 13:01 UTC (permalink / raw)
To: Taylor Blau, Junio C Hamano
Cc: git, Karthik Nayak, Justin Tobler, D. Ben Knoble, Derrick Stolee
Taylor Blau <me@ttaylorr.com> writes:
> I am not saying that we should continue to write "if (0 <= k)", since we
> will clearly never take the else branch. I am trying to say that
> path_idx() *could* return (size_t)-1, and callers would be able to write
> "if (k == (size_t)-1)" to check for that error condition.
>
> My observation was that there are unlikely to be so many active paths at
> any one time such that we'd ever want to return (size_t)-1 as a valid
> index, and could always use it as an error sentinel.
Yes, I agree. If the list grows to (size_t)-1 entries, I think we also
run into other issues.
Anyhow, in the v3 I've submitted, I resolved it differently so no
awkward type cast is needed.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v4] last-modified: implement faster algorithm
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
2025-10-21 17:52 ` Junio C Hamano
2025-10-23 7:50 ` [PATCH v3] " Toon Claes
@ 2025-11-03 15:47 ` Toon Claes
2025-11-03 16:44 ` Junio C Hamano
2025-11-19 11:34 ` t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm) Anders Kaseorg
2 siblings, 2 replies; 39+ messages in thread
From: Toon Claes @ 2025-11-03 15:47 UTC (permalink / raw)
To: git, toon
Cc: Karthik Nayak, Justin Tobler, Taylor Blau, D. Ben Knoble,
Derrick Stolee, Junio C Hamano
The current implementation of git-last-modified(1) works by doing a
revision walk, and inspecting the diff at each level of that walk to
annotate entries remaining in the hashmap of paths. In other words, if
the diff at some level touches a path which has not yet been associated
with a commit, then that commit becomes associated with the path.
While a perfectly reasonable implementation, it can perform poorly in
either one of two scenarios:
1. There are many entries of interest, in which case there is simply
a lot of work to do.
2. Or, there are (even a few) entries which have not been updated in a
long time, and so we must walk through a lot of history in order to
find a commit that touches that path.
This patch rewrites the last-modified implementation that addresses the
second point. The idea behind the algorithm is to propagate a set of
'active' paths (a path is 'active' if it does not yet belong to a
commit) up to parents and do a truncated revision walk.
The walk is truncated because it does not produce a revision for every
change in the original pathspec, but rather only for active paths.
More specifically, consider a priority queue of commits sorted by
generation number. First, enqueue the set of boundary commits with all
paths in the original spec marked as interesting.
Then, while the queue is not empty, do the following:
1. Pop an element, say, 'c', off of the queue, making sure that 'c'
isn't reachable by anything in the '--not' set.
2. For each parent 'p' (with index 'parent_i') of 'c', do the
following:
a. Compute the diff between 'c' and 'p'.
b. Pass any active paths that are TREESAME from 'c' to 'p'.
c. If 'p' has any active paths, push it onto the queue.
3. Any path that remains active on 'c' is associated to that commit.
This ends up being equivalent to doing something like 'git log -1 --
$path' for each path simultaneously. But, it allows us to go much faster
than the original implementation by limiting the number of diffs we
compute, since we can avoid parts of history that would have been
considered by the revision walk in the original implementation, but are
known to be uninteresting to us because we have already marked all paths
in that area to be inactive.
To avoid computing many first-parent diffs, add another trick on top of
this and check if all paths active in 'c' are DEFINITELY NOT in c's
Bloom filter. Since the commit-graph only stores first-parent diffs in
the Bloom filters, we can only apply this trick to first-parent diffs.
Comparing the performance of this new algorithm shows about a 2.5x
improvement on git.git:
Benchmark 1: master no bloom
Time (mean ± σ): 2.868 s ± 0.023 s [User: 2.811 s, System: 0.051 s]
Range (min … max): 2.847 s … 2.926 s 10 runs
Benchmark 2: master with bloom
Time (mean ± σ): 949.9 ms ± 15.2 ms [User: 907.6 ms, System: 39.5 ms]
Range (min … max): 933.3 ms … 971.2 ms 10 runs
Benchmark 3: HEAD no bloom
Time (mean ± σ): 782.0 ms ± 6.3 ms [User: 740.7 ms, System: 39.2 ms]
Range (min … max): 776.4 ms … 798.2 ms 10 runs
Benchmark 4: HEAD with bloom
Time (mean ± σ): 307.1 ms ± 1.7 ms [User: 276.4 ms, System: 29.9 ms]
Range (min … max): 303.7 ms … 309.5 ms 10 runs
Summary
HEAD with bloom ran
2.55 ± 0.02 times faster than HEAD no bloom
3.09 ± 0.05 times faster than master with bloom
9.34 ± 0.09 times faster than master no bloom
In short, the existing implementation is comparably fast *with* Bloom
filters as the new implementation is *without* Bloom filters. So, most
repositories should get a dramatic speed-up by just deploying this (even
without computing Bloom filters), and all repositories should get faster
still when computing Bloom filters.
When comparing a more extreme example,
`git last-modified -- COPYING t`, the speed-up is even larger, about 5x:
Benchmark 1: master
Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
Range (min … max): 4.308 s … 4.509 s 10 runs
Benchmark 2: HEAD
Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
Range (min … max): 810.6 ms … 881.2 ms 10 runs
Summary
HEAD ran
5.29 ± 0.16 times faster than master
As an added benefit, results are more consistent now. For example, the
implementation in 'master' gives:
$ git log --max-count=1 --format=%H -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de
$ git last-modified -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
$ git last-modified | grep pkt-line.h
5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
With the changes in this patch the results of git-last-modified(1)
always match those of `git log --max-count=1`.
One thing to note though: the results may be output in a different
order than before. This is not considered an issue because the order
was never documented to be guaranteed.
Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
The subcommand git-last-modified(1) was based on the patches shared by
Taylor and the folks at GitHub[1]. That version used an alternative
implementation to make it "go faster". When I was working on upstreaming
those patches, I dropped the patches[2] for this implementation, because
I didn't see significant improvements.
This series revives those changes. I did a more thorough deep dive through
the code and the algorithm and got the code working a lot faster. The
benchmark results can be found in the commit message.
Some changes compared to GitHub's version include:
* Use of `struct bitmap` from "ewah/ewok.h", instead of self-defined
`struct commit_active_paths`.
* Removed shortcut code that handled the case when commit and parent
are fully treesame, and instead always check 'active_c' to decide whether the
next parent is worth looking at.
* Modified comments and commit message to make the algorithm more
clear (at least to me).
* Mentioned the use of PARENT1 and PARENT2 in object.h.
* Removed the use of any global variables.
* Fewer conditions are checked in mark_path() because the 'paths'
hashmap is considered the single source of truth.
* pass_to_parent() no longer passes on a path when it isn't in the
'paths' hashmap.
[1]: https://lore.kernel.org/git/Z+XJ+1L3PnC9Dyba@nand.local/
[2]: https://lore.kernel.org/git/20250630-toon-new-blame-tree-v3-0-3516025dc3bc@iotcl.com/
---
Changes in v4:
- Use CALLOC_ARRAY() instead of xcalloc() as identified by Junio using
'make coccicheck'.
- Small formatting changes.
- Link to v3: https://lore.kernel.org/all/20251023-b4-toon-last-modified-faster-v3-1-40a4ddbbadec@iotcl.com
Changes in v3:
- Make code cleanly compile with -Wsign-compare
- Remove path_idx() and instead inline the code in the loop. This fixes
the sign comparison issue and the function was only used in one place
anyway.
- Make all the memory leaks go away.
- Small tweaks in naming and code comments.
- Link to v2: https://lore.kernel.org/r/20251021-b4-toon-last-modified-faster-v2-1-f6dcbc26fc5c@iotcl.com
Changes in v2:
- Add benchmark results comparing repositories with and without Bloom
filters in commit message.
- Add Stolee and Taylor in the commit message trailers.
- Fix segfault by checking if 'oid' is set in mark_path().
- Remove hashmap lookup in pass_to_parent().
- Remove manually calling diff_free_filepair().
- Rename commit slab bitmap to "active_paths_bitmap".
- Link to v1: https://lore.kernel.org/r/20251016-b4-toon-last-modified-faster-v1-1-85dca8a29e5c@iotcl.com
Range-diff against v3:
1: 2d86f0406a ! 1: 9991283ec2 last-modified: implement faster algorithm
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ bitmap_set(lm->scratch, k);
+ }
+ }
-+ for (size_t i = 0; i < lm->all_paths_nr; i++) {
++ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
+ pass_to_parent(active_c, active_p, i);
-+ }
+
+ /*
+ * If parent has any active paths, put it on the queue (if not already).
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ * paths, too.
+ */
+ struct bitmap *active;
-+
+
+- if (data.commit->object.flags & BOUNDARY) {
+ prio_queue_put(&queue, c);
+ c->object.flags |= PARENT1;
+
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ bitmap_set(active, i);
+ }
+ }
-
-- if (data.commit->object.flags & BOUNDARY) {
++
+ while (queue.nr) {
+ int parent_i;
+ struct commit_list *p;
@@ builtin/last-modified.c: static bool maybe_changed_path(struct last_modified *lm
+ * Paths that remain active, or not TREESAME with any parent,
+ * were changed by 'c'.
+ */
-+ if (!bitmap_is_empty(active_c)) {
++ if (!bitmap_is_empty(active_c)) {
+ data.commit = c;
-+ for (size_t i = 0; i < lm->all_paths_nr; i++) {
++ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ if (bitmap_get(active_c, i))
+ mark_path(lm->all_paths[i], NULL, &data);
-+ }
+ }
- log_tree_commit(&lm->rev, data.commit);
@@ builtin/last-modified.c: static int last_modified_init(struct last_modified *lm,
if (populate_paths_from_revs(lm) < 0)
return error(_("unable to setup last-modified"));
-+ lm->all_paths = xcalloc(hashmap_get_size(&lm->paths), sizeof(const char *));
++ CALLOC_ARRAY(lm->all_paths, hashmap_get_size(&lm->paths));
+ lm->all_paths_nr = 0;
+ hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ ent->diff_idx = lm->all_paths_nr++;
---
builtin/last-modified.c | 248 ++++++++++++++++++++++++++++++++++++---
object.h | 1 +
t/t8020-last-modified.sh | 2 +-
3 files changed, 235 insertions(+), 16 deletions(-)
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index ae8b36a2c3..3028abd25e 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -2,26 +2,32 @@
#include "bloom.h"
#include "builtin.h"
#include "commit-graph.h"
+#include "commit-slab.h"
#include "commit.h"
#include "config.h"
-#include "environment.h"
#include "diff.h"
#include "diffcore.h"
#include "environment.h"
+#include "ewah/ewok.h"
#include "hashmap.h"
#include "hex.h"
-#include "log-tree.h"
#include "object-name.h"
#include "object.h"
#include "parse-options.h"
+#include "prio-queue.h"
#include "quote.h"
#include "repository.h"
#include "revision.h"
+/* Remember to update object flag allocation in object.h */
+#define PARENT1 (1u<<16) /* used instead of SEEN */
+#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
+
struct last_modified_entry {
struct hashmap_entry hashent;
struct object_id oid;
struct bloom_key key;
+ size_t diff_idx;
const char path[FLEX_ARRAY];
};
@@ -37,13 +43,45 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
return strcmp(ent1->path, path ? path : ent2->path);
}
+/*
+ * Hold a bitmap for each commit we're working with. In the bitmap, each bit
+ * represents a path in `lm->all_paths`. An active bit indicates the path still
+ * needs to be associated to a commit.
+ */
+define_commit_slab(active_paths_for_commit, struct bitmap *);
+
struct last_modified {
struct hashmap paths;
struct rev_info rev;
bool recursive;
bool show_trees;
+
+ const char **all_paths;
+ size_t all_paths_nr;
+ struct active_paths_for_commit active_paths;
+
+ /* 'scratch' to avoid allocating a bitmap every process_parent() */
+ struct bitmap *scratch;
};
+static struct bitmap *active_paths_for(struct last_modified *lm, struct commit *c)
+{
+ struct bitmap **bitmap = active_paths_for_commit_at(&lm->active_paths, c);
+ if (!*bitmap)
+ *bitmap = bitmap_word_alloc(lm->all_paths_nr / BITS_IN_EWORD + 1);
+
+ return *bitmap;
+}
+
+static void active_paths_free(struct last_modified *lm, struct commit *c)
+{
+ struct bitmap **bitmap = active_paths_for_commit_at(&lm->active_paths, c);
+ if (*bitmap) {
+ bitmap_free(*bitmap);
+ *bitmap = NULL;
+ }
+}
+
static void last_modified_release(struct last_modified *lm)
{
struct hashmap_iter iter;
@@ -54,6 +92,8 @@ static void last_modified_release(struct last_modified *lm)
hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
release_revisions(&lm->rev);
+
+ free(lm->all_paths);
}
struct last_modified_callback_data {
@@ -146,7 +186,7 @@ static void mark_path(const char *path, const struct object_id *oid,
* Is it arriving at a version of interest, or is it from a side branch
* which did not contribute to the final state?
*/
- if (!oideq(oid, &ent->oid))
+ if (oid && !oideq(oid, &ent->oid))
return;
last_modified_emit(data->lm, path, data->commit);
@@ -196,7 +236,17 @@ static void last_modified_diff(struct diff_queue_struct *q,
}
}
-static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
+static void pass_to_parent(struct bitmap *c,
+ struct bitmap *p,
+ size_t pos)
+{
+ bitmap_unset(c, pos);
+ bitmap_set(p, pos);
+}
+
+static bool maybe_changed_path(struct last_modified *lm,
+ struct commit *origin,
+ struct bitmap *active)
{
struct bloom_filter *filter;
struct last_modified_entry *ent;
@@ -213,6 +263,9 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return true;
hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ if (active && !bitmap_get(active, ent->diff_idx))
+ continue;
+
if (bloom_filter_contains(filter, &ent->key,
lm->rev.bloom_filter_settings))
return true;
@@ -220,42 +273,200 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
return false;
}
+static void process_parent(struct last_modified *lm,
+ struct prio_queue *queue,
+ struct commit *c, struct bitmap *active_c,
+ struct commit *parent, int parent_i)
+{
+ struct bitmap *active_p;
+
+ repo_parse_commit(lm->rev.repo, parent);
+ active_p = active_paths_for(lm, parent);
+
+ /*
+ * The first time entering this function for this commit (i.e. first parent)
+ * see if Bloom filters will tell us it's worth to do the diff.
+ */
+ if (parent_i || maybe_changed_path(lm, c, active_c)) {
+ diff_tree_oid(&parent->object.oid,
+ &c->object.oid, "", &lm->rev.diffopt);
+ diffcore_std(&lm->rev.diffopt);
+ }
+
+ /*
+ * Test each path for TREESAME-ness against the parent. If a path is
+ * TREESAME, pass it on to this parent.
+ *
+ * First, collect all paths that are *not* TREESAME in 'scratch'.
+ * Then, pass paths that *are* TREESAME and active to the parent.
+ */
+ for (int i = 0; i < diff_queued_diff.nr; i++) {
+ struct diff_filepair *fp = diff_queued_diff.queue[i];
+ const char *path = fp->two->path;
+ struct last_modified_entry *ent =
+ hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
+ struct last_modified_entry, hashent);
+ if (ent) {
+ size_t k = ent->diff_idx;
+ if (bitmap_get(active_c, k))
+ bitmap_set(lm->scratch, k);
+ }
+ }
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
+ pass_to_parent(active_c, active_p, i);
+
+ /*
+ * If parent has any active paths, put it on the queue (if not already).
+ */
+ if (!bitmap_is_empty(active_p) && !(parent->object.flags & PARENT1)) {
+ parent->object.flags |= PARENT1;
+ prio_queue_put(queue, parent);
+ }
+ if (!(parent->object.flags & PARENT1))
+ active_paths_free(lm, parent);
+
+ memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
+ diff_queue_clear(&diff_queued_diff);
+}
+
static int last_modified_run(struct last_modified *lm)
{
+ int max_count, queue_popped = 0;
+ struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+ struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
+ struct commit_list *list;
struct last_modified_callback_data data = { .lm = lm };
lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
lm->rev.diffopt.format_callback = last_modified_diff;
lm->rev.diffopt.format_callback_data = &data;
+ lm->rev.no_walk = 1;
prepare_revision_walk(&lm->rev);
- while (hashmap_get_size(&lm->paths)) {
- data.commit = get_revision(&lm->rev);
- if (!data.commit)
- BUG("paths remaining beyond boundary in last-modified");
+ max_count = lm->rev.max_count;
+
+ init_active_paths_for_commit(&lm->active_paths);
+ lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
+
+ /*
+ * lm->rev.commits holds the set of boundary commits for our walk.
+ *
+ * Loop through each such commit, and place it in the appropriate queue.
+ */
+ for (list = lm->rev.commits; list; list = list->next) {
+ struct commit *c = list->item;
+
+ if (c->object.flags & BOTTOM) {
+ prio_queue_put(¬_queue, c);
+ c->object.flags |= PARENT2;
+ } else if (!(c->object.flags & PARENT1)) {
+ /*
+ * If the commit is a starting point (and hasn't been
+ * seen yet), then initialize the set of interesting
+ * paths, too.
+ */
+ struct bitmap *active;
- if (data.commit->object.flags & BOUNDARY) {
+ prio_queue_put(&queue, c);
+ c->object.flags |= PARENT1;
+
+ active = active_paths_for(lm, c);
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ bitmap_set(active, i);
+ }
+ }
+
+ while (queue.nr) {
+ int parent_i;
+ struct commit_list *p;
+ struct commit *c = prio_queue_get(&queue);
+ struct bitmap *active_c = active_paths_for(lm, c);
+
+ if ((0 <= max_count && max_count < ++queue_popped) ||
+ (c->object.flags & PARENT2)) {
+ /*
+ * Either a boundary commit, or we have already seen too
+ * many others. Either way, stop here.
+ */
+ c->object.flags |= PARENT2 | BOUNDARY;
+ data.commit = c;
diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
- &data.commit->object.oid, "",
- &lm->rev.diffopt);
+ &c->object.oid,
+ "", &lm->rev.diffopt);
diff_flush(&lm->rev.diffopt);
+ goto cleanup;
+ }
- break;
+ /*
+ * Otherwise, make sure that 'c' isn't reachable from anything
+ * in the '--not' queue.
+ */
+ repo_parse_commit(lm->rev.repo, c);
+
+ while (not_queue.nr) {
+ struct commit_list *np;
+ struct commit *n = prio_queue_get(¬_queue);
+
+ repo_parse_commit(lm->rev.repo, n);
+
+ for (np = n->parents; np; np = np->next) {
+ if (!(np->item->object.flags & PARENT2)) {
+ prio_queue_put(¬_queue, np->item);
+ np->item->object.flags |= PARENT2;
+ }
+ }
+
+ if (commit_graph_generation(n) < commit_graph_generation(c))
+ break;
}
- if (!maybe_changed_path(lm, data.commit))
- continue;
+ /*
+ * Look at each parent and pass on each path that's TREESAME
+ * with that parent. Stop early when no active paths remain.
+ */
+ for (p = c->parents, parent_i = 0; p; p = p->next, parent_i++) {
+ process_parent(lm, &queue,
+ c, active_c,
+ p->item, parent_i);
+
+ if (bitmap_is_empty(active_c))
+ break;
+ }
+
+ /*
+ * Paths that remain active, or not TREESAME with any parent,
+ * were changed by 'c'.
+ */
+ if (!bitmap_is_empty(active_c)) {
+ data.commit = c;
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
+ if (bitmap_get(active_c, i))
+ mark_path(lm->all_paths[i], NULL, &data);
+ }
- log_tree_commit(&lm->rev, data.commit);
+cleanup:
+ active_paths_free(lm, c);
}
+ if (hashmap_get_size(&lm->paths))
+ BUG("paths remaining beyond boundary in last-modified");
+
+ clear_prio_queue(¬_queue);
+ clear_prio_queue(&queue);
+ clear_active_paths_for_commit(&lm->active_paths);
+ bitmap_free(lm->scratch);
+
return 0;
}
static int last_modified_init(struct last_modified *lm, struct repository *r,
const char *prefix, int argc, const char **argv)
{
+ struct hashmap_iter iter;
+ struct last_modified_entry *ent;
+
hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
repo_init_revisions(r, &lm->rev, prefix);
@@ -280,6 +491,13 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
if (populate_paths_from_revs(lm) < 0)
return error(_("unable to setup last-modified"));
+ CALLOC_ARRAY(lm->all_paths, hashmap_get_size(&lm->paths));
+ lm->all_paths_nr = 0;
+ hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
+ ent->diff_idx = lm->all_paths_nr++;
+ lm->all_paths[ent->diff_idx] = ent->path;
+ }
+
return 0;
}
diff --git a/object.h b/object.h
index 8c3c1c46e1..fa504a09c0 100644
--- a/object.h
+++ b/object.h
@@ -75,6 +75,7 @@ void object_array_init(struct object_array *array);
* http-push.c: 11-----14
* commit-graph.c: 15
* commit-reach.c: 16-----19
+ * builtin/last-modified.c: 1617
* sha1-name.c: 20
* list-objects-filter.c: 21
* bloom.c: 2122
diff --git a/t/t8020-last-modified.sh b/t/t8020-last-modified.sh
index 61f00bc15c..a4c1114ee2 100755
--- a/t/t8020-last-modified.sh
+++ b/t/t8020-last-modified.sh
@@ -57,9 +57,9 @@ test_expect_success 'last-modified recursive' '
test_expect_success 'last-modified recursive with show-trees' '
check_last_modified -r -t <<-\EOF
- 3 a
3 a/b
3 a/b/file
+ 3 a
2 a/file
1 file
EOF
--
2.51.2
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v4] last-modified: implement faster algorithm
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
@ 2025-11-03 16:44 ` Junio C Hamano
2025-11-04 15:08 ` Toon Claes
2025-11-19 11:34 ` t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm) Anders Kaseorg
1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2025-11-03 16:44 UTC (permalink / raw)
To: Toon Claes
Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau, D. Ben Knoble,
Derrick Stolee
Toon Claes <toon@iotcl.com> writes:
> Changes in v4:
> - Use CALLOC_ARRAY() instead of xcalloc() as identified by Junio using
> 'make coccicheck'.
> - Small formatting changes.
> - Link to v3: https://lore.kernel.org/all/20251023-b4-toon-last-modified-faster-v3-1-40a4ddbbadec@iotcl.com
Sorry, but this came way after I started today's integration cycle,
which included merging the fixed-up version to 'next'. I saw some
"let's drop {} around the body of if/for with a single statement"
changes but each of these single statements was not a simple
statement but an if-statement, and personally I feel that it is
clearer to enclose them in {} (in other words, once the code is
written in that way, it is not worth the patch noise to go and fix
them). The only regrettable thing without v4 is the double space
between ") {" in the second hunk below X-<, but perhaps it is minor
enough to leave it to the next person who touches the vicinity of
this code ;-). If you feel strongly about them, please send in an
incremental updates, but as I said, I do not think it is necessary.
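As a side illustration (code not from the patch itself): the bodies in
question have the shape below, where the single statement under the for
loop is itself an if, and keeping the outer braces makes the nesting
obvious at a glance.

```c
#include <assert.h>
#include <stddef.h>

/* The for loop's only statement is itself an if; the braces on the
 * loop keep the nesting readable even though they are not required. */
size_t count_active(const int *active, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (active[i])
			count++;
	}
	return count;
}
```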
Thanks.
diff --git a/b0ecbdc540 b/3028abd25e
index b0ecbdc540..3028abd25e 100644
--- a/b0ecbdc540
+++ b/3028abd25e
@@ -312,10 +312,9 @@ static void process_parent(struct last_modified *lm,
bitmap_set(lm->scratch, k);
}
}
- for (size_t i = 0; i < lm->all_paths_nr; i++) {
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
if (bitmap_get(active_c, i) && !bitmap_get(lm->scratch, i))
pass_to_parent(active_c, active_p, i);
- }
/*
* If parent has any active paths, put it on the queue (if not already).
@@ -440,12 +439,11 @@ static int last_modified_run(struct last_modified *lm)
* Paths that remain active, or not TREESAME with any parent,
* were changed by 'c'.
*/
- if (!bitmap_is_empty(active_c)) {
+ if (!bitmap_is_empty(active_c)) {
data.commit = c;
- for (size_t i = 0; i < lm->all_paths_nr; i++) {
+ for (size_t i = 0; i < lm->all_paths_nr; i++)
if (bitmap_get(active_c, i))
mark_path(lm->all_paths[i], NULL, &data);
- }
}
cleanup:
* Re: [PATCH v4] last-modified: implement faster algorithm
2025-11-03 16:44 ` Junio C Hamano
@ 2025-11-04 15:08 ` Toon Claes
0 siblings, 0 replies; 39+ messages in thread
From: Toon Claes @ 2025-11-04 15:08 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Karthik Nayak, Justin Tobler, Taylor Blau, D. Ben Knoble,
Derrick Stolee
Junio C Hamano <gitster@pobox.com> writes:
> Toon Claes <toon@iotcl.com> writes:
>
>> Changes in v4:
>> - Use CALLOC_ARRAY() instead of xcalloc() as identified by Junio using
>> 'make coccicheck'.
>> - Small formatting changes.
>> - Link to v3: https://lore.kernel.org/all/20251023-b4-toon-last-modified-faster-v3-1-40a4ddbbadec@iotcl.com
>
> Sorry, but this came way after I started today's integration cycle,
No worries.
> which included merging the fixed-up version to 'next'.
Well, thank you.
> I saw some
> "let's drop {} around the body of if/for with a single statement"
> changes but each of these single statements was not a simple
> statement but an if-statement, and personally I feel that it is
> clearer to enclose them in {} (in other words, once the code is
> written in that way, it is not worth the patch noise to go and fix
> them).
Understood.
> The only regrettable thing without v4 is the double space
> between ") {" in the second hunk below X-<, but perhaps it is minor
> enough to leave it to the next person who touches the vicinity of
> this code ;-). If you feel strongly about them, please send in an
> incremental updates, but as I said, I do not think it is necessary.
It remained undiscovered by various reviewers and myself. I only noticed
because I saw the CI style check mention it (also the reason I touched
the {} on the for loops). So I don't mind leaving it this way.
> Thanks.
<3
--
Cheers,
Toon
* t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm)
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
2025-11-03 16:44 ` Junio C Hamano
@ 2025-11-19 11:34 ` Anders Kaseorg
2025-11-19 13:49 ` Kristoffer Haugsbakk
1 sibling, 1 reply; 39+ messages in thread
From: Anders Kaseorg @ 2025-11-19 11:34 UTC (permalink / raw)
To: Toon Claes, git
t8020-last-modified.sh is broken on the s390x platform in v2.52.0.
Bisection implicates commit 2a04e8c293766a4976ceceb4c663dd2963e0339e
“last-modified: implement faster algorithm” [1].
$ uname -m
s390x
$ ./t8020-last-modified.sh
ok 1 - setup
ok 2 - cannot run last-modified on two trees
ok 3 - last-modified non-recursive
ok 4 - last-modified recursive
ok 5 - last-modified recursive with show-trees
ok 6 - last-modified non-recursive with show-trees
ok 7 - last-modified subdir
ok 8 - last-modified subdir recursive
ok 9 - last-modified from non-HEAD commit
ok 10 - last-modified from subdir defaults to root
ok 11 - last-modified from subdir uses relative pathspecs
ok 12 - limit last-modified traversal by count
ok 13 - limit last-modified traversal by commit
ok 14 - only last-modified files in the current tree
ok 15 - subdirectory modified via merge
not ok 16 - cross merge boundaries in blaming
#
# git checkout HEAD^0 &&
# git rm -rf . &&
# test_commit m1 &&
# git checkout HEAD^ &&
# git rm -rf . &&
# test_commit m2 &&
# git merge m1 &&
# check_last_modified <<-\EOF
# m2 m2.t
# m1 m1.t
# EOF
#
ok 17 - last-modified merge for resolved conflicts
ok 18 - last-modified merge ignores content from branch
not ok 19 - last-modified merge undoes changes
#
# git checkout HEAD^0 &&
# git rm -rf . &&
# test_commit b1 file A &&
# test_commit b2 file B &&
# test_commit b3 file C &&
# test_commit b4 file D &&
# git checkout b2 &&
# test_commit b5 file2 2 &&
# git checkout b4 &&
# git merge --no-commit --no-ff b5 &&
# git checkout b2 -- file &&
# git merge --continue &&
# check_last_modified <<-\EOF
# b5 file2
# b2 file
# EOF
#
ok 20 - last-modified complains about unknown arguments
# failed 2 among 20 test(s)
1..20
$ ./t8020-last-modified.sh --verbose
[…]
expecting success of 8020.16 'cross merge boundaries in blaming':
git checkout HEAD^0 &&
git rm -rf . &&
test_commit m1 &&
git checkout HEAD^ &&
git rm -rf . &&
test_commit m2 &&
git merge m1 &&
check_last_modified <<-\EOF
m2 m2.t
m1 m1.t
EOF
Note: switching to 'HEAD^0'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 08525b6 remove a
rm 'file'
[detached HEAD 53e7187] m1
Author: A U Thor <author@example.com>
2 files changed, 1 insertion(+), 1 deletion(-)
delete mode 100644 file
create mode 100644 m1.t
Previous HEAD position was 53e7187 m1
HEAD is now at 08525b6 remove a
rm 'file'
[detached HEAD 9b81a41] m2
Author: A U Thor <author@example.com>
2 files changed, 1 insertion(+), 1 deletion(-)
delete mode 100644 file
create mode 100644 m2.t
Merge made by the 'ort' strategy.
m1.t | 1 +
1 file changed, 1 insertion(+)
create mode 100644 m1.t
--- expect 2025-11-19 11:28:57.966106204 +0000
+++ actual 2025-11-19 11:28:58.110112543 +0000
@@ -1,2 +1,2 @@
+ac29b6e974b49803f1c6ec5a705d1bf7dbfa7d2f m1.t
m2 m2.t
-m1 m1.t
not ok 16 - cross merge boundaries in blaming
#
# git checkout HEAD^0 &&
# git rm -rf . &&
# test_commit m1 &&
# git checkout HEAD^ &&
# git rm -rf . &&
# test_commit m2 &&
# git merge m1 &&
# check_last_modified <<-\EOF
# m2 m2.t
# m1 m1.t
# EOF
#
[…]
expecting success of 8020.19 'last-modified merge undoes changes':
git checkout HEAD^0 &&
git rm -rf . &&
test_commit b1 file A &&
test_commit b2 file B &&
test_commit b3 file C &&
test_commit b4 file D &&
git checkout b2 &&
test_commit b5 file2 2 &&
git checkout b4 &&
git merge --no-commit --no-ff b5 &&
git checkout b2 -- file &&
git merge --continue &&
check_last_modified <<-\EOF
b5 file2
b2 file
EOF
HEAD is now at 7b0602a Merge tag 'a4' into HEAD
rm 'file'
[detached HEAD c48d0f6] b1
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+), 1 deletion(-)
[detached HEAD ee27b37] b2
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+), 1 deletion(-)
[detached HEAD c90ce7d] b3
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+), 1 deletion(-)
[detached HEAD 317a439] b4
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+), 1 deletion(-)
Previous HEAD position was 317a439 b4
HEAD is now at ee27b37 b2
[detached HEAD 5526d49] b5
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+)
create mode 100644 file2
Previous HEAD position was 5526d49 b5
HEAD is now at 317a439 b4
Automatic merge went well; stopped before committing as requested
[detached HEAD da1857e] Merge tag 'b5' into HEAD
Author: A U Thor <author@example.com>
--- expect 2025-11-19 11:29:03.492349022 +0000
+++ actual 2025-11-19 11:29:03.648355864 +0000
@@ -1,2 +1,2 @@
-b5 file2
-b2 file
+da1857e0652b6f264c0038d684ddecddc273e506 file2
+da1857e0652b6f264c0038d684ddecddc273e506 file
not ok 19 - last-modified merge undoes changes
#
# git checkout HEAD^0 &&
# git rm -rf . &&
# test_commit b1 file A &&
# test_commit b2 file B &&
# test_commit b3 file C &&
# test_commit b4 file D &&
# git checkout b2 &&
# test_commit b5 file2 2 &&
# git checkout b4 &&
# git merge --no-commit --no-ff b5 &&
# git checkout b2 -- file &&
# git merge --continue &&
# check_last_modified <<-\EOF
# b5 file2
# b2 file
# EOF
#
Anders
[1] https://lore.kernel.org/git/20251103154726.26592-1-toon@iotcl.com/
* Re: t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm)
2025-11-19 11:34 ` t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm) Anders Kaseorg
@ 2025-11-19 13:49 ` Kristoffer Haugsbakk
2025-11-19 20:06 ` Anders Kaseorg
0 siblings, 1 reply; 39+ messages in thread
From: Kristoffer Haugsbakk @ 2025-11-19 13:49 UTC (permalink / raw)
To: Anders Kaseorg, Toon Claes, git
On Wed, Nov 19, 2025, at 12:34, Anders Kaseorg wrote:
> t8020-last-modified.sh is broken on the s390x platform in v2.52.0.
> Bisection implicates commit 2a04e8c293766a4976ceceb4c663dd2963e0339e
> “last-modified: implement faster algorithm” [1].
>
Does `./t8020-last-modified.sh --verbose` give any interesting output?
>[snip]
>
> [1] https://lore.kernel.org/git/20251103154726.26592-1-toon@iotcl.com/
* Re: t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm)
2025-11-19 13:49 ` Kristoffer Haugsbakk
@ 2025-11-19 20:06 ` Anders Kaseorg
2025-11-20 8:16 ` Jeff King
0 siblings, 1 reply; 39+ messages in thread
From: Anders Kaseorg @ 2025-11-19 20:06 UTC (permalink / raw)
To: Kristoffer Haugsbakk, Toon Claes, git
On 11/19/25 05:49, Kristoffer Haugsbakk wrote:
> On Wed, Nov 19, 2025, at 12:34, Anders Kaseorg wrote:
>> t8020-last-modified.sh is broken on the s390x platform in v2.52.0.
>> Bisection implicates commit 2a04e8c293766a4976ceceb4c663dd2963e0339e
>> “last-modified: implement faster algorithm” [1].
>
> Does `./t8020-last-modified.sh --verbose` give any interesting output?
I quoted that output in my previous message. The failures in subtests 16
and 19 come with these diffs:
--- expect 2025-11-19 11:28:57.966106204 +0000
+++ actual 2025-11-19 11:28:58.110112543 +0000
@@ -1,2 +1,2 @@
+ac29b6e974b49803f1c6ec5a705d1bf7dbfa7d2f m1.t
m2 m2.t
-m1 m1.t
[…]
--- expect 2025-11-19 11:29:03.492349022 +0000
+++ actual 2025-11-19 11:29:03.648355864 +0000
@@ -1,2 +1,2 @@
-b5 file2
-b2 file
+da1857e0652b6f264c0038d684ddecddc273e506 file2
+da1857e0652b6f264c0038d684ddecddc273e506 file
Anders
* Re: t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm)
2025-11-19 20:06 ` Anders Kaseorg
@ 2025-11-20 8:16 ` Jeff King
2025-11-28 16:45 ` Toon Claes
0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2025-11-20 8:16 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: rsbecker, Kristoffer Haugsbakk, Toon Claes, git
On Wed, Nov 19, 2025 at 12:06:35PM -0800, Anders Kaseorg wrote:
> On 11/19/25 05:49, Kristoffer Haugsbakk wrote:
> > On Wed, Nov 19, 2025, at 12:34, Anders Kaseorg wrote:
> > > t8020-last-modified.sh is broken on the s390x platform in v2.52.0.
> > > Bisection implicates commit 2a04e8c293766a4976ceceb4c663dd2963e0339e
> > > “last-modified: implement faster algorithm” [1].
> >
> > Does `./t8020-last-modified.sh --verbose` give any interesting output?
> I quoted that output in my previous message. The failures in subtests 16 and
> 19 come with these diffs:
>
> --- expect 2025-11-19 11:28:57.966106204 +0000
> +++ actual 2025-11-19 11:28:58.110112543 +0000
> @@ -1,2 +1,2 @@
> +ac29b6e974b49803f1c6ec5a705d1bf7dbfa7d2f m1.t
> m2 m2.t
> -m1 m1.t
>
> […]
>
> --- expect 2025-11-19 11:29:03.492349022 +0000
> +++ actual 2025-11-19 11:29:03.648355864 +0000
> @@ -1,2 +1,2 @@
> -b5 file2
> -b2 file
> +da1857e0652b6f264c0038d684ddecddc273e506 file2
> +da1857e0652b6f264c0038d684ddecddc273e506 file
Interestingly, the commits it returns are merges. E.g., here is the
state after test 16:
$ git log --oneline --graph
* ac29b6e (HEAD) Merge tag 'm1' into HEAD
|\
| * 53e7187 (tag: m1) m1
* | 9b81a41 (tag: m2) m2
|/
* 08525b6 (master) remove a
* 664d121 (tag: 3) 3
* a732b0c (tag: 2) 2
* 1edf6f6 (tag: 1) 1
Though it is also the first commit we start traversing from. The same is
true after test 19 (da1857e is the tip of HEAD there). So I am not sure
if the bug is "we are not passing down blame from the merge", or just
"we are not passing down blame at all".
I can't help but notice that this same failure is seen on s390x and HP
NonStop[1], both of which are (I think) big-endian. And not on any of
our usual little-endian platforms.
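For readers unfamiliar with the distinction, byte order can be probed
portably at runtime; this standalone sketch (not from the thread)
returns 1 on little-endian machines such as x86, and 0 on big-endian
ones such as s390x.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Probe byte order at runtime: little-endian machines store the least
 * significant byte of a multi-byte integer first. */
int is_little_endian(void)
{
	uint32_t x = 1;
	unsigned char b;

	memcpy(&b, &x, 1);
	return b == 1;
}
```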
I don't see anything that looks questionable in terms of casting or
integer handling in the patch that introduced the problem, though.
Probably a long shot, but if you are able to build with "make
SANITIZE=address,undefined" and re-run t8020, that would let us check
for endian-specific memory access issues.
-Peff
[1] https://lore.kernel.org/git/003901dc596c$40bfbd80$c23f3880$@nexbridge.com/
* Re: t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm)
2025-11-20 8:16 ` Jeff King
@ 2025-11-28 16:45 ` Toon Claes
2025-11-28 17:35 ` Kristoffer Haugsbakk
0 siblings, 1 reply; 39+ messages in thread
From: Toon Claes @ 2025-11-28 16:45 UTC (permalink / raw)
To: Jeff King, Anders Kaseorg; +Cc: rsbecker, Kristoffer Haugsbakk, git
Jeff King <peff@peff.net> writes:
> On Wed, Nov 19, 2025 at 12:06:35PM -0800, Anders Kaseorg wrote:
>
>> The failures in subtests 16 and 19 come with these diffs:
>>
>> --- expect 2025-11-19 11:28:57.966106204 +0000
>> +++ actual 2025-11-19 11:28:58.110112543 +0000
>> @@ -1,2 +1,2 @@
>> +ac29b6e974b49803f1c6ec5a705d1bf7dbfa7d2f m1.t
>> m2 m2.t
>> -m1 m1.t
>>
>> […]
>>
>> --- expect 2025-11-19 11:29:03.492349022 +0000
>> +++ actual 2025-11-19 11:29:03.648355864 +0000
>> @@ -1,2 +1,2 @@
>> -b5 file2
>> -b2 file
>> +da1857e0652b6f264c0038d684ddecddc273e506 file2
>> +da1857e0652b6f264c0038d684ddecddc273e506 file
Kristoffer, thank you for reporting this bug. It seems there was a real
bug in git-last-modified, which was uncovered by these tests running on
s390x.
> Interestingly, the commits it returns are merges. E.g., here is the
> state after test 16:
>
> $ git log --oneline --graph
> * ac29b6e (HEAD) Merge tag 'm1' into HEAD
> |\
> | * 53e7187 (tag: m1) m1
> * | 9b81a41 (tag: m2) m2
> |/
> * 08525b6 (master) remove a
> * 664d121 (tag: 3) 3
> * a732b0c (tag: 2) 2
> * 1edf6f6 (tag: 1) 1
>
> Though it is also the first commit we start traversing from. The same is
> true after test 19 (da1857e is the tip of HEAD there). So I am not sure
> if the bug is "we are not passing down blame from the merge", or just
> "we are not passing down blame at all".
>
> I can't help but notice that this same failure is seen on s390x and HP
> NonStop[1], both of which are (I think) big-endian. And not on any of
> our usual little-endian platforms.
Peff, thanks for the pointer. It cost me more time than I'd like to
admit, but I've reproduced and debugged the issue to find and fix the
root cause. The problem is not limited to big-endian systems, but it was
uncovered by this test running on a big-endian one.
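Purely as an illustration of this class of bug (the root-cause analysis
lives in the linked fix, not in this message): a scratch buffer cleared
with an element count instead of a byte count leaves stale bits in its
tail, one way stale 'active' bits could survive on any platform. A
minimal sketch with a hypothetical clear_words() helper:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t eword_t;

/* Hypothetical helper: zero an array of 64-bit words. Sizing the
 * memset in words instead of bytes (i.e. passing just 'nwords') would
 * clear only the first eighth of the buffer and leave stale bits in
 * the tail. */
void clear_words(eword_t *words, size_t nwords)
{
	memset(words, 0, nwords * sizeof(eword_t));
}
```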
To you both: I've submitted a bug fix at:
https://lore.kernel.org/git/20251128-toon-big-endian-ci-v1-1-80da0f629c1e@iotcl.com/
--
Cheers,
Toon
* Re: t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm)
2025-11-28 16:45 ` Toon Claes
@ 2025-11-28 17:35 ` Kristoffer Haugsbakk
0 siblings, 0 replies; 39+ messages in thread
From: Kristoffer Haugsbakk @ 2025-11-28 17:35 UTC (permalink / raw)
To: Toon Claes, Jeff King, Anders Kaseorg; +Cc: rsbecker, git
On Fri, Nov 28, 2025, at 17:45, Toon Claes wrote:
> Jeff King <peff@peff.net> writes:
>[snip]
>>> --- expect 2025-11-19 11:29:03.492349022 +0000
>>> +++ actual 2025-11-19 11:29:03.648355864 +0000
>>> @@ -1,2 +1,2 @@
>>> -b5 file2
>>> -b2 file
>>> +da1857e0652b6f264c0038d684ddecddc273e506 file2
>>> +da1857e0652b6f264c0038d684ddecddc273e506 file
>
> Kristoffer, thank you for reporting this bug. It seems there was a real
> bug in git-last-modified, which was uncovered by these tests running on
> s390x.
Typo. r/Kristoffer/Anders/ :)
end of thread, other threads:[~2025-11-28 17:36 UTC | newest]
Thread overview: 39+ messages
-- links below jump to the message on this page --
2025-10-16 8:39 [PATCH] last-modified: implement faster algorithm Toon Claes
2025-10-16 18:51 ` Justin Tobler
2025-10-17 10:38 ` Toon Claes
2025-10-16 20:48 ` D. Ben Knoble
2025-10-17 10:45 ` Toon Claes
2025-10-16 23:38 ` Taylor Blau
2025-10-17 6:30 ` Jeff King
2025-10-17 14:54 ` Taylor Blau
2025-10-21 8:20 ` Jeff King
2025-10-17 12:07 ` Toon Claes
2025-10-21 9:04 ` Toon Claes
2025-10-23 23:59 ` Taylor Blau
2025-10-21 13:00 ` Toon Claes
2025-10-23 23:56 ` Taylor Blau
2025-10-27 15:48 ` Toon Claes
2025-10-17 6:37 ` Jeff King
2025-10-17 10:47 ` Toon Claes
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
2025-10-21 17:52 ` Junio C Hamano
2025-10-22 0:26 ` Taylor Blau
2025-10-22 0:28 ` Taylor Blau
2025-10-22 3:48 ` Junio C Hamano
2025-10-24 0:01 ` Taylor Blau
2025-10-24 0:37 ` Junio C Hamano
2025-10-27 19:22 ` Taylor Blau
2025-10-29 13:01 ` Toon Claes
2025-10-23 8:01 ` Toon Claes
2025-10-23 7:50 ` [PATCH v3] " Toon Claes
2025-10-24 0:03 ` Taylor Blau
2025-10-27 7:03 ` Toon Claes
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
2025-11-03 16:44 ` Junio C Hamano
2025-11-04 15:08 ` Toon Claes
2025-11-19 11:34 ` t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm) Anders Kaseorg
2025-11-19 13:49 ` Kristoffer Haugsbakk
2025-11-19 20:06 ` Anders Kaseorg
2025-11-20 8:16 ` Jeff King
2025-11-28 16:45 ` Toon Claes
2025-11-28 17:35 ` Kristoffer Haugsbakk