From: Taylor Blau <me@ttaylorr.com>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
Justin Tobler <jltobler@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH] last-modified: implement faster algorithm
Date: Thu, 23 Oct 2025 19:56:56 -0400 [thread overview]
Message-ID: <aPrAyJYvWtlvmiEx@nand.local> (raw)
In-Reply-To: <87jz0tu3yh.fsf@iotcl.com>
On Fri, Oct 17, 2025 at 02:07:18PM +0200, Toon Claes wrote:
> > Regardless of how you handle the above, I think that the commit slab
> > name here is a little generic. I guess it's OK since this is only
> > visible within this compilation unit, but perhaps something like
> > "active_paths_bitmap" would be more descriptive.
>
> I struggled a lot naming this thing, so I'm open to suggestions.
I think calling it "active_paths_bitmap" or similar conveys that this is
somehow specific to "active paths", and since the slab is static within
the last-modified builtin, I think that's fine. It's a little word-y, so
if you have better ideas with fewer characters, I'm open to just about
anything.
> > Likewise, I wonder if we should have elemtype here be just 'struct
> > bitmap'. Unfortunately I don't think the EWAH code has a function like:
> >
> > void bitmap_init(struct bitmap *);
> >
> > and only has ones that allocate for us. So we may consider adding one,
> > or creating a dummy bitmap and copying its contents, or otherwise.
> >
> >> struct last_modified {
> >> struct hashmap paths;
> >> struct rev_info rev;
> >> bool recursive;
> >> bool show_trees;
> >> +
> >> + const char **all_paths;
> >> + size_t all_paths_nr;
> >
> > I wonder if all_paths should be a strvec here? I think that this code
> > was all written when the type was called argv_array (hilariously, that
> > change took place towards the end of July, 2020, and the --go-faster
> > code where this patch came from was written just a couple of weeks
> > earlier.)
>
> Ahha, that might be a good idea. This might allow us to get rid of the
> hashmap, which stops us from storing the paths twice. Not sure what the
> impact on the performance would be, because the hashmap now is valuable
> for path_idx() lookups.
Looking at this a little further, I don't think I consider this worth
doing. Using a strvec here is awkward since last_modified_init() really
wants to assign lm->all_paths based on each entry's diff_idx, which is
not what strvec.h is designed for.
You *could* use a string_list, and shove a pointer to the
last_modified_entry struct in the ->util field, but that is also
inefficient since we remove paths from our hashmap as we mark them, and
repeating that in a string_list would be wasteful.
> > In the GitHub version of this patch, we pass all active paths to the
> > parent, assign the PARENT1 flag if it doesn't already have it, and put
> > it in the queue as well.
> >
> > In your version, we'd skip past the next for-loop, and do the same
> > pass-to-parent dance below, along with inserting the parent into the
> > prio queue.
>
> This is the "shortcut" I'm mentioning in my cover letter. In my testing
> it seemed it didn't provide any performance gains to keep it. I consider
> less code better code, so I left it out.
Fair enough. I don't think that less code here results in a lack of
clarity, so this seems alright to me. But in general I am not sure that
I always agree that less code is better ;-).
> > So I think that this is all functionally equivalent, but I had to work
> > through a little bit of the details here, mostly since I haven't looked
> > at or thought about this code in many years ;-).
> >
> >> static int last_modified_run(struct last_modified *lm)
> >> {
> >> + int max_count, queue_popped = 0;
> >> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> >> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
> >> + struct commit_list *list;
> >> struct last_modified_callback_data data = { .lm = lm };
> >>
> >> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> >> lm->rev.diffopt.format_callback = last_modified_diff;
> >> lm->rev.diffopt.format_callback_data = &data;
> >> + lm->rev.no_walk = 1;
> >
> > This one is new relative to the original patch. Why set no_walk here?
>
> This comes from
> https://github.com/ttaylorr/git/commit/e8ea49705873d28f64b815bd00d14bdf6d48ca4d
>
> Well, it basically squashes various commits together. There are various
> commits doing different things here. I don't think it's valuable for
> anyone to see the full history of the iterations at GitHub, that's why I
> squashed it in.
>
> Would you consider it better to not set `no_walk`?
Hah, I forgot that we added this one later on. Adding it here in this
patch makes sense to me.
Thanks,
Taylor
next prev parent reply other threads:[~2025-10-23 23:56 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 8:39 [PATCH] last-modified: implement faster algorithm Toon Claes
2025-10-16 18:51 ` Justin Tobler
2025-10-17 10:38 ` Toon Claes
2025-10-16 20:48 ` D. Ben Knoble
2025-10-17 10:45 ` Toon Claes
2025-10-16 23:38 ` Taylor Blau
2025-10-17 6:30 ` Jeff King
2025-10-17 14:54 ` Taylor Blau
2025-10-21 8:20 ` Jeff King
2025-10-17 12:07 ` Toon Claes
2025-10-21 9:04 ` Toon Claes
2025-10-23 23:59 ` Taylor Blau
2025-10-21 13:00 ` Toon Claes
2025-10-23 23:56 ` Taylor Blau [this message]
2025-10-27 15:48 ` Toon Claes
2025-10-17 6:37 ` Jeff King
2025-10-17 10:47 ` Toon Claes
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
2025-10-21 17:52 ` Junio C Hamano
2025-10-22 0:26 ` Taylor Blau
2025-10-22 0:28 ` Taylor Blau
2025-10-22 3:48 ` Junio C Hamano
2025-10-24 0:01 ` Taylor Blau
2025-10-24 0:37 ` Junio C Hamano
2025-10-27 19:22 ` Taylor Blau
2025-10-29 13:01 ` Toon Claes
2025-10-23 8:01 ` Toon Claes
2025-10-23 7:50 ` [PATCH v3] " Toon Claes
2025-10-24 0:03 ` Taylor Blau
2025-10-27 7:03 ` Toon Claes
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
2025-11-03 16:44 ` Junio C Hamano
2025-11-04 15:08 ` Toon Claes
2025-11-19 11:34 ` t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm) Anders Kaseorg
2025-11-19 13:49 ` Kristoffer Haugsbakk
2025-11-19 20:06 ` Anders Kaseorg
2025-11-20 8:16 ` Jeff King
2025-11-28 16:45 ` Toon Claes
2025-11-28 17:35 ` Kristoffer Haugsbakk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPrAyJYvWtlvmiEx@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=karthik.188@gmail.com \
--cc=stolee@gmail.com \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).