From: Taylor Blau <me@ttaylorr.com>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
Justin Tobler <jltobler@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH] last-modified: implement faster algorithm
Date: Thu, 23 Oct 2025 19:56:56 -0400 [thread overview]
Message-ID: <aPrAyJYvWtlvmiEx@nand.local> (raw)
In-Reply-To: <87jz0tu3yh.fsf@iotcl.com>
On Fri, Oct 17, 2025 at 02:07:18PM +0200, Toon Claes wrote:
> > Regardless of how you handle the above, I think that the commit slab
> > name here is a little generic. I guess it's OK since this is only
> > visible within this compilation unit, but perhaps something like
> > "active_paths_bitmap" would be more descriptive.
>
> I struggled a lot naming this thing, so I'm open to suggestions.
I think calling it "active_paths_bitmap" or similar conveys that this is
somehow specific to "active paths", and since the slab is static within
the last-modified builtin, I think that's fine. It's a little word-y, so
if you have better ideas with fewer characters, I'm open to just about
anything.
> > Likewise, I wonder if we should have elemtype here be just 'struct
> > bitmap'. Unfortunately I don't think the EWAH code has a function like:
> >
> > void bitmap_init(struct bitmap *);
> >
> > and only has ones that allocate for us. So we may consider adding one,
> > or creating a dummy bitmap and copying its contents, or otherwise.
> >
> >> struct last_modified {
> >> struct hashmap paths;
> >> struct rev_info rev;
> >> bool recursive;
> >> bool show_trees;
> >> +
> >> + const char **all_paths;
> >> + size_t all_paths_nr;
> >
> > I wonder if all_paths should be a strvec here? I think that this code
> > was all written when the type was called argv_array (hilariously, that
> > change took place towards the end of July, 2020, and the --go-faster
> > code where this patch came from was written just a couple of weeks
> > earlier.)
>
> Ahha, that might be a good idea. This might allow us to get rid of the
> hashmap, which stops us from storing the paths twice. Not sure what the
> impact on the performance would be, because the hashmap now is valuable
> for path_idx() lookups.
Looking at this a little further, I don't think I consider this worth
doing. Using a strvec here is awkward since last_modified_init() really
wants to assign lm->all_paths based on each entry's diff_idx, which is
not what strvec.h is designed for.
You *could* use a string_list, and shove a pointer to the
last_modified_entry struct in the ->util field, but that is also
inefficient since we remove paths from our hashmap as we mark them, and
repeating that in a string_list would be wasteful.
> > In the GitHub version of this patch, we pass all active paths to the
> > parent, assign the PARENT1 flag if it doesn't already have it, and put
> > it in the queue as well.
> >
> > In your version, we'd skip past the next for-loop, and do the same
> > pass-to-parent dance below, along with inserting the parent into the
> > prio queue.
>
> This is the "shortcut" I'm mentioning in my cover letter. In my testing
> it seemed it didn't provide any performance gains to keep it. I consider
> less code better code, so I left it out.
Fair enough. I don't think that less code here results in a lack of
clarity, so this seems alright to me. But in general I am not sure that
I always agree that less code is better ;-).
> > So I think that this is all functionally equivalent, but I had to work
> > through a little bit of the details here, mostly since I haven't looked
> > at or thought about this code in many years ;-).
> >
> >> static int last_modified_run(struct last_modified *lm)
> >> {
> >> + int max_count, queue_popped = 0;
> >> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> >> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
> >> + struct commit_list *list;
> >> struct last_modified_callback_data data = { .lm = lm };
> >>
> >> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> >> lm->rev.diffopt.format_callback = last_modified_diff;
> >> lm->rev.diffopt.format_callback_data = &data;
> >> + lm->rev.no_walk = 1;
> >
> > This one is new relative to the original patch. Why set no_walk here?
>
> This comes from
> https://github.com/ttaylorr/git/commit/e8ea49705873d28f64b815bd00d14bdf6d48ca4d
>
> Well, it basically squashes various commits together. There are various
> commits doing different things here. I don't think it's valuable for
> anyone to see the full history of the iterations at GitHub, that's why I
> squashed it in.
>
> Would you consider it better to not set `no_walk`?
Hah, I forgot that we added this one later on. Adding it here in this
patch makes sense to me.
Thanks,
Taylor
next prev parent reply other threads:[~2025-10-23 23:56 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 8:39 [PATCH] last-modified: implement faster algorithm Toon Claes
2025-10-16 18:51 ` Justin Tobler
2025-10-17 10:38 ` Toon Claes
2025-10-16 20:48 ` D. Ben Knoble
2025-10-17 10:45 ` Toon Claes
2025-10-16 23:38 ` Taylor Blau
2025-10-17 6:30 ` Jeff King
2025-10-17 14:54 ` Taylor Blau
2025-10-21 8:20 ` Jeff King
2025-10-17 12:07 ` Toon Claes
2025-10-21 9:04 ` Toon Claes
2025-10-23 23:59 ` Taylor Blau
2025-10-21 13:00 ` Toon Claes
2025-10-23 23:56 ` Taylor Blau [this message]
2025-10-27 15:48 ` Toon Claes
2025-10-17 6:37 ` Jeff King
2025-10-17 10:47 ` Toon Claes
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
2025-10-21 17:52 ` Junio C Hamano
2025-10-22 0:26 ` Taylor Blau
2025-10-22 0:28 ` Taylor Blau
2025-10-22 3:48 ` Junio C Hamano
2025-10-24 0:01 ` Taylor Blau
2025-10-24 0:37 ` Junio C Hamano
2025-10-27 19:22 ` Taylor Blau
2025-10-29 13:01 ` Toon Claes
2025-10-23 8:01 ` Toon Claes
2025-10-23 7:50 ` [PATCH v3] " Toon Claes
2025-10-24 0:03 ` Taylor Blau
2025-10-27 7:03 ` Toon Claes
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
2025-11-03 16:44 ` Junio C Hamano
2025-11-04 15:08 ` Toon Claes
2025-11-19 11:34 ` t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm) Anders Kaseorg
2025-11-19 13:49 ` Kristoffer Haugsbakk
2025-11-19 20:06 ` Anders Kaseorg
2025-11-20 8:16 ` Jeff King
2025-11-28 16:45 ` Toon Claes
2025-11-28 17:35 ` Kristoffer Haugsbakk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPrAyJYvWtlvmiEx@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=karthik.188@gmail.com \
--cc=stolee@gmail.com \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.