From: Taylor Blau <me@ttaylorr.com>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org,
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>,
"Derrick Stolee" <stolee@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified
Date: Thu, 17 Jul 2025 20:02:37 -0400
Message-ID: <aHmPHcNQYlhGo8JB@nand.local>
In-Reply-To: <20250716133518.1788126-1-toon@iotcl.com>
On Wed, Jul 16, 2025 at 03:35:13PM +0200, Toon Claes wrote:
> 11 files changed, 549 insertions(+)
> create mode 100644 Documentation/git-last-modified.adoc
> create mode 100644 builtin/last-modified.c
> create mode 100755 t/t8020-last-modified.sh
I'm admittedly not entirely sure what the best way to review this patch
is given its size and my previous exposure to (similar) code.
From what I can tell, this does not include the optimizations that
Stolee and I worked on back in 2020-ish. Those would be nice to have,
but they are somewhat complex and I think more easily reviewed as an
incremental change on top rather than as part of the initial version.
As I mentioned in my response to your cover letter, I would be more
than happy to help you with an effort to introduce those optimizations
on top.
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> new file mode 100644
> index 0000000000..63993bc1c9
> --- /dev/null
> +++ b/builtin/last-modified.c
> @@ -0,0 +1,289 @@
> +#include "git-compat-util.h"
> +#include "builtin.h"
> +#include "commit.h"
> +#include "config.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +#include "hashmap.h"
> +#include "hex.h"
> +#include "log-tree.h"
> +#include "object-name.h"
> +#include "object.h"
> +#include "parse-options.h"
> +#include "quote.h"
> +#include "repository.h"
> +#include "revision.h"
> +
> +struct last_modified_entry {
> + struct hashmap_entry hashent;
> + struct object_id oid;
> + const char path[FLEX_ARRAY];
> +};
As a general comment on this patch, I am a little sad to see that many
of the implementation details have been moved back into the builtin
itself and not in their own last-modified.[ch] file(s).
Apologies if this was already discussed earlier in the thread and I
simply missed it, but can you comment on why the last-modified internals
were moved into the builtin?
Even in the earliest version of 'blame-tree' that I could find (from
26999d045b (add blame-tree command, 2012-10-20) in my fork) many of the
internals were written in blame-tree.c instead of builtin/blame-tree.c.
> +static int last_modified_entry_hashcmp(const void *unused UNUSED,
> + const struct hashmap_entry *hent1,
> + const struct hashmap_entry *hent2,
> + const void *path)
> +{
> + const struct last_modified_entry *ent1 =
> + container_of(hent1, const struct last_modified_entry, hashent);
> + const struct last_modified_entry *ent2 =
> + container_of(hent2, const struct last_modified_entry, hashent);
> + return strcmp(ent1->path, path ? path : ent2->path);
> +}
> +
> +struct last_modified {
> + struct hashmap paths;
> + struct rev_info rev;
> + int recursive, tree_in_recursive;
Can we either make these two part of a bitfield, or at least declare
them separately?
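For illustration, the two alternatives could look roughly like the
sketch below. It is not taken from the patch: the struct names and the
sketch_pack() helper are hypothetical, and the hashmap and rev_info
members are left out so that the snippet compiles on its own.

```c
/* Option A: pack the two flags into one-bit bitfield members. */
struct last_modified_bits {
	unsigned recursive : 1;
	unsigned tree_in_recursive : 1;
};

/* Option B: keep plain ints, but give each its own declaration. */
struct last_modified_ints {
	int recursive;
	int tree_in_recursive;
};

/*
 * Hypothetical helper: build the bitfield variant from two ints.
 * The "!!" matters because a one-bit field keeps only the lowest bit,
 * so storing 2 directly would read back as 0.
 */
static struct last_modified_bits sketch_pack(int recursive,
					     int tree_in_recursive)
{
	struct last_modified_bits bits = { 0 };

	bits.recursive = !!recursive;
	bits.tree_in_recursive = !!tree_in_recursive;
	return bits;
}
```

One trade-off with option A is that C does not allow taking the address
of a bitfield member, so any code that stores through a pointer to the
flag would need a temporary variable.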
> +};
> +
> +static void last_modified_release(struct last_modified *lm)
> +{
> + hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
> + release_revisions(&lm->rev);
> +}
> +
> +typedef void (*last_modified_callback)(const char *path,
> + const struct commit *commit, void *data);
> +
> +struct last_modified_callback_data {
> + struct commit *commit;
> + struct hashmap *paths;
> +
> + last_modified_callback callback;
> + void *callback_data;
> +};
I can't quite tell what the purpose of this struct is in conjunction
with the last_modified_callback type above.
The last_modified_callback type makes sense as a generic callback
function that callers can pass to get <path, commit> pairs, along with
an arbitrary "data" pointer.
But then you define a last_modified_callback_data struct, which
made me think that it would be used as the data type passed to the
callback. In other words, given the existence of this struct, I would
have expected the function pointer above to be defined like:
    typedef void (*last_modified_callback)(const char *path,
                                           const struct commit *commit,
                                           struct last_modified_callback_data *data);
But the fact that the _data struct contains a last_modified_callback
function pointer gives us a hint at what's going on here. It seems like
last_modified_callback_data is used to store some bookkeeping
information and dispatch calls to the "callback" function pointer.
I think that the fact the struct's name ends with "_data" is what is
confusing to me. I think this would be a little clearer if you renamed
this to "struct last_modified_callback" and the function pointer to
"last_modified_callback_fn" or similar.
(The irony is not lost on me that these comments would be applicable to
GitHub's version of this code, too :-s).
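A rough sketch of the suggested renaming follows. The struct members
are the ones from the quoted patch; last_modified_callback_invoke() and
count_paths() are hypothetical additions, included only to show how the
struct dispatches to the function pointer. The commit and hashmap types
are left opaque so that the snippet stands alone.

```c
#include <stddef.h>

struct commit;  /* opaque here; the real definition is in commit.h */
struct hashmap; /* opaque here; the real definition is in hashmap.h */

/* The generic callback type, now carrying an "_fn" suffix. */
typedef void (*last_modified_callback_fn)(const char *path,
					  const struct commit *commit,
					  void *data);

/* Bookkeeping plus dispatch target, without the "_data" suffix. */
struct last_modified_callback {
	struct commit *commit;
	struct hashmap *paths;

	last_modified_callback_fn callback;
	void *callback_data;
};

/* Hypothetical dispatcher: forward one path to the user's callback. */
static void last_modified_callback_invoke(struct last_modified_callback *cb,
					  const char *path)
{
	if (cb->callback)
		cb->callback(path, cb->commit, cb->callback_data);
}

/* Hypothetical example callback: count how many paths were reported. */
static void count_paths(const char *path, const struct commit *commit,
			void *data)
{
	(void)path;
	(void)commit;
	(*(int *)data)++;
}
```

With these names, the "data" pointer handed to the callback is clearly
the caller's own, while the struct is the thing that owns the callback.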
> +static int populate_paths_from_revs(struct last_modified *lm)
> +{
> + int num_interesting = 0;
> + struct diff_options diffopt;
> +
> + memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt));
> + copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec);
> + /*
> + * Use a callback to populate the paths from revs
> + */
> + diffopt.output_format = DIFF_FORMAT_CALLBACK;
> + diffopt.format_callback = add_path_from_diff;
> + diffopt.format_callback_data = lm;
> +
> + for (size_t i = 0; i < lm->rev.pending.nr; i++) {
> + struct object_array_entry *obj = lm->rev.pending.objects + i;
> +
> + if (obj->item->flags & UNINTERESTING)
> + continue;
> +
> + if (num_interesting++)
> + return error(_("can only get last-modified one tree at a time"));
This error text is a little difficult to parse, but I'm not sure that I
have a great suggestion for improving it. The equivalent from GitHub's
fork is "can only blame one tree at a time", and I think the difficulty
in parsing is that "last-modified" isn't a verb.
> +static void mark_path(const char *path, const struct object_id *oid,
> + struct last_modified_callback_data *data)
> +{
> + struct last_modified_entry *ent;
> +
> + /* Is it even a path that we are interested in? */
> + ent = hashmap_get_entry_from_hash(data->paths, strhash(path), path,
> + struct last_modified_entry, hashent);
> + if (!ent)
> + return;
> +
> + /*
> + * Is it arriving at a version of interest, or is it from a side branch
> + * which did not contribute to the final state?
> + */
> + if (!oideq(oid, &ent->oid))
> + return;
GitHub's fork writes this as "if (oid && !oideq(oid, &ent->oid))", but
the commit that introduces the "oid &&" portion of that expression
doesn't provide us with any clues as to why the change was necessary.
Since you have spent more time with these patches than I have recently,
perhaps you can help shed some light on what's going on here?
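To make the difference between the two spellings concrete: they only
diverge when the caller passes a NULL oid. The sketch below uses
stand-in types (sketch_oid, sketch_oideq(), and sketch_should_mark()
are all hypothetical, not Git's API) and says nothing about whether any
caller actually passes NULL, which is the open question.

```c
#include <string.h>

#define SKETCH_RAWSZ 20

/* Stand-in for struct object_id, reduced to a raw hash. */
struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Stand-in for oideq(): nonzero when both hashes are identical. */
static int sketch_oideq(const struct sketch_oid *a,
			const struct sketch_oid *b)
{
	return !memcmp(a->hash, b->hash, SKETCH_RAWSZ);
}

/*
 * The guarded form of the check. Returns 1 when the path would go on
 * to be marked and 0 when it would be skipped. A NULL oid bypasses the
 * comparison entirely, so the path is marked; the unguarded form would
 * dereference NULL in that case instead.
 */
static int sketch_should_mark(const struct sketch_oid *oid,
			      const struct sketch_oid *want)
{
	if (oid && !sketch_oideq(oid, want))
		return 0;
	return 1;
}
```

So the guarded form treats "no oid given" as "always matches", which
would only matter if some caller has no oid to offer.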
The rest of the code roughly matches my memory of the early versions of
this command.
Thanks,
Taylor