From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>,
"Taylor Blau" <me@ttaylorr.com>,
"Derrick Stolee" <stolee@gmail.com>,
"Christian Couder" <christian.couder@gmail.com>,
"Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified
Date: Thu, 31 Jul 2025 08:42:33 +0200 [thread overview]
Message-ID: <aIsQWcHf82ipHoWf@pks.im> (raw)
In-Reply-To: <20250730175510.987383-2-toon@iotcl.com>
On Wed, Jul 30, 2025 at 07:55:07PM +0200, Toon Claes wrote:
> diff --git a/Documentation/git-last-modified.adoc b/Documentation/git-last-modified.adoc
> new file mode 100644
> index 0000000000..89138ebeb7
> --- /dev/null
> +++ b/Documentation/git-last-modified.adoc
> @@ -0,0 +1,49 @@
> +git-last-modified(1)
> +====================
> +
> +NAME
> +----
> +git-last-modified - EXPERIMENTAL: Show when files were last modified
> +
> +
> +SYNOPSIS
> +--------
> +[synopsis]
> +git last-modified [-r] [-t] [<revision-range>] [[--] <path>...]
I think we typically list long options here, not the short single-letter
ones.
> +
> +DESCRIPTION
> +-----------
> +
> +Shows which commit last modified each of the relevant files and subdirectories.
> +
> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
> +
> +OPTIONS
> +-------
> +
> +-r::
-r, --recursive::
> + Recurse into subtrees.
> +
> +-t::
-t, --tree-in-recursive::
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> new file mode 100644
> index 0000000000..e4c73464c7
> --- /dev/null
> +++ b/builtin/last-modified.c
[snip]
> +static int populate_paths_from_revs(struct last_modified *lm)
> +{
> + int num_interesting = 0;
> + struct diff_options diffopt;
> +
> + memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt));
> + copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec);
> + /*
> + * Use a callback to populate the paths from revs
> + */
> + diffopt.output_format = DIFF_FORMAT_CALLBACK;
> + diffopt.format_callback = add_path_from_diff;
> + diffopt.format_callback_data = lm;
I feel like this whole block could use a comment that explains what
we're doing. Why do we copy `diffopt` around? Why is it fine to free
the struct at the end without unsetting `lm->rev.diffopt`? Couldn't that
cause a double free?
> + for (size_t i = 0; i < lm->rev.pending.nr; i++) {
> + struct object_array_entry *obj = lm->rev.pending.objects + i;
> +
> + if (obj->item->flags & UNINTERESTING)
> + continue;
> +
> + if (num_interesting++)
> + return error(_("last-modified can only operate on one tree at a time"));
> +
> + diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> + &obj->item->oid, "", &diffopt);
> + diff_flush(&diffopt);
> + }
> + diff_free(&diffopt);
> +
> + return 0;
> +}
> +
> +static void last_modified_emit(struct last_modified *lm,
> + const char *path, const struct commit *commit)
> +
> +{
> + if (commit->object.flags & BOUNDARY)
> + putchar('^');
> + printf("%s\t", oid_to_hex(&commit->object.oid));
> +
> + if (lm->rev.diffopt.line_termination)
> + write_name_quoted(path, stdout, '\n');
> + else
> + printf("%s%c", path, '\0');
> +
> + fflush(stdout);
Is there a reason why we have to explicitly flush output? This command
doesn't have any interactivity with the caller.
> +static void last_modified_diff(struct diff_queue_struct *q,
> + struct diff_options *opt UNUSED, void *cbdata)
> +{
> + struct last_modified_callback_data *data = cbdata;
> +
> + for (int i = 0; i < q->nr; i++) {
> + struct diff_filepair *p = q->queue[i];
> + switch (p->status) {
> + case DIFF_STATUS_DELETED:
> + /*
> + * There's no point in feeding a deletion, as it could
> + * not have resulted in our current state, which
> + * actually has the file.
> + */
> + break;
> +
> + default:
> + /*
> + * Otherwise, we care only that we somehow arrived at
> + * a final oid state. Note that this covers some
> + * potentially controversial areas, including:
> + *
> + * 1. A rename or copy will be found, as it is the
> + * first time the content has arrived at the given
> + * path.
Makes sense that we don't handle renames (yet). I think I didn't spot
this in the manual, so maybe this is something we should document there.
> + * 2. Even a non-content modification like a mode or
> + * type change will trigger it.
Seems sensible as a default, as well. And likewise, we can add
`--ignore-mode-changes` at a later point if we ever have a use case for
it.
> + * We take the inclusive approach for now, and find
> + * anything which impacts the path. Options to tweak
> + * the behavior (e.g., to "--follow" the content across
> + * renames) can come later.
> + */
> + mark_path(p->two->path, &p->two->oid, data);
> + break;
> + }
> + }
> +}
> +
> +static int last_modified_run(struct last_modified *lm)
> +{
> + struct last_modified_callback_data data = { .lm = lm };
> +
> + lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> + lm->rev.diffopt.format_callback = last_modified_diff;
> + lm->rev.diffopt.format_callback_data = &data;
> +
> + prepare_revision_walk(&lm->rev);
> +
> + while (hashmap_get_size(&lm->paths)) {
> + data.commit = get_revision(&lm->rev);
> + if (!data.commit)
> + break;
So in this case we have reached the end of our commit range. I assume we
simply print the oldest commit of that range in this case?
> + if (data.commit->object.flags & BOUNDARY) {
> + diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> + &data.commit->object.oid, "",
> + &lm->rev.diffopt);
> + diff_flush(&lm->rev.diffopt);
> + } else {
> + log_tree_commit(&lm->rev, data.commit);
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int last_modified_init(struct last_modified *lm, struct repository *r,
> + const char *prefix, int argc, const char **argv)
> +{
> + hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
> +
> + repo_init_revisions(r, &lm->rev, prefix);
> + lm->rev.def = "HEAD";
> + lm->rev.combine_merges = 1;
> + lm->rev.show_root_diff = 1;
> + lm->rev.boundary = 1;
> + lm->rev.no_commit_id = 1;
> + lm->rev.diff = 1;
> + lm->rev.diffopt.flags.recursive = lm->recursive || lm->tree_in_recursive;
> + lm->rev.diffopt.flags.tree_in_recursive = lm->tree_in_recursive;
> +
> + if ((argc = setup_revisions(argc, argv, &lm->rev, NULL)) > 1) {
Tiny nit: it's rather unusual in our codebase to assign values in
conditionals. I personally don't mind this usage at all -- I think it
can make error handling way less verbose. But I'm not sure whether we
deem this style acceptable.
argc = setup_revisions(argc, argv, &lm->rev, NULL)
if (argc) {
...
}
I've seen this style several times in this patch. I think we should keep
our typical style for now, but I wouldn't mind if you sent a patch for
our coding style document so that we can discuss this.
> + error(_("unknown last-modified argument: %s"), argv[1]);
> + return argc;
> + }
> +
> + if (populate_paths_from_revs(lm) < 0)
> + return error(_("unable to setup last-modified"));
> +
> + return 0;
> +}
> +
> +int cmd_last_modified(int argc, const char **argv, const char *prefix,
> + struct repository *repo)
> +{
> + int ret;
> + struct last_modified lm;
> +
> + const char * const last_modified_usage[] = {
> + N_("git last-modified [-r] [-t] "
> + "[<revision-range>] [[--] <path>...]"),
> + NULL
> + };
> +
> + struct option last_modified_options[] = {
> + OPT_BOOL('r', "recursive", &lm.recursive,
> + N_("recurse into subtrees")),
> + OPT_BOOL('t', "tree-in-recursive", &lm.tree_in_recursive,
> + N_("recurse into subtrees and include the tree entries too")),
Should this maybe be called something like "--recursive-with-trees"?
"--tree-in-recursive" reads somewhat strange to me.
> + OPT_END()
> + };
> +
> + memset(&lm, 0, sizeof(lm));
You can avoid the `memset()` and directly zero-initialize the struct
when it's declared. Alternatively, you can move this function call into
`last_modified_init()` itself, where it would be more reasonable.
> + argc = parse_options(argc, argv, prefix, last_modified_options,
> + last_modified_usage,
> + PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN_OPT);
> +
> + repo_config(repo, git_default_config, NULL);
> +
> + if ((ret = last_modified_init(&lm, repo, prefix, argc, argv))) {
> + if (ret > 0)
> + usage_with_options(last_modified_usage,
> + last_modified_options);
> + goto out;
> + }
> +
> + if ((ret = last_modified_run(&lm)))
> + goto out;
Two more cases where we assign `if ((ret = ...))`.
Patrick
next prev parent reply other threads:[~2025-07-31 6:42 UTC|newest]
Thread overview: 135+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19 ` Junio C Hamano
2025-05-07 13:13 ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` [PATCH RFC 3/5] blame-tree: use Bloom filters when available Toon Claes
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22 ` Toon Claes
2025-05-07 20:23 ` Marc Branchaud
2025-05-07 20:45 ` Junio C Hamano
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 14:26 ` Junio C Hamano
2025-05-08 15:12 ` Marc Branchaud
2025-05-14 14:42 ` Toon Claes
2025-05-14 19:29 ` Junio C Hamano
2025-05-14 21:15 ` Marc Branchaud
2025-05-15 13:29 ` Patrick Steinhardt
2025-05-15 16:39 ` Junio C Hamano
2025-05-15 17:39 ` Marc Branchaud
2025-05-15 19:30 ` Jeff King
2025-05-16 4:38 ` Patrick Steinhardt
2025-05-20 8:49 ` Toon Claes
2025-05-15 17:30 ` Marc Branchaud
2025-05-16 4:30 ` Patrick Steinhardt
2025-05-14 21:15 ` Marc Branchaud
2025-05-07 20:49 ` Kristoffer Haugsbakk
2025-05-08 13:20 ` D. Ben Knoble
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 13:18 ` D. Ben Knoble
2025-05-23 9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07 ` Justin Tobler
2025-06-05 8:32 ` Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-06-13 9:34 ` Toon Claes
2025-06-13 9:52 ` Kristoffer Haugsbakk
2025-05-23 9:33 ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40 ` Patrick Steinhardt
2025-06-13 11:05 ` Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-05-23 9:33 ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-07-01 20:35 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06 ` Junio C Hamano
2025-07-01 21:30 ` Kristoffer Haugsbakk
2025-07-02 13:00 ` Toon Claes
2025-07-09 15:53 ` Toon Claes
2025-07-09 17:00 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20 ` Kristoffer Haugsbakk
2025-07-02 11:51 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01 ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26 ` [PATCH v4 " Toon Claes
2025-07-09 21:57 ` Junio C Hamano
2025-07-10 18:37 ` Junio C Hamano
2025-07-16 13:32 ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35 ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18 0:02 ` Taylor Blau
2025-07-19 6:44 ` Jeff King
2025-07-22 15:50 ` Toon Claes
2025-08-01 9:09 ` Christian Couder
2025-08-01 16:59 ` Junio C Hamano
2025-07-16 13:35 ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18 0:08 ` Taylor Blau
2025-07-22 15:52 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18 0:16 ` Taylor Blau
2025-07-22 16:02 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-17 22:37 ` Junio C Hamano
2025-07-18 17:36 ` Junio C Hamano
2025-07-22 16:06 ` Toon Claes
2025-07-16 13:42 ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39 ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35 ` Toon Claes
2025-07-30 17:59 ` Toon Claes
2025-07-31 7:45 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40 ` Junio C Hamano
2025-07-31 23:57 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34 ` Patrick Steinhardt
2025-08-05 16:21 ` Junio C Hamano
2025-08-05 16:34 ` Junio C Hamano
2025-08-05 16:55 ` Toon Claes
2025-08-05 17:20 ` Jean-Noël AVILA
2025-08-05 21:46 ` Junio C Hamano
2025-08-06 12:01 ` Toon Claes
2025-08-06 15:38 ` Junio C Hamano
2025-08-28 22:44 ` Junio C Hamano
2025-08-05 18:28 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05 9:33 ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05 9:33 ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55 ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt [this message]
2025-08-01 16:22 ` Toon Claes
2025-08-01 17:09 ` Junio C Hamano
2025-08-04 6:34 ` Patrick Steinhardt
2025-08-04 17:14 ` Junio C Hamano
2025-08-05 5:35 ` Toon Claes
2025-08-01 20:34 ` Jean-Noël AVILA
2025-08-05 5:36 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-08-01 10:18 ` Christian Couder
2025-08-01 10:22 ` Patrick Steinhardt
2025-08-01 17:06 ` Junio C Hamano
2025-08-02 8:18 ` Christian Couder
2025-08-02 11:31 ` Christian Couder
2025-08-02 13:38 ` Christian Couder
2025-08-02 16:26 ` Junio C Hamano
2025-08-04 6:35 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55 ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31 6:43 ` Patrick Steinhardt
2025-08-01 16:23 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-07-09 15:26 ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26 ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26 ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35 ` [PATCH v5 6/6] fixup! " Toon Claes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aIsQWcHf82ipHoWf@pks.im \
--to=ps@pks.im \
--cc=avarab@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.