From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>,
"Taylor Blau" <me@ttaylorr.com>,
"Derrick Stolee" <stolee@gmail.com>,
"Christian Couder" <christian.couder@gmail.com>,
"Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified
Date: Thu, 31 Jul 2025 08:42:33 +0200
Message-ID: <aIsQWcHf82ipHoWf@pks.im>
In-Reply-To: <20250730175510.987383-2-toon@iotcl.com>
On Wed, Jul 30, 2025 at 07:55:07PM +0200, Toon Claes wrote:
> diff --git a/Documentation/git-last-modified.adoc b/Documentation/git-last-modified.adoc
> new file mode 100644
> index 0000000000..89138ebeb7
> --- /dev/null
> +++ b/Documentation/git-last-modified.adoc
> @@ -0,0 +1,49 @@
> +git-last-modified(1)
> +====================
> +
> +NAME
> +----
> +git-last-modified - EXPERIMENTAL: Show when files were last modified
> +
> +
> +SYNOPSIS
> +--------
> +[synopsis]
> +git last-modified [-r] [-t] [<revision-range>] [[--] <path>...]
I think we typically list long options here, not the short single-letter
ones.
> +
> +DESCRIPTION
> +-----------
> +
> +Shows which commit last modified each of the relevant files and subdirectories.
> +
> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
> +
> +OPTIONS
> +-------
> +
> +-r::
-r, --recursive::
> + Recurse into subtrees.
> +
> +-t::
-t, --tree-in-recursive::
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> new file mode 100644
> index 0000000000..e4c73464c7
> --- /dev/null
> +++ b/builtin/last-modified.c
[snip]
> +static int populate_paths_from_revs(struct last_modified *lm)
> +{
> + int num_interesting = 0;
> + struct diff_options diffopt;
> +
> + memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt));
> + copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec);
> + /*
> + * Use a callback to populate the paths from revs
> + */
> + diffopt.output_format = DIFF_FORMAT_CALLBACK;
> + diffopt.format_callback = add_path_from_diff;
> + diffopt.format_callback_data = lm;
I feel like this whole block could use a comment that explains what
we're doing. Why do we copy `diffopt` around? Why is it fine to free
the struct at the end without unsetting `lm->rev.diffopt`? Couldn't that
cause a double free?
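To make the concern concrete, here is a minimal sketch with a hypothetical `struct toy_opts` standing in for `struct diff_options` (none of these names exist in Git). After a `memcpy()`, the copy aliases every pointer the original owns, so freeing the copy is only safe for members that were duplicated afterwards, which is what the `copy_pathspec()` call appears to do for the pathspec:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a struct that owns heap memory. */
struct toy_opts {
	int output_format;
	char *pathspec;		/* owned: released by toy_opts_clear() */
};

static char *dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	assert(p);
	memcpy(p, s, n);
	return p;
}

/* Copy src into dst so that dst owns its own pathspec. */
static void toy_opts_copy(struct toy_opts *dst, const struct toy_opts *src)
{
	/* Shallow copy: dst->pathspec now aliases src->pathspec. */
	memcpy(dst, src, sizeof(*dst));
	/* Deep-copy the owned member so both structs can be freed. */
	dst->pathspec = dup_str(src->pathspec);
}

static void toy_opts_clear(struct toy_opts *o)
{
	free(o->pathspec);
	o->pathspec = NULL;
}
```

If the clear function released a member that had not been duplicated, the original would be left with a dangling pointer. A comment in the patch saying which members are owned by the copy would settle that.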
> + for (size_t i = 0; i < lm->rev.pending.nr; i++) {
> + struct object_array_entry *obj = lm->rev.pending.objects + i;
> +
> + if (obj->item->flags & UNINTERESTING)
> + continue;
> +
> + if (num_interesting++)
> + return error(_("last-modified can only operate on one tree at a time"));
> +
> + diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> + &obj->item->oid, "", &diffopt);
> + diff_flush(&diffopt);
> + }
> + diff_free(&diffopt);
> +
> + return 0;
> +}
> +
> +static void last_modified_emit(struct last_modified *lm,
> + const char *path, const struct commit *commit)
> +
> +{
> + if (commit->object.flags & BOUNDARY)
> + putchar('^');
> + printf("%s\t", oid_to_hex(&commit->object.oid));
> +
> + if (lm->rev.diffopt.line_termination)
> + write_name_quoted(path, stdout, '\n');
> + else
> + printf("%s%c", path, '\0');
> +
> + fflush(stdout);
Is there a reason why we have to explicitly flush output? This command
doesn't have any interactivity with the caller.
> +static void last_modified_diff(struct diff_queue_struct *q,
> + struct diff_options *opt UNUSED, void *cbdata)
> +{
> + struct last_modified_callback_data *data = cbdata;
> +
> + for (int i = 0; i < q->nr; i++) {
> + struct diff_filepair *p = q->queue[i];
> + switch (p->status) {
> + case DIFF_STATUS_DELETED:
> + /*
> + * There's no point in feeding a deletion, as it could
> + * not have resulted in our current state, which
> + * actually has the file.
> + */
> + break;
> +
> + default:
> + /*
> + * Otherwise, we care only that we somehow arrived at
> + * a final oid state. Note that this covers some
> + * potentially controversial areas, including:
> + *
> + * 1. A rename or copy will be found, as it is the
> + * first time the content has arrived at the given
> + * path.
Makes sense that we don't handle renames (yet). I didn't spot this in
the manual though, so maybe we should document it there.
> + * 2. Even a non-content modification like a mode or
> + * type change will trigger it.
Seems sensible as a default, as well. And likewise, we can add
`--ignore-mode-changes` at a later point if we ever have a use case for
it.
> + * We take the inclusive approach for now, and find
> + * anything which impacts the path. Options to tweak
> + * the behavior (e.g., to "--follow" the content across
> + * renames) can come later.
> + */
> + mark_path(p->two->path, &p->two->oid, data);
> + break;
> + }
> + }
> +}
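The inclusive rule described in the comment above boils down to a very small predicate. A sketch, assuming a hypothetical `struct toy_filepair` (not Git's `struct diff_filepair`) that carries only the one-letter status as it appears in raw diff output:

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for a diff filepair. */
struct toy_filepair {
	char status;	/* 'A', 'C', 'D', 'M', 'R', 'T', ... */
	const char *path;
};

/*
 * Returns 1 if this filepair should be attributed to the commit being
 * inspected, 0 otherwise. Only deletions are skipped: a deletion cannot
 * have produced a state in which the path exists.
 */
static int toy_counts_as_modification(const struct toy_filepair *p)
{
	assert(p);
	switch (p->status) {
	case 'D':
		return 0;
	default:
		/* Adds, renames, copies, mode and type changes all count. */
		return 1;
	}
}
```

Any later option that narrows this behavior would turn more of these cases into `return 0`, so keeping the default inclusive leaves room for that.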
> +
> +static int last_modified_run(struct last_modified *lm)
> +{
> + struct last_modified_callback_data data = { .lm = lm };
> +
> + lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> + lm->rev.diffopt.format_callback = last_modified_diff;
> + lm->rev.diffopt.format_callback_data = &data;
> +
> + prepare_revision_walk(&lm->rev);
> +
> + while (hashmap_get_size(&lm->paths)) {
> + data.commit = get_revision(&lm->rev);
> + if (!data.commit)
> + break;
So in this case we have reached the end of our commit range. I assume we
simply print the oldest commit of that range in this case?
> + if (data.commit->object.flags & BOUNDARY) {
> + diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> + &data.commit->object.oid, "",
> + &lm->rev.diffopt);
> + diff_flush(&lm->rev.diffopt);
> + } else {
> + log_tree_commit(&lm->rev, data.commit);
> + }
> + }
> +
> + return 0;
> +}
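For what it's worth, this is how I read the loop above, written as a toy model. Everything here is hypothetical (`toy_commit`, `toy_entry`, `toy_walk()`), the history is linear, and the boundary handling is simplified to "a boundary commit claims every path that is still open", which is my reading of the diff against the empty tree:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_TOUCHED 4

/* Hypothetical commit; the history array is ordered newest first. */
struct toy_commit {
	const char *id;
	int boundary;				/* edge of the requested range */
	const char *touched[TOY_MAX_TOUCHED];	/* NULL-terminated path list */
};

/* One entry per path we still need an answer for. */
struct toy_entry {
	const char *path;
	const char *commit;	/* NULL until resolved */
	int at_boundary;
};

static int toy_touches(const struct toy_commit *c, const char *path)
{
	for (size_t i = 0; i < TOY_MAX_TOUCHED && c->touched[i]; i++)
		if (!strcmp(c->touched[i], path))
			return 1;
	return 0;
}

/*
 * Walk commits from newest to oldest until every path is resolved or the
 * history is exhausted. Returns the number of paths left unresolved.
 */
static size_t toy_walk(const struct toy_commit *history, size_t nr_commits,
		       struct toy_entry *entries, size_t nr_entries)
{
	size_t remaining = nr_entries;

	assert(history && entries);
	for (size_t i = 0; i < nr_commits && remaining; i++) {
		const struct toy_commit *c = &history[i];

		for (size_t j = 0; j < nr_entries; j++) {
			struct toy_entry *e = &entries[j];

			if (e->commit)
				continue;
			/* A boundary commit claims everything still open. */
			if (!c->boundary && !toy_touches(c, e->path))
				continue;
			e->commit = c->id;
			e->at_boundary = c->boundary;
			remaining--;
		}
	}
	return remaining;
}
```

Entries with `at_boundary` set correspond to the lines that `last_modified_emit()` prefixes with `^`. In this model a path stays unresolved only when the history ends without a boundary commit, which is the case my question is about.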
> +
> +static int last_modified_init(struct last_modified *lm, struct repository *r,
> + const char *prefix, int argc, const char **argv)
> +{
> + hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
> +
> + repo_init_revisions(r, &lm->rev, prefix);
> + lm->rev.def = "HEAD";
> + lm->rev.combine_merges = 1;
> + lm->rev.show_root_diff = 1;
> + lm->rev.boundary = 1;
> + lm->rev.no_commit_id = 1;
> + lm->rev.diff = 1;
> + lm->rev.diffopt.flags.recursive = lm->recursive || lm->tree_in_recursive;
> + lm->rev.diffopt.flags.tree_in_recursive = lm->tree_in_recursive;
> +
> + if ((argc = setup_revisions(argc, argv, &lm->rev, NULL)) > 1) {
Tiny nit: it's rather unusual in our codebase to assign values in
conditionals. I personally don't mind this usage at all -- I think it
can make error handling way less verbose. But I'm not sure whether we
deem this style acceptable. The conventional form would be:

    argc = setup_revisions(argc, argv, &lm->rev, NULL);
    if (argc > 1) {
        ...
    }

I've seen the assignment-in-conditional style several times in this
patch. I think we should keep our typical style for now, but I wouldn't
mind if you sent a patch for our coding style document so that we can
discuss this.
> + error(_("unknown last-modified argument: %s"), argv[1]);
> + return argc;
> + }
> +
> + if (populate_paths_from_revs(lm) < 0)
> + return error(_("unable to setup last-modified"));
> +
> + return 0;
> +}
> +
> +int cmd_last_modified(int argc, const char **argv, const char *prefix,
> + struct repository *repo)
> +{
> + int ret;
> + struct last_modified lm;
> +
> + const char * const last_modified_usage[] = {
> + N_("git last-modified [-r] [-t] "
> + "[<revision-range>] [[--] <path>...]"),
> + NULL
> + };
> +
> + struct option last_modified_options[] = {
> + OPT_BOOL('r', "recursive", &lm.recursive,
> + N_("recurse into subtrees")),
> + OPT_BOOL('t', "tree-in-recursive", &lm.tree_in_recursive,
> + N_("recurse into subtrees and include the tree entries too")),
Should this maybe be called something like "--recursive-with-trees"?
"--tree-in-recursive" reads somewhat strange to me.
> + OPT_END()
> + };
> +
> + memset(&lm, 0, sizeof(lm));
You can avoid the `memset()` and directly zero-initialize the struct
when it's declared. Alternatively, you can move this function call into
`last_modified_init()` itself, where it would be more reasonable.
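A minimal sketch of both alternatives, using a hypothetical `struct toy_lm` that only carries the two flags (the real `struct last_modified` has more members):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct last_modified; only the flags matter. */
struct toy_lm {
	int recursive;
	int tree_in_recursive;
};

/* Option 1: zero-initialize at the declaration, no memset() needed. */
static struct toy_lm toy_lm_new(void)
{
	struct toy_lm lm = { 0 };

	return lm;
}

/* Option 2: clear the struct inside the init function. */
static void toy_lm_init_clearing(struct toy_lm *lm)
{
	assert(lm);
	memset(lm, 0, sizeof(*lm));
}
```

One caveat with the second option, as far as I can tell from this hunk: `parse_options()` stores into `lm.recursive` and `lm.tree_in_recursive` before `last_modified_init()` runs, so clearing in the init function would only work if option parsing moved after it. Initializing at the declaration avoids that ordering question.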
> + argc = parse_options(argc, argv, prefix, last_modified_options,
> + last_modified_usage,
> + PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN_OPT);
> +
> + repo_config(repo, git_default_config, NULL);
> +
> + if ((ret = last_modified_init(&lm, repo, prefix, argc, argv))) {
> + if (ret > 0)
> + usage_with_options(last_modified_usage,
> + last_modified_options);
> + goto out;
> + }
> +
> + if ((ret = last_modified_run(&lm)))
> + goto out;
Two more cases where we assign inside the condition (`if ((ret = ...))`).
Patrick