git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>,
	"Taylor Blau" <me@ttaylorr.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified
Date: Thu, 31 Jul 2025 08:42:33 +0200	[thread overview]
Message-ID: <aIsQWcHf82ipHoWf@pks.im> (raw)
In-Reply-To: <20250730175510.987383-2-toon@iotcl.com>

On Wed, Jul 30, 2025 at 07:55:07PM +0200, Toon Claes wrote:
> diff --git a/Documentation/git-last-modified.adoc b/Documentation/git-last-modified.adoc
> new file mode 100644
> index 0000000000..89138ebeb7
> --- /dev/null
> +++ b/Documentation/git-last-modified.adoc
> @@ -0,0 +1,49 @@
> +git-last-modified(1)
> +====================
> +
> +NAME
> +----
> +git-last-modified - EXPERIMENTAL: Show when files were last modified
> +
> +
> +SYNOPSIS
> +--------
> +[synopsis]
> +git last-modified [-r] [-t] [<revision-range>] [[--] <path>...]

I think we typically list long options here, not the short single-letter
ones.

> +
> +DESCRIPTION
> +-----------
> +
> +Shows which commit last modified each of the relevant files and subdirectories.
> +
> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
> +
> +OPTIONS
> +-------
> +
> +-r::

-r, --recursive::

> +	Recurse into subtrees.
> +
> +-t::

-t, --tree-in-recursive::

> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> new file mode 100644
> index 0000000000..e4c73464c7
> --- /dev/null
> +++ b/builtin/last-modified.c
[snip]
> +static int populate_paths_from_revs(struct last_modified *lm)
> +{
> +	int num_interesting = 0;
> +	struct diff_options diffopt;
> +
> +	memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt));
> +	copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec);
> +	/*
> +	 * Use a callback to populate the paths from revs
> +	 */
> +	diffopt.output_format = DIFF_FORMAT_CALLBACK;
> +	diffopt.format_callback = add_path_from_diff;
> +	diffopt.format_callback_data = lm;

I feel like this whole block could use a comment that explains what
we're doing. Why do we copy `diffopt` around? Why is it fine to free
the struct at the end without unsetting `lm->rev.diffopt`? Couldn't that
cause a double free?

> +	for (size_t i = 0; i < lm->rev.pending.nr; i++) {
> +		struct object_array_entry *obj = lm->rev.pending.objects + i;
> +
> +		if (obj->item->flags & UNINTERESTING)
> +			continue;
> +
> +		if (num_interesting++)
> +			return error(_("last-modified can only operate on one tree at a time"));
> +
> +		diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> +			      &obj->item->oid, "", &diffopt);
> +		diff_flush(&diffopt);
> +	}
> +	diff_free(&diffopt);
> +
> +	return 0;
> +}
> +
> +static void last_modified_emit(struct last_modified *lm,
> +			       const char *path, const struct commit *commit)
> +
> +{
> +	if (commit->object.flags & BOUNDARY)
> +		putchar('^');
> +	printf("%s\t", oid_to_hex(&commit->object.oid));
> +
> +	if (lm->rev.diffopt.line_termination)
> +		write_name_quoted(path, stdout, '\n');
> +	else
> +		printf("%s%c", path, '\0');
> +
> +	fflush(stdout);

Is there a reason why we have to explicitly flush output? This command
doesn't have any interactivity with the caller.

> +static void last_modified_diff(struct diff_queue_struct *q,
> +			       struct diff_options *opt UNUSED, void *cbdata)
> +{
> +	struct last_modified_callback_data *data = cbdata;
> +
> +	for (int i = 0; i < q->nr; i++) {
> +		struct diff_filepair *p = q->queue[i];
> +		switch (p->status) {
> +		case DIFF_STATUS_DELETED:
> +			/*
> +			 * There's no point in feeding a deletion, as it could
> +			 * not have resulted in our current state, which
> +			 * actually has the file.
> +			 */
> +			break;
> +
> +		default:
> +			/*
> +			 * Otherwise, we care only that we somehow arrived at
> +			 * a final oid state. Note that this covers some
> +			 * potentially controversial areas, including:
> +			 *
> +			 *  1. A rename or copy will be found, as it is the
> +			 *     first time the content has arrived at the given
> +			 *     path.

Makes sense that we don't handle renames (yet). I think I didn't spot
this in the manual, so maybe this is something we should document there.

> +			 *  2. Even a non-content modification like a mode or
> +			 *     type change will trigger it.

Seems sensible as a default, as well. And likewise, we can add
`--ignore-mode-changes` at a later point if we ever have a use case for
it.

> +			 * We take the inclusive approach for now, and find
> +			 * anything which impacts the path. Options to tweak
> +			 * the behavior (e.g., to "--follow" the content across
> +			 * renames) can come later.
> +			 */
> +			mark_path(p->two->path, &p->two->oid, data);
> +			break;
> +		}
> +	}
> +}
> +
> +static int last_modified_run(struct last_modified *lm)
> +{
> +	struct last_modified_callback_data data = { .lm = lm };
> +
> +	lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> +	lm->rev.diffopt.format_callback = last_modified_diff;
> +	lm->rev.diffopt.format_callback_data = &data;
> +
> +	prepare_revision_walk(&lm->rev);
> +
> +	while (hashmap_get_size(&lm->paths)) {
> +		data.commit = get_revision(&lm->rev);
> +		if (!data.commit)
> +			break;

So in this case we have reached the end of our commit range. I assume we
simply print the oldest commit of that range in this case?

> +		if (data.commit->object.flags & BOUNDARY) {
> +			diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> +				      &data.commit->object.oid, "",
> +				      &lm->rev.diffopt);
> +			diff_flush(&lm->rev.diffopt);
> +		} else {
> +			log_tree_commit(&lm->rev, data.commit);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int last_modified_init(struct last_modified *lm, struct repository *r,
> +			      const char *prefix, int argc, const char **argv)
> +{
> +	hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
> +
> +	repo_init_revisions(r, &lm->rev, prefix);
> +	lm->rev.def = "HEAD";
> +	lm->rev.combine_merges = 1;
> +	lm->rev.show_root_diff = 1;
> +	lm->rev.boundary = 1;
> +	lm->rev.no_commit_id = 1;
> +	lm->rev.diff = 1;
> +	lm->rev.diffopt.flags.recursive = lm->recursive || lm->tree_in_recursive;
> +	lm->rev.diffopt.flags.tree_in_recursive = lm->tree_in_recursive;
> +
> +	if ((argc = setup_revisions(argc, argv, &lm->rev, NULL)) > 1) {

Tiny nit: it's rather unusual in our codebase to assign values in
conditionals. I personally don't mind this usage at all -- I think it
can make error handling way less verbose. But I'm not sure whether we
deem this style acceptable.

        argc = setup_revisions(argc, argv, &lm->rev, NULL)
        if (argc) {
            ...
        }

I've seen this style several times in this patch. I think we should keep
our typical style for now, but I wouldn't mind if you sent a patch for
our coding style document so that we can discuss this.

> +		error(_("unknown last-modified argument: %s"), argv[1]);
> +		return argc;
> +	}
> +
> +	if (populate_paths_from_revs(lm) < 0)
> +		return error(_("unable to setup last-modified"));
> +
> +	return 0;
> +}
> +
> +int cmd_last_modified(int argc, const char **argv, const char *prefix,
> +		      struct repository *repo)
> +{
> +	int ret;
> +	struct last_modified lm;
> +
> +	const char * const last_modified_usage[] = {
> +		N_("git last-modified [-r] [-t] "
> +		   "[<revision-range>] [[--] <path>...]"),
> +		NULL
> +	};
> +
> +	struct option last_modified_options[] = {
> +		OPT_BOOL('r', "recursive", &lm.recursive,
> +			 N_("recurse into subtrees")),
> +		OPT_BOOL('t', "tree-in-recursive", &lm.tree_in_recursive,
> +			 N_("recurse into subtrees and include the tree entries too")),

Should this maybe be called something like "--recursive-with-trees"?
"--tree-in-recursive" reads somewhat strange to me.

> +		OPT_END()
> +	};
> +
> +	memset(&lm, 0, sizeof(lm));

You can avoid the `memset()` and directly zero-initialize the struct
when it's declared. Alternatively, you can move this function call into
`last_modified_init()` itself, where it would be more reasonable.

> +	argc = parse_options(argc, argv, prefix, last_modified_options,
> +			     last_modified_usage,
> +			     PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN_OPT);
> +
> +	repo_config(repo, git_default_config, NULL);
> +
> +	if ((ret = last_modified_init(&lm, repo, prefix, argc, argv))) {
> +		if (ret > 0)
> +			usage_with_options(last_modified_usage,
> +					   last_modified_options);
> +		goto out;
> +	}
> +
> +	if ((ret = last_modified_run(&lm)))
> +		goto out;

Two more cases where we assign `if ((ret = ...))`.

Patrick

  reply	other threads:[~2025-07-31  6:42 UTC|newest]

Thread overview: 135+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19   ` Junio C Hamano
2025-05-07 13:13     ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` [PATCH RFC 3/5] blame-tree: use Bloom filters when available Toon Claes
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22   ` Toon Claes
2025-05-07 20:23     ` Marc Branchaud
2025-05-07 20:45       ` Junio C Hamano
2025-05-08 13:26         ` Marc Branchaud
2025-05-08 14:26           ` Junio C Hamano
2025-05-08 15:12             ` Marc Branchaud
2025-05-14 14:42               ` Toon Claes
2025-05-14 19:29                 ` Junio C Hamano
2025-05-14 21:15                   ` Marc Branchaud
2025-05-15 13:29                     ` Patrick Steinhardt
2025-05-15 16:39                       ` Junio C Hamano
2025-05-15 17:39                         ` Marc Branchaud
2025-05-15 19:30                           ` Jeff King
2025-05-16  4:38                             ` Patrick Steinhardt
2025-05-20  8:49                               ` Toon Claes
2025-05-15 17:30                       ` Marc Branchaud
2025-05-16  4:30                         ` Patrick Steinhardt
2025-05-14 21:15                 ` Marc Branchaud
2025-05-07 20:49       ` Kristoffer Haugsbakk
2025-05-08 13:20         ` D. Ben Knoble
2025-05-08 13:26         ` Marc Branchaud
2025-05-08 13:18       ` D. Ben Knoble
2025-05-23  9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07     ` Justin Tobler
2025-06-05  8:32       ` Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-06-13  9:34       ` Toon Claes
2025-06-13  9:52         ` Kristoffer Haugsbakk
2025-05-23  9:33   ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40     ` Patrick Steinhardt
2025-06-13 11:05       ` Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-05-23  9:33   ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-07-01 20:35   ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06     ` Junio C Hamano
2025-07-01 21:30       ` Kristoffer Haugsbakk
2025-07-02 13:00         ` Toon Claes
2025-07-09 15:53           ` Toon Claes
2025-07-09 17:00             ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49   ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20     ` Kristoffer Haugsbakk
2025-07-02 11:51     ` Junio C Hamano
2025-06-30 18:49   ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49   ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01   ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26   ` [PATCH v4 " Toon Claes
2025-07-09 21:57     ` Junio C Hamano
2025-07-10 18:37       ` Junio C Hamano
2025-07-16 13:32     ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35       ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18  0:02         ` Taylor Blau
2025-07-19  6:44           ` Jeff King
2025-07-22 15:50           ` Toon Claes
2025-08-01  9:09           ` Christian Couder
2025-08-01 16:59             ` Junio C Hamano
2025-07-16 13:35       ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18  0:08         ` Taylor Blau
2025-07-22 15:52           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18  0:16         ` Taylor Blau
2025-07-22 16:02           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50         ` Junio C Hamano
2025-07-17 16:31           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09         ` Junio C Hamano
2025-07-17 16:31           ` Toon Claes
2025-07-17 22:37         ` Junio C Hamano
2025-07-18 17:36           ` Junio C Hamano
2025-07-22 16:06             ` Toon Claes
2025-07-16 13:42       ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39       ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35         ` Toon Claes
2025-07-30 17:59           ` Toon Claes
2025-07-31  7:45             ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40         ` Junio C Hamano
2025-07-31 23:57           ` Junio C Hamano
2025-08-05  9:33         ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34           ` Patrick Steinhardt
2025-08-05 16:21             ` Junio C Hamano
2025-08-05 16:34           ` Junio C Hamano
2025-08-05 16:55             ` Toon Claes
2025-08-05 17:20               ` Jean-Noël AVILA
2025-08-05 21:46                 ` Junio C Hamano
2025-08-06 12:01                   ` Toon Claes
2025-08-06 15:38                     ` Junio C Hamano
2025-08-28 22:44                       ` Junio C Hamano
2025-08-05 18:28               ` Junio C Hamano
2025-08-05  9:33         ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05  9:33         ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05  9:33         ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55       ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31  6:42         ` Patrick Steinhardt [this message]
2025-08-01 16:22           ` Toon Claes
2025-08-01 17:09             ` Junio C Hamano
2025-08-04  6:34               ` Patrick Steinhardt
2025-08-04 17:14                 ` Junio C Hamano
2025-08-05  5:35                   ` Toon Claes
2025-08-01 20:34             ` Jean-Noël AVILA
2025-08-05  5:36               ` Toon Claes
2025-08-04  6:33             ` Patrick Steinhardt
2025-08-01 10:18         ` Christian Couder
2025-08-01 10:22           ` Patrick Steinhardt
2025-08-01 17:06             ` Junio C Hamano
2025-08-02  8:18               ` Christian Couder
2025-08-02 11:31                 ` Christian Couder
2025-08-02 13:38                   ` Christian Couder
2025-08-02 16:26                     ` Junio C Hamano
2025-08-04  6:35               ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55       ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31  6:42         ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31  6:43         ` Patrick Steinhardt
2025-08-01 16:23           ` Toon Claes
2025-08-04  6:33             ` Patrick Steinhardt
2025-07-09 15:26   ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26   ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26   ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35   ` [PATCH v5 6/6] fixup! " Toon Claes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aIsQWcHf82ipHoWf@pks.im \
    --to=ps@pks.im \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kristofferhaugsbakk@fastmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=toon@iotcl.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).