From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
"Taylor Blau" <me@ttaylorr.com>,
"Derrick Stolee" <stolee@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified
Date: Tue, 27 May 2025 12:39:55 +0200 [thread overview]
Message-ID: <aDWWe6qCQXorPESd@pks.im> (raw)
In-Reply-To: <20250523-toon-new-blame-tree-v2-1-101e4ca4c1c9@iotcl.com>
On Fri, May 23, 2025 at 11:33:48AM +0200, Toon Claes wrote:
> diff --git a/Documentation/git-last-modified.adoc b/Documentation/git-last-modified.adoc
> new file mode 100644
> index 0000000000..1af38f402e
> --- /dev/null
> +++ b/Documentation/git-last-modified.adoc
> @@ -0,0 +1,49 @@
> +git-last-modified(1)
> +====================
> +
> +NAME
> +----
> +git-last-modified - EXPERIMENTAL: Show when files were last modified
Nit: we don't have the EXPERIMENTAL label here for git-switch(1) or
git-restore(1).
> +
> +
> +SYNOPSIS
> +--------
> +[synopsis]
> +git last-modified [-r] [<revision-range>] [[--] <path>...]
> +
> +DESCRIPTION
> +-----------
> +
> +Shows which commit last modified each of the relevant files and subdirectories.
> +
> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
> +
> +OPTIONS
> +-------
> +
> +-r::
> + Recurse into subtrees.
> +
> +-t::
> + Show tree entry itself as well as subtrees. Implies `-r`.
These flags aren't yet supported in this version, are they?
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> new file mode 100644
> index 0000000000..0d4733f666
> --- /dev/null
> +++ b/builtin/last-modified.c
> @@ -0,0 +1,43 @@
> +#include "git-compat-util.h"
> +#include "last-modified.h"
> +#include "hex.h"
> +#include "quote.h"
> +#include "config.h"
> +#include "object-name.h"
> +#include "parse-options.h"
> +#include "builtin.h"
> +
> +static void show_entry(const char *path, const struct commit *commit, void *d)
> +{
> + struct last_modified *lm = d;
> +
> + if (commit->object.flags & BOUNDARY)
> + putchar('^');
> + printf("%s\t", oid_to_hex(&commit->object.oid));
> +
> + if (lm->rev.diffopt.line_termination)
> + write_name_quoted(path, stdout, '\n');
> + else
> + printf("%s%c", path, '\0');
> +
> + fflush(stdout);
> +}
> +
> +int cmd_last_modified(int argc,
> + const char **argv,
> + const char *prefix,
> + struct repository *repo)
> +{
> + int ret = 0;
`ret` is basically unused here, we only use it to return 0.
> diff --git a/last-modified.c b/last-modified.c
> new file mode 100644
> index 0000000000..9283f8fcae
> --- /dev/null
> +++ b/last-modified.c
> @@ -0,0 +1,213 @@
> +#include "git-compat-util.h"
> +#include "last-modified.h"
> +#include "commit.h"
> +#include "diffcore.h"
> +#include "diff.h"
> +#include "object.h"
> +#include "revision.h"
> +#include "repository.h"
> +#include "log-tree.h"
> +
> +struct last_modified_entry {
> + struct hashmap_entry hashent;
> + struct object_id oid;
> + struct commit *commit;
> + const char path[FLEX_ARRAY];
> +};
> +
> +static void add_from_diff(struct diff_queue_struct *q,
> + struct diff_options *opt UNUSED,
> + void *data)
> +{
> + struct last_modified *lm = data;
> +
> + for (int i = 0; i < q->nr; i++) {
> + struct diff_filepair *p = q->queue[i];
> + struct last_modified_entry *ent;
> + const char *path = p->two->path;
> +
> + FLEX_ALLOC_STR(ent, path, path);
> + oidcpy(&ent->oid, &p->two->oid);
> + hashmap_entry_init(&ent->hashent, strhash(ent->path));
> + hashmap_add(&lm->paths, &ent->hashent);
> + }
> +}
>
> +static int add_from_revs(struct last_modified *lm)
> +{
> + size_t count = 0;
> + struct diff_options diffopt;
> +
> + memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt));
> + copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec);
> + diffopt.output_format = DIFF_FORMAT_CALLBACK;
> + diffopt.format_callback = add_from_diff;
> + diffopt.format_callback_data = lm;
As far as I understand we populate `paths` from the diff here, and
`paths` later on acts as a filter of paths we're interested in? Might be
nice to add a comment explaining the intent of this.
> + for (size_t i = 0; i < lm->rev.pending.nr; i++) {
> + struct object_array_entry *obj = lm->rev.pending.objects + i;
> +
> + if (obj->item->flags & UNINTERESTING)
> + continue;
> +
> + if (count++)
> + return error(_("can only get last-modified one tree at a time"));
It's a bit funny that `count` is pretending to be a counter even though
it ultimately is only a boolean flag whether we have already seen an
interesting item.
> + diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
> + &obj->item->oid, "", &diffopt);
> + diff_flush(&diffopt);
> + }
> + clear_pathspec(&diffopt.pathspec);
Shouldn't we call `diff_free()` instead of `clear_pathspec` to clear the
whole `struct diff_options`?
> +
> + return 0;
> +}
> +
> +static int last_modified_entry_hashcmp(const void *unused UNUSED,
> + const struct hashmap_entry *hent1,
> + const struct hashmap_entry *hent2,
> + const void *path)
> +{
> + const struct last_modified_entry *ent1 =
> + container_of(hent1, const struct last_modified_entry, hashent);
> + const struct last_modified_entry *ent2 =
> + container_of(hent2, const struct last_modified_entry, hashent);
> + return strcmp(ent1->path, path ? path : ent2->path);
> +}
> +
> +void last_modified_init(struct last_modified *lm,
> + struct repository *r,
> + const char *prefix,
> + int argc, const char **argv)
> +{
> + memset(lm, 0, sizeof(*lm));
> + hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
> +
> + repo_init_revisions(r, &lm->rev, prefix);
> + lm->rev.def = "HEAD";
> + lm->rev.combine_merges = 1;
> + lm->rev.show_root_diff = 1;
> + lm->rev.boundary = 1;
> + lm->rev.no_commit_id = 1;
> + lm->rev.diff = 1;
> + if (setup_revisions(argc, argv, &lm->rev, NULL) > 1)
> + die(_("unknown last-modified argument: %s"), argv[1]);
> +
> + if (add_from_revs(lm) < 0)
> + die(_("unable to setup last-modified"));
Given that this is library code, do we rather want to have
`last_modified_init()` return an error code and let the caller die?
> +}
> +
> +void last_modified_release(struct last_modified *lm)
> +{
> + hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
> + release_revisions(&lm->rev);
> +}
> +
> +struct last_modified_callback_data {
> + struct commit *commit;
> + struct hashmap *paths;
> +
> + last_modified_callback callback;
> + void *callback_data;
> +};
> +
> +static void mark_path(const char *path, const struct object_id *oid,
> + struct last_modified_callback_data *data)
> +{
> + struct last_modified_entry *ent;
> +
> + /* Is it even a path that we are interested in? */
> + ent = hashmap_get_entry_from_hash(data->paths, strhash(path), path,
> + struct last_modified_entry, hashent);
> + if (!ent)
> + return;
Yup, so this here is the filter to figure out whether we care for a
path, which uses the `paths` map we have populated at the beginning.
> + /* Have we already found a commit? */
> + if (ent->commit)
> + return;
Can this case even be hit? We remove the entry from the map once we have
seen it, so I'd expect that we never hit the same commit map entry
twice. If so, can this be converted to a `BUG()` or am I missing the
obvious?
> + /*
> + * Is it arriving at a version of interest, or is it from a side branch
> + * which did not contribute to the final state?
> + */
> + if (!oideq(oid, &ent->oid))
> + return;
> +
> + ent->commit = data->commit;
> + if (data->callback)
> + data->callback(path, data->commit, data->callback_data);
> +
> + hashmap_remove(data->paths, &ent->hashent, path);
And we end up removing that entry from paths so that we don't revisit it
in the future. After all, we're only interested in a single commit per
path.
> + free(ent);
> +}
> +
> +static void last_modified_diff(struct diff_queue_struct *q,
> + struct diff_options *opt UNUSED, void *cbdata)
> +{
> + struct last_modified_callback_data *data = cbdata;
> +
> + for (int i = 0; i < q->nr; i++) {
> + struct diff_filepair *p = q->queue[i];
> + switch (p->status) {
> + case DIFF_STATUS_DELETED:
> + /*
> + * There's no point in feeding a deletion, as it could
> + * not have resulted in our current state, which
> + * actually has the file.
> + */
> + break;
> +
> + default:
> + /*
> + * Otherwise, we care only that we somehow arrived at
> + * a final path/sha1 state. Note that this covers some
> + * potentially controversial areas, including:
> + *
> + * 1. A rename or copy will be found, as it is the
> + * first time the content has arrived at the given
> + * path.
> + *
> + * 2. Even a non-content modification like a mode or
> + * type change will trigger it.
Curious, but sensible. We're looking for the last time a specific tree
entry was changed, and that of course includes modifications. I could
totally see that we may eventually want to add a flag that ignores such
mode changes and only presents content changes. But for now I agree that
this is sensible.
> + * We take the inclusive approach for now, and find
> + * anything which impacts the path. Options to tweak
> + * the behavior (e.g., to "--follow" the content across
> + * renames) can come later.
> + */
> + mark_path(p->two->path, &p->two->oid, data);
> + break;
> + }
> + }
> +}
> +
> +int last_modified_run(struct last_modified *lm, last_modified_callback cb, void *cbdata)
> +{
> + struct last_modified_callback_data data;
> +
> + data.paths = &lm->paths;
> + data.callback = cb;
> + data.callback_data = cbdata;
> +
> + lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> + lm->rev.diffopt.format_callback = last_modified_diff;
> + lm->rev.diffopt.format_callback_data = &data;
> +
> + prepare_revision_walk(&lm->rev);
> +
> + while (hashmap_get_size(&lm->paths)) {
Okay, and this is the core of our logic: we continue walking the tree
until there are no more paths that we care about.
> diff --git a/last-modified.h b/last-modified.h
> new file mode 100644
> index 0000000000..42a819d979
> --- /dev/null
> +++ b/last-modified.h
> @@ -0,0 +1,27 @@
> +#ifndef LAST_MODIFIED_H
> +#define LAST_MODIFIED_H
> +
> +#include "commit.h"
> +#include "revision.h"
> +#include "hashmap.h"
> +
> +struct last_modified {
> + struct hashmap paths;
> + struct rev_info rev;
> +};
> +
> +void last_modified_init(struct last_modified *lm,
> + struct repository *r,
> + const char *prefix,
> + int argc, const char **argv);
> +
> +void last_modified_release(struct last_modified *);
> +
> +typedef void (*last_modified_callback)(const char *path,
> + const struct commit *commit,
> + void *data);
> +int last_modified_run(struct last_modified *lm,
> + last_modified_callback cb,
> + void *cbdata);
> +#endif /* LAST_MODIFIED_H */
It would be nice to have some documentation for each of these functions
as well as a bit of a higher-level conceptual info.
next prev parent reply other threads:[~2025-05-27 10:39 UTC|newest]
Thread overview: 135+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19 ` Junio C Hamano
2025-05-07 13:13 ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` [PATCH RFC 3/5] blame-tree: use Bloom filters when available Toon Claes
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22 ` Toon Claes
2025-05-07 20:23 ` Marc Branchaud
2025-05-07 20:45 ` Junio C Hamano
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 14:26 ` Junio C Hamano
2025-05-08 15:12 ` Marc Branchaud
2025-05-14 14:42 ` Toon Claes
2025-05-14 19:29 ` Junio C Hamano
2025-05-14 21:15 ` Marc Branchaud
2025-05-15 13:29 ` Patrick Steinhardt
2025-05-15 16:39 ` Junio C Hamano
2025-05-15 17:39 ` Marc Branchaud
2025-05-15 19:30 ` Jeff King
2025-05-16 4:38 ` Patrick Steinhardt
2025-05-20 8:49 ` Toon Claes
2025-05-15 17:30 ` Marc Branchaud
2025-05-16 4:30 ` Patrick Steinhardt
2025-05-14 21:15 ` Marc Branchaud
2025-05-07 20:49 ` Kristoffer Haugsbakk
2025-05-08 13:20 ` D. Ben Knoble
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 13:18 ` D. Ben Knoble
2025-05-23 9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07 ` Justin Tobler
2025-06-05 8:32 ` Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt [this message]
2025-06-13 9:34 ` Toon Claes
2025-06-13 9:52 ` Kristoffer Haugsbakk
2025-05-23 9:33 ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40 ` Patrick Steinhardt
2025-06-13 11:05 ` Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-05-23 9:33 ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-07-01 20:35 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06 ` Junio C Hamano
2025-07-01 21:30 ` Kristoffer Haugsbakk
2025-07-02 13:00 ` Toon Claes
2025-07-09 15:53 ` Toon Claes
2025-07-09 17:00 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20 ` Kristoffer Haugsbakk
2025-07-02 11:51 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01 ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26 ` [PATCH v4 " Toon Claes
2025-07-09 21:57 ` Junio C Hamano
2025-07-10 18:37 ` Junio C Hamano
2025-07-16 13:32 ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35 ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18 0:02 ` Taylor Blau
2025-07-19 6:44 ` Jeff King
2025-07-22 15:50 ` Toon Claes
2025-08-01 9:09 ` Christian Couder
2025-08-01 16:59 ` Junio C Hamano
2025-07-16 13:35 ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18 0:08 ` Taylor Blau
2025-07-22 15:52 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18 0:16 ` Taylor Blau
2025-07-22 16:02 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-17 22:37 ` Junio C Hamano
2025-07-18 17:36 ` Junio C Hamano
2025-07-22 16:06 ` Toon Claes
2025-07-16 13:42 ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39 ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35 ` Toon Claes
2025-07-30 17:59 ` Toon Claes
2025-07-31 7:45 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40 ` Junio C Hamano
2025-07-31 23:57 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34 ` Patrick Steinhardt
2025-08-05 16:21 ` Junio C Hamano
2025-08-05 16:34 ` Junio C Hamano
2025-08-05 16:55 ` Toon Claes
2025-08-05 17:20 ` Jean-Noël AVILA
2025-08-05 21:46 ` Junio C Hamano
2025-08-06 12:01 ` Toon Claes
2025-08-06 15:38 ` Junio C Hamano
2025-08-28 22:44 ` Junio C Hamano
2025-08-05 18:28 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05 9:33 ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05 9:33 ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55 ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-08-01 16:22 ` Toon Claes
2025-08-01 17:09 ` Junio C Hamano
2025-08-04 6:34 ` Patrick Steinhardt
2025-08-04 17:14 ` Junio C Hamano
2025-08-05 5:35 ` Toon Claes
2025-08-01 20:34 ` Jean-Noël AVILA
2025-08-05 5:36 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-08-01 10:18 ` Christian Couder
2025-08-01 10:22 ` Patrick Steinhardt
2025-08-01 17:06 ` Junio C Hamano
2025-08-02 8:18 ` Christian Couder
2025-08-02 11:31 ` Christian Couder
2025-08-02 13:38 ` Christian Couder
2025-08-02 16:26 ` Junio C Hamano
2025-08-04 6:35 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55 ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31 6:43 ` Patrick Steinhardt
2025-08-01 16:23 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-07-09 15:26 ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26 ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26 ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35 ` [PATCH v5 6/6] fixup! " Toon Claes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aDWWe6qCQXorPESd@pks.im \
--to=ps@pks.im \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).