From: Toon Claes <toon@iotcl.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
"Taylor Blau" <me@ttaylorr.com>,
"Derrick Stolee" <stolee@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified
Date: Fri, 13 Jun 2025 11:34:28 +0200 [thread overview]
Message-ID: <87ldpw0yu3.fsf@iotcl.com> (raw)
In-Reply-To: <aDWWe6qCQXorPESd@pks.im>
Patrick Steinhardt <ps@pks.im> writes:
> On Fri, May 23, 2025 at 11:33:48AM +0200, Toon Claes wrote:
>> diff --git a/Documentation/git-last-modified.adoc b/Documentation/git-last-modified.adoc
>> new file mode 100644
>> index 0000000000..1af38f402e
>> --- /dev/null
>> +++ b/Documentation/git-last-modified.adoc
>> @@ -0,0 +1,49 @@
>> +git-last-modified(1)
>> +====================
>> +
>> +NAME
>> +----
>> +git-last-modified - EXPERIMENTAL: Show when files were last modified
>
> Nit: we don't have the EXPERIMENTAL label here for git-switch(1) or
> git-restore(1).
But we do for `git-replay(1)`. Because I haven't gotten much feedback
about the usage of the command, I wanted to be on the safe side and not
commit to the behavior. Marking it EXPERIMENTAL would allow us to make
changes on it's interface without _breaking_. But I wouldn't mind
dropping the experimental status.
>> +
>> +
>> +SYNOPSIS
>> +--------
>> +[synopsis]
>> +git last-modified [-r] [<revision-range>] [[--] <path>...]
>> +
>> +DESCRIPTION
>> +-----------
>> +
>> +Shows which commit last modified each of the relevant files and subdirectories.
>> +
>> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
>> +
>> +OPTIONS
>> +-------
>> +
>> +-r::
>> + Recurse into subtrees.
>> +
>> +-t::
>> + Show tree entry itself as well as subtrees. Implies `-r`.
>
> These flags aren't yet supported in this version, are they?
They are, but I see there are no tests for `-t`. I shall add them.
>
>> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
>> new file mode 100644
>> index 0000000000..0d4733f666
>> --- /dev/null
>> +++ b/builtin/last-modified.c
>> @@ -0,0 +1,43 @@
>> +#include "git-compat-util.h"
>> +#include "last-modified.h"
>> +#include "hex.h"
>> +#include "quote.h"
>> +#include "config.h"
>> +#include "object-name.h"
>> +#include "parse-options.h"
>> +#include "builtin.h"
>> +
>> +static void show_entry(const char *path, const struct commit *commit, void *d)
>> +{
>> + struct last_modified *lm = d;
>> +
>> + if (commit->object.flags & BOUNDARY)
>> + putchar('^');
>> + printf("%s\t", oid_to_hex(&commit->object.oid));
>> +
>> + if (lm->rev.diffopt.line_termination)
>> + write_name_quoted(path, stdout, '\n');
>> + else
>> + printf("%s%c", path, '\0');
>> +
>> + fflush(stdout);
>> +}
>> +
>> +int cmd_last_modified(int argc,
>> + const char **argv,
>> + const char *prefix,
>> + struct repository *repo)
>> +{
>> + int ret = 0;
>
> `ret` is basically unused here, we only use it to return 0.
Good catch. Thanks!
>> diff --git a/last-modified.c b/last-modified.c
>> new file mode 100644
>> index 0000000000..9283f8fcae
>> --- /dev/null
>> +++ b/last-modified.c
>> @@ -0,0 +1,213 @@
>> +#include "git-compat-util.h"
>> +#include "last-modified.h"
>> +#include "commit.h"
>> +#include "diffcore.h"
>> +#include "diff.h"
>> +#include "object.h"
>> +#include "revision.h"
>> +#include "repository.h"
>> +#include "log-tree.h"
>> +
>> +struct last_modified_entry {
>> + struct hashmap_entry hashent;
>> + struct object_id oid;
>> + struct commit *commit;
>> + const char path[FLEX_ARRAY];
>> +};
>> +
>> +static void add_from_diff(struct diff_queue_struct *q,
>> + struct diff_options *opt UNUSED,
>> + void *data)
>> +{
>> + struct last_modified *lm = data;
>> +
>> + for (int i = 0; i < q->nr; i++) {
>> + struct diff_filepair *p = q->queue[i];
>> + struct last_modified_entry *ent;
>> + const char *path = p->two->path;
>> +
>> + FLEX_ALLOC_STR(ent, path, path);
>> + oidcpy(&ent->oid, &p->two->oid);
>> + hashmap_entry_init(&ent->hashent, strhash(ent->path));
>> + hashmap_add(&lm->paths, &ent->hashent);
>> + }
>> +}
>>
>> +static int add_from_revs(struct last_modified *lm)
>> +{
>> + size_t count = 0;
>> + struct diff_options diffopt;
>> +
>> + memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt));
>> + copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec);
>> + diffopt.output_format = DIFF_FORMAT_CALLBACK;
>> + diffopt.format_callback = add_from_diff;
>> + diffopt.format_callback_data = lm;
>
> As far as I understand we populate `paths` from the diff here, and
> `paths` later on acts as a filter of paths we're interested in? Might be
> nice to add a comment explaining the intent of this.
Will do.
>> + for (size_t i = 0; i < lm->rev.pending.nr; i++) {
>> + struct object_array_entry *obj = lm->rev.pending.objects + i;
>> +
>> + if (obj->item->flags & UNINTERESTING)
>> + continue;
>> +
>> + if (count++)
>> + return error(_("can only get last-modified one tree at a time"));
>
> It's a bit funny that `count` is pretending to be a counter even though
> it ultimately is only a boolean flag whether we have already seen an
> interesting item.
Okay, I'll refactor.
>> + diff_tree_oid(lm->rev.repo->hash_algo->empty_tree,
>> + &obj->item->oid, "", &diffopt);
>> + diff_flush(&diffopt);
>> + }
>> + clear_pathspec(&diffopt.pathspec);
>
> Shouldn't we call `diff_free()` instead of `clear_pathspec` to clear the
> whole `struct diff_options`?
Yeah, that's better.
>> +
>> + return 0;
>> +}
>> +
>> +static int last_modified_entry_hashcmp(const void *unused UNUSED,
>> + const struct hashmap_entry *hent1,
>> + const struct hashmap_entry *hent2,
>> + const void *path)
>> +{
>> + const struct last_modified_entry *ent1 =
>> + container_of(hent1, const struct last_modified_entry, hashent);
>> + const struct last_modified_entry *ent2 =
>> + container_of(hent2, const struct last_modified_entry, hashent);
>> + return strcmp(ent1->path, path ? path : ent2->path);
>> +}
>> +
>> +void last_modified_init(struct last_modified *lm,
>> + struct repository *r,
>> + const char *prefix,
>> + int argc, const char **argv)
>> +{
>> + memset(lm, 0, sizeof(*lm));
>> + hashmap_init(&lm->paths, last_modified_entry_hashcmp, NULL, 0);
>> +
>> + repo_init_revisions(r, &lm->rev, prefix);
>> + lm->rev.def = "HEAD";
>> + lm->rev.combine_merges = 1;
>> + lm->rev.show_root_diff = 1;
>> + lm->rev.boundary = 1;
>> + lm->rev.no_commit_id = 1;
>> + lm->rev.diff = 1;
>> + if (setup_revisions(argc, argv, &lm->rev, NULL) > 1)
>> + die(_("unknown last-modified argument: %s"), argv[1]);
>> +
>> + if (add_from_revs(lm) < 0)
>> + die(_("unable to setup last-modified"));
>
> Given that this is library code, do we rather want to have
> `last_modified_init()` return an error code and let the caller die?
Yes, agreed.
>> +}
>> +
>> +void last_modified_release(struct last_modified *lm)
>> +{
>> + hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
>> + release_revisions(&lm->rev);
>> +}
>> +
>> +struct last_modified_callback_data {
>> + struct commit *commit;
>> + struct hashmap *paths;
>> +
>> + last_modified_callback callback;
>> + void *callback_data;
>> +};
>> +
>> +static void mark_path(const char *path, const struct object_id *oid,
>> + struct last_modified_callback_data *data)
>> +{
>> + struct last_modified_entry *ent;
>> +
>> + /* Is it even a path that we are interested in? */
>> + ent = hashmap_get_entry_from_hash(data->paths, strhash(path), path,
>> + struct last_modified_entry, hashent);
>> + if (!ent)
>> + return;
>
> Yup, so this here is the filter to figure out whether we care for a
> path, which uses the `paths` map we have populated at the beginning.
>
>> + /* Have we already found a commit? */
>> + if (ent->commit)
>> + return;
>
> Can this case even be hit? We remove the entry from the map once we have
> seen it, so I'd expect that we never hit the same commit map entry
> twice. If so, can this be converted to a `BUG()` or am I missing the
> obvious?
I was wondering about that. But took it over from the original code, and
left it in. I believe it's from a version where entries weren't removed
from the hashmap. I agree a `BUG()` would be a better approach, or
even better simply delete this condition.
>> + /*
>> + * Is it arriving at a version of interest, or is it from a side branch
>> + * which did not contribute to the final state?
>> + */
>> + if (!oideq(oid, &ent->oid))
>> + return;
>> +
>> + ent->commit = data->commit;
>> + if (data->callback)
>> + data->callback(path, data->commit, data->callback_data);
>> +
>> + hashmap_remove(data->paths, &ent->hashent, path);
>
> And we end up removing that entry from paths so that we don't revisit it
> in the future. After all, we're only interested in a single commit per
> path.
True. So it doesn't even make sense a `struct last_modified_entry` has a
`commit` attribute. I will delete that in next version.
>> + free(ent);
>> +}
>> +
>> +static void last_modified_diff(struct diff_queue_struct *q,
>> + struct diff_options *opt UNUSED, void *cbdata)
>> +{
>> + struct last_modified_callback_data *data = cbdata;
>> +
>> + for (int i = 0; i < q->nr; i++) {
>> + struct diff_filepair *p = q->queue[i];
>> + switch (p->status) {
>> + case DIFF_STATUS_DELETED:
>> + /*
>> + * There's no point in feeding a deletion, as it could
>> + * not have resulted in our current state, which
>> + * actually has the file.
>> + */
>> + break;
>> +
>> + default:
>> + /*
>> + * Otherwise, we care only that we somehow arrived at
>> + * a final path/sha1 state. Note that this covers some
>> + * potentially controversial areas, including:
>> + *
>> + * 1. A rename or copy will be found, as it is the
>> + * first time the content has arrived at the given
>> + * path.
>> + *
>> + * 2. Even a non-content modification like a mode or
>> + * type change will trigger it.
>
> Curious, but sensible. We're looking for the last time a specific tree
> entry was changed, and that of course includes modifications. I could
> totally see that we may eventually want to add a flag that ignores such
> mode changes and only presents content changes. But for now I agree that
> this is sensible.
>
>> + * We take the inclusive approach for now, and find
>> + * anything which impacts the path. Options to tweak
>> + * the behavior (e.g., to "--follow" the content across
>> + * renames) can come later.
>> + */
>> + mark_path(p->two->path, &p->two->oid, data);
>> + break;
>> + }
>> + }
>> +}
>> +
>> +int last_modified_run(struct last_modified *lm, last_modified_callback cb, void *cbdata)
>> +{
>> + struct last_modified_callback_data data;
>> +
>> + data.paths = &lm->paths;
>> + data.callback = cb;
>> + data.callback_data = cbdata;
>> +
>> + lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
>> + lm->rev.diffopt.format_callback = last_modified_diff;
>> + lm->rev.diffopt.format_callback_data = &data;
>> +
>> + prepare_revision_walk(&lm->rev);
>> +
>> + while (hashmap_get_size(&lm->paths)) {
>
> Okay, and this is the core of our logic: we continue walking the tree
> until there are no more paths that we care about.
>
>> diff --git a/last-modified.h b/last-modified.h
>> new file mode 100644
>> index 0000000000..42a819d979
>> --- /dev/null
>> +++ b/last-modified.h
>> @@ -0,0 +1,27 @@
>> +#ifndef LAST_MODIFIED_H
>> +#define LAST_MODIFIED_H
>> +
>> +#include "commit.h"
>> +#include "revision.h"
>> +#include "hashmap.h"
>> +
>> +struct last_modified {
>> + struct hashmap paths;
>> + struct rev_info rev;
>> +};
>> +
>> +void last_modified_init(struct last_modified *lm,
>> + struct repository *r,
>> + const char *prefix,
>> + int argc, const char **argv);
>> +
>> +void last_modified_release(struct last_modified *);
>> +
>> +typedef void (*last_modified_callback)(const char *path,
>> + const struct commit *commit,
>> + void *data);
>> +int last_modified_run(struct last_modified *lm,
>> + last_modified_callback cb,
>> + void *cbdata);
>> +#endif /* LAST_MODIFIED_H */
>
> It would be nice to have some documentation for each of these functions
> as well as a bit of a higher-level conceptual info.
Will do.
--
Cheers,
Toon
next prev parent reply other threads:[~2025-06-13 9:34 UTC|newest]
Thread overview: 135+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19 ` Junio C Hamano
2025-05-07 13:13 ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` [PATCH RFC 3/5] blame-tree: use Bloom filters when available Toon Claes
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22 ` Toon Claes
2025-05-07 20:23 ` Marc Branchaud
2025-05-07 20:45 ` Junio C Hamano
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 14:26 ` Junio C Hamano
2025-05-08 15:12 ` Marc Branchaud
2025-05-14 14:42 ` Toon Claes
2025-05-14 19:29 ` Junio C Hamano
2025-05-14 21:15 ` Marc Branchaud
2025-05-15 13:29 ` Patrick Steinhardt
2025-05-15 16:39 ` Junio C Hamano
2025-05-15 17:39 ` Marc Branchaud
2025-05-15 19:30 ` Jeff King
2025-05-16 4:38 ` Patrick Steinhardt
2025-05-20 8:49 ` Toon Claes
2025-05-15 17:30 ` Marc Branchaud
2025-05-16 4:30 ` Patrick Steinhardt
2025-05-14 21:15 ` Marc Branchaud
2025-05-07 20:49 ` Kristoffer Haugsbakk
2025-05-08 13:20 ` D. Ben Knoble
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 13:18 ` D. Ben Knoble
2025-05-23 9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07 ` Justin Tobler
2025-06-05 8:32 ` Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-06-13 9:34 ` Toon Claes [this message]
2025-06-13 9:52 ` Kristoffer Haugsbakk
2025-05-23 9:33 ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40 ` Patrick Steinhardt
2025-06-13 11:05 ` Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-05-23 9:33 ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-07-01 20:35 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06 ` Junio C Hamano
2025-07-01 21:30 ` Kristoffer Haugsbakk
2025-07-02 13:00 ` Toon Claes
2025-07-09 15:53 ` Toon Claes
2025-07-09 17:00 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20 ` Kristoffer Haugsbakk
2025-07-02 11:51 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01 ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26 ` [PATCH v4 " Toon Claes
2025-07-09 21:57 ` Junio C Hamano
2025-07-10 18:37 ` Junio C Hamano
2025-07-16 13:32 ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35 ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18 0:02 ` Taylor Blau
2025-07-19 6:44 ` Jeff King
2025-07-22 15:50 ` Toon Claes
2025-08-01 9:09 ` Christian Couder
2025-08-01 16:59 ` Junio C Hamano
2025-07-16 13:35 ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18 0:08 ` Taylor Blau
2025-07-22 15:52 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18 0:16 ` Taylor Blau
2025-07-22 16:02 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-17 22:37 ` Junio C Hamano
2025-07-18 17:36 ` Junio C Hamano
2025-07-22 16:06 ` Toon Claes
2025-07-16 13:42 ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39 ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35 ` Toon Claes
2025-07-30 17:59 ` Toon Claes
2025-07-31 7:45 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40 ` Junio C Hamano
2025-07-31 23:57 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34 ` Patrick Steinhardt
2025-08-05 16:21 ` Junio C Hamano
2025-08-05 16:34 ` Junio C Hamano
2025-08-05 16:55 ` Toon Claes
2025-08-05 17:20 ` Jean-Noël AVILA
2025-08-05 21:46 ` Junio C Hamano
2025-08-06 12:01 ` Toon Claes
2025-08-06 15:38 ` Junio C Hamano
2025-08-28 22:44 ` Junio C Hamano
2025-08-05 18:28 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05 9:33 ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05 9:33 ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55 ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-08-01 16:22 ` Toon Claes
2025-08-01 17:09 ` Junio C Hamano
2025-08-04 6:34 ` Patrick Steinhardt
2025-08-04 17:14 ` Junio C Hamano
2025-08-05 5:35 ` Toon Claes
2025-08-01 20:34 ` Jean-Noël AVILA
2025-08-05 5:36 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-08-01 10:18 ` Christian Couder
2025-08-01 10:22 ` Patrick Steinhardt
2025-08-01 17:06 ` Junio C Hamano
2025-08-02 8:18 ` Christian Couder
2025-08-02 11:31 ` Christian Couder
2025-08-02 13:38 ` Christian Couder
2025-08-02 16:26 ` Junio C Hamano
2025-08-04 6:35 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55 ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31 6:43 ` Patrick Steinhardt
2025-08-01 16:23 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-07-09 15:26 ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26 ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26 ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35 ` [PATCH v5 6/6] fixup! " Toon Claes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ldpw0yu3.fsf@iotcl.com \
--to=toon@iotcl.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).