git.vger.kernel.org archive mirror
* git-last-modified weirdness
       [not found] <406222e6-d10b-47d8-a177-de5912db4512@codeberg.org>
@ 2026-01-04  5:13 ` Gusted
  2026-01-05 10:57   ` Toon Claes
  0 siblings, 1 reply; 3+ messages in thread
From: Gusted @ 2026-01-04  5:13 UTC (permalink / raw)
  To: git

Hi,

Resending this mail as it looks like it might not have arrived (couldn't find
it in the mailing list archive).

For Forgejo, I wanted to look into using git-last-modified to gain extra
performance for larger repositories, where this is often (one of) the slowest
Git operations. However, I noticed some problems that look to be bugs.

I ran all of the following commands on the following Git repository, on Git
v2.52.0 (Arch Linux), and my git config does not enable or disable any feature
that should have impacted any of the following observations.

$ tmp=$(mktemp -d)
$ git clone https://codeberg.org/forgejo/forgejo "$tmp"
$ cd "$tmp"

During some experiments I noticed it being slower for some files. An example:

$ hyperfine --warmup 5 'git log --max-count=1 DCO' 'git last-modified DCO'
Benchmark 1: git log --max-count=1 DCO
   Time (mean ± σ):      86.9 ms ±   0.8 ms    [User: 70.1 ms, System: 15.6 ms]
   Range (min … max):    85.5 ms …  88.3 ms    34 runs

Benchmark 2: git last-modified DCO
   Time (mean ± σ):     151.3 ms ±   4.3 ms    [User: 133.4 ms, System: 15.9 ms]
   Range (min … max):   145.4 ms … 167.1 ms    19 runs



This might be me misunderstanding the feature, but it looks to me like this
cannot be used for paths that are inside a directory. The following two
commands yield the same output:

$ git last-modified -- web_src
24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src
$ git last-modified -- web_src/svg
24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src

Where I expected the latter command to return the last commit of web_src/svg.



I'm not sure why I tried this, but I can trigger a BUG when giving it some
nonsense input:

$ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584
BUG: builtin/last-modified.c:456: paths remaining beyond boundary in last-modified
[1]    690163 IOT instruction (core dumped)  git last-modified

`fb06ce04173d47aaaa498385621cba8b8dfd7584` is the tree commit id of web_src. I
suppose this should've returned a nice error message or blank output. It does
give a blank output when you specify a valid path:

$ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584 web_src

Kind regards,
Gusted


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: git-last-modified weirdness
  2026-01-04  5:13 ` git-last-modified weirdness Gusted
@ 2026-01-05 10:57   ` Toon Claes
  2026-01-05 11:52     ` Gusted
  0 siblings, 1 reply; 3+ messages in thread
From: Toon Claes @ 2026-01-05 10:57 UTC (permalink / raw)
  To: Gusted, git

Gusted <gusted@codeberg.org> writes:

> Hi,
>
> Resending this mail as it looks like it might not have arrived (couldn't 
> find it in the mailing list archive).

Thanks for following up. I didn't see it yet.

> For Forgejo, I wanted to look into using git-last-modified to gain extra
> performance for larger repositories, where this is often (one of) the slowest
> Git operations. However, I noticed some problems that look to be bugs.
>
> I ran all of the following commands on the following Git repository, on Git
> v2.52.0 (Arch Linux), and my git config does not enable or disable any feature
> that should have impacted any of the following observations.
>
> $ tmp=$(mktemp -d)
> $ git clone https://codeberg.org/forgejo/forgejo "$tmp"
> $ cd "$tmp"
>
> During some experiments I noticed it being slower for some files. An 
> example:
>
> $ hyperfine --warmup 5 'git log --max-count=1 DCO' 'git last-modified DCO'
> Benchmark 1: git log --max-count=1 DCO
>    Time (mean ± σ):      86.9 ms ±   0.8 ms    [User: 70.1 ms, System: 15.6 ms]
>    Range (min … max):    85.5 ms …  88.3 ms    34 runs
>
> Benchmark 2: git last-modified DCO
>    Time (mean ± σ):     151.3 ms ±   4.3 ms    [User: 133.4 ms, System: 15.9 ms]
>    Range (min … max):   145.4 ms … 167.1 ms    19 runs

In my local benchmarks I see similar results.

I agree this isn't great, but git-log(1) is just very good at logging a
single path. git-last-modified(1) is mostly designed to give commits
for a bunch of paths. For example:

    $ hyperfine --warmup 5 'git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --' 'git last-modified'
    Benchmark 1: git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --
      Time (mean ± σ):     852.5 ms ±   9.2 ms    [User: 703.8 ms, System: 141.9 ms]
      Range (min … max):   841.9 ms … 869.4 ms    10 runs

    Benchmark 2: git last-modified
      Time (mean ± σ):     141.2 ms ±   2.0 ms    [User: 133.0 ms, System: 7.9 ms]
      Range (min … max):   137.7 ms … 146.0 ms    21 runs

    Summary
      git last-modified ran
        6.04 ± 0.11 times faster than git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --
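
A rough sketch of how a caller might act on these numbers. The wrapper name
`last_commits` and the cut-off `THRESHOLD` are assumptions made for
illustration (neither comes from Git, and a sensible cut-off would have to be
measured per repository). It runs git-log(1) once per path for a handful of
paths and a single git-last-modified(1) call otherwise. `GIT` exists only so
the command can be swapped out for a dry run.

```shell
# Hypothetical wrapper; THRESHOLD is an assumed cut-off, not a Git setting.
THRESHOLD=${THRESHOLD:-3}

last_commits() {
    if [ "$#" -gt 0 ] && [ "$#" -lt "$THRESHOLD" ]; then
        # Few paths: one git-log(1) call per path, printed as "<oid>TAB<path>".
        for p in "$@"; do
            printf '%s\t%s\n' \
                "$("${GIT:-git}" log --max-count=1 --format=%H -- "$p")" "$p"
        done
    else
        # Many paths (or none): one git-last-modified(1) call for all of them.
        # Assumes top-level paths; nested paths are discussed in this thread.
        "${GIT:-git}" last-modified -- "$@"
    fi
}
```

Running it as `GIT=echo last_commits a b c` prints the command line that
would be used instead of executing it.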

> This might be me misunderstanding the feature, but it looks to me like this
> cannot be used for paths that are inside a directory. The following two
> commands yield the same output:
>
> $ git last-modified -- web_src
> 24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src
> $ git last-modified -- web_src/svg
> 24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src
>
> Where I expected the latter command to return the last commit of 
> web_src/svg.

I agree this is confusing, and I plan to propose a change to this
behavior. But at the moment, this is what you're supposed to do in this
situation:

    $ git last-modified -- web_src
    28e0af23faf6c8e8f353ba2ae818ee0f83fd3e5c        web_src
    $ git last-modified -r --max-depth=0 -- web_src/svg
    b8f15e4ea09c6571872607874ae099269ea4b201        web_src/svg

I plan to change the default behavior to basically behave like `-r
--max-depth=0`. But I'm happy to hear your input if you think it should
be something else?
There's some context here[1], but as said, I might shift direction a bit
toward making the default more intuitive.

[1]: https://lore.kernel.org/git/20251126-toon-last-modified-zzzz-v1-0-608350df0caa@iotcl.com/
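
For scripting, the workaround above could be wrapped as follows. This is
only a sketch: `last_modified_exact` is a hypothetical name, and it assumes
the installed git accepts the `-r --max-depth=0` flags exactly as used in
the example above. `GIT` is there only to allow a dry run.

```shell
# Hypothetical helper: report the last commit for exactly the given path,
# using the flags shown earlier in this message (assumed to be supported).
last_modified_exact() {
    if [ "$#" -ne 1 ]; then
        echo "usage: last_modified_exact <path>" >&2
        return 2
    fi
    "${GIT:-git}" last-modified -r --max-depth=0 -- "$1"
}
```

For example, `last_modified_exact web_src/svg` issues the second command
from the example above.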

> I'm not sure why I tried this, but I can trigger a BUG when giving it some
> nonsense input:
>
> $ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584
> BUG: builtin/last-modified.c:456: paths remaining beyond boundary in last-modified
> [1]    690163 IOT instruction (core dumped)  git last-modified
>
> `fb06ce04173d47aaaa498385621cba8b8dfd7584` is the tree commit id of web_src. I
> suppose this should've returned a nice error message or blank output. It does
> give a blank output when you specify a valid path:
>
> $ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584 web_src
>

Hah, that sounds like a real bug. Thanks for reporting, I will look into
it.
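
Until that is addressed, a caller could guard against this kind of input
before invoking the command. A minimal sketch, assuming hypothetical helper
names `is_commit` and `safe_last_modified`; it only checks that the revision
peels to a commit and says nothing about how the fix in Git itself will look:

```shell
# Returns 0 only if $1 names a commit (annotated tags are peeled).
is_commit() {
    git rev-parse --verify --quiet "$1^{commit}" >/dev/null 2>&1
}

# Hypothetical guard: refuse non-commit revisions instead of passing them on.
safe_last_modified() {
    if [ "$#" -lt 1 ]; then
        echo "usage: safe_last_modified <revision> [<path>...]" >&2
        return 2
    fi
    rev=$1
    shift
    if ! is_commit "$rev"; then
        echo "error: '$rev' is not a commit" >&2
        return 1
    fi
    git last-modified "$rev" -- "$@"
}
```

With this guard, passing a tree ID such as the one in the report produces an
error message and a non-zero exit status rather than reaching
git-last-modified(1).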

> Kind regards,
> Gusted
>
>

-- 
Cheers,
Toon


* Re: git-last-modified weirdness
  2026-01-05 10:57   ` Toon Claes
@ 2026-01-05 11:52     ` Gusted
  0 siblings, 0 replies; 3+ messages in thread
From: Gusted @ 2026-01-05 11:52 UTC (permalink / raw)
  To: Toon Claes, git

On 1/5/26 11:57 AM, Toon Claes wrote:

 > Gusted <gusted@codeberg.org> writes:
 >
 >> Hi,
 >>
 >> Resending this mail as it looks like it might not have arrived (couldn't
 >> find it in the mailing list archive).
 > Thanks for following up. I didn't see it yet.
 >
 >> For Forgejo, I wanted to look into using git-last-modified to gain extra
 >> performance for larger repositories, where this is often (one of) the
 >> slowest Git operations. However, I noticed some problems that look to be
 >> bugs.
 >>
 >> I ran all of the following commands on the following Git repository, on
 >> Git v2.52.0 (Arch Linux), and my git config does not enable or disable any
 >> feature that should have impacted any of the following observations.
 >>
 >> $ tmp=$(mktemp -d)
 >> $ git clone https://codeberg.org/forgejo/forgejo "$tmp"
 >> $ cd "$tmp"
 >>
 >> During some experiments I noticed it being slower for some files. An
 >> example:
 >>
 >> $ hyperfine --warmup 5 'git log --max-count=1 DCO' 'git last-modified DCO'
 >> Benchmark 1: git log --max-count=1 DCO
 >>     Time (mean ± σ):      86.9 ms ±   0.8 ms    [User: 70.1 ms, System: 15.6 ms]
 >>     Range (min … max):    85.5 ms …  88.3 ms    34 runs
 >>
 >> Benchmark 2: git last-modified DCO
 >>     Time (mean ± σ):     151.3 ms ±   4.3 ms    [User: 133.4 ms, System: 15.9 ms]
 >>     Range (min … max):   145.4 ms … 167.1 ms    19 runs
 > In my local benchmarks I see similar results.
 >
 > I agree this isn't great, but git-log(1) is just very good at logging a
 > single path. git-last-modified(1) is mostly designed to give commits
 > for a bunch of paths. For example:
 >
 >      $ hyperfine --warmup 5 'git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --' 'git last-modified'
 >      Benchmark 1: git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --
 >        Time (mean ± σ):     852.5 ms ±   9.2 ms    [User: 703.8 ms, System: 141.9 ms]
 >        Range (min … max):   841.9 ms … 869.4 ms    10 runs
 >
 >      Benchmark 2: git last-modified
 >        Time (mean ± σ):     141.2 ms ±   2.0 ms    [User: 133.0 ms, System: 7.9 ms]
 >        Range (min … max):   137.7 ms … 146.0 ms    21 runs
 >
 >      Summary
 >        git last-modified ran
 >          6.04 ± 0.11 times faster than git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --

Only using git-last-modified when there are more than a few paths is
okay for how I want to use it. I was not really able to deduce this
from the manual; after reading the GitHub blog, the GitLab blog, and
the v2.52.0 release notes, my general impression was that it would be
a good replacement for `git log -n1` in all cases.

 >> This might be me misunderstanding the feature, but it looks to me like
 >> this cannot be used for paths that are inside a directory. The following
 >> two commands yield the same output:
 >>
 >> $ git last-modified -- web_src
 >> 24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src
 >> $ git last-modified -- web_src/svg
 >> 24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src
 >>
 >> Where I expected the latter command to return the last commit of
 >> web_src/svg.
 > I agree this is confusing. And I plan to propose a change to this
 > behavior. But at the moment what you're supposed to do in this
 > situation:
 >
 >      $ git last-modified -- web_src
 >      28e0af23faf6c8e8f353ba2ae818ee0f83fd3e5c        web_src
 >      $ git last-modified -r --max-depth=0 -- web_src/svg
 >      b8f15e4ea09c6571872607874ae099269ea4b201        web_src/svg
 >
 > I plan to change the default behavior to basically behave like `-r
 > --max-depth=0`. But I'm happy to hear your input if you think it should
 > be something else?
 > There's some context here[1], but as said, I might shift direction a bit
 > toward making the default more intuitive.
 >
 > [1]: https://lore.kernel.org/git/20251126-toon-last-modified-zzzz-v1-0-608350df0caa@iotcl.com/

Oh, there's a whole new option! That's exactly what I was looking for
to get that behavior. Only returning the root level information by
default looks and feels silly and does remind me of git-diff-tree's
default, so I would agree on having -r --max-depth=0 as the default.
Returning the information exactly for the paths being given sounds most
reasonable.

Although, given that you mention this command works best for multiple
paths, I can also imagine -r --max-depth=1 as the default to nudge
people to use it for that purpose.

 >> I'm not sure why I tried this, but I can trigger a BUG when giving it some
 >> nonsense input:
 >>
 >> $ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584
 >> BUG: builtin/last-modified.c:456: paths remaining beyond boundary in last-modified
 >> [1]    690163 IOT instruction (core dumped)  git last-modified
 >>
 >> `fb06ce04173d47aaaa498385621cba8b8dfd7584` is the tree commit id of
 >> web_src. I suppose this should've returned a nice error message or blank
 >> output. It does give a blank output when you specify a valid path:
 >>
 >> $ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584 web_src
 >>
 > Hah, that sounds like a real bug. Thanks for reporting, I will look into
 > it.
 >
 >> Kind regards,
 >> Gusted
 >>
 >>

Kind Regards
Gusted
