From: Junio C Hamano <gitster@pobox.com>
To: Kai Koponen <kaikoponen@google.com>
Cc: git@vger.kernel.org
Subject: Re: Perf bug: rev-list w/ 2+ paths relatively slow with commit-graph
Date: Mon, 23 Jun 2025 12:36:21 -0700
Message-ID: <xmqq8qli5jyi.fsf@gitster.g>
In-Reply-To: <CADYQcGqaMC=4jgbmnF9Q11oC11jfrqyvH8EuiRRHytpMXd4wYA@mail.gmail.com> (Kai Koponen's message of "Mon, 23 Jun 2025 13:58:03 -0400")

Kai Koponen <kaikoponen@google.com> writes:

> Reproduce steps:
> ```
> git clone https://github.com/golang/go.git
> cd go
> git config core.commitGraph true
> git commit-graph write --split --reachable --changed-paths  # Without this, all calls equally slow (~1s)
> time git rev-list -10 3730814f2f2bf24550920c39a16841583de2dac1 -- src/clean.bash > /dev/null  # ~90ms
> time git rev-list -10 3730814f2f2bf24550920c39a16841583de2dac1 -- src/Make.dist > /dev/null  # ~100ms
> time git rev-list -10 3730814f2f2bf24550920c39a16841583de2dac1 -- src/clean.bash src/Make.dist > /dev/null  # ~650ms
> ```
>
> The rev-list call with multiple paths takes over 3x longer than the
> sum of individual calls to it for the same files.
>
> Expectation: rev-list with multiple paths should take <= the sum of
> the time it takes to call it with each path individually (ideally <,
> since with the count limit it should be able to early-exit and search
> less commits for either path).
>
> Also reproduces without the -10 arg, or with a lower count (double
> instead of triple w/ -1), but these results are perhaps most
> surprising with a count present.

I asked 

    How does "git log -- path" use the changed-paths bloom filter
    stored in the commit-graph file?

to https://deepwiki.com/git/git (there is a text field at the bottom
of the page), and an early part of its answer explains the slowdown
in a fairly convincing way ;-)

    When you run git log -- path, Git first prepares to use bloom
    filters in the prepare_to_use_bloom_filter function. This function:

     1. Validates the pathspec - It calls forbid_bloom_filters to check
        if bloom filters can be used revision.c:674-686 . Bloom filters
        are disabled for wildcards, multiple paths, or complex pathspec
        magic.

     ...

In short, the changed-path filter is used only when the pathspec has
a single element that is not a wildcard.  With two paths the filter
is bypassed entirely, so the observed result is (unfortunately) quite
expected.
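
To make the rule concrete, here is a rough shell model of the decision
as the quoted answer describes it.  It is only a sketch and not the
code in revision.c: the helper names are made up for illustration, and
treating any leading ":" as disqualifying pathspec magic is a
simplifying assumption (the quoted answer only says "complex pathspec
magic").

```shell
#!/bin/sh
# Simplified model of the rule described above; NOT git's actual code.
# can_use_changed_path_filter and report are hypothetical helper names.
can_use_changed_path_filter() {
	# Zero or several pathspec elements: the filter is not consulted.
	[ "$#" -eq 1 ] || return 1
	case "$1" in
	:*) return 1 ;;        # assumed: any pathspec magic disqualifies
	*[*?[]*) return 1 ;;   # glob wildcard characters
	esac
	return 0
}

report() {
	if can_use_changed_path_filter "$@"; then
		echo "filter usable:   $*"
	else
		echo "filter bypassed: $*"
	fi
}

# The three invocations from the report, plus a wildcard pathspec.
report src/clean.bash
report src/Make.dist
report src/clean.bash src/Make.dist
report 'src/*.bash'
```

Under this model the two single-path runs can use the filter, while
the two-path run falls back to comparing trees for each commit it
visits, which matches the timings in the report.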

