From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Robin H. Johnson" <robbat2@gentoo.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Feature request: secondary index by path fragment
Date: Tue, 7 May 2024 06:25:08 +0200
Message-ID: <ZjmtJFF7rv7B8Nhj@tanuki>
In-Reply-To: <xmqqttjawmos.fsf@gitster.g>

On Mon, May 06, 2024 at 04:22:11PM -0700, Junio C Hamano wrote:
> "Robin H. Johnson" <robbat2@gentoo.org> writes:
> 
> > Gentoo has some tooling that boils down to repeated runs of 'git log -- somepath/'
> > via cgit as well as other shell tooling.
> > ...
> > I was wondering if Git could gain a secondary index of commits, based on
> > path prefixes, that would speed up the 'git log' run.
> 
> Perhaps the bloom filters are good fit for the use case?

Yes, Bloom filters are the first thing that pops into my mind here as
they are designed to solve exactly this problem. So if you rewrite your
commit graphs with `git commit-graph write --changed-paths --reachable`
you should hopefully see a significant speedup.
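
For illustration, here is a minimal sketch of that workflow in a
throwaway repository. The directory names, commit messages, and identity
are made up for the example; only the `git commit-graph write`
invocation is the one mentioned above. A repository this small shows no
measurable speedup, so the sketch only demonstrates that the
path-limited log returns the same commits once the filters exist.

```shell
#!/bin/sh
set -e

# Throwaway repository so the commands can be tried safely.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"            # hypothetical identity
git config user.email "example@example.com"

# Two commits touch somepath/, one touches otherpath/.
mkdir somepath otherpath
echo one > somepath/a
git add . && git commit -q -m "somepath: first"
echo two > otherpath/b
git add . && git commit -q -m "otherpath: only"
echo three > somepath/a
git add . && git commit -q -m "somepath: second"

# Write the commit-graph, including changed-path Bloom filters,
# for all commits reachable from the refs.
git commit-graph write --changed-paths --reachable

# Path-limited history walk, as in the original report.
matches=$(git log --oneline -- somepath/ | wc -l | tr -d ' ')
echo "commits touching somepath/: $matches"
```

On a real repository, timing the same `git log -- somepath/` before and
after the write is the simplest way to see whether the filters help.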

It does surface some usability issues though:

  - There is no easy way to enable the computation of Bloom filters via
    configuration, to the best of my knowledge.

  - How would a non-Git-expert know that this option exists?

It makes me wonder whether we can maybe enable generation of Bloom
filters by default. The biggest downside is of course that writing
commit graphs becomes slower. But that should happen in the background
for normal users anyway, and most forges probably hand-roll maintenance
and thus wouldn't care.

Is there anything else I'm missing that explains why those are not
written by default?

Patrick

Thread overview: 4+ messages
2024-05-06 23:11 Feature request: secondary index by path fragment Robin H. Johnson
2024-05-06 23:22 ` Junio C Hamano
2024-05-07  4:25   ` Patrick Steinhardt [this message]
2024-05-07  5:38     ` Robin H. Johnson