git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, gitster@pobox.com,
	johannes.schindelin@gmx.de, peff@peff.net, johncai86@gmail.com,
	newren@gmail.com, christian.couder@gmail.com,
	kristofferhaugsbakk@fastmail.com, jonathantanmy@google.com,
	karthik nayak <karthik.188@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 0/6] PATH WALK I: The path-walk API
Date: Mon, 25 Nov 2024 09:56:09 +0100
Message-ID: <Z0Q7oGF6Q5U-f4VX@pks.im>
In-Reply-To: <Zz+61fat+vGgb+xL@nand.local>

On Thu, Nov 21, 2024 at 05:57:25PM -0500, Taylor Blau wrote:
> On Sat, Nov 09, 2024 at 07:41:06PM +0000, Derrick Stolee via GitGitGadget wrote:
> > Derrick Stolee (6):
> >   path-walk: introduce an object walk by path
> >   test-lib-functions: add test_cmp_sorted
> >   t6601: add helper for testing path-walk API
> >   path-walk: allow consumer to specify object types
> >   path-walk: visit tags and cached objects
> >   path-walk: mark trees and blobs as UNINTERESTING
> 
> My apologies for taking so long to review this. Having read through the
> patches in detail, a couple of thoughts:
> 
>   - First, I like the structure that you decided on for this series. It
>     nicely demonstrates a minimal caller for this new API instead of
>     implementing a bunch of untested code. I think that's a great way to
>     lay out things up until this point.
> 
>   - Second, I read through the existing API and only had minor comments.
>     I read through the implementation in detail and found it to match my
>     expectation of how each step should function.
> 
> So my take-away from spending a few hours with this series is that
> everything seems on track so far, and I think this is in a good spot to
> build on for more path-walk features.
> 
> That all said, I am still not totally sold on the idea that we need a
> separate path-based traversal given the significant benefits of the
> full-name hash approach that I reviewed earlier today.

The repo size reductions achieved via the path-walk API were only one of
the selling points of this series. And from my current understanding we
will likely not end up realizing those gains via path-walk, but rather
via the much simpler full-name hash algorithm.

But there were two more selling points:

  - git-survey(1) as a native replacement for git-sizer(1). I think it
    is a great idea to have a native tool that allows us to gain deep
    insights into a repository so that we get better signals from our
    users in case they face problems with their repository. I'd love to
    have this tool as a baseline for an extensible format where we can
    eventually also start reporting the health state of refs as well as
    any auxiliary data structures.

  - git-backfill(1) as a helper to fetch blobs more efficiently from a
    promisor remote. This is a boon to have as well in our odyssey
    towards a better UI/UX with huge monorepos.

Both of these tools are quite exciting to me, and from my point of view
there is a real need for them.

The question of course is whether these tools require the path-walk API,
or whether they could be built on top of existing functionality. But if
there are good reasons why the existing functionality is insufficient
then I'd be all for having the path-walk API, even if it doesn't help us
with repo size reductions as we initially thought.

Patrick
