From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Shubham Kanodia <shubham.kanodia10@gmail.com>,
	 Han Young <hanyang.tony@bytedance.com>,
	 Burke Libbey <burke.libbey@shopify.com>,
	git@vger.kernel.org
Subject: Re: [External] Re: git-blame extremely slow in partial clones due to serial object fetching
Date: Mon, 25 Nov 2024 09:22:20 +0900
Message-ID: <xmqq4j3w9n7n.fsf@gitster.g>
In-Reply-To: <20241122175536.510952-1-jonathantanmy@google.com> (Jonathan Tan's message of "Fri, 22 Nov 2024 09:55:35 -0800")

Jonathan Tan <jonathantanmy@google.com> writes:

> number of blobs fetched as the blame is being run. My biggest concern
> is that there is no good limit - I suspect that for a file that is
> extensively changed, 10 blobs is too few and you'll need something like
> 50 blobs. But 50 blobs means 50 RTTs, which also might be too much for
> an end user.

Depending on the project, the size of a typical change to a blob may
differ, so "10 commits that touched this blob" may touch 20% of
the contents in one project, but in another that tends to prefer
finer-grained commits, it may take 50 commits to make the same
amount of change.

I agree with you that there is no good default that fits all
projects.

Do 50 blobs have to mean 50 RTTs?  I wonder if there is a good way
to say "please give me all necessary tree and blob objects to
complete the blobs at path $F for the past 50 commits" to the lazy
fetch machinery and receive a single pack that contains all the
objects that are listed in "git rev-list --objects HEAD~50.. -- $F"?
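
As a rough illustration of the batching idea above (not an existing
git feature), the missing objects for a path can be listed up front
and requested in one fetch.  This is only a sketch under several
assumptions: the clone was made with --filter=blob:none so trees are
already local, the promisor remote is named "origin", the server
accepts wants by object ID, and history is at least as deep as the
requested range.  The helper names missing_oids and prefetch_path are
made up for this example.

```shell
#!/bin/sh
# Sketch only: prefetch, in one request, the blobs that blame on a path
# is likely to need, instead of one lazy fetch (one RTT) per blob.

# Keep only the object IDs that rev-list reported as missing ("?<oid>").
missing_oids () {
	sed -n 's/^?//p'
}

# prefetch_path <path> [<depth>]   (hypothetical helper, not a git command)
prefetch_path () {
	path=$1
	depth=${2:-50}

	# --missing=print lists absent objects instead of lazily fetching them.
	oids=$(git rev-list --objects --missing=print \
		"HEAD~$depth.." -- "$path" | missing_oids)

	# Nothing missing: do not run a fetch with an empty want list.
	[ -n "$oids" ] || return 0

	# One fetch for all wanted objects; assumes the remote is "origin".
	printf '%s\n' "$oids" |
	git -c fetch.negotiationAlgorithm=noop fetch origin \
		--no-tags --no-write-fetch-head --recurse-submodules=no \
		--filter=blob:none --stdin
}
```

Something like "prefetch_path $F 50" before running blame would then
cost one round trip for the range rather than one per blob.  Note that
a path-limited rev-list does not follow renames, so this does nothing
about the commit where $F first appears.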

I am not sure what should happen in the commit in that range where
the path $F first appears (meaning: the path did not exist, and its
contents came from a different path in the parent of that commit).
You'd need (a subset of) objects in "git rev-list --objects C^!" for
that commit to find out where it came from, but what subset should
we use?  Fully hydrating the trees of these commits at the rename
boundary would ensure you'd catch the same rename in a non-lazy
repository, but that is far more than what the user can
afford (otherwise, you wouldn't be using a narrow clone in the first
place).  So, I dunno.

