git.vger.kernel.org archive mirror
* git-blame extremely slow in partial clones due to serial object fetching
@ 2024-11-19 20:16 Burke Libbey
  2024-11-20 11:59 ` Manoraj K
  2024-11-20 18:52 ` Jonathan Tan
  0 siblings, 2 replies; 10+ messages in thread
From: Burke Libbey @ 2024-11-19 20:16 UTC
  To: git

When running git-blame in a partial clone (--filter=blob:none), it fetches
missing blob objects one at a time. This can result in thousands of serial fetch
operations, making blame extremely slow, regardless of network latency.

For example, in one large repository, blaming a single large file required
fetching about 6500 objects. Because each fetch requires its own round-trip,
the operation would have taken on the order of an hour to complete.
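
As a back-of-the-envelope check on those numbers (the 6500 fetches and the
roughly one-hour total are the figures above; the function names are only
illustrative), the implied cost per serial fetch can be worked out like this:

```python
def per_fetch_seconds(total_seconds: float, num_fetches: int) -> float:
    """Average wall-clock time each serial fetch must take to reach the total."""
    return total_seconds / num_fetches


def serial_total_seconds(num_fetches: int, seconds_per_fetch: float) -> float:
    """Total time when fetches run strictly one after another."""
    return num_fetches * seconds_per_fetch


# About an hour spread over about 6500 fetches is roughly 0.55 s per fetch.
implied = per_fetch_seconds(3600, 6500)
print(f"{implied:.2f} s per fetch")
```

So even a fairly modest per-fetch cost (connection setup, negotiation,
spawning a child process) adds up to about an hour once it is paid 6500 times
in sequence.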

The core issue appears to be in fill_origin_blob(), which is called
individually for each blob needed during the blame process. While the blame
algorithm does need blob contents to make detailed line-matching decisions,
it seems like we don't necessarily need the contents just to determine which
blobs we'll examine.

It seems like this could be optimized by batch-fetching the needed objects
upfront, rather than fetching them one at a time. This would convert O(n)
round-trips into a small number of batch fetches.
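
To illustrate the idea from outside of git, here is a rough sketch that
collects the blob IDs a path has had over its history and requests them in a
single fetch before running blame. This is a workaround script, not a proposal
for how blame itself should do it. Assumptions: the helper names are made up,
renames are not followed, the remote is called "origin", and the server
accepts fetching objects by ID. The fetch invocation is modelled on what I
understand git's own lazy fetch to spawn, so treat the exact option set as an
assumption too.

```python
import subprocess
from typing import List


def parse_raw_blob_ids(raw_log: str) -> List[str]:
    """Collect unique blob IDs from `git log --raw --no-abbrev` output.

    Raw diff lines look like ":<oldmode> <newmode> <old-id> <new-id> <status>",
    followed by a tab and the path.
    """
    seen = set()
    ordered = []
    for line in raw_log.splitlines():
        if not line.startswith(":"):
            continue
        fields = line.split("\t", 1)[0].split()
        if len(fields) < 5:
            continue
        for oid in fields[2:4]:
            if set(oid) == {"0"}:  # all-zero ID: no blob on that side (add/delete)
                continue
            if oid not in seen:
                seen.add(oid)
                ordered.append(oid)
    return ordered


def list_history_blob_ids(path: str, rev: str = "HEAD", repo: str = ".") -> List[str]:
    """List the blob IDs `path` has had in history, using only commits and trees."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--raw", "--no-abbrev", "--no-renames",
         "--format=", rev, "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_raw_blob_ids(out)


def build_fetch_command(remote: str = "origin", repo: str = ".") -> List[str]:
    """Command for one batched fetch; object IDs are supplied on stdin."""
    return ["git", "-C", repo, "-c", "fetch.negotiationAlgorithm=noop",
            "fetch", remote, "--no-tags", "--no-write-fetch-head",
            "--recurse-submodules=no", "--filter=blob:none", "--stdin"]


def prefetch_blobs(oids: List[str], remote: str = "origin", repo: str = ".") -> None:
    """Request all listed objects in a single fetch instead of one fetch each."""
    if not oids:
        return
    subprocess.run(build_fetch_command(remote, repo),
                   input="\n".join(oids) + "\n", text=True, check=True)
```

Usage would be something like
prefetch_blobs(list_history_blob_ids("path/to/file")) followed by a normal
git blame. Blobs that are already present locally get requested again, which
is wasteful but should be harmless.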

Reproduction:
1. Create a partial clone with --filter=blob:none
2. Run git blame on a file with significant history
3. Observe serial fetching of objects in the trace output
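
The steps above can be scripted roughly as follows. This is only a sketch:
the repository URL, directory, and file path are placeholders, the helper
names are made up, and the trace matching is deliberately loose (it counts
GIT_TRACE lines that mention a spawned "fetch" command) because I have not
pinned down the exact trace format.

```python
import os
import subprocess
from typing import List


def clone_command(url: str, dest: str) -> List[str]:
    """Step 1: a blobless partial clone."""
    return ["git", "clone", "--filter=blob:none", url, dest]


def blame_with_trace(repo_dir: str, path: str) -> str:
    """Step 2: run blame with GIT_TRACE=1 and return the trace from stderr."""
    env = dict(os.environ, GIT_TRACE="1")
    proc = subprocess.run(
        ["git", "-C", repo_dir, "blame", path],
        env=env, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True,
    )
    return proc.stderr


def count_fetch_invocations(trace_text: str) -> int:
    """Step 3: count trace lines that record a spawned fetch command."""
    return sum(
        1
        for line in trace_text.splitlines()
        if "run_command:" in line and " fetch " in line
    )
```

With a clone made via clone_command(url, dest), the count returned by
count_fetch_invocations(blame_with_trace(dest, path)) should be roughly the
number of blobs blame had to fetch.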

Let me know if you need any additional information to investigate this issue.

—burke



Thread overview: 10+ messages
2024-11-19 20:16 git-blame extremely slow in partial clones due to serial object fetching Burke Libbey
2024-11-20 11:59 ` Manoraj K
2024-11-20 18:52 ` Jonathan Tan
2024-11-20 22:55   ` Junio C Hamano
2024-11-21  3:12     ` [External] " Han Young
2024-11-22  3:32       ` Shubham Kanodia
2024-11-22  8:29         ` Junio C Hamano
2024-11-22  8:51           ` Shubham Kanodia
2024-11-22 17:55             ` Jonathan Tan
2024-11-25  0:22               ` Junio C Hamano
