git.vger.kernel.org archive mirror
* [PATCH] packfile: skip decompressing and hashing blobs in add_promisor_object()
@ 2025-12-04 17:21 Aaron Plattner
  2025-12-05 12:36 ` Patrick Steinhardt
  2025-12-05 17:48 ` Jeff King
  0 siblings, 2 replies; 10+ messages in thread
From: Aaron Plattner @ 2025-12-04 17:21 UTC (permalink / raw)
  To: git; +Cc: Aaron Plattner

When is_promisor_object() is called for the first time, it lazily
initializes a set of all promisor objects by iterating through all
objects in promisor packs. For each object, add_promisor_object() calls
parse_object(), which decompresses and hashes the entire object.

For repositories with large pack files, this can take an extremely long
time. For example, on a production repository with a 176 GB promisor
pack:

 $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
 ________________________________________________________
 Executed in   76.10 mins    fish           external
    usr time   72.10 mins    1.83 millis   72.10 mins
    sys time    3.56 mins    0.17 millis    3.56 mins

add_promisor_object() needs the full object for trees, commits, and
tags. But blobs contain no references to other objects, so the function
can just insert their oids into the set and move on.

For objects that weren't already parsed, use odb_read_object_info() to
query the object type. If it's a blob, just insert its oid into the
oidset without parsing it. On the repository above, this cuts the
wall-clock time from about 76 minutes to about 2 minutes:

 $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
 ________________________________________________________
 Executed in  118.76 secs    fish           external
    usr time   50.88 secs   11.02 millis   50.87 secs
    sys time   36.31 secs    0.08 millis   36.31 secs
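
As a rough illustration of the idea outside of Git's code base, the
sketch below models "query the type first, parse only when the object
can reference others". Everything prefixed with toy_ is hypothetical and
is not part of Git's API; the size field merely stands in for the work a
full decompress-and-hash pass would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for object types; not Git's real definitions. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	int id;
	enum toy_type type;
	size_t size; /* payload a full parse would have to read */
};

struct toy_stats {
	size_t inserted;     /* ids added to the (imaginary) set */
	size_t parsed;       /* objects that went through a full parse */
	size_t bytes_parsed; /* total payload read by full parses */
};

/* Cheap lookup, standing in for a header-only type query. */
static enum toy_type toy_read_type(const struct toy_object *obj)
{
	return obj->type;
}

/* Expensive path, standing in for decompressing and hashing. */
static void toy_parse(const struct toy_object *obj, struct toy_stats *st)
{
	st->parsed++;
	st->bytes_parsed += obj->size;
}

/* Record one object; blobs are recorded without being parsed. */
static void toy_add_promisor_object(const struct toy_object *obj,
				    struct toy_stats *st)
{
	if (toy_read_type(obj) == TOY_BLOB) {
		/* Blobs reference nothing, so the id alone is enough. */
		st->inserted++;
		return;
	}

	/* Commits, trees, and tags must be parsed to find references. */
	toy_parse(obj, st);
	st->inserted++;
}

/* Walk all objects and report how much work was actually done. */
static struct toy_stats toy_scan(const struct toy_object *objs, size_t n)
{
	struct toy_stats st = { 0, 0, 0 };
	size_t i;

	assert(objs != NULL || n == 0);
	for (i = 0; i < n; i++)
		toy_add_promisor_object(&objs[i], &st);
	return st;
}
```

Calling toy_scan() on an array holding one commit, one tree, and two
large blobs records all four ids but only charges the commit and tree
sizes to bytes_parsed, which is the effect the patch aims for.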

Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
---
 packfile.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/packfile.c b/packfile.c
index 9cc11b6dc5..563fd14f0e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2309,6 +2309,17 @@ static int add_promisor_object(const struct object_id *oid,
 	if (obj && obj->parsed) {
 		we_parsed_object = 0;
 	} else {
+		/*
+		 * Blobs don't reference other objects, so skip parsing them
+		 * to save time.
+		 */
+		enum object_type type;
+		type = odb_read_object_info(pack->repo->objects, oid, NULL);
+		if (type == OBJ_BLOB) {
+			oidset_insert(set, oid);
+			return 0;
+		}
+
 		we_parsed_object = 1;
 		obj = parse_object(pack->repo, oid);
 	}
-- 
2.52.0




Thread overview: 10+ messages
2025-12-04 17:21 [PATCH] packfile: skip decompressing and hashing blobs in add_promisor_object() Aaron Plattner
2025-12-05 12:36 ` Patrick Steinhardt
2025-12-05 16:55   ` Aaron Plattner
2025-12-05 17:59     ` Jeff King
2025-12-05 17:48 ` Jeff King
2025-12-05 18:01   ` Jeff King
2025-12-05 18:50     ` Aaron Plattner
2025-12-05 21:28       ` Jeff King
2025-12-05 21:56         ` Aaron Plattner
2025-12-06  1:58           ` Jeff King
