git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>, hanyang.tony@bytedance.com
Subject: [PATCH 2/3] index-pack: no blobs during outgoing link check
Date: Mon,  2 Dec 2024 12:18:39 -0800	[thread overview]
Message-ID: <300f53b8e39fa1dd55f65924d20f8abd22cbbfc9.1733170252.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1733170252.git.jonathantanmy@google.com>

As a follow-up to the parent of this commit, it was found that not
checking for the existence of blobs linked from trees sped up the fetch
from 24m47.815s to 2m2.127s. Teach Git to do that.

The benefit of doing this is as above (fetch speedup), but the drawback
is that if the packfile to be indexed references a local blob directly
(that is, not through a local tree), that local blob is in danger of
being garbage collected. Such a situation may arise if we push local
commits, including one with a change to a blob in the root tree,
and then the server incorporates them into its main branch through a
"rebase" or "squash" merge strategy, and then we fetch the new main
branch from the server.

This situation has not been observed yet - we have only noticed missing
commits, not missing trees or blobs. (In fact, if it were believed that
only missing commits are problematic, one could argue that we should
also exclude trees during the outgoing link check; but it is safer to
include them.)

Due to the rarity of the situation (it has not been observed to happen
in real life), and because the "penalty" in such a situation is merely
to refetch the missing blob when it's needed, the tradeoff seems
worth it.

(Blobs may also be linked from tag objects, but it is impossible to know
the type of an object linked from a tag object without looking it up in
the object database, so the code for that is untouched.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/index-pack.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8e7d14c17e..58d24540dc 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -830,8 +830,10 @@ static void do_record_outgoing_links(struct object *obj)
 			 * verified, so do not print any here.
 			 */
 			return;
-		while (tree_entry_gently(&desc, &entry))
-			record_outgoing_link(&entry.oid);
+		while (tree_entry_gently(&desc, &entry)) {
+			if (S_ISDIR(entry.mode))
+				record_outgoing_link(&entry.oid);
+		}
 	} else if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 		struct commit_list *parents = commit->parents;
-- 
2.47.0.338.g60cca15819-goog


  parent reply	other threads:[~2024-12-02 20:18 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-02 20:18 [PATCH 0/3] Performance improvements for repacking non-promisor objects Jonathan Tan
2024-12-02 20:18 ` [PATCH 1/3] index-pack: dedup first during outgoing link check Jonathan Tan
2024-12-02 21:24   ` Josh Steadmon
2024-12-02 20:18 ` Jonathan Tan [this message]
2024-12-03  6:00   ` [PATCH 2/3] index-pack: no blobs " Patrick Steinhardt
2024-12-03 21:40     ` Jonathan Tan
2024-12-03 22:16     ` Junio C Hamano
2024-12-02 20:18 ` [PATCH 3/3] index-pack: commit tree " Jonathan Tan
2024-12-03  3:10   ` Junio C Hamano
2024-12-03 21:42     ` Jonathan Tan
2024-12-04  0:21       ` Junio C Hamano
2024-12-09 20:29         ` Jonathan Tan
2024-12-09 23:51           ` Junio C Hamano
2024-12-02 21:25 ` [PATCH 0/3] Performance improvements for repacking non-promisor objects Josh Steadmon
2024-12-03  4:09   ` Junio C Hamano
2024-12-03  4:18 ` Junio C Hamano
2024-12-03  4:20   ` Junio C Hamano
2024-12-03  4:39     ` Junio C Hamano
2024-12-03 21:43 ` [PATCH v2 " Jonathan Tan
2024-12-03 21:43   ` [PATCH v2 1/3] index-pack --promisor: dedup before checking links Jonathan Tan
2024-12-03 21:43   ` [PATCH v2 2/3] index-pack --promisor: don't check blobs Jonathan Tan
2024-12-03 21:43   ` [PATCH v2 3/3] index-pack --promisor: also check commits' trees Jonathan Tan
2024-12-03 21:52 ` [PATCH v3 0/3] Performance improvements for repacking non-promisor objects Jonathan Tan
2024-12-03 21:52   ` [PATCH v3 1/3] index-pack --promisor: dedup before checking links Jonathan Tan
2024-12-04  4:36     ` Junio C Hamano
2024-12-03 21:52   ` [PATCH v3 2/3] index-pack --promisor: don't check blobs Jonathan Tan
2024-12-03 21:52   ` [PATCH v3 3/3] index-pack --promisor: also check commits' trees Jonathan Tan
2024-12-04  2:22   ` [PATCH v3 0/3] Performance improvements for repacking non-promisor objects Junio C Hamano
2024-12-04  4:46   ` [PATCH 4/3] index-pack: work around false positive use of uninitialized variable Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=300f53b8e39fa1dd55f65924d20f8abd22cbbfc9.1733170252.git.jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=hanyang.tony@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).