From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>, ps@pks.im, gitster@pobox.com
Subject: [PATCH v3 0/3] Performance improvements for repacking non-promisor objects
Date: Tue, 3 Dec 2024 13:52:53 -0800 [thread overview]
Message-ID: <cover.1733262661.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1733170252.git.jonathantanmy@google.com>
Apparently I did not save in my text editor (and didn't notice because
the code comment was still valid syntactically, so everything still
compiled). Here's a version with the updated and correctly formatted
code comment.
Jonathan Tan (3):
index-pack --promisor: dedup before checking links
index-pack --promisor: don't check blobs
index-pack --promisor: also check commits' trees
builtin/index-pack.c | 103 +++++++++++++++++++++++++++++--------------
1 file changed, 71 insertions(+), 32 deletions(-)
Range-diff against v2:
1: 7ae21c921f = 1: 7ae21c921f index-pack --promisor: dedup before checking links
2: 5a63c9a5ca ! 2: a1d2a20203 index-pack --promisor: don't check blobs
@@ builtin/index-pack.c: static void record_outgoing_link(const struct object_id *o
+static void maybe_record_name_entry(const struct name_entry *entry)
+{
+ /*
-+ * The benefit of doing this is as above (fetch speedup), but the drawback
-+is that if the packfile to be indexed references a local blob directly
-+(that is, not through a local tree), that local blob is in danger of
-+being garbage collected. Such a situation may arise if we push local
-+commits, including one with a change to a blob in the root tree,
-+and then the server incorporates them into its main branch through a
-+"rebase" or "squash" merge strategy, and then we fetch the new main
-+branch from the server.
-+
-+This situation has not been observed yet - we have only noticed missing
-+commits, not missing trees or blobs. (In fact, if it were believed that
-+only missing commits are problematic, one could argue that we should
-+also exclude trees during the outgoing link check; but it is safer to
-+include them.)
-+
-+Due to the rarity of the situation (it has not been observed to happen
-+in real life), and because the "penalty" in such a situation is merely
-+to refetch the missing blob when it's needed, the tradeoff seems
-+worth it.
++ * Checking only trees here results in a significantly faster packfile
++ * indexing, but the drawback is that if the packfile to be indexed
++ * references a local blob only directly (that is, never through a
++ * local tree), that local blob is in danger of being garbage
++ * collected. Such a situation may arise if we push local commits,
++ * including one with a change to a blob in the root tree, and then the
++ * server incorporates them into its main branch through a "rebase" or
++ * "squash" merge strategy, and then we fetch the new main branch from
++ * the server.
++ *
++ * This situation has not been observed yet - we have only noticed
++ * missing commits, not missing trees or blobs. (In fact, if it were
++ * believed that only missing commits are problematic, one could argue
++ * that we should also exclude trees during the outgoing link check;
++ * but it is safer to include them.)
++ *
++ * Due to the rarity of the situation (it has not been observed to
++ * happen in real life), and because the "penalty" in such a situation
++ * is merely to refetch the missing blob when it's needed (and this
++ * happens only once - when refetched, the blob goes into a promisor
++ * pack, so it won't be GC-ed, the tradeoff seems worth it.
+ */
+ if (S_ISDIR(entry->mode))
+ record_outgoing_link(&entry->oid);
3: 8139325bf2 = 3: f9f9969a8f index-pack --promisor: also check commits' trees
--
2.47.0.338.g60cca15819-goog
next prev parent reply other threads:[~2024-12-03 21:53 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-02 20:18 [PATCH 0/3] Performance improvements for repacking non-promisor objects Jonathan Tan
2024-12-02 20:18 ` [PATCH 1/3] index-pack: dedup first during outgoing link check Jonathan Tan
2024-12-02 21:24 ` Josh Steadmon
2024-12-02 20:18 ` [PATCH 2/3] index-pack: no blobs " Jonathan Tan
2024-12-03 6:00 ` Patrick Steinhardt
2024-12-03 21:40 ` Jonathan Tan
2024-12-03 22:16 ` Junio C Hamano
2024-12-02 20:18 ` [PATCH 3/3] index-pack: commit tree " Jonathan Tan
2024-12-03 3:10 ` Junio C Hamano
2024-12-03 21:42 ` Jonathan Tan
2024-12-04 0:21 ` Junio C Hamano
2024-12-09 20:29 ` Jonathan Tan
2024-12-09 23:51 ` Junio C Hamano
2024-12-02 21:25 ` [PATCH 0/3] Performance improvements for repacking non-promisor objects Josh Steadmon
2024-12-03 4:09 ` Junio C Hamano
2024-12-03 4:18 ` Junio C Hamano
2024-12-03 4:20 ` Junio C Hamano
2024-12-03 4:39 ` Junio C Hamano
2024-12-03 21:43 ` [PATCH v2 " Jonathan Tan
2024-12-03 21:43 ` [PATCH v2 1/3] index-pack --promisor: dedup before checking links Jonathan Tan
2024-12-03 21:43 ` [PATCH v2 2/3] index-pack --promisor: don't check blobs Jonathan Tan
2024-12-03 21:43 ` [PATCH v2 3/3] index-pack --promisor: also check commits' trees Jonathan Tan
2024-12-03 21:52 ` Jonathan Tan [this message]
2024-12-03 21:52 ` [PATCH v3 1/3] index-pack --promisor: dedup before checking links Jonathan Tan
2024-12-04 4:36 ` Junio C Hamano
2024-12-03 21:52 ` [PATCH v3 2/3] index-pack --promisor: don't check blobs Jonathan Tan
2024-12-03 21:52 ` [PATCH v3 3/3] index-pack --promisor: also check commits' trees Jonathan Tan
2024-12-04 2:22 ` [PATCH v3 0/3] Performance improvements for repacking non-promisor objects Junio C Hamano
2024-12-04 4:46 ` [PATCH 4/3] index-pack: work around false positive use of uninitialized variable Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1733262661.git.jonathantanmy@google.com \
--to=jonathantanmy@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.