git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] packfile: avoid access(3p) calls for missing packs
@ 2025-05-16  8:55 Patrick Steinhardt
  2025-05-16 18:34 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-16  8:55 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau

The function `add_packed_git()` adds a packfile to the known list of
packs in case it exists. In case the packfile doesn't look like a pack
though, or in case it doesn't exist, the function will simply return a
`NULL` pointer.

The ordering in which those checks are done is somewhat curious though:
we first perform a couple of access(3p) syscalls for auxiliary files
like ".keep" before we determine whether the ".pack" file itself exists.
And if we see that the ".pack" file does not exist, we bail out and thus
effectively discard all the information. This means that the access(3p)
calls were a complete waste of time.

The reason why we do this is likely because we reuse `p->pack_name` to
derive the other files' names, as well, so that we only have to allocate
this buffer once. As such, we have to compute the packfile name last so
that it doesn't get overwritten by the other names.

All of this likely doesn't matter in the general case: Git shouldn't end
up looking too many nonexistent packfiles anyway. But there are edge
cases where it _does_ matter. One such edge case that we have hit in our
production systems was a stale "multi-pack-index" file that contained
references to nonexistent packfiles. As there is no negative lookup
cache this causes us to repeatedly look up the same packfiles via the
multi-pack index, only to realize that they don't exist. This translated
into hundreds of thousands of syscalls to access(3p) and stat(3p).

While the issue was entirely self-made because the multi-pack index
should have been regenerated, we can still reduce the number of syscalls
by 75% in the case of nonexistent packfiles by reordering these calls.
This requires us to restore the final packfile name after the access(3p)
calls. But given that this is a mere memory copy it is very unlikely to
matter in comparison to the four syscalls we performed beforehand.

Another mitigation would be to introduce a negative lookup cache so that
we don't try to add missing packfiles repeatedly. But that refactoring
would be a bit more involved for dubious gains, so for now the author
has decided against it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi,

this patch addresses an issue we have recently seen in our production
systems due to a stale MIDX. The MIDX contained entries for packfiles
that didn't exist anymore, which caused Git to repeatedly look up those
packfiles. Each missing packfile resulted in four repeated syscalls:
three access(3p) calls to check for supporting data structures, and one
call to stat(3p) to check for the packfile itself. The first three calls
are essentially wasted though when the stat(3p) call itself fails, which
is being fixed by this patch.

I doubt that the patch matters in almost any repository, but given that
the refactoring is trivial I thought to submit the patch regardless of
that. Another step would be to introduce a negative lookup cache -- but
that would be a bit more involved, so I decided against it for now as I
don't want to introduce complexity for dubious gains.

Thanks!

Patrick
---
 packfile.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/packfile.c b/packfile.c
index d91016f1c7f..870d48bd949 100644
--- a/packfile.c
+++ b/packfile.c
@@ -737,6 +737,12 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	p = alloc_packed_git(r, alloc);
 	memcpy(p->pack_name, path, path_len);
 
+	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
+	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
+		free(p);
+		return NULL;
+	}
+
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
 	if (!access(p->pack_name, F_OK))
 		p->pack_keep = 1;
@@ -749,11 +755,8 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	if (!access(p->pack_name, F_OK))
 		p->is_cruft = 1;
 
+	/* Restore the final packfile name. */
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
-	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
-		free(p);
-		return NULL;
-	}
 
 	/* ok, it looks sane as far as we can check without
 	 * actually mapping the pack file.

---
base-commit: 1a8a4971cc6c179c4dd711f4a7f5d7178f4b3ab7
change-id: 20250516-pks-pack-avoid-stats-on-missing-8e3b75755cf0


^ permalink raw reply related	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2025-08-28 23:25 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-16  8:55 [PATCH] packfile: avoid access(3p) calls for missing packs Patrick Steinhardt
2025-05-16 18:34 ` Junio C Hamano
2025-05-19  6:52   ` Jeff King
2025-05-19 15:46     ` Junio C Hamano
2025-05-20  6:45     ` Patrick Steinhardt
2025-05-22  5:28       ` Jeff King
2025-05-23  1:02     ` Taylor Blau
2025-05-23  2:03       ` Jeff King
2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
2025-05-20  9:53   ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-23  1:03     ` Taylor Blau
2025-05-20  9:53   ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-22  5:32     ` Jeff King
2025-05-22 15:47       ` Junio C Hamano
2025-05-22 16:59         ` Jeff King
2025-05-22 18:44           ` Junio C Hamano
2025-05-23  1:22           ` Taylor Blau
2025-05-23  2:08             ` Jeff King
2025-05-23 17:46               ` Taylor Blau
2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-25 18:41                   ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:00                       ` Taylor Blau
2025-05-25 18:41                   ` [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:08                       ` Taylor Blau
2025-05-25 18:41                   ` [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx() Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:15                       ` Taylor Blau
2025-05-25 18:42                   ` [PATCH 4/5] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-25 18:42                   ` [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-26  7:24                     ` Patrick Steinhardt
2025-05-28  2:18                       ` Taylor Blau
2025-05-28 11:53                         ` Patrick Steinhardt
2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
2025-05-29 20:47                       ` Junio C Hamano
2025-06-03 22:22                         ` Taylor Blau
2025-05-29 20:51                       ` Junio C Hamano
2025-06-03 22:23                         ` Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 2/4] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 3/4] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-30  6:50                       ` Jeff King
2025-06-03 22:27                         ` Taylor Blau
2025-08-28 23:25                           ` Junio C Hamano
2025-05-23  1:31       ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Taylor Blau
2025-05-23  2:18         ` Jeff King
2025-05-21 13:24   ` [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs Junio C Hamano
2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
2025-05-28 12:24   ` [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-28 12:24   ` [PATCH v3 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-30  6:27   ` [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).