All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: jonathantanmy@google.com, me@ttaylorr.com, peff@peff.net,
	Derrick Stolee <dstolee@microsoft.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v2] connected.c: reprepare packs for corner cases
Date: Fri, 13 Mar 2020 21:11:55 +0000	[thread overview]
Message-ID: <pull.579.v2.git.1584133915654.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.579.git.1584027403779.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

While updating the microsoft/git fork on top of v2.26.0-rc0 and
consuming that build into Scalar, I noticed a corner case bug around
partial clone.

The "scalar clone" command can create a Git repository with the
proper config for using partial clone with the "blob:none" filter.
Instead of calling "git clone", it runs "git init" then sets a few
more config values before running "git fetch".

In our builds on v2.26.0-rc0, we noticed that our "git fetch"
command was failing with

  error: https://github.com/microsoft/scalar did not send all necessary objects

This does not happen if you copy the config file from a repository
created by "git clone --filter=blob:none <url>", but it does happen
when adding the config option "core.logAllRefUpdates = true".

By debugging, I was able to see that the loop inside
check_connnected() that checks if all refs are contained in
promisor packs actually did not have any packfiles in the packed_git
list.

I'm not sure what corner-case issues caused this config option to
prevent the reprepare_packed_git() from being called at the proper
spot during the fetch operation. This approach requires a situation
where we use the remote helper process, which makes it difficult to
test.

It is possible to place a reprepare_packed_git() call in the fetch code
closer to where we receive a pack, but that leaves an opening for a
later change to re-introduce this problem. Further, a concurrent repack
operation could replace the pack-file list we already loaded into
memory, causing this issue in an even harder to reproduce scenario.

It is really the responsibility of anyone looping through the list of
pack-files for a certain object to fall back to reprepare_packed_git()
on a fail-to-find. The loop in check_connected() does not have this
fallback, leading to this bug.

We _could_ try looping through the packs and only reprepare the packs
after a miss, but that change is more involved and has little value.
Since this case is isolated to the case when
opt->check_refs_are_promisor_objects_only is true, we are confident that
we are verifying the refs after downloading new data. This implies that
calling reprepare_packed_git() in advance is not a huge cost compared to
the rest of the operations already made.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Junio Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    connected.c: reprepare packs for corner cases
    
    I included how I found this (integrating v2.26.0-rc0 into Scalar), but I
    am able to reproduce it on my Linux machine using real fetches from
    github.com. I'm not sure why I was unable to reproduce the issue in test
    cases using the file:// URLs or the HTTP tests.
    
    Update in V2: I've updated the commit message to discuss the options
    presented on-list, but also provide why I'm keeping the code unchanged
    in light of that discussion.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-579%2Fderrickstolee%2Ffetch-reprepare-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-579/derrickstolee/fetch-reprepare-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/579

Range-diff vs v1:

 1:  cde7aa20ca8 ! 1:  696a51bd5a0 connected.c: reprepare packs for corner cases
     @@ -27,17 +27,31 @@
      
          I'm not sure what corner-case issues caused this config option to
          prevent the reprepare_packed_git() from being called at the proper
     -    spot during the fetch operation. Even worse, I have failed to create
     -    a test case to prevent a regression.
     +    spot during the fetch operation. This approach requires a situation
     +    where we use the remote helper process, which makes it difficult to
     +    test.
      
     -    Placing a reprepare_packed_git() call inside chck_connected() before
     -    looping through the packed_git list seems like the safest way to
     -    avoid this issue in the future. While reprepare_packed_git() does
     -    another scan of the pack directory, it is not terribly expensive as
     -    long as we do not run it in a loop. We check connectivity only a
     -    few times per command, so this will not have a meaningful performance
     -    impact. In exchange, we get extra safety around this check.
     +    It is possible to place a reprepare_packed_git() call in the fetch code
     +    closer to where we receive a pack, but that leaves an opening for a
     +    later change to re-introduce this problem. Further, a concurrent repack
     +    operation could replace the pack-file list we already loaded into
     +    memory, causing this issue in an even harder to reproduce scenario.
      
     +    It is really the responsibility of anyone looping through the list of
     +    pack-files for a certain object to fall back to reprepare_packed_git()
     +    on a fail-to-find. The loop in check_connected() does not have this
     +    fallback, leading to this bug.
     +
     +    We _could_ try looping through the packs and only reprepare the packs
     +    after a miss, but that change is more involved and has little value.
     +    Since this case is isolated to the case when
     +    opt->check_refs_are_promisor_objects_only is true, we are confident that
     +    we are verifying the refs after downloading new data. This implies that
     +    calling reprepare_packed_git() in advance is not a huge cost compared to
     +    the rest of the operations already made.
     +
     +    Helped-by: Jeff King <peff@peff.net>
     +    Helped-by: Junio Hamano <gitster@pobox.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       diff --git a/connected.c b/connected.c


 connected.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/connected.c b/connected.c
index 7e9bd1bc622..ac52b07b474 100644
--- a/connected.c
+++ b/connected.c
@@ -61,7 +61,11 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 		 * object is a promisor object. Instead, just make sure we
 		 * received, in a promisor packfile, the objects pointed to by
 		 * each wanted ref.
+		 *
+		 * Before checking for promisor packs, be sure we have the
+		 * latest pack-files loaded into memory.
 		 */
+		reprepare_packed_git(the_repository);
 		do {
 			struct packed_git *p;
 

base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
-- 
gitgitgadget

      parent reply	other threads:[~2020-03-13 21:12 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-12 15:36 [PATCH] connected.c: reprepare packs for corner cases Derrick Stolee via GitGitGadget
2020-03-12 16:39 ` Jonathan Tan
2020-03-12 17:34   ` Derrick Stolee
2020-03-12 20:42 ` Junio C Hamano
2020-03-12 21:16   ` Jeff King
2020-03-12 21:26     ` Jeff King
2020-03-13  0:54       ` Derrick Stolee
2020-03-13  1:14         ` Junio C Hamano
2020-03-13  2:30         ` Jeff King
2020-03-13  2:34           ` Jeff King
2020-03-13 12:43             ` Derrick Stolee
2020-03-13 21:11 ` Derrick Stolee via GitGitGadget [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.579.v2.git.1584133915654.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.