All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	steadmon@google.com, me@ttaylorr.com,  hanxin.hx@bytedance.com
Subject: [PATCH v2 0/2] When fetching, die if in commit graph but not obj db
Date: Thu, 31 Oct 2024 14:18:59 -0700	[thread overview]
Message-ID: <cover.1730409376.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1730235646.git.jonathantanmy@google.com>

Thanks everyone for your review comments. I've updated the patch 1
commit message as Josh requested. I'll reply individually to comments on
patch 2.

Jonathan Tan (2):
  Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
  fetch-pack: warn if in commit graph but not obj db

 fetch-pack.c                               | 42 +++++++++++-----------
 t/t5330-no-lazy-fetch-with-commit-graph.sh |  2 +-
 2 files changed, 23 insertions(+), 21 deletions(-)

Range-diff against v1:
1:  4dea8933cf ! 1:  34e87b8388 Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
    @@ Commit message
     
         This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb.
     
    +    This revert simplifies the next patch in this patch set.
    +
         The commit message of that commit mentions that the new function "will
         be used for the bundle-uri client in a subsequent commit", but it seems
         that eventually it wasn't used.
2:  1027ff2cb7 ! 2:  631b9a8677 fetch-pack: warn if in commit graph but not obj db
    @@ Commit message
         action, because the object is present in the commit graph file but not in
         the object DB.
     
    -    Therefore, detect when this occurs and print a warning. (Note that
    -    we cannot proceed to include this object in the list of objects to
    -    be fetched without changing at least the fetch negotiation code:
    -    what would happen is that the client will send "want X" and "have X"
    -    and when I tested at $DAYJOB with a work server that uses JGit, the
    -    server reasonably returned an empty packfile. And changing the fetch
    -    negotiation code to only use the object DB when deciding what to report
    -    as "have" would be an unnecessary slowdown, I think.)
    +    Therefore, make it a fatal error when this occurs. (Note that we cannot
    +    proceed to include this object in the list of objects to be fetched
    +    without changing at least the fetch negotiation code: what would happen
    +    is that the client will send "want X" and "have X" and when I tested
    +    at $DAYJOB with a work server that uses JGit, the server reasonably
    +    returned an empty packfile. And changing the fetch negotiation code to
    +    only use the object DB when deciding what to report as "have" would be
    +    an unnecessary slowdown, I think.)
     
         This was discovered when a lazy fetch of a missing commit completed with
         nothing actually fetched, and the writing of the commit graph file after
         every fetch then attempted to read said missing commit, triggering a
         lazy fetch of said missing commit, resulting in an infinite loop with no
         user-visible indication (until they check the list of processes running
    -    on their computer). With this fix, at least a warning message will be
    -    printed. Note that although the repo corruption we discovered was caused
    -    by a bug in GC in a partial clone, the behavior that this patch teaches
    -    Git to warn about applies to any repo with commit graph enabled and with
    -    a missing commit, whether it is a partial clone or not.
    +    on their computer). With this fix, there is no infinite loop. Note that
    +    although the repo corruption we discovered was caused by a bug in GC in
    +    a partial clone, the behavior that this patch teaches Git to warn about
    +    applies to any repo with commit graph enabled and with a missing commit,
    +    whether it is a partial clone or not.
    +
    +    t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
    +    lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
    +    fetch and the commit graph does not cause an infinite loop. This patch
    +    changes the exit code in that situation, so that test had to be changed.
     
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## fetch-pack.c ##
    -@@ fetch-pack.c: static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
    - #define ALTERNATE	(1U << 1)
    - #define COMMON		(1U << 6)
    - #define REACH_SCRATCH	(1U << 7)
    -+#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)
    - 
    - /*
    -  * After sending this many "have"s if we do not get any new ACK , we
     @@ fetch-pack.c: static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
    + 		cb(negotiator, cache.items[i]);
      }
      
    ++static void die_in_commit_graph_only(const struct object_id *oid)
    ++{
    ++	die(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database.\n"
    ++	      "This is probably due to repo corruption.\n"
    ++	      "If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."),
    ++	      oid_to_hex(oid));
    ++}
    ++
      static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
     -					       int mark_tags_complete)
    -+					       int mark_additional_complete_information)
    ++					       int mark_tags_complete_and_check_obj_db)
      {
      	enum object_type type;
      	struct object_info info = { .typep = &type };
    @@ fetch-pack.c: static void for_each_cached_alternate(struct fetch_negotiator *neg
      	commit = lookup_commit_in_graph(the_repository, oid);
     -	if (commit)
     +	if (commit) {
    -+		if (mark_additional_complete_information)
    -+			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
    ++		if (mark_tags_complete_and_check_obj_db) {
    ++			if (!has_object(the_repository, oid, 0))
    ++				die_in_commit_graph_only(oid);
    ++		}
      		return commit;
     +	}
      
    @@ fetch-pack.c: static struct commit *deref_without_lazy_fetch(const struct object
      			if (!tag->tagged)
      				return NULL;
     -			if (mark_tags_complete)
    -+			if (mark_additional_complete_information)
    ++			if (mark_tags_complete_and_check_obj_db)
      				tag->object.flags |= COMPLETE;
      			oid = &tag->tagged->oid;
      		} else {
    -@@ fetch-pack.c: static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
    - 	save_commit_buffer = old_save_commit_buffer;
    - }
    - 
    -+static void warn_in_commit_graph_only(const struct object_id *oid)
    -+{
    -+	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
    -+		oid_to_hex(oid));
    -+	warning(_("This is probably due to repo corruption."));
    -+	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
    -+}
    -+
    - /*
    -  * Returns 1 if every object pointed to by the given remote refs is available
    -  * locally and reachable from a local ref, and 0 otherwise.
    -@@ fetch-pack.c: static int everything_local(struct fetch_pack_args *args,
    - 				      ref->name);
    - 			continue;
    - 		}
    -+		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
    -+			if (!has_object(the_repository, remote, 0))
    -+				warn_in_commit_graph_only(remote);
    -+		}
    - 		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
    - 			      ref->name);
    - 	}
     
    - ## object.h ##
    -@@ object.h: void object_array_init(struct object_array *array);
    - /*
    -  * object flag allocation:
    -  * revision.h:               0---------10         15               23------27
    -- * fetch-pack.c:             01    67
    -+ * fetch-pack.c:             01    6-8
    -  * negotiator/default.c:       2--5
    -  * walker.c:                 0-2
    -  * upload-pack.c:                4       11-----14  16-----19
    + ## t/t5330-no-lazy-fetch-with-commit-graph.sh ##
    +@@ t/t5330-no-lazy-fetch-with-commit-graph.sh: test_expect_success 'fetch any commit from promisor with the usage of the commit
    + 	test_commit -C with-commit any-commit &&
    + 	anycommit=$(git -C with-commit rev-parse HEAD) &&
    + 	GIT_TRACE="$(pwd)/trace.txt" \
    +-		git -C with-commit-graph fetch origin $anycommit 2>err &&
    ++		test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err &&
    + 	! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
    + 	grep "git fetch origin" trace.txt >actual &&
    + 	test_line_count = 1 actual
-- 
2.47.0.163.g1226f6d8fa-goog


  parent reply	other threads:[~2024-10-31 21:19 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-03 22:35 [RFC PATCH] promisor-remote: always JIT fetch with --refetch Emily Shaffer
2024-10-06 22:43 ` Junio C Hamano
2024-10-07  0:21   ` Robert Coup
2024-10-07  0:37     ` Junio C Hamano
2024-10-11 16:40   ` Emily Shaffer
2024-10-11 17:54     ` Junio C Hamano
2024-10-23  0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
2024-10-23 18:53   ` Emily Shaffer
2024-10-23 20:11   ` Taylor Blau
2024-10-28 22:55     ` Jonathan Tan
2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
2024-10-29 21:11   ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-30 21:22     ` Josh Steadmon
2024-10-29 21:11   ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-10-30 21:22     ` Josh Steadmon
2024-10-31 21:23       ` Jonathan Tan
2024-10-31 20:59     ` Taylor Blau
2024-10-31 21:43       ` Jonathan Tan
2024-11-01 14:33         ` Taylor Blau
2024-11-01 17:33           ` Jonathan Tan
2024-10-30 21:22   ` [PATCH 0/2] When fetching, " Josh Steadmon
2024-10-31 21:18   ` Jonathan Tan [this message]
2024-10-31 21:19     ` [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-31 21:19     ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-11-01  2:22       ` Junio C Hamano
2024-11-01  4:25         ` Junio C Hamano
2024-11-01  8:59           ` [External] " Han Xin
2024-11-01 17:46             ` Jonathan Tan
2024-11-01 17:40           ` Jonathan Tan
2024-11-02  2:08             ` Junio C Hamano
2024-11-01 17:36         ` Jonathan Tan
2024-11-01 15:18       ` Taylor Blau
2024-11-01 17:49         ` Jonathan Tan
2024-10-31 22:33     ` [PATCH v2 0/2] When fetching, die " Josh Steadmon
2024-11-05 19:24   ` [PATCH v3 " Jonathan Tan
2024-11-05 19:24     ` [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-11-05 19:24     ` [PATCH v3 2/2] fetch-pack: die if in commit graph but not obj db Jonathan Tan
2024-11-06  3:12     ` [PATCH v3 0/2] When fetching, " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1730409376.git.jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=hanxin.hx@bytedance.com \
    --cc=me@ttaylorr.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.