From: Junio C Hamano <gitster@pobox.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, hanyang.tony@bytedance.com,
jonathantanmy@google.com, sokcevic@google.com
Subject: Re: [PATCH 2/2] fetch-pack.c: do not declare local commits as "have" in partial repos
Date: Sat, 21 Sep 2024 23:53:14 -0700 [thread overview]
Message-ID: <xmqqr09c89id.fsf@gitster.g> (raw)
In-Reply-To: <20240919234741.1317946-3-calvinwan@google.com> (Calvin Wan's message of "Thu, 19 Sep 2024 23:47:41 +0000")
Calvin Wan <calvinwan@google.com> writes:
> In a partial repository, creating a local commit and then fetching
> causes the following state to occur:
>
> commit tree blob
> C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
> |
> C2 ---- T2 -- B2 (created locally, in non-promisor pack)
> |
> C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
>
> During garbage collection, parents of promisor objects are marked as
> UNINTERESTING and are subsequently garbage collected. In this case, C2
> would be deleted and attempts to access that commit would result in "bad
> object" errors (originally reported here[1]).
Understandable.
> This is not a bug in gc since it should be the case that parents of
> promisor objects are also promisor objects (fsck assumes this as
> well).
I am not sure where this "not a bug" claim comes from. Here, the
definition of "promisor objects" seems to be anything that are
reachable from objects in promisor packs, but isn't the source of
the bug that collects C2 exactly that "gc" uses such a definition
for discardable objects that can be refetchd from promisor remotes?
> When promisor objects are fetched, the state of the repository
> should ensure that the above holds true. Therefore, do not declare local
> commits as "have" in partial repositores so they can be fetched into a
> promisor pack.
Could you clarify what it means in the context of the above example
you gave in an updated version of the proposed log message?
We pretend that C2 and anything it reaches do not exist locally, to
force them to be fetched from the remote? We'd end up having two
copies of C2 (one that we created locally and had before starting
this fetch, the other we fetched when we fetched C3 from them)?
This sounds like it is awfully inefficient both network bandwidth-
and local disk-wise.
I was hoping to see that the issue can be fixed on the "gc" side,
regardless of how the objects enter our repository, but perhaps I am
missing something. Isn't it just the matter of collecting C1, C3
but not C2? Or to put it another way, if we first create a list of
objects to be packed (regardless of whether they are in promisor
packs), and then remove the objects that are in promisor packs from
the list, and pack the objects still remaining in the list?
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 58b4581ad8..c39b0f6ad4 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1297,12 +1297,23 @@ static void add_common(struct strbuf *req_buf, struct oidset *common)
>
> static int add_haves(struct fetch_negotiator *negotiator,
> struct strbuf *req_buf,
> - int *haves_to_send)
> + int *haves_to_send,
> + int from_promisor)
> {
> int haves_added = 0;
> const struct object_id *oid;
>
> while ((oid = negotiator->next(negotiator))) {
> + /*
> + * In partial repos, do not declare local objects as "have"
> + * so that they can be fetched into a promisor pack. Certain
> + * operations mark parent commits of promisor objects as
> + * UNINTERESTING and are subsequently garbage collected so
> + * this ensures local commits are still available in promisor
> + * packs after a fetch + gc.
> + */
> + if (from_promisor && !is_in_promisor_pack(oid, 0))
> + continue;
> packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
> if (++haves_added >= *haves_to_send)
> break;
> @@ -1405,7 +1416,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
> /* Add all of the common commits we've found in previous rounds */
> add_common(&req_buf, common);
>
> - haves_added = add_haves(negotiator, &req_buf, haves_to_send);
> + haves_added = add_haves(negotiator, &req_buf, haves_to_send, args->from_promisor);
> *in_vain += haves_added;
> trace2_data_intmax("negotiation_v2", the_repository, "haves_added", haves_added);
> trace2_data_intmax("negotiation_v2", the_repository, "in_vain", *in_vain);
> @@ -2178,7 +2189,7 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
>
> packet_buf_write(&req_buf, "wait-for-done");
>
> - haves_added = add_haves(&negotiator, &req_buf, &haves_to_send);
> + haves_added = add_haves(&negotiator, &req_buf, &haves_to_send, 0);
> in_vain += haves_added;
> if (!haves_added || (seen_ack && in_vain >= MAX_IN_VAIN))
> last_iteration = 1;
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 8415884754..cba9f7ed9b 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -693,6 +693,35 @@ test_expect_success 'lazy-fetch in submodule succeeds' '
> git -C client restore --recurse-submodules --source=HEAD^ :/
> '
>
> +test_expect_success 'fetching from promisor remote fetches previously local commits' '
> + # Setup
> + git init full &&
> + git -C full config uploadpack.allowfilter 1 &&
> + git -C full config uploadpack.allowanysha1inwant 1 &&
> + touch full/foo &&
> + git -C full add foo &&
> + git -C full commit -m "commit 1" &&
> + git -C full checkout --detach &&
> +
> + # Partial clone and push commit to remote
> + git clone "file://$(pwd)/full" --filter=blob:none partial &&
> + echo "hello" > partial/foo &&
> + git -C partial commit -a -m "commit 2" &&
> + git -C partial push &&
> +
> + # gc in partial repo
> + git -C partial gc --prune=now &&
> +
> + # Create another commit in normal repo
> + git -C full checkout main &&
> + echo " world" >> full/foo &&
> + git -C full commit -a -m "commit 3" &&
> +
> + # Pull from remote in partial repo, and run gc again
> + git -C partial pull &&
> + git -C partial gc --prune=now
> +'
> +
> . "$TEST_DIRECTORY"/lib-httpd.sh
> start_httpd
next prev parent reply other threads:[~2024-09-22 6:53 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-02 7:31 [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Han Young
2024-08-02 7:31 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2024-08-02 16:45 ` Junio C Hamano
2024-08-12 12:34 ` [External] " 韩仰
2024-08-12 16:09 ` Junio C Hamano
2024-08-22 8:28 ` 韩仰
2024-08-13 0:45 ` [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Jonathan Tan
2024-08-13 17:18 ` Jonathan Tan
2024-08-14 4:10 ` Junio C Hamano
2024-08-14 19:30 ` Jonathan Tan
2024-08-23 12:43 ` [WIP v2 0/4] " Han Young
2024-08-23 12:43 ` [WIP v2 1/4] packfile: split promisor objects oidset into two Han Young
2024-08-23 12:43 ` [WIP v2 2/4] revision: add exclude-promisor-pack-objects option Han Young
2024-08-23 12:43 ` [WIP v2 3/4] revision: don't mark commit as UNINTERESTING if --exclude-promisor-objects is set Han Young
2024-08-23 12:43 ` [WIP v2 4/4] repack: use new exclude promisor pack objects option Han Young
2024-09-19 23:47 ` [PATCH 0/2] revision: fix reachable commits being gc'ed in partial repo Calvin Wan
2024-09-19 23:47 ` [PATCH 1/2] packfile: split promisor objects oidset into two Calvin Wan
2024-09-22 6:37 ` Junio C Hamano
2024-09-19 23:47 ` [PATCH 2/2] fetch-pack.c: do not declare local commits as "have" in partial repos Calvin Wan
2024-09-22 6:53 ` Junio C Hamano [this message]
2024-09-22 16:41 ` Junio C Hamano
2024-09-23 3:44 ` [External] " 韩仰
2024-09-23 16:21 ` Junio C Hamano
2024-10-02 22:35 ` Calvin Wan
2024-09-25 7:20 ` [PATCH 0/2] repack: pack everything into promisor packfile " Han Young
2024-09-25 7:20 ` [PATCH 1/2] repack: pack everything into packfile Han Young
2024-09-25 7:20 ` [PATCH 2/2] t0410: adapt tests to repack changes Han Young
2024-09-25 15:20 ` [PATCH 0/2] repack: pack everything into promisor packfile in partial repos Phillip Wood
2024-09-25 16:48 ` Junio C Hamano
2024-09-25 17:03 ` Junio C Hamano
2024-10-01 19:17 ` Missing Promisor Objects in Partial Repo Design Doc Calvin Wan
2024-10-01 19:35 ` Junio C Hamano
2024-10-02 2:54 ` Junio C Hamano
2024-10-02 7:57 ` [External] " Han Young
2024-10-08 21:35 ` Calvin Wan
2024-10-09 6:46 ` [External] " Han Young
2024-10-09 18:34 ` Jonathan Tan
2024-10-12 2:05 ` Jonathan Tan
2024-10-12 3:30 ` Han Young
2024-10-14 17:52 ` Jonathan Tan
2024-10-09 18:53 ` Jonathan Tan
2024-10-08 8:13 ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Han Young
2024-10-08 8:13 ` [PATCH v2 1/3] repack: pack everything into packfile Han Young
2024-10-08 21:41 ` Calvin Wan
2024-10-08 8:13 ` [PATCH v2 2/3] t0410: adapt tests to repack changes Han Young
2024-10-08 8:13 ` [PATCH v2 3/3] partial-clone: update doc Han Young
2024-10-08 21:57 ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-08 22:43 ` Junio C Hamano
2024-10-09 6:31 ` [External] " Han Young
2024-10-11 8:24 ` [PATCH v3 " Han Young
2024-10-11 8:24 ` [PATCH v3 1/3] repack: pack everything into packfile Han Young
2024-10-11 8:24 ` [PATCH v3 2/3] repack: adapt tests to repack changes Han Young
2024-10-11 8:24 ` [PATCH v3 3/3] partial-clone: update doc Han Young
2024-10-11 18:18 ` [PATCH v3 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-11 18:23 ` Junio C Hamano
2024-10-14 3:25 ` [PATCH v4 " Han Young
2024-10-14 3:25 ` [PATCH v4 1/3] repack: pack everything into packfile Han Young
2024-10-14 3:25 ` [PATCH v4 2/3] t0410: adapt tests to repack changes Han Young
2024-10-14 3:25 ` [PATCH v4 3/3] partial-clone: update doc Han Young
2024-10-21 22:29 ` [WIP 0/3] Repack on fetch Jonathan Tan
2024-10-21 22:29 ` [WIP 1/3] move variable Jonathan Tan
2024-10-21 22:29 ` [WIP 2/3] pack-objects Jonathan Tan
2024-10-21 22:29 ` [WIP 3/3] record local links and call pack-objects Jonathan Tan
2024-10-23 7:00 ` [External] [WIP 0/3] Repack on fetch Han Young
2024-10-23 17:03 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqr09c89id.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=calvinwan@google.com \
--cc=git@vger.kernel.org \
--cc=hanyang.tony@bytedance.com \
--cc=jonathantanmy@google.com \
--cc=sokcevic@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).