From: Junio C Hamano <gitster@pobox.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, hanyang.tony@bytedance.com,
jonathantanmy@google.com, sokcevic@google.com
Subject: Re: [PATCH 2/2] fetch-pack.c: do not declare local commits as "have" in partial repos
Date: Sun, 22 Sep 2024 09:41:01 -0700 [thread overview]
Message-ID: <xmqqh6a78wv6.fsf@gitster.g> (raw)
In-Reply-To: <xmqqr09c89id.fsf@gitster.g> (Junio C. Hamano's message of "Sat, 21 Sep 2024 23:53:14 -0700")
Junio C Hamano <gitster@pobox.com> writes:
> Calvin Wan <calvinwan@google.com> writes:
>
>> In a partial repository, creating a local commit and then fetching
>> causes the following state to occur:
>>
>> commit tree blob
>> C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
>> |
>> C2 ---- T2 -- B2 (created locally, in non-promisor pack)
>> |
>> C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
>>
>> During garbage collection, parents of promisor objects are marked as
>> UNINTERESTING and are subsequently garbage collected. In this case, C2
>> would be deleted and attempts to access that commit would result in "bad
>> object" errors (originally reported here[1]).
>
> Understandable.
> ...
>> When promisor objects are fetched, the state of the repository
>> should ensure that the above holds true. Therefore, do not declare local
>> commits as "have" in partial repositores so they can be fetched into a
>> promisor pack.
> ...
> We pretend that C2 and anything it reaches do not exist locally, to
> force them to be fetched from the remote? We'd end up having two
> copies of C2 (one that we created locally and had before starting
> this fetch, the other we fetched when we fetched C3 from them)?
> This sounds like it is awfully inefficient both network bandwidth-
> and local disk-wise.
One related thing that worries me is what happens after we make a
large push, either directly to the remote, or what we pushed
elsewhere were fetched by the remote, and then we need to fetch what
they created on top. The history may look like this:
1. we fetch from promisor remote. C is in promisor packs
---C
2. we build on top. 'x' are local.
---C---x---x---x---x---x---x---x---x
3. 'x' we created above ends up to be a the promisor side,
and others build a few commits on top.
---C---x---x---x---x---x---x---x---x---o---o
4. Now we try to fetch from them. I.e. a repository that has
history illustrated in 2. fetches the history illustrated in 3.
Because this change forbids the fetching side to tell the other side
that it has 'x', the first "have" we are allowed to send is 'C',
even though we only need to fetch two commits 'o' from them.
And 'x' could be numerous in distributed development workflows, as
these "local" commits do not have to be ones you created locally
yourself. You may have fetched and merged these commits from
elsewhere where the active development is happening. The only
criterion that qualifies a commit to be "local" (and causes us to
omit them from "have" communication) is that we didn't obtain it
directly from our promisor remote, so you may end up fetching
a large portion of the history you already have from the promisor
remote, just to have them into a promisor pack.
If we cannot change the definition of "is-promisor-object" for the
purpose of "gc" (and it is probably I am missing what you, JTan, and
HanYang thought about that I do not see he reason why), I wonder if
there is a way to somehow avoid the refetching but still "move"
these 'x' objects purely locally into a promisor pack?
That is, the current "git fetch" without this patch would only fetch
two 'o' commits (and its associated trees and blobs) into a new
promisor pack, but because we know that commits 'x' have now become
re-fetchable from the promisor, we can make them promisor objects by
repacking locally them and mark the resulting pack a promisor pack,
without incurring the cost to the remote to prepare and send 'x'
again to us. That would give us the same protection the patch under
discussion offers, wouldn't it?
I however still think fixing "gc" would give us a lot more intuitive
behaviour, though.
Thanks.
next prev parent reply other threads:[~2024-09-22 16:41 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-02 7:31 [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Han Young
2024-08-02 7:31 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2024-08-02 16:45 ` Junio C Hamano
2024-08-12 12:34 ` [External] " 韩仰
2024-08-12 16:09 ` Junio C Hamano
2024-08-22 8:28 ` 韩仰
2024-08-13 0:45 ` [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Jonathan Tan
2024-08-13 17:18 ` Jonathan Tan
2024-08-14 4:10 ` Junio C Hamano
2024-08-14 19:30 ` Jonathan Tan
2024-08-23 12:43 ` [WIP v2 0/4] " Han Young
2024-08-23 12:43 ` [WIP v2 1/4] packfile: split promisor objects oidset into two Han Young
2024-08-23 12:43 ` [WIP v2 2/4] revision: add exclude-promisor-pack-objects option Han Young
2024-08-23 12:43 ` [WIP v2 3/4] revision: don't mark commit as UNINTERESTING if --exclude-promisor-objects is set Han Young
2024-08-23 12:43 ` [WIP v2 4/4] repack: use new exclude promisor pack objects option Han Young
2024-09-19 23:47 ` [PATCH 0/2] revision: fix reachable commits being gc'ed in partial repo Calvin Wan
2024-09-19 23:47 ` [PATCH 1/2] packfile: split promisor objects oidset into two Calvin Wan
2024-09-22 6:37 ` Junio C Hamano
2024-09-19 23:47 ` [PATCH 2/2] fetch-pack.c: do not declare local commits as "have" in partial repos Calvin Wan
2024-09-22 6:53 ` Junio C Hamano
2024-09-22 16:41 ` Junio C Hamano [this message]
2024-09-23 3:44 ` [External] " 韩仰
2024-09-23 16:21 ` Junio C Hamano
2024-10-02 22:35 ` Calvin Wan
2024-09-25 7:20 ` [PATCH 0/2] repack: pack everything into promisor packfile " Han Young
2024-09-25 7:20 ` [PATCH 1/2] repack: pack everything into packfile Han Young
2024-09-25 7:20 ` [PATCH 2/2] t0410: adapt tests to repack changes Han Young
2024-09-25 15:20 ` [PATCH 0/2] repack: pack everything into promisor packfile in partial repos Phillip Wood
2024-09-25 16:48 ` Junio C Hamano
2024-09-25 17:03 ` Junio C Hamano
2024-10-01 19:17 ` Missing Promisor Objects in Partial Repo Design Doc Calvin Wan
2024-10-01 19:35 ` Junio C Hamano
2024-10-02 2:54 ` Junio C Hamano
2024-10-02 7:57 ` [External] " Han Young
2024-10-08 21:35 ` Calvin Wan
2024-10-09 6:46 ` [External] " Han Young
2024-10-09 18:34 ` Jonathan Tan
2024-10-12 2:05 ` Jonathan Tan
2024-10-12 3:30 ` Han Young
2024-10-14 17:52 ` Jonathan Tan
2024-10-09 18:53 ` Jonathan Tan
2024-10-08 8:13 ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Han Young
2024-10-08 8:13 ` [PATCH v2 1/3] repack: pack everything into packfile Han Young
2024-10-08 21:41 ` Calvin Wan
2024-10-08 8:13 ` [PATCH v2 2/3] t0410: adapt tests to repack changes Han Young
2024-10-08 8:13 ` [PATCH v2 3/3] partial-clone: update doc Han Young
2024-10-08 21:57 ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-08 22:43 ` Junio C Hamano
2024-10-09 6:31 ` [External] " Han Young
2024-10-11 8:24 ` [PATCH v3 " Han Young
2024-10-11 8:24 ` [PATCH v3 1/3] repack: pack everything into packfile Han Young
2024-10-11 8:24 ` [PATCH v3 2/3] repack: adapt tests to repack changes Han Young
2024-10-11 8:24 ` [PATCH v3 3/3] partial-clone: update doc Han Young
2024-10-11 18:18 ` [PATCH v3 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-11 18:23 ` Junio C Hamano
2024-10-14 3:25 ` [PATCH v4 " Han Young
2024-10-14 3:25 ` [PATCH v4 1/3] repack: pack everything into packfile Han Young
2024-10-14 3:25 ` [PATCH v4 2/3] t0410: adapt tests to repack changes Han Young
2024-10-14 3:25 ` [PATCH v4 3/3] partial-clone: update doc Han Young
2024-10-21 22:29 ` [WIP 0/3] Repack on fetch Jonathan Tan
2024-10-21 22:29 ` [WIP 1/3] move variable Jonathan Tan
2024-10-21 22:29 ` [WIP 2/3] pack-objects Jonathan Tan
2024-10-21 22:29 ` [WIP 3/3] record local links and call pack-objects Jonathan Tan
2024-10-23 7:00 ` [External] [WIP 0/3] Repack on fetch Han Young
2024-10-23 17:03 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqh6a78wv6.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=calvinwan@google.com \
--cc=git@vger.kernel.org \
--cc=hanyang.tony@bytedance.com \
--cc=jonathantanmy@google.com \
--cc=sokcevic@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).