From: Calvin Wan <calvinwan@google.com>
To: Han Young <hanyang.tony@bytedance.com>
Cc: Calvin Wan <calvinwan@google.com>,
git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>,
Phillip Wood <phillip.wood123@gmail.com>,
Enrico Mrass <emrass@google.com>,
sokcevic@google.com
Subject: Missing Promisor Objects in Partial Repo Design Doc
Date: Tue, 1 Oct 2024 19:17:51 +0000 [thread overview]
Message-ID: <20241001191811.1934900-1-calvinwan@google.com> (raw)
In-Reply-To: <20240802073143.56731-1-hanyang.tony@bytedance.com>
It seems that we're at a standstill for the various possible designs
that can solve this problem, so I decided to write up a design document
to discuss the ideas we've come up with so far and new ones. Hopefully
this will get us closer to a viable implementation we can agree on.
Missing Promisor Objects in Partial Repo Design Doc
===================================================
Basic Reproduction Steps
------------------------
- Partial clone repository
- Create local commit and push
- Fetch new changes
- Garbage collection
State After Reproduction
------------------------
commit tree blob
C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
|
C2b ---- T2b -- B2b (created locally, in non-promisor pack)
|
C2a ---- T2a -- B2a (created locally, in non-promisor pack)
|
C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
Explanation of the Problem
--------------------------
In a partial clone repository, non-promisor commits are locally
committed as children of promisor commits and then pushed up to the
server. Fetches of new history can result in promisor commits that have
non-promisor commits as ancestors. During garbage collection, objects
are repacked in 2 steps. In the first step, if there is more than one
promisor packfile, all objects in promisor packfiles are repacked into a
single promisor packfile. In the second step, a revision walk is made
from all refs (and some other things like HEAD and reflog entries) that
stops whenever it encounters a promisor object. In the example above, if
a ref pointed directly to C2a, it would be returned by the walk (as an
object to be packed). But if we only had a ref pointing to C3, the
revision walk immediately sees that it is a promisor object, does not
return it, and does not iterate through its parents.
(C2b is a bit of a special case. Despite not being in a promisor pack,
it is still considered to be a promisor object since C3 directly
references it.)
If we think this is a bad state, we should propagate the “promisor-ness”
of C3 to its ancestors. Git commands should either prevent this state
from occurring or tolerate it and fix it when we can. If we did run into
this state unexpectedly, then it would be considered a BUG.
If we think it is a valid state, we should NOT propagate the
“promisor-ness” of C3 to its ancestors. Git commands should respect that
this is a possible state and be able to work around it. Therefore, this
bug would then be strictly caused by garbage collection
Bad State Solutions
===================
Fetch negotiation
-----------------
Implemented at
https://lore.kernel.org/git/20240919234741.1317946-1-calvinwan@google.com/
During fetch negotiation, if a commit is not in a promisor pack and
therefore local, do not declare it as "have" so they can be fetched into
a promisor pack.
Cost:
- Creation of set of promisor pack objects (by iterating through every
.idx of promisor packs)
- Refetch number of local commits
Pros: Implementation is simple, client doesn’t have to repack, prevents
state from ever occurring in the repository.
Cons: Network cost of refetching could be high if many local commits
need to be refetched.
commit tree blob
C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
|
C2 ---- T2 -- B2 (created locally, refetched into promisor pack)
|
C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
Fetch repack
------------
Not yet implemented.
Enumerate the objects in the freshly fetched promisor packs, checking
every outgoing link to see if they reference a non-promisor object that
we have, to get a list of tips where local objects are parents of
promisor objects ("bad history"). After collecting these "tips of bad
history", you then start another traversal from them until you hit an
object in a promisor pack and stop traversal there. You have
successfully enumerated the local objects to be repacked into a promisor
pack.
Cost:
- Traversal through newly fetched promisor trees and commits
- Creation of set of promisor pack objects (for tips of bad history
traversal to stop at a promisor object)
- Traversal through all local commits and check existence in promisor
pack set
- Repack all pushed local commits
Pros: Prevents state from ever occurring in the repository, no network
cost.
Cons: Additional cost of repacking is incurred during fetch, more
complex implementation.
commit tree blob
C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
|
C2 ---- T2 -- B2 (created locally, packed into promisor pack)
|
C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
Garbage Collection repack
-------------------------
Not yet implemented.
Same concept at “fetch repack”, but happens during garbage collection
instead. The traversal is more expensive since we no longer have access
to what was recently fetched so we have to traverse through all promisor
packs to collect tips of “bad” history.
Cost:
- Creation of set of promisor pack objects
- Traversal through all promisor commits
- Traversal through all local commits and check existence in promisor
object set
- Repack all pushed local commits
Pros: Can be run in the background as part of maintenance, no network
cost.
Cons: More expensive than “fetch repack”, state isn’t fixed until
garbage collection, more complex implementation
commit tree blob
C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
|
C2 ---- T2 -- B2 (created locally, packed into promisor pack)
|
C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
Garbage Collection repack all
-----------------------------
Implemented at
https://lore.kernel.org/git/20240925072021.77078-1-hanyang.tony@bytedance.com/
Repack all local commits into promisor packs during garbage collection.
Both valid scenarios
commit tree blob
C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
|
C2 ---- T2 -- B2 (created locally, packed into promisor pack)
|
C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
commit tree blob
C3 ---- T3 -- B3 (created locally, packed into promisor pack)
|
C2 ---- T2 -- B2 (created locally, packed into promisor pack)
|
C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
Cost:
Repack all local commits
Pros: Can be run in the background as part of maintenance, no network
cost, less complex implementation, and less expensive than “garbage
collection repack”.
Cons: Packing local objects into promisor packs means that it is no
longer possible to detect if an object is missing due to repository
corruption or because we need to fetch it from a promisor remote.
Packing local objects into promisor packs means that garbage collection
will no longer remove unreachable local objects.
Valid State Solutions
=====================
Garbage Collection check
------------------------
Not yet implemented.
Currently during the garbage collection rev walk, whenever a promisor
commit is reached, it is marked UNINTERESTING, and then subsequently all
ancestors of the promisor commit are traversed and also marked
UNINTERESTING. Therefore, add a check for whether a commit is local or
not during promisor commit ancestor traversal and do not mark local
commits as UNINTERESTING.
commit tree blob
C3 ---- T3 -- B3 (fetched from remote, in promisor pack)
|
C2 ---- T2 -- B2 (created locally, in non-promisor pack, gc does not delete)
|
C1 ---- T1 -- B1 (fetched from remote, in promisor pack)
Cost:
- Adds an additional check to every ancestor of a promisor commit.
This is practically the only solution if the state is valid. Fsck would
also have to start checking for validity of ancestors of promisor
commits instead of ignoring them as it currently does.
Optimizations
=============
The “creation of set of promisor pack objects” can be replaced with
“creation of set of non-promisor objects” since the latter is almost
always cheaper and we can check for non-existence rather than existence.
This does not work for “fetch negotiation” since if we have a commit
that's in both a promisor pack and a non-promisor pack, the algorithm's
correctness relies on the fact that we report it as a promisor object
(because we really need the server to re-send it).
next prev parent reply other threads:[~2024-10-01 19:18 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-02 7:31 [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Han Young
2024-08-02 7:31 ` [PATCH 1/1] revision: don't set parents as uninteresting if exclude promisor objects Han Young
2024-08-02 16:45 ` Junio C Hamano
2024-08-12 12:34 ` [External] " 韩仰
2024-08-12 16:09 ` Junio C Hamano
2024-08-22 8:28 ` 韩仰
2024-08-13 0:45 ` [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo Jonathan Tan
2024-08-13 17:18 ` Jonathan Tan
2024-08-14 4:10 ` Junio C Hamano
2024-08-14 19:30 ` Jonathan Tan
2024-08-23 12:43 ` [WIP v2 0/4] " Han Young
2024-08-23 12:43 ` [WIP v2 1/4] packfile: split promisor objects oidset into two Han Young
2024-08-23 12:43 ` [WIP v2 2/4] revision: add exclude-promisor-pack-objects option Han Young
2024-08-23 12:43 ` [WIP v2 3/4] revision: don't mark commit as UNINTERESTING if --exclude-promisor-objects is set Han Young
2024-08-23 12:43 ` [WIP v2 4/4] repack: use new exclude promisor pack objects option Han Young
2024-09-19 23:47 ` [PATCH 0/2] revision: fix reachable commits being gc'ed in partial repo Calvin Wan
2024-09-19 23:47 ` [PATCH 1/2] packfile: split promisor objects oidset into two Calvin Wan
2024-09-22 6:37 ` Junio C Hamano
2024-09-19 23:47 ` [PATCH 2/2] fetch-pack.c: do not declare local commits as "have" in partial repos Calvin Wan
2024-09-22 6:53 ` Junio C Hamano
2024-09-22 16:41 ` Junio C Hamano
2024-09-23 3:44 ` [External] " 韩仰
2024-09-23 16:21 ` Junio C Hamano
2024-10-02 22:35 ` Calvin Wan
2024-09-25 7:20 ` [PATCH 0/2] repack: pack everything into promisor packfile " Han Young
2024-09-25 7:20 ` [PATCH 1/2] repack: pack everything into packfile Han Young
2024-09-25 7:20 ` [PATCH 2/2] t0410: adapt tests to repack changes Han Young
2024-09-25 15:20 ` [PATCH 0/2] repack: pack everything into promisor packfile in partial repos Phillip Wood
2024-09-25 16:48 ` Junio C Hamano
2024-09-25 17:03 ` Junio C Hamano
2024-10-01 19:17 ` Calvin Wan [this message]
2024-10-01 19:35 ` Missing Promisor Objects in Partial Repo Design Doc Junio C Hamano
2024-10-02 2:54 ` Junio C Hamano
2024-10-02 7:57 ` [External] " Han Young
2024-10-08 21:35 ` Calvin Wan
2024-10-09 6:46 ` [External] " Han Young
2024-10-09 18:34 ` Jonathan Tan
2024-10-12 2:05 ` Jonathan Tan
2024-10-12 3:30 ` Han Young
2024-10-14 17:52 ` Jonathan Tan
2024-10-09 18:53 ` Jonathan Tan
2024-10-08 8:13 ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Han Young
2024-10-08 8:13 ` [PATCH v2 1/3] repack: pack everything into packfile Han Young
2024-10-08 21:41 ` Calvin Wan
2024-10-08 8:13 ` [PATCH v2 2/3] t0410: adapt tests to repack changes Han Young
2024-10-08 8:13 ` [PATCH v2 3/3] partial-clone: update doc Han Young
2024-10-08 21:57 ` [PATCH v2 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-08 22:43 ` Junio C Hamano
2024-10-09 6:31 ` [External] " Han Young
2024-10-11 8:24 ` [PATCH v3 " Han Young
2024-10-11 8:24 ` [PATCH v3 1/3] repack: pack everything into packfile Han Young
2024-10-11 8:24 ` [PATCH v3 2/3] repack: adapt tests to repack changes Han Young
2024-10-11 8:24 ` [PATCH v3 3/3] partial-clone: update doc Han Young
2024-10-11 18:18 ` [PATCH v3 0/3] repack: pack everything into promisor packfile in partial repos Junio C Hamano
2024-10-11 18:23 ` Junio C Hamano
2024-10-14 3:25 ` [PATCH v4 " Han Young
2024-10-14 3:25 ` [PATCH v4 1/3] repack: pack everything into packfile Han Young
2024-10-14 3:25 ` [PATCH v4 2/3] t0410: adapt tests to repack changes Han Young
2024-10-14 3:25 ` [PATCH v4 3/3] partial-clone: update doc Han Young
2024-10-21 22:29 ` [WIP 0/3] Repack on fetch Jonathan Tan
2024-10-21 22:29 ` [WIP 1/3] move variable Jonathan Tan
2024-10-21 22:29 ` [WIP 2/3] pack-objects Jonathan Tan
2024-10-21 22:29 ` [WIP 3/3] record local links and call pack-objects Jonathan Tan
2024-10-23 7:00 ` [External] [WIP 0/3] Repack on fetch Han Young
2024-10-23 17:03 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241001191811.1934900-1-calvinwan@google.com \
--to=calvinwan@google.com \
--cc=emrass@google.com \
--cc=git@vger.kernel.org \
--cc=hanyang.tony@bytedance.com \
--cc=jonathantanmy@google.com \
--cc=phillip.wood123@gmail.com \
--cc=sokcevic@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).